Introduced in Feature Pack 2

WebSphere Commerce search best practices

Follow the best practices to ensure that your WebSphere Commerce search implementation works efficiently. Best practices include those for specific roles, such as site developers, administrators, and business users.

Site developers

  • Know your use cases. Based on your site's usage pattern, you can fine-tune the search code path and payload to maximize your throughput. For more information, see Disabling search expression providers and result filters in the search configuration file (wc-search.xml).
  • Avoid programming against the database directly from the storefront. That is, the back office data store is the database, while the storefront data store is the search index.
  • Avoid declaring too many expression providers, query preprocessors and postprocessors, or result filters in any search profile. That is, isolate each search profile by usage at the storefront and include only the required processing logic in each search profile. Unnecessary processing and filtering increase the overall search response time.
  • Group similar usages into one index field and assign the appropriate analyzers to them.
  • Use inheritance when reusing search profile properties.
  • Include only the index fields that are necessary for searching and for the result set.
  • Avoid declaring too many facets for each search operation.
  • Use search expression providers for modifying search expressions.
  • Use search result filters for modifying Business Object Document (BOD) responses.
  • Feature Pack 7 or later: Use search query postprocessors for modifying REST responses.
  • Use pagination to reduce payload.
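The advice above on limiting index fields and using pagination can be sketched as a set of Solr request parameters. This is a minimal illustration, not WebSphere Commerce code: q, fq, fl, start, and rows are standard Solr request parameters, while the helper name build_search_params, the field names, and the published:1 filter are hypothetical stand-ins chosen for the example.

```python
from urllib.parse import urlencode


def build_search_params(term, page, page_size=12, fields=("catentry_id", "name")):
    """Build Solr request parameters for one page of keyword results.

    The field names and the filter below are hypothetical examples.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "q": term,                        # the shopper's keyword query
        "fq": "published:1",              # stable filter kept separate from q
        "fl": ",".join(fields),           # return only the fields the page renders
        "start": (page - 1) * page_size,  # offset of the first result on this page
        "rows": page_size,                # cap the payload at one page of results
    }


params = build_search_params("coffee maker", page=3)
query_string = urlencode(params)
```

Requesting one page at a time with a short field list keeps each response small, which is the payload reduction that the list above recommends.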
When caching:
  • Increase cache sizes to suit your production environment. The default cache sizes are not optimized for production. Make cache sizes as large as possible, which increases the chance that searches are performed against cached search results (fq parameters) rather than new query results (q parameters). The more Solr can cache, the better.
  • Use fragment caching for each result displayed to increase the cache hit ratio across different search requests.
  • Use time-based cache invalidation for keyword search results to simplify cache policy management.
  • Cache invalidation can only be performed after re-indexing and after index replication has been completed.
  • Invalidating large amounts of cached content becomes difficult when timing index replication.
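Time-based invalidation for keyword search results can be sketched as a small cache whose entries expire after a fixed lifetime. This is an illustration of the idea only, not WebSphere Commerce code: the class name TtlCache, the 300-second lifetime in the usage lines, and the injectable clock are all assumptions made for the example.

```python
import time


class TtlCache:
    """Cache whose entries expire a fixed number of seconds after being stored."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock          # injectable so that expiry can be tested
        self._entries = {}          # key -> (expiry time, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # stale: drop it so the caller queries again
            return None
        return value

    def put(self, key, value):
        self._entries[key] = (self.clock() + self.ttl_seconds, value)


# Hypothetical usage: keep keyword results for five minutes.
results_cache = TtlCache(ttl_seconds=300)
results_cache.put("coffee maker", ["sku-1", "sku-2"])
```

Because entries expire on their own, no explicit invalidation has to be timed against re-indexing or index replication. The trade-off is that a cached result can be out of date for up to one lifetime.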

Site administrators

When setting up the production system network:
  • Configure WebSphere Commerce to use load balancing and failover, where code is installed on the primary node. A load balancer can be used on top of the web server for WebSphere Commerce and WebSphere Commerce search. Both a load balancer and web server can help route traffic for workload balancing. The WebSphere Commerce application on the primary node can be added to the WebSphere Commerce cluster to handle live traffic. The deployment manager (DMGR) is used to federate to managed nodes and the repeaters.
  • A true failover configuration should be used with a load balancer for WebSphere Commerce and WebSphere Commerce search. That is, when the CPU is pegged on any one of the search servers, the WebSphere Commerce threads could time out as they wait for a response. This leads to timed-out requests at the storefront when shoppers are accessing the site. Having a proper hardware load balancer on top of the web server reduces this risk, as it can better manage heartbeats and route accordingly. That is, in addition to load balancing, it also routes based on response times.
  • Use more than one repeater in the setup to provide failover support for index replication, in case one of the indexing repeaters becomes unavailable. That is, have changes pushed to all the repeaters and have one of them as the master. The master is then configured to replicate to all the Solr subordinate servers in production. When the master repeater becomes unavailable, the backup repeater can immediately take over for index replication.
  • A storage area network (SAN) is generally more resilient than direct-attached storage (DAS). When SAN is employed, it is possible to have all Solr subordinate servers mount to the same search index directory, therefore reducing the overall workload on the repeater for replication.
  • Although the repeater is optional, it is recommended to deploy at least one repeater to your production environment. The purpose of the repeater is to act as an index snapshot of what is currently in production, so that emergency fixes can be applied. Since the data in the staging index might contain non-production-ready changes, the staging index cannot be used for applying emergency fixes. It is not recommended to perform indexing directly on the Solr subordinate servers that are in production, as they are serving up storefront live traffic.
  • Do not create workspace Solr cores in the production environment, as this can significantly increase the overall amount of memory used on the search server.
For more information about the recommended configurations when implementing a search cluster, see WebSphere Commerce search server: Advanced configuration.
When administering the site:
  • Perform index optimization only when system usage is low.
  • Do not restart any of the Solr search servers while you are performing replication. Doing so causes the servers to lose synchronization and requires another restart to resynchronize them. If the servers are not synchronized, each might return a different result for a given query.
  • Introduced in Feature Pack 3: Use the UpdateSearchIndex scheduler task in production when synchronizing.
  • Introduced in Feature Pack 3: Configure the UpdateSearchIndex scheduler task in production to run more frequently for delta updates and perform a clean full build less often when the system usage is low.
  • Introduced in Feature Pack 3: Production search statistics can be moved or replicated into staging and can accumulate over time. Therefore, periodically archive old statistics data to achieve better response times in the Management Center Search Statistics tooling.
  • Introduced in Feature Pack 3: Do not crawl into any catalog-related pages. When crawling the WebSphere Commerce site, configure all URLs to be crawled into the StaticContentSitemap, which allows SEO-enabled URL tags.

Storage considerations

Consider the following storage factors:
  • Conserve storage space by cleaning up old or backed up search indexes that are no longer in use. For example, the solrhome/data directory might contain multiple time-stamped index directories after replication that can safely be removed, unless they are being used explicitly for backup purposes.

    The solrhome/data/index directory contains the active index files by default, unless another location is specified in a solrhome/data/index.properties file.

  • Consider the following factors when using Storage Area Network (SAN) or Direct-attached storage (DAS) as your storage space:
    When considering SAN:
    • Hardware failure: Highly resilient with no single point of failure, and unlikely to have any business impact.
    • Network failure: Highly resilient with no single point of failure, and unlikely to have any business impact, assuming a proper clustering of search servers. The load balancer automatically routes traffic to the server that has sufficient capacity to handle full peak load.
    • Scalability: Provides much greater scalability than local disks.
    • Performance: Outperforms local disks in terms of input/output operations per second (IOPS), provided that Fibre Channel is used instead of NFS.
    • Total cost of ownership: Higher than DAS.
    • Flexibility: Moving LPARs, JVMs, or indexes is relatively straightforward when using partition management software such as Live Partition Mobility on AIX. Adding disks should not affect operation.
    When considering DAS:
    • Hardware failure: Disks are mirrored, so single disk failures should have no impact. Some server models might have the capability to hot-swap disks.
    • Network failure: Disks are local and unlikely to be affected by network issues.
    • Scalability: Limited by the amount of hardware that the physical server can hold.
    • Performance: Local disks are likely easier to configure, but SAN should on average be faster than local disk.
    • Total cost of ownership: Lower than SAN.
    • Flexibility: Moving LPARs, JVMs, or indexes requires rebuilding. Adding disks might involve downtime to reconfigure the RAID array.
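
The cleanup of time-stamped index directories mentioned in the storage considerations can be sketched as a dry-run script that only reports candidates. It assumes that time-stamped directories are named index.<timestamp> under solrhome/data and that index.properties, when present, names the active directory in an index property. Verify both assumptions against your installation before removing anything. The function name find_stale_index_dirs is hypothetical.

```python
from pathlib import Path


def find_stale_index_dirs(data_dir):
    """Return time-stamped index directories that are not the active index.

    Dry run only: nothing is deleted.
    """
    data_dir = Path(data_dir)
    active = "index"  # default active index directory
    props = data_dir / "index.properties"
    if props.is_file():
        for line in props.read_text().splitlines():
            key, sep, value = line.partition("=")
            if sep and key.strip() == "index":
                active = value.strip()  # assumed format: index=<directory name>
    return sorted(
        p.name
        for p in data_dir.iterdir()
        if p.is_dir() and p.name.startswith("index.") and p.name != active
    )
```

Review the returned names by hand, and exclude any directory that is kept explicitly as a backup, before deleting.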

Marketing business users

  • Use catalog filters to filter site content. For example, including or excluding content based on category, product, attribute, or property.
  • Use search rules for reordering products in the storefront.
  • Group all search rules with the same trigger into the same rule.
  • Use separate search rules when appropriate, for example, when there is a need to separately control or track the search rule.
  • When using store preview, use the search summary to debug search rules, and use the relevancy score of each search result to fine-tune the boost factor.

Catalog business users

  • Use synonyms to increase search scope, and use replacement terms to reduce search scope.
  • Understand the scope of search administration, for example, stage propagation versus emergency fixes, to determine the business impact. This helps in determining when changes are available in production.
  • Avoid indexing other languages into the same index for a given locale.
  • Avoid loading a large number of catalog objects into the workspace. Instead, load production-ready data to avoid indexing twice: once under the workspace, and again under the base schema upon approval. For more information, see mergeFactor.
  • Store preview with reindexing is designed for previewing a small number of catalog changes. Depending on the size of your catalog, up to 10000 to 15000 changes might be previewed in approximately 30 minutes. It is recommended to load and preview a large number of changes in the base schema.
Feature Pack 5 or later

Considerations when selecting searchable and facetable attributes (Site administrators and Marketing business users)

The following workflow applies to attribute dictionary attributes that are marked as searchable or facetable in WebSphere Commerce search.
  • Marking Attribute dictionary attributes as searchable or facetable:
    • Feature Pack 5: You must use the Management Center. Loading attributes as searchable or facetable using Dataload or catalog upload is not supported by default.
    • Feature Pack 6 or later: You can use the Management Center. You can also load attributes as searchable or facetable using Dataload or catalog upload.
    A searchable or facetable flag that is changed in a workspaces environment is saved to the workflow content, not the approved content.
  • The searchable or facetable flag that is associated with the attribute dictionary attribute cannot be managed by workflow in the Management Center. Once the attribute is marked searchable or facetable in Management Center, the change is immediately committed into the approved content.
  • When the searchable or facetable flag is selected for an attribute dictionary attribute, it cannot be cleared. That is, links are created for it throughout the WebSphere Commerce database and search index. For reliability and consistency, these links remain intact despite clearing the searchable or facetable check boxes; once specified, you cannot change these settings.
  • When an attribute dictionary attribute is marked as searchable or facetable, business users can start working with it in workspaces, such as associating it with products, and can preview it in the storefront.
  • If necessary, IT administrators can follow the steps in Resetting searchable attributes to reset searchable and facetable attributes.