Search terminology

WebSphere Commerce Search uses a number of specialized terms. A concise list of the most commonly used terms is provided to get you started.

Introduction to Search terminology

Search terms usually refer to one of three different conceptual components of the product: the index, the runtime engine that processes requests, or the system architecture itself. The most commonly used terms for each component are provided in this topic.

Index terms

The index is a large, flat table that contains data fields that are optimized for search performance. Search query strings are compared with the entries in the index, and matching results are returned to the customer.

Row/Document
A set of data that describes a particular catalog object. For example, each row or document in the CatalogEntry core corresponds to a specific catalog entry.
Field
Rows or documents in the core are composed of fields, which hold specific information about the catalog object. For example, in the CatalogGroup core, the name field holds the name of the category that a row (or document) describes.
Core
A Solr index that contains Solr documents for a specific purpose. Commonly used cores include:
  • CatalogEntry core is used to store data about the catalog entries in the catalog.
  • CatalogGroup core is used to store data about the categories in the catalog.
  • Unstructured core is used to store attachment data for catalog entries in the catalog (images, PDF files, and other attachments).
  • Inventory core is used to store inventory data for the catalog entries in the catalog.
  • Price core is used to store price data for the catalog entries in the catalog.
Index
The index is composed of all of the search cores that are associated with a master catalog. Common indexes include:
  • MC_10001 is an index that contains the CatalogEntry, CatalogGroup, and Unstructured cores.
  • MC_10101 is an index that contains the CatalogEntry, CatalogGroup, Unstructured, and Inventory cores.
Full indexing
Rebuilding the entire index from scratch by using manual indexing utilities, such as di-preprocess and di-buildindex, or the UpdateSearchIndex scheduled job.
Delta indexing
Updating the current index with only the changes that are captured in TI_DELTA_CATENTRY, by using manual indexing utilities, such as di-preprocess and di-buildindex, or the UpdateSearchIndex scheduled job.
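The difference between full and delta indexing can be sketched in a few lines. This is a conceptual sketch only, not how the indexing utilities work internally: plain dictionaries stand in for the catalog and the index, and the hypothetical changed_ids list stands in for the rows captured in TI_DELTA_CATENTRY. Treating an ID that no longer exists in the catalog as a deletion is an assumption made for the illustration.

```python
from typing import Dict, List

Catalog = Dict[int, str]  # hypothetical: catalog entry ID -> indexed name


def full_index(catalog: Catalog) -> Catalog:
    """Full indexing: rebuild the whole index, reprocessing every catalog entry."""
    return {entry_id: name for entry_id, name in catalog.items()}


def delta_index(index: Catalog, catalog: Catalog, changed_ids: List[int]) -> Catalog:
    """Delta indexing: update an existing index with only the changed entries.

    changed_ids plays the role of the rows captured in TI_DELTA_CATENTRY.
    An ID that is no longer in the catalog is treated as a deletion (assumption).
    """
    updated = dict(index)
    for entry_id in changed_ids:
        if entry_id in catalog:
            updated[entry_id] = catalog[entry_id]  # new or modified entry
        else:
            updated.pop(entry_id, None)  # entry was removed from the catalog
    return updated


catalog = {1: "shirt", 2: "coffee", 3: "mug"}
index = full_index(catalog)

# Later, entry 2 is renamed, entry 3 is deleted, and entry 4 is added.
catalog[2] = "espresso"
del catalog[3]
catalog[4] = "tea"
index = delta_index(index, catalog, changed_ids=[2, 3, 4])
print(index)  # {1: 'shirt', 2: 'espresso', 4: 'tea'}
```

Both paths end with the same index; the delta path touches only three entries instead of the whole catalog, which is why it is cheaper to run frequently.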
Parallel preprocessing and distributed indexing (sharding)
A process for parallelizing indexing by splitting the index into shards and indexing all of the shards at the same time (in separate threads). For example, consider a core that contains 2 million catalog entries. Rather than indexing all of the catalog entries sequentially in a single thread, the system can use parallel preprocessing and distributed indexing to split the work across multiple shards. With 10 shards, each shard indexes 200,000 catalog entries, and all 10 shards are indexed at the same time.
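The arithmetic in the sharding example can be sketched as follows. This is a simplified illustration, not the product's implementation: split_into_shards and index_shard are hypothetical helpers, shards are assumed to be contiguous ranges of entries, and index_shard only counts entries instead of building a Solr index.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_shards(entry_ids: List[int], shard_count: int) -> List[List[int]]:
    """Split entries into shard_count contiguous ranges of near-equal size."""
    size, remainder = divmod(len(entry_ids), shard_count)
    shards = []
    start = 0
    for i in range(shard_count):
        # The first `remainder` shards take one extra entry each.
        end = start + size + (1 if i < remainder else 0)
        shards.append(entry_ids[start:end])
        start = end
    return shards


def index_shard(shard: List[int]) -> int:
    """Stand-in for preprocessing and indexing one shard; returns its entry count."""
    return len(shard)


entry_ids = list(range(2_000_000))
shards = split_into_shards(entry_ids, shard_count=10)

# Index all shards at the same time, one worker thread per shard.
with ThreadPoolExecutor(max_workers=len(shards)) as pool:
    counts = list(pool.map(index_shard, shards))

print(len(counts), counts[0], sum(counts))  # 10 200000 2000000
```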
Crawler
A WebSphere Commerce utility that crawls unmanaged content, such as HTML files, so that it can be indexed into the unstructured index.
Extension index
A core that extends the CatalogEntry core to store specific data for the catalog entries. For example, the inventory index extends the CatalogEntry core to store inventory information for each catalog entry. Because this information is kept in a separate, smaller core, you can rebuild that core frequently and quickly. This keeps inventory counts up to date while you index your potentially large CatalogEntry core only once a day.
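The idea behind an extension index can be sketched as two separate stores that share the catalog entry ID as a key. This is a conceptual sketch, not how Solr combines the cores at query time: the data, the inventory field name, and the helpers rebuild_inventory and get_document are hypothetical. It shows only that rebuilding the small core leaves the large one untouched.

```python
from typing import Any, Dict

Doc = Dict[str, Any]

# Hypothetical data for the large CatalogEntry core, rebuilt once a day.
catalog_entry_core: Dict[int, Doc] = {
    10001: {"name": "Cotton T-shirt", "mfName": "Acme"},
    10002: {"name": "Coffee mug", "mfName": "Coffee King"},
}

# Hypothetical data for the small Inventory extension core, rebuilt often.
inventory_core: Dict[int, Doc] = {
    10001: {"inventory": 42},
    10002: {"inventory": 0},
}


def rebuild_inventory(new_counts: Dict[int, int]) -> None:
    """Rebuild only the extension core; the CatalogEntry core is not touched."""
    inventory_core.clear()
    for entry_id, count in new_counts.items():
        inventory_core[entry_id] = {"inventory": count}


def get_document(entry_id: int) -> Doc:
    """Combine a CatalogEntry document with its extension fields by entry ID."""
    doc = dict(catalog_entry_core[entry_id])
    doc.update(inventory_core.get(entry_id, {}))
    return doc


rebuild_inventory({10001: 40, 10002: 12})
print(get_document(10002))
# {'name': 'Coffee mug', 'mfName': 'Coffee King', 'inventory': 12}
```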

The runtime environment

The Search runtime consists of the Solr application and associated WebSphere Commerce utilities and processes.

Deep search sequencing
Sorting products for category navigation by using both the product's sequence value and the sequence value of its parent category.
Shallow sequencing
Sorting products for category navigation by using the product's sequence value.
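The two sequencing modes can be contrasted with a sort key. This is a sketch under a stated assumption: deep sequencing is modeled as ordering by the parent category's sequence first and the product's own sequence second. The Product type, the sample products, and the sequence values are hypothetical.

```python
from typing import List, NamedTuple


class Product(NamedTuple):
    name: str
    sequence: float         # the product's own sequence value
    parent_sequence: float  # sequence value of the product's parent category


def shallow_sort(products: List[Product]) -> List[str]:
    """Shallow sequencing: order by the product's sequence value only."""
    return [p.name for p in sorted(products, key=lambda p: p.sequence)]


def deep_sort(products: List[Product]) -> List[str]:
    """Deep sequencing (assumed precedence): parent category first, then product."""
    return [p.name for p in sorted(products, key=lambda p: (p.parent_sequence, p.sequence))]


# Products shown under one top category, drawn from two subcategories.
products = [
    Product("Polo shirt", sequence=1, parent_sequence=2),   # subcategory "Shirts"
    Product("Dress shirt", sequence=2, parent_sequence=2),  # subcategory "Shirts"
    Product("Jeans", sequence=3, parent_sequence=1),        # subcategory "Pants"
]

print(shallow_sort(products))  # ['Polo shirt', 'Dress shirt', 'Jeans']
print(deep_sort(products))     # ['Jeans', 'Polo shirt', 'Dress shirt']
```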
Search profile
An abstraction of a specific search scenario, defined in the wc-search.xml file. The search profile specifies the fields that are searched, the expression providers, query preprocessors, and query postprocessors to use, and other relevant information. For example, searching for products and retrieving a specific category return different information and require searching different data, so each of these scenarios should use its own search profile. IBM_findCategoryByIdentifier is a search profile that can be used to retrieve category information based on a specific catgroup_id. You can use the IBM_findProductsBySearchTerm profile to retrieve product information based on a search term.
Expression provider
Used to modify the control parameters available for the search request. For example, if you want to override the sort that is being used for the search request, you can use an expression provider to modify the _wcf.search.sort control parameter. Expression providers allow modifications to control parameter values before they are read by query preprocessors and added to the query.
Query preprocessor
Used to modify the query before it is processed by WebSphere Commerce Search. For example, if you want to filter on catalog entries that have a manufacturer name, you can use a query preprocessor to add a query parameter such as fq=mfName:*. A query preprocessor can also use the control parameters of the search request to add data to the query, for instance, a sort parameter based on the value of the _wcf.search.sort control parameter.
Query postprocessor
Used to modify the query results before they are returned as the search response. For example, a query postprocessor can add products to the search response based on a particular condition, such as whether a specific manufacturer exists in the search results.
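The order in which expression providers, query preprocessors, and query postprocessors act on a request can be sketched as a simple pipeline. This is a conceptual Python sketch, not the WebSphere Commerce Java interfaces: every function name, the fake_solr stand-in, the sort value "name asc", and the promoted product are hypothetical. Only the _wcf.search.sort control parameter and the fq=mfName:* filter come from the descriptions above.

```python
from typing import Dict, List

Params = Dict[str, str]       # control parameters of the search request
Query = Dict[str, List[str]]  # Solr-style query parameters
Results = List[Dict[str, str]]


def sort_expression_provider(params: Params) -> Params:
    """Expression provider: override the sort control parameter before it is read."""
    updated = dict(params)
    updated["_wcf.search.sort"] = "name asc"  # hypothetical override value
    return updated


def manufacturer_preprocessor(params: Params, query: Query) -> Query:
    """Query preprocessor: add the mfName:* filter, and a sort taken from
    the _wcf.search.sort control parameter."""
    updated = {key: list(values) for key, values in query.items()}
    updated.setdefault("fq", []).append("mfName:*")
    if "_wcf.search.sort" in params:
        updated["sort"] = [params["_wcf.search.sort"]]
    return updated


def fake_solr(query: Query) -> Results:
    """Stand-in for Solr: apply the mfName:* filter to fixed documents."""
    docs = [{"name": "Mug", "mfName": "Coffee King"}, {"name": "Spoon", "mfName": ""}]
    if "mfName:*" in query.get("fq", []):
        docs = [doc for doc in docs if doc["mfName"]]  # empty means no manufacturer
    return docs


def promotion_postprocessor(results: Results) -> Results:
    """Query postprocessor: add a product when a given manufacturer is in the results."""
    if any(doc["mfName"] == "Coffee King" for doc in results):
        return results + [{"name": "Coffee King filters", "mfName": "Coffee King"}]
    return results


params = sort_expression_provider({"searchTerm": "coffee"})
query = manufacturer_preprocessor(params, {"q": ["coffee"]})
results = promotion_postprocessor(fake_solr(query))
print(query)  # {'q': ['coffee'], 'fq': ['mfName:*'], 'sort': ['name asc']}
print([doc["name"] for doc in results])  # ['Mug', 'Coffee King filters']
```

The point of the ordering is that the preprocessor sees the sort value only after the expression provider has changed it, and the postprocessor sees only the results that Solr returned.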
Autosuggest
A type-ahead function in the search bar that completes the phrase you are typing with possible matches. For example, shir can match shirt.
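Prefix completion, the core idea behind autosuggest, can be sketched with a sorted term list and a binary search. This is an illustration only, not the Solr suggester that the product uses; the autosuggest function and the term list are hypothetical, and terms are assumed to be stored in lowercase.

```python
import bisect
from typing import List


def autosuggest(terms: List[str], prefix: str, limit: int = 5) -> List[str]:
    """Return up to `limit` terms from a sorted, lowercase list that start with `prefix`."""
    prefix = prefix.lower()
    start = bisect.bisect_left(terms, prefix)  # first term >= prefix
    matches = []
    for term in terms[start:]:
        # Sorted order means all matching terms are contiguous.
        if not term.startswith(prefix) or len(matches) == limit:
            break
        matches.append(term)
    return matches


# Hypothetical suggestion dictionary, kept sorted so bisect can locate the prefix.
terms = sorted(["shirt", "shirts", "shoes", "shorts", "skirt", "coffee"])
print(autosuggest(terms, "shir"))  # ['shirt', 'shirts']
print(autosuggest(terms, "sh"))    # ['shirt', 'shirts', 'shoes', 'shorts']
```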
Spellcheck
Used when a search returns no results (or, depending on your configuration, only a few) to determine what the customer intended to search for. For example, a search for "cofe" returns 0 results, but the spellcheck function suspects that you meant to search for "coffee", which has many more matches. The suggestion is returned in the "Did you mean..." section of the page.
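A "Did you mean..." suggestion can be sketched with Levenshtein edit distance plus a preference for terms that have more matches. This is a simplified heuristic chosen for illustration, not Solr's spellcheck algorithm: the did_you_mean helper, the two-edit limit, and the document counts are hypothetical.

```python
from typing import Dict, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def did_you_mean(term: str, doc_counts: Dict[str, int], max_distance: int = 2) -> Optional[str]:
    """Among indexed terms within max_distance edits, prefer the one with the most matches."""
    candidates = [word for word in doc_counts
                  if word != term and edit_distance(term, word) <= max_distance]
    if not candidates:
        return None
    # Highest document count wins; a smaller edit distance breaks ties.
    return max(candidates, key=lambda word: (doc_counts[word], -edit_distance(term, word)))


doc_counts = {"coffee": 120, "cafe": 3, "cup": 40, "tea": 75}  # hypothetical counts
print(edit_distance("cofe", "coffee"))  # 2
print(did_you_mean("cofe", doc_counts))  # coffee
```

"cafe" is closer to "cofe" (one edit), but "coffee" is still suggested because it has far more matches, mirroring the example above.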
Facets
Filters for reducing the search results to make them more relevant to the user's expectations. For example, a size facet can be used to display only those search results that are available in a particular size.
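The two halves of faceting, counting the values that appear in the results and then filtering on a selected value, can be sketched as follows. The results list and the facet_counts and apply_facet helpers are hypothetical; in the product, Solr computes facet counts as part of the search.

```python
from collections import Counter
from typing import Dict, List

Doc = Dict[str, str]

# Hypothetical search results for "shirt".
results: List[Doc] = [
    {"name": "Polo shirt", "size": "Large"},
    {"name": "Dress shirt", "size": "Medium"},
    {"name": "T-shirt", "size": "Large"},
]


def facet_counts(docs: List[Doc], field: str) -> Counter:
    """Count how many results have each value of a field (the facet values)."""
    return Counter(doc[field] for doc in docs if field in doc)


def apply_facet(docs: List[Doc], field: str, value: str) -> List[Doc]:
    """Keep only the results whose field matches the selected facet value."""
    return [doc for doc in docs if doc.get(field) == value]


print(dict(facet_counts(results, "size")))  # {'Large': 2, 'Medium': 1}
print([doc["name"] for doc in apply_facet(results, "size", "Large")])  # ['Polo shirt', 'T-shirt']
```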
Descriptive attribute
Used to describe a catalog entry. For example, you can assign a t-shirt a descriptive attribute like material, with a value of cotton. Can be used as a facet if the attribute is considered facetable.
Defining attribute
Used to define a characteristic for a catalog entry. For example, you can assign a t-shirt a defining attribute like size, with a value of Large. Can be used as a facet if the attribute is considered facetable.
Search rule
Used to influence the ordering or content of search results based on specific triggers. For example, if a user searches for coffee, you can boost the relevancy of products that are made by the manufacturer Coffee King.
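The coffee example can be sketched as a trigger check followed by a score boost and a re-rank. This is a hypothetical illustration: the apply_search_rule helper, the multiplicative boost factor, the scores, and the "Bean Co" manufacturer are all assumptions, and real search rules are configured in the tooling and applied through Solr rather than in code like this.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]


def apply_search_rule(search_term: str, results: List[Doc], trigger: str,
                      manufacturer: str, factor: float) -> List[Doc]:
    """If the trigger word is in the search term, boost one manufacturer's products."""
    if trigger not in search_term.lower().split():
        return results  # rule not triggered: order and scores are unchanged
    boosted = []
    for doc in results:
        score = doc["score"] * factor if doc["mfName"] == manufacturer else doc["score"]
        boosted.append({**doc, "score": score})
    # Re-rank by the adjusted relevancy score, highest first.
    return sorted(boosted, key=lambda doc: doc["score"], reverse=True)


results = [
    {"name": "House blend", "mfName": "Bean Co", "score": 1.5},
    {"name": "Dark roast", "mfName": "Coffee King", "score": 1.0},
]
ranked = apply_search_rule("coffee", results, trigger="coffee",
                           manufacturer="Coffee King", factor=2.0)
print([doc["name"] for doc in ranked])  # ['Dark roast', 'House blend']
```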
Search term association
Used to modify or add search terms in the search query, or to redirect the user to a specific page. Synonyms add words to the search phrase (if X is searched for, also search for Y). Replacements replace words in the search phrase (if X is searched for, search for Y instead). Landing pages redirect the user to a specific page when a specific search term is in the search phrase (if X is searched for, redirect the user to page Y).
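The three kinds of search term association can be sketched as lookups applied to each word of the search phrase. This follows the one-directional rules stated above; the association tables, the "/clearance" page path, and the apply_associations helper are hypothetical, and checking landing pages before synonyms and replacements is an assumption of the sketch.

```python
from typing import Dict, List, Optional, Tuple

SYNONYMS: Dict[str, List[str]] = {"tee": ["t-shirt"]}      # X searched: also search Y
REPLACEMENTS: Dict[str, str] = {"sneakers": "running shoes"}  # X searched: search Y instead
LANDING_PAGES: Dict[str, str] = {"sale": "/clearance"}      # X searched: redirect to Y


def apply_associations(search_phrase: str) -> Tuple[List[str], Optional[str]]:
    """Return (terms to search for, redirect page or None)."""
    words = search_phrase.lower().split()
    for word in words:
        if word in LANDING_PAGES:
            return [], LANDING_PAGES[word]  # landing page: no search is run
    terms: List[str] = []
    for word in words:
        terms.append(REPLACEMENTS.get(word, word))  # replace the word if configured
        terms.extend(SYNONYMS.get(word, []))        # add any synonyms
    return terms, None


print(apply_associations("blue tee"))     # (['blue', 'tee', 't-shirt'], None)
print(apply_associations("sneakers"))     # (['running shoes'], None)
print(apply_associations("summer sale"))  # ([], '/clearance')
```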
Search result grouping
Used to search across groups of catalog entries and return the group representative when any member of the group matches. By default, the group representatives are products, and each group is made up of a product and its associated items. You can search against the product and its items, and return the product for display when the product or any of its items matches.
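Search result grouping can be sketched as matching against every member of a group but returning only the group representative, once per group. The catalog data and the grouped_search helper are hypothetical, and simple substring matching stands in for Solr's relevancy matching.

```python
from typing import Dict, List

# Hypothetical catalog: each product (group representative) maps to its items.
groups: Dict[str, List[str]] = {
    "Cotton T-shirt": ["Cotton T-shirt, Red, Large", "Cotton T-shirt, Blue, Small"],
    "Coffee mug": ["Coffee mug, White"],
}


def grouped_search(term: str) -> List[str]:
    """Return each product once if the term matches the product or any of its items."""
    term = term.lower()
    matches = []
    for product, items in groups.items():
        members = [product] + items
        if any(term in member.lower() for member in members):
            matches.append(product)  # the product represents the whole group
    return matches


print(grouped_search("blue"))   # ['Cotton T-shirt']
print(grouped_search("shirt"))  # ['Cotton T-shirt']
```

A search for "blue" matches only an item, yet the product is returned; a search for "shirt" matches three group members but still returns the product only once.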

Architecture

The architecture of the search product comprises the major pieces of integrated software and hardware: the servers, the data pipes that connect them, and the communication protocols that they use.

REST-based search
Search requests are sent to the Search server as a REST URL. Most of the search scenario is processed on the Search server itself, and the search results are returned as a JSON response.
BOD-based search
Search requests are sent to the Commerce server as a BOD request, constructed in XML. Most of the search scenario is processed on the Commerce server, while the search query itself is sent to the Search server for processing. The result is returned to the Commerce server as a BOD response, also constructed in XML.
Standard configuration
The Search server is deployed locally, on the same node as the Commerce server.
Advanced configuration
The Search server is federated and clustered, and managed by the deployment manager (DMGR).
Managed configuration
The Search server is federated and clustered, as in the advanced configuration. In addition, the Solr template files (master, repeater, and subordinate) are managed by the deployment manager (DMGR), so you can modify these configuration files in one location and push them to all of the corresponding nodes.