Crawling HCL Commerce site content

You can crawl HCL Commerce site content in starter stores using the REST API.

Before you begin

Ensure that the Search server is started, and that you have completed the following tasks.
  • (HCL Commerce Developer) Build the HCL Commerce Search index.
  • Important: Ensure that you configure the site content crawler configuration files for your site:
    • droidConfig.xml
    • filters.txt
    For more information, see Site content crawler configuration.
    Note: If you are crawling content in a clustered topology, run the crawler from a staging environment, not from a production environment. If production content must be crawled, configure the crawler to visit the production site instead of running it directly in the production environment. This approach simplifies setup: the crawler runs only in an HCL Commerce staging environment and updates the index on the repeater.

Procedure

  1. Run the utility by calling the following URL on the HCL Commerce Search server:
    http://searchHost:port/search/admin/resources/crawler?action=start&langId=langId&storeId=storeId&catalogId=catalogId
    Use the GET method and authenticate as spiuser. The action parameter specifies what the crawler does. The possible values are:
    start
      Starts the crawler.
      Required parameters:
        langId
          The language identifier to use when building the unstructured index. For example, langId=-1.
        storeId
          The store ID to use when building the unstructured index. For example, storeId=10501.
        catalogId
          The catalog identifier to use when building the unstructured index. For example, catalogId=10001.
    status
      Shows the crawler status.
    stop
      Stops the crawler.
  2. Ensure that the utility runs successfully.
    Running the utility with all the parameters performs the following tasks:
    • Crawling and downloading the crawled pages in HTML format into the destination directory.
    • Updating the database with the created manifest.txt file.
    • Invoking the indexer.
    The status of each task is reported separately.
    Depending on the parameters that you pass, you can confirm that the utility ran successfully by:
    1. Verifying that the crawled pages are downloaded into the destination directory, searchServerPath\search\index\crawler\cache\date\number, where date is the date that the utility was run and number is the sequence number of the run on that date, starting with 1.
    2. Verifying that the SRCHCONFEXT table row with indexsubtype='WebContent' is updated with the correct manifest.txt location.
    3. If auto index is set to true, verifying that the crawled pages are also indexed.
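The procedure above can be sketched in Python: build the crawler URL for an action, send it as a GET request, and then check the newest cache\date\number directory for downloaded pages. This is a minimal sketch under several assumptions: spiuser authenticates with HTTP basic authentication, the status and stop actions need only the action parameter, date folder names sort chronologically as strings, and downloaded pages use an .html or .htm extension. The helper names (build_crawler_url, call_crawler, latest_crawl_dir, count_html_pages) and the host, port, and password in the example comments are hypothetical.

```python
import base64
import urllib.parse
import urllib.request
from pathlib import Path

# Parameters that the procedure lists as required for action=start.
START_PARAMS = ("langId", "storeId", "catalogId")


def build_crawler_url(host, port, action, **params):
    """Build the crawler admin URL for start, status, or stop.

    Assumption: only action=start takes langId, storeId, and catalogId.
    """
    if action not in ("start", "status", "stop"):
        raise ValueError(f"unsupported action: {action}")
    query = {"action": action}
    if action == "start":
        missing = [name for name in START_PARAMS if name not in params]
        if missing:
            raise ValueError(f"missing required parameters: {missing}")
        query.update((name, params[name]) for name in START_PARAMS)
    return (f"http://{host}:{port}/search/admin/resources/crawler?"
            + urllib.parse.urlencode(query))


def call_crawler(url, user, password, timeout=60):
    """Send the GET request; assumes HTTP basic authentication for spiuser."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Return the HTTP status code and the raw response body.
        return response.status, response.read().decode("utf-8", errors="replace")


def latest_crawl_dir(cache_root):
    """Return the newest cache/date/number run directory, or None.

    Assumes date folder names sort chronologically as strings and that
    number folders are integers starting at 1.
    """
    root = Path(cache_root)
    if not root.is_dir():
        return None
    for date_dir in sorted((p for p in root.iterdir() if p.is_dir()), reverse=True):
        runs = [p for p in date_dir.iterdir() if p.is_dir() and p.name.isdigit()]
        if runs:
            return max(runs, key=lambda p: int(p.name))
    return None


def count_html_pages(run_dir, suffixes=(".html", ".htm")):
    """Count downloaded pages under run_dir; the file suffixes are an assumption."""
    return sum(1 for p in Path(run_dir).rglob("*")
               if p.is_file() and p.suffix.lower() in suffixes)


# Example usage (hypothetical host, port, password, and path):
# url = build_crawler_url("searchHost", 3737, "start",
#                         langId=-1, storeId=10501, catalogId=10001)
# print(call_crawler(url, "spiuser", "password"))
# run = latest_crawl_dir(r"searchServerPath\search\index\crawler\cache")
# print(run, count_html_pages(run) if run else 0)
```

Sending the store parameters only with start mirrors the parameter list in step 1; if your environment expects them on every action, extend build_crawler_url accordingly.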

What to do next

After you crawl HCL Commerce site content, you can verify the changes in the storefront.