Crawling HCL Commerce site content

You can crawl HCL Commerce site content in starter stores using the REST API.

Before you begin

Ensure that the Search server is started and that you have completed the following task:

- Build the HCL Commerce Search index.
Important: Ensure that you configure the site content crawler configuration files for your site:
- droidConfig.xml
- filters.txt
For more information, see Site content crawler configuration.
Note: If you are crawling content in a clustered topology, run the crawler from a staging environment, not from a production environment. If production content must be crawled, configure the crawler to visit the production site instead of running it directly in the production environment. This approach simplifies setup, because the crawler runs only in an HCL Commerce staging environment and updates the index in the repeater.

Procedure

Run the crawler utility by calling the following URL on the HCL Commerce Search server:
```
http://searchHost:port/search/admin/resources/crawler?action=start&langId=langId&storeId=storeId&catalogId=catalogId
```
Call this URL with the GET method and authenticate as spiuser. The action parameter specifies the action that the crawler performs. The possible values are:

start

Starts the crawler.
Required parameters:

langId

The language identifier that you want to use in building an unstructured index. For example, langId=-1.

storeId

The store ID that you want to use in building an unstructured index. For example, storeId=10501.

catalogId

The catalog identifier that you want to use in building an unstructured index. For example, catalogId=10001.

status

Shows the crawler status.

stop

Stops the crawler.
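The following minimal Python sketch shows one way to assemble the crawler URL for each action and prepare an authenticated GET request. It is an illustration, not part of the product: the function names, the port value 3738, and the password placeholder are hypothetical, and it assumes that the spiuser credentials are sent with HTTP basic authentication and that the status and stop actions need no store parameters. The request is only built, not sent.

```python
import base64
import urllib.request
from urllib.parse import urlencode

VALID_ACTIONS = {"start", "status", "stop"}


def build_crawler_url(search_host, port, action,
                      lang_id=None, store_id=None, catalog_id=None):
    """Return the crawler REST URL for the given action (hypothetical helper)."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"Unsupported action: {action}")
    params = {"action": action}
    if action == "start":
        # langId, storeId, and catalogId are required when starting the crawler.
        required = {"langId": lang_id, "storeId": store_id, "catalogId": catalog_id}
        missing = [name for name, value in required.items() if value is None]
        if missing:
            raise ValueError(f"Missing required parameters: {', '.join(missing)}")
        params.update(required)
    return (f"http://{search_host}:{port}"
            f"/search/admin/resources/crawler?{urlencode(params)}")


def build_crawler_request(url, user, password):
    """Prepare a GET request. Assumption: HTTP basic authentication is used."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url, method="GET", headers={"Authorization": f"Basic {token}"}
    )


if __name__ == "__main__":
    # Placeholder host, port, and password; replace them with your own values.
    url = build_crawler_url("searchHost", 3738, "start",
                            lang_id=-1, store_id=10501, catalog_id=10001)
    request = build_crawler_request(url, "spiuser", "spiuser-password")
    print(request.get_method(), request.full_url)
    # To send the request: urllib.request.urlopen(request)
```

Validating the required parameters before building the URL means that an incomplete start request fails locally instead of reaching the Search server.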
Ensure that the utility runs successfully.
Running the utility with all the parameters involves the following tasks:
- Crawling and downloading the crawled pages in HTML format into the destination directory.
- Updating the database with the created manifest.txt file.
- Invoking the indexer.
The status message for each of these tasks is reported separately.
Depending on the parameters that you pass, you can check that the utility ran successfully by:
1. Verifying that the crawled pages are downloaded into the destination directory, searchServerPath\search\index\crawler\cache\date\number, where date is the date that the utility was run, and number is the number of runs on that date, starting with 1.
2. Verifying that the SRCHCONFEXT table row with indexsubtype='WebContent' has been updated with the correct manifest.txt location.
3. If auto index is set to true, verifying that the crawled pages are also indexed.
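The first two checks can be partly scripted. The following minimal Python sketch finds the most recent run directory for a given date under the crawler cache and counts the files in it. The function names are hypothetical, and because the exact format of the date directory name is not stated here, the sketch takes that name as an argument rather than computing it. The query string uses only the table and column named in step 2; run it with your own database client.

```python
from pathlib import Path

# Query for step 2; inspect the returned row for the manifest.txt location.
MANIFEST_QUERY = "SELECT * FROM SRCHCONFEXT WHERE INDEXSUBTYPE = 'WebContent'"


def latest_crawl_run(cache_root, date_dir):
    """Return the highest-numbered run directory for a date, or None if absent."""
    day = Path(cache_root) / date_dir
    if not day.is_dir():
        return None
    # Run directories are numbered from 1, so compare the names as integers.
    runs = [p for p in day.iterdir() if p.is_dir() and p.name.isdigit()]
    if not runs:
        return None
    return max(runs, key=lambda p: int(p.name))


def count_crawled_files(run_dir):
    """Count every file below the run directory, including subdirectories."""
    return sum(1 for p in Path(run_dir).rglob("*") if p.is_file())
```

For example, calling latest_crawl_run with the path to the crawler cache directory and the name of the date directory returns the run to inspect. A file count of 0 suggests that the crawl did not download any pages.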

What to do next

After you crawl HCL Commerce site content, you can verify the changes in the storefront.