Performing a background crawl
You can use a SearchService command to perform a background crawl of the Search seedlists without creating a Search index.
Before you begin
See Starting the wsadmin client for information about how to start the wsadmin command-line tool.
Procedure
- Start the wsadmin client from one of the following directories on the system on which you installed the Deployment Manager:
Linux:
app_server_root/profiles/dm_profile_root/bin
Windows:
app_server_root\profiles\dm_profile_root\bin
where app_server_root is the WebSphere® Application Server installation directory and dm_profile_root is the Deployment Manager profile directory, typically dmgr01.
You must start the client from this directory; otherwise, subsequent commands that you enter do not execute correctly.
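For example, on Linux the invocation might look like the following sketch, where the administrative user ID, password, and SOAP port are placeholders that you replace with values from your own deployment:
./wsadmin.sh -lang jython -user admin_user -password admin_password -port SOAP_port
On Windows, run wsadmin.bat with the same options.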
- After the wsadmin command environment
has initialized, enter the following command to initialize the Search
environment and start the Search script interpreter:
execfile("searchAdmin.py")
If prompted to specify a service to connect to, type 1 to pick the first node in the list. Most commands can run on any node. If the command writes or reads information to or from a file using a local file path, you must pick the node where the file is stored.
When the command runs successfully, the following message displays:
Search Administration initialized
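On a single-node deployment where no service prompt appears, the exchange might look similar to the following sketch:
wsadmin>execfile("searchAdmin.py")
Search Administration initialized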
- Enter the following command:
- SearchService.startBackgroundCrawl(String persistenceLocation, String components)
Crawls the seedlists for the specified applications and then saves the seedlists to the specified location. This command does not build an index.
The command takes the following parameters:
- persistenceLocation
- A string that specifies the path to which the seedlists are to be saved.
- components
- A string that specifies the applications whose seedlists are to be crawled. The following values
are valid:
- activities
- all_configured
- blogs
- calendar
- communities
- dogear
- ecm_files
- files
- forums
- people_finder
- profiles
- status_updates
- wikis
For example:
SearchService.startBackgroundCrawl("/opt/IBM/Connections/backgroundCrawl", "activities, forums, communities, wikis")
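To crawl the seedlists of every application that is configured for Search, you might instead pass the all_configured value; the persistence path shown here is an assumption and can be any writable directory:
SearchService.startBackgroundCrawl("/opt/IBM/Connections/backgroundCrawl", "all_configured")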
What to do next
- Extract file content. For more information, see Extracting file content.
- Create a background index. For more information, see Creating a background index.
- Create a foreground index. For more information, see Recreating the Search index.
If you want to create a foreground index, copy the persisted seedlists from the persistence location that you specified when you used the startBackgroundCrawl command to the CRAWLER_PAGE_PERSISTENCE_DIR directory on the node that is doing the indexing.
In a multi-node system, you might want to copy the seedlists to the CRAWLER_PAGE_PERSISTENCE_DIR directory on all nodes. Alternatively, you can set the CRAWLER_PAGE_PERSISTENCE_DIR variable to a network location and copy the persisted seedlists from the persistence location you specified to that location.
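For example, a minimal copy step run with a stand-alone Python or Jython interpreter might look like the following sketch. The source and target paths are assumptions: the source must match the persistenceLocation that you passed to startBackgroundCrawl, and the target must match the directory that the CRAWLER_PAGE_PERSISTENCE_DIR variable points to on the indexing node.
import os
import shutil

# Assumed paths; replace with the values from your own deployment.
source_dir = "/opt/IBM/Connections/backgroundCrawl"
target_dir = "/opt/IBM/Connections/data/local/search/crawl"

# Create the target directory if it does not already exist.
if not os.path.exists(target_dir):
    os.makedirs(target_dir)

# Copy every persisted seedlist file from the crawl location to the indexing directory.
for name in os.listdir(source_dir):
    src = os.path.join(source_dir, name)
    if os.path.isfile(src):
        shutil.copy(src, os.path.join(target_dir, name))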