How a security scan for WebSphere® portals works
To navigate a site, a web crawler needs to be able to map each URL to a unique page. WebSphere® Portal URLs pose a particular challenge for automatic crawling because they contain navigational state information that is encoded using server-side information.
The challenge arises because the same navigational state does not necessarily generate the same URL, so a single page can appear under many different URLs. Because of the encoded and dynamic nature of WebSphere® Portal URLs, the traditional URL identification process of a web crawler does not work and might result in a never-ending scan. For the same reason, a logout link cannot be detected during the explore phase, which might cause the scan to repeatedly go out of session.
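The never-ending scan scenario can be illustrated with a toy model. The sketch below is not the actual WebSphere® Portal encoding: `encode_state`, the `/portal/` prefix, and the per-request token are hypothetical stand-ins, chosen only to show what happens when a crawler keys its visited set on raw URLs while the server produces a new URL for the same navigational state.

```python
import base64
import itertools
import json

# Hypothetical stand-in for server-side URL encoding: the same navigational
# state is combined with a per-request token, so every call yields a new URL.
_request_counter = itertools.count()


def encode_state(state: dict) -> str:
    payload = {"state": state, "token": next(_request_counter)}
    raw = json.dumps(payload, sort_keys=True).encode("utf-8")
    return "/portal/" + base64.urlsafe_b64encode(raw).decode("ascii")


def crawl_by_url(state: dict, max_requests: int) -> int:
    """Traditional crawler logic: each distinct URL counts as a distinct page."""
    visited_urls = set()
    for _ in range(max_requests):
        url = encode_state(state)  # a link back to the same page, re-encoded
        if url in visited_urls:
            break  # never reached in this model, because URLs never repeat
        visited_urls.add(url)
    return len(visited_urls)


# One logical page, requested 50 times, is counted as 50 different pages.
pages_seen = crawl_by_url({"page": "home", "portlet": "news"}, max_requests=50)
print(pages_seen)
```

In this model only the `max_requests` cap stops the loop; without such a cap, the crawler would keep treating the same page as new.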
The way that scan data, such as the names of the pages explored by the web crawler, is displayed in the reports is also a challenge when scanning WebSphere® Portal sites. For traditional websites, web crawlers often use the URLs as page names because they uniquely identify a page in a human-readable format. Because WebSphere® Portal URLs are encoded, it is difficult to identify a page by looking at its URL.
The scan uses REST services provided by WebSphere® Portal to decode the navigational state and uses this state to identify visited URLs. It sends the encoded WebSphere® Portal URL to a decoding web service, which returns the decoded navigational state.
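The idea of identifying visited pages by decoded state rather than by URL can be sketched as follows. This is a minimal illustration under stated assumptions, not the scanner's implementation: the `decode` callable stands in for the call to the decoding web service (whose endpoint and response format are not described here), and the URLs and state dictionaries in the example are invented.

```python
import json
from typing import Callable, Iterable, List


def state_key(state: dict) -> str:
    """Canonical, hashable form of a decoded navigational state."""
    return json.dumps(state, sort_keys=True)


def unique_pages(urls: Iterable[str], decode: Callable[[str], dict]) -> List[str]:
    """Return the first URL seen for each distinct navigational state.

    `decode` is a hypothetical placeholder for the request to the decoding
    service; it takes an encoded URL and returns the decoded state.
    """
    seen_states = set()
    first_urls = []
    for url in urls:
        key = state_key(decode(url))
        if key not in seen_states:  # new page: this state was not visited yet
            seen_states.add(key)
            first_urls.append(url)
    return first_urls


# Invented example data: two different URLs decode to the same state.
fake_states = {
    "/portal/a1": {"page": "home"},
    "/portal/b2": {"page": "home"},
    "/portal/c3": {"page": "news"},
}
result = unique_pages(fake_states, fake_states.__getitem__)
print(result)  # ['/portal/a1', '/portal/c3']
```

Serializing the state with sorted keys gives a stable comparison key, so two URLs that decode to the same state are treated as one page regardless of how the URL itself was encoded.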