Broken Links report
This report provides information about broken links found on the site.
Why it matters
Thorough security testing requires that the scan reach your entire site. If your Security Issues report shows no security issues, broken links might be preventing the scan engine from reaching parts of the site.
How broken links are triggered
There are many reasons a link might be reported as broken:
- The destination target has been moved or deleted.
- The source document has been moved, rendering the URL not valid.
- The URL is malformed (for example, it includes unnecessary spaces).
- The HTML uses incorrect syntax, such as unclosed tags, closing tags out of order, and coding errors that might not be visible in a browser.
- The server configuration has been changed.
Different types of broken link errors
This table lists the different broken link errors that might appear in the report:
Term | Definition |
---|---|
File Not Found | Occurs when a URL points to a file on the server that does not exist. These errors are typically caused by typos in the URL or by the destination file having been deleted or renamed. |
Other | Could not open the file because the device was not ready. Unauthorized user cannot access document. The server understood the request, but is refusing to fulfill it. |
Cannot Connect | Occurs when the target server does not respond to the request from the browser. These errors are often caused by a server that is down or too busy. |
Host Not Found | Occurs when a URL points to a server that cannot be found by its fully qualified domain name (FQDN). This type of error typically indicates a problem with DNS, with connectivity to the DNS server, or with general Internet connectivity. |
Timeout | Occurs when an existing server responds but does not return the data quickly enough, and the browser times out. |
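To make the categories above more concrete, the following is a minimal Python sketch of how a link checker might map low-level errors to similar labels. It is not the product's implementation: the function name `classify_link_error` and the exact mapping are assumptions chosen for illustration, based only on the definitions in the table.

```python
import socket
import urllib.error


def classify_link_error(exc):
    """Map an exception from a link check to a report-style category.

    Hypothetical helper: the category names follow the table above, but the
    mapping itself is an assumption, not the scanner's actual logic.
    """
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        # 404 means the file does not exist; other statuses (401, 403, ...)
        # are grouped under "Other" in this sketch.
        return "File Not Found" if exc.code == 404 else "Other"

    # urlopen usually wraps socket-level errors in URLError.reason.
    reason = exc.reason if isinstance(exc, urllib.error.URLError) else exc

    if isinstance(reason, socket.gaierror):
        # The host name could not be resolved through DNS.
        return "Host Not Found"
    if isinstance(reason, (TimeoutError, socket.timeout)):
        # The server did not return data within the allowed time.
        return "Timeout"
    if isinstance(reason, ConnectionError):
        # The connection was refused, reset, or aborted.
        return "Cannot Connect"
    return "Other"
```

A caller could wrap `urllib.request.urlopen(url, timeout=10)` in a `try` block and pass any caught exception to this function to get a label for the link.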
Remediation and best practices for broken links
- Verify the file extension in the URL.
- Verify the spelling. Even the smallest typing mistake can break a link.
- Check whether the target file has been renamed.
- Check that there are no tabs, line breaks, or punctuation marks in the URL that do not belong there.
- Make sure you are using relative links instead of absolute links.
- For an external link, visit the home page of the external site and use its search facility to find the page that the link should point to.
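Some of the checks in this list can be automated. The following Python sketch flags stray whitespace, line breaks, embedded tags, and trailing punctuation in an `href` value. The function name `find_url_problems` and the specific rules are illustrative assumptions, not part of the product.

```python
import re


def find_url_problems(href):
    """Return a list of likely formatting problems in an href value.

    Hypothetical helper for illustration; the rules are a simple heuristic.
    """
    problems = []
    if href != href.strip():
        problems.append("leading or trailing whitespace")

    core = href.strip()
    if re.search(r"[\t\r\n]", core):
        problems.append("tab or line break inside the URL")
    if " " in core:
        problems.append("unencoded space inside the URL")
    if re.search(r"<[^>]+>", core):
        # For example, a <br /> tag that ended up inside the attribute value.
        problems.append("HTML tag inside the URL")
    if core.endswith((".", ",", ";", ":", "!", "?")):
        problems.append("trailing punctuation")
    return problems


print(find_url_problems("about/team.htm"))             # []
print(find_url_problems("testbro<br />kenlink.htm"))
# ['unencoded space inside the URL', 'HTML tag inside the URL']
```

An empty list means none of these heuristics fired; it does not prove that the link target exists.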
Under certain conditions, links might be reported as broken even when they appear to work correctly. This section describes some of those conditions and provides suggestions for fixing them.
- Links that contain line breaks: If a link contains a line break, this line break is correctly interpreted as a space and the link is reported as broken. For example,
<p>This is a test line to show a <a href="testbro<br />kenlink.htm">Broken Link</a></p>
Recommendation: Although some browsers tend to overlook the line break and enable the link to work, the line break should be removed to conform to HTML standards.
- Incorrect proxy settings: To work reliably, the Broken Links report depends on proxy information being set up correctly. An incorrect proxy setting might cause external links to be reported as broken even though they are working.
- Broken URLs in JavaScript™: If a reference to an external JavaScript file contains a relative path, such as <SCRIPT SRC="globaljava.js"...>, then the URLs in this external file might be reported as broken when they are not. This type of broken link occurs when the file is parsed rather than executed. Relative URLs within the file are interpreted as relative to the JavaScript file, not relative to the page that included the JavaScript file. Script writers cannot always be sure where their files will be referenced from.
- Broken links not found in the scan: Some sites redirect broken links to a custom error page. This practice is typically used by websites that want to display a personalized explanation to the user when a broken link is encountered. Because the custom error page is a valid page, the scan does not receive the 404 (Not Found) HTTP status code when the page is requested. Instead it receives 200 (OK), so the scan treats the page as fine and does not report the link as broken.
- External links appearing in the "Other" category: External links can appear in this category because the authentication credentials are not accepted. If you are using custom rules to check third-party sites, and one of the pages being checked requires authentication, the authentication credentials entered on the Connections page are used. If those authentication credentials are not acceptable to a third-party page being scanned, then the page will appear as a broken link.
- HTML DOM appearing as context: The HTML DOM (document object model) is an Internet Explorer interface that lets you access the page that would be loaded in IE, and it is used only when JavaScript is executed on a page. As a result, if the "context" of the link is HTML DOM, the link was found through JavaScript execution.
- JavaScript appearing as context: False positives can occur if the JavaScript option does not suit your site. Set it to either "Execute Javascript to discover URLs and dynamic content" (the typical case) or "Parse Javascript to discover URLs" (in a few cases).
- Large number of timeouts: If your reports show a large number of "timeout" links that appear to be false positives, try adjusting your content scan job settings.
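The "Broken URLs in JavaScript" condition above comes down to which base URL a relative link is resolved against. The following Python sketch shows the two resolutions side by side using the standard `urllib.parse.urljoin` function. The URLs, the `../scripts/` path, and the function name `resolve_both_ways` are hypothetical values chosen for illustration.

```python
from urllib.parse import urljoin


def resolve_both_ways(page_url, script_src, link_in_script):
    """Resolve a relative link found in an external script two ways.

    Returns (relative_to_script, relative_to_page). A parser that only reads
    the script file sees the first form; a browser that executes the script
    in the page typically produces the second.
    """
    script_url = urljoin(page_url, script_src)
    relative_to_script = urljoin(script_url, link_in_script)
    relative_to_page = urljoin(page_url, link_in_script)
    return relative_to_script, relative_to_page


parsed, executed = resolve_both_ways(
    "https://www.example.com/products/index.html",  # page that includes the script
    "../scripts/globaljava.js",                     # relative script reference
    "help/contact.htm",                             # relative link inside the script
)
print(parsed)    # https://www.example.com/scripts/help/contact.htm
print(executed)  # https://www.example.com/products/help/contact.htm
```

When the two results differ, a link that works in the browser can still be reported as broken if the script is only parsed.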
Information you should know about this report
- External domain URLs appear in the report if a page on an internal domain redirects to them. The page that is the target of the redirect is parsed and is counted as a "page" in the reports.
- If your site uses frames, HCL® Software Services or your Product Administrator can make the PageComponent data sets available so you can use them to group your report results:
- PageComponent: Useful for identifying the files that make up a web page, such as gif, js, html or frames.
- PageComponent ID: A unique ID assigned during a scan to identify this particular component of the page. Open the About this PageComponent report to see more details about this particular PageComponent.