Expressions

You can use both DOS expressions and regular expressions to specify patterns within text entry fields of property sheets. Job property sheets accept regular expressions but not DOS expressions. Most report pack property sheets, with the exception of XRules, accept DOS expressions but not regular expressions. The product uses a single regular expression engine throughout, which follows the syntax of Microsoft's .NET 2.0.

Some examples of job property sheets that accept regular expressions include Exclusions, Servers and Domains, and XRules.

Using expressions, you can define a group of URLs that meet specific requirements within one statement. For example, you can use the DOS expression *MyProduct*.htm to specify that you want a particular product's pages excluded from a report pack.

DOS syntax

* represents any character or string of characters, where *a* yields all files with the character "a" in the name.

? represents a single character, where ??c yields every three-character name ending with the character "c".
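
The two wildcards can be tried out with a minimal sketch. It assumes Python's standard `fnmatch` module as a stand-in for the product's DOS expression matching, which may differ in details (for example, `fnmatch` also gives square brackets a special meaning). The sample names and the URL are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical names used only for illustration.
names = ["a", "bar", "abc", "xyc", "doc", "c"]

# * matches any run of characters (including none),
# so *a* keeps every name that contains "a".
contains_a = [n for n in names if fnmatchcase(n, "*a*")]

# ? matches exactly one character,
# so ??c keeps three-character names ending in "c".
three_ending_c = [n for n in names if fnmatchcase(n, "??c")]

# The report pack example from the text: *MyProduct*.htm
# The URL below is an assumed example, not taken from the product.
product_page = fnmatchcase(
    "http://www.example.com/MyProductOverview.htm", "*MyProduct*.htm"
)

print(contains_a)
print(three_ending_c)
print(product_page)
```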

Regular expression syntax

Regular expressions must be prefixed with regexp: so the scan job can distinguish them from plain text entries. The syntax used for regular expressions follows that of .NET 2.0.
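
As a rough illustration of the prefix rule, the sketch below separates a regexp: entry from a plain text entry. The helper names `parse_entry` and `entry_matches` are hypothetical, and the literal comparison used for plain text and the case-sensitive prefix check are assumptions. Python's `re` module stands in for the .NET 2.0 engine, so it is not the product's actual implementation.

```python
import re

REGEXP_PREFIX = "regexp:"  # the prefix named in the text


def parse_entry(entry):
    """Classify a text-field entry (hypothetical helper).

    Returns ("regex", compiled_pattern) when the entry carries the
    regexp: prefix, otherwise ("text", entry).
    """
    if entry.startswith(REGEXP_PREFIX):
        return "regex", re.compile(entry[len(REGEXP_PREFIX):])
    return "text", entry


def entry_matches(entry, url):
    """Check a URL against an entry (hypothetical helper)."""
    kind, value = parse_entry(entry)
    if kind == "regex":
        return value.search(url) is not None
    # Assumption: plain text entries are compared literally.
    return value == url


print(parse_entry(r"regexp:^http://www\.example\.com")[0])
print(parse_entry("http://www.example.com")[0])
```

Without the prefix, an entry such as `.*\.asp$` would be treated as literal text in this sketch rather than as a pattern.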

The following table describes some of the special characters and commands supported for regular expressions. A more complete list of supported syntax can be found in the .NET 2.0 regular expression engine documentation provided by Microsoft.

Table 1. Valid Regular Expression Syntax and Commands 
Syntax Description
^ Match the beginning of the string. This syntax assumes there are no characters before the ^ in the string. For example, say that you have two URLs: http://www.example.com/support/index.asp and http://support.example.com/index.asp?URL=http://www.example.com. The expression ^http://www\.example\.com would only match the first URL, not the second.
$ Match the end of the string.
? P? - An optional occurrence (zero or one) of P.
* P* - 0 or more repetitions of P.
+ P+ - 1 or more repetitions of P.
| P|Q - Either P or Q.
\ \X - Escapes one of the characters ()?*+|\.[]^$, that otherwise would be interpreted as a special character.
. A period matches any single character.
[ ] Matches any one of the characters in the brackets, given either as a range, such as c[a-o]t, or as a list, such as c[aou]t.
( ) (ab|cd)?ef - Groups a series of regular expressions together, which in this case would match abef, cdef, and ef.
- If you want to express the matching characters using a range instead of the characters themselves, you can separate the beginning and ending characters in the range using the hyphen (-) character. The character value of the individual characters determines their relative order within a range.
<operator1>&&<operator2>, <operator1> and <operator2> A logical AND.
<operator1>||<operator2>, <operator1> or <operator2> A logical OR.
!<operator1>, not <operator2> A logical NOT. It requires a right operand. It must be preceded by a binary operator if there is an expression to the left of the operator. For example, "A and not b" is correct, but "A not b" is incorrect.
"<string>" A literal string. Operands that contain commands can be placed inside a string delimited by double quotes. For example, "(Hello and Goodbye)". Precede a double quote within a literal string with a backslash.
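
Several rows of the table can be checked with a short sketch. It assumes Python's `re` module, which agrees with .NET 2.0 on these basic constructs but is not the engine the product uses. The sample words are made up. The logical operators and quoted strings from the last rows are product-specific and are not covered here.

```python
import re

first = "http://www.example.com/support/index.asp"
second = "http://support.example.com/index.asp?URL=http://www.example.com"

# ^ anchors the match at the start of the string, so only the first URL matches.
anchored = re.compile(r"^http://www\.example\.com")
anchored_hits = [u for u in (first, second) if anchored.search(u)]

# ( ), | and ? together: an optional group of "ab" or "cd" followed by "ef".
group = re.compile(r"(ab|cd)?ef")
group_hits = [s for s in ["abef", "cdef", "ef", "abcdef"] if group.fullmatch(s)]

# [ ] with a range versus a list of characters.
range_hits = [s for s in ["cat", "cot", "cut"] if re.fullmatch(r"c[a-o]t", s)]
list_hits = [s for s in ["cat", "cet", "cut"] if re.fullmatch(r"c[aou]t", s)]

# \ escapes a special character: \. matches only a literal period,
# while an unescaped . matches any single character.
escaped = re.search(r"index\.asp", "indexXasp") is not None
unescaped = re.search(r"index.asp", "indexXasp") is not None

print(anchored_hits)
print(group_hits)
print(range_hits, list_hits)
print(escaped, unescaped)
```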

Examples of regular expressions

.* any sequence of characters

.*\. any sequence of characters followed by . (a period)

Note: Be very careful with regular expressions that use constructs such as .*?{1,x} or \w+ to capture text around a match. They can increase the scan time considerably because the engine tests every character in the document to see whether it is a match for the entire expression.

In exclusions: regexp:.*\.watch.*fire.* would exclude http://www.watchoutforfiresinforest.com as well as http://www.watch.com/products/firestone/index.asp.
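
The exclusion above can be checked with a brief sketch, again assuming Python's `re` module in place of the product's .NET 2.0 engine. The third URL is a made-up counterexample. Because the pattern begins and ends with `.*`, it gives the same result whether the engine matches the whole string or searches within it.

```python
import re

# The pattern from the exclusion example, without the regexp: prefix.
exclusion = re.compile(r".*\.watch.*fire.*")

urls = [
    "http://www.watchoutforfiresinforest.com",
    "http://www.watch.com/products/firestone/index.asp",
    # Hypothetical URL: "watch" follows "/" rather than ".", so \.watch fails.
    "http://www.example.com/watch/fire.asp",
]

excluded = [u for u in urls if exclusion.search(u)]
kept = [u for u in urls if not exclusion.search(u)]

print(excluded)
print(kept)
```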