com.rational.test.util.regex

Class Regex

  • java.lang.Object
    • com.rational.test.util.regex.Regex


  • public class Regex
    extends java.lang.Object
    This is an efficient, lightweight regular expression evaluator/matcher class. It is implemented using org.apache.regexp. Regular expressions are pattern descriptions that enable sophisticated matching of strings. In addition to matching a string against a pattern, you can also extract parts of the match, which is especially useful in text parsing. Details on the syntax of regular expression patterns are given below.

    To compile a regular expression (RE), you can simply construct an RE matcher object from the string specification of the pattern, like this:

    
         Regex r = new Regex("a*b");
    
     

    Once you have done this, you can call the matches method to perform matching on a String. For example:

    
         boolean matched = r.matches("aaaab");
    
     
    will cause the boolean matched to be set to true because the pattern "a*b" matches the string "aaaab".

    If you were interested in the number of a's which matched the first part of our example expression, you could change the expression to "(a*)b". Then when you compiled the expression and matched it against something like "xaaaab", you would get results like this:

    
         Regex r = new Regex("(a*)b");              // Compile expression
         boolean matched = r.matches("xaaaab");     // Match against "xaaaab"
         String wholeExpr = r.getMatch(0);          // wholeExpr will be 'aaaab'
         String insideParens = r.getMatch(1);       // insideParens will be 'aaaa'
    
     
    You can also refer to the contents of a parenthesized expression within a regular expression itself. This is called a 'backreference'. The first backreference in a regular expression is denoted by \1, the second by \2 and so on. So the expression:
    
         ([0-9]+)=\1
    
     
    will match any string of the form n=n (like 0=0 or 2=2).
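The backreference pattern above can be tried out with a short sketch. It uses the standard java.util.regex package as a stand-in, not the Regex class itself, so that it runs without the product library on the classpath; the class and method names are hypothetical. Note that java.util.regex's matches() requires the whole input to match, which may differ from how Regex.matches treats a partial match such as "xaaaab".

```java
import java.util.regex.Pattern;

class BackreferenceSketch {
    // Same pattern text as in the example above; the backslash is doubled
    // only because this is a Java string literal.
    static final Pattern SAME_NUMBER = Pattern.compile("([0-9]+)=\\1");

    // True when the whole input has the form n=n (assumption: whole-string match).
    static boolean isSameNumber(String input) {
        return SAME_NUMBER.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSameNumber("0=0"));    // true
        System.out.println(isSameNumber("42=42"));  // true
        System.out.println(isSameNumber("1=2"));    // false
    }
}
```

The group ([0-9]+) captures the digits on the left, and \1 then requires exactly the same text on the right of the equals sign.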

    The full regular expression syntax accepted by Regex is described here:

    
     
    Characters
      unicodeChar   Matches any identical unicode character
      \             Used to quote a meta-character (like '*')
      \\            Matches a single '\' character
      \0nnn         Matches a given octal character
      \xhh          Matches a given 8-bit hexadecimal character
      \\uhhhh       Matches a given 16-bit hexadecimal character
      \t            Matches an ASCII tab character
      \n            Matches an ASCII newline character
      \r            Matches an ASCII return character
      \f            Matches an ASCII form feed character

    Character Classes
      [abc]         Simple character class
      [a-zA-Z]      Character class with ranges
      [^abc]        Negated character class

    Standard POSIX Character Classes
      [:alnum:]     Alphanumeric characters.
      [:alpha:]     Alphabetic characters.
      [:blank:]     Space and tab characters.
      [:cntrl:]     Control characters.
      [:digit:]     Numeric characters.
      [:graph:]     Characters that are printable and are also visible. (A space is printable, but not visible, while an `a' is both.)
      [:lower:]     Lower-case alphabetic characters.
      [:print:]     Printable characters (characters that are not control characters.)
      [:punct:]     Punctuation characters (characters that are not letter, digits, control characters, or space characters).
      [:space:]     Space characters (such as space, tab, and formfeed, to name a few).
      [:upper:]     Upper-case alphabetic characters.
      [:xdigit:]    Characters that are hexadecimal digits.

    Non-standard POSIX-style Character Classes
      [:javastart:] Start of a Java identifier
      [:javapart:]  Part of a Java identifier

    Predefined Classes
      .             Matches any character other than newline
      \w            Matches a "word" character (alphanumeric plus "_")
      \W            Matches a non-word character
      \s            Matches a whitespace character
      \S            Matches a non-whitespace character
      \d            Matches a digit character
      \D            Matches a non-digit character

    Boundary Matchers
      ^             Matches only at the beginning of a line
      $             Matches only at the end of a line
      \b            Matches only at a word boundary
      \B            Matches only at a non-word boundary

    Greedy Closures
      A*            Matches A 0 or more times (greedy)
      A+            Matches A 1 or more times (greedy)
      A?            Matches A 1 or 0 times (greedy)
      A{n}          Matches A exactly n times (greedy)
      A{n,}         Matches A at least n times (greedy)
      A{n,m}        Matches A at least n but not more than m times (greedy)

    Reluctant Closures
      A*?           Matches A 0 or more times (reluctant)
      A+?           Matches A 1 or more times (reluctant)
      A??           Matches A 0 or 1 times (reluctant)

    Logical Operators
      AB            Matches A followed by B
      A|B           Matches either A or B
      (A)           Used for subexpression grouping

    Backreferences
      \1            Backreference to 1st parenthesized subexpression
      \2            Backreference to 2nd parenthesized subexpression
      \3            Backreference to 3rd parenthesized subexpression
      \4            Backreference to 4th parenthesized subexpression
      \5            Backreference to 5th parenthesized subexpression
      \6            Backreference to 6th parenthesized subexpression
      \7            Backreference to 7th parenthesized subexpression
      \8            Backreference to 8th parenthesized subexpression
      \9            Backreference to 9th parenthesized subexpression

    All closure operators (+, *, ?, {n,m}) are greedy by default, meaning that they match as many elements of the string as possible without causing the overall match to fail. If you want a closure to be reluctant (non-greedy), follow it with a '?'. A reluctant closure matches as few elements of the string as possible when finding matches. {n,m} closures don't currently support reluctancy.
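The difference between greedy and reluctant closures is easiest to see side by side. The following is a minimal sketch using the standard java.util.regex package as a stand-in for the Regex class (the +, +? operators behave the same way in both for this input, as far as the syntax description above goes); the class name and the firstMatch helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClosureSketch {
    // Returns the first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        // Greedy: .+ consumes as much as it can while still letting '>' match.
        System.out.println(firstMatch("<.+>", input));   // <a><b>
        // Reluctant: .+? stops at the first position where '>' can match.
        System.out.println(firstMatch("<.+?>", input));  // <a>
    }
}
```

A reluctant closure is usually what you want when the delimiter that ends the match can also appear later in the string.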

    • Field Summary

      Fields 
      Modifier and Type Field and Description
      static int MATCH_CASEINDEPENDENT
      MATCH flag to indicate that matching should be case-independent (folded)
      static int MATCH_MULTILINE
      MATCH flag to indicate that newlines should match as BOL/EOL (^ and $)
      static int MATCH_NORMAL
      MATCH flag that specifies normal, case-sensitive matching behaviour.
    • Constructor Summary

      Constructors 
      Constructor and Description
      Regex(java.lang.String pattern)
      Constructs a regular expression based on the specified pattern.
      Regex(java.lang.String pattern, boolean supportMatches)
      Constructs a regular expression based on the specified pattern.
      Regex(java.lang.String pattern, int matchFlags)
      Constructs a regular expression based on the specified pattern.
    • Method Summary

      Modifier and Type Method and Description
      java.lang.String getMatch()
      Returns the part of the string that matched the pattern.
      java.lang.String getMatch(int index)
      Returns the substring that matched one of the parenthesized subexpressions of the regular expression.
      int getMatchCount()
      Returns the number of parenthesized subexpressions in the regular expression.
      int getMatchFlags() 
      boolean matches(java.lang.String stringToCompare)
      Determines whether the regular expression pattern matches the provided string.
      java.lang.String toString() 
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Field Detail

      • MATCH_NORMAL

        public static final int MATCH_NORMAL
        MATCH flag that specifies normal, case-sensitive matching behaviour.
        See Also:
        Constant Field Values
      • MATCH_CASEINDEPENDENT

        public static final int MATCH_CASEINDEPENDENT
        MATCH flag to indicate that matching should be case-independent (folded)
        See Also:
        Constant Field Values
      • MATCH_MULTILINE

        public static final int MATCH_MULTILINE
        MATCH flag to indicate that newlines should match as BOL/EOL (^ and $)
        See Also:
        Constant Field Values
    • Constructor Detail

      • Regex

        public Regex(java.lang.String pattern)
              throws RegexSyntaxException
        Constructs a regular expression based on the specified pattern. As no MATCH flags are specified, MATCH_NORMAL is assumed.
        Throws:
        RegexSyntaxException
      • Regex

        public Regex(java.lang.String pattern,
                     int matchFlags)
              throws RegexSyntaxException
        Constructs a regular expression based on the specified pattern. Values for matchFlags can be either MATCH_NORMAL or a combination of MATCH_MULTILINE and MATCH_CASEINDEPENDENT.
        Throws:
        RegexSyntaxException
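To illustrate what multi-line and case-independent matching mean in combination, here is a sketch using the analogous java.util.regex flags, Pattern.MULTILINE and Pattern.CASE_INSENSITIVE. These are standard-library stand-ins, not the MATCH_ constants of this class, and combining int flags with bitwise OR is an assumption for Regex that the text above does not state explicitly. The class name and the found helper are hypothetical.

```java
import java.util.regex.Pattern;

class MatchFlagsSketch {
    // Searches for regex anywhere in input under the given java.util.regex flags.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "first line\nSECOND line";
        int both = Pattern.MULTILINE | Pattern.CASE_INSENSITIVE;

        // No flags: ^ anchors only at the start of the input, case matters.
        System.out.println(found("^second", 0, text));                        // false
        // Multi-line only: ^ matches after the newline, but the case differs.
        System.out.println(found("^second", Pattern.MULTILINE, text));        // false
        // Case-insensitive only: case is folded, but ^ still anchors at the input start.
        System.out.println(found("^second", Pattern.CASE_INSENSITIVE, text)); // false
        // Both flags together: the second line matches.
        System.out.println(found("^second", both, text));                     // true
    }
}
```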
      • Regex

        public Regex(java.lang.String pattern,
                     boolean supportMatches)
              throws RegexSyntaxException
        Constructs a regular expression based on the specified pattern. When matches are supported, the getMatch method may be called to return either the entire matched string or the substring matched by a parenthesized subexpression of the regular expression.
        Throws:
        RegexSyntaxException
    • Method Detail

      • matches

        public boolean matches(java.lang.String stringToCompare)
        Determines whether the regular expression pattern matches the provided string.
      • getMatch

        public java.lang.String getMatch()
        Returns the part of the string that matched the pattern. This is the same as getMatch(0).
      • getMatch

        public java.lang.String getMatch(int index)
        Returns the substring that matched one of the parenthesized subexpressions of the regular expression. The index indicates which substring should be returned. An index of 0 returns the entire matched string, an index of 1 returns the matched substring for the first parenthesized subexpression of the regular expression, etc.
      • getMatchCount

        public int getMatchCount()
        Returns the number of parenthesized subexpressions in the regular expression.
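The indexing described for getMatch(int) and the count described for getMatchCount() can be sketched with the standard java.util.regex package, where Matcher.group(int) and Matcher.groupCount() play the corresponding roles. This is a stand-in under that assumption, not the Regex class itself; whether getMatchCount() counts exactly as groupCount() does is not confirmed here. The class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexSketch {
    // Returns group `index` of the first match of regex in input, or null if there is no match.
    static String group(String regex, String input, int index) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(index) : null;
    }

    // Number of parenthesized (capturing) subexpressions in regex.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        String regex = "(\\w+)=(\\d+)";
        String input = "timeout=30;";
        System.out.println(group(regex, input, 0));  // timeout=30  (entire match)
        System.out.println(group(regex, input, 1));  // timeout     (first subexpression)
        System.out.println(group(regex, input, 2));  // 30          (second subexpression)
        System.out.println(groupCount(regex));       // 2
    }
}
```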
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • getMatchFlags

        public int getMatchFlags()