mykek.com Java Community: Regular Expression Tutorial Part 1

Regular expression support is a powerful feature introduced in J2SE 1.4 (the java.util.regex package). Regular expressions are one of the features that make languages such as Perl and Python so useful for processing text, for example in CGI applications.

This tutorial demonstrates how the Pattern and Matcher classes can be used to implement a simple variable substitution system, illustrating some of the key concepts along the way.

Variable Substitution

In many applications, it is useful to be able to create a file from a template. The user can then customize the overall structure of the file by modifying the template, which makes the application more flexible. At runtime, the application instantiates the template by filling in details with dynamic information available to the system.

Variable substitution is a simple yet powerful way of specifying dynamic information. A special syntax marks each location in a file where a variable's value should be substituted. In many Unix shell scripts, this is a string of the form ${variable_name}. When the shell encounters such a string in the input stream, it replaces ${variable_name} with the value associated with the variable variable_name.

For example, suppose the variable userID is currently set to mike. The following template

Welcome, ${userID}

would become

Welcome, mike
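
As a rough illustration of the substitution just described, the following sketch uses Pattern and Matcher to replace ${name} references from a map. It ignores escaping entirely. The class name SimpleSubstitution, the method substitute, and the choice to leave unknown variables untouched are assumptions made for this example, not part of the tutorial's own code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleSubstitution {
    // Matches ${name}; group 1 captures the variable name.
    // Each regex backslash is doubled in the Java string literal.
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{(\\w+)\\}");

    static String substitute(String template, Map<String, String> values) {
        Matcher m = VARIABLE.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Assumption: an unknown variable is left in place unchanged.
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps $ and \ in the value from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("userID", "mike");
        System.out.println(substitute("Welcome, ${userID}", values));
    }
}
```

Running main with userID set to mike prints Welcome, mike.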

The character $ has special meaning in this scheme. To allow a literal $ to be used, it is useful to support "escaping" of such special characters. In many Unix shell scripts, the \ character serves as the escape character. If a string that looks like a variable substitution string is preceded by the escape character, it is treated as a literal string and no substitution takes place.

For example, the following template

\${userID} is ${userID}

would become

${userID} is mike

in the instantiated file.

It turns out that regular expressions make it fairly easy to implement variable substitution as described above. The rest of this tutorial focuses on how to apply the concepts of regular expressions to accomplish this task.

Please note that for more complicated grammars, it may be more appropriate to implement them using Lex & Yacc, or the Java equivalents: JavaCC and Byacc/J.

Test Cases for the variable substitution

We will create a method substituteString that takes a template string and returns a string with all variables substituted. In keeping with the Test First rule of Extreme Programming, we will start with the test cases.

For this tutorial, we will use a simple Java main routine for unit testing. In a real project, a framework such as JUnit may be more appropriate.

	public static void main(String [] args) {
		setValue("abc", "ABC");
		setValue("defg", "DEFG");
		printSubstitutedString("abc");
		printSubstitutedString("a${abc}");
		printSubstitutedString("${abc}");
		printSubstitutedString("${abc}${defg}");
		printSubstitutedString("${abc}z${defg}");
		printSubstitutedString("abc${abc}${defg}xyz");
		printSubstitutedString("abc${abc}xy${defg}xyz");
		printSubstitutedString("abc\\${abc}xy\\\\${defg}xyz\\\\\\${abc}xy");
		printSubstitutedString("\\${abc}xy\\\\${defg}xyz\\\\\\${abc}xy");
		printSubstitutedString("\\\\${abc}xy\\\\${defg}xyz\\\\\\${abc}xy");
		printSubstitutedString("\\\\${abc}\\\\${defg}xyz\\\\\\${abc}xy");
		printSubstitutedString("\\\\\\${abc}xy\\\\${defg}xyz\\\\\\${abc}xy");
		printSubstitutedString("\\a\\${abc}xy\\\\${defg}xyz\\\\\\${abc}xy\\\\\\lmn");
	}
	
	private static void printSubstitutedString(String line) {
		System.out.println("\"" + line + "\" becomes \"" + substituteString(line) + "\"");
	}

The method setValue is used to associate a given value with the specified variable. The implementation is left as an exercise for the reader. (A simple implementation may involve the HashMap class.) The main method uses the setValue method to associate the value "ABC" with the variable abc, and the value "DEFG" with the variable defg.
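
For readers who want a starting point for that exercise, here is a minimal sketch of setValue backed by a HashMap, as the hint suggests. The class name VariableTable and the lookup helper getValue are hypothetical names introduced only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class VariableTable {
    // Variable names mapped to their current values.
    private static final Map<String, String> VALUES = new HashMap<>();

    // Associates the given value with the named variable, replacing any earlier value.
    static void setValue(String name, String value) {
        VALUES.put(name, value);
    }

    // Hypothetical helper (not in the original): returns null for an unset variable.
    static String getValue(String name) {
        return VALUES.get(name);
    }

    public static void main(String[] args) {
        setValue("abc", "ABC");
        setValue("defg", "DEFG");
        System.out.println(getValue("abc") + " " + getValue("defg"));
    }
}
```

A static map keeps the sketch in line with the static main routine above; a real application would more likely pass the map around explicitly.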

The method printSubstitutedString is a convenience method that calls the method substituteString (which we shall examine in detail in the rest of this tutorial) to perform variable substitution on the input string. The input string and the substituted string are then written to standard output. The main method calls printSubstitutedString for each of the test cases.

Test Case  Template String                           Substituted String
1          abc                                       abc
2          a${abc}                                   aABC
3          ${abc}                                    ABC
4          ${abc}${defg}                             ABCDEFG
5          ${abc}z${defg}                            ABCzDEFG
6          abc${abc}${defg}xyz                       abcABCDEFGxyz
7          abc${abc}xy${defg}xyz                     abcABCxyDEFGxyz
8          abc\${abc}xy\\${defg}xyz\\\${abc}xy       abc${abc}xy\DEFGxyz\${abc}xy
9          \${abc}xy\\${defg}xyz\\\${abc}xy          ${abc}xy\DEFGxyz\${abc}xy
10         \\${abc}xy\\${defg}xyz\\\${abc}xy         \ABCxy\DEFGxyz\${abc}xy
11         \\${abc}\\${defg}xyz\\\${abc}xy           \ABC\DEFGxyz\${abc}xy
12         \\\${abc}xy\\${defg}xyz\\\${abc}xy        \${abc}xy\DEFGxyz\${abc}xy
13         \a\${abc}xy\\${defg}xyz\\\${abc}xy\\\lmn  \a${abc}xy\DEFGxyz\${abc}xy\\lmn

The table above summarizes the test cases contained in the main method and the expected result for each. Note that the \ character serves as an escape character for Java, for the variable substitution syntax, and for regular expressions in general. This can make Java code involving these grammars a little difficult to read whenever the \ character is involved, since it needs to be escaped repeatedly. This is why every \ in the table appears as \\ in the main method.
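
To make the layers of escaping concrete, here is one possible shape for substituteString that is consistent with the expected results in the table. It is only a sketch under stated assumptions, not necessarily the implementation developed in the rest of the tutorial: the class name EscapeAwareSubstitution, the single-pattern approach, and the handling of unset variables are all choices made for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeAwareSubstitution {
    private static final Map<String, String> VALUES = new HashMap<>();

    // Three layers of escaping meet here. The regex  \\(.)|\$\{(\w+)\}  means:
    //   \\(.)        a backslash followed by any one character (group 1)
    //   \$\{(\w+)\}  a ${name} reference (group 2)
    // Every regex backslash is doubled again inside the Java string literal.
    private static final Pattern TOKEN = Pattern.compile("\\\\(.)|\\$\\{(\\w+)\\}");

    static void setValue(String name, String value) {
        VALUES.put(name, value);
    }

    static String substituteString(String template) {
        Matcher m = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement;
            if (m.group(1) != null) {
                String escaped = m.group(1);
                // \$ and \\ lose the escape character; any other pair such as \a is kept as is.
                replacement = (escaped.equals("$") || escaped.equals("\\")) ? escaped : m.group();
            } else {
                String value = VALUES.get(m.group(2));
                // Assumption: an unset variable is left in place unchanged.
                replacement = (value != null) ? value : m.group();
            }
            // quoteReplacement keeps $ and \ in the replacement from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        setValue("abc", "ABC");
        setValue("defg", "DEFG");
        // Test case 8 from the table; in Java source each template backslash is written as \\.
        System.out.println(substituteString("abc\\${abc}xy\\\\${defg}xyz\\\\\\${abc}xy"));
    }
}
```

Because the pattern consumes a backslash together with the character that follows it, an escaped \$ is used up before the ${name} alternative can match at that position, which is what distinguishes \${abc} from \\${abc} in the test cases.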

Written by Mike Kwong