Title: Regular Expressions in .NET
1. Regular Expressions in .NET
- Ashraya R. Mathur
- CS 795 - .NET Security
2. Outline
- Introduction to Regular Expressions
- Regular Expression Syntax
- Validation in ASP.NET
- Regular Expressions in .NET Programming
- Demonstrations
- Conclusion
3. What are Regular Expressions?
- Definition
  - A regular expression is a series of characters that is compiled into an algorithm for matching and manipulating text
- Regular expressions allow you to
  - Extract, edit, replace, or delete text substrings
  - Add the extracted strings to a collection in order to generate a report
- They are a widely applicable skill, usable in .NET, Java, Perl, PHP, JavaScript, and many other programming languages
4. Common Regular Expression Uses
- Form and Data Validation
- Query-String Validation
- Data Clean-up / Reformatting
- Data search and retrieval
- HTML / XML Information Retrieval
- Parsing Log Files
5. Regular Expression Syntax
- Simple expressions
  - The simplest regular expression is a literal string
- Quantifiers
  - *, which describes "0 or more occurrences",
  - +, which describes "1 or more occurrences", and
  - ?, which describes "0 or 1 occurrence".
  - Explicit quantifiers, {x,y}, which allow an exact number or range to be specified
- Quantifiers always refer to the pattern immediately preceding (to the left of) the quantifier
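The quantifiers above can be tried out with a small console sketch. The class name QuantifierDemo and the helper FullMatch are hypothetical names chosen for this illustration, and the sample strings are made up; the helper wraps the pattern in ^ and $ so that the whole input has to match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper: true only when the whole input matches the pattern.
    public static bool FullMatch(string pattern, string input)
    {
        return Regex.IsMatch(input, "^(?:" + pattern + ")$");
    }

    public static void Main()
    {
        // * : zero or more of the preceding 'b'
        Console.WriteLine(FullMatch("ab*c", "ac"));         // True
        Console.WriteLine(FullMatch("ab*c", "abbbc"));      // True

        // + : one or more of the preceding 'b'
        Console.WriteLine(FullMatch("ab+c", "ac"));         // False

        // ? : zero or one of the preceding 'b'
        Console.WriteLine(FullMatch("ab?c", "abbc"));       // False

        // {x,y} : between x and y of the preceding 'b'
        Console.WriteLine(FullMatch("ab{2,3}c", "abbc"));   // True
        Console.WriteLine(FullMatch("ab{2,3}c", "abbbbc")); // False
    }
}
```

In every case the quantifier applies only to the 'b' immediately to its left, not to the whole "ab" prefix.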
6. Regular Expression Syntax (contd)
- Metacharacters
  - Include the following: . ^ $ ( ) [ ] | and \
  - . matches any single character
  - ^ and $ mark the start and end positions of a line of text. Ex: ^a[a-z]b$
  - () are used to group an expression. Ex: (abc)
  - [] define a class of characters from which the pattern can match one. Ex: [a-z], [A-Z], [0-9]
  - | indicates an either-or situation. Ex: ab|cd
  - \ is used as an escape character. Ex: c:\\
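A short sketch of each metacharacter in use may help here. The class name MetacharacterDemo and the helper Has are hypothetical, and the test strings are invented for the example; only Regex.IsMatch from System.Text.RegularExpressions is relied on.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MetacharacterDemo
{
    // Hypothetical helper: true when the pattern matches somewhere in the input.
    public static bool Has(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }

    public static void Main()
    {
        Console.WriteLine(Has("cat", "c.t"));            // True: . stands in for 'a'
        Console.WriteLine(Has("concat", "^cat"));        // False: ^ anchors at the start
        Console.WriteLine(Has("concat", "cat$"));        // True: $ anchors at the end
        Console.WriteLine(Has("abcabc", "^(abc)+$"));    // True: the group is repeated as a unit
        Console.WriteLine(Has("x7", "^[a-z][0-9]$"));    // True: one character from each class
        Console.WriteLine(Has("cd", "^(ab|cd)$"));       // True: either alternative
        Console.WriteLine(Has(@"c:\", @"^c:\\$"));       // True: \\ matches one literal backslash
    }
}
```

The verbatim string prefix (@) keeps C# from interpreting the backslashes before the regex engine sees them.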
7. Sample Regular Expressions
Pattern                            Description
\d{5}                              5 numeric digits, such as a US ZIP code
(\d{5})(-\d{4})?                   5 digits with an optional -4 digit suffix (US ZIP or ZIP+4 format)
\w+@[a-z]+?\.[a-z]{2,3}            Simple email validation expression
\d{3}-\d{2}-\d{4}                  Social Security Number validation
\d{1,2}\/\d{1,2}\/\d{4}            Date format validation
([\w-]+\.)+[\w-]+(/[\w- ./?]*)?    URL validation
/\*.*\*/                           Matches the contents of a C-style comment /* ... */
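As a rough illustration, three of the sample patterns can be wrapped in small checker methods. The class and method names below are hypothetical, and the ^ and $ anchors are an addition for this sketch: Regex.IsMatch on its own accepts a match anywhere in the input, which is usually not what a validator wants.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SamplePatternChecks
{
    // Anchors (^ and $) are added here as an assumption so the whole string must match.
    public static bool IsZip(string s)
    {
        return Regex.IsMatch(s, @"^(\d{5})(-\d{4})?$");
    }

    public static bool IsSsn(string s)
    {
        return Regex.IsMatch(s, @"^\d{3}-\d{2}-\d{4}$");
    }

    public static bool IsDate(string s)
    {
        return Regex.IsMatch(s, @"^\d{1,2}\/\d{1,2}\/\d{4}$");
    }

    public static void Main()
    {
        Console.WriteLine(IsZip("23529"));         // True
        Console.WriteLine(IsZip("23529-0001"));    // True
        Console.WriteLine(IsZip("2352"));          // False
        Console.WriteLine(IsSsn("123-45-6789"));   // True
        Console.WriteLine(IsDate("1/15/2007"));    // True
        Console.WriteLine(IsDate("2007-01-15"));   // False
    }
}
```

These patterns check format only: a string such as 99/99/9999 still passes the date check, so range checks would have to be done separately.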
8. Validation in ASP.NET
- RegularExpressionValidator validation control
  - Allows you to validate inputs by providing a regular expression which must match the input
  - The regular expression pattern is specified by setting the ValidationExpression property of the control
- Key properties
  - ControlToValidate
  - ErrorMessage (for the ValidationSummary)
- Example
    <asp:RegularExpressionValidator runat="server"
        id="date1" ControlToValidate="TextBox1"
        ErrorMessage="Invalid Date"
        ValidationExpression="\d{1,2}\/\d{1,2}\/\d{4}" />
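One point worth illustrating is that the validator requires the expression to match the entire input, not just part of it, so the pattern needs no explicit ^ and $. The sketch below approximates that check in plain C#; the class WholeInputValidation and method IsValid are hypothetical names, and this is an approximation of the control's behavior rather than its actual source. The control also treats empty input as valid, which is why it is often paired with a RequiredFieldValidator.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeInputValidation
{
    // Hypothetical helper: succeeds only when the first match covers the whole input.
    public static bool IsValid(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success && m.Index == 0 && m.Length == input.Length;
    }

    public static void Main()
    {
        string datePattern = @"\d{1,2}\/\d{1,2}\/\d{4}";

        Console.WriteLine(IsValid("1/15/2007", datePattern));            // True
        Console.WriteLine(IsValid("on 1/15/2007", datePattern));         // False: extra leading text
        // A plain IsMatch call is more permissive: it accepts a match anywhere.
        Console.WriteLine(Regex.IsMatch("on 1/15/2007", datePattern));   // True
    }
}
```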
9. Regular Expressions in .NET Programming
- .NET base classes
  - Namespace: System.Text.RegularExpressions
  - Can be used from any .NET language
- Implements a traditional NFA regex engine
  - As do Java, Perl, PHP, etc.
  - Almost all patterns will work the same
- Supports named captures, which not every other engine offers
10. The Regex Namespace: System.Text.RegularExpressions
- Regex
- Match
- MatchCollection
- Group
- GroupCollection
- Capture
- CaptureCollection
- RegexCompilationInfo
11. The Regex Base Class
- The Regex class represents a single regular expression
- It is immutable, which means once you create it, you cannot change it
- To create a Regex object in C#, you can first declare it and then instantiate it with the regular expression pattern, as shown here
    Regex myRegex;
    myRegex = new Regex(RegularExpressionPattern);
12. The Regex Base Class (contd)
- Match: searches a given string and returns a single Match object for the first text that is matched by the regular expression pattern
- Matches: searches a given string and returns a MatchCollection object for all locations that are matched by the pattern stored in the Regex object
- IsMatch: returns true if the provided string contains a match for the pattern
- Split: splits the given string into an array of substrings, using the regular expression pattern as the delimiter
- Replace: replaces any instances of text that match the pattern in the Regex object with the provided replacement expression
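The five methods listed above can be compared side by side on one input. This is a minimal console sketch; the class name RegexMethodsDemo, the \d+ pattern, and the sample text are all made up for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexMethodsDemo
{
    // One immutable Regex instance, reused by every call below.
    public static readonly Regex Digits = new Regex(@"\d+");

    public static void Main()
    {
        string text = "a1b22c333d";

        // Match: the first match only
        Match first = Digits.Match(text);
        Console.WriteLine(first.Value);                    // 1

        // Matches: every match in the input
        MatchCollection all = Digits.Matches(text);
        Console.WriteLine(all.Count);                      // 3

        // IsMatch: a yes/no answer
        Console.WriteLine(Digits.IsMatch("abc"));          // False

        // Split: the matches act as delimiters
        string[] parts = Digits.Split(text);
        Console.WriteLine(string.Join("|", parts));        // a|b|c|d

        // Replace: each match is swapped for the replacement text
        Console.WriteLine(Digits.Replace(text, "#"));      // a#b#c#d
    }
}
```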
13. Demonstration 1
    private void btnRun_Click(object sender, System.EventArgs e)
    {
        // Use a single Regex object, passing in the pattern
        // and the option to ignore case
        Regex rxMatch = new Regex(txtRegEx.Text, RegexOptions.IgnoreCase);
        // Determine whether the user input contains a match
        bool blnResult = rxMatch.IsMatch(txtText.Text);
        // Display the result to the user
        MessageBox.Show("The Result is " + blnResult.ToString(), "RegEx Demo");
    }
14. Match and MatchCollection
- Allow us to obtain the details of each match made via a regular expression
  - Match: represents a single match
  - MatchCollection: a collection of Match objects
- When the Match method of the Regex object is used, it returns a Match object that contains the matching text
- The MatchCollection object contains a series of Match objects, each representing a single matched substring of the string searched
15. Demonstration 2
    private void btnRun_Click(object sender, System.EventArgs e)
    {
        // Use a single Regex object, passing in the pattern
        // and the option to ignore case
        Regex rxMatch = new Regex(txtRegEx.Text, RegexOptions.IgnoreCase);
        Match mtMatch;
        MatchCollection mtCol;
        mtMatch = rxMatch.Match(txtText.Text);
        mtCol = rxMatch.Matches(txtText.Text);
        MessageBox.Show("There are " + mtCol.Count + " match(es) found.", "RegEx Demos");
16. Demonstration 2 (contd)
        // If there is at least one match, show each of them
        if (mtCol.Count > 0)
        {
            // Use the Match object here
            do
            {
                // We want the match value and its position in the string
                MessageBox.Show("Result at position " + mtMatch.Index.ToString() + ": " + mtMatch.Value, "RegEx Demos");
                mtMatch = mtMatch.NextMatch();
            }
            while (mtMatch.Success);
        }
    }
17. Group and GroupCollection
- Capturing: ()
  - The captured subsequence may be used later in the expression via a backreference, and may also be retrieved from the Match object once the match operation is complete
- Non-capturing: (?:)
- Named capture: (?<name>)
  - Uses names for the captured groups instead of numbers
- Substitutions
  - Specialized Replace via groups
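The group types above can be contrasted in a small sketch. The class GroupDemo, its method names, the group names area and line, and the phone-number-like sample text are all hypothetical choices for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Numbered groups: the (?:...) group is skipped when numbers are assigned.
    public static string[] Numbered(string input)
    {
        Match m = Regex.Match(input, @"(\d{3})-(?:\d{3})-(\d{4})");
        return new[] { m.Groups[1].Value, m.Groups[2].Value };
    }

    // Named groups: retrieved by name instead of by position.
    public static string[] Named(string input)
    {
        Match m = Regex.Match(input, @"(?<area>\d{3})-\d{3}-(?<line>\d{4})");
        return new[] { m.Groups["area"].Value, m.Groups["line"].Value };
    }

    // Substitution: ${name} in the replacement string refers back to a named group.
    public static string Mask(string input)
    {
        return Regex.Replace(input, @"(?<area>\d{3})-\d{3}-(?<line>\d{4})", "${area}-XXX-${line}");
    }

    public static void Main()
    {
        string text = "call 757-555-1234 today";
        Console.WriteLine(string.Join(",", Numbered(text)));   // 757,1234
        Console.WriteLine(string.Join(",", Named(text)));      // 757,1234
        Console.WriteLine(Mask(text));                         // call 757-XXX-1234 today
    }
}
```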
18. Backreferences and Advanced Grouping
- Backreferences
  - Allow you to match the same characters as a previous group
  - Match repeated words
    - (\b[a-zA-Z]+\b)\s\1
- Advanced grouping
  - Positive look-ahead assertion: (?=)
  - Negative look-ahead assertion: (?!)
  - Positive look-behind assertion: (?<=)
  - Negative look-behind assertion: (?<!)
  - Non-backtracking: (?>)
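Each construct on this slide can be exercised with one short call. In the sketch below, the class name, the helper First, and every sample string are hypothetical; the repeated-word pattern is the one given above, and the other patterns are invented to show a single assertion each.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackrefAndLookaroundDemo
{
    // Hypothetical helper: the text of the first match, or null when nothing matches.
    public static string First(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        // Backreference: \1 must repeat whatever group 1 captured.
        Console.WriteLine(First("this is is a test", @"(\b[a-zA-Z]+\b)\s\1"));   // is is

        string prices = "100 dollars and 200 euros";
        // Positive look-ahead: digits that are followed by " dollars".
        Console.WriteLine(First(prices, @"\d+(?= dollars)"));                    // 100
        // Negative look-ahead: a whole number that is not followed by " dollars".
        Console.WriteLine(First(prices, @"\b\d+\b(?! dollars)"));                // 200

        string bill = "cost $45 for 3 items";
        // Positive look-behind: digits preceded by a dollar sign.
        Console.WriteLine(First(bill, @"(?<=\$)\d+"));                           // 45
        // Negative look-behind: a number not preceded by a dollar sign.
        Console.WriteLine(First(bill, @"(?<!\$)\b\d+"));                         // 3

        // Non-backtracking group: a+ keeps all three a's, so the trailing "ab" cannot match.
        Console.WriteLine(Regex.IsMatch("aaab", "a+ab"));                        // True
        Console.WriteLine(Regex.IsMatch("aaab", "(?>a+)ab"));                    // False
    }
}
```

The assertions check the surrounding text without consuming it, which is why the matched values contain only the digits.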
19. Replacing Substrings
- The Replace method of Regex is used to replace matched portions of a given string with the specified replacement
- Example using named captures in the replacement string
    NewDateYMD = Regex.Replace(OldDateMDY,
        @"\b(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{2,4})\b",
        "${year}-${month}-${day}");
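The date replacement above can be run as a small console program. The class DateReorder and method MdyToYmd are hypothetical names, and the sample sentence is invented; the pattern and replacement string follow the example on this slide.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DateReorder
{
    // Rewrites month/day/year dates found in the input as year-month-day.
    public static string MdyToYmd(string input)
    {
        return Regex.Replace(
            input,
            @"\b(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{2,4})\b",
            "${year}-${month}-${day}");
    }

    public static void Main()
    {
        Console.WriteLine(MdyToYmd("Due 1/15/2007"));   // Due 2007-1-15
    }
}
```

Text outside the match is left untouched, and the digits are copied as captured, so single-digit months and days are not zero-padded.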
20. Demonstration 3
    private void btnCapture_Click(object sender, System.EventArgs e)
    {
        // A basic pattern that captures any run of 4 letters
        string strRegExPattern = "([A-Za-z]{4})";
        Regex rxGroups = new Regex(strRegExPattern, RegexOptions.IgnoreCase);
        // Match object, using a group here
        Match mtGroup = rxGroups.Match(txtCapture.Text);
        // Show group 1 of every match; the loop is skipped when nothing matches
        while (mtGroup.Success)
        {
            MessageBox.Show(mtGroup.Groups[1].Value, "RegEx Demos");
            mtGroup = mtGroup.NextMatch();
        }
    }
21. Demonstration 3 (contd)
    private void btnNamedCapture_Click(object sender, System.EventArgs e)
    {
        // A basic pattern that captures any run of 4 letters,
        // this time using a named capture
        string strRegExPattern = "(?<word>[A-Za-z]{4})";
        Regex rxGroups = new Regex(strRegExPattern, RegexOptions.IgnoreCase);
        Match mtGroup = rxGroups.Match(txtCapture.Text);
        // Result cannot be called on a failed match, so test Success first
        while (mtGroup.Success)
        {
            // Show the match using the named reference "word"
            MessageBox.Show(mtGroup.Result("${word}"), "RegEx Demos");
            mtGroup = mtGroup.NextMatch();
        }
    }
22. Demonstration 3 (contd)
    private void btnBack_Click(object sender, System.EventArgs e)
    {
        // Use the Regex object to find a duplicated word,
        // using the (\b[a-zA-Z]+\b)\s\1 pattern
        Regex rxMatch = new Regex(txtRegEx.Text, RegexOptions.IgnoreCase);
        // Replace each repeated word pair with a single copy of the word ($1)
        string strReplace = rxMatch.Replace(txtBack.Text, "$1");
        // Show the results
        MessageBox.Show(strReplace, "RegEx Demos");
    }
23. References
- Regular Expression Library: http://regexlib.com/
- Regular Expressions Information Website: http://www.regular-expressions.info/dotnet.html
- Regular Expressions in .NET, MSDN Library
- Professional Visual Studio 2005, Andrew Parsons and Nick Randolph
24. Questions?