Regular Expressions
Latest revision as of 19:12, 20 December 2018
A regular expression (or regex for short) is a pattern, written as ordinary text, that describes the strings you want a search to match.
Just as you can type *.jpg in a search box to find all JPEG files, you can use a regular expression (along with a tool such as grep) to match much more complex patterns.
For example, you could use:
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b
to search for e-mail addresses in a file. Because the character classes in this pattern list only uppercase letters, the search has to be run case-insensitively to find lowercase addresses as well.
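The e-mail pattern above can be tried out with a few lines of Python. This is only a minimal sketch: the function name find_emails and the sample text are made up for illustration, and the re.IGNORECASE flag is added so that [A-Z] also matches lowercase letters.

```python
import re

# The pattern from the text. re.IGNORECASE is an addition so that the
# uppercase-only classes also match lowercase letters.
EMAIL = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.IGNORECASE)


def find_emails(text):
    """Return every substring of text that looks like an e-mail address."""
    return EMAIL.findall(text)


# Hypothetical sample text, used only for illustration.
sample = "Contact alice@example.com or BOB.SMITH@mail.example.org for details."
print(find_emails(sample))
```

The pattern is a rough filter rather than a full address validator; it will accept some strings that are not valid addresses and reject some unusual valid ones.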
Cheat Sheet
a - a literal character, such as the letter "a". Every character is literal except these twelve: \ ^ $ . | ? * + ( ) [ {
. (dot) - matches any single character (in most tools, any character except a line break).
? - the preceding character (or group) matches 0 or 1 times only.
* - the preceding character (or group) matches 0 or more times.
+ - the preceding character (or group) matches 1 or more times.
{n} - the preceding character (or group) matches exactly n times.
{n,m} - the preceding character (or group) matches at least n times and not more than m times. Example: a{2,4} matches the letter "a" at least twice, but not more than four times.
[agd] - matches one character that is among those listed within the square brackets.
[^agd] - matches one character that is not among those listed within the square brackets.
[c-f] - a dash within square brackets defines a range. In this case it means any one of the letters c, d, e or f. Digits work the same way: [0-9] matches any single digit.
() - groups several characters so that they behave as one unit.
| (pipe symbol) - the logical OR operation: matches either the expression on its left or the one on its right.
^ - matches the beginning of the line.
$ - matches the end of the line.
\ - escapes a special character. For example, if you want to see whether a file has a question mark in it, you can't use the question mark symbol on its own because it has a special meaning. Instead, escape it (tell the regex to ignore its special meaning and treat it as a literal character) by putting a backslash in front of it, like this: \?
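The cheat-sheet entries can be checked one by one in Python. The sketch below assumes Python's re module, whose syntax agrees with the cheat sheet for these constructs; the helper name matches and the test strings are invented for illustration.

```python
import re


def matches(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# . matches exactly one character
assert matches(r"h.t", "hot") and not matches(r"h.t", "ht")

# ? means 0 or 1 of the preceding character
assert matches(r"colou?r", "color") and matches(r"colou?r", "colour")

# * means 0 or more, + means 1 or more
assert matches(r"ab*c", "ac") and matches(r"ab*c", "abbbc")
assert matches(r"ab+c", "abc") and not matches(r"ab+c", "ac")

# {n,m} means at least n and at most m repetitions
assert matches(r"a{2,4}", "aaa")
assert not matches(r"a{2,4}", "a") and not matches(r"a{2,4}", "aaaaa")

# Character classes: listed, negated, and range
assert matches(r"[agd]", "g") and not matches(r"[agd]", "b")
assert matches(r"[^agd]", "b") and not matches(r"[^agd]", "a")
assert matches(r"[c-f]", "e") and not matches(r"[c-f]", "g")

# Grouping lets a quantifier apply to several characters at once
assert matches(r"(ab)+", "ababab")

# | is a logical OR between two alternatives
assert matches(r"cat|dog", "dog") and not matches(r"cat|dog", "cow")

# \ escapes a special character so it is matched literally
assert re.search(r"\?", "what?") and not re.search(r"\?", "what")

print("all cheat-sheet checks passed")
```

re.fullmatch is used so that each check tests the entire string; re.search would also succeed when the pattern matches only part of it.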
Anchor Characters
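Regular expressions examine the text between separators; for line-oriented tools such as grep, the separator is the line break. If you want to search for a pattern that sits at one end of a line or the other, you use anchors. The character ^ is the starting anchor, and the character $ is the end anchor.
Note that in basic (grep-style) regular expressions, ^ and $ act as anchors only if they are used at the start (^) or end ($) of a pattern; anywhere else they are treated as literal characters. Other flavors, such as extended and Perl-style regular expressions, treat them as anchors wherever they appear unless they are escaped.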
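The two anchors can be tried out in Python as a rough sketch; the list of sample lines is made up for illustration. Python's re module uses a Perl-style flavor, so an unescaped ^ in the middle of a pattern is still an anchor there, and a literal caret has to be written as \^.

```python
import re

# Hypothetical sample lines, used only for illustration.
lines = ["Apple", "banana A", "A", "cab"]

# ^A : "A" at the beginning of a line
starts_with_a = [s for s in lines if re.search(r"^A", s)]

# A$ : "A" at the end of a line
ends_with_a = [s for s in lines if re.search(r"A$", s)]

print(starts_with_a)  # ['Apple', 'A']
print(ends_with_a)    # ['banana A', 'A']

# In Python, A^ can never match, because ^ stays an anchor after the "A".
# Escaping the caret makes it a literal character.
print(re.search(r"A^", "A^") is None)       # True
print(re.search(r"A\^", "A^") is not None)  # True
```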
Examples:
Pattern    Matches
^A         "A" at the beginning of a line
A$         "A" at the end of a line
A^         "A^" anywhere on a line
$A         "$A" anywhere on a line
^^         "^" at the beginning of a line
$$         "$" at the end of a line
Shorthand Characters
\s will match a whitespace character (a space, tab, or line break)
\d will match a digit (0-9; you can also use [0-9])
\w will match a word character (A-Z, a-z, 0-9, and _ (underscore))
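The shorthand classes can be illustrated with a short Python sketch. The sample string is invented, and the re.ASCII flag is an addition: without it, Python 3 lets \d and \w match non-ASCII digits and letters as well, which goes beyond the ranges listed above.

```python
import re

# Hypothetical sample text: words separated by spaces and a tab.
text = "Order 66 shipped\tto dock_7"

# \d matches one digit at a time.
digits = re.findall(r"\d", text, flags=re.ASCII)

# \w+ matches runs of letters, digits and underscores.
words = re.findall(r"\w+", text, flags=re.ASCII)

# \s+ matches runs of whitespace, so it works as a separator for split.
fields = re.split(r"\s+", text, flags=re.ASCII)

print(digits)  # ['6', '6', '7']
print(words)   # ['Order', '66', 'shipped', 'to', 'dock_7']
print(fields)  # ['Order', '66', 'shipped', 'to', 'dock_7']
```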
Word Boundaries
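These match the positions where a word begins or ends, rather than matching any characters themselves.
\< matches at the beginning of a word
\> matches at the end of a word
\b matches at either the beginning or the end of a word
Support varies between tools: GNU grep accepts all three, while Perl-style engines generally provide only \b.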
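A small Python sketch shows the effect of word boundaries. Python's re module has no \< or \>, so \b stands in for both here; the sample sentence is made up for illustration.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "cat concat catalog cat."

# \bcat\b : "cat" as a whole word only
whole_words = re.findall(r"\bcat\b", text)

# \bcat\w* : words that begin with "cat"
begin_with_cat = re.findall(r"\bcat\w*", text)

# \w*cat\b : words that end with "cat"
end_with_cat = re.findall(r"\w*cat\b", text)

print(whole_words)     # ['cat', 'cat']
print(begin_with_cat)  # ['cat', 'catalog', 'cat']
print(end_with_cat)    # ['cat', 'concat', 'cat']
```

The patterns are written as raw strings (r"...") because in an ordinary Python string "\b" means a backspace character rather than a word boundary.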