Regular Expressions: Difference between revisions

Latest revision as of 19:12, 20 December 2018

A regular expression (or regex for short) is a pattern, written as ordinary text, that describes the strings you want a search to match.

Similar to using an asterisk like this: *.jpg in a search box to find all JPEG files, you can use a regular expression (along with something like grep) to match much more complex patterns.

For example, you could use:

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b

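to search for e-mail addresses in a file. Because the pattern lists only the uppercase letters A-Z, run it case-insensitively (for example, with grep's -i option) so that lowercase addresses match too.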
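As a rough illustration, the same pattern can be tried in Python's re module. This is only a sketch: the function name find_emails and the sample sentence are made up for this example, and the IGNORECASE flag is added so that A-Z also covers lowercase letters.

```python
import re

# The e-mail pattern from the text; IGNORECASE lets A-Z also cover a-z.
EMAIL = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.IGNORECASE)

def find_emails(text):
    """Return every substring of text that looks like an e-mail address."""
    return EMAIL.findall(text)

# Hypothetical sample text, not taken from any real file.
sample = "Contact alice@example.com or Bob.Smith@mail.example.org for details."
print(find_emails(sample))
# ['alice@example.com', 'Bob.Smith@mail.example.org']
```

The pattern is deliberately loose: it finds text shaped like an address and does not check that the address is valid.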

Cheat Sheet

a - Literal character, like the letter, "a". Every character is literal except these twelve: \ ^ $ . | ? * + ( ) [ {

. (dot) - matches any single character (in most tools, any character except a line break).

? - the preceding character matches 0 or 1 times only.

* - the preceding character matches 0 or more times.

+ - the preceding character matches 1 or more times.

{n} - the preceding character matches exactly n times.

{n,m} - the preceding character matches at least n times and not more than m times. Example: a{2,4} matches the letter a at least twice, but not more than four times.

[agd] - the character is one of those included within the square brackets.

[^agd] - the character is not one of those included within the square brackets.

[c-f] - the dash within the square brackets defines a range. In this case it means any one of the letters c, d, e, or f. Digits work the same way: [0-5] matches any one digit from 0 to 5.

() - allows us to group several characters to behave as one.

| (pipe symbol) - the logical OR operation: matches either the expression before it or the expression after it.

^ - matches the beginning of the line.

$ - matches the end of the line.

\ - escapes a special character. For example, if you want to see whether a file has a question mark in it, you can't use the question mark on its own because it has a special meaning. Instead, escape it (tell regex to ignore its special meaning and treat it as a literal character) by putting a backslash in front of it, like this: \?
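The entries in the cheat sheet can be checked with a short Python sketch. The patterns and sample words below are invented for illustration, and re.fullmatch is used so that each pattern has to account for the whole string rather than just part of it.

```python
import re

# Each entry: (pattern, text that should match, text that should not).
CASES = [
    (r"h.t",      "hat",   "ht"),       # .     : any single character
    (r"colou?r",  "color", "colouur"),  # ?     : preceding character 0 or 1 times
    (r"ab*c",     "ac",    "adc"),      # *     : preceding character 0 or more times
    (r"ab+c",     "abbc",  "ac"),       # +     : preceding character 1 or more times
    (r"a{2,4}",   "aaa",   "a"),        # {n,m} : between n and m times
    (r"[c-f]og",  "dog",   "bog"),      # [c-f] : one character in the range
    (r"[^agd]og", "log",   "dog"),      # [^..] : one character not in the set
    (r"(ab)+",    "abab",  "ba"),       # ()    : group behaves as one unit
    (r"cat|dog",  "dog",   "cow"),      # |     : either alternative
    (r"what\?",   "what?", "what"),     # \     : escaped ? is a literal ?
]

def full_match(pattern, text):
    """True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

for pattern, good, bad in CASES:
    print(pattern, full_match(pattern, good), full_match(pattern, bad))
```

Every row should print True for the first sample and False for the second.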

Anchor Characters

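Regular expressions examine the text between separators, which for line-oriented tools such as grep means one line at a time. If you want to search for a pattern that is at one end or the other, you use anchors. The character ^ is the starting anchor, and the character $ is the end anchor.
Note that in grep's basic regular expressions, ^ and $ are only anchors if they are used at the start (^) or end ($) of a pattern. Anywhere else they are treated as literal characters.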

Examples:
Pattern    Matches
^A         "A" at the beginning of a line
A$         "A" at the end of a line
A^         "A^" anywhere on a line
$A         "$A" anywhere on a line
^^         "^" at the beginning of a line
$$         "$" at the end of a line
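The anchor behaviour can be sketched in Python, with one caveat: the table above describes grep-style basic regular expressions, while Python's re module treats ^ and $ as anchors wherever they appear. In Python, a literal ^ or $ therefore has to be escaped. The sample lines and the helper name matching are made up for this example.

```python
import re

# Hypothetical sample lines, one string per "line".
lines = ["Apple", "banana A", "A^ marker", "cost $A"]

def matching(pattern, items):
    """Return the items in which the pattern is found somewhere."""
    return [s for s in items if re.search(pattern, s)]

print(matching(r"^A", lines))    # ['Apple', 'A^ marker']   A at the start
print(matching(r"A$", lines))    # ['banana A', 'cost $A']  A at the end
print(matching(r"A\^", lines))   # ['A^ marker']            literal A^
print(matching(r"\$A", lines))   # ['cost $A']              literal $A

# Unescaped, Python still reads these as anchors, so nothing can match.
print(matching(r"A^", lines))    # []
print(matching(r"$A", lines))    # []
```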

Shorthand Characters

\s will match whitespace (a space, tab, or line break)
\d will match digits (0-9; you can also use [0-9])
\w will match word characters (A-Z, a-z, 0-9, and _ (underscore))
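A small Python sketch shows the three shorthands at work. The sample text is invented, and the re.ASCII flag is passed so that \w is limited to A-Z, a-z, 0-9 and underscore, as listed above (without it, Python's \w also accepts non-English letters).

```python
import re

# Hypothetical sample text containing a tab, digits, and an underscore.
text = "Order 66 shipped\tto user_42 on 2018-12-20"

digits = re.findall(r"\d+", text)                  # runs of digits
words = re.findall(r"\w+", text, flags=re.ASCII)   # runs of word characters
parts = re.split(r"\s+", text)                     # split on any whitespace

print(digits)  # ['66', '42', '2018', '12', '20']
print(words)   # ['Order', '66', 'shipped', 'to', 'user_42', 'on', '2018', '12', '20']
print(parts)   # ['Order', '66', 'shipped', 'to', 'user_42', 'on', '2018-12-20']
```

Note that the hyphen is not a word character, so \w+ splits the date into three pieces while \s+ leaves it whole.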

Word Boundaries

These identify the boundaries associated with words.

\< used for beginning of the word

\> used for end of the word

\b used for either beginning or end of the word
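Word boundaries can be tried in Python as well. This sketch uses only \b, because Python's re module does not treat \< and \> as word-boundary anchors; those two are typically found in grep and similar tools. The sample text is made up for this example.

```python
import re

# Hypothetical sample text with "cat" alone and inside longer words.
text = "cat catalog concatenate bobcat"

whole = re.findall(r"\bcat\b", text)     # "cat" as a complete word
starts = re.findall(r"\bcat\w*", text)   # words that begin with "cat"
ends = re.findall(r"\w*cat\b", text)     # words that end with "cat"

print(whole)   # ['cat']
print(starts)  # ['cat', 'catalog']
print(ends)    # ['cat', 'bobcat']
```

The word "concatenate" appears in none of the lists, since its "cat" sits in the middle of the word and touches no boundary.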

