Regular Expressions

From Psygen Wiki
Revision as of 03:11, 23 December 2016

A regular expression (or regex for short) is a standard text notation for describing a pattern to search for.

Similar to using an asterisk like this: *.jpg in a search box to find all JPEG files, you can use a regular expression (along with something like grep) to match much more complex patterns.

For example, you could use:

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b

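to search a file for e-mail addresses. The pattern lists only uppercase letters, so it needs case-insensitive matching to find lowercase addresses as well.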
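As a rough illustration, the pattern above can be tried out with Python's re module. The sample text and the names EMAIL, sample, and found are invented for this sketch, and re.IGNORECASE is an assumption added here because the pattern itself only lists the letters A-Z.

```python
import re

# The e-mail pattern from above; IGNORECASE lets [A-Z] match lowercase too.
EMAIL = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.IGNORECASE)

# Hypothetical sample text, made up for this example.
sample = "Contact alice@example.com or BOB.SMITH@mail.example.org for details."

# findall returns every non-overlapping match as a list of strings.
found = EMAIL.findall(sample)
print(found)
```

With grep, a comparable search would normally need its extended-regex and ignore-case options; the exact flags depend on the grep version in use.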


Cheat Sheet

a - Literal character, like the letter "a". Every character is literal except these twelve: \ ^ $ . | ? * + ( ) [ {

. (dot) - matches any single character (in most regex flavors, any character except a line break).

? - the preceding character matches 0 or 1 times only.

* - the preceding character matches 0 or more times.

+ - the preceding character matches 1 or more times.

{n} - the preceding character matches exactly n times.

{n,m} - the preceding character matches at least n times and not more than m times. Example: a{2,4} matches the letter a at least twice, but not more than four times.
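The quantifiers above (?, *, +, {n} and {n,m}) can be checked with a small Python sketch. The helper name fits and the test strings are made up for illustration; re.fullmatch is used so that a pattern only counts as matching when it covers the whole string.

```python
import re

def fits(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# ? : the "u" may appear 0 or 1 times
print(fits(r"colou?r", "color"), fits(r"colou?r", "colour"))
# * : "b" may appear 0 or more times, so "ac" is accepted
print(fits(r"ab*c", "ac"), fits(r"ab*c", "abbbc"))
# + : "b" must appear at least once, so "ac" is rejected
print(fits(r"ab+c", "ac"), fits(r"ab+c", "abc"))
# {n} : exactly three "a" characters
print(fits(r"a{3}", "aaa"), fits(r"a{3}", "aa"))
# {n,m} : between two and four "a" characters
print(fits(r"a{2,4}", "aaaa"), fits(r"a{2,4}", "aaaaa"))
```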

[agd] - the character is one of those included within the square brackets.

[^agd] - the character is not one of those included within the square brackets.

[c-f] - the dash within the square brackets defines a range. In this case it means any one of the letters c, d, e, or f. Digits work the same way: [0-9] matches any single digit.
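A short Python sketch of the three bracket forms above, run against made-up sample words. The variable names are only for illustration; re.findall lists each character the class matches.

```python
import re

# [agd] : any one of a, g or d
one_of = re.findall(r"[agd]", "badge")
# [^agd] : any character that is NOT a, g or d
none_of = re.findall(r"[^agd]", "badge")
# [c-f] : any one character in the range c through f
in_range = re.findall(r"[c-f]", "badge")
# [0-9] : ranges work for digits as well
digits = re.findall(r"[0-9]", "room 42")

print(one_of, none_of, in_range, digits)
```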

() - groups several characters so that they behave as one unit, for example so that a quantifier applies to the whole group.

| (pipe symbol) - the logical OR operation: matches either the expression on its left or the one on its right.
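Grouping and alternation are easiest to see side by side. The following Python sketch uses invented test strings and a hypothetical helper named fits; it is meant as an illustration, not a complete treatment.

```python
import re

def fits(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# With a group, + repeats the whole "ab" unit.
print(fits(r"(ab)+", "ababab"))
# Without the group, + repeats only the "b".
print(fits(r"ab+", "ababab"), fits(r"ab+", "abbb"))
# | inside a group chooses between alternatives at that position.
print(fits(r"gr(a|e)y", "gray"), fits(r"gr(a|e)y", "grey"), fits(r"gr(a|e)y", "groy"))
# | at the top level chooses between whole words.
animals = re.findall(r"cat|dog", "cat, dog, bird")
print(animals)
```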

^ - matches the beginning of the line.

$ - matches the end of the line.
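The two anchors can be sketched in Python as follows. One assumption to flag: tools such as grep work line by line, whereas Python's re treats the input as a single string unless re.MULTILINE is passed, so the last lines show both behaviours. The sample strings are made up.

```python
import re

# ^ : "cat" must be at the start
starts = [re.search(r"^cat", s) is not None for s in ("cat food", "my cat")]
# $ : "cat" must be at the end
ends = [re.search(r"cat$", s) is not None for s in ("cat food", "my cat")]
print(starts, ends)

# Hypothetical two-line input.
text = "one two\nthree four"
# Without MULTILINE, ^ only matches at the very start of the string.
first_only = re.findall(r"^[a-z]+", text)
# With MULTILINE, ^ also matches at the start of every line.
each_line = re.findall(r"^[a-z]+", text, re.MULTILINE)
print(first_only, each_line)
```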

\ - escapes a special character. For example, to check whether a file contains a question mark, you can't use the question mark symbol by itself because it has a special meaning. Instead, escape it (tell regex to ignore its special meaning and treat it as a literal character) by putting a backslash in front of it, like this: \?
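A minimal Python sketch of escaping, using invented sample strings. Besides writing the backslash by hand, it shows re.escape, a standard-library helper that adds the backslashes automatically when the text to find is a fixed string.

```python
import re

# The backslash makes ? a literal question mark.
marks = re.findall(r"\?", "What? Why?")
print(marks)

# re.escape inserts the backslashes for every special character.
literal = re.escape("a.b?")
print(literal)

# The escaped pattern matches only the exact text "a.b?" ...
exact = re.fullmatch(literal, "a.b?") is not None
# ... whereas an unescaped dot would also accept any other character there.
loose = re.fullmatch(r"a.b", "axb") is not None
strict = re.fullmatch(literal, "axb") is not None
print(exact, loose, strict)
```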


References

  1. Ryan's Tutorials Grep and Regular Expressions
  2. Regular Expressions Info