em Scripts PHP

Tudo sobre expressões regulares do MySQL ao PHP

Motivo de adoração por uns e terror por outros as expressões regulares são ferramentas que podem auxiliar e descomplicar a vida de muitos developers. Neste material que separmos em inglês você poderá aprender como funciona e aprender mais sobre as famosas REGEX.

MYSQL REGULAR EXPRESSIONS CHEAT SHEET

MySQL Regular Expressions
Regular Expressions in MySQL are used within the REGEXP and RLIKE sections of WHERE clauses in the selection of records for display, update or deletion. They use Henry Spencer’s implementation, which is aimed at conformance with POSIX standard 1003.2, extended version.

Operator TypeExamplesDescription
Literal Characters
Match a character exactly
a A y 6 % @Letters, digits and many special
characters match exactly
\$ \^ \+ \\ \?Precede other special characters
with a \ to cancel their regex special meaning
\n \t \rLiteral new line, tab, return
\cJ \cGControl codes
\xa3Hex codes for any character
Anchors and assertions^Field starts with
$Field ends with
[[:<:]]Word starts with
[[:>:]]Word ends with
Character groups
any 1 character from the group
[aAeEiou]any character listed from [ to ]
[^aAeEiou]any character except aAeEio or u
[a-fA-F0-9]any hex character (0 to 9 or a to f)
.any character at all
[[:space:]]any space character (space \n \r or \t)
[[:alnum:]]any alphanumeric character (letter or digit)
Counts
apply to previous element
+1 or more (“some”)
*0 or more (“perhaps some”)
?0 or 1 (“perhaps a”)
{4}exactly 4
{4,}4 or more
{4,8}between 4 and 8
Add a ? after any count to turn it sparse (match as few as possible) rather than have it default to greedy
Alternation|either, or
Grouping( )group for count and save to variable

The above list shows the most commonly used elements of MySQL regular expressions, and is not exhaustive.POSIX CHARACTER CLASS DEFINITIONS

POSIX Character Class Definitions

Value

Meaning

[:digit:]Only the digits 0 to 9
[:alnum:]Any alphanumeric character 0 to 9 OR A to Z or a to z.
[:alpha:]Any alpha character A to Z or a to z.
[:blank:]Space and TAB characters only.
[:xdigit:]Hexadecimal notation 0-9, A-F, a-f.
[:punct:]Punctuation symbols . , ” ‘ ? ! ; : # $ % & ( ) * + – / < > = @ [ ] \ ^ _ { } | ~
[:print:]Any printable character.
[:space:]Any whitespace characters (space, tab, NL, FF, VT, CR). Many system abbreviate as \s.
[:graph:]Exclude whitespace (SPACE, TAB). Many system abbreviate as \W.
[:upper:]Any alpha character A to Z.
[:lower:]Any alpha character a to z.
[:cntrl:]Control Characters NL CR LF TAB VT FF NUL SOH STX EXT EOT ENQ ACK SO SI DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC IS1 IS2 IS3 IS4 DEL.

These are always used inside square brackets in the form [[:alnum:]] or combined as [[:digit:]a-d]

Example

Replace URLs with links

$text = ereg_replace("[[:alpha:]]+://[^<>[:space:]]+[[:alnum:]/]",
"\\0", $text);
?>

EREG_REPLACE — REPLACE REGULAR EXPRESSION

ereg_replace — Replace regular expression

This function scans string for matches to pattern , then replaces the matched text
with replacement
string ereg_replace ( string $pattern , string $replacement , string $string )

pattern – A POSIX extended regular expression.

replacement – If pattern contains parenthesized substrings, replacement may contain substrings
of the form \\digit, which will be replaced by the text matching the digit’th parenthesized
substring; \\0 will produce the entire contents of string. Up to nine substrings may be used.
Parentheses may be nested, in which case they are counted by the opening parenthesis.

string -The input string.

Return Values
The modified string is returned. If no matches are found in string , then it will be returned
unchanged.
Example#1

$str= "This is a test";
echo str_replace(" is", " was", $str);
echo ereg_replace("( )is", "\\1was", $str);
echo ereg_replace("(( )is)", "\\2was", $str);
?>
prints “This was a test” three times:
Example#2

$s = "Coding PHP is fun.";
$pattern = "(.*)PHP(.*)";
$replacement = " They say \\1other languages\\2";
print ereg_replace($pattern, $replacement, $s);
?>

Output:
They say Coding other languages is fun.
Explanation:
"PHP" is replaced with "other languages", and the sentence is changed a little, using \1 and \2 to access the parts within parentheses.
Example#3
Replace URLs with links

$text = ereg_replace("[[:alpha:]]+://[^<>[:space:]]+[[:alnum:]/]",
"\\0", $text);
?>



EREGI — CASE INSENSITIVE REGULAR EXPRESSION MATCH

eregi — Case insensitive regular expression match
This function is identical to ereg() except that it ignores case distinction when matching alphabetic characters.
int eregi ( string $pattern , string $string [, array &$regs ] )
pattern – Case insensitive regular expression.
string – The input string.
regs – If matches are found for parenthesized substrings of pattern and the function is called
with the third argument regs , the matches will be stored in the elements of the array regs .
$regs[1] will contain the substring which starts at the first left parenthesis; $regs[2] will
contain the substring starting at the second, and so on. $regs[0] will contain a copy of the
complete string matched.
Example#1

= 'XYZ';
if (eregi('z', $string)) {
echo "'$string' contains a 'z' or 'Z'!";
}
?>

EREG — REGULAR EXPRESSION MATCH

 ereg — Regular expression match

Searches a string for matches to the regular expression given in pattern in a case-sensitive way.

int ereg ( string $pattern , string $string [, array &$regs ] )
pattern – Case sensitive regular expression.
string- The input string.

regs -If matches are found for parenthesized substrings of pattern and the function is called with the third argument regs , the matches will be stored in the elements of the array regs .$regs[1] will contain the substring which starts at the first left parenthesis; $regs[2] will contain the substring starting at the second, and so on. $regs[0] will contain a copy of the complete string matched.

Return Values

Returns the length of the matched string if a match for pattern was found in string , or FALSE if no matches were found or an error occurred.
If the optional parameter regs was not passed or the length of the matched string is 0, this function returns 1.

Example#1
The following code snippet takes a date in ISO format (YYYY-MM-DD) and prints it in DD.MM.YYYY format:

if (ereg ("([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})", $date, $regs)) {
echo "$regs[3].$regs[2].$regs[1]";
} else {
echo "Invalid date format: $date";
}
?>
Example#2
Check if string only contains letters and numbers.
if (ereg("[^A-Za-z0-9]", $string)) {
echo "Error: String can only contain letters and numbers!";
exit();
}
Example#3
This is intended to validate fully specified (international) phone numbers without forcing the user to use the full international format and giving them maximum reasonable flexibility including an optional extension number.
Allows numbers plus any of: space():.ext,+-
Example: +44(0)113 249-0442 ext:1234
The code matches any combination of the allowed character set.
php
$phoneNumber="+44(0)113 249-0442 ext:1234";
$regex="[0-9 ():.ext,+-]{".strlen($phoneNumber)."}";
if(ereg($regex,$phoneNumber)){
echo “ok”;
}else {
echo “invalid phone number”;
}
?>

PHP REGEX CHEAT SHEET

Special Sequences
  • \w – Any “word” character (a-z 0-9 _)
  • \W – Any non “word” character
  • \s – Whitespace (space, tab CRLF)
  • \S – Any non whitepsace character
  • \d – Digits (0-9)
  • \D – Any non digit character
  • . – (Period) – Any character except newline

Meta Characters

  • ^ – Start of subject (or line in multiline mode)
  • $ – End of subject (or line in multiline mode)
  • [ – Start character class definition
  • ] – End character class definition
  • | – Alternates, eg (a|b) matches a or b
  • ( – Start subpattern
  • ) – End subpattern
  • \ – Escape character

Quantifiers

  • n* – Zero or more of n
  • n+ – One or more of n
  • n ? – Zero or one occurrences of n
  • {n} – n occurrences exactly
  • {n,} – At least n occurrences
  • {,m} – At most m occurrences
  • {n,m} – Between n and m occurrences (inclusive)

Pattern Modifiers

  • i – Case Insensitive
  • m – Multiline mode – ^ and $ match start and end of lines
  • s – Dotall – . class includes newline
  • x – Extended– comments and whitespace
  • e – preg_replace only – enables evaluation of replacement as PHP code
  • S – Extra analysis of pattern
  • U – Pattern is ungreedy
  • u – Pattern is treated as UTF-8

Point based assertions

  • \b – Word boundary
  • \B – Not a word boundary
  • \A – Start of subject
  • \Z – End of subject or newline at end
  • \z – End of subject
  • \G – First matching position in subject

Assertions

  • (?=) – Positive look ahead assertion foo(?=bar) matches foo when followed by bar
  • (?!) – Negative look ahead assertion foo(?!bar) matches foo when not followed by bar
  • (?<=) – Positive look behind assertion (?<=foo)bar matches bar when preceded by foo
  • (? – Negative look behind assertion (?
  • (?>) – Once-only subpatterns (?>\d+)bar Performance enhancing when bar not present
  • (?(x)) – Conditional subpatterns
  • (?(3)foo|fu)bar – Matches foo if 3rd subpattern has matched, fu if not
  • (?#) – Comment (?# Pattern does x y or z)

HOW TO NEGATE A CHARACTER CLASS

Using the ^ (‘carrot’) to negate a character class:

Metacharacters [] denote a “character class”. Within a ‘character class’ you can have what is called a ‘range’, which I have been using in the examples above. Lets take a look:
^[c-f]$
This means “the beginning of the line followed by a lowercase letter between c and f (c,d,e, or f), followed by the end of the line.”

The carrot character, ^, can also be used inside a character class to “negate” it. negating a character class basically means “not this”. Lets take a look..
Eg: ^(.*)[^a-z5-9]*$
This basically means “the beginning of the line followed by zero or more of any character followed by zero or more of any digit or character that is NOT lowercase or 5-9.”.