PCRE Functions

preg_replace

(PHP 4, PHP 5)

preg_replace - Perform a regular expression search and replace

Description

mixed preg_replace ( mixed $pattern , mixed $replacement , mixed $subject [, int $limit = -1 [, int &$count ]] )

Searches subject for matches to pattern and replaces them with replacement.

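Parameters

pattern

The pattern to search for. It can be either a string or an array of strings.

Several PCRE modifiers are also available, including 'e' (PREG_REPLACE_EVAL), which is specific to this function.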

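replacement

The string or array of strings to replace with. If this parameter is a string and pattern is an array, all patterns are replaced by that string. If both pattern and replacement are arrays, each pattern is replaced by the corresponding element of replacement. If replacement has fewer elements than pattern, the extra patterns are replaced by an empty string.

replacement may contain backreferences of the form \\n or (available since PHP 4.0.4) $n, the latter being the preferred syntax. Every such reference is replaced by the text captured by the n-th capturing subpattern. n can be from 0 to 99, and \\0 or $0 refers to the text matched by the whole pattern. Capturing subpatterns are numbered by counting their opening parentheses from left to right, starting from 1. To use a literal backslash in replacement, four must be written ("\\\\"; translator's note: this is first a PHP string, which after unescaping holds two backslashes, and the regular expression engine then treats those two as one literal backslash).

When working with a replacement in which a backreference is immediately followed by another digit (i.e. a literal digit placed right after a matched pattern), the \\1 notation cannot be used for the backreference. \\11, for example, would confuse preg_replace(), since it cannot tell whether you want the \\1 backreference followed by a literal 1, or the \\11 backreference followed by nothing. In this case the solution is to use \${1}1. This creates an isolated $1 backreference, leaving the 1 as a literal.

When using the e modifier, this function escapes some characters (namely ', ", \ and NULL) in the strings that replace the backreferences. This is done so that backreferences used inside single or double quotes do not cause syntax errors (e.g. 'strlen(\'$1\')+strlen("$2")'). Make sure the resulting string follows PHP string syntax and is valid input for eval, because after the substitution is done the engine evaluates the resulting string as PHP code and uses the return value as the final replacement string.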

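subject

The string or array of strings to search and replace.

If subject is an array, the search and replace is performed on every element of subject, and the return value is an array as well.

limit

The maximum number of replacements for each pattern in each subject string. Defaults to -1 (no limit).

count

If specified, this variable is filled with the number of replacements done.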
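The parameter rules above can be sketched with a few small calls. This is an illustrative sketch only; the sample strings and variable names are made up for the example and are not part of the manual.

```php
<?php
// Two patterns but only one replacement: the second pattern
// has no counterpart and is therefore replaced by an empty string.
$paired = preg_replace(
    array('/quick/', '/brown /'),
    array('slow'),
    'The quick brown fox'
);

// An array subject: every element is processed and an array is returned.
$fromArray = preg_replace('/\d+/', '#', array('a1', 'b22'));

// limit = 1 stops after the first replacement; $count reports how many were done.
$count = 0;
$limited = preg_replace('/o/', '0', 'foo boo', 1, $count);

echo $paired, "\n";                 // The slow fox
echo implode(',', $fromArray), "\n"; // a#,b#
echo $limited, ' ', $count, "\n";   // f0o boo 1
```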

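Return Values

preg_replace() returns an array if subject is an array, or a string otherwise.

If matches are found, the new subject is returned; otherwise subject is returned unchanged. If an error occurs, NULL is returned.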
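Because NULL signals an error, callers may want to check for it explicitly. The following is a minimal sketch, assuming PHP 5.2.0 or later (for preg_last_error()); the helper name replace_or_fail is hypothetical.

```php
<?php
// Hypothetical wrapper: turns the NULL error return into an exception.
function replace_or_fail($pattern, $replacement, $subject)
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        throw new RuntimeException(
            'preg_replace failed, PCRE error code ' . preg_last_error()
        );
    }
    return $result;
}

// Normal case: the replaced string comes back.
$ok = replace_or_fail('/\d+/', '#', 'room 101');
echo $ok, "\n"; // room #

// Error case: "\xFF" is not valid UTF-8, so matching with the u modifier fails.
$failed = false;
try {
    replace_or_fail('/./u', 'x', "\xFF");
} catch (RuntimeException $e) {
    $failed = true;
}
```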

Changelog

Version Description
5.1.0 Added the count parameter.
4.0.4 Added the '$n' form for the replacement parameter.
4.0.2 Added the limit parameter.

Examples

Example #1 Using backreferences followed by numeric literals

<?php
$string = 'April 15, 2003';
$pattern = '/(\w+) (\d+), (\d+)/i';
$replacement = '${1}1,$3';
echo preg_replace($pattern, $replacement, $string);
?>

The above example will output:

April1,2003

Example #2 Using indexed arrays with preg_replace()

<?php
$string = 'The quick brown fox jumped over the lazy dog.';
$patterns = array();
$patterns[0] = '/quick/';
$patterns[1] = '/brown/';
$patterns[2] = '/fox/';
$replacements = array();
$replacements[2] = 'bear';
$replacements[1] = 'black';
$replacements[0] = 'slow';
echo preg_replace($patterns, $replacements, $string);
?>

The above example will output:

The bear black slow jumped over the lazy dog.

By sorting patterns and replacements by key, we get the expected result.

<?php
ksort($patterns);
ksort($replacements);
echo preg_replace($patterns, $replacements, $string);
?>

The above example will output:

The slow black bear jumped over the lazy dog.

Example #3 Replacing several values

<?php
$patterns = array ('/(19|20)(\d{2})-(\d{1,2})-(\d{1,2})/',
                   '/^\s*{(\w+)}\s*=/');
$replace = array ('\3/\4/\1\2', '$\1 =');
echo preg_replace($patterns, $replace, '{startDate} = 1999-5-27');
?>

The above example will output:

$startDate = 5/27/1999

Example #4 Using the 'e' modifier

<?php
preg_replace("/(<\/?)(\w+)([^>]*>)/e",
             "'\\1'.strtoupper('\\2').'\\3'",
             $html_body);
?>

This converts all HTML tag names in the input text to uppercase.
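The e modifier was deprecated in PHP 5.5.0 and removed in PHP 7.0.0. A rough equivalent of the example above using preg_replace_callback() might look like the sketch below; it assumes PHP 5.3 or later for the anonymous function, and the sample $html_body value is made up for illustration.

```php
<?php
// Sample input, invented for this sketch.
$html_body = '<p>Hello <b>world</b></p>';

// The callback receives the match array: $m[1] is "<" or "</",
// $m[2] is the tag name, $m[3] is the rest of the tag up to ">".
$upper = preg_replace_callback(
    '/(<\/?)(\w+)([^>]*>)/',
    function ($m) {
        return $m[1] . strtoupper($m[2]) . $m[3];
    },
    $html_body
);

echo $upper, "\n"; // <P>Hello <B>world</B></P>
```

Since the callback's return value is inserted as-is, no code is evaluated from the subject string, which avoids the quoting pitfalls described for the e modifier.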

Example #5 Strip whitespace

This example strips excess whitespace from a string.

<?php
$str = 'foo   o';
$str = preg_replace('/\s\s+/', ' ', $str);
// This will be 'foo o' now
echo $str;
?>

Example #6 Using the count parameter

<?php
$count = 0;

echo preg_replace(array('/\d/', '/\s/'), '*', 'xp 4 to', -1, $count);
echo $count; //3
?>

The above example will output:

xp***to
3

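Notes

Note:

When pattern and replacement are both arrays, the keys are processed in the order they appear in the array. This is not necessarily the same as the numerical index order. If you rely on indexes to pair each pattern with its replacement, perform a ksort() on each array before calling preg_replace().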


User comments:

spamthishard at wtriple dot com (2013-06-12 08:02:19)

If you want to replace only the n-th occurrence of $pattern, you can use this function:

<?php

function preg_replace_nth($pattern, $replacement, $subject, $nth=1) {
    return preg_replace_callback($pattern,
        function($found) use (&$pattern, &$replacement, &$nth) {
                $nth--;
                if ($nth==0) return preg_replace($pattern, $replacement, reset($found) );
                return reset($found);
        }, $subject,$nth  );
}

echo preg_replace_nth("/(\w+)\|/", '${1} is the 4th|', "|aa|b|cc|dd|e|ff|gg|kkk|", 4);

?>

this outputs |aa|b|cc|dd is the 4th|e|ff|gg|kkk| 
backreferences are accepted in $replacement

jas at rephunter dot net (2013-06-06 17:00:45)

It is recommended that str_replace should be used instead of preg_replace when regex is not needed. In particular, at http://php.net/manual/en/function.str-replace.php it says "If you don't need fancy replacing rules (like regular expressions), you should always use this function instead of preg_replace()."

While this is usually true, I have found a significant exception to this guideline and benchmarked to show that str_replace is in fact slower in the case when the pattern to be replaced is an array. In particular, when the number of elements in the pattern gets to be around 7, then the preg_replace is faster. Of course this would depend on the specifics of the regex.

In my case I had pattern arrays with about 40 elements. This becomes far slower with str_replace when there is a simple regex that will do the job.

Below is the benchmark with some examples of arrays for patterns and an equivalent regex.

<?php

/**
 * Title:    Test Harness
 * Author:    **********
 * Date:    27-May-13
 * Project:    *********
 * Purpose:    Test timer comparing str_replace and preg_replace
 *
 * Results:
 *
 * 1. preg_replace is a lot faster!
 * empty loop takes 0.12380504608154 microseconds.
 * Array size: 34
 *  *** str_replace:  6.0163598060608 microseconds.
 *  *** preg_replace: 2.1114869117737 microseconds.
 *
 * 2. str_replace is faster:
 * empty loop takes 0.11837291717529 microseconds.
 * Array size: 3
 *  *** str_replace:  1.7525472640991 microseconds.
 *  *** preg_replace: 2.0717389583588 microseconds.
 *
 * 3. preg_replace is faster:
 * empty loop takes 0.11982989311218 microseconds.
 * Array size: 10
 *  *** str_replace:  2.6692891120911 microseconds.
 *  *** preg_replace: 2.2716360092163 microseconds.
 *
 * 4. about the same: 6 element array is breakeven point to switch to preg_replace
 * empty loop takes 0.12036299705505 microseconds.
 * Array size: 6
 *  *** str_replace:  2.0874700546265 microseconds.
 *  *** preg_replace: 2.1840009689331 microseconds.
 *
 */

$iterations = 1000000;
$str1 = 'this is a - test';

// empty loop
$begin = microtime(true);
for ($i = 0; $i < $iterations; $i++)
{
}
$end = microtime(true);
// Note: microtime(true) differences are in seconds, despite the label printed below.
$empty_loop_time = $end - $begin;
echo "empty loop takes $empty_loop_time microseconds.\n";

// test1 loop
// alternate array variations
//$aStr = array('\r\n','\r','\n','%','.','(',')',':',';','&',"\x07","\x15",'!','\'','"',
//'#','^','*','_','=','+','\\','?','|','<','>','{','}','’','`','~','“','?','$');
//$aStr = array('\r\n','\r','\n');
//$aStr = array('0','1','2','3','4','5','6','7','8','9');
$aStr = array('0','1','2','3','4','5','6');
$cnt = count($aStr);
echo 'Array size: ' . $cnt . "\n";
$begin = microtime(true);
for ($i = 0; $i < $iterations; $i++)
{
    $str2 = str_replace($aStr,'',$str1);
}
$end = microtime(true);
$test1_loop_time = $end - $begin;

echo ' *** str_replace:  ';
// Per-iteration time in microseconds, with the empty-loop overhead subtracted
$test_time = ($test1_loop_time - $empty_loop_time) * 1000000 / $iterations;
echo "$test_time microseconds.\n";

// test2 loop
$begin = microtime(true);
for ($i = 0; $i < $iterations; $i++)
{
    $str2 = preg_replace('/[^a-zA-Z\s]/','',$str1);
}
$end = microtime(true);
$test2_loop_time = $end - $begin;

echo ' *** preg_replace: ';
$test_time = ($test2_loop_time - $empty_loop_time) * 1000000 / $iterations;
echo "$test_time microseconds.\n";

?>

Dustin (2013-05-16 11:53:47)

Matching substrings where the match can exist at the end of the string was non-intuitive to me.
I found this because:
strtotime() interprets 'mon' as 'Monday', but Postgres uses interval types that return short names by default, e.g. interval '1 month' returns as '1 mon'.
I used something like this:
$str = "mon month monday Mon Monday Month MONTH MON";
$strMonth = preg_replace('~(mon)([^\w]|$)~i', '$1th$2', $str);
echo "$str\n$strMonth\n";
//to output:
mon month monday Mon Monday Month MONTH MON
month month monday Month Monday Month MONTH MONth
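A word-boundary assertion is another way to handle a match at the end of the string. This is only a sketch of an alternative, not the commenter's code: \b matches at the string end as well as before a non-word character, so no alternation with $ is needed. Unlike the original pattern, it also requires a boundary before "mon", so a word such as "lemon" is left alone.

```php
<?php
$str = "mon month monday Mon Monday Month MONTH MON";

// \bmon\b matches only a standalone "mon" (case-insensitive);
// $0 re-inserts the matched text so its original case is kept.
$strMonth = preg_replace('~\bmon\b~i', '$0th', $str);

echo "$str\n$strMonth\n";
// mon month monday Mon Monday Month MONTH MON
// month month monday Month Monday Month MONTH MONth
```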

nik at rolls dot cc (2013-03-17 22:14:56)

To split Pascal/CamelCase into Title Case (for example, converting descriptive class names for use in human-readable frontends), you can use the below function:

<?php
function expandCamelCase($source) {
  return preg_replace('/(?<!^)([A-Z][a-z]|(?<=[a-z])[^a-z]|(?<=[A-Z])[0-9_])/', ' $1', $source);
}
?>

Before:
  ExpandCamelCaseAPIDescriptorPHP5_3_4Version3_21Beta
After:
  Expand Camel Case API Descriptor PHP 5_3_4 Version 3_21 Beta

cincodenada at gmail dot dot dot com (2012-10-30 22:43:42)

There seems to be some confusion over how greediness works.  For those familiar with Regular Expressions in other languages, particularly Perl: it works like you would expect, and as documented.  Greedy by default, un-greedy if you follow a quantifier with a question mark.

There is a PHP/PCRE-specific U pattern modifier that flips the greediness, so that quantifiers are by default un-greedy, and become greedy if you follow the quantifier with a question mark: http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php

To make things clear, a series of examples:

<?php

$preview = "a bunch of stuff <code>this that</code> and more stuff <code>with a second code block</code> then extra at the end";

$preview_default = preg_replace('/<code>(.*)<\/code>/is', "<code class=\"prettyprint\">$1</code>", $preview);
$preview_manually_ungreedy = preg_replace('/<code>(.*?)<\/code>/is', "<code class=\"prettyprint\">$1</code>", $preview);

$preview_U_default = preg_replace('/<code>(.*)<\/code>/isU', "<code class=\"prettyprint\">$1</code>", $preview);
$preview_U_manually_greedy = preg_replace('/<code>(.*?)<\/code>/isU', "<code class=\"prettyprint\">$1</code>", $preview);

echo "Default, no ?: $preview_default\n";
echo "Default, with ?: $preview_manually_ungreedy\n";
echo "U flag, no ?: $preview_U_default\n";
echo "U flag, with ?: $preview_U_manually_greedy\n";

?>

Results in this:

Default, no ?: a bunch of stuff <code class="prettyprint">this that</code> and more stuff <code>with a second code block</code> then extra at the end
Default, with ?: a bunch of stuff <code class="prettyprint">this that</code> and more stuff <code class="prettyprint">with a second code block</code> then extra at the end
U flag, no ?: a bunch of stuff <code class="prettyprint">this that</code> and more stuff <code class="prettyprint">with a second code block</code> then extra at the end
U flag, with ?: a bunch of stuff <code class="prettyprint">this that</code> and more stuff <code>with a second code block</code> then extra at the end

As expected: greedy by default, ? inverts it to ungreedy.  With the U flag, un-greedy by default, ? makes it greedy.

henke at henke37 dot cjb dot net (2012-09-17 01:36:13)

Warning: not all strings are from a regular language, some are from context free grammars.
Here is a pair of strings that are impossible to correctly parse with regular expressions:
a*(b+(c*d)-e)/(f-(g*h)+i)-j
a[i]b[i]c[/i]d[/i]e[i]f[i]g[/i]h[/i]
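For nested input like the first string above, a small hand-written scanner is often easier to reason about than a pattern. The sketch below only checks that parentheses are balanced by tracking nesting depth; the function name is_balanced is hypothetical and the approach is an illustration, not a full parser.

```php
<?php
// Hypothetical helper: returns true when every "(" has a matching ")".
function is_balanced($expr)
{
    $depth = 0;
    $len = strlen($expr);
    for ($i = 0; $i < $len; $i++) {
        if ($expr[$i] === '(') {
            $depth++;
        } elseif ($expr[$i] === ')') {
            $depth--;
            if ($depth < 0) {
                return false; // a ")" appeared before its "("
            }
        }
    }
    return $depth === 0;
}

var_dump(is_balanced('a*(b+(c*d)-e)/(f-(g*h)+i)-j')); // bool(true)
var_dump(is_balanced('a*(b+(c*d-e'));                 // bool(false)
```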

A (2012-08-01 23:24:14)

preg seems not greedy by default
preg_replace('/&lt;code&gt;(.*?)&lt;\/code&gt;/is',
"<code class='prettyprint'>$1</code>",
$preview);
if you have:
"a bunch of stuff <code>this that</code> and more stuff <code>with a second code block</code> then extra at the end"
and do a htmlentities() on the quoted string,
then the above will replace every instance of the stuff between the CODE tags with the modified code tag

zzapper (2012-07-10 17:07:32)

yes you can use different pattern delimiters useful when working on an Url

<?php
$logo = preg_replace('#\\.\\./images#','/images',$logo);
?>

webmaster at antoinebouchard dot net (2012-06-22 14:37:18)

It may seem useless, but the font tag in Internet Explorer won't recognize compressed hexa values. This is a simple function to uncompress hexa values in the font tag

<?php
function colorfix($text) {
    return preg_replace('/"#([a-f0-9])([a-f0-9])([a-f0-9])"/i', '"#$1$1$2$2$3$3"', $text);
}
?>

gjarrige at six-axe dot fr (2012-03-01 09:55:21)

to drop some "dangerous" characters that cause a crash in DB2 databases, you can use that technique :
preg_replace('/[^\x09\x0A\x0D\x20-\x7F\xC0-\xFF]/', '', $str);
I found this very effective technique in a very good page from Sitepoint (complete the sitepoint.com url with the string "character-encodings-and-input" to find the page).

Ray dot Paseur at SometimesUsesGmail dot com (2012-02-13 15:48:18)

Please see Example #5 Strip whitespace. This works as designed, but if you are using Windows, it may not work as expected. The potential "gotcha" is the CR/LF line endings. On a Unix system, where the line ending is a single character, that regex pattern will preserve line endings. On Windows, where the line ending is two whitespace characters, it may strip line endings.
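The effect can be reproduced on any platform by putting an explicit "\r\n" in the subject. The second pattern is one possible workaround (an assumption of this sketch, not part of the manual's example): it collapses only spaces and tabs, so line endings are never touched.

```php
<?php
$text = "foo   bar\r\nbaz";

// "\r\n" is two whitespace characters, so /\s\s+/ replaces it with a space.
$collapsed = preg_replace('/\s\s+/', ' ', $text);

// Restricting the class to spaces and tabs leaves the CR/LF pair intact.
$kept = preg_replace('/[ \t]{2,}/', ' ', $text);

var_dump($collapsed === "foo bar baz");     // bool(true)
var_dump($kept === "foo bar\r\nbaz");       // bool(true)
```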

hvishnu999 at gmail dot com (2012-01-08 06:25:05)

To convert a string to an SEO-friendly form, do this:

<?php
$realname = "This is the string to be made SEO friendly!";

$seoname = preg_replace('/\%/',' percentage',$realname);
$seoname = preg_replace('/\@/',' at ',$seoname);
$seoname = preg_replace('/\&/',' and ',$seoname);
$seoname = preg_replace('/\s[\s]+/','-',$seoname);    // Strip off multiple spaces
$seoname = preg_replace('/[\s\W]+/','-',$seoname);    // Strip off spaces and non-alpha-numeric
$seoname = preg_replace('/^[\-]+/','',$seoname); // Strip off the starting hyphens
$seoname = preg_replace('/[\-]+$/','',$seoname); // Strip off the ending hyphens
$seoname = strtolower($seoname);

echo $seoname;
?>

This will print: this-is-the-string-to-be-made-seo-friendly

support ~ at ~ mbnad ~ dot ~ com (2012-01-02 17:47:16)

To collapse runs of repeated characters inside words, and also remove runs of extra spaces:

<?php
$message_body = 'HHHHHEEEEELLLLLOOOOO             IAM COOOOOOOOOOOOOOOOOOOL      !!!!!!!!!!';
echo "\n".$message_body."\n";
$message_body = preg_replace('~(.?)\1{3,}~', '$1', $message_body);
$message_body = preg_replace('~\s+~', ' ', trim($message_body));
echo "\n".$message_body."\n\n";
?>

denis_truffaut a t hotmail d o t com (2011-12-23 17:11:06)

If you want to match characters from any script (European, Russian, Chinese, Japanese, Korean or whatever), just:
- use mb_internal_encoding('UTF-8');
- use preg_replace('`...`u', '...', $string) with the u (unicode) modifier
For further information, the complete list of preg_* modifiers can be found at:
http://php.net/manual/en/reference.pcre.pattern.modifiers.php
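What the u modifier changes can be seen with a single two-byte UTF-8 character. This is a minimal sketch; the character is written with byte escapes so the result does not depend on the encoding of the source file.

```php
<?php
$e_acute = "\xC3\xA9"; // "é" encoded as UTF-8 (two bytes)

// Without u, "." matches one byte at a time, so two replacements happen.
$bytes = preg_replace('/./', '*', $e_acute);

// With u, pattern and subject are treated as UTF-8, so "." matches the whole character.
$chars = preg_replace('/./u', '*', $e_acute);

echo $bytes, "\n"; // **
echo $chars, "\n"; // *
```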

akarmenia at gmail dot com (2011-10-22 23:47:14)

[Editor's note: in this case it would be wise to rely on the preg_quote() function instead which was added for this specific purpose]

If your replacement string has a dollar sign or a backslash. it may turn into a backreference accidentally! This will fix it.

I want to replace 'text' with '$12345' but this becomes a backreference to $12 (which doesn't exist) and then it prints the remaining '34'. The function down below will return a string that escapes the backreferences.

OUTPUT:
string(8) "some 345"
string(11) "some \12345"
string(8) "some 345"
string(11) "some $12345"

<?php

$a = 'some text';

// Either of these will backreference and fail
$b1 = '\12345'; // Should be '\\12345' to avoid backreference
$b2 = '$12345'; // Should be '\$12345' to avoid backreference

$d = array($b1, $b2);

foreach ($d as $b) {
    $result1 = preg_replace('#(text)#', $b, $a); // Fails
    var_dump($result1);
    $result2 = preg_replace('#(text)#', preg_escape_back($b), $a); // Succeeds
    var_dump($result2);
}

// Escape backreferences from string for use with regex
function preg_escape_back($string) {
    // Replace $ with \$ and \ with \\
    $string = preg_replace('#(?<!\\\\)(\\$|\\\\)#', '\\\\$1', $string);
    return $string;
}

?>
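Another way to sidestep accidental backreferences, sketched here as an alternative rather than as the commenter's method, is preg_replace_callback(): the string returned by the callback is inserted literally, so $ and \ need no escaping. The sketch assumes PHP 5.3 or later for the closure.

```php
<?php
$literal = '$12345'; // replacement text that must appear exactly as written

// The callback's return value is not scanned for $n or \n references.
$result = preg_replace_callback(
    '#text#',
    function ($m) use ($literal) {
        return $literal;
    },
    'some text'
);

var_dump($result); // string(11) "some $12345"
```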

randall dot reynolds at rightnow dot com (2011-09-27 17:01:27)

The instructions say, use \\\\ (four backslashes) to represent a backslash. You can shorthand this and use \\\ (three backslashes) to represent a backslash, because of the way the parsers read it.
It appears the double escaping is required because the string is parsed twice, once by PHP and once by the regular expression generator. Thus it is expected that the PHP parser turns both instances above into \\ (two backslashes) for the PCRE parser.

erik dot stetina at gmail dot com (2011-09-27 03:25:55)

simple function to remove comments from string

<?php
function remove_comments( & $string )
{
  $string = preg_replace("%(#|;|(//)).*%","",$string);
  $string = preg_replace("%/\*(?:(?!\*/).)*\*/%s","",$string); // google for negative lookahead
  return $string;
}
?>

USAGE:
<?php
$config = file_get_contents("config.cfg");
print "before:".$config;
remove_comments($config);
print "after:".$config;
?>

OUTPUT:
before:
/*
 *  this is config file
 */
; logdir
LOGDIR ./log/
// logfile
LOGFILE main.log
# loglevel
LOGLEVEL 3
after:

LOGDIR ./log/

LOGFILE main.log

LOGLEVEL 3

kosalasl at gmail dot com (2011-08-22 09:03:20)

I wrote a small function that reformats a MySQL date string using a date() format string. preg_replace really helped me write this tiny piece of code:

<?php
function mysql2formatDate($strn,$outformat='n/j/Y'){
    return preg_replace("/(\d{4})-(\d{2})-(\d{2})/e","Date('$outformat',strtotime('$0'))",$strn);
}
?>

timitheenchanter (2011-07-28 13:34:54)

If you have issues where preg_replace returns an empty string, please take a look at these two ini parameters:
pcre.backtrack_limit
pcre.recursion_limit
The default is set to 100K. If your buffer is larger than this, look to increase these two values.

sreekanth at outsource-online dot net (2011-07-19 01:29:27)

if your intention to code and decode mod_rewrite urls and handle it with php and mysql ,this should work
to convert to url
$url = preg_replace('/[^A-Za-z0-9_-]+/', '-', $string);
And to check in mysql with the url value,use the same expression discounting '-'.
first replace the url value with php using preg_replace and use with mysql REGEXP
$sql = "select * from table where fieldname_to_check REGEXP '".preg_replace("/-+/",'[^A-Za-z0-9_]+',$url)."'"

sjoerd.linders (2011-07-04 03:36:56)

If you want to clean a string down to pure ASCII [0-126], use the following regular expression:

<?php
$AsciiData = preg_replace( '/[\x7f-\xff]/', '', $DurtyData );
?>

anyvie at devlibre dot fr (2011-06-17 02:11:02)

A variable can handle a huge quantity of data but preg_replace can't.

Example :
<?php
$url = "ANY URL WITH LOTS OF DATA";

// We get all the data into $data
$data = file_get_contents($url);

// We just want to keep the content of <head>
$head = preg_replace("#(.*)<head>(.*?)</head>(.*)#is", '$2', $data);
?>

$head can have the desired content, or be empty, depends on the length of $data.

For this application, just add :
$data = substr($data, 0, 4096);
before using preg_replace, and it will work fine.
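For extracting a single section, preg_match() with one capturing group may be a lighter option, since the pattern no longer has to consume the whole document with leading and trailing (.*) groups. This is a sketch with a made-up $data value; it does not remove PCRE's limits, it only gives the engine less work.

```php
<?php
// Sample document, invented for this sketch.
$data = '<html><head><title>Demo</title></head><body>text</body></html>';

// Capture only what lies between <head> and </head>.
if (preg_match('#<head>(.*?)</head>#is', $data, $m)) {
    $head = $m[1];
} else {
    $head = ''; // no <head> section found (or the match failed)
}

echo $head, "\n"; // <title>Demo</title>
```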

someuser at dot dot com (2011-06-08 16:10:35)

Replacing line numbers, with a replacement limit per line.

This is the solution that worked for me.
I have a file listing tasks, each starting with a number. Only that leading number should be removed, because the text that follows contains plenty of other numbers that must be left alone.

56 Patient A of 46 years suffering ... ...
57 Newborn of 26 weeks was ...
58 Jane, having age 18 years recollects onsets of ...
...
587 Patient of 70 years ...

etc.

<?php
// Array obtained from file
$array = file($file, true);

// Walk the array with a foreach loop
foreach($array as $value)
{
    //    Take away numbers 100-999
    //    Starting from biggest
    //
    //    %            Delimiter
    //    ^            Make match from beginning of line
    //    [0-9]        Range of numbers
    //    {3}        Repetition count for the digit range (for three-digit numbers)
    //
    if(preg_match('%^[0-9]{3}%', $value))
    {
        // Re-assign to value its modified copy
        $value = preg_replace('%^[0-9]{3}%', '-HERE WAS XXX NUMBER-', $value, 1);
    }

    // Take away numbers 10-99
    elseif(preg_match('%^[0-9]{2}%', $value)) {
        $value = preg_replace('%^[0-9]{2}%', '-HERE WAS XX NUMBER-', $value, 1);
    }

    // Take away numbers 0-9
    elseif(preg_match('%^[0-9]%', $value)) {
        $value = preg_replace('%^[0-9]%', '-HERE WAS X NUMBER-', $value, 1);
    }

    // Build array back
    $arr[] = array($value);
}
?>

craiga at craiga dot id dot au (2011-05-15 19:02:41)

If there's a chance your replacement text contains any strings such as "$0.95", you'll need to escape those $n backreferences:

<?php
function escape_backreference($x)
{
    return preg_replace('/\$(\d)/', '\\\$$1', $x);
}
?>

Sam (2011-05-11 07:51:24)

To remove extra spaces, I prefer
<?php
$str = preg_replace("'\s+'", ' ', $str);
?>
as this will convert ALL whitespace to a single space, including lone \t and \n

php at example dot com (2011-04-21 09:07:56)

Note the strange behavior of preg_replace when using backslashes in your $replacement to avoid SQL injection:

<?php
// string '\\\"' (length=4) - correct result
str_replace("foo", mysql_real_escape_string('\"'), 'foo');

// string '\\"' (length=3) - one slash missing
preg_replace("~foo~", mysql_real_escape_string('\"'), 'foo');

// string '\\\\"' (length=5) - escaped backslashes, now have too much
preg_replace("~foo~", addslashes(mysql_real_escape_string('\"')), 'foo');

// string '\.\.\"' (length=6) - '\\.\\.\"' has the same backslash count as mysql_real_escape_string('\"') but after replacement, they are different
preg_replace("~foo~", '\\.\\.\"', 'foo');
?>

me at perochak dot com (2011-02-22 21:57:34)

If you would like to remove a tag along with the text inside it, use the following code.

<?php
preg_replace('/(<tag>.+?)+(<\/tag>)/i', '', $string);
?>

Example:
<?php $string = '<span class="normalprice">55 PKR</span>'; ?>

<?php
$string = preg_replace('/(<span class="normalprice">.+?)+(<\/span>)/i', '', $string);
?>

This results in an empty string.

<?php
$string = 'My String <span class="normalprice">55 PKR</span>';

$string = preg_replace('/(<span class="normalprice">.+?)+(<\/span>)/i', '', $string);
?>

This results in "My String " (with a trailing space).

nospam at probackup dot nl (2011-01-19 05:03:35)

Warning: a common mistake when trying to remove all characters except numbers and letters from a string is to use code with a regex similar to preg_replace('[^A-Za-z0-9_]', '', ...). Without slashes, the square brackets themselves are taken as the pattern delimiters, so the pattern becomes the literal text A-Za-z0-9_ anchored at the start of the string, and the input comes back unchanged:
echo preg_replace('[^A-Za-z0-9_]', '', 'D"usseldorfer H"auptstrasse')
D"usseldorfer H"auptstrasse
It is important not to forget the leading and trailing forward slash in the regex:
echo preg_replace('/[^A-Za-z0-9_]/', '', 'D"usseldorfer H"auptstrasse')
Dusseldorfer Hauptstrasse
PS An alternative is to use preg_replace('/\W/', '', $t) to keep all alphanumeric characters including underscores.

highlighter (2011-01-12 08:37:38)

Accent independent highlighting of text, UTF-8 safe:
<?php
// Note: the "?" characters in the classes and test strings below appear to
// stand in for accented letters lost to an encoding error in this copy.
function prepare_search_term($str,$delim='#') {
    $search = preg_quote($str,$delim);

    $search = preg_replace('/[aàá?????]/iu', '[aàá?????]', $search);
    $search = preg_replace('/[eèéê?]/iu', '[eèéê?]', $search);
    $search = preg_replace('/[iìí??]/iu', '[iìí??]', $search);
    $search = preg_replace('/[oòó????]/iu', '[oòó????]', $search);
    $search = preg_replace('/[uùú?ü]/iu', '[uùú?ü]', $search);
    // add more characters...

    return $search;
}

function highlight($searchtext, $text) {
    $search = prepare_search_term($searchtext);
    return preg_replace('#' . $search . '#iu', '<span style="background-color:red">$0</span>', $text);
}

$testtext = 'cafe cáfé càfè CAFE C?F? C?F? càfé cáfè C?F? C?F?';
echo highlight('cafe',$testtext) . '<br />';
echo highlight('cáfé',$testtext) . '<br />';
echo highlight('CAFE',$testtext) . '<br />';
echo highlight('C?F?',$testtext) . '<br />';
?>

This function finds and highlights all words in $testtext with (e.g.) every single word in $testtext as input. Useful if you retrieved a text from a MySQL database and want to highlight the search term accent-independently.
The main trick is in "prepare_search_term": Every single character of the search pattern (1st parameter of preg_replace) in the search term is replaced by the whole string in the 2nd parameter, which then in function "highlight" acts as the search pattern for replacement there. Hope you find this useful.

easai (2010-12-21 10:40:45)

To extract hiragana, katakana, kanji portion of a Japanese text, explicity set the unicode range AND specify u modifier for the pattern.

<?php

echo "Hiragana -- ";
$pattern = '/[^\x{3040}-\x{309F}]+/u';
$s = preg_replace($pattern,"",$str);
echo $s."<br />";

echo "Katakana -- ";
$pattern = '/[^\x{30A0}-\x{30FF}]+/u';
$s = preg_replace($pattern,"",$str);
echo $s."<br />";

echo "Kanji -- ";
$pattern = '/[^\x{4E00}-\x{9FBF}]+/u';
$s = preg_replace($pattern,"",$str);
echo $s."<br />";

?>

Meisam Mulla (2010-12-09 21:38:38)

I needed a function that validated phone numbers provided by a user and if necessary arrange the numbers in this format 555-555-5555. I couldn't find anything so I just wrote one

<?php
function checkPhone($number)
{
    if(preg_match('^[0-9]{3}+-[0-9]{3}+-[0-9]{4}^', $number)){
        return $number;
    } else {
        $items = Array('/\ /', '/\+/', '/\-/', '/\./', '/\,/', '/\(/', '/\)/', '/[a-zA-Z]/');
        $clean = preg_replace($items, '', $number);
        return substr($clean, 0, 3).'-'.substr($clean, 3, 3).'-'.substr($clean, 6, 4);
    }
}
?>

ude dot mpco at wotsrabt dot maps-on (2010-12-08 10:30:24)

I find it useful to output HTML form names to the user from time to time while going through the $_GET or $_POST on a user's submission and output keys of the GET or POST array... the only problem being in the name attribute I follow common programming guidelines and have names like the following: eventDate, eventTime, userEmail, etc. Not great to just output to the user-- so I came up with this function. It just adds a space before any uppercase letter in the string.

<?php
function caseSwitchToSpaces( $stringVariableName )
{

$pattern = '/([A-Z])/';
$replacement = ' ${1}';

return preg_replace( $pattern, $replacement, $stringVariableName );
}

//ex.
echo( caseSwitchToSpaces( "helloWorld" ) );
?>

would output:

"hello World"

You could also do title-style casing to it if desired so the first word isn't lowercase.

contact at tothepointsolution dot com (2010-12-05 10:51:39)

Just something to have in mind...

When using preg_replace, keep in mind that $n sequences in the replacement are interpreted as backreferences. No warning is issued if the referenced group does not exist, even when error reporting is set to E_ALL ^ E_STRICT; the reference is simply replaced by an empty string.

As you can see from the example below:

<?php
$text = 'The price is PRICE ';

$lookFor = 'PRICE';
$replacement = '$100';

echo $replacement.'<br />';
//will display
//$100

echo preg_replace('/'.$lookFor.'/', $replacement, $text).'<br />';
//Will display
//The price is 0

echo str_replace($lookFor, $replacement, $text).'<br />';
//Will display
//The price is $100
?>

Terminux (dot) anonymous at gmail (2010-12-03 21:58:32)

This function will strip all the HTML-like content in a string.
I know you can find a lot of similar content on the web, but this one is simple, fast and robust. Don't simply use the built-in functions like strip_tags(), they dont work so good.

Careful however, this is not a correct validation of a string ; you should use additional functions like mysql_real_escape_string and filter_var, as well as custom tests before putting a submission into your database.

<?php 

$html = <<<END
<div id="function.preg-split" class="refentry"> Bonjour1 \t
<div class="refnamediv"> Bonjour2 \t
<h1 class="refname">Bonjour3 \t</h1>
<h1 class=""">Bonjour4 \t</h1>
<h1 class="*%1">Bonjour5 \t</h1>
<body>Bonjour6 \t<//body>>
</ body>Bonjour7 \t<////        body>>
<
a href="image.php" alt="trans" /        >
some leftover text...
     < DIV class=noCompliant style = "text-align:left;" >
... and some other ...
< dIv > < empty>  </ empty>
  <p> This is yet another text <br  >
     that wasn't <b>compliant</b> too... <br   />
     </p>
 <div class="noClass" > this one is better but we don't care anyway </div ><P>
    <input   type= "text"  name ='my "name' value  = "nothin really." readonly>
end of paragraph </p> </Div>   </div>   some trailing text 
END;

// This echoes correctly all the text that is not inside HTML tags
$html_reg = '/<+\s*\/*\s*([A-Z][A-Z0-9]*)\b[^>]*\/*\s*>+/i';
echo htmlentities( preg_replace( $html_reg, '', $html ) );

// This extracts only a small portion of the text
echo htmlentities(strip_tags($html));

?>

arthur at kuhrmeier dot name (2010-09-30 14:38:53)

I needed a function to convert short color notations (e.g. #3a0) to the long version (i.e. #33aa00). As I didn't find one, I immersed into unfamiliar territory: PCRE. But after some trials and errors, I managed to write a function that doubles all hex characters and drops any other.

<?php
function colorlong ($color) {
    return preg_replace ("/(?(?=[^0-9a-f])[^.]|(.))/i", '$1$1', $color);
}

//--- some examples ---
echo colorlong ('d08');  //--- dd0088 ---
echo colorlong ('bad beer');  //--- bbaaddbbeeee ---
echo colorlong ('nothing');  //--- yep, nothing ;-) ---
echo colorlong ('E6f');  //--- EE66ff: case is not changed ---

?>

rysiek at fwioo dot pl (2010-09-21 17:03:58)

doesn't seem like preg_replace() supports named subpatterns in the replacement - i.e. '(?P<name>\w+)' - which would be a godsend in a multitude of situations...
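Named subpatterns can still be used on the replacement side through preg_replace_callback(), because the match array passed to the callback carries the named keys alongside the numeric ones. A minimal sketch, assuming PHP 5.3 or later for the closure; the sample sentence is invented:

```php
<?php
// Reorder a date using group names instead of positions.
$result = preg_replace_callback(
    '/(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})/',
    function ($m) {
        return $m['day'] . '/' . $m['month'] . '/' . $m['year'];
    },
    'Released on 2010-09-21.'
);

echo $result, "\n"; // Released on 21/09/2010.
```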

AmigoJack (2010-07-10 06:13:54)

If you're using preg_replace() on huge strings you have to be aware of PREG's limitations. In fact, after each preg_xxx() function you should check if PREG internally failed (and by "failure" I don't mean regexp syntax errors).

On default PHP installations you will run into problems when using preg_xxx() functions on strings with a length of more than 100'000 characters. To workaround rare occasions you can use this:

<?php
    $iSet= 0;  // Count how many times we increase the limit
    while( $iSet< 10 ) {  // If the default limit is 100'000 characters the highest new limit will be 250'000 characters
        $sNewText= preg_replace( $sRegExpPattern, $sRegExpReplacement, $sVeryLongText );  // Try to use PREG

        if( preg_last_error()== PREG_BACKTRACK_LIMIT_ERROR ) {  // Only check on backtrack limit failure
            ini_set( 'pcre.backtrack_limit', (int)ini_get( 'pcre.backtrack_limit' )+ 15000 );  // Get current limit and increase
            $iSet++;  // Do not overkill the server
        } else {  // No fail
            $sVeryLongText= $sNewText;  // On failure $sNewText would be NULL
            break;  // Exit loop
        }
    }
?>

However, be careful: 1.) ini_set() may be forbidden on your server; 2.) preg_last_error() does not exist prior to PHP 5.2.0; 3.) setting a backtrack limit too high may crash PHP (not only the script currently executed). So if you work a lot with long strings you definitely have to look out for a real solution!

See also natedubya's comment (02-Oct-2009 04:08).

matt at mattjanssen dot com (2010-06-24 09:56:28)

To mask credit cards using a single regexp expression (keeping the first and last four digits):
preg_replace('/(?!^.?)[0-9](?!(.){0,3}$)/', '*', '3456-7890-1234-5678')
Keep the FIRST CHARACTER, the LAST FOUR CHARACTER, and any NON-NUMERIC CHARACTERS in-between. Mask (*) everything else.
"3456-7890-1234-5678" = "3***-****-****-5678"
"4567890123456789" = "4***********6789"
"4928-abcd9012-3456" = "4***-abcd****-3456"
"498291842" = "4****1842"
If the regular expression is a bit confusing, (?!) is a "look-ahead not-equals", meaning make sure this does NOT come before or next, but leave it alone.

bvandale at hotmail dot com (2010-06-04 09:47:45)

I needed a regular expression to remove php tags when importing legacy content. From the comments on this page I arrived at this:
<\?(php)?[\n\s\r]*.*[\n\s\r]*\?>
Worked like a beaut.

mbernard dot webdeveloper at gmail dot com (2010-04-06 08:30:28)

For those who simply want to convert a date from US format (YYYY/MM/DD) to French format (DD/MM/YYYY), here is a useful function :

<?php
function date2fr($date) {
    return preg_replace(
        "/([0-9]{4})\/([0-9]{2})\/([0-9]{2})/i",
        "$3/$2/$1",
        $date
    );
}

function date2us($date) {
    return preg_replace(
        "/([0-9]{2})\/([0-9]{2})\/([0-9]{4})/i",
        "$3/$2/$1",
        $date
    );
}
?>

Anonymous (2010-03-21 08:35:20)

Another easy way to extract the width and the height from the result of getimagesize():

<?php

$image = getimagesize($params['tmp_name']);
$pattern = '/^width=\"(?P<width>\d+)\" height=\"(?P<height>\d+)\"$/';

preg_match($pattern, $image[3], $matches);

?>

This will return the needed values to the $matches array.
(e.g. $matches['width'] and $matches['height'];)

Hope this helps :)

NOTE: I had to change \s with a space in the regular expression because I kept getting the following error:
"Your note contains a bit of text that will result in a line that is too long, even after using wordwrap()."
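
A small sketch of the named-group approach applied to a literal attribute string, so it can be tried without an image file. parse_size_attr() is a hypothetical name, and the input simply imitates the string found at index 3 of the getimagesize() result.

```php
<?php
// Hypothetical helper: parse 'width="..." height="..."' with named groups.
function parse_size_attr($attr) {
    $pattern = '/^width="(?P<width>\d+)"\s+height="(?P<height>\d+)"$/';
    if (preg_match($pattern, $attr, $matches) !== 1) {
        return null; // the string did not have the expected shape
    }
    return array(
        'width'  => (int) $matches['width'],
        'height' => (int) $matches['height'],
    );
}

print_r(parse_size_attr('width="640" height="480"'));
```

For real images the same numbers are also available directly at indexes 0 and 1 of the array returned by getimagesize(), so the regular expression is mostly useful when only the attribute string is at hand.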

tom dot h dot anderson at gmail dot com (2010-03-18 14:50:08)

A utf-8 safe way to remove accents:

<?php
/**
 * UTF-8 Normalization of ASCII extended characters
 */
#require_once 'utf8.inc'; # http://hsivonen.iki.fi/php-utf8/
function normalize($string) {
    $ext = array(192, 193, 194, 195, 196, 197, 224, 225, 226, 227, 228, 229, 199, 231, 200, 201, 202, 203, 232, 233, 234, 235, 204, 205, 206, 207, 236, 237, 238, 239, 210, 211, 212, 213, 214, 216, 242, 243, 244, 245, 246, 248, 209, 241, 217, 218, 219, 220, 249, 250, 251, 252, 221, 255, 253);

    $norm = array(65, 65, 65, 65, 65, 65, 97, 97, 97, 97, 97, 97, 67, 99, 69, 69, 69, 69, 101, 101, 101, 101, 73, 73, 73, 73, 105, 105, 105, 105, 79, 79, 79, 79, 79, 79, 111, 111, 111, 111, 111, 111, 78, 110, 85, 85, 85, 85, 117, 117, 117, 117, 89, 121, 121);

    $string = db_UTF8::utf8tounicode($string);
    // Using array intersect is slower
    foreach ($ext as $k => $e) {
        // array_keys() returns every position, including index 0
        foreach (array_keys($string, $e) as $pos) {
            $string[$pos] = $norm[$k];
        }
    }
    $string = db_UTF8::unicodetoutf8($string);
    return $string;
}
?>

sergei dot garrison at gmail dot com (2010-03-09 13:40:28)

If you want to add simple rich text functionality to HTML input fields, preg_replace can be quite handy.

For example, if you want users to be able to bold text by typing *text* or italicize it by typing _text_, you can use the following function.

<?php
function encode(&$text) {
    $text = preg_replace('/\*([^\*]+)\*/', '<b>\1</b>', $text);
    $text = preg_replace('/_([^_]+)_/', '<i>\1</i>', $text);
    return $text;
}
?>

This works for nested tags, too, although it will not fix nesting mistakes.

To make this function more efficient, you could put the delimiters (* and _, in this case) and their HTML tag equivalents in an array and loop through them.
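
The array-and-loop idea mentioned above could look roughly like this. It is only a sketch: encode_rich_text() and the $tags map are names invented for the illustration.

```php
<?php
// Hypothetical variant: one map of delimiter => HTML tag, applied in a loop.
function encode_rich_text($text) {
    $tags = array('*' => 'b', '_' => 'i');
    foreach ($tags as $delimiter => $tag) {
        $d = preg_quote($delimiter, '/'); // escape "*" and friends for the pattern
        $pattern = '/' . $d . '([^' . $d . ']+)' . $d . '/';
        $text = preg_replace($pattern, '<' . $tag . '>$1</' . $tag . '>', $text);
    }
    return $text;
}

echo encode_rich_text('This is *bold* and _italic_.'), "\n";
```

Adding another marker then only needs one more entry in the map.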

hello at weblap dot ro (2010-03-06 04:10:40)

Post slug generator, for creating clean urls from titles.
It works with many languages.

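<?php
function remove_accent($str)
{
  $a = array('À', 'Á', 'Â', 'Ã', 'Ä', 'Å', 'Æ', 'Ç', 'È', 'É', 'Ê', 'Ë', 'Ì', 'Í', 'Î', 'Ï', 'Ð', 'Ñ', 'Ò', 'Ó', 'Ô', 'Õ', 'Ö', 'Ø', 'Ù', 'Ú', 'Û', 'Ü', 'Ý', 'ß', 'à', 'á', 'â', 'ã', 'ä', 'å', 'æ', 'ç', 'è', 'é', 'ê', 'ë', 'ì', 'í', 'î', 'ï', 'ñ', 'ò', 'ó', 'ô', 'õ', 'ö', 'ø', 'ù', 'ú', 'û', 'ü', 'ý', 'ÿ', 'Ā', 'ā', 'Ă', 'ă', 'Ą', 'ą', 'Ć', 'ć', 'Ĉ', 'ĉ', 'Ċ', 'ċ', 'Č', 'č', 'Ď', 'ď', 'Đ', 'đ', 'Ē', 'ē', 'Ĕ', 'ĕ', 'Ė', 'ė', 'Ę', 'ę', 'Ě', 'ě', 'Ĝ', 'ĝ', 'Ğ', 'ğ', 'Ġ', 'ġ', 'Ģ', 'ģ', 'Ĥ', 'ĥ', 'Ħ', 'ħ', 'Ĩ', 'ĩ', 'Ī', 'ī', 'Ĭ', 'ĭ', 'Į', 'į', 'İ', 'ı', 'Ĳ', 'ĳ', 'Ĵ', 'ĵ', 'Ķ', 'ķ', 'Ĺ', 'ĺ', 'Ļ', 'ļ', 'Ľ', 'ľ', 'Ŀ', 'ŀ', 'Ł', 'ł', 'Ń', 'ń', 'Ņ', 'ņ', 'Ň', 'ň', 'ŉ', 'Ō', 'ō', 'Ŏ', 'ŏ', 'Ő', 'ő', 'Œ', 'œ', 'Ŕ', 'ŕ', 'Ŗ', 'ŗ', 'Ř', 'ř', 'Ś', 'ś', 'Ŝ', 'ŝ', 'Ş', 'ş', 'Š', 'š', 'Ţ', 'ţ', 'Ť', 'ť', 'Ŧ', 'ŧ', 'Ũ', 'ũ', 'Ū', 'ū', 'Ŭ', 'ŭ', 'Ů', 'ů', 'Ű', 'ű', 'Ų', 'ų', 'Ŵ', 'ŵ', 'Ŷ', 'ŷ', 'Ÿ', 'Ź', 'ź', 'Ż', 'ż', 'Ž', 'ž', 'ſ', 'ƒ', 'Ơ', 'ơ', 'Ư', 'ư', 'Ǎ', 'ǎ', 'Ǐ', 'ǐ', 'Ǒ', 'ǒ', 'Ǔ', 'ǔ', 'Ǖ', 'ǖ', 'Ǘ', 'ǘ', 'Ǚ', 'ǚ', 'Ǜ', 'ǜ', 'Ǻ', 'ǻ', 'Ǽ', 'ǽ', 'Ǿ', 'ǿ');
  $b = array('A', 'A', 'A', 'A', 'A', 'A', 'AE', 'C', 'E', 'E', 'E', 'E', 'I', 'I', 'I', 'I', 'D', 'N', 'O', 'O', 'O', 'O', 'O', 'O', 'U', 'U', 'U', 'U', 'Y', 's', 'a', 'a', 'a', 'a', 'a', 'a', 'ae', 'c', 'e', 'e', 'e', 'e', 'i', 'i', 'i', 'i', 'n', 'o', 'o', 'o', 'o', 'o', 'o', 'u', 'u', 'u', 'u', 'y', 'y', 'A', 'a', 'A', 'a', 'A', 'a', 'C', 'c', 'C', 'c', 'C', 'c', 'C', 'c', 'D', 'd', 'D', 'd', 'E', 'e', 'E', 'e', 'E', 'e', 'E', 'e', 'E', 'e', 'G', 'g', 'G', 'g', 'G', 'g', 'G', 'g', 'H', 'h', 'H', 'h', 'I', 'i', 'I', 'i', 'I', 'i', 'I', 'i', 'I', 'i', 'IJ', 'ij', 'J', 'j', 'K', 'k', 'L', 'l', 'L', 'l', 'L', 'l', 'L', 'l', 'l', 'l', 'N', 'n', 'N', 'n', 'N', 'n', 'n', 'O', 'o', 'O', 'o', 'O', 'o', 'OE', 'oe', 'R', 'r', 'R', 'r', 'R', 'r', 'S', 's', 'S', 's', 'S', 's', 'S', 's', 'T', 't', 'T', 't', 'T', 't', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'W', 'w', 'Y', 'y', 'Y', 'Z', 'z', 'Z', 'z', 'Z', 'z', 's', 'f', 'O', 'o', 'U', 'u', 'A', 'a', 'I', 'i', 'O', 'o', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'U', 'u', 'A', 'a', 'AE', 'ae', 'O', 'o');
  return str_replace($a, $b, $str);
}

function post_slug($str)
{
  return strtolower(preg_replace(array('/[^a-zA-Z0-9 -]/', '/[ -]+/', '/^-|-$/'),
  array('', '-', ''), remove_accent($str)));
}
?>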

Example: post_slug(' -Lo#&@rem  IPSUM //dolor-/sit - amet-/-consectetur! 12 -- ')
will output: lorem-ipsum-dolor-sit-amet-consectetur-12

roscoe-p-coltrane (2010-02-27 22:43:31)

Reading arguments against variables in CSS got me to thinking. 

Process your CSS files in a fashion similar to the following. This particular routine is certainly not the most efficient way, but is what I came up with on the spur of the moment. The prefix token is entirely arbitrary - everything between the leading colon and terminating semicolon is the target. In this way, default values can be put in place, and the constant identifiers simply left as comments, should the stylesheet be used without processing; this would also inhibit your editor from emitting errors about your odd syntax. The declaration pattern at the top assumes something like this:

/*@css_const
 [
     bgc_a=#ccccee,
        fc_a=#000099,
     bgc_b=#5555cc,
     bgc_c=#eeeeff,
     bgc_d=#599fee
 ]
*/

...within the target CSS file.

Usage like so:

.Element {
     font-size:10pt;
     color:#000/*fc_a*/;
     background-color:#fff/*bgc_a*/;
}

And then...

<?php
$dec_pat = '/^\/\*\@css_const\s+\[(.*)\]\s+\*\//Ums';
preg_match_all($dec_pat,$css,$m);
$lhs = array();
$rhs = array();
foreach($m[1] as &$p) {
    $p = explode(",",$p);
    foreach($p as &$q) {
        list($k,$v) = explode("=",trim($q));
        $lhs[] = '/(\w+\:).*\/\*' . $k . '\*\/;$/Um';
        $rhs[] = '\1' . $v . ';';
    }
}
$css = preg_replace($lhs,$rhs,$css);
// spit it out or return it; whatever
?>

...resulting, of course, in:

.Element {
     font-size:10pt;
     color:#000099;
     background-color:#ccccee;
}

Again, efficiency was not the immediate goal, so please don't slay me...

jjorgenson at wilanddirect dot dot dot com (2010-02-19 15:34:46)

I like using canned SQL queries, where specific values and/or field names are separate in some array.  However the %s place-holders of 'printf()' were tedious and error prone because of sequence dependency.  By using 'tag' replacement ( a word having some delimiter(s) around it ) and an associative array, I was able to disconnect the canned query from the data in the array.  

ex:
old style:
<?php
    $sql = "SELECT %s FROM %s WHERE %s = %s";
    printf($sql, $data["my_field"], $data["table_name"], $data["my_field"], $data["field_value"] );
?>

new style:
<?php
     $sql = "SELECT <my_field> FROM <table_name> WHERE <my_field> = <field_value> ";
     $outsql = preg_replace("/<(\w+)>/e", '$data["$1"]', $sql );
?>
     NOTE the preg_replace could be optimized to use an anonymous function instead of a "/e" eval of the second parameter.

Take this a step further...
Setup your associative array to contain the output from some previous SQL, and the canned SQLs to reference the fields of other queries.  You can then chain your queries where the results of one feeds another.

Hope this helps someone.

-- jj --
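
The anonymous-function variant mentioned in the note could be sketched as follows. The /e modifier was deprecated in PHP 5.5 and removed in PHP 7, so preg_replace_callback() is the usual replacement. fill_template() is a hypothetical name, and closures with use() need PHP 5.3 or later.

```php
<?php
// Hypothetical helper: replace <tag> placeholders with values from $data.
function fill_template($template, array $data) {
    return preg_replace_callback('/<(\w+)>/', function ($m) use ($data) {
        // Unknown tags are left untouched instead of becoming empty strings.
        return array_key_exists($m[1], $data) ? $data[$m[1]] : $m[0];
    }, $template);
}

$data = array('my_field' => 'name', 'table_name' => 'users', 'field_value' => "'bob'");
echo fill_template('SELECT <my_field> FROM <table_name> WHERE <my_field> = <field_value>', $data), "\n";
```

This only substitutes text. Values that come from users still need proper escaping or bound parameters before the query is run.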

Metal Developer (2010-01-19 09:58:53)

This simple pattern converts an image's EXIF datetime (e.g. "2009:04:26 05:09:47") to a MySQL datetime (e.g. "2009-04-26 05:09:47") with only one line of code.
<?php
$mysql_datetime = preg_replace('/(\d{2}):(\d{2}):(\d{2})(.*)/', '$1-$2-$3$4', $exif_datetime);
?>
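
A slightly stricter sketch of the same conversion, assuming the value always starts with a four-digit year. Anchoring the pattern to the date part leaves the time portion alone. exif_to_mysql_datetime() is an invented name.

```php
<?php
// Hypothetical helper: only the leading YYYY:MM:DD part is rewritten.
function exif_to_mysql_datetime($exif) {
    return preg_replace('/^(\d{4}):(\d{2}):(\d{2})/', '$1-$2-$3', $exif);
}

echo exif_to_mysql_datetime('2009:04:26 05:09:47'), "\n";
```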

info AT photosillusions DOT com (2009-12-21 12:03:37)

Hi, here is a tip that may help some of you with this function.
A plain / inside the pattern is not treated as literal text when / is also the delimiter.
Bad code:
$contents = preg_replace("/##/MENUG##/","MENU GAUCHE",$contents);
You get the following warning:

Warning: preg_replace() [function.preg-replace]: Unknown modifier 'M' in systeme.php on line 56
Replace the / delimiters with -, which is acceptable and works well.
Good code:
$contents = preg_replace("-##/MENUG##-","MENU GAUCHE",$contents);
Email me if you have any questions! :)
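
Two other ways to get the same result while keeping / as the delimiter, shown as a rough sketch. replace_menu_tag() is a made-up name and the sample text is invented.

```php
<?php
// Hypothetical helper showing two equivalent ways to handle "/" in the pattern.
function replace_menu_tag($contents) {
    // Option 1: escape the inner slash by hand.
    $a = preg_replace('/##\/MENUG##/', 'MENU GAUCHE', $contents);
    // Option 2: let preg_quote() escape the literal text, delimiter included.
    $b = preg_replace('/' . preg_quote('##/MENUG##', '/') . '/', 'MENU GAUCHE', $contents);
    return array($a, $b);
}

print_r(replace_menu_tag('header ##/MENUG## footer'));
```

preg_quote() is the safer choice when the text to search for is not known in advance.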

natedubya at no spam dot gmail dot com (2009-10-02 09:08:47)

As a pertinent note, there is an issue with this function: parsing a string longer than 94326 characters may silently return NULL (other notes on this page relate this to pcre.backtrack_limit). So be careful where you use it.

info at gratisrijden dot nl (2009-10-01 19:48:33)

If you use preg_replace with arrays, each replacement result becomes the subject for the patterns that come later in the array. This means replaced values can be replaced again.

Example:
<?php
$text = 'We want to replace BOLD with the <boldtag> and OLDTAG with the <newtag>';

$patterns = array('/BOLD/i', '/OLDTAG/i');
$replacements = array('<boldtag>', '<newtag>');

echo preg_replace($patterns, $replacements, $text);
?>

Output:
We want to replace <b<newtag>> with the <<b<newtag>>tag> and <newtag> with the <newtag>

Look what happened with BOLD.
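
One way to avoid the cascading effect is to do all replacements in a single pass, so that replaced text is never scanned again. The following is only a sketch: replace_once() is an invented helper, and the word boundaries are an extra assumption that keeps "bold" inside "<boldtag>" from matching.

```php
<?php
// Hypothetical helper: single-pass replacement from an UPPERCASE-keyed map.
function replace_once($text, array $map) {
    $words = array();
    foreach (array_keys($map) as $word) {
        $words[] = preg_quote($word, '/');
    }
    $pattern = '/\b(?:' . implode('|', $words) . ')\b/i';
    return preg_replace_callback($pattern, function ($m) use ($map) {
        return $map[strtoupper($m[0])]; // keys are stored in upper case
    }, $text);
}

$map = array('BOLD' => '<boldtag>', 'OLDTAG' => '<newtag>');
echo replace_once('We want to replace BOLD with the <boldtag> and OLDTAG with the <newtag>', $map), "\n";
```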

alammar at gmail dot com (2009-07-24 14:02:54)

I was writing a web crawler and wanted to limit its range to one website. I needed to dynamically escape the urls so I wrote the following function:

<?php

//Escape a string to be used as a regular expression pattern
//Ex: escape_string_for_regex('http://www.example.com/s?q=php.net+docs')
// returns http:\/\/www\.example\.com\/s\?q=php\.net\+docs
function escape_string_for_regex($str)
{
        //All regex special chars (according to arkani at iol dot pt below):
        // \ ^ . $ | ( ) [ ]
        // * + ? { } ,

        $patterns = array('/\//', '/\^/', '/\./', '/\$/', '/\|/',
            '/\(/', '/\)/', '/\[/', '/\]/', '/\*/', '/\+/',
            '/\?/', '/\{/', '/\}/', '/\,/');
        $replace = array('\/', '\^', '\.', '\$', '\|', '\(', '\)',
            '\[', '\]', '\*', '\+', '\?', '\{', '\}', '\,');

        return preg_replace($patterns, $replace, $str);
}

?>
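
PHP also ships a built-in for this job, preg_quote(), which additionally escapes the backslash itself. A short sketch of how it might be used for the crawler case; url_to_pattern() is a hypothetical name.

```php
<?php
// Hypothetical helper: build a pattern that matches URLs starting with $url.
function url_to_pattern($url) {
    // The second argument tells preg_quote() to escape the "/" delimiter too.
    return '/^' . preg_quote($url, '/') . '/';
}

$pattern = url_to_pattern('http://www.example.com/');
var_dump(preg_match($pattern, 'http://www.example.com/s?q=php.net+docs'));
```

The exact set of characters preg_quote() escapes differs slightly between PHP versions, so it is safer to test the resulting pattern than to compare the escaped string.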

marco dot demaio at vanylla dot it (2009-07-24 03:56:24)

Maybe it's better to avoid using \w and \W character class matches because they behave differently depending on locale server settings, therefore you will end up with a regular expression match that changes its behavior depending upon server settings.

<?php
   /*
   according to most websites the regexp syntax [\w] is identical to [A-Za-z0-9_]

   See:
   http://en.wikipedia.org/wiki/Regular_expression
   http://msdn.microsoft.com/en-us/library/1400241x(VS.85).aspx
   */

   $result1 = preg_replace('/[A-Za-z0-9_]*/', '', "test àèìòù test");
   $result2 = preg_replace('/\w*/', '', "test àèìòù test");

   echo "<pre>[" . $result1 . "]</pre>"; //ok, it shows: "[ àèìòù ]"
   echo "<pre>[" . $result2 . "]</pre>"; //WARNING on server other than UK/US locale, it shows: "[  ]"

   /*
   Why does this happen?
   preg_replace uses Perl Compatible Regular Expressions and NOT POSIX regular expression syntax.
   Therefore a minor difference is that in the former the \w character class also matches
   the accented characters, depending on locale settings.

   See: http://perldoc.perl.org/perlre.html#Regular-Expressions
   */
?>

mybodya at gmail dot com (2009-07-14 04:40:43)

This works fine for windows, linux and mac, encoding cp1251

<?php
function valid_filename($str) {
    return preg_replace('/[^0-9a-zа-я??ё\`\~\!\@\#\$\%\^\*\(\)\; \,\.\'\/\_\-]/i', ' ', $str);
}
?>

The result can be used in HTML without htmlspecialchars().

spcmky at gmail dot com (2009-04-24 07:15:54)

Using this for SEO urls. I had to modify it a bit to get through the word wrap. Pretty sure you can one-line a lot of it.

<?php
public static function encodeUrlParam ( $string )
{
  $string = trim($string);

  if ( ctype_digit($string) )
  {
    return $string;
  }
  else
  {
    // replace accented chars
    $accents = '/&([A-Za-z]{1,2})(grave|acute|circ|cedil|uml|lig);/';
    $string_encoded = htmlentities($string,ENT_NOQUOTES,'UTF-8');

    $string = preg_replace($accents,'$1',$string_encoded);

    // clean out the rest
    $replace = array('([\40])','([^a-zA-Z0-9-])','(-{2,})');
    $with = array('-','','-');
    $string = preg_replace($replace,$with,$string);
  }

  return strtolower($string);
}
?>

atifrashid at hotmail dot com (2009-04-15 23:56:21)

To make seo url from title or any text:

<?php
function CleanFileName( $Raw ){
    $Raw = trim($Raw);
    $RemoveChars  = array( "([\40])", "([^a-zA-Z0-9-])", "(-{2,})" );
    $ReplaceWith = array("-", "", "-");
    return preg_replace($RemoveChars, $ReplaceWith, $Raw);
}

echo CleanFileName('whatever $4 sd- -sdf- @    sd 8 as +% sdf ;');
?>

hubert at uhoreg dot ca (2009-04-07 17:09:10)

The issue described by arie below is not actually a bug, but expected behaviour. The PCRE Regular Expression Details says: "The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w."

arie dot benichou at gmail dot com (2009-03-09 12:50:26)

<?php
//Be careful with utf-8: even with unicode and utf-8 support enabled, a pretty odd bug occurs depending on your operating system
$str = "Hi, my name is Arié!<br />";
echo preg_replace('#\bArié\b#u', 'Gontran', $str);
//on windows system, output is "Hi, my name is Gontran<br />"
//on unix system, output is "Hi, my name is Arié<br />"
echo preg_replace('#\bArié(|\b)#u', 'Gontran', $str);
//on windows and unix system, output is "Hi, my name is Gontran<br />"
?>

arkani at iol dot pt (2009-03-04 11:00:11)

Because I searched a lot for this:
The following characters should be escaped if you are trying to match them literally:
\ ^ . $ | ( ) [ ]
* + ? { } ,
Special Character Definitions
\ Quote the next metacharacter
^ Match the beginning of the line
. Match any character (except newline)
$ Match the end of the line (or before newline at the end)
| Alternation
() Grouping
[] Character class
* Match 0 or more times
+ Match 1 or more times
? Match 1 or 0 times
{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m times
More Special Character Stuff
\t tab (HT, TAB)
\n newline (LF, NL)
\r return (CR)
\f form feed (FF)
\a alarm (bell) (BEL)
\e escape (think troff) (ESC)
\033 octal char (think of a PDP-11)
\x1B hex char
\c[ control char
\l lowercase next char (think vi)
\u uppercase next char (think vi)
\L lowercase till \E (think vi)
\U uppercase till \E (think vi)
\E end case modification (think vi)
\Q quote (disable) pattern metacharacters till \E
Even More Special Characters
\w Match a "word" character (alphanumeric plus "_")
\W Match a non-word character
\s Match a whitespace character
\S Match a non-whitespace character
\d Match a digit character
\D Match a non-digit character
\b Match a word boundary
\B Match a non-(word boundary)
\A Match only at beginning of string
\Z Match only at end of string, or before newline at the end
\z Match only at end of string
\G Match only where previous m//g left off (works only with /g)

akam AT akameng DOT com (2009-02-17 16:02:58)

<?php
$converted = array(
//3 of special chars
'/(;)/ie',
'/(#)/ie',
'/(&)/ie',

//MySQL reserved words!
//Check mysql website!
'/(ACTION)/ie', '/(ADD)/ie', '/(ALL)/ie', '/(ALTER)/ie', '/(ANALYZE)/ie', '/(AND)/ie', '/(AS)/ie', '/(ASC)/ie',

//remaining of special chars
'/(<)/ie', '/(>)/ie', '/(\.)/ie', '/(,)/ie', '/(\?)/ie', '/(`)/ie', '/(!)/ie', '/(@)/ie', '/(\$)/ie', '/(%)/ie', '/(\^)/ie', '/(\*)/ie', '/(\()/ie', '/(\))/ie', '/(_)/ie', '/(-)/ie', '/(\+)/ie',
'/(=)/ie', '/(\/)/ie', '/(\|)/ie', '/(\\\)/ie', "/(')/ie", '/(")/ie', '/(:)/'
);

$input_text = preg_replace($converted, "UTF_to_Unicode('\\1')", $text);

function UTF_to_Unicode($data){
//return $data;
}
?>
The above example is useful for filtering input data before saving it into a MySQL database. The data does not need to be decoded again; just use UTF-8 as the charset.
Please note the escaping of special characters between the delimiters.

Svoop (2009-02-10 05:41:50)

I have written a short introduction and a colorful cheat sheet for Perl Compatible Regular Expressions (PCRE):
http://www.bitcetera.com/en/techblog/2008/04/01/regex-in-a-nutshell/

neamar at neamar dot fr (2009-01-04 12:08:22)

I needed a regular expression with brace matching, but I was not able to find anything for this problem.
So, if I had:
\bold{something \underline{another thing} and another bold thing}
my regexp would stop at the first closing brace. This seems to be a common problem with regular expressions, often discussed on forums.

So here is the snippet I used; perhaps it'll be useful:
<?php
    function preg_replace_with_braces($Regexp,$Remplacement,$Texte)
    {
        preg_match_all($Regexp,$Texte,$Resultats,PREG_SET_ORDER);

        $SVGRemplacement=$Remplacement;
        foreach($Resultats as $Resultat)
        {//For each result
            $Remplacement=$SVGRemplacement;
            foreach($Resultat as $n=>$Match)
            {//For each set of capturing parenthesis
                if($n>0 && strpos($Match,'{')!==false)
                {//We find a open brace in our regexp : we'll need to find the closing one !
                    $InitialMatch=$Match;
                    $Offset=strpos($Texte,$Resultat[0]);
                    $Offset=strpos($Texte,$Match,$Offset);//We move the caret to the good place : let's start !
                    $Depart=$Offset;
                    $Taille=strlen($Texte);
                    $NestingLevel=0;
                    while($NestingLevel>=0 && $Offset<$Taille)
                    {//Browse the string, searching for braces. Perhaps the most important place !
                        $Offset++;
                        if($Texte[$Offset]=='{')
                            $NestingLevel++;
                        elseif($Texte[$Offset]=='}')
                            $NestingLevel--;
                    }
                    $Match=substr($Texte,$Depart,$Offset-$Depart);
                    $Resultat[0]=str_replace($InitialMatch,$Match,$Resultat[0]);
                }
                $Remplacement=str_replace('$' . $n,$Match,$Remplacement);
            }
            $Texte=str_replace($Resultat[0],$Remplacement,$Texte);
        }
        return $Texte;
    }
?>
Hope it'll be useful !
I know it's pretty odd and unclean, but that was a quick workaround i had.
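
For comparison, PCRE itself supports recursive patterns, which can match balanced braces without a manual scan. This is only a sketch for the \bold{...} case from the note, not a drop-in replacement for the function above; bold_to_html() is an invented name.

```php
<?php
// Hypothetical helper: convert \bold{...} to <b>...</b>, allowing nested braces.
function bold_to_html($text) {
    // Group 1 is the brace content: runs of non-brace characters, or a nested
    // {...} whose inside is matched by recursing into group 1 via (?1).
    $pattern = '/\\\\bold\{((?:[^{}]++|\{(?1)\})*)\}/';
    return preg_replace($pattern, '<b>$1</b>', $text);
}

echo bold_to_html('\bold{something \underline{another thing} and another bold thing} tail'), "\n";
```

The inner \underline{...} is left as it is, so a second pass with its own pattern would be needed to convert it.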

callummann at blueyonder dot co dot uk (2008-12-31 16:31:16)

For BBcode, rather than having two different arrays you can use the same one.

<?php

$bbcode = array(

"/\[b\](.*?)\[\/b\]/is" => "<strong>$1</strong>",
"/\[u\](.*?)\[\/u\]/is" => "<u>$1</u>",
"/\[url\=(.*?)\](.*?)\[\/url\]/is" => "<a href='$1'>$2</a>"

);
$text = "[b]Text[/b][u]Text[/u]";

$text = preg_replace(array_keys($bbcode), array_values($bbcode), $text);

?>

montana [at] percepticon [dot] com (2008-12-23 12:47:20)

<?php
/*

Coding across linux, mac, and windows gets annoying dealing with errors resulting from EOL format in config files, etc. 

Standardize to CRLF format.

Instead of using one regex I just broke this up into smaller groups - 
may take longer to execute three statements rather than one, 
but it also granularizes the use case.

*/
//different formats all living together
$s = "Testing 1.\r" . "Testing 2.\n" . "Testing 3.\r\n" . "Testing 4.\n\r:END";

$s = preg_replace("/(?<!\\n)\\r+(?!\\n)/", " :REPLACED: \r\n", $s); //replace just CR with CRLF
$s = preg_replace("/(?<!\\r)\\n+(?!\\r)/", " :REPLACED: \r\n", $s); //replace just LF with CRLF
$s = preg_replace("/(?<!\\r)\\n\\r+(?!\\n)/", " :REPLACED: \r\n", $s); //replace misordered LFCR with CRLF

echo $s;

/*

output:

Testing 1. :REPLACED: 
Testing 2. :REPLACED: 
Testing 3.
Testing 4. :REPLACED: 
:END

*/
?>
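
For readers who prefer the single-regex route, here is a rough sketch using one alternation. The order of the alternatives matters: the two-character sequences have to be tried before the single characters. normalize_eol() is a hypothetical name, and unlike the code above it collapses nothing: each line break becomes exactly one CRLF.

```php
<?php
// Hypothetical helper: convert CR, LF, CRLF and LFCR line endings to CRLF.
function normalize_eol($s) {
    return preg_replace('/\r\n|\n\r|\r|\n/', "\r\n", $s);
}

echo bin2hex(normalize_eol("a\rb\nc")), "\n";
```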

mdrisser at gmail dot com (2008-11-28 10:13:55)

An alternative to the method suggested by sheri is to remember that the regex modifier '$' only looks at the end of the STRING, the example given is a single string consisting of multiple lines.

Try:
<?php
// Following is 1 string containing 3 lines
$s = "Testing, testing.\r\n"
   . "Another testing line.\r\n"
   . "Testing almost done.";

echo preg_replace('/\.\\r\\n/m', '@\r\n', $s);
?>

This results in the string:
Testing, testing@\r\nAnother testing line@\r\nTesting almost done.

jette at nerdgirl dot dk (2008-11-19 04:47:41)

I use this to prevent users from overdoing repeated text. The following function only allows 3 identical characters at a time and also takes care of repetitions with whitespace added.

This means that 'haaaaaaleluuuujaaaaa' becomes 'haaaleluuujaaa' and 'I am c o o o o o o l' becomes 'I am c o o o l'

<?php
//Example of user input
$str = "aaaaaaaaaaabbccccccccaaaaad d d d   d      d d ddde''''''''''''";

function stripRepeat($str) {
  //Do not allow repeated whitespace
  $str = preg_replace("/(\s){2,}/", '$1', $str);
  //Result: aaaaaaaaaaabbccccccccaaaaad d d d d d d ddde''''''''''''

  //Do not allow more than 3 identical characters separated by any whitespace
  $str = preg_replace('{( ?.)\1{3,}}', '$1$1$1', $str);
  //Final result: aaabbcccaaad d d ddde'''

  return $str;
}
?>

To prevent any repetitions of characters, you only need this:

<?php
$str = preg_replace('{(.)\1+}', '$1', $str);
//Result: abcad d d d d d d de'
?>

7r6ivyeo at mail dot com (2008-11-17 09:25:20)

String to filename:

<?php
function string_to_filename($word) {
    $tmp = preg_replace('/^\W+|\W+$/', '', $word); // remove all non-alphanumeric chars at begin & end of string
    $tmp = preg_replace('/\s+/', '_', $tmp); // compress internal whitespace and replace with _
    return strtolower(preg_replace('/[^\w-]/', '', $tmp)); // remove all non-alphanumeric chars except _ and -
}
?>

Returns a usable & readable filename.

Sheri (2008-09-22 14:32:37)

The situation described below by dyer85 at gmail on 28-Aug-2008 at 08:41 can be addressed (since PCRE version 7.3) by including "(*ANYCRLF)" at the start of the pattern. Then linebreaks will be detected for line endings typical of unix, mac and PC texts.

When included in the pattern for the example cited, all expected replacements are made.

<?php
$s = "Testing, testing.\r\n"
   . "Another testing line.\r\n"
   . "Testing almost done.";

echo preg_replace('/(*ANYCRLF)\.$/m', '.@', $s);
?>

tal at ashkenazi dot co dot il (2008-09-14 01:46:28)

After a long time trying to get rid of \n\r and <BR> stuff, I came up with this...
(I made some changes to the clicklein() function...)

<?php
    function clickable($url){
        $url = str_replace("\\r","\r",$url);
        $url = str_replace("\\n","\n<BR>",$url);
        $url = str_replace("\\n\\r","\n\r",$url);

        $in=array(
        '`((?:https?|ftp)://\S+[[:alnum:]]/?)`si',
        '`((?<!//)(www\.\S+[[:alnum:]]/?))`si'
        );
        $out=array(
        '<a href="$1"  rel=nofollow>$1</a> ',
        '<a href="http://$1" rel=\'nofollow\'>$1</a>'
        );
        return preg_replace($in,$out,$url);
    }

?>

dyer85 at gmail dot com (2008-08-28 19:41:49)

There seems to be some unexpected behavior when using the /m modifier when the line terminators are win32 or mac format.

If you have a string like below, and try to replace dots, the regex won't replace correctly:

<?php
$s = "Testing, testing.\r\n"
   . "Another testing line.\r\n"
   . "Testing almost done.";

echo preg_replace('/\.$/m', '.@', $s); // only last . replaced
?>

The /m modifier doesn't seem to work properly when CRLFs or CRs are used. Make sure to convert line endings to LFs (*nix format) in your input string.

tasser at ne8 dot in (2008-08-23 13:42:13)

preg_replace is greedy by default. Contrary to what I understood from the PCRE docs, the second pattern below (using ? together with the U modifier) behaves non-greedily.

<?php
$str = "asdfd adsfd aaaadasd";

$str = preg_replace("/(a)(.*)(d)/", "a($2)d", $str);
// a(sdfd adsfd aaaadas)d

$str = preg_replace("/(a)(.*)?(d)/U", "a($2)d", $str);
// a(s)dfd a()dsfd a(aaa)da(s)d
?>

which is what i wanted.

halityesil [ at at] globya [ dot dot] net (2008-07-31 06:08:57)

<?PHP

function strip_tags_attributes($sSource, $aAllowedTags = FALSE, $aDisabledAttributes = FALSE, $aAllowedProperties = 'font|font-size|font-weight|color|text-align|text-decoration|margin|margin-left|margin-top|margin-bottom|margin-right|padding|padding-top|padding-left|padding-right|padding-bottom|width|height'){

   if( !is_array( $aDisabledAttributes ) ){
      $aDisabledAttributes = array('onabort', 'onactivate', 'onafterprint', 'onafterupdate', 'onbeforeactivate', 'onbeforecopy', 'onbeforecut', 'onbeforedeactivate', 'onbeforeeditfocus', 'onbeforepaste', 'onbeforeprint', 'onbeforeunload', 'onbeforeupdate', 'onblur', 'onbounce', 'oncellchange', 'onchange', 'onclick', 'oncontextmenu', 'oncontrolselect', 'oncopy', 'oncut', 'ondataavaible', 'ondatasetchanged', 'ondatasetcomplete', 'ondblclick', 'ondeactivate', 'ondrag', 'ondragdrop', 'ondragend', 'ondragenter', 'ondragleave', 'ondragover', 'ondragstart', 'ondrop', 'onerror', 'onerrorupdate', 'onfilterupdate', 'onfinish', 'onfocus', 'onfocusin', 'onfocusout', 'onhelp', 'onkeydown', 'onkeypress', 'onkeyup', 'onlayoutcomplete', 'onload', 'onlosecapture', 'onmousedown', 'onmouseenter', 'onmouseleave', 'onmousemove', 'onmoveout', 'onmouseover', 'onmouseup', 'onmousewheel', 'onmove', 'onmoveend', 'onmovestart', 'onpaste', 'onpropertychange', 'onreadystatechange', 'onreset', 'onresize', 'onresizeend', 'onresizestart', 'onrowexit', 'onrowsdelete', 'onrowsinserted', 'onscroll', 'onselect', 'onselectionchange', 'onselectstart', 'onstart', 'onstop', 'onsubmit', 'onunload');
   }

   $sSource = stripcslashes( $sSource );

   $sSource = strip_tags( $sSource, $aAllowedTags );

   if( empty($aDisabledAttributes) ){
      return $sSource;
   }

   $aDisabledAttributes = @ implode('|', $aDisabledAttributes);

   $sSource = preg_replace('/<(.*?)>/ie', "'<' . preg_replace(array('/javascript:[^\"\']*/i', '/(" . $aDisabledAttributes . ")[ \\t\\n]*=[ \\t\\n]*[\"\'][^\"\']*[\"\']/i', '/\s+/'), array('', '', ' '), stripslashes('\\1')) . '>'", $sSource );
   $sSource = preg_replace('/\s(' . $aDisabledAttributes . ').*?([\s\>])/', '\\2', $sSource);

   $regexp = '@([^;"]+)?(?<!'. $aAllowedProperties .'):(?!\/\/(.+?)\/)((.*?)[^;"]+)(;)?@is';
   $sSource = preg_replace($regexp, '', $sSource);
   $sSource = preg_replace('@[a-z]*=""@is', '', $sSource);

   return $sSource;
}

?>

Online resource help skype name : globya 

good luck !

Anonymous (2008-07-30 09:21:30)

People using functions like scandir with user input, and protecting against "../" by using preg_replace: make sure you run it repeatedly until preg_match no longer finds it, because if you don't, the following can happen.

If a user gives the path:
"./....//....//....//....//....//....//....//"
then your script detects every "../" and removes them, leaving:
"./../../../../../../../"
which is probably going back enough times to reach the root.

I just found this vulnerability in an old script of mine, which was written several years ago.

Always do:
<?php
while( preg_match( [expression], $input ) )
{
   $input = preg_replace( [expression], "", $input );
}
?>
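
A runnable sketch of that loop with a concrete expression filled in. The pattern '#\.\./#' and the name strip_parent_refs() are assumptions made for the example; the original note leaves the expression open.

```php
<?php
// Hypothetical helper: remove "../" until no occurrence is left.
function strip_parent_refs($path) {
    $pattern = '#\.\./#';
    while (preg_match($pattern, $path)) {
        // A single pass can leave new "../" sequences behind, hence the loop.
        $path = preg_replace($pattern, '', $path);
    }
    return $path;
}

echo strip_parent_refs('./....//....//'), "\n";
```

Stripping sequences is still a fragile defence. Resolving the final path (for example with realpath()) and checking that it stays inside the allowed directory is generally a more robust check.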

Robert Hartung (2008-07-30 05:34:36)

I just was trying to make a negated replace with a string NOT being in another string.

The code that is actually working:

<?php
$result = preg_replace('#{(?!disable1|disable2)[a-z0-9]+}#is', '$1', $foo);
?>

This code matches all strings in braces that do NOT start with "disable1" or "disable2" and consist of a-z0-9. It took me several hours to figure out this very easy example!

Hope anyone could use it
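
A small self-contained sketch of the same negative look-ahead, with a capture group added so that $1 actually refers to something. The input string, the square-bracket output and the name mark_placeholders() are invented for the illustration.

```php
<?php
// Hypothetical helper: rewrite {name} to [name] unless name starts with
// "disable1" or "disable2".
function mark_placeholders($foo) {
    return preg_replace('#\{(?!disable1|disable2)([a-z0-9]+)\}#i', '[$1]', $foo);
}

echo mark_placeholders('a {keep} b {disable1} c {disable2x} d {other9}'), "\n";
```

The look-ahead is checked right after the opening brace, so any name that merely begins with one of the excluded words is skipped as well.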

marcin at pixaltic dot com (2008-07-25 15:56:56)

<?php
    //:::replace with anything that you can do with searched string:::
    //Marcin Majchrzak
    //pixaltic.com

    $c = "2 4 8";
    echo ($c); //display:2 4 8

    $cp = "/(\d)\s(\d)\s(\d)/e"; //pattern
    $cr = "'\\3*\\2+\\1='.(('\\3')*('\\2')+('\\1'))"; //replace
    $c = preg_replace($cp, $cr, $c);
    echo ($c); //display:8*4+2=34
?>
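
The e modifier was deprecated in PHP 5.5 and removed in PHP 7, so on current versions the same computation needs preg_replace_callback(). A minimal sketch, with evaluate_triplet() as a made-up name:

```php
<?php
// Hypothetical helper: same result as the /e example, without eval.
function evaluate_triplet($c) {
    return preg_replace_callback('/(\d)\s(\d)\s(\d)/', function ($m) {
        $value = (int) $m[3] * (int) $m[2] + (int) $m[1];
        return $m[3] . '*' . $m[2] . '+' . $m[1] . '=' . $value;
    }, $c);
}

echo evaluate_triplet('2 4 8'), "\n";
```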

dweintraub at solbright dot com (2008-07-16 15:23:29)

When you use the '$1', '$2', etc. replacement values, they can be either in double or single quotes. There is no need to worry about the dollar sign being interpreted as a variable or not:

<?php
print preg_replace("/I want (\S+) one/", "$1 is the one I want", "I want that one") . "\n";
print preg_replace("/I want (\S+) one/", '$1 is the one I want', "I want that one") . "\n";
?>

Both lines will print "that is the one I want".

David (2008-07-11 04:59:06)

Take care when you try to strip whitespace out of a UTF-8 text. Using something like:

<?php
$text = preg_replace( "{\s+}", ' ', $text );
?>

broke, in my case, the letter à, which is hex c3a0 in UTF-8, because the byte a0 is treated as whitespace. So use

<?php
$text = preg_replace( "{[ \t]+}", ' ', $text );
?>

to strip all spaces and tabs, or better, use a multibyte function like mb_ereg_replace.
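
Another option, sketched here under the assumption that the input is valid UTF-8, is the u modifier, which makes PCRE work on whole characters instead of single bytes. collapse_whitespace_utf8() is a hypothetical name.

```php
<?php
// Hypothetical helper: collapse whitespace runs without splitting UTF-8 characters.
function collapse_whitespace_utf8($text) {
    // With /u the subject is read as UTF-8, so the bytes c3 a0 are one letter.
    return preg_replace('/\s+/u', ' ', $text);
}

echo collapse_whitespace_utf8("voil\xC3\xA0  \t fini"), "\n";
```

With the u modifier preg_replace() returns NULL for subjects that are not valid UTF-8, so the return value should be checked.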

akniep at rayo dot info (2008-07-07 15:22:42)

preg_replace (and other preg-functions) return null instead of a string when encountering problems you probably did not think about!
-------------------------

It may not be obvious to everybody that the function returns NULL if an error of any kind occurs. An error I happen to stumble upon quite often is the back-tracking limit:
http://de.php.net/manual/de/pcre.configuration.php
#ini.pcre.backtrack-limit

When working with HTML-documents and their parsing it happens that you encounter documents that have a length of over 100.000 characters and that may lead to certain regular-expressions to fail due the back-tracking-limit of above.

A regular-expression that is ungreedy ("U", http://de.php.net/manual/de/reference.pcre.pattern.modifiers.php) often does the job, but still: sometimes you just need a greedy regular expression working on long strings ...

Since, an unhandled return-value of NULL usually creates a consecutive error in the application with unwanted and unforeseen consequences, I found the following solution to be quite helpful and at least save the application from crashing:

<?php

$string_after = preg_replace( '/some_regexp/', "replacement", $string_before );

// if some error occurred we go on working with the unchanged original string
if (PREG_NO_ERROR !== preg_last_error())
{
    $string_after = $string_before;

    // put email-sending or a log-message here
} //if

// free memory
unset( $string_before );

?>

You may or should also put a log-message or the sending of an email into the if-condition in order to get informed, once, one of your regular-expressions does not have the effect you desired it to have.

da_pimp2004_966 at hotmail dot com (2008-06-20 23:09:36)

A simple BB like thing..

<?php
function AddBB($var) {
        $search = array(
                '/\[b\](.*?)\[\/b\]/is',
                '/\[i\](.*?)\[\/i\]/is',
                '/\[u\](.*?)\[\/u\]/is',
                '/\[img\](.*?)\[\/img\]/is',
                '/\[url\](.*?)\[\/url\]/is',
                '/\[url\=(.*?)\](.*?)\[\/url\]/is'
                );

        $replace = array(
                '<strong>$1</strong>',
                '<em>$1</em>',
                '<u>$1</u>',
                '<img src="$1" />',
                '<a href="$1">$1</a>',
                '<a href="$1">$2</a>'
                );

        $var = preg_replace($search, $replace, $var);
        return $var;
}
?>

Michael W (2008-04-16 15:35:58)

For filename tidying I prefer to only ALLOW certain characters rather than converting particular ones that we want to exclude. To this end I use ...

<?php
  $allowed = "/[^a-z0-9\\040\\.\\-\\_\\\\]/i";
  $str = preg_replace($allowed, "", $str);
?>

Allows letters a-z, digits, space (\\040), hyphen (\\-), underscore (\\_) and backslash (\\\\), everything else is removed from the string.

php-comments-REMOVE dot ME at dotancohen dot com (2008-02-29 12:02:05)

Below is a function for converting Hebrew final characters to their
normal equivalents should they appear in the middle of a word.
The \b assertion does not treat Hebrew letters as part of a word,
so I had to work around that limitation.

<?php

$text = "????? ???????";

function hebrewNotWordEndSwitch ($from, $to, $text) {
   $text =
    preg_replace('/'.$from.'([?-?])/u', '$2'.$to.'$1', $text);
   return $text;
}

do {
   $text_before = $text;
   $text = hebrewNotWordEndSwitch("?", "?", $text);
   $text = hebrewNotWordEndSwitch("?", "?", $text);
   $text = hebrewNotWordEndSwitch("?", "?", $text);
   $text = hebrewNotWordEndSwitch("?", "?", $text);
   $text = hebrewNotWordEndSwitch("?", "?", $text);
}   while ( $text_before != $text );

print $text; // ????? ??????!

?>

The do-while is necessary for multiple instances of letters, such
as "????" which would start off as "????". Note that there's still the
problem of acronyms with gershiim but that's not a difficult one
to solve. The code is in use at http://gibberish.co.il which you can
use to translate wrongly-encoded Hebrew, transliterize, and some
other Hebrew-related functions.

To ensure that there will be no regular characters at the end of a
word, just convert all regular characters to their final forms, then
run this function. Enjoy!
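
The loop can probably be avoided with a lookahead, which tests the next letter without consuming it. This is only a sketch under several assumptions: the file is saved as UTF-8, the five final forms map as ך→כ, ם→מ, ן→נ, ף→פ, ץ→צ, and a "letter" is anything in the range א-ת. The function name is made up for this example.

```php
<?php
// Hypothetical helper: turn final-form letters into their normal forms when
// another Hebrew letter follows (i.e. they are not at the end of a word).
function hebrew_fix_mid_word_finals($text) {
    $map = array('ך' => 'כ', 'ם' => 'מ', 'ן' => 'נ', 'ף' => 'פ', 'ץ' => 'צ');
    foreach ($map as $final => $normal) {
        // (?=[א-ת]) is a lookahead: the following letter is checked but not
        // consumed, so runs of final letters are handled in a single pass.
        $text = preg_replace('/' . $final . '(?=[א-ת])/u', $normal, $text);
    }
    return $text;
}
```

Because the lookahead leaves the following letter available for the next match, a single call per letter should be enough.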

Jacob Fogg (2008-01-14 13:29:48)

Here is my attempt at cleaning up a file name. It is similar to what someone else has done, but a little cleaner, with the addition of the | to the reserved characters. I also replace any characters from x00 to x20 (all non-display characters and space), as well as everything from x7f upward (the DEL character and non-English characters), with an '_'.

<?php
// Clean a filename string so that it is a valid filename.
function clean_filename($filename) {
    // Characters that are illegal on any of the 3 major OS's
    $reserved = preg_quote('\/:*?"<>|', '/');
    // Replace all characters up through space and all past ~, along with the
    // reserved characters above. No /e modifier: "_" is plain text, not code.
    return preg_replace("/([\\x00-\\x20\\x7f-\\xff{$reserved}])/", "_", $filename);
}
?>

[EDIT BY danbrown AT php DOT net: Inserted typofix/bugfix provided by "Bryan Roach" on 15 January, 2008.]

admin[a-t]saltwaterc[d0t]net (2007-12-11 08:17:05)

<?php
function repl_amp($text)
{
    // Escape every ampersand that is not already part of "&amp;"
    $text = preg_replace("/&(?!amp;)/i", "&amp;", $text);
    // Undo the double escaping for numeric entities
    $text = preg_replace("/&amp;#(\d+);/i", "&#$1;", $text);
    // Undo the double escaping for literal (named) entities
    $text = preg_replace("/&amp;(\w+);/i", "&$1;", $text);
    return $text;
}
?>

The RegEx Tester says that the first expression is OK, but when testing with various entities, some of them came out broken. I tried to use only two preg_replace() calls instead of three by using an alternative branch in the pattern syntax, which didn't come out well. Sorry for the previous error; I still hope that someone can find a better alternative.
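
One possible single-call alternative is to put the "is this already an entity?" test into a negative lookahead. This is only a sketch: the function name is made up, and it assumes that anything shaped like `&name;`, `&#123;` or `&#x7B;` is an entity that should be left alone.

```php
<?php
// Escape only those ampersands that do not already start an entity.
// The lookahead accepts named (&copy;), decimal (&#169;) and hex (&#xA9;) forms.
function escape_bare_ampersands($text) {
    return preg_replace('/&(?!(?:[a-z][a-z0-9]*|#\d+|#x[0-9a-f]+);)/i', '&amp;', $text);
}

echo escape_bare_ampersands('Tom & Jerry &copy; &#169; &amp; R&D');
// prints: Tom &amp; Jerry &copy; &#169; &amp; R&amp;D
```

The lookahead does not check that a named entity really exists, so a string like `&foo;` is left untouched as well.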

ulf dot reimers at tesa dot com (2007-12-07 10:28:02)

Hi,

as I wasn't able to find another way to do this, I wrote a function converting any UTF-8 string into a correct NTFS filename (see http://en.wikipedia.org/wiki/Filename).

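<?php
function strToNTFSFilename($string)
{
    // Characters reserved by Windows, escaped for use inside the pattern
    $reserved = preg_quote('\/:*?"<>', '/');
    // Replace control characters and reserved characters with "_"
    // (no /e modifier: the replacement is plain text, not PHP code)
    return preg_replace("/([\\x00-\\x1f{$reserved}])/", "_", $string);
}
?>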

It converts all control characters and filename characters which are reserved by Windows ('\/:*?"<>') into an underscore.
This way you can safely create an NTFS filename out of any UTF-8 string.
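
A short usage sketch of the same approach, with "|" added to the reserved set since Windows rejects it too. The function name and the sample filename are made up for illustration.

```php
<?php
// Sketch: replace control characters and Windows-reserved characters with "_".
function ntfs_safe_name($string) {
    // '\\' in a single-quoted string is one literal backslash
    $reserved = preg_quote('\\/:*?"<>|', '/');
    return preg_replace('/[\x00-\x1f' . $reserved . ']/', '_', $string);
}

echo ntfs_safe_name('Report: Q1/Q2 <draft>?.txt');
// prints: Report_ Q1_Q2 _draft__.txt
```

Since the character class contains only ASCII characters, multi-byte UTF-8 sequences in the input pass through unchanged.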

mike dot hayward at mikeyskona dot co dot uk (2007-10-18 08:49:51)

Hi.
Not sure if this will be a great help to anyone out there, but I thought I'd post it just in case.
I was having an issue with a project that relied on $_SERVER['REQUEST_URI'], which obviously wasn't working on IIS
(I am using mod_rewrite in Apache to call up pages from a database, and IIS doesn't set REQUEST_URI). So I knocked up this simple little preg_replace to use the query string set by IIS when redirecting to a PHP error page.

<?php
// My little IIS hack :)
if (!isset($_SERVER['REQUEST_URI'])) {
    // Strip the "404;" marker, the scheme and the host, keeping only the path
    $_SERVER['REQUEST_URI'] = preg_replace('/404;([a-zA-Z]+:\/\/)(.*?)\//i', "/", $_SERVER['QUERY_STRING']);
}
?>

Hope this helps someone else out there trying to do the same thing :)
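
To see what the pattern does, here it is applied to a fixed string. The query string below is a made-up example of the "404;URL" form described above, not captured server output.

```php
<?php
// Hypothetical query string in the "404;<original URL>" form.
$query = '404;http://www.example.com:80/products/view/42';

// The lazy (.*?) stops at the first "/" after the scheme, i.e. after the
// host and port, so only the path survives.
$uri = preg_replace('/404;([a-zA-Z]+:\/\/)(.*?)\//i', '/', $query);

echo $uri; // prints: /products/view/42
```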

sternkinder at gmail dot com (2007-08-24 03:10:37)

From what I can see, the problem is that if you go straight ahead and substitute all 'A's with 'T's, you can't tell for sure which 'T's to substitute with 'A's afterwards. This can be solved by first replacing all 'A's with another character (for instance '_' or whatever you like), then replacing all 'T's with 'A's, and finally replacing all '_'s (or whatever character you chose) with 'T's:

<?php
$dna = "AGTCTGCCCTAG";
// The pairs are applied one after another, with "_" and "-" as placeholders
echo str_replace(array("A","G","C","T","_","-"), array("_","-","G","A","T","C"), $dna); //output will be TCAGACGGGATC
?>

Although I don't know how transliteration in Perl works (though I remember that it is similar to the UNIX command "tr"), I would suggest the following function for "switching" single chars:

<?php
function switch_chars($subject, $switch_table, $unused_char = "_") {
    foreach ($switch_table as $_1 => $_2) {
        $subject = str_replace($_1, $unused_char, $subject); // park $_1 on the placeholder
        $subject = str_replace($_2, $_1, $subject);          // $_2 becomes $_1
        $subject = str_replace($unused_char, $_2, $subject); // placeholder becomes $_2
    }
    return $subject;
}

echo switch_chars("AGTCTGCCCTAG", array("A"=>"T","G"=>"C")); //output will be TCAGACGGGATC
?>
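
For single characters, PHP's built-in strtr() behaves much like "tr" and needs no placeholder character. A minimal sketch using the same DNA string:

```php
<?php
$dna = "AGTCTGCCCTAG";

// strtr() translates each character of the input exactly once, so a
// character that has already been translated is never translated again.
echo strtr($dna, "AGCT", "TCGA"); // prints: TCAGACGGGATC
```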

rob at ubrio dot us (2007-08-21 13:48:37)

Also worth noting is that you can use array_keys()/array_values() with preg_replace like:

<?php
$subs = array(
    '/\[b\](.+)\[\/b\]/Ui' => '<strong>$1</strong>',
    '/_(.+)_/Ui' => '<em>$1</em>',
    // ...
    // ...
);

$raw_text = '[b]this is bold[/b] and this is _italic!_';

$bb_text = preg_replace(array_keys($subs), array_values($subs), $raw_text);
?>

lehongviet at gmail dot com (2007-07-25 01:15:29)

I had a problem echoing text that contains double quotes into a text field, because they confuse the value attribute. I use the function below to match each pair of them and replace it with smart quotes. A leftover unpaired quote is replaced by a hyphen (-).

It works for me.

<?php
function smart_quotes($text) {
    $pattern = '/"((.)*?)"/i';
    // Paired quotes become smart quotes
    $text = preg_replace($pattern, "“\\1”", stripslashes($text));
    // Any remaining double quote becomes a hyphen
    $text = str_replace("\"", "-", $text);
    $text = addslashes($text);
    return $text;
}
?>
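
If the only goal is to echo the text safely into a value attribute, escaping is a more conventional route than rewriting the quotes. A small sketch with a made-up sample string, assuming the page is UTF-8:

```php
<?php
$text = 'She said "hello" to the 5" screen';

// htmlspecialchars() turns " into &quot;, which is safe inside value="..."
// and is shown to the user as an ordinary double quote.
$safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

echo '<input type="text" value="' . $safe . '" />';
// prints: <input type="text" value="She said &quot;hello&quot; to the 5&quot; screen" />
```

This keeps the user's original characters intact, including an unpaired quote.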

131 dot php at cloudyks dot org (2007-07-17 04:37:35)

Based on the previous comment, I suggest the following
(this function already exists in PHP 6):

<?php
// Replace \uXXXX escape sequences with the UTF-8 encoded character.
function unicode_decode($str) {
    return preg_replace(
        '#\\\u([0-9a-f]{4})#e',
        "unicode_value('\\1')",
        $str);
}

// Encode a code point given as 4 hex digits (up to U+FFFF) as UTF-8.
function unicode_value($code) {
    $value = hexdec($code);
    if ($value < 0x0080)
        return chr($value);                         // 1 byte
    elseif ($value < 0x0800)
        return chr((($value & 0x07c0) >> 6) | 0xc0) // 2 bytes
            . chr(($value & 0x3f) | 0x80);
    else
        return chr((($value & 0xf000) >> 12) | 0xe0) // 3 bytes
            . chr((($value & 0x0fc0) >> 6) | 0x80)
            . chr(($value & 0x3f) | 0x80);
}
?>

[EDIT BY danbrown AT php DOT net:  This function originally written by mrozenoer AT overstream DOT net.]
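
The same idea can be written without the /e modifier by using preg_replace_callback(). This is a sketch, assuming PHP 5.3 or later for the anonymous function. The name unicode_decode_cb is made up, and like the code above it handles code points up to U+FFFF only and does not combine surrogate pairs.

```php
<?php
// Hypothetical callback-based variant of unicode_decode().
function unicode_decode_cb($str) {
    // '\\\\u' in a single-quoted string is the regex \\u: a literal backslash and "u"
    return preg_replace_callback('/\\\\u([0-9a-f]{4})/i', function ($m) {
        $value = hexdec($m[1]);
        if ($value < 0x80) {
            return chr($value);                                   // 1 byte
        }
        if ($value < 0x800) {
            return chr(($value >> 6) | 0xc0)                      // 2 bytes
                 . chr(($value & 0x3f) | 0x80);
        }
        return chr(($value >> 12) | 0xe0)                         // 3 bytes
             . chr((($value >> 6) & 0x3f) | 0x80)
             . chr(($value & 0x3f) | 0x80);
    }, $str);
}

echo unicode_decode_cb('caf\u00e9'); // prints: café
```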

mtsoft at mt-soft dot com dot ar (2007-07-09 14:30:41)

This function takes a URL and returns a plain-text version of the page. It uses cURL to retrieve the page and a combination of regular expressions to strip all unwanted whitespace. This function will even strip the text from STYLE and SCRIPT tags, which are ignored by PHP functions such as strip_tags (they strip only the tags, leaving the text in the middle intact).

The regular expressions are applied in two stages so that single line breaks (which \s also matches) are kept while blank lines and runs of linefeeds or spaces are removed. The trimming is likewise done in two stages.

<?php
function webpage2txt($url)
{
    $user_agent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";

    $ch = curl_init();                              // initialize curl handle
    curl_setopt($ch, CURLOPT_URL, $url);            // set url to post to
    curl_setopt($ch, CURLOPT_FAILONERROR, 1);       // Fail on errors
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);    // allow redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);    // return into a variable
    curl_setopt($ch, CURLOPT_PORT, 80);             // Set the port number
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // times out after 15s

    curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);

    $document = curl_exec($ch);

    $search = array(
        '@<script[^>]*?>.*?</script>@si',   // Strip out javascript
        '@<style[^>]*?>.*?</style>@siU',    // Strip style tags properly
        '@<[\/\!]*?[^<>]*?>@si',            // Strip out HTML tags
        '@<![\s\S]*?--[ \t\n\r]*>@',        // Strip multi-line comments including CDATA
        '/\s{2,}/',
    );

    $text = preg_replace($search, "\n", html_entity_decode($document));

    $pat[0] = "/^\s+/";
    $pat[2] = "/\s+\$/";
    $rep[0] = "";
    $rep[2] = " ";

    $text = preg_replace($pat, $rep, trim($text));

    return $text;
}
?>

Potential uses of this function are extracting keywords from a webpage, counting words and things like that. If you find it useful, drop us a comment and let us know where you used it.
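
To try the stripping stage without any network access, the patterns can be run on a fixed HTML string. This is a reduced sketch, not the function above: the name html_to_text and the sample markup are made up, and only four of the patterns are used.

```php
<?php
// Hypothetical, reduced version of the stripping stage (no cURL involved).
function html_to_text($html) {
    $search = array(
        '@<script[^>]*?>.*?</script>@si',   // script blocks, including their text
        '@<style[^>]*?>.*?</style>@si',     // style blocks, including their text
        '@<[^<>]+>@s',                      // any remaining tag
        '/\s{2,}/',                         // runs of whitespace, applied last
    );
    // Patterns are applied in array order; every match becomes a newline.
    $text = preg_replace($search, "\n", html_entity_decode($html));
    return trim($text);
}

$html = '<p>Hello</p><script>var x = 1;</script><p>World &amp; all</p>';
echo html_to_text($html);
// prints "Hello" and "World & all" on two separate lines
```

Note that the entities are decoded before the tags are stripped, as in the function above, so escaped markup such as `&lt;b&gt;` would be treated as a real tag and removed.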

ismith at nojunk dot motorola dot com (2007-03-21 09:47:27)

Be aware that when using the "/u" modifier, if your input text contains any bad UTF-8 code sequences, then preg_replace will return an empty string, regardless of whether there were any matches.
This is due to the PCRE library returning an error code if the string contains bad UTF-8.
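
In current PHP versions the failure can be told apart from "no match": preg_replace() returns NULL, and preg_last_error() (available from PHP 5.2) reports the reason. A minimal sketch with a deliberately invalid byte:

```php
<?php
$bad = "abc\xff"; // the byte 0xFF can never appear in valid UTF-8

$result = preg_replace('/b/u', 'x', $bad);

// NULL means an error occurred; an unchanged string means no match.
if ($result === null && preg_last_error() === PREG_BAD_UTF8_ERROR) {
    echo "subject is not valid UTF-8";
}
```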

dani dot church at gmail dot youshouldknowthisone (2007-02-07 11:09:57)

Note that it is in most cases much more efficient to use preg_replace_callback(), with a named function or an anonymous function created with create_function(), instead of the /e modifier. When preg_replace() is called with the /e modifier, the interpreter must parse the replacement string into PHP code once for every replacement made, while preg_replace_callback() uses a function that only needs to be parsed once.
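
As an illustration, here is a tag-uppercasing replacement written with preg_replace_callback(). The sketch assumes PHP 5.3 or later, where an anonymous function can be passed directly. The sample HTML is made up.

```php
<?php
$html = '<p class="intro">Hello <b>world</b></p>';

$out = preg_replace_callback(
    '/(<\/?)(\w+)([^>]*>)/',
    function ($m) {
        // $m[0] is the whole match, $m[1]..$m[3] are the capture groups
        return $m[1] . strtoupper($m[2]) . $m[3];
    },
    $html
);

echo $out; // prints: <P class="intro">Hello <B>world</B></P>
```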

Alexey Lebedev (2006-09-07 02:21:24)

Wasted several hours because of this:

<?php
$str = 'It&#039;s a string with HTML entities';
preg_replace('~&#(\d+);~e', 'code2utf($1)', $str);
?>

This code is supposed to convert numeric HTML entities to UTF-8, and it does, with one small exception: it mishandles codes starting with &#0.

The reason is that code2utf will be called with the leading zero, exactly as the pattern matches it: code2utf(039).
And it does matter! PHP treats 039 as an octal number.
Try <?php print(011); ?>

Solution:
<?php preg_replace('~&#0*(\d+);~e', 'code2utf($1)', $str); ?>
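
The difference between an octal literal and a decimal cast can be checked directly, and with a callback the digits arrive as a string that can be cast explicitly. In this sketch html_entity_decode() merely stands in for code2utf(), which is not shown in this note. PHP 5.3 or later is assumed for the anonymous function.

```php
<?php
// The literal 011 is octal (decimal 9); the string '011' cast to int is decimal 11.
var_dump(011 === 9);          // bool(true)
var_dump((int) '011' === 11); // bool(true)

$str = 'It&#039;s a string with HTML entities';

$out = preg_replace_callback('~&#(\d+);~', function ($m) {
    // (int) '039' is 39: a string cast is always read as decimal
    return html_entity_decode('&#' . (int) $m[1] . ';', ENT_QUOTES, 'UTF-8');
}, $str);

echo $out; // prints: It's a string with HTML entities
```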

robvdl at gmail dot com (2006-04-21 05:15:54)

For those of you who have ever had the problem where clients paste text from MS Word into a CMS, and Word has placed all those fancy quotes throughout the text, breaking the XHTML validator: I have created a nice regular expression that replaces ALL high UTF-8 characters with HTML entities, such as &#8217;.

Note that most user examples on php.net I have read only replace selected characters, such as single and double quotes. This replaces all high characters, including Greek characters, Arabic characters, smilies, whatever.

It took me ages to get it down to just two regular expressions, but it handles all high-level characters properly.

<?php
// 2-byte UTF-8 sequences
$text = preg_replace('/([\xc0-\xdf].)/se', "'&#' . ((ord(substr('$1', 0, 1)) - 192) * 64 + (ord(substr('$1', 1, 1)) - 128)) . ';'", $text);
// 3-byte UTF-8 sequences
$text = preg_replace('/([\xe0-\xef]..)/se', "'&#' . ((ord(substr('$1', 0, 1)) - 224) * 4096 + (ord(substr('$1', 1, 1)) - 128) * 64 + (ord(substr('$1', 2, 1)) - 128)) . ';'", $text);
?>
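
A callback-based sketch of the same conversion, which also covers 4-byte sequences. It assumes PHP 5.3 or later and valid UTF-8 input (with the /u modifier, invalid input makes the call return NULL). The function name is made up for this example.

```php
<?php
// Hypothetical helper: replace every non-ASCII character with &#NNNN;
function utf8_to_numeric_entities($text) {
    return preg_replace_callback('/[^\x00-\x7F]/u', function ($m) {
        $bytes = array_values(unpack('C*', $m[0]));
        $n = count($bytes);
        // Payload bits of the lead byte: 2 bytes -> 0x1F, 3 -> 0x0F, 4 -> 0x07
        $code = $bytes[0] & (0xFF >> ($n + 1));
        // Each continuation byte contributes its low 6 bits
        for ($i = 1; $i < $n; $i++) {
            $code = ($code << 6) | ($bytes[$i] & 0x3F);
        }
        return '&#' . $code . ';';
    }, $text);
}

echo utf8_to_numeric_entities("It\xe2\x80\x99s"); // prints: It&#8217;s
```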

gabe at mudbuginfo dot com (2004-10-18 13:39:35)

It is useful to note that the 'limit' parameter, when used with 'pattern' and 'replace' which are arrays, applies to each individual pattern in the patterns array, and not the entire array.
<?php

$pattern = array('/one/', '/two/');
$replace = array('uno', 'dos');
$subject = "test one, one two, one two three";

echo preg_replace($pattern, $replace, $subject, 1);
?>

If limit were applied to the whole array (which it isn't), it would return:
test uno, one two, one two three

However, in reality this will actually return:
test uno, one dos, one two three
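
The count parameter (added in PHP 5.1.0) makes the per-pattern behaviour visible: with a limit of 1 and two patterns, two replacements are reported. A small sketch with the same data:

```php
<?php
$pattern = array('/one/', '/two/');
$replace = array('uno', 'dos');
$subject = "test one, one two, one two three";

// limit = 1 applies to each pattern separately; $count sums all replacements
$result = preg_replace($pattern, $replace, $subject, 1, $count);

echo $result, "\n"; // prints: test uno, one dos, one two three
echo $count, "\n";  // prints: 2
```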

steven -a-t- acko dot net (2004-02-08 09:45:09)

People using the /e modifier with preg_replace should be aware of the following weird behaviour. It is not a bug per se, but can cause bugs if you don't know it's there.
The example in the docs for /e suffers from this mistake in fact.
With /e, the replacement string is a PHP expression. So when you use a backreference in the replacement expression, you need to put the backreference inside quotes, or otherwise it would be interpreted as PHP code. Like the example from the manual for preg_replace:
preg_replace("/(<\/?)(\w+)([^>]*>)/e",
"'\\1'.strtoupper('\\2').'\\3'",
$html_body);
To make this easier, the data in a backreference with /e is run through addslashes() before being inserted in your replacement expression. So if you have the string
He said: "You're here"
It would become:
He said: \"You\'re here\"
...and be inserted into the expression.
However, if you put this inside a set of single quotes, PHP will not strip away all the slashes correctly! Try this:
print ' He said: \"You\'re here\" ';
Output: He said: \"You're here\"
This is because the sequence \" inside single quotes is not recognized as anything special, and it is output literally.
Using double-quotes to surround the string/backreference will not help either, because inside double-quotes, the sequence \' is not recognized and also output literally. And in fact, if you have any dollar signs in your data, they would be interpreted as PHP variables. So double-quotes are not an option.
The 'solution' is to manually fix it in your expression. It is easiest to use a separate processing function, and do the replacing there (i.e. use "my_processing_function('\\1')" or something similar as replacement expression, and do the fixing in that function).
If you surrounded your backreference by single-quotes, the double-quotes are corrupt:
$text = str_replace('\"', '"', $text);
People using preg_replace with /e should at least be aware of this.
I'm not sure how it would be best fixed in preg_replace. Because double-quotes are a really bad idea anyway (due to the variable expansion), I would suggest that preg_replace's auto-escaping is modified to suit the placement of backreferences inside single-quotes (which seemed to be the intention from the start, but was incorrectly applied).
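
For what it's worth, preg_replace_callback() sidesteps the whole issue, because the callback receives the matched text as ordinary strings and nothing is evaluated as code. A minimal sketch, assuming PHP 5.3 or later for the anonymous function:

```php
<?php
$text = 'He said: "You\'re here"';

// $m[1] holds the text between the quotes exactly as it appears in the
// subject: no slashes are added, so none have to be stripped.
$out = preg_replace_callback('/"([^"]*)"/', function ($m) {
    return '[' . strtoupper($m[1]) . ']';
}, $text);

echo $out; // prints: He said: [YOU'RE HERE]
```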

thewolf at pixelcarnage dot com (2003-10-23 07:38:58)

I got sick of trying to replace just a word, so I decided I would write my own string replacement code. When that code became far too big and a little faulty, I decided to use a simple preg_replace:

<?php
/**
 * Written by Rowan Lewis
 * $search(string), the string to be searched for
 * $replace(string), the string to replace $search
 * $subject(string), the string to be searched in
 */
function word_replace($search, $replace, $subject) {
    // Each run of letters (\0) is compared with $search and swapped if equal
    return preg_replace('/[a-zA-Z]+/e', '\'\0\' == \'' . $search . '\' ? \'' . $replace . '\': \'\0\';', $subject);
}
?>

I hope that this code helps someone!
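
A whole-word replacement can also be sketched without the /e modifier by anchoring the search term with \b. The function name is made up. Unlike the [a-zA-Z]+ pattern above, \b also treats digits and underscores as word characters.

```php
<?php
// Hypothetical variant using word boundaries instead of evaluated code.
function word_replace_b($search, $replace, $subject) {
    // preg_quote() makes $search literal; \b anchors it to word boundaries.
    // Note: "$1" or "\1" inside $replace would still be read as a backreference.
    return preg_replace('/\b' . preg_quote($search, '/') . '\b/', $replace, $subject);
}

echo word_replace_b('cat', 'dog', 'cat concat cat.'); // prints: dog concat dog.
```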
