多字节字符串 函数
在线手册:中文  英文

mb_convert_encoding

(PHP 4 >= 4.0.6, PHP 5)

mb_convert_encoding转换字符的编码

说明

string mb_convert_encoding ( string $str , string $to_encoding [, mixed $from_encoding ] )

string 类型 str 的字符编码从可选的 from_encoding 转换到 to_encoding

参数

str

要编码的 string

to_encoding

str 要转换成的编码类型。

from_encoding

在转换前通过字符代码名称来指定。它可以是一个 array 也可以是逗号分隔的枚举列表。 如果没有提供 from_encoding,则会使用内部(internal)编码。

参见支持的编码

返回值

编码后的 string

范例

Example #1 mb_convert_encoding() 例子

<?php
/* 转换内部编码为 SJIS */
$str mb_convert_encoding($str"SJIS");

/* 将 EUC-JP 转换成 UTF-7 */
$str mb_convert_encoding($str"UTF-7""EUC-JP");

/* 从 JIS, eucjp-win, sjis-win 中自动检测编码,并转换 str 到 UCS-2LE */
$str mb_convert_encoding($str"UCS-2LE""JIS, eucjp-win, sjis-win");

/* "auto" 扩展成 "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
$str mb_convert_encoding($str"EUC-JP""auto");
?>

参见


多字节字符串 函数
在线手册:中文  英文

用户评论:

josip at cubrad dot com (2013-06-27 23:24:17)

For my last project I needed to convert several CSV files from Windows-1250 to UTF-8, and after several days of searching around I found a function that is partially solved my problem, but it still has not transformed all the characters. So I made ??this:
function w1250_to_utf8($text) {
// map based on:
// http://konfiguracja.c0.pl/iso02vscp1250en.html
// http://konfiguracja.c0.pl/webpl/index_en.html#examp
// http://www.htmlentities.com/html/entities/
$map = array(
chr(0x8A) => chr(0xA9),
chr(0x8C) => chr(0xA6),
chr(0x8D) => chr(0xAB),
chr(0x8E) => chr(0xAE),
chr(0x8F) => chr(0xAC),
chr(0x9C) => chr(0xB6),
chr(0x9D) => chr(0xBB),
chr(0xA1) => chr(0xB7),
chr(0xA5) => chr(0xA1),
chr(0xBC) => chr(0xA5),
chr(0x9F) => chr(0xBC),
chr(0xB9) => chr(0xB1),
chr(0x9A) => chr(0xB9),
chr(0xBE) => chr(0xB5),
chr(0x9E) => chr(0xBE),
chr(0x80) => '&euro;',
chr(0x82) => '&sbquo;',
chr(0x84) => '&bdquo;',
chr(0x85) => '&hellip;',
chr(0x86) => '&dagger;',
chr(0x87) => '&Dagger;',
chr(0x89) => '&permil;',
chr(0x8B) => '&lsaquo;',
chr(0x91) => '&lsquo;',
chr(0x92) => '&rsquo;',
chr(0x93) => '&ldquo;',
chr(0x94) => '&rdquo;',
chr(0x95) => '&bull;',
chr(0x96) => '&ndash;',
chr(0x97) => '&mdash;',
chr(0x99) => '&trade;',
chr(0x9B) => '&rsquo;',
chr(0xA6) => '&brvbar;',
chr(0xA9) => '&copy;',
chr(0xAB) => '&laquo;',
chr(0xAE) => '&reg;',
chr(0xB1) => '&plusmn;',
chr(0xB5) => '&micro;',
chr(0xB6) => '&para;',
chr(0xB7) => '&middot;',
chr(0xBB) => '&raquo;',
);
return html_entity_decode(mb_convert_encoding(strtr($text, $map), 'UTF-8', 'ISO-8859-2'), ENT_QUOTES, 'UTF-8');
}

urko at wegetit dot eu (2012-09-11 18:17:29)

If you are trying to generate a CSV (with extended chars) to be opened at Exel for Mac, the only that worked for me was:
<?php mb_convert_encoding$CSV'Windows-1252''UTF-8'); ?>

I also tried this:

<?php
//Separado OK, chars MAL
iconv('MACINTOSH''UTF8'$CSV);
//Separado MAL, chars OK
chr(255).chr(254).mb_convert_encoding$CSV'UCS-2LE''UTF-8');
?>

But the first one didn't show extended chars correctly, and the second one, did't separe fields correctly

qdb at kukmara dot ru (2011-11-07 07:44:54)

mb_substr and probably several other functions works faster in ucs-2 than in utf-8. and utf-16 works slower than utf-8. here is test, ucs-2 is near 50 times faster than utf-8, and utf-16 is near 6 times slower than utf-8 here:
<?php
header
('Content-Type: text/html; charset=utf-8');
mb_internal_encoding('utf-8');

$s='укгез??ш?хз?х?шк2049??лдябчсячмииюсит.июб?рарэ'
.'лдэфв??у?й?уй??у034928348539857?шаыдларораш??рлоавы';
$s.=$s;
$s.=$s;
$s.=$s;
$s.=$s;
$s.=$s;
$s.=$s;
$s.=$s;

$t1=microtime(true);
$i=0;
while(
$i<mb_strlen($s)){
    
$a=mb_substr($s,$i,2);
    
$i+=2;
    if(
$i==10)echo$a.'. ';
    
//echo$a.'. ';
}
echo
$i.'. ';
echo(
microtime(true)-$t1);

echo
'<br>';
$s=mb_convert_encoding($s,'UCS-2','utf8');
mb_internal_encoding('UCS-2');
$t1=microtime(true);
$i=0;
while(
$i<mb_strlen($s)){
    
$a=mb_substr($s,$i,2);
    
$i+=2;
    if(
$i==10)echo mb_convert_encoding($a,'utf8','ucs2').'. ';
    
//echo$a.'. ';
}
echo
$i.'. ';
echo(
microtime(true)-$t1);

echo
'<br>';
$s=mb_convert_encoding($s,'utf-16','ucs-2');
mb_internal_encoding('utf-16');
$t1=microtime(true);
$i=0;
while(
$i<mb_strlen($s)){
    
$a=mb_substr($s,$i,2);
    
$i+=2;
    if(
$i==10)echo mb_convert_encoding($a,'utf8','utf-16').'. ';
    
//echo$a.'. ';
}
echo
$i.'. ';
echo(
microtime(true)-$t1);

?>
output:
?х. 12416. 1.71738100052
?х. 12416. 0.0211279392242
?х. 12416. 11.2330229282

Alexanderdotbooneatmnsudotedu (2011-06-09 10:50:43)

I was having trouble parsing an Excel document to html, this seems to do the trick for character encoding.

<?php
function decode_characters($info)
{
    
$info mb_convert_encoding($info"HTML-ENTITIES""UTF-8");
    
$info preg_replace('~^(&([a-zA-Z0-9]);)~',htmlentities('${1}'),$info);
    return(
$info);
}
?>

alvaro at demogracia dot com (2011-04-06 03:37:32)

You can use the HTML-ENTITIES encoding to insert arbitrary Unicode characters, e.g. "U+25B6 (BLACK RIGHT-POINTING TRIANGLE)":

<?php
$triangle 
mb_convert_encoding('&#x25B6;''UTF-8''HTML-ENTITIES');
?>

Be aware though that older PHP versions do not seem to support hexadecimal notation (&#x25B6;). In such case, use decimal notation (&#9654;):

<?php
$triangle 
mb_convert_encoding('&#9654;''UTF-8''HTML-ENTITIES');
?>

gullevek at gullevek dot org (2010-08-25 00:27:44)

If you want to convert japanese to ISO-2022-JP it is highly recommended to use ISO-2022-JP-MS as the target encoding instead. This includes the extended character set and avoids ? in the text. For example the often used "1 in a circle" ① will be correctly converted then.

regrunge at hotmail dot it (2010-05-14 08:00:29)

I've been trying to find the charset of a norwegian (with a lot of ?, ?, ?) txt file written on a Mac, i've found it in this way:

<?php
$text 
"A strange string to pass, maybe with some ?, ?, ? characters.";

foreach(
mb_list_encodings() as $chr){
        echo 
mb_convert_encoding($text'UTF-8'$chr)." : ".$chr."<br>";    
 } 
?>

The line that looks good, gives you the encoding it was written in.

Hope can help someone

Daniel Trebbien (2009-07-23 11:25:38)

Note that `mb_convert_encoding($val, 'HTML-ENTITIES')` does not escape '\'', '"', '<', '>', or '&'.

me at gsnedders dot com (2009-06-18 15:06:42)

It appears that when dealing with an unknown "from encoding" the function will both throw an E_WARNING and proceed to convert the string from ISO-8859-1 to the "to encoding".

alexandrefelipemuller at gmail dot com (2009-02-18 12:06:20)

I used this function insted mb_convert_encoding, because mbstring wasn't enabled at my comercial server. It only suports utf7, 8 e iso 8859-1:

<?php
function my_convert_encoding($string,$to,$from)
{
        
// Convert string to ISO_8859-1
        
if ($from == "UTF-8")
                
$iso_string utf8_decode($string);
        else
                if (
$from == "UTF7-IMAP")
                        
$iso_string imap_utf7_decode($string);
                else
                        
$iso_string $string;

        
// Convert ISO_8859-1 string to result coding
        
if ($to == "UTF-8")
                return(
utf8_encode($iso_string));
        else
                if (
$to == "UTF7-IMAP")
                        return(
imap_utf7_encode($iso_string));
                else
                        return(
$iso_string);
}
?>

chzhang at gmail dot com (2009-01-05 00:34:31)

instead of ini_set(), you can try this
mb_substitute_character("none");

francois at bonzon point com (2008-11-10 17:05:38)

aaron, to discard unsupported characters instead of printing a ?, you might as well simply set the configuration directive:

mbstring.substitute_character = "none"

in your php.ini. Be sure to include the quotes around none. Or at run-time with

<?php
ini_set
('mbstring.substitute_character'"none");
?>

aaron at aarongough dot com (2008-11-07 08:24:46)

My solution below was slightly incorrect, so here is the correct version (I posted at the end of a long day, never a good idea!)

Again, this is a quick and dirty solution to stop mb_convert_encoding from filling your string with question marks whenever it encounters an illegal character for the target encoding. 

<?php
function convert_to $source$target_encoding )
    {
    
// detect the character encoding of the incoming file
    
$encoding mb_detect_encoding$source"auto" );
       
    
// escape all of the question marks so we can remove artifacts from
    // the unicode conversion process
    
$target str_replace"?""[question_mark]"$source );
       
    
// convert the string to the target encoding
    
$target mb_convert_encoding$target$target_encoding$encoding);
       
    
// remove any question marks that have been introduced because of illegal characters
    
$target str_replace"?"""$target );
       
    
// replace the token string "[question_mark]" with the symbol "?"
    
$target str_replace"[question_mark]""?"$target );
   
    return 
$target;
    }
?>

Hope this helps someone! (Admins should feel free to delete my previous, incorrect, post for clarity)
-A

Edward (2008-09-16 03:54:55)

If mb_convert_encoding doesn't work for you, and iconv gives you a headache, you might be interested in this free class I found. It can convert almost any charset to almost any other charset. I think it's wonderful and I wish I had found it earlier. It would have saved me tons of headache.
I use it as a fail-safe, in case mb_convert_encoding is not installed. Download it from http://mikolajj.republika.pl/
This is not my own library, so technically it's not spamming, right? ;)
Hope this helps.

StigC (2008-08-13 15:38:47)

For the php-noobs (like me) - working with flash and php.

Here's a simple snippet of code that worked great for me, getting php to show special Danish characters, from a Flash email form:

<?php
// Name Escape
$escName mb_convert_encoding($_POST["Name"], "ISO-8859-1""UTF-8");

// message escape
$escMessage mb_convert_encoding($_POST["Message"], "ISO-8859-1""UTF-8");

// Headers.. and so on...
?>

nospam at nihonbunka dot com (2008-05-15 18:51:34)

rodrigo at bb2 dot co dot jp wrote that inconv works better than mb_convert_encoding, I find that when converting from uft8 to shift_jis
$conv_str = mb_convert_encoding($str,$toCS,$fromCS);
works while
$conv_str = iconv($fromCS,$toCS.'//IGNORE',$str);
removes tildes from $str.

katzlbtjunk at hotmail dot com (2008-01-25 04:36:30)

Clean a string for use as filename by simply replacing all unwanted characters with underscore (ASCII converts to 7bit). It removes slightly more chars than necessary. Hope its useful.
$fileName = 'Test:!"$%&/()=?????ü<<';
echo strtr(mb_convert_encoding($fileName,'ASCII'),
' ,;:?*#!§$%&/(){}<>=`?|\\\'"',
'____________________________');

rodrigo at bb2 dot co dot jp (2008-01-15 03:47:52)

For those who can?t use mb_convert_encoding() to convert from one charset to another as a metter of lower version of php, try iconv().
I had this problem converting to japanese charset:
$txt=mb_convert_encoding($txt,'SJIS',$this->encode);
And I could fix it by using this:
$txt = iconv('UTF-8', 'SJIS', $txt);
Maybe it?s helpfull for someone else! ;)

mightye at gmail dot com (2007-11-13 09:24:48)

To petruzanauticoyahoo?com!ar

If you don't specify a source encoding, then it assumes the internal (default) encoding.  ? is a multi-byte character whose bytes in your configuration default (often iso-8859-1) would actually mean ?±.  mb_convert_encoding() is upgrading those characters to their multi-byte equivalents within UTF-8.

Try this instead:
<?php
print mb_convert_encoding"?""UTF-8""UTF-8" );
?>
Of course this function does no work (for the most part - it can actually be used to strip characters which are not valid for UTF-8).

volker at machon dot biz (2007-09-24 21:05:34)

Hey guys. For everybody who's looking for a function that is converting an iso-string to utf8 or an utf8-string to iso, here's your solution:
public function encodeToUtf8($string) {
return mb_convert_encoding($string, "UTF-8", mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));
}
public function encodeToIso($string) {
return mb_convert_encoding($string, "ISO-8859-1", mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));
}
For me these functions are working fine. Give it a try

aofg (2007-08-21 18:49:55)

When converting Japanese strings to ISO-2022-JP or JIS on PHP >= 5.2.1, you can use "ISO-2022-JP-MS" instead of them.
Kishu-Izon (platform dependent) characters are converted correctly with the encoding, as same as with eucJP-win or with SJIS-win.

David Hull (2006-12-20 10:52:40)

As an alternative to Johannes's suggestion for converting strings from other character sets to a 7bit representation while not just deleting latin diacritics, you might try this:

<?php
$text 
iconv($from_enc'US-ASCII//TRANSLIT'$text);
?>

The only disadvantage is that it does not convert "?" to "ae", but it handles punctuation and other special characters better.
-- 
David

phpdoc at jeudi dot de (2006-09-05 06:46:41)

I\&#039;d like to share some code to convert latin diacritics to their
traditional 7bit representation, like, for example,
- &agrave;,&ccedil;,&eacute;,&icirc;,... to a,c,e,i,...
- &szlig; to ss
- &auml;,&Auml;,... to ae,Ae,...
- &euml;,... to e,...
(mb_convert \&quot;7bit\&quot; would simply delete any offending characters).
I might have missed on your country\&#039;s typographic
conventions--correct me then.
&lt;?php
/**
* @args string $text line of encoded text
* string $from_enc (encoding type of $text, e.g. UTF-8, ISO-8859-1)
*
* @returns 7bit representation
*/
function to7bit($text,$from_enc) {
$text = mb_convert_encoding($text,\&#039;HTML-ENTITIES\&#039;,$from_enc);
$text = preg_replace(
array(\&#039;/&szlig;/\&#039;,\&#039;/&amp;(..)lig;/\&#039;,
\&#039;/&amp;([aouAOU])uml;/\&#039;,\&#039;/&amp;(.)[^;]*;/\&#039;),
array(\&#039;ss\&#039;,\&quot;$1\&quot;,\&quot;$1\&quot;.\&#039;e\&#039;,\&quot;$1\&quot;),
$text);
return $text;
}
?&gt;
Enjoy :-)
Johannes
==
[EDIT BY danbrown AT php DOT net: Author provided the following update on 27-FEB-2012.]
==
An addendum to my &quot;to7bit&quot; function referenced below in the notes.
The function is supposed to solve the problem that some languages require a different 7bit rendering of special (umlauted) characters for sorting or other applications. For example, the German &szlig; ligature is usually written &quot;ss&quot; in 7bit context. Dutch &yuml; is typically rendered &quot;ij&quot; (not &quot;y&quot;).
The original function works well with word (alphabet) character entities and I&#039;ve seen it used in many places. But non-word entities cause funny results:
E.g., &quot;&copy;&quot; is rendered as &quot;c&quot;, &quot;&shy;&quot; as &quot;s&quot; and &quot;&amp;rquo;&quot; as &quot;r&quot;.
The following version fixes this by converting non-alphanumeric characters (also chains thereof) to &#039;_&#039;.
&lt;?php
/**
* @args string $text line of encoded text
* string $from_enc (encoding type of $text, e.g. UTF-8, ISO-8859-1)
*
* @returns 7bit representation
*/
function to7bit($text,$from_enc) {
$text = preg_replace(/W+/,&#039;_&#039;,$text);
$text = mb_convert_encoding($text,&#039;HTML-ENTITIES&#039;,$from_enc);
$text = preg_replace(
array(&#039;/&szlig;/&#039;,&#039;/&amp;(..)lig;/&#039;,
&#039;/&amp;([aouAOU])uml;/&#039;,&#039;/&yuml;/&#039;,&#039;/&amp;(.)[^;]*;/&#039;),
array(&#039;ss&#039;,&quot;$1&quot;,&quot;$1&quot;.&#039;e&#039;,&#039;ij&#039;,&quot;$1&quot;),
$text);
return $text;
}
?&gt;
Enjoy again,
Johannes

mac.com@nemo (2006-07-08 07:38:47)

For those wanting to convert from $set to MacRoman, use iconv():

<?php

$string 
iconv('UTF-8''macintosh'$string);

?>

('macintosh' is the IANA name for the MacRoman character set.)

eion at bigfoot dot com (2006-02-20 16:54:52)

many people below talk about using 
<?php
    mb_convert_encode
($s,'HTML-ENTITIES','UTF-8');
?>
to convert non-ascii code into html-readable stuff.  Due to my webserver being out of my control, I was unable to set the database character set, and whenever PHP made a copy of my $s variable that it had pulled out of the database, it would convert it to nasty latin1 automatically and not leave it in it's beautiful UTF-8 glory.

So [insert korean characters here] turned into ?????.

I found myself needing to pass by reference (which of course is deprecated/nonexistent in recent versions of PHP)
so instead of
<?php
    mb_convert_encode
(&$s,'HTML-ENTITIES','UTF-8');
?>
which worked perfectly until I upgraded, so I had to use
<?php
    call_user_func_array
('mb_convert_encoding', array(&$s,'HTML-ENTITIES','UTF-8'));
?>

Hope it helps someone else out

Tom Class (2005-11-11 07:35:53)

Why did you use the php html encode functions? mbstring has it's own Encoding which is (as far as I tested it) much more usefull:
HTML-ENTITIES
Example:
$text = mb_convert_encoding($text, 'HTML-ENTITIES', "UTF-8");

Stephan van der Feest (2005-09-09 04:47:41)

To add to the Flash conversion comment below, here's how I convert back from what I've stored in a database after converting from Flash HTML text field output, in order to load it back into a Flash HTML text field:
function htmltoflash($htmlstr)
{
return str_replace("&lt;br /&gt;","\n",
str_replace("<","&lt;",
str_replace(">","&gt;",
mb_convert_encoding(html_entity_decode($htmlstr),
"UTF-8","ISO-8859-1"))));
}

Stephan van der Feest (2005-09-09 03:50:54)

Here's a tip for anyone using Flash and PHP for storing HTML output submitted from a Flash text field in a database or whatever.
Flash submits its HTML special characters in UTF-8, so you can use the following function to convert those into HTML entity characters:
function utf8html($utf8str)
{
return htmlentities(mb_convert_encoding($utf8str,"ISO-8859-1","UTF-8"));
}

jamespilcher1 - hotmail (2004-02-01 19:55:57)

be careful when converting from iso-8859-1 to utf-8.
even if you explicitly specify the character encoding of a page as iso-8859-1(via headers and strict xml defs), windows 2000 will ignore that and interpret it as whatever character set it has natively installed.
for example, i wrote char #128 into a page, with char encoding iso-8859-1, and it displayed in internet explorer (& mozilla) as a euro symbol.
it should have displayed a box, denoting that char #128 is undefined in iso-8859-1. The problem was it was displaying in "Windows: western europe" (my native character set).
this led to confusion when i tried to convert this euro to UTF-8 via mb_convert_encoding()
IE displays UTF-8 correctly- and because PHP correctly converted #128 into a box in UTF-8, IE would show a box.
so all i saw was mb_convert_encoding() converting a euro symbol into a box. It took me a long time to figure out what was going on.

lanka at eurocom dot od dot ua (2003-02-07 08:03:56)

Another sample of recoding without MultiByte enabling.
(Russian koi->win, if input in win-encoding already, function recode() returns unchanged string)

<?php
  
// 0 - win
  // 1 - koi
  
function detect_encoding($str) {
    
$win 0;
    
$koi 0;

    for(
$i=0$i<strlen($str); $i++) {
      if( 
ord($str[$i]) >224 && ord($str[$i]) < 255$win++;
      if( 
ord($str[$i]) >192 && ord($str[$i]) < 223$koi++;
    }

    if( 
$win $koi ) {
      return 
1;
    } else return 
0;

  }

  
// recodes koi to win
  
function koi_to_win($string) {

    
$kw = array(128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183,  184185186187188189190191254224225246228229244227245232233234235236237238239255240241242243230226252251231248253249247250222192193214196197212195213200201202203204205206207223208209210211198194220219199216221217215218);
    
$wk = array(128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183,  184185186187188189190191225226247231228229246250233234235236237238239240242,  243244245230232227254251253255249248252224241193194215199196197214218201202203204205206207208210211212213198200195222219221223217216220192209);

    
$end strlen($string);
    
$pos 0;
    do {
      
$c ord($string[$pos]);
      if (
$c>128) {
        
$string[$pos] = chr($kw[$c-128]);
      }

    } while (++
$pos $end);

    return 
$string;
  }

  function 
recode($str) {

    
$enc detect_encoding($str);
    if (
$enc==1) {
      
$str koi_to_win($str);
    }

    return 
$str;
  }
?>

易百教程