Wednesday, March 11, 2009

PHP: reformat string for ISO Latin-1 character set



First of all, what is the ISO Latin character set? You'll find a brief list of the ISO 8859-1 (Latin-1) characters at
http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html

Now suppose you have a string that contains the HTML numeric references for those characters. For example, instead of a comma (,) you have &#44; inside your string, or you have &#42; instead of the asterisk (*).

You can convert a string containing these Latin-1 character references using a simple regular expression. Inside the regular expression I've used the /e modifier, which makes preg_replace evaluate the replacement string as PHP code, so that I can pass the captured groups ($1, $2, etc.) from the regex match to a function. Finally, I convert the captured character code to a character using PHP's chr() function.

Here is the code that cleans up your string by replacing the ISO 8859-1 character references with the actual characters:

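// Sample input: the punctuation is written as numeric character references
$html_source = 'Only &#36;1&#44;00&#33;&#33; what the f&#42;&#42;&#42;&#63;&#63;';
// /e evaluates the replacement as PHP code, so chr() receives the captured code
$html_source = preg_replace('/&#(\d\d);/e', 'chr(\'$1\')', $html_source);
print "$html_source\n";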
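
A note for newer PHP versions: the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so the snippet above only runs on older installations. The same idea can be sketched with preg_replace_callback, which hands each match to a function instead of evaluating a string. This is only a sketch under a few assumptions: the helper name decode_numeric_entities is made up for illustration, the pattern is widened from \d\d to \d{2,3} so that three-digit Latin-1 codes are matched too, and codes above 255 are left alone because chr() produces a single byte.

```php
<?php
// Hypothetical helper name; it does not appear in the original snippet.
function decode_numeric_entities(string $text): string
{
    // Assumption: \d{2,3} widens the original \d\d so that codes
    // 100-255 are matched as well as 10-99.
    return preg_replace_callback(
        '/&#(\d{2,3});/',
        function (array $m): string {
            $code = (int) $m[1];
            // chr() returns one byte, so anything above 255 is kept as-is.
            return $code <= 255 ? chr($code) : $m[0];
        },
        $text
    );
}

$html_source = 'Only &#36;1&#44;00&#33;&#33; what the f&#42;&#42;&#42;&#63;&#63;';
print decode_numeric_entities($html_source) . "\n";
// prints: Only $1,00!! what the f***??
```

PHP's built-in html_entity_decode() also decodes numeric references like these, along with named entities, so it may be the simpler choice when you don't need control over which codes get converted.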

Thanks for reading the article.




1 comment:

fromvega said...

Why not simply use html_entity_decode?