Sitekeepers - Webmaster's blog

Monday, December 26, 2005

ISO to UTF-8 Tutorial

While searching the web for a solution to an encoding problem I faced on a Greek site, I found a very useful PHP class that changes the charset of a variable.

First of all, you have to download the class from here:
http://mikolajj.republika.pl/files/ConvertCharset/ConvertCharset_v1.0.zip

Secondly, unzip the file and upload the ConvertTables directory and the ConvertCharset.class.php file to your /www/ folder.

Open the PHP script whose output encoding you want to change.
Add the following code as line #1:

<?php
ob_start(); // start buffering everything the script prints
include("ConvertCharset.class.php"); // load the conversion class
?>

Then, go to the last line of your script and add:

<?php
$contents = ob_get_contents(); // store buffer in $contents
ob_end_clean(); // delete output buffer and stop buffering
$FromCharset = "iso-8859-7"; // the charset the page is written in
$ToCharset = "utf-8"; // the charset you want to send to the browser
$text = new ConvertCharset();
$contents = $text->Convert($contents, $FromCharset, $ToCharset);
echo $contents; // print the converted page
?>
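
If you just want to try the buffer-then-convert idea without installing the class, it can be sketched with PHP's built-in iconv() function. This is not the class's own method, only a stand-in: convert_buffer() is a name I made up, and the sample bytes are simply "αβγ" written in ISO-8859-7.

```php
<?php
// Hypothetical helper: converts a captured buffer with the built-in iconv()
// instead of the ConvertCharset class.
function convert_buffer(string $contents, string $from = 'ISO-8859-7', string $to = 'UTF-8'): string
{
    $converted = @iconv($from, $to, $contents);
    // If iconv() fails, fall back to the untouched buffer.
    return $converted === false ? $contents : $converted;
}

ob_start();                    // start capturing output
echo "\xE1\xE2\xE3";           // "αβγ" as raw ISO-8859-7 bytes
$contents = ob_get_clean();    // fetch the buffer and stop buffering

$utf8 = convert_buffer($contents);
echo $utf8;                    // the same three letters, now UTF-8 encoded
```

The pattern is the same as above: everything the script prints is held back, converted in one go, and only then sent out.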

Don't forget to change the charset in your page's meta tag to UTF-8, otherwise the browser will not display the converted text properly.
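
As a rough sketch of what that change could look like, the snippet below prints a UTF-8 meta tag and, when it is still possible, also sends the matching HTTP header. The helper name utf8_meta_tag() is my own invention, and your page will probably have its meta tag written directly in the HTML instead.

```php
<?php
// Hypothetical helper: returns the meta tag that declares UTF-8.
function utf8_meta_tag(): string
{
    return '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
}

// Declare the charset in the HTTP header too, if headers can still be sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo "<head>\n" . utf8_meta_tag() . "\n</head>\n";
```

Sending the header as well is optional for this tutorial, but it avoids the case where the server announces one charset and the meta tag another.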

Here is the list of charsets you can work with. The main requirement is that a character has to exist in both character sets, otherwise the class will return an error.

WINDOWS
windows-1250 - Central Europe
windows-1251 - Cyrillic
windows-1252 - Latin I
windows-1253 - Greek
windows-1254 - Turkish
windows-1255 - Hebrew
windows-1256 - Arabic
windows-1257 - Baltic
windows-1258 - Viet Nam
cp874 - Thai - this file is also for DOS


DOS
cp437 - Latin US
cp737 - Greek
cp775 - BaltRim
cp850 - Latin1
cp852 - Latin2
cp855 - Cyrillic
cp857 - Turkish
cp860 - Portuguese
cp861 - Iceland
cp862 - Hebrew
cp863 - Canada
cp864 - Arabic
cp865 - Nordic
cp866 - Cyrillic Russian (this is the one used in IE as "Cyrillic (DOS)")
cp869 - Greek2


MAC (Apple)
x-mac-cyrillic
x-mac-greek
x-mac-icelandic
x-mac-ce
x-mac-roman


ISO (Unix/Linux)
iso-8859-1
iso-8859-2
iso-8859-3
iso-8859-4
iso-8859-5
iso-8859-6
iso-8859-7
iso-8859-8
iso-8859-9
iso-8859-10
iso-8859-11
iso-8859-12
iso-8859-13
iso-8859-14
iso-8859-15
iso-8859-16


MISCELLANEOUS
gsm0338 (ETSI GSM 03.38)
cp037
cp424
cp500
cp856
cp875
cp1006
cp1026
koi8-r (Cyrillic)
koi8-u (Cyrillic Ukrainian)
nextstep
us-ascii
us-ascii-quotes

DSP implementation for NeXT
stdenc
symbol
zdingbat
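
About the requirement mentioned before the list, that a character has to exist in both charsets: here is a small way to check it in advance. It is only a sketch using PHP's built-in iconv(), not the class itself, and is_representable() is a hypothetical name. It converts a UTF-8 string to the target charset and back, and reports whether the text survived the round trip.

```php
<?php
// Hypothetical helper: true if every character of $utf8Text exists in $charset.
function is_representable(string $utf8Text, string $charset): bool
{
    // Try to encode into the target charset; "@" hides the notice on failure.
    $encoded = @iconv('UTF-8', $charset, $utf8Text);
    if ($encoded === false) {
        return false;
    }
    // Decode back and compare, in case a substitute character was used.
    $decoded = @iconv($charset, 'UTF-8', $encoded);
    return $decoded === $utf8Text;
}

$alpha = "\xCE\xB1"; // the Greek letter alpha in UTF-8

var_dump(is_representable($alpha, 'ISO-8859-7')); // Greek charset
var_dump(is_representable($alpha, 'ISO-8859-1')); // Latin charset without Greek letters
```

The Greek letter exists in iso-8859-7 but not in iso-8859-1, so only the first conversion can succeed.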

Good luck!