from: http://www.unicodetools.com/
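PHP: Use utf8_decode($data) to convert from UTF-8 to ISO-8859-1 and utf8_encode($data) to convert from ISO-8859-1 to UTF-8. Both functions are deprecated as of PHP 8.2; mb_convert_encoding($data, 'UTF-8', 'ISO-8859-1') and its reverse do the same job. Some native PHP functions such as strtolower(), strtoupper() and ucfirst() do not always work correctly with UTF-8 strings, because they operate on single bytes. Possible solutions: convert the string to ISO-8859-1 (Latin-1) first; use the multibyte equivalents from the mbstring extension, such as mb_strtolower(); or add setlocale(LC_CTYPE, 'C'); to your code, which stops these functions from altering the bytes of multi-byte characters, although non-ASCII letters are then left unchanged. Make sure not to save your PHP files with a UTF-8 BOM (byte-order mark). The BOM bytes are sent to the browser ahead of your own output, so they can show up as stray characters on your pages and can cause "headers already sent" errors.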
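To make the PHP notes concrete, here is a minimal Python sketch (Python is used for all examples here, not PHP). The helper names latin1_to_utf8 and strip_utf8_bom are hypothetical: the first does roughly what utf8_encode() does, and the second removes the three-byte UTF-8 BOM (EF BB BF). The last lines show why byte-oriented case functions fail on UTF-8: lowercasing the raw bytes only touches ASCII letters, while lowercasing the decoded text also handles accented letters.

```python
import codecs

def latin1_to_utf8(data: bytes) -> bytes:
    """Roughly what PHP's utf8_encode() does: ISO-8859-1 bytes in, UTF-8 bytes out."""
    # Every byte value is a valid ISO-8859-1 character, so this decode never fails.
    return data.decode("iso-8859-1").encode("utf-8")

def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 byte-order mark (EF BB BF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

word = "ÉCOLE"
# Byte-wise lowercasing only changes ASCII letters; the two bytes of "É" are untouched.
bytewise = word.encode("utf-8").lower()   # b"\xc3\x89cole"
# Lowercasing the decoded text handles the non-ASCII letter as well.
textwise = word.lower()                   # "école"
```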
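Perl: use Encode qw(from_to); from_to($data, "iso-8859-1", "utf8"); converts $data in place. (Encode's "utf8" is Perl's lax variant; "UTF-8" applies strict checks.) Note that Encode::is_utf8($data) only tells you whether Perl's internal UTF-8 flag is set on the string; it does not by itself check that the bytes are valid UTF-8. To test validity, try a strict decode such as Encode::decode("UTF-8", $data, Encode::FB_CROAK | Encode::LEAVE_SRC) inside an eval block and check whether it dies.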
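Checking whether a byte string is valid UTF-8 amounts to attempting a strict decode and seeing whether it fails. A minimal Python sketch of that idea, plus a conversion helper; is_valid_utf8 and convert_encoding are hypothetical names, and unlike Perl's from_to, convert_encoding returns new bytes instead of modifying its argument in place.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8."""
    try:
        data.decode("utf-8")  # errors="strict" is the default, so malformed input raises
    except UnicodeDecodeError:
        return False
    return True

def convert_encoding(data: bytes, source: str, target: str) -> bytes:
    """Re-encode data from one character set to another (raises on unconvertible input)."""
    return data.decode(source).encode(target)
```

For example, is_valid_utf8(b"caf\xe9") is False: a lone 0xE9 byte starts a multi-byte UTF-8 sequence that never finishes.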
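Python: In Python 3, decoding turns bytes into a (Unicode) str and encoding turns a str back into bytes. To read UTF-8 data: text = data.decode("utf-8"); to convert back to bytes in UTF-8 or any other character set: data = text.encode("utf-8"). In Python 2 the decoding step is written unicode(data, "utf-8").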
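A short Python 3 round trip showing which direction each call goes; the sample string and variable names are only for illustration.

```python
# Bytes as they might arrive from an ISO-8859-1 (Latin-1) source.
raw = "Grüße".encode("iso-8859-1")

# Decode: bytes -> str. You must know (or guess) the source encoding.
text = raw.decode("iso-8859-1")

# Encode: str -> bytes in the target character set.
utf8_bytes = text.encode("utf-8")

print(len(raw), len(utf8_bytes))  # 5 7: "ü" and "ß" take two bytes each in UTF-8
```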
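.NET C#: Use the encoding classes in the System.Text namespace. For example, to convert a byte array utf8Bytes that holds UTF-8 data into ASCII: ASCIIEncoding ascii = new ASCIIEncoding(); UTF8Encoding utf8 = new UTF8Encoding(); byte[] asciiBytes = Encoding.Convert(utf8, ascii, utf8Bytes); By default, characters with no ASCII equivalent are replaced with "?".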
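Converting to ASCII is lossy. The following Python sketch mimics that behaviour by substituting "?" for every character the target cannot represent, which is also the default replacement used by .NET's ASCIIEncoding. The function name transcode is hypothetical; it is neither a .NET nor a Python API.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes between encodings, substituting '?' for unrepresentable characters."""
    # errors="replace" on decode turns malformed input into U+FFFD;
    # the encode step replaces that (or any other unmappable character) with '?'
    # when the target encoding cannot represent it.
    text = data.decode(src, errors="replace")
    return text.encode(dst, errors="replace")

utf8_bytes = "naïve café".encode("utf-8")
ascii_bytes = transcode(utf8_bytes, "utf-8", "ascii")
print(ascii_bytes)  # b'na?ve caf?'
```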
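Java: Use String.getBytes(Charset) to encode a string into bytes in a given character set (for example, s.getBytes(StandardCharsets.UTF_8)) and new String(bytes, Charset) to decode bytes back into a string. Use the CharsetEncoder and CharsetDecoder classes when you need more control, such as handling unmappable characters or converting streamed data.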
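Stateful encoder and decoder objects such as Java's CharsetEncoder and CharsetDecoder matter when data arrives in chunks, because a multi-byte character can be split across a chunk boundary. As a Python analogue (not a translation of the Java API), the standard codecs module provides incremental decoders that carry an unfinished sequence over to the next chunk; decode_chunks is a hypothetical helper name.

```python
import codecs

def decode_chunks(chunks, encoding="utf-8"):
    """Decode an iterable of byte chunks, keeping partial multi-byte sequences between chunks."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = [decoder.decode(chunk) for chunk in chunks]
    # Flush: raises UnicodeDecodeError if the input ended in the middle of a character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)

data = "smørrebrød".encode("utf-8")
chunks = [data[:3], data[3:]]  # the first chunk ends inside the two-byte "ø"
print(decode_chunks(chunks))   # smørrebrød
```

Decoding data[:3] on its own with bytes.decode() would raise an error, because the first byte of "ø" has no continuation byte yet.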
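MySQL: MySQL uses character sets at every level. There are connection settings such as character_set_connection and collation_connection, and you can specify a character set for the database, for each table and for each column. To convert a character set inside a query, use CONVERT(... USING ...): SELECT CONVERT(latin1field USING utf8). Note that utf8 is an alias for the 3-byte utf8mb3, which cannot store characters outside the Basic Multilingual Plane (such as emoji); use utf8mb4 for full UTF-8. If table joins become slow after you convert the character set of tables or columns, make sure that all joined ID columns use the same character set and COLLATE setting; when they differ, MySQL may be unable to use the indexes on those columns for the join.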
HTML: You can specify your preferred character set with the content-type meta tag (example: <meta http-equiv="content-type" content="text/html; charset=UTF-8">; in HTML5 the shorter <meta charset="UTF-8"> is equivalent). To avoid problems with differing character sets, it is sometimes easier to convert your special characters to plain-ASCII HTML character references (for example, é becomes &eacute; or &#233;). HTML-encoded special characters also display correctly in old browsers that do not honor the content-type meta tag. A special-character-to-HTML-code converter can do this conversion for you.
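Python's standard library can produce the plain-ASCII form directly: the "xmlcharrefreplace" error handler turns every character the target encoding cannot hold into a decimal character reference. The helper name to_html_ascii is hypothetical, and it escapes &, < and > first so the result is still safe HTML.

```python
import html

def to_html_ascii(text: str) -> str:
    """Return text as pure-ASCII HTML, using &#NNN; references for non-ASCII characters."""
    escaped = html.escape(text, quote=False)  # & -> &amp;, < -> &lt;, > -> &gt;
    return escaped.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(to_html_ascii("Café <b>€5</b>"))
# Caf&#233; &lt;b&gt;&#8364;5&lt;/b&gt;
```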
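Unix systems: Use the character set conversion tool iconv: iconv -f ISO-8859-1 -t UTF-8 filename.txt > filename-utf8.txt. iconv writes the converted text to standard output, so redirect it to a new file; redirecting to the input file itself would truncate it before it is read.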
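Where iconv is not available, the same file conversion can be sketched in Python. convert_file and transcode_bytes are hypothetical helpers; working on raw bytes instead of text mode keeps line endings exactly as they were, as iconv does.

```python
from pathlib import Path

def transcode_bytes(data: bytes, from_enc: str = "iso-8859-1", to_enc: str = "utf-8") -> bytes:
    """Decode data with from_enc and re-encode it with to_enc (strict: raises on errors)."""
    return data.decode(from_enc).encode(to_enc)

def convert_file(src: str, dst: str, from_enc: str = "iso-8859-1", to_enc: str = "utf-8") -> None:
    """Write a converted copy of src to dst; src itself is left untouched."""
    Path(dst).write_bytes(transcode_bytes(Path(src).read_bytes(), from_enc, to_enc))

# Example (hypothetical file names):
# convert_file("filename.txt", "filename-utf8.txt")
```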
Windows systems: Most good text editors offer Unicode support. In UltraEdit, for example, use File → Conversions → ASCII to UTF-8 or ASCII to Unicode (16-Bit).