Character Sets and Document Encoding
Thursday 2nd July, 2009
I've just spent an embarrassingly long time trying to solve a problem with normalizing a string (i.e. replacing accented and other non-ASCII characters with their closest ASCII equivalents).
I needed to take the string "L'Autopsie Phénoménale De Dieu" and replace each é with a plain e, so that the string could be used as part of a URL.
I tried all sorts of things, from the obvious (str_replace / strtr functions) to the ridiculously complicated (all sorts of string encoding, decoding, regex replacements) but nothing seemed to work as expected.
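Eventually I discovered that the problem was nothing to do with the PHP code - it was in fact the document encoding. All pages on my site are served with the standard UTF-8 charset, but the PHP files that were actually performing the string replacement were saved with a 'Western European' encoding type. In a single-byte encoding such as ISO-8859-1 an é is stored as the one byte 0xE9, whereas in UTF-8 it is the two bytes 0xC3 0xA9, so the é typed into my source file never matched the é in the data. Re-saving the files as UTF-8 solved the problem.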
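To make the mismatch concrete, here is a minimal sketch. It assumes the 'Western European' encoding was ISO-8859-1 (Windows-1252 stores é the same way), and it writes the bytes as escapes so the example behaves the same whatever encoding the file itself is saved in. The variable names are just for illustration.

```php
<?php
// Byte sequences for "é", written as escapes so this file's own encoding does not matter.
$utf8E   = "\xC3\xA9"; // é in UTF-8 (two bytes)
$latin1E = "\xE9";     // é in ISO-8859-1 (one byte); assumed "Western European" encoding

// The title as it arrives from a UTF-8 page.
$title = "L'Autopsie Ph{$utf8E}nom{$utf8E}nale De Dieu";

// What a Latin-1 source file effectively asks for: search for the single byte 0xE9.
$wrong = str_replace($latin1E, 'e', $title);

// What a UTF-8 source file asks for: search for the two bytes 0xC3 0xA9.
$right = str_replace($utf8E, 'e', $title);

var_dump($wrong === $title); // bool(true): nothing matched, the string is unchanged
echo $right, "\n";           // L'Autopsie Phenomenale De Dieu
```

The first call fails silently, which is why the replace functions looked broken when they were really doing exactly what they were told.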
In the end I used a great function by allixsenos from the PHP.net comments area.
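For anyone who just wants the general idea, the following is a rough sketch of that kind of function. It is not the allixsenos code: `slugify` is a hypothetical name, the character map covers only a handful of French letters, and it assumes the input is UTF-8.

```php
<?php
// Hypothetical helper, not the allixsenos function: maps a few accented
// letters to ASCII and builds a URL-friendly slug from UTF-8 input.
function slugify(string $text): string
{
    // Keys are UTF-8 byte sequences written as escapes, so the result does
    // not depend on the encoding this file is saved in.
    $map = [
        "\xC3\xA9" => 'e', // é
        "\xC3\xA8" => 'e', // è
        "\xC3\xA0" => 'a', // à
        "\xC3\xA7" => 'c', // ç
        "\xC3\xB4" => 'o', // ô
    ];
    $ascii = strtr($text, $map);

    // Lowercase, then collapse every run of non-alphanumerics into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($ascii));

    return trim($slug, '-');
}

echo slugify("L'Autopsie Ph\xC3\xA9nom\xC3\xA9nale De Dieu"), "\n";
// l-autopsie-phenomenale-de-dieu
```

Any character missing from the map ends up as a hyphen, so a real version needs a much longer table.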