Character Sets and Document Encoding
Monday 2nd August, 2010
I've just spent an embarrassingly long time trying to solve a problem with normalizing a string (i.e. replacing accented characters with their plain ASCII equivalents).
I needed to take the string "L'Autopsie Phénoménale De Dieu" and remove the é's, replacing them with e's - this was so the string could be used as part of a URL.
I tried all sorts of things, from the str_replace / strstr functions to all sorts of string encoding, decoding and regex replacements, but nothing seemed to work as expected.
Eventually I discovered that the problem had nothing to do with the PHP code - it was in fact the document encoding. All pages on my site are served with the standard UTF-8 charset, but the PHP files that were actually performing the string replacements were saved in a 'Western European' encoding, so any accented character typed into those files was stored as different bytes from the same character in the UTF-8 page data, and the replacements never matched. Re-saving the files as UTF-8 solved the problem.
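To make the mismatch concrete, here is a minimal sketch of what was probably going on. It assumes the 'Western European' encoding was ISO-8859-1 or Windows-1252 (both store é as the single byte 0xE9), and the variable names are just for illustration. The bytes are written as escapes so the example behaves the same however the file itself is saved.

```php
<?php
// The title as it arrives from a UTF-8 page: each é is the two bytes 0xC3 0xA9.
$title = "L'Autopsie Ph\xC3\xA9nom\xC3\xA9nale De Dieu";

// What a literal 'é' becomes in a source file saved as Western European
// (assumed ISO-8859-1 / Windows-1252): the single byte 0xE9.
$needleLatin1 = "\xE9";

// What the same literal becomes once the source file is saved as UTF-8.
$needleUtf8 = "\xC3\xA9";

// The single byte 0xE9 never occurs in the UTF-8 string, so nothing is replaced.
$unchanged = str_replace($needleLatin1, 'e', $title);

// With matching encodings the replacement works as expected.
$fixed = str_replace($needleUtf8, 'e', $title);

var_dump($unchanged === $title); // bool(true)
echo $fixed, "\n";               // L'Autopsie Phenomenale De Dieu
```

str_replace works on raw bytes, so it has no way of knowing that the two needles are meant to be the same character.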
In the end I used a great function by allixsenos from the PHP.net comments area.
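For reference, the general idea looks something like the sketch below. This is not allixsenos's function, just a rough illustration under my own assumptions: slugify_sketch is a made-up name, and the character map only covers a handful of accented letters, so a real version would need a much longer list.

```php
<?php
// Hypothetical helper, not the function from the PHP.net comments.
function slugify_sketch(string $text): string
{
    // UTF-8 byte sequences mapped to ASCII equivalents (deliberately incomplete).
    $map = [
        "\xC3\xA0" => 'a', // à
        "\xC3\xA2" => 'a', // â
        "\xC3\xA7" => 'c', // ç
        "\xC3\xA8" => 'e', // è
        "\xC3\xA9" => 'e', // é
        "\xC3\xAA" => 'e', // ê
        "\xC3\xB4" => 'o', // ô
        "\xC3\xBC" => 'u', // ü
    ];

    // Swap the accented characters for their plain equivalents.
    $ascii = strtr($text, $map);

    // Lowercase, then collapse every run of other characters into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($ascii));

    // Drop any hyphens left at the start or end.
    return trim($slug, '-');
}

echo slugify_sketch("L'Autopsie Ph\xC3\xA9nom\xC3\xA9nale De Dieu"), "\n";
// l-autopsie-phenomenale-de-dieu
```

The map keys are written as byte escapes, so the function expects UTF-8 input but does not depend on the encoding the file is saved in.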