Re: I love character encoding!
Tonight’s goal: Make a simple PHP class.
- Input: a URL pointing to an HTML document.
- Output: a UTF-8 version, regardless of what encoding it’s really in.
Sounds easy, right?
mb_detect_encoding and mb_convert_encoding plus a bit of string magic. Shouldn’t be that complex - love to see your implementation.
Shouldn’t be, but it is.
- mb_detect_encoding doesn’t always detect properly. It works statistically, and it’s imperfect.
- mb_convert_encoding is generally better than iconv, but iconv supports more input encodings.
- Both mb_convert_encoding and iconv are only as good as your input-encoding detection. If you tell them that the input is e.g. GB2312, you better be reasonably sure that it’s not something else.
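To make those three points concrete, here is a rough sketch of how the pieces might fit together. It is not the class from the original post: the function name to_utf8, the $declared parameter (a charset taken from e.g. the Content-Type header or a meta tag) and the default candidate list are all assumptions made up for illustration. It trusts a declared charset first, only guesses when there is none, and falls back to iconv when mbstring doesn't know the encoding name.

```php
<?php
// Hypothetical helper, not the original poster's code: convert raw bytes to UTF-8.
// $declared:   charset name from the HTTP header or <meta> tag, if one was found.
// $candidates: encodings to try when guessing; this default list is an assumption
//              and should be tuned to the pages you actually expect.
function to_utf8(string $bytes, ?string $declared = null, array $candidates = ['UTF-8', 'ISO-8859-1']): string
{
    // Prefer an explicitly declared charset over statistical guessing.
    $from = $declared;

    if ($from === null) {
        // Strict mode rejects candidates for which the bytes are invalid.
        $guess = mb_detect_encoding($bytes, $candidates, true);
        if ($guess === false) {
            throw new RuntimeException('Could not detect the input encoding');
        }
        $from = $guess;
    }

    try {
        $out = mb_convert_encoding($bytes, 'UTF-8', $from);
    } catch (\ValueError $e) {
        // PHP 8 throws when mbstring does not recognise the encoding name.
        $out = false;
    }

    if ($out === false) {
        // Fall back to iconv, which supports more input encodings.
        // The @ hides the warning/notice iconv emits when it fails.
        $out = @iconv($from, 'UTF-8', $bytes);
    }

    if ($out === false) {
        throw new RuntimeException("Could not convert from $from to UTF-8");
    }

    return $out;
}
```

For example, to_utf8("caf\xE9", 'ISO-8859-1') gives back the UTF-8 bytes "caf\xC3\xA9". Note that this does nothing about the third point: if $declared is wrong, the output is garbage, and the code has no way of knowing.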