When Bad Things Happen to Good Characters

Get to Know a Character

It is useful to know your characters in general, but more practical to know one character well. My character is é, an "e" with an acute accent: character code 233 decimal (E9 hex) in both Latin-1 and Unicode.

Inserting Characters

There are many ways it can be inserted into a document:

What Could Possibly Go Wrong?

If é is UTF-8 encoded, but displayed without decoding, it looks like this:
é
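A minimal Python 3 sketch of this effect, assuming that "displayed without decoding" means the UTF-8 bytes are read as Latin-1, one byte per character. The variable names are illustrative only.

```python
# The character is written as an escape so the example does not depend
# on the encoding of this source file. U+00E9 is e with an acute accent.
char = "\u00e9"
assert ord(char) == 233

# UTF-8 encodes it as two bytes: 0xC3 0xA9.
utf8_bytes = char.encode("utf-8")

# Reading those two bytes as Latin-1 yields two separate characters,
# U+00C3 (A with tilde) and U+00A9 (copyright sign).
misread = utf8_bytes.decode("latin-1")

print(utf8_bytes.hex())                # c3a9
print([hex(ord(c)) for c in misread])  # ['0xc3', '0xa9']
```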
The first 128 characters in the Latin-1 character set (the same as ASCII) are simply represented as themselves in UTF-8. The second half of the Latin-1 characters is split into two groups. The first half of the non-ASCII Latin-1 characters (codes 128 to 191) are represented by themselves, preceded by code 194 decimal or C2 hex, so the UTF-8 encoding for character code 191 (decimal), ¿, viewed as Latin-1, is
Â¿
The second half of the non-ASCII Latin-1 characters (codes 192 to 255) are represented by a different character, the one whose code is 64 lower, preceded by code 195 decimal or C3 hex. So, when looking at UTF-8 encodings of Latin-1 characters, if you see Â or Ã where you do not expect it, the text has probably been UTF-8 encoded too many times. Multiple extra encodings have a pattern to them:
0 é
1 é
2 ÃÂ©
3 ÃÂÃÂ©
4 ÃÂÃÂÃÂÃÂ©
5 you get the idea
Note: If you see boxes in the characters above, the font in use is missing that character. The only fix is to install or switch to a font that has it. The fonts used in window titles, status bars, and JavaScript alerts are often more limited than those used elsewhere, so the "alert", "title", and "status" buttons in the Character Conversion Corner can be used to test characters in those contexts.
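The C2/C3 rule and the over-encoding pattern can be checked with a short Python 3 sketch. It assumes each extra encoding is a round of "encode as UTF-8, then misread the bytes as Latin-1"; the helper name over_encode is hypothetical. The result is printed as hex bytes because some of the intermediate characters are invisible control codes.

```python
def over_encode(text: str, times: int) -> str:
    """Apply `times` extra rounds of 'encode as UTF-8, misread as Latin-1'."""
    for _ in range(times):
        text = text.encode("utf-8").decode("latin-1")
    return text

# Codes 128-191 keep their own byte after a 0xC2 lead byte;
# codes 192-255 become 0xC3 followed by (code - 64).
for code in range(128, 256):
    lead, trail = chr(code).encode("utf-8")
    if code < 192:
        assert (lead, trail) == (0xC2, code)
    else:
        assert (lead, trail) == (0xC3, code - 64)

# Each extra round doubles the number of characters.
for n in range(4):
    garbled = over_encode("\u00e9", n)
    print(n, len(garbled), garbled.encode("latin-1").hex())
# 0 1 e9
# 1 2 c3a9
# 2 4 c383c2a9
# 3 8 c383c283c382c2a9
```

The 83 and 82 bytes in the longer rows are Latin-1 control codes, which most fonts render as nothing or as a box.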

Too few encodings can have a bad effect that looks different. When é is not UTF-8 encoded, it can appear like this very high numbered character:

Progressive under-encoding can result in a question mark being displayed.
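Under-encoding can be sketched in Python 3 as follows. This assumes the reader substitutes U+FFFD (code 65533) for invalid UTF-8, which is what Python's "replace" error handler does; other software may react differently. The variable names are illustrative.

```python
# The character stored as a single Latin-1 byte (0xE9), not UTF-8 encoded.
latin1_bytes = "\u00e9".encode("latin-1")

# 0xE9 on its own is not valid UTF-8, so a lenient decoder substitutes
# U+FFFD REPLACEMENT CHARACTER for it.
decoded = latin1_bytes.decode("utf-8", errors="replace")
print(hex(ord(decoded)))  # 0xfffd

# Latin-1 has no U+FFFD, so writing the text back out with a lenient
# encoder degrades the character to a plain question mark.
degraded = decoded.encode("latin-1", errors="replace")
print(degraded)  # b'?'
```

Once the character has become a question mark the original byte is gone, so this kind of damage cannot be undone afterwards.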

Diagnostic Reference

You are now ready to diagnose UTF-8 encoding problems (e.g., with é):
Symptom Diagnosis
é no problems
é too much UTF-8 encoding, or viewing UTF-8 encoded text with Latin-1 encoding
é much too much UTF-8 encoding
too little UTF-8 encoding
? something bad happened to this character
  wild animals have eaten this character
𐀓 if you see a box, the font in use is missing this character. Firefox 3's boxes contain the hexadecimal value for the missing character, but it's still just a missing character.
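The "too much encoding" rows of this table can be turned into a rough Python 3 check. This is only a sketch: the function name count_extra_encodings and its limit parameter are hypothetical, and it assumes the damage came from UTF-8 bytes being misread as Latin-1. Text that legitimately contains a sequence such as Ã followed by © would be "repaired" incorrectly, so treat the result as a hint.

```python
from typing import Tuple

def count_extra_encodings(text: str, limit: int = 10) -> Tuple[int, str]:
    """Undo 'UTF-8 read as Latin-1' rounds; return (rounds undone, repaired text)."""
    rounds = 0
    while rounds < limit:
        try:
            candidate = text.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # no further round can be undone
        if candidate == text:
            break  # pure ASCII: nothing left to undo
        text = candidate
        rounds += 1
    return rounds, text

print(count_extra_encodings("\u00e9"))        # 0 rounds: no problems
print(count_extra_encodings("\u00c3\u00a9"))  # 1 round: too much UTF-8 encoding
```

A result of 0 corresponds to "no problems", 1 to "too much UTF-8 encoding", and 2 or more to "much too much".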

Background Information from Wikipedia