Character Set

A character set is an encoding system that tells computers how to recognize characters, including letters, numbers, punctuation marks, and whitespace.

In the past, countries developed their own character sets to suit the languages they used, such as the Kanji JIS codes (e.g. Shift-JIS and EUC-JP) for Japanese, Big5 for Traditional Chinese, and KOI8-R for Russian. Over time, however, Unicode became the most widely accepted character set because of its universal language support.

If the wrong character set is used to read a text (for example, Unicode for an article encoded in Big5), you may see nothing but broken characters, which are called Mojibake.
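The mismatch described above can be reproduced in a few lines. The following is a minimal sketch in Python, not a definitive recipe: the sample string is hypothetical, and UTF-8 is assumed as the Unicode encoding used to misread the Big5 bytes.

```python
# Hypothetical sample text; any Traditional Chinese string would do.
text = "中文"

# Store the text as Big5 bytes, as an older document might be.
raw = text.encode("big5")

# Read the same bytes with the wrong character set (UTF-8 is assumed here).
# errors="replace" substitutes U+FFFD for byte sequences that are not valid
# UTF-8, so the call returns garbled text instead of raising an exception.
garbled = raw.decode("utf-8", errors="replace")

# Read the bytes with the character set they were actually written in.
correct = raw.decode("big5")

print(repr(garbled))  # broken characters (mojibake)
print(correct)        # the original text
```

Without `errors="replace"`, Python would raise `UnicodeDecodeError` for these bytes rather than display broken characters. Many viewers and browsers behave more like the lenient variant shown here, which is why a wrong character set tends to show up as garbled text rather than as an error.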

See also