Hi,
--- Siu Sun siusun@best-view.net wrote:
> Chinese, Japanese, Korean, etc. We call these "double-byte characters".
The nickname of this group is CJK. There is also CJKV, where the V stands for Vietnamese; Vietnamese was historically written with Chinese characters too.
> PS. I am a bit confused about UTF-8 and Unicode. Are UTF-8 and Unicode the same?
Unicode is just a general name: it is the name of the standard and of the organization behind it, not of a specific encoding. If you need to name a specific encoding, a popular one is UCS-2.
UCS-2 is what you described: it uses two bytes for every character. (UCS = Universal Character Set)
But, as you also mentioned, two-byte character processing causes many problems on legacy systems.
To solve this compatibility problem, Unicode introduced the UTF-8 encoding. UTF stands for UCS Transformation Format, which means UTF-8 re-encodes UCS characters as sequences of single bytes in an ASCII-compatible form. (So a plain ASCII file also counts as a UTF-8 file.)
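As a small illustration of one such problem, here is a minimal Python sketch. It assumes Python's "utf-16-be" codec as a stand-in for UCS-2, which gives the same bytes only for characters up to U+FFFF; the helper name ucs2_bytes is made up for this example.

```python
def ucs2_bytes(text):
    """Return big-endian two-byte units for text (hypothetical helper)."""
    for ch in text:
        # UCS-2 has no way to represent code points above U+FFFF.
        if ord(ch) > 0xFFFF:
            raise ValueError("not representable in UCS-2: %r" % ch)
    # For U+0000..U+FFFF, UTF-16 big-endian produces the same bytes as UCS-2.
    return text.encode("utf-16-be")

ascii_sample = ucs2_bytes("Hi")
cjk_sample = ucs2_bytes("中文")

print(ascii_sample.hex())        # 00480069
print(cjk_sample.hex())          # 4e2d6587
print(b"\x00" in ascii_sample)   # True
```

Even plain ASCII text picks up zero bytes in this form, and software that treats 0x00 as end-of-string will cut such text short. That is only one example of the legacy trouble, not a complete list.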
U+0000..U+007F: 0xxxxxxx
U+0080..U+07FF: 110xxxxx 10xxxxxx
U+0800..U+FFFF: 1110xxxx 10xxxxxx 10xxxxxx
With this scheme, characters from U+0000..U+007F (essentially ASCII 0x00..0x7F) do not change, characters from U+0080..U+07FF are encoded in two bytes, and characters from U+0800..U+FFFF are encoded in three bytes.
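The bit patterns above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name utf8_encode is made up here, it covers only the three ranges listed above, and it does not reject surrogate code points (U+D800..U+DFFF) as a real encoder would.

```python
def utf8_encode(code_point):
    """Encode one code point in U+0000..U+FFFF (hypothetical helper)."""
    if code_point < 0 or code_point > 0xFFFF:
        raise ValueError("this sketch only covers U+0000..U+FFFF")
    if code_point <= 0x7F:
        # 0xxxxxxx: ASCII passes through unchanged.
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 110xxxxx 10xxxxxx: top 5 bits, then low 6 bits.
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    # 1110xxxx 10xxxxxx 10xxxxxx: top 4 bits, middle 6 bits, low 6 bits.
    return bytes([0xE0 | (code_point >> 12),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])

for ch in ("A", "é", "中"):
    print("U+%04X -> %s" % (ord(ch), utf8_encode(ord(ch)).hex()))
# U+0041 -> 41
# U+00E9 -> c3a9
# U+4E2D -> e4b8ad
```

For these three characters the result matches what Python's built-in ch.encode("utf-8") returns, which is an easy way to check the sketch against a real encoder.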
For more details, see:
http://www.unicode.org/glossary/
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
----
In brief:
* Unicode is not the official name of any encoding.
* Unicode != UTF-8
:)
Art
===== ---- FREE SOFTWARE --> free as in "freedom" http://www.fsf.org/philosophy/free-sw.html ---- SIIT student community http://siit.net Sirindhorn Int'l Inst of Tech, Thammasat U