Hi,
--- Siu Sun <siusun(a)best-view.net> wrote:
> Chinese, Japanese, Korean, etc. These we call
> "double-byte characters".
The nickname for this group is CJK.
There is also CJKV, where the V stands for Vietnamese:
Vietnamese was also once written with Chinese characters.
> PS. I am somewhat confused about UTF-8 and Unicode.
> Are UTF-8 and Unicode the same?
No. Unicode is a general name for the character set
standard, and also the name of the organization behind
it; it is not the name of a specific encoding.
If you need the name of a specific encoding,
a popular one is UCS-2.
UCS-2 works the way you described,
using two bytes for each character.
(UCS = Universal Character Set)
But, as you also mentioned, processing two-byte
characters causes many problems for legacy systems.
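A minimal Python sketch of the UCS-2 idea, assuming big-endian byte order (UCS-2 text can also be stored little-endian); the function name `ucs2_encode` is hypothetical. It also shows one well-known legacy problem: every ASCII letter gains a 0x00 byte, which C-style string code treats as the end of the string.

```python
def ucs2_encode(text: str) -> bytes:
    """Encode text as big-endian UCS-2: exactly two bytes per character.

    UCS-2 can only represent U+0000..U+FFFF, so anything above is rejected.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError(f"U+{cp:04X} is outside the UCS-2 range")
        out += cp.to_bytes(2, "big")  # high byte first (big-endian)
    return bytes(out)


encoded = ucs2_encode("Hi")
print(encoded.hex(" "))  # 00 48 00 69
# Code that stops at the first 0x00 byte (like C's strlen) sees an empty string.
print(encoded.split(b"\x00")[0])  # b''
```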
To solve this compatibility problem,
Unicode also defines the UTF-8 encoding.
UTF stands for UCS Transformation Format. UTF-8
transforms UCS characters into sequences of single
bytes, and characters in the ASCII range keep exactly
the same single-byte form they have in ASCII.
(So a plain ASCII file also counts as a UTF-8
file.)
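A quick Python check of the claim that a plain ASCII file is also a UTF-8 file; the helper name `utf8_matches_ascii` and the sample bytes are made up for illustration.

```python
def utf8_matches_ascii() -> bool:
    """Check that every ASCII code (0x00..0x7F) is the same single byte in UTF-8."""
    return all(chr(cp).encode("utf-8") == bytes([cp]) for cp in range(0x80))


sample = b"plain ASCII text\n"  # bytes as they might be read from an ASCII file
print(utf8_matches_ascii())                              # True
print(sample.decode("utf-8") == sample.decode("ascii"))  # True
```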
U+0000..U+007F: 0xxxxxxx
U+0080..U+07FF: 110xxxxx 10xxxxxx
U+0800..U+FFFF: 1110xxxx 10xxxxxx 10xxxxxx
With this scheme, characters in U+0000..U+007F
(exactly ASCII 0x00..0x7F) are unchanged and take
one byte, characters in U+0080..U+07FF take two
bytes, and characters in U+0800..U+FFFF take
three bytes.
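A minimal Python sketch of the three patterns in the table above, limited (like the table) to U+0000..U+FFFF; the function name `utf8_encode_bmp` is hypothetical. It rejects the surrogate range U+D800..U+DFFF, which UTF-8 does not allow.

```python
def utf8_encode_bmp(cp: int) -> bytes:
    """Encode one code point in U+0000..U+FFFF using the three UTF-8 patterns."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("this sketch only covers U+0000..U+FFFF")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points are not valid in UTF-8")
    if cp <= 0x7F:                        # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    return bytes([0xE0 | (cp >> 12),      # 1110xxxx 10xxxxxx 10xxxxxx
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for ch in "A\u00e9\u0e01":  # ASCII letter, Latin letter e-acute, Thai KO KAI
    print(f"U+{ord(ch):04X} -> {utf8_encode_bmp(ord(ch)).hex(' ')}")
# U+0041 -> 41
# U+00E9 -> c3 a9
# U+0E01 -> e0 b8 81
```

Comparing against Python's built-in `chr(cp).encode("utf-8")` is an easy way to check the sketch. Note that a Thai character such as U+0E01 falls in the three-byte range. Real UTF-8 also has a four-byte form for code points above U+FFFF, which is outside what this table (and UCS-2) covers.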
For more details, see:
http://www.unicode.org/glossary/
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
----
In brief:
* Unicode is not the official name of any encoding.
* Unicode != UTF-8
:)
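To make "Unicode != UTF-8" concrete, here is a small Python illustration: Unicode assigns the character one code point, while each encoding turns it into different bytes. Python has no "ucs-2" codec, so UTF-16 stands in for it here; for characters in U+0000..U+FFFF outside the surrogate range, UTF-16 produces the same bytes as UCS-2.

```python
text = "\u0e01"  # THAI CHARACTER KO KAI, Unicode code point U+0E01

# The code point is what Unicode assigns; the bytes depend on the chosen encoding.
print(f"code point: U+{ord(text):04X}")
for encoding in ("utf-8", "utf-16-be", "utf-16-le"):
    print(f"{encoding:>9}: {text.encode(encoding).hex(' ')}")
```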
Art
=====
----
FREE SOFTWARE --> free as in "freedom"
http://www.fsf.org/philosophy/free-sw.html
----
SIIT student community
http://siit.net
Sirindhorn Int'l Inst of Tech, Thammasat U