UNICODE 7

NAME

Unicode - the unified 16-bit super character set

DESCRIPTION

The international standard ISO 10646 defines the Universal Character Set (UCS). UCS contains all characters of all other character set standards. It also guarantees round-trip compatibility: conversion tables can be built such that no information is lost when a string is converted between UCS and any other character set.

UCS contains the characters required to represent almost all known languages. Apart from the many languages that use extensions of the Latin script, this includes the following scripts and languages: Greek, Cyrillic, Hebrew, Arabic, Armenian, Georgian, Japanese, Chinese, Hiragana, Katakana, Korean, Hangul, Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Thai, Lao, Bopomofo, and a number of others. Work is under way to include further scripts such as Tibetan, Khmer, Runic, Ethiopian, Hieroglyphics, various Indo-European languages, and many others. When the standard was published in 1993, it was not yet clear how most of these latter scripts could best be encoded. In addition to the characters required by these scripts, a large number of graphical, typographical, mathematical, and scientific symbols have been included, such as those provided by TeX, PostScript, MS-DOS, Macintosh, Videotext, OCR, and many word processing systems, together with special codes that guarantee round-trip compatibility with all other existing character set standards.

The UCS standard (ISO 10646) describes a 31-bit character set architecture. So far, however, only the first 65534 code positions (0x0000 to 0xfffd), which are called the Basic Multilingual Plane (BMP), have been assigned characters, and it is expected that only very exotic characters (for example, Hieroglyphics) needed for special scientific purposes will ever get a place outside the 16-bit BMP.

The UCS characters 0x0000 to 0x007f are identical to those of the classic US-ASCII character set, and the characters in the range 0x0000 to 0x00ff are identical to those of the ISO 8859-1 Latin-1 character set.
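The ranges described above can be sketched in C. This is only an illustration: the names ucs_range, ucs_classify, and latin1_to_ucs are hypothetical and do not belong to any library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not part of any library. */
enum ucs_range {
    UCS_ASCII,      /* 0x0000-0x007f, identical to US-ASCII */
    UCS_LATIN1,     /* 0x0080-0x00ff, identical to ISO 8859-1 */
    UCS_BMP,        /* remaining 16-bit code positions of the BMP */
    UCS_BEYOND_BMP  /* code positions above 0xffff */
};

/* Classify a UCS code value by the ranges named in the text. */
enum ucs_range ucs_classify(uint32_t code)
{
    assert(code <= 0x7fffffffu);  /* UCS is a 31-bit code space */
    if (code <= 0x007f)
        return UCS_ASCII;
    if (code <= 0x00ff)
        return UCS_LATIN1;
    if (code <= 0xffff)
        return UCS_BMP;
    return UCS_BEYOND_BMP;
}

/* An ISO 8859-1 byte maps to the UCS code with the same numeric value,
 * so the conversion is a plain widening of the byte. */
uint32_t latin1_to_ucs(unsigned char byte)
{
    return (uint32_t)byte;
}
```

For example, ucs_classify(0x41) yields UCS_ASCII and ucs_classify(0xc4) yields UCS_LATIN1.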

COMBINING CHARACTERS

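Some code points in UCS have been assigned to combining characters. These are similar to the non-spacing accent keys on a typewriter. A combining character just adds an accent to the previous character. The most important accented characters have codes of their own in UCS; the combining character mechanism, however, makes it possible to add accents and other diacritical marks to any character. Combining characters always follow the character they modify. For example, the German character Umlaut-A ("Latin capital letter A with diaeresis") can be represented either by the precomposed UCS code 0x00c4, or by a normal "Latin capital letter A" followed by a "combining diaeresis": 0x0041 0x0308.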
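The two spellings of Umlaut-A can be compared with a short C sketch. It assumes, for simplicity, that only the block 0x0300 to 0x036f holds combining characters (UCS has more), and the function and array names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: only the Combining Diacritical Marks block
 * (0x0300-0x036f) is treated as combining in this sketch. */
int is_combining_mark(uint32_t code)
{
    return code >= 0x0300 && code <= 0x036f;
}

/* Count characters as a reader perceives them: a base character and
 * the combining marks that follow it are counted once. */
size_t count_base_chars(const uint32_t *codes, size_t len)
{
    size_t n = 0;
    assert(codes != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (!is_combining_mark(codes[i]))
            n++;
    return n;
}

/* The two representations of Umlaut-A given in the text. */
const uint32_t a_umlaut_precomposed[] = { 0x00c4 };
const uint32_t a_umlaut_decomposed[]  = { 0x0041, 0x0308 };
```

Both arrays yield a count of one base character, although the first holds one code value and the second holds two.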

IMPLEMENTATION LEVELS

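Since not all systems are expected to support advanced mechanisms such as combining characters, ISO 10646 specifies the following three implementation levels of UCS:
Level 1

Combining characters and Hangul Jamo characters (a special, more complicated encoding of the Korean script, in which Hangul syllables are coded as two or three subcharacters) are not supported.

Level 2

Like level 1, except that some combining characters are allowed in some scripts (for example, Hebrew, Arabic, Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Thai, and Lao).

Level 3

All UCS characters are supported.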

The Unicode 1.1 standard published by the Unicode Consortium contains exactly the UCS Basic Multilingual Plane at implementation level 3, as described in ISO 10646. Unicode 1.1 also adds some semantic definitions for some characters to the definitions of ISO 10646.
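As a rough illustration of what a level 1 restriction means in practice, the following C sketch rejects strings that contain combining characters or Hangul Jamo. The ranges used (0x0300 to 0x036f for combining marks, 0x1100 to 0x11ff for Hangul Jamo) are simplifying assumptions, not the exact sets defined by ISO 10646, and ucs_fits_level1 is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Helper: is c within the inclusive range lo..hi? */
static int in_range(uint32_t c, uint32_t lo, uint32_t hi)
{
    return c >= lo && c <= hi;
}

/* Return 1 if the string uses only characters this sketch considers
 * valid at implementation level 1, 0 otherwise.
 * Assumption: combining characters are approximated by 0x0300-0x036f
 * and Hangul Jamo by 0x1100-0x11ff; the standard lists the real sets. */
int ucs_fits_level1(const uint32_t *codes, size_t len)
{
    assert(codes != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (in_range(codes[i], 0x0300, 0x036f))
            return 0;  /* combining mark */
        if (in_range(codes[i], 0x1100, 0x11ff))
            return 0;  /* Hangul Jamo */
    }
    return 1;
}
```

Under these assumptions the precomposed code 0x00c4 passes the check, while the sequence 0x0041 0x0308 does not.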

UNICODE UNDER LINUX

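Under Linux, only the BMP at implementation level 1 is currently used, in order to keep the implementation complexity of combining characters low. The higher implementation levels are more suitable for special word processing formats than for a generic system character set. Under Linux, the C type wchar_t is a signed 32-bit integer type whose values are interpreted as UCS4 codes.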

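The locale setting specifies whether the system character encoding is, for example, UTF-8 or ISO 8859-1. Library functions such as wctomb, mbtowc, or wprintf can be used to convert between internal wchar_t characters and strings and the system character encoding.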
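A minimal sketch of such a conversion with the standard wctomb function follows. The helper name wide_to_locale is hypothetical, and the result depends on the locale selected by the caller.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Convert a wchar_t string to the current locale's multibyte encoding.
 * Returns the number of bytes written, not counting the terminating
 * null byte, or (size_t)-1 if a character cannot be represented or
 * the output buffer is too small. */
size_t wide_to_locale(const wchar_t *ws, char *out, size_t outsz)
{
    char buf[MB_LEN_MAX];
    size_t used = 0;

    assert(ws != NULL && out != NULL && outsz > 0);
    (void)wctomb(NULL, 0);  /* reset any shift state */

    for (; *ws != L'\0'; ws++) {
        int n = wctomb(buf, *ws);
        if (n < 0 || used + (size_t)n >= outsz)
            return (size_t)-1;
        memcpy(out + used, buf, (size_t)n);
        used += (size_t)n;
    }
    out[used] = '\0';
    return used;
}
```

A program would typically call setlocale(LC_CTYPE, "") first so that the encoding follows the user's environment. In a UTF-8 locale, the code value 0x00c4 is then written as the two bytes 0xc3 0x84.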

PRIVATE AREA

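In the BMP, the range 0xe000 to 0xf8ff is reserved by the standard for private use and will never be assigned any characters. For the Linux community, this private area has been subdivided further into the range 0xe000 to 0xefff, which can be used individually by any end user, and the Linux zone in the range 0xf000 to 0xf8ff, where extensions are coordinated among all Linux users. The registry of the characters assigned to the Linux zone is currently maintained by H. Peter Anvin <Peter.Anvin@linux.org>, Yggdrasil Computing, Inc. The zone contains some DEC VT100 graphics characters that are missing in Unicode, so that these characters can be accessed directly in the console font buffer, as well as characters required by a few exotic scripts such as Klingon.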
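The subdivision of the private area can be expressed as a small C classifier. The enum and function names are hypothetical; only the numeric ranges come from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration. */
enum pua_zone {
    PUA_NONE,      /* not in the private area */
    PUA_END_USER,  /* 0xe000-0xefff, free for individual end users */
    PUA_LINUX      /* 0xf000-0xf8ff, coordinated Linux zone */
};

/* Report which part of the BMP private area a code value falls in. */
enum pua_zone pua_classify(uint32_t code)
{
    assert(code <= 0x7fffffffu);  /* UCS is a 31-bit code space */
    if (code >= 0xe000 && code <= 0xefff)
        return PUA_END_USER;
    if (code >= 0xf000 && code <= 0xf8ff)
        return PUA_LINUX;
    return PUA_NONE;
}
```

For example, pua_classify(0xf000) yields PUA_LINUX, while pua_classify(0xf900) yields PUA_NONE.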

LITERATURE

* Information technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane. International Standard ISO 10646-1, International Organization for Standardization, Geneva, 1993.

This is the official specification of UCS: very formal, very thick, and very expensive. For ordering information, see www.iso.ch.

* The Unicode Standard - Worldwide Character Encoding Version 1.0. The Unicode Consortium, Addison-Wesley, Reading, MA, 1991.

Unicode is already available in version 1.1.4. The changes relative to the 1.0 book are available from ftp.unicode.org. Unicode 2.0 will also be published as a book in 1996.

* S. Harbison, G. Steele. C - A Reference Manual. Fourth edition, Prentice Hall, Englewood Cliffs, 1995, ISBN 0-13-326224-3.

A good reference book on the C programming language. The fourth edition also covers the 1994 Amendment 1 to the ISO C standard (ISO/IEC 9899:1990), which adds a large number of new C library functions for handling wide character sets.

BUGS

At the time this man page was written, the Linux C library support for UCS was not yet complete.

AUTHOR

Markus Kuhn <mskuhn@cip.informatik.uni-erlangen.de>

SEE ALSO

utf-8(7), http://www.linuxforum.net/books/UTF-8-Unicode.html

[Chinese version maintainer]

mapping <mapping@263.net>

[Chinese version last updated]

2000/11/06

China Linux Forum man page translation project:

http://cmpp.linuxforum.net

Postscript

The Chinese version of this page is provided by the Chinese man pages project.
Chinese man pages project: https://github.com/man-pages-zh/manpages-zh