UTF-8 7

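Chinese man pages

Contents

UTF-8

NAME
DESCRIPTION
PROPERTIES
ENCODING
EXAMPLES
CONFORMING TO
AUTHOR
SEE ALSO
[Chinese version maintainer]
[Chinese version last updated]
China Linux Forum man page translation project:
COLOPHON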

NAME

UTF-8 - an ASCII-compatible multibyte Unicode encoding

DESCRIPTION

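The Unicode character set uses a 16-bit (double-byte) code. The most common Unicode encoding, UCS-2, consists of a sequence of 16-bit words. Such strings contain, as parts of many 16-bit characters, bytes like '\0' or '/' that have a special meaning in file names and in C library functions. In addition, most UNIX tools that operate on ASCII files cannot correctly recognize 16-bit characters without major modifications. For these reasons, UCS-2 is not a suitable external encoding of Unicode in file names, text files, environment variables, and so on. The ISO 10646 Universal Character Set (UCS), a superset of Unicode, even uses a 31-bit code space, and the UCS-4 encoding that goes with it (a sequence of 32-bit words) has the same problems. Encoding Unicode and UCS with UTF-8 avoids these problems, so UTF-8 is clearly the way to use the Unicode character set under UNIX-style operating systems.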
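The problem described above can be made concrete with a minimal Python sketch. It assumes that Python's built-in `utf-16-be` codec is an acceptable stand-in for big-endian UCS-2 (for characters in the 16-bit range the two produce the same bytes). The helper names `hex_bytes` and `has_special_bytes` and the two sample characters are chosen only for illustration.

```python
def hex_bytes(data: bytes) -> str:
    """Format a byte string as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in data)


def has_special_bytes(data: bytes) -> bool:
    """Return True if data contains a NUL byte or an ASCII '/' byte."""
    return b"\x00" in data or b"/" in data


# U+0041 is LATIN CAPITAL LETTER A, U+012F is LATIN SMALL LETTER I WITH OGONEK.
for ch in ("A", "\u012f"):
    ucs2 = ch.encode("utf-16-be")  # stand-in for big-endian UCS-2
    utf8 = ch.encode("utf-8")
    print(
        f"U+{ord(ch):04X}"
        f"  UCS-2: {hex_bytes(ucs2)} (special bytes: {has_special_bytes(ucs2)})"
        f"  UTF-8: {hex_bytes(utf8)} (special bytes: {has_special_bytes(utf8)})"
    )
```

In the 16-bit form, "A" carries a NUL byte and U+012F carries the byte 0x2f, which is the ASCII '/'. Neither byte appears in the UTF-8 form of U+012F, and the UTF-8 form of "A" is the single ASCII byte 0x41.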

PROPERTIES

The UTF-8 encoding has the following nice properties:

*

UCS characters 0x00000000 to 0x0000007f (the classic US-ASCII characters) are encoded simply as bytes 0x00 to 0x7f (ASCII compatibility). This means that files and strings which contain only 7-bit ASCII characters have the same encoding under both ASCII and UTF-8.

*

All UCS characters greater than 0x7f are encoded as a multibyte sequence consisting only of bytes in the range 0x80 to 0xfd, so no ASCII byte can appear as part of another character, and there are no problems with special characters such as '\0' or '/'.

*

The lexicographic sorting order of UCS-4 strings is preserved.

*

All possible 2^31 UCS codes can be encoded using UTF-8.

*

The bytes 0xfe and 0xff are never used in the UTF-8 encoding.

*

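The first byte of a multibyte sequence which represents a single non-ASCII UCS character is always in the range 0xc0 to 0xfd and indicates how long this multibyte sequence is. All further bytes in a multibyte sequence are in the range 0x80 to 0xbf. This allows easy resynchronization and makes the encoding stateless and robust against missing bytes.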

*

UTF-8 encoded UCS characters may be up to six bytes long, whereas Unicode characters can be at most three bytes long. Because Linux uses only the 16-bit Unicode subset of UCS, UTF-8 multibyte sequences under Linux are never longer than three bytes.
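Several of the properties listed above can be checked with a short Python sketch. The helpers `lead_length` and `resync` are hypothetical names introduced here for illustration, and the sample strings are arbitrary. Python's built-in `utf-8` codec only covers code points up to U+10FFFF (at most four bytes), so the five-byte and six-byte first-byte ranges in `lead_length` follow this page rather than anything the codec produces.

```python
def lead_length(byte: int) -> int:
    """Sequence length announced by a first byte, or 0 if it cannot start a sequence."""
    if byte < 0x80:
        return 1  # 0x00..0x7f: plain ASCII
    if byte < 0xC0:
        return 0  # 0x80..0xbf: continuation byte
    if byte < 0xE0:
        return 2
    if byte < 0xF0:
        return 3
    if byte < 0xF8:
        return 4
    if byte < 0xFC:
        return 5
    if byte < 0xFE:
        return 6
    return 0  # 0xfe and 0xff are never used


def resync(data: bytes, pos: int) -> int:
    """Skip continuation bytes starting at pos and return the next character start."""
    while pos < len(data) and 0x80 <= data[pos] <= 0xBF:
        pos += 1
    return pos


# ASCII-only text is byte-for-byte identical in ASCII and UTF-8.
assert "plain/path".encode("ascii") == "plain/path".encode("utf-8")

text = "naïve ≠ 日本語"
encoded = text.encode("utf-8")

# Every byte of a multibyte sequence is >= 0x80, and the first byte gives the length.
for ch in text:
    seq = ch.encode("utf-8")
    if len(seq) > 1:
        assert all(b >= 0x80 for b in seq)
        assert lead_length(seq[0]) == len(seq)

# Sorting by code number and sorting by UTF-8 bytes give the same order.
chars = [chr(c) for c in (0x10000, 0x41, 0x800, 0x80, 0xFFFD, 0x7F, 0x7FF)]
assert sorted(chars) == sorted(chars, key=lambda c: c.encode("utf-8"))

# Starting in the middle of "ï" (byte offset 3), resync finds the next boundary.
start = resync(encoded, 3)
print(encoded[start:].decode("utf-8"))
```

The last step shows why resynchronization is simple: a reader that lands on a continuation byte only has to skip forward until it sees a byte outside 0x80 to 0xbf.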

ENCODING

The following byte sequences are used to represent a character. The sequence to be used depends on the UCS code number of the character:

0x00000000 - 0x0000007F:

0xxxxxxx

0x00000080 - 0x000007FF:

110xxxxx 10xxxxxx

0x00000800 - 0x0000FFFF:

1110xxxx 10xxxxxx 10xxxxxx

0x00010000 - 0x001FFFFF:

11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

0x00200000 - 0x03FFFFFF:

111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

0x04000000 - 0x7FFFFFFF:

1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

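The xxx bit positions are filled with the bits of the character code number in binary representation. Only the shortest possible multibyte sequence which can represent the code number of the character can be used.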
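The table above can be turned into a small encoder. The following is a minimal Python sketch, not a reference implementation; the function name `utf8_encode` is an assumption made for this example. It always picks the shortest sequence for a given code number and supports the full 31-bit range, including the five-byte and six-byte forms that Python's built-in `utf-8` codec does not produce.

```python
def utf8_encode(code: int) -> bytes:
    """Encode one UCS code number (0 .. 0x7FFFFFFF) following the table above."""
    if not 0 <= code <= 0x7FFFFFFF:
        raise ValueError("code number outside the 31-bit UCS range")
    if code <= 0x7F:
        return bytes([code])  # 0xxxxxxx
    # (upper bound of the range, sequence length, first-byte prefix bits)
    for limit, length, prefix in (
        (0x7FF, 2, 0xC0),       # 110xxxxx
        (0xFFFF, 3, 0xE0),      # 1110xxxx
        (0x1FFFFF, 4, 0xF0),    # 11110xxx
        (0x3FFFFFF, 5, 0xF8),   # 111110xx
        (0x7FFFFFFF, 6, 0xFC),  # 1111110x
    ):
        if code <= limit:
            break  # the first matching range is the shortest possible form
    out = []
    for _ in range(length - 1):
        out.append(0x80 | (code & 0x3F))  # 10xxxxxx, lowest six bits first
        code >>= 6
    out.append(prefix | code)  # remaining high bits go into the first byte
    return bytes(reversed(out))


print(utf8_encode(0xA9).hex())    # c2a9
print(utf8_encode(0x2260).hex())  # e289a0
```

For code numbers that Python's own codec accepts (up to U+10FFFF, excluding the surrogate range 0xD800 to 0xDFFF), the result should match `chr(code).encode("utf-8")`.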

EXAMPLES

The Unicode character 0xa9 = 1010 1001 (the copyright sign) is encoded in UTF-8 as:

11000010 10101001 = 0xc2 0xa9

The character 0x2260 = 0010 0010 0110 0000 (the "not equal" symbol) is encoded as:

11100010 10001001 10100000 = 0xe2 0x89 0xa0
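Both examples can be reproduced with Python's built-in `utf-8` codec. This is a minimal sketch; the helper name `show` is chosen only for illustration.

```python
def show(code: int) -> str:
    """Return the UTF-8 bytes of one code number in binary and in hex."""
    encoded = chr(code).encode("utf-8")
    bits = " ".join(f"{b:08b}" for b in encoded)
    hexes = " ".join(f"0x{b:02x}" for b in encoded)
    return f"U+{code:04X}: {bits} = {hexes}"


for code in (0xA9, 0x2260):
    print(show(code))
# U+00A9: 11000010 10101001 = 0xc2 0xa9
# U+2260: 11100010 10001001 10100000 = 0xe2 0x89 0xa0
```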

CONFORMING TO

ISO 10646, Unicode 1.1, XPG4, Plan 9.

AUTHOR

Markus Kuhn

SEE ALSO

unicode(7)

[Chinese version maintainer]

billpan <billpan@yeah.net>

[Chinese version last updated]

2000/11/09

China Linux Forum man page translation project:

http://cmpp.linuxforum.net

COLOPHON

The Chinese version of this page is provided by the Chinese man pages project.
Chinese man pages project: https://github.com/man-pages-zh/manpages-zh