
UTF-8 Encoding Converter

Date: 2023-12-30 08:56:19   Editor: 篆字君   Source: 篆体字网

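I. Preliminaries

1. Character: A character is the smallest abstract unit of text. It has no fixed shape (a particular shape is a glyph) and no value. "A" is a character, and so is "€" (the currency sign shared by Germany, France, and many other European countries). "中" and "国" are two Han characters. A character merely stands for a symbol; it carries no numeric value of its own.

2. Character set: A character set is a collection of characters. For example, Han characters were first invented by the Chinese and are used in writing Chinese, Japanese, Korean, and Vietnamese. This also shows how characters and character sets relate: characters make up character sets (ISO-8859-1, GB2312/GBK, Unicode).

3. Code point: Every character in a character set is assigned a "code point". Each code point has a specific, unique numeric value called its scalar value, which is usually written in hexadecimal. In Unicode, for example, "A" is U+0041 and "中" is U+4E2D.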

4. Code unit: In each encoding form, a code point is mapped to one or more code units. A "code unit" is the basic unit of a given encoding form, and its size equals the bit width of that encoding form:

UTF-8: code units are 8 bits wide. Because the unit is small, a code point is often mapped to several code units: one, two, three, or four.

UTF-16: code units are 16 bits wide, twice the size of an 8-bit unit. Code points with a scalar value below U+10000 are therefore encoded in a single code unit; code points from U+10000 upward take two code units (a surrogate pair).

UTF-32: code units are 32 bits wide, large enough for every code point to be encoded as a single code unit.

GB18030: code units are 8 bits wide. Because the unit is small, a code point is often mapped to several code units: one, two, or four.
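The code-unit counts above can be sketched in a few lines of C++. This is a minimal illustration rather than anything from the original article: the function names `utf8_units`, `utf16_units`, and `utf32_units` are invented for the example, and the input is assumed to be a valid Unicode scalar value (no surrogates, at most U+10FFFF).

```cpp
#include <cstdint>

// Number of 8-bit code units UTF-8 needs for one scalar value.
// Assumption: cp is a valid scalar value (0..0x10FFFF, not a surrogate).
int utf8_units(std::uint32_t cp) {
    if (cp < 0x80) return 1;      // ASCII
    if (cp < 0x800) return 2;     // e.g. Greek, Cyrillic
    if (cp < 0x10000) return 3;   // rest of the Basic Multilingual Plane
    return 4;                     // supplementary planes
}

// Number of 16-bit code units UTF-16 needs: one below U+10000,
// otherwise a surrogate pair.
int utf16_units(std::uint32_t cp) {
    return cp < 0x10000 ? 1 : 2;
}

// UTF-32 always uses exactly one 32-bit code unit.
int utf32_units(std::uint32_t /*cp*/) {
    return 1;
}
```

For instance, "中" (U+4E2D) needs three UTF-8 code units but only one UTF-16 code unit, which is the size trade-off the later sections discuss.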

5. An example: "中国北京香蕉是个大笨蛋" ("China, Beijing, the banana is a big fool") is a character set that I define myself and call aka.

The code point of each character is:

北 00000001
京 00000010
香 10000001
蕉 10000010
是 10000100
个 10001000
大 10010000
笨 10100000
蛋 11000000
中 00000100
国 00001000

Below is the zixia encoding scheme (8-bit), also my own definition. It assigns a code unit to every character of the aka character set:

北 10000001
京 10000010
香 00000001
蕉 00000010
是 00000100
个 00001000
大 00010000
笨 00100000
蛋 01000000
中 10000100
国 10001000

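A text file is simply a file whose binary data we interpret as text under some encoding, for example a file containing 00000001 00000010 00000100 00001000 00010000 00100000 01000000. If I open it in a Notepad that supports the zixia encoding and the aka character set, it is displayed, following the encoding scheme, as "香蕉是个大笨蛋" ("the banana is a big fool"). If I save the same characters to another file as GBK, the bytes are certainly not the same. They are instead:

1100111111100011 (香, CF E3)
1011110110110110 (蕉, BD B6)
1100101011000111 (是, CA C7)
1011100011110110 (个, B8 F6)
1011010011110011 (大, B4 F3)
1011000110111111 (笨, B1 BF)
1011010110110000 (蛋, B5 B0)
0000110100001010 (0D 0A, apparently the CR LF line ending; the original listing printed it as 110100001010, without the leading zeros)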
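To make the toy example concrete, here is a small sketch of what a zixia-aware editor would do when it opens the file: look up each 8-bit code unit in the zixia table and emit the matching character. The function name `zixia_decode` and the use of "?" for unknown units are assumptions made for this illustration, and the source file is assumed to be saved as UTF-8 so that the Chinese string literals come out as UTF-8.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical decoder for the article's toy "zixia" encoding.
// Each input byte is one code unit; the result is UTF-8 text.
std::string zixia_decode(const std::vector<std::uint8_t>& bytes) {
    struct Entry { std::uint8_t unit; const char* text; };
    // Code units copied from the zixia table in the article.
    static const Entry table[] = {
        {0x81, "北"}, {0x82, "京"}, {0x01, "香"}, {0x02, "蕉"},
        {0x04, "是"}, {0x08, "个"}, {0x10, "大"}, {0x20, "笨"},
        {0x40, "蛋"}, {0x84, "中"}, {0x88, "国"},
    };
    std::string out;
    for (std::uint8_t b : bytes) {
        const char* text = "?";  // assumption: unknown units become '?'
        for (const Entry& e : table) {
            if (e.unit == b) { text = e.text; break; }
        }
        out += text;
    }
    return out;
}
```

Feeding it the seven bytes 01 02 04 08 10 20 40 from the example yields "香蕉是个大笨蛋"; the same bytes read under any other encoding would produce different text.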

II. Character sets

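1. Common character sets

ASCII and its extended character sets
Purpose: represent English and Western European languages.
Bits: ASCII uses 7 bits and can represent 128 characters; its extensions use 8 bits and represent 256 characters.
Range: ASCII runs from 00 to 7F; the extensions from 00 to FF.

ISO-8859-1
Purpose: extends ASCII to cover Western European languages (Greek belongs to the sibling standard ISO-8859-7, not to ISO-8859-1).
Bits: 8 bits.
Range: 00 to FF; compatible with ASCII.

GB2312
Purpose: the national Simplified Chinese character set; compatible with ASCII.
Bits: two bytes per character. It can represent 7,445 symbols, including 6,763 Han characters, covering nearly all high-frequency characters.
Range: high byte A1 to F7, low byte A1 to FE. Adding 0xA0 to the row number and to the cell number of a character gives its two encoded bytes.

BIG5
Purpose: a unified encoding for Traditional Chinese characters.
Bits: two bytes per character; 13,053 Han characters.
Range: high byte A1 to F9; low byte 40 to 7E and A1 to FE.

GBK
Purpose: an extension of GB2312 that adds support for Traditional characters; compatible with GB2312.
Bits: two bytes per character; 21,886 characters.
Range: high byte 81 to FE, low byte 40 to FE.

GB18030
Purpose: covers the encoding of Chinese, Japanese, Korean, and other scripts; compatible with GBK.
Bits: variable length (1 byte for ASCII, 2 bytes, or 4 bytes); 27,484 characters.
Range: 1-byte codes run from 00 to 7F. In 2-byte codes the high byte is 81 to FE and the low byte is 40 to 7E or 80 to FE. In 4-byte codes the first and third bytes are 81 to FE and the second and fourth bytes are 30 to 39.

UCS
Purpose: the international standard ISO 10646 defines the Universal Character Set (UCS). It is a parallel effort to Unicode, and UCS-2 is compatible with Unicode.
Bits: two formats, UCS-2 and UCS-4, of 2 and 4 bytes respectively.
Range: at present, UCS-4 is simply UCS-2 with 0x0000 in front.

UNICODE
Purpose: a unified encoding for some 650 of the world's languages; compatible with ISO-8859-1.
Bits: the Unicode character set has several encoding forms: UTF-8, UTF-16, and UTF-32.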
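The byte ranges quoted for GB2312 and GBK translate directly into range checks, and the GB2312 "add 0xA0" rule into one line of arithmetic. The sketch below is illustrative only: the names `in_gb2312_range`, `in_gbk_range`, and `gb2312_from_row_cell` are made up here, the GBK check excludes the low byte 7F (which GBK leaves unused), and falling inside a range does not prove that a code is actually assigned.

```cpp
#include <cstdint>

// True if (hi, lo) lies in the GB2312 two-byte range: A1..F7, A1..FE.
bool in_gb2312_range(std::uint8_t hi, std::uint8_t lo) {
    return hi >= 0xA1 && hi <= 0xF7 && lo >= 0xA1 && lo <= 0xFE;
}

// True if (hi, lo) lies in the GBK two-byte range: 81..FE, 40..FE.
// Assumption beyond the text: low byte 7F is excluded.
bool in_gbk_range(std::uint8_t hi, std::uint8_t lo) {
    return hi >= 0x81 && hi <= 0xFE && lo >= 0x40 && lo <= 0xFE && lo != 0x7F;
}

// GB2312 row/cell numbers (1..94 each) to the encoded two-byte value,
// returned as (high byte << 8) | low byte. Each byte is number + 0xA0.
std::uint16_t gb2312_from_row_cell(int row, int cell) {
    return static_cast<std::uint16_t>(((row + 0xA0) << 8) | (cell + 0xA0));
}
```

As a usage example, the first Han character in GB2312, "啊", sits at row 16, cell 1, so `gb2312_from_row_cell(16, 1)` gives 0xB0A1, which lies in both the GB2312 and the GBK range. A pair such as 81 40 is valid only as GBK.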

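2. Classification by the languages represented

Language                                   Character set        Formal name
English, Western European                  ASCII, ISO-8859-1    MBCS (multi-byte)
Simplified Chinese                         GB2312               MBCS (multi-byte)
Traditional Chinese                        BIG5                 MBCS (multi-byte)
Simplified and Traditional Chinese         GBK                  MBCS (multi-byte)
Chinese, Japanese, and Korean              GB18030              MBCS (multi-byte)
All languages                              UNICODE, UCS         DBCS (wide characters)

III. Encodings

UTF-8: a variable-length encoding (1 byte for ASCII, 2 bytes for alphabets such as Greek, 3 bytes for Han characters, 4 bytes for supplementary-plane symbols). It suits network transmission: if one byte is corrupted, the other characters are unaffected, whereas in a double-byte encoding a single bad byte throws off everything after it. The rules are as follows. A single-byte character has its most significant bit set to 0. In a multi-byte character, the number of consecutive 1 bits at the top of the first byte gives the number of bytes in the sequence, and every following byte starts with 10. The original design allowed sequences of up to 6 bytes; since RFC 3629, UTF-8 is limited to 4 bytes, which is enough to reach U+10FFFF.

UTF-16: uses 2-byte code units (one unit for U+0000 to U+FFFF, two units above that). The different blocks of Unicode are each based on existing standards, which makes conversion easier. 0x0000 to 0x007F are the ASCII characters, and 0x0080 to 0x00FF are the ISO-8859-1 extensions to ASCII. The Greek alphabet uses 0x0370 to 0x03FF, Cyrillic uses 0x0400 to 0x04FF, Armenian uses 0x0530 to 0x058F, and Hebrew uses 0x0590 to 0x05FF. The ideographs of China, Japan, and Korea (collectively CJK) occupy 0x3000 to 0x9FFF. Because the byte 0x00 has a special meaning in C and in operating-system file names, text often has to be saved as UTF-8, which gets rid of these 0x00 bytes. For example:

UTF-16: 0x0080 = 0000 0000 1000 0000
UTF-8: 0xC280 = 1100 0010 1000 0000

UTF-32: uses 4 bytes.

Pros and cons

UTF-8, UTF-16, and UTF-32 can all represent every Unicode character in the valid code space (U+000000 to U+10FFFF).

With UTF-8, ASCII characters take only 1 byte, so storage is efficient; it suits text that is mostly Latin characters.

For most non-Latin characters (such as Chinese and Japanese), UTF-16 needs the least storage: each character in the Basic Multilingual Plane takes only 2 bytes.

The Windows NT kernel is Unicode (UTF-16), so text held as UTF-16 needs no conversion when calling system APIs and is processed quickly.

UTF-16 and UTF-32 come in Big Endian and Little Endian variants, while UTF-8 has no byte-order problem, so UTF-8 is well suited to transmission and communication.

UTF-32 uses 4 bytes per character. It is fast to process, but it wastes a great deal of space and slows transmission, so it is rarely used.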
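The UTF-8 byte layout described above (leading 1 bits in the first byte, `10` at the top of every continuation byte) can be written out as a short encoder. This is a sketch under stated assumptions: the name `utf8_encode` is invented for the example, the input is assumed to be a valid scalar value up to U+10FFFF, and no surrogate or range checking is done.

```cpp
#include <cstdint>
#include <string>

// Encode one Unicode scalar value as UTF-8 (1 to 4 bytes).
// Assumption: cp <= 0x10FFFF and is not a surrogate; not validated here.
std::string utf8_encode(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Applied to the example in the text, `utf8_encode(0x0080)` produces the two bytes C2 80, and "中" (U+4E2D) becomes E4 B8 AD. None of the multi-byte outputs contains a 0x00 byte, which is the property the text relies on for C strings and file names.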

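IV. How to identify the character set

1. Byte order

First, a word on how byte order affects encodings. There are two byte orders, Big Endian and Little Endian, and different processors may use different ones, so when data is transmitted the receiver has to be told which byte order was used. In Big Endian, the high-order byte is stored at the lower address and the low-order byte at the higher address; Little Endian is the reverse. For example, for 0x03AB:

Big Endian:    0000: 03   0001: AB
Little Endian: 0000: AB   0001: 03

2. Identifying the encoding

UNICODE: the first few bytes of the data identify which encoding of the Unicode character set is in use. This marker is called the Byte Order Mark (BOM):

UTF-8: EF BB BF (a well-formed UTF-8 sequence, see above, but with no meaning as a character in UCS, that is, in Unicode)
UTF-16 Big Endian: FE FF (no meaning in UCS-2)
UTF-16 Little Endian: FF FE (no meaning in UCS-2)
UTF-32 Big Endian: 00 00 FE FF (no meaning in UCS-4)
UTF-32 Little Endian: FF FE 00 00 (no meaning in UCS-4)

GB2312: the top bit of both the high byte and the low byte is 1.

BIG5, GBK, and GB18030: the top bit of the high byte is 1. The operating system has a default encoding, often GBK on Chinese-language systems; others can be downloaded and installed.

By testing the top bit of the high byte, you can tell whether a byte is ASCII or the start of a Han character code.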
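Both ideas in this section, byte order and BOM sniffing, fit in a short sketch. The names `u16_bytes` and `detect_bom` and the returned labels are invented for this illustration. One detail the list of BOMs does not spell out: the UTF-32 Little Endian mark begins with the same two bytes as the UTF-16 Little Endian mark, so the 4-byte marks are tested first. A file without a BOM simply reports "none", which says nothing about its real encoding.

```cpp
#include <cstdint>
#include <string>

// Serialize a 16-bit value as two bytes in the requested byte order.
std::string u16_bytes(std::uint16_t value, bool big_endian) {
    const char hi = static_cast<char>(value >> 8);
    const char lo = static_cast<char>(value & 0xFF);
    std::string out;
    out += big_endian ? hi : lo;  // byte at the lower address
    out += big_endian ? lo : hi;  // byte at the higher address
    return out;
}

// Guess the Unicode encoding from a leading byte order mark.
// Returns "none" when no known BOM is present.
std::string detect_bom(const std::string& bytes) {
    auto starts_with = [&bytes](const std::string& bom) {
        return bytes.compare(0, bom.size(), bom) == 0;
    };
    // 4-byte marks first: FF FE 00 00 also begins with FF FE.
    if (starts_with(std::string("\x00\x00\xFE\xFF", 4))) return "UTF-32BE";
    if (starts_with(std::string("\xFF\xFE\x00\x00", 4))) return "UTF-32LE";
    if (starts_with(std::string("\xEF\xBB\xBF", 3))) return "UTF-8";
    if (starts_with(std::string("\xFE\xFF", 2))) return "UTF-16BE";
    if (starts_with(std::string("\xFF\xFE", 2))) return "UTF-16LE";
    return "none";
}
```

With the example value from the text, `u16_bytes(0x03AB, true)` yields 03 AB and `u16_bytes(0x03AB, false)` yields AB 03. A UTF-16 Little Endian text whose first character happens to be U+0000 would be misread as UTF-32 by this ordering, a known ambiguity of BOM sniffing.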

#include <windows.h>
#include <stdio.h>

// Convert a NUL-terminated GBK string to UTF-8.
// Note: CP_ACP is the system ANSI code page, which is GBK only on
// Simplified Chinese Windows; pass 936 instead to force GBK.
// If lpUTF8Str is NULL, returns the buffer size needed for the result
// (in bytes, including the terminating NUL). Returns 0 on failure.
int GBKToUTF8(unsigned char *lpGBKStr, unsigned char *lpUTF8Str, int nUTF8StrLen)
{
    wchar_t *lpUnicodeStr = NULL;
    int nRetLen = 0;

    if (!lpGBKStr)  // a NULL GBK string is an error
        return 0;

    // Get the number of wide characters needed for the Unicode (UTF-16) form
    nRetLen = ::MultiByteToWideChar(CP_ACP, 0, (char *)lpGBKStr, -1, NULL, 0);
    if (!nRetLen)
        return 0;
    lpUnicodeStr = new WCHAR[nRetLen + 1];  // buffer for the Unicode string

    // Convert to Unicode
    nRetLen = ::MultiByteToWideChar(CP_ACP, 0, (char *)lpGBKStr, -1, lpUnicodeStr, nRetLen);
    if (!nRetLen)  // conversion failed; free the buffer (the original leaked it)
    {
        delete[] lpUnicodeStr;
        return 0;
    }

    // Get the number of bytes needed for the UTF-8 form
    nRetLen = ::WideCharToMultiByte(CP_UTF8, 0, lpUnicodeStr, -1, NULL, 0, NULL, NULL);

    if (!lpUTF8Str)  // no output buffer: return the size required
    {
        delete[] lpUnicodeStr;
        return nRetLen;
    }

    if (nUTF8StrLen < nRetLen)  // output buffer too small
    {
        delete[] lpUnicodeStr;
        return 0;
    }

    // Convert to UTF-8
    nRetLen = ::WideCharToMultiByte(CP_UTF8, 0, lpUnicodeStr, -1, (char *)lpUTF8Str, nUTF8StrLen, NULL, NULL);

    delete[] lpUnicodeStr;
    return nRetLen;
}

// Example of using the function above
int main()
{
    // "我是中国人!" as GBK bytes, written with escapes so the result
    // does not depend on the encoding of this source file
    char cGBKStr[] = "\xCE\xD2\xCA\xC7\xD6\xD0\xB9\xFA\xC8\xCB!";
    char *lpUTF8Str = NULL;
    FILE *fp = NULL;
    int nRetLen = 0;

    // First call: ask how much space the converted string needs
    nRetLen = GBKToUTF8((unsigned char *)cGBKStr, NULL, 0);
    printf("Space needed for the converted string: %d\n", nRetLen);

    lpUTF8Str = new char[nRetLen + 1];
    nRetLen = GBKToUTF8((unsigned char *)cGBKStr, (unsigned char *)lpUTF8Str, nRetLen);
    if (nRetLen)
    {
        printf("GBKToUTF8 succeeded!\n");
        fp = fopen("C:\\GBKtoUTF8.txt", "wb");  // save to a text file
        if (fp)
        {
            fwrite(lpUTF8Str, 1, nRetLen - 1, fp);  // leave out the trailing NUL
            fclose(fp);
        }
        // Now open that text file in Notepad and choose File > Save As:
        // the Encoding box shows "UTF-8", which confirms the conversion worked.
        getchar();
    }
    else
    {
        printf("GBKToUTF8 failed!\n");
    }

    delete[] lpUTF8Str;
    return 0;
}

Karlson,2009-07-25 13:39:57

#include <windows.h>
#include <string>
using std::string;

class CChineseCode
{
public:
    static void UTF_8ToUnicode(wchar_t* pOut, char* pText);           // UTF-8 to Unicode
    static void UnicodeToUTF_8(char* pOut, wchar_t* pText);           // Unicode to UTF-8
    static void UnicodeToGB2312(char* pOut, wchar_t uData);           // Unicode to GB2312
    static void Gb2312ToUnicode(wchar_t* pOut, char* gbBuffer);       // GB2312 to Unicode
    static void GB2312ToUTF_8(string& pOut, char* pText, int pLen);   // GB2312 to UTF-8
    static void UTF_8ToGB2312(string& pOut, char* pText, int pLen);   // UTF-8 to GB2312
};

Class implementation

// Decodes one 3-byte UTF-8 sequence (U+0800 to U+FFFF) into one UTF-16 unit.
// Written with shifts so it no longer depends on the byte order of wchar_t.
void CChineseCode::UTF_8ToUnicode(wchar_t* pOut, char* pText)
{
    *pOut = (wchar_t)(((pText[0] & 0x0F) << 12) |
                      ((pText[1] & 0x3F) << 6) |
                      (pText[2] & 0x3F));
}

// Encodes one UTF-16 unit as a 3-byte UTF-8 sequence. This is only correct
// for U+0800 to U+FFFF, which covers Han characters; GB2312 symbols below
// U+0800 (Greek letters, for example) would need the 2-byte form.
void CChineseCode::UnicodeToUTF_8(char* pOut, wchar_t* pText)
{
    unsigned int c = (unsigned short)*pText;
    pOut[0] = (char)(0xE0 | (c >> 12));
    pOut[1] = (char)(0x80 | ((c >> 6) & 0x3F));
    pOut[2] = (char)(0x80 | (c & 0x3F));
}

// CP_ACP is the system ANSI code page (GBK on Simplified Chinese Windows).
void CChineseCode::UnicodeToGB2312(char* pOut, wchar_t uData)
{
    WideCharToMultiByte(CP_ACP, 0, &uData, 1, pOut, sizeof(wchar_t), NULL, NULL);
}

void CChineseCode::Gb2312ToUnicode(wchar_t* pOut, char* gbBuffer)
{
    ::MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, gbBuffer, 2, pOut, 1);
}

void CChineseCode::GB2312ToUTF_8(string& pOut, char* pText, int pLen)
{
    char buf[4] = {0};
    // Each input byte yields at most 1.5 output bytes, so pLen * 3 is ample;
    // one extra byte keeps room for a terminator.
    char* rst = new char[pLen * 3 + 1];
    int i = 0;
    int j = 0;
    while (i < pLen)
    {
        // ASCII bytes are copied as they are
        if ((unsigned char)pText[i] < 0x80)
        {
            rst[j++] = pText[i++];
        }
        else
        {
            wchar_t pbuffer;
            Gb2312ToUnicode(&pbuffer, pText + i);
            UnicodeToUTF_8(buf, &pbuffer);
            rst[j] = buf[0];
            rst[j + 1] = buf[1];
            rst[j + 2] = buf[2];
            j += 3;
            i += 2;
        }
    }
    rst[j] = '\0';

    // Return the result
    pOut.assign(rst, j);
    delete[] rst;
}

void CChineseCode::UTF_8ToGB2312(string& pOut, char* pText, int pLen)
{
    char* newBuf = new char[pLen + 1];  // +1 for the terminator
    char Ctemp[4] = {0};
    int i = 0;
    int j = 0;
    while (i < pLen)
    {
        // The original tested the pointer itself (pText > 0); test the byte.
        if ((unsigned char)pText[i] < 0x80)
        {
            newBuf[j++] = pText[i++];
        }
        else
        {
            // Assumes every non-ASCII character is a 3-byte sequence
            WCHAR Wtemp;
            UTF_8ToUnicode(&Wtemp, pText + i);
            UnicodeToGB2312(Ctemp, Wtemp);
            newBuf[j] = Ctemp[0];
            newBuf[j + 1] = Ctemp[1];
            i += 3;
            j += 2;
        }
    }
    newBuf[j] = '\0';

    pOut.assign(newBuf, j);
    delete[] newBuf;
}