Introduction:
=============

This file documents an approximate correlation between the data files
provided in the !Unicode distribution and the encoding headers in GNU
libiconv 1.9.1.

Entries with '?' in the iconv column are either not represented in iconv
or I've missed the relevant header file ;)

A number of encodings are present in the iconv distribution but not
in !Unicode. These are documented in the iconv->!Unicode section below.

Changelog:
==========

v 0.01 (09-Sep-2004)
~~~~~~~~~~~~~~~~~~~~
Initial Incarnation

v 0.02 (11-Sep-2004)
~~~~~~~~~~~~~~~~~~~~
Documented additional encodings supported by the Iconv module.
Corrected list of !Unicode deficiencies.


!Unicode->iconv:
================

Unicode:                    iconv:                    notes:

Acorn.Latin1                riscos1.h

Apple.CentEuro              mac_centraleurope.h
Apple.Cyrillic              mac_cyrillic.h
Apple.Roman                 mac_roman.h
Apple.Ukrainian             mac_ukraine.h

BigFive                     big5.h

ISO2022.C0.40[ISO646]       ?

ISO2022.C1.43[IS6429]       ?

ISO2022.G94.40[646old]      iso646_cn.h
ISO2022.G94.41[646-GB]      ?
ISO2022.G94.42[646IRV]      ?
ISO2022.G94.43[FinSwe]      ?
ISO2022.G94.47[646-SE]      ?
ISO2022.G94.48[646-SE]      ?
ISO2022.G94.49[JS201K]      jisx0201.h                top of JIS range
ISO2022.G94.4A[JS201R]      jisx0201.h iso646_jp.h    bottom of JIS range
ISO2022.G94.4B[646-DE]      ?
ISO2022.G94.4C[646-PT]      ?
ISO2022.G94.54[GB1988]      ?
ISO2022.G94.56[Teltxt]      ?
ISO2022.G94.59[646-IT]      ?
ISO2022.G94.5A[646-ES]      ?
ISO2022.G94.60[646-NO]      ?
ISO2022.G94.66[646-FR]      ?
ISO2022.G94.69[646-HU]      ?
ISO2022.G94.6B[Arabic]      ?
ISO2022.G94.6C[IS6397]      ?
ISO2022.G94.7A[SerbCr]      ?

ISO2022.G94x94.40[JS6226]   ?
ISO2022.G94x94.41[GB2312]   gb2312.h
ISO2022.G94x94.42[JIS208]   jisx0208.h
ISO2022.G94x94.43[KS1001]   ksc5601.h
ISO2022.G94x94.44[JIS212]   jisx0212.h
ISO2022.G94x94.47[CNS1]     cns11643_1.h              the tables differ
ISO2022.G94x94.48[CNS2]     cns11643_2.h
ISO2022.G94x94.49[CNS3]     cns11643_3.h
ISO2022.G94x94.4A[CNS4]     cns11643_4.h
ISO2022.G94x94.4B[CNS5]     cns11643_5.h
ISO2022.G94x94.4C[CNS6]     cns11643_6.h
ISO2022.G94x94.4D[CNS7]     cns11643_7.h

ISO2022.G96.41[Lat1]        iso8859_1.h
ISO2022.G96.42[Lat2]        iso8859_2.h
ISO2022.G96.43[Lat3]        iso8859_3.h
ISO2022.G96.44[Lat4]        iso8859_4.h
ISO2022.G96.46[Greek]       ?
ISO2022.G96.47[Arabic]      iso8859_6.h               ISO-8859-6 ignored
ISO2022.G96.48[Hebrew]      ?
ISO2022.G96.4C[Cyrill]      ?
ISO2022.G96.4D[Lat5]        iso8859_5.h
ISO2022.G96.50[LatSup]      ?
ISO2022.G96.52[IS6397]      ?
ISO2022.G96.54[Thai]        tis620.h
ISO2022.G96.56[Lat6]        iso8859_6.h
ISO2022.G96.58[L6Sami]      ?
ISO2022.G96.59[Lat7]        iso8859_7.h
ISO2022.G96.5C[Welsh]       ?
ISO2022.G96.5D[Sami]        ?
ISO2022.G96.5E[Hebrew]      ?
ISO2022.G96.5F[Lat8]        iso8859_8.h
ISO2022.G96.62[Lat9]        iso8859_9.h

KOI8-R                      koi8_r.h

Microsoft.CP1250            cp1250.h
Microsoft.CP1251            cp1251.h
Microsoft.CP1252            cp1252.h
Microsoft.CP1254            cp1254.h
Microsoft.CP866             cp866.h
Microsoft.CP932             cp932.h cp932ext.h

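For anyone wanting to query the table rather than read it, here is a
minimal sketch in Python of how a few of its rows could be held as a
lookup. The names UNICODE_TO_ICONV, iconv_headers and unmatched are
made up for illustration and are not part of !Unicode, UnicodeLib or
libiconv. Only a handful of rows are copied in; the '?' entries are
stored as None.

```python
# Hypothetical lookup built from a few rows of the table above.
# None stands for a '?' entry (no libiconv header identified).
UNICODE_TO_ICONV = {
    "Acorn.Latin1": ["riscos1.h"],
    "BigFive": ["big5.h"],
    "ISO2022.G94.41[646-GB]": None,
    "ISO2022.G94.49[JS201K]": ["jisx0201.h"],
    "ISO2022.G94.4A[JS201R]": ["jisx0201.h", "iso646_jp.h"],
    "KOI8-R": ["koi8_r.h"],
    "Microsoft.CP932": ["cp932.h", "cp932ext.h"],
}


def iconv_headers(unicode_name):
    """Return the libiconv headers listed for a !Unicode data file.

    '?' entries give an empty list; names missing from the table
    raise KeyError.
    """
    headers = UNICODE_TO_ICONV[unicode_name]
    return list(headers) if headers else []


def unmatched(table):
    """Return the sorted names whose iconv column is '?'."""
    return sorted(name for name, headers in table.items() if not headers)
```

Keeping '?' as None (rather than leaving the row out) preserves the
difference between "no header found" and "not a !Unicode data file".
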
iconv->!Unicode:
================

Iconv has the following encodings, which are not present in !Unicode.
Providing a suitable data file for !Unicode is trivial. Whether UnicodeLib
will then act upon the addition of these is unknown.
This list is ordered as per libiconv's NOTES file.

European & Semitic languages:

        ISO-8859-16 (iso8859_16.h)
        KOI8-{U,RU,T} (koi8_xx.h)
        CP125{3,5,6,7} (cp125n.h)
        CP850 (cp850.h)
        CP862 (cp862.h)
        Mac{Croatian,Romania,Greek,Turkish,Hebrew,Arabic} (mac_foo.h)

Japanese:

        None, as far as I can tell.

Simplified Chinese:

        GB18030 (gb18030.h, gb18030ext.h)
        HZ-GB-2312 (hz.h)

Traditional Chinese:

        CP950 (cp950.h)
        BIG5-HKSCS (big5hkscs.h)

Korean:

        CP949 (cp949.h)

Armenian:

        ARMSCII-8 (armscii_8.h)

Georgian:

        Georgian-Academy, Georgian-PS (georgian_academy.h, georgian_ps.h)

Thai:

        CP874 (cp874.h)
        MacThai (mac_thai.h)

Laotian:

        MuleLao-1, CP1133 (mulelao.h, cp1133.h)

Vietnamese:

        VISCII, TCVN (viscii.h, tcvn.h)
        CP1258 (cp1258.h)

Unicode:

        BE/LE variants of normal encodings. I assume UnicodeLib handles
        these, but can't be sure.
        C99 / JAVA - well, yes.


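Some entries in the list above use a brace shorthand, e.g.
CP125{3,5,6,7} stands for CP1253, CP1255, CP1256 and CP1257. As a
rough illustration of how that shorthand reads, here is a small Python
sketch that expands it. The function name expand is my own and nothing
in iconv or !Unicode provides it.

```python
import re

# Matches one {a,b,c} group that contains no nested braces.
_BRACES = re.compile(r"\{([^{}]*)\}")


def expand(name):
    """Expand every {a,b,c} group in name into the full list of names."""
    match = _BRACES.search(name)
    if match is None:
        return [name]
    head, tail = name[:match.start()], name[match.end():]
    results = []
    for option in match.group(1).split(","):
        # Recurse so that any later groups in the tail are expanded too.
        results.extend(expand(head + option + tail))
    return results
```

For example, expand("KOI8-{U,RU,T}") gives the three names KOI8-U,
KOI8-RU and KOI8-T, and a name without braces comes back unchanged.
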
Iconv Module:
=============

The iconv module is effectively a thin veneer around UnicodeLib. However,
8-bit encodings are implemented within the module rather than using the
support in UnicodeLib. The rationale is simply that, although UnicodeLib
will understand (and, reportedly, act upon) additions to the ISO2022
Unicode resource, other encodings are ignored. As the vast majority of
outstanding encodings fall into this category, and the code is fairly
simple, it made sense to implement them within the module.

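To give an idea of why the 8-bit case is fairly simple, here is a
minimal sketch of table-driven transcoding in Python. It is not the
module's actual code (the module is not written in Python and its
internals are not described in this file); the names make_table,
decode_8bit, encode_8bit and LATIN1 are invented for the example, and
it assumes bytes 0x00-0x7F are plain ASCII, which need not hold for
every 8-bit encoding.

```python
def make_table(high_half):
    """Build a 256-entry byte -> code point table.

    Assumption: bytes 0x00-0x7F are ASCII; high_half supplies the
    128 code points for bytes 0x80-0xFF.
    """
    high_half = list(high_half)
    if len(high_half) != 128:
        raise ValueError("need exactly 128 code points for bytes 0x80-0xFF")
    return list(range(128)) + high_half


def decode_8bit(data, table):
    """Decode bytes to a string with one table lookup per byte."""
    return "".join(chr(table[byte]) for byte in data)


def encode_8bit(text, table):
    """Encode a string to bytes using the inverse of the table."""
    reverse = {code_point: byte for byte, code_point in enumerate(table)}
    try:
        return bytes(reverse[ord(ch)] for ch in text)
    except KeyError as exc:
        raise ValueError(
            "character not representable: U+%04X" % exc.args[0]
        ) from None


# Demonstration table: ISO-8859-1, where byte value == code point.
LATIN1 = make_table(range(0x80, 0x100))
```

Each new 8-bit encoding then only needs its own 128-entry upper half;
the decode and encode loops stay the same.
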
With use of the iconv module, the list of outstanding encodings is
reduced to:

        CP1255          (requires state-based transcoding)

        GB18030         (not 8bit - reportedly a requirement of PRC)
        HZ-GB-2312      (not 8bit - supported by IE4)

        CP950           (not 8bit - a (MS) variant of Big5)
        BIG5-HKSCS      (not 8bit - again, a Big5 variant)

        CP949           (not 8bit)

        ARMSCII-8       (easily implemented, if required)

        VISCII          (easily implemented, if required)
        CP1258, TCVN    (requires state-based transcoding)

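This file does not say what state CP1255, CP1258 and TCVN need. One
likely reason, and the assumption made in the sketch below, is that
these encodings contain combining marks, so a decoder has to hold back
each base character until it knows whether a mark follows. The Python
sketch works on characters that have already been through a table
lookup; the class ComposingDecoder and its methods are hypothetical
and it leans on Python's unicodedata module for the composition rules.

```python
import unicodedata


class ComposingDecoder:
    """Buffer one character so a following combining mark can be merged."""

    def __init__(self):
        self.pending = ""  # the held-back character, if any

    def feed(self, ch):
        """Accept one decoded character; return any text ready for output."""
        if self.pending and unicodedata.combining(ch):
            composed = unicodedata.normalize("NFC", self.pending + ch)
            if len(composed) == 1:
                # Merged into a single precomposed character; keep holding
                # it in case another mark follows.
                self.pending = composed
                return ""
        # No merge possible: release the held character and hold the new one.
        out, self.pending = self.pending, ch
        return out

    def flush(self):
        """Release whatever is still held at end of input."""
        out, self.pending = self.pending, ""
        return out


def decode_all(chars):
    """Run a whole sequence of characters through a fresh decoder."""
    decoder = ComposingDecoder()
    return "".join(decoder.feed(ch) for ch in chars) + decoder.flush()
```

The held-back character is exactly the state that a plain per-byte
table lookup cannot carry between calls.
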
Additionally, the remaining CodePage encodings that iconv implements but
which are not listed above (because the iconv documentation omits them)
are implemented by the iconv module.