Introduction:
=============

This file documents an approximate correspondence between the data files
provided in the !Unicode distribution and the encoding headers in GNU
libiconv 1.9.1.

Entries with '?' in the iconv column either have no counterpart in iconv,
or I've missed the relevant header file ;)

A number of encodings are present in the iconv distribution but not
in !Unicode. These are documented at the end of this file.

Changelog:
==========

v 0.01 (09-Sep-2004)
~~~~~~~~~~~~~~~~~~~~
Initial incarnation.

v 0.02 (11-Sep-2004)
~~~~~~~~~~~~~~~~~~~~
Documented additional encodings supported by the Iconv module.
Corrected list of !Unicode deficiencies.


!Unicode->iconv:
================

Unicode:			iconv:			notes:

Acorn.Latin1			riscos1.h

Apple.CentEuro			mac_centraleurope.h
Apple.Cyrillic			mac_cyrillic.h
Apple.Roman			mac_roman.h
Apple.Ukrainian			mac_ukraine.h

BigFive				big5.h

ISO2022.C0.40[ISO646]		?

ISO2022.C1.43[IS6429]		?

ISO2022.G94.40[646old]		?
ISO2022.G94.41[646-GB]		?
ISO2022.G94.42[646IRV]		ascii.h
ISO2022.G94.43[FinSwe]		?
ISO2022.G94.47[646-SE]		?
ISO2022.G94.48[646-SE]		?
ISO2022.G94.49[JS201K]		jisx0201.h		upper (katakana) half of JIS X 0201
ISO2022.G94.4A[JS201R]		jisx0201.h iso646_jp.h	lower (Roman) half of JIS X 0201
ISO2022.G94.4B[646-DE]		?
ISO2022.G94.4C[646-PT]		?
ISO2022.G94.54[GB1988]		iso646_cn.h
ISO2022.G94.56[Teltxt]		?
ISO2022.G94.59[646-IT]		?
ISO2022.G94.5A[646-ES]		?
ISO2022.G94.60[646-NO]		?
ISO2022.G94.66[646-FR]		?
ISO2022.G94.69[646-HU]		?
ISO2022.G94.6B[Arabic]		?
ISO2022.G94.6C[IS6397]		?
ISO2022.G94.7A[SerbCr]		?

ISO2022.G94x94.40[JS6226]	?
ISO2022.G94x94.41[GB2312]	gb2312.h
ISO2022.G94x94.42[JIS208]	jisx0208.h
ISO2022.G94x94.43[KS1001]	ksc5601.h
ISO2022.G94x94.44[JIS212]	jisx0212.h
ISO2022.G94x94.47[CNS1]		cns11643_1.h		the !Unicode and iconv tables differ
ISO2022.G94x94.48[CNS2]		cns11643_2.h
ISO2022.G94x94.49[CNS3]		cns11643_3.h
ISO2022.G94x94.4A[CNS4]		cns11643_4.h
ISO2022.G94x94.4B[CNS5]		cns11643_5.h
ISO2022.G94x94.4C[CNS6]		cns11643_6.h
ISO2022.G94x94.4D[CNS7]		cns11643_7.h

ISO2022.G96.41[Lat1]		iso8859_1.h
ISO2022.G96.42[Lat2]		iso8859_2.h
ISO2022.G96.43[Lat3]		iso8859_3.h
ISO2022.G96.44[Lat4]		iso8859_4.h
ISO2022.G96.46[Greek]		iso8859_7.h
ISO2022.G96.47[Arabic]		iso8859_6.h		ISO-8859-6 ignored
ISO2022.G96.48[Hebrew]		iso8859_8.h
ISO2022.G96.4C[Cyrill]		iso8859_5.h
ISO2022.G96.4D[Lat5]		iso8859_9.h		Latin-5 is ISO-8859-9
ISO2022.G96.50[LatSup]		?
ISO2022.G96.52[IS6397]		?
ISO2022.G96.54[Thai]		tis620.h
ISO2022.G96.56[Lat6]		iso8859_10.h		Latin-6 is ISO-8859-10
ISO2022.G96.58[L6Sami]		?
ISO2022.G96.59[Lat7]		iso8859_13.h		Latin-7 is ISO-8859-13
ISO2022.G96.5C[Welsh]		?
ISO2022.G96.5D[Sami]		?
ISO2022.G96.5E[Hebrew]		?
ISO2022.G96.5F[Lat8]		iso8859_14.h		Latin-8 is ISO-8859-14
ISO2022.G96.62[Lat9]		iso8859_15.h		Latin-9 is ISO-8859-15

KOI8-R				koi8_r.h

Microsoft.CP1250		cp1250.h
Microsoft.CP1251		cp1251.h
Microsoft.CP1252		cp1252.h
Microsoft.CP1254		cp1254.h
Microsoft.CP866			cp866.h
Microsoft.CP932			cp932.h cp932ext.h

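The ISO2022.* names in the table appear to follow ISO/IEC 2022 (ECMA-35): the C0/C1/G94/G96/G94x94 part gives the kind of set, and the two hex digits give the final byte of the escape sequence that designates it (0x42 'B' for ASCII, 0x41 'A' for the Latin-1 right-hand part, and so on). The Python sketch below assumes that reading of the names; designation() is a hypothetical helper, not part of !Unicode, UnicodeLib or iconv. It turns a resource name into the designating escape sequence, which is a quick way to cross-check an entry against a codec that emits such sequences.

```python
import re

# Intermediate byte(s) after ESC that designate a graphic set to G0..G3
# (ISO/IEC 2022 / ECMA-35). 96-character sets cannot be designated to G0.
_INTERMEDIATES = {
    "G94": {0: b"(", 1: b")", 2: b"*", 3: b"+"},
    "G96": {1: b"-", 2: b".", 3: b"/"},
    "G94x94": {0: b"$(", 1: b"$)", 2: b"$*", 3: b"$+"},
}

# Assumed resource name layout: ISO2022.<kind>.<final byte in hex>[<label>]
_NAME = re.compile(r"ISO2022\.(C0|C1|G94x94|G94|G96)\.([0-9A-Fa-f]{2})(?:\[.*\])?$")


def designation(resource: str, g: int = 0) -> bytes:
    """Return the escape sequence designating the named set to G<g>."""
    m = _NAME.match(resource)
    if m is None:
        raise ValueError(f"not an ISO2022 resource name: {resource!r}")
    kind, final = m.group(1), bytes([int(m.group(2), 16)])
    if kind == "C0":
        return b"\x1b!" + final   # ESC 2/1 F designates a C0 set
    if kind == "C1":
        return b"\x1b\"" + final  # ESC 2/2 F designates a C1 set
    if kind == "G94x94" and g == 0 and final in b"@AB":
        # Legacy short form ESC $ F, kept for compatibility (used by ISO-2022-JP).
        return b"\x1b$" + final
    try:
        return b"\x1b" + _INTERMEDIATES[kind][g] + final
    except KeyError:
        raise ValueError(f"{kind} set cannot be designated to G{g}") from None
```

For example, designation("ISO2022.G96.41[Lat1]", 1) gives ESC - A, and Python's own iso2022_jp codec brackets JIS X 0208 text with the sequences produced for the [JIS208] and [646IRV] entries.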
iconv->!Unicode:
================

Iconv has the following encodings, which are not present in !Unicode.
Providing a suitable data file for !Unicode is trivial; whether UnicodeLib
will then act upon the addition of these is unknown.
This list follows the order of libiconv's NOTES file.

European & Semitic languages:

	ISO-8859-16 (iso8859_16.h)
	KOI8-{U,RU,T} (koi8_u.h, koi8_ru.h, koi8_t.h)
	CP125{3,5,6,7} (cp1253.h, cp1255.h, cp1256.h, cp1257.h)
	CP850 (cp850.h)
	CP862 (cp862.h)
	Mac{Croatian,Romania,Greek,Turkish,Hebrew,Arabic} (mac_*.h)

Japanese:

	None, as far as I can tell.

Simplified Chinese:

	GB18030 (gb18030.h, gb18030ext.h)
	HZ-GB-2312 (hz.h)

Traditional Chinese:

	CP950 (cp950.h)
	BIG5-HKSCS (big5hkscs.h)

Korean:

	CP949 (cp949.h)

Armenian:

	ARMSCII-8 (armscii_8.h)

Georgian:

	Georgian-Academy, Georgian-PS (georgian_academy.h, georgian_ps.h)

Thai:

	CP874 (cp874.h)
	MacThai (mac_thai.h)

Laotian:

	MuleLao-1, CP1133 (mulelao.h, cp1133.h)

Vietnamese:

	VISCII, TCVN (viscii.h, tcvn.h)
	CP1258 (cp1258.h)

Unicode:

	Big-endian/little-endian (BE/LE) variants of the standard Unicode
	encodings, such as UTF-16BE and UTF-16LE. I assume UnicodeLib
	handles these, but can't be sure.
	C99 / JAVA (\u escape notations) - well, yes.


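The BE/LE variants differ only in byte order, so converting between them is just a matter of swapping each pair of bytes; plain UTF-16, by contrast, relies on a byte order mark (BOM). The Python sketch below illustrates both points; swap_byte_order() and decode_utf16() are hypothetical names, and defaulting to big-endian when no BOM is present follows RFC 2781 rather than anything stated about UnicodeLib.

```python
import codecs


def swap_byte_order(data: bytes) -> bytes:
    """Convert UTF-16BE bytes to UTF-16LE, or vice versa, by swapping each byte pair."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must contain an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # second byte of each pair moves first
    swapped[1::2] = data[0::2]  # first byte of each pair moves second
    return bytes(swapped)


def decode_utf16(data: bytes) -> str:
    """Decode plain UTF-16: honour a BOM if present, otherwise assume big-endian."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    return data.decode("utf-16-be")
```

Swapping per 16-bit unit also handles surrogate pairs correctly, since each half of a pair is itself one 16-bit unit.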
Iconv Module:
=============

The iconv module is effectively a thin veneer around UnicodeLib. However,
8-bit encodings are implemented within the module rather than using the
support in UnicodeLib. The rationale is simply that, although UnicodeLib
will understand (and, reportedly, act upon) additions to the ISO2022
Unicode resource, it ignores other encodings. As the vast majority of
outstanding encodings fall into this category, and the code is fairly
simple, it made sense to implement them within the module.

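Because each byte of an 8-bit encoding maps to at most one character, such an encoding can be handled with a 256-entry lookup table in each direction, which is presumably why the code is simple. The Python sketch below is illustrative only: it borrows its tables from Python's own codecs rather than from the module's data files, and build_table(), decode_8bit() and encode_8bit() are hypothetical names.

```python
def build_table(encoding: str) -> list:
    """Build a 256-entry byte -> character table from a Python codec.

    Bytes the encoding leaves unassigned are stored as None.
    """
    table = []
    for byte in range(256):
        try:
            table.append(bytes([byte]).decode(encoding))
        except UnicodeDecodeError:
            table.append(None)
    return table


def decode_8bit(data: bytes, table: list, replacement: str = "\ufffd") -> str:
    """One table lookup per byte; no state is carried from byte to byte."""
    return "".join(table[b] if table[b] is not None else replacement for b in data)


def encode_8bit(text: str, table: list) -> bytes:
    """Invert the table and look up each character (KeyError if unrepresentable)."""
    reverse = {ch: byte for byte, ch in enumerate(table) if ch is not None}
    return bytes(reverse[ch] for ch in text)
```

A real implementation would build the reverse table once rather than on every call; it is rebuilt here only to keep the sketch short.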
With the iconv module in use, the list of outstanding encodings is
reduced to:

	CP1255 (requires state-based transcoding)

	GB18030 (not 8-bit; reportedly required in the PRC)
	HZ-GB-2312 (not 8-bit; supported by IE4)

	CP950 (not 8-bit; Microsoft's variant of Big5)
	BIG5-HKSCS (not 8-bit; another Big5 variant)

	CP949 (not 8-bit)

	ARMSCII-8 (easily implemented, if required)

	VISCII (easily implemented, if required)
	CP1258, TCVN (requires state-based transcoding)

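State-based transcoding is needed because CP1258 (and similarly CP1255 and TCVN) can represent an accented letter as a base letter followed by a separate combining mark. A converter that wants precomposed Unicode therefore cannot emit a base letter until it has seen the next byte, and has to carry that letter over between calls. The Python sketch below shows one way to keep that state; ComposingDecoder is a hypothetical name, and composing to NFC is an assumed output policy, not necessarily what iconv or UnicodeLib do.

```python
import codecs
import unicodedata


class ComposingDecoder:
    """Decode a single-byte encoding with combining marks into NFC text.

    The last base character is held back, because the next byte may be a
    combining mark that has to be merged into it.
    """

    def __init__(self, encoding: str = "cp1258"):
        self._decoder = codecs.getincrementaldecoder(encoding)()
        self._pending = ""  # held-back base character plus any marks seen so far

    def _flush(self) -> str:
        text = unicodedata.normalize("NFC", self._pending)
        self._pending = ""
        return text

    def decode(self, data: bytes, final: bool = False) -> str:
        out = []
        for ch in self._decoder.decode(data, final):
            if self._pending and unicodedata.combining(ch):
                self._pending += ch  # mark attaches to the held-back character
            else:
                out.append(self._flush())  # previous character is now complete
                self._pending = ch
        if final:
            out.append(self._flush())
        return "".join(out)
```

Feeding it "a" on its own returns nothing yet; only when the following combining acute accent (or the end of input) arrives does it emit the finished character.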
Additionally, the iconv module implements the remaining code page
encodings that iconv supports but that are not listed above (they are
missing from the iconv documentation).
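Whether a code page suits the simple table-driven approach comes down to whether it is a true 8-bit encoding. One hedged way to check this against Python's codecs is to feed each byte value on its own to an incremental decoder: a multi-byte encoding buffers its lead bytes and returns nothing, whereas an 8-bit encoding either decodes the byte at once or rejects it as unassigned. is_single_byte() is a hypothetical helper, and Python's codec tables are only a stand-in for iconv's.

```python
import codecs


def is_single_byte(encoding: str) -> bool:
    """Return True if no byte value of `encoding` starts a multi-byte sequence."""
    for byte in range(256):
        decoder = codecs.getincrementaldecoder(encoding)()
        try:
            out = decoder.decode(bytes([byte]), final=False)
        except UnicodeDecodeError:
            continue  # unassigned in this encoding; says nothing about width
        if out == "":
            return False  # decoder is waiting for more input: a lead byte
    return True
```

With Python's codecs this reports CP850 and CP1253 as 8-bit, and CP932, CP949 and GB18030 as not.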