a thought on coding systems
jcb+xeb at inf.ed.ac.uk
Sat Oct 25 05:10:17 EDT 2008
There are, alas, more coding systems around in the world than there
were. In particular, there's the GBK system, and what's worse, there
are GBK encoded articles on Usenet that claim to be GB2312. Firefox
at least seems to silently accommodate this and treat GB2312 as GBK.
Standard XEmacs doesn't support GBK. There is an add-on, but I haven't
seen whether it works, and it's a typical horrible CCL/conversion
It occurred to me that it would be very easy to make a small extension
to the ISO2022 coding systems: add a new property ('gr-gl-is-g2 or
something) that would mean: if you see a sequence c1,c2 of characters
in the GR,GL ranges, treat it as referring to (c1 & 0x7F, c2) in the
G2 charset of the coding system.
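To make the rule concrete, here is a rough sketch in Python (not XEmacs code, and not tied to any existing XEmacs API). The function name classify, the tuple labels, and the exact byte bounds chosen for GL (0x21-0x7E) and GR (0xA1-0xFE) are assumptions made for illustration only. It splits a byte string into single bytes, ordinary GR,GR pairs for G1, and the proposed GR,GL pairs for G2.

```python
# Hypothetical illustration of the proposed 'gr-gl-is-g2 rule.
# Byte bounds are assumed: GL = 0x21..0x7E, GR = 0xA1..0xFE.

def is_gl(b: int) -> bool:
    return 0x21 <= b <= 0x7E

def is_gr(b: int) -> bool:
    return 0xA1 <= b <= 0xFE

def classify(data: bytes) -> list:
    """Split data into ("single", b), ("g1", row, col) and ("g2", row, col)."""
    out = []
    i = 0
    while i < len(data):
        c1 = data[i]
        if is_gr(c1) and i + 1 < len(data):
            c2 = data[i + 1]
            if is_gr(c2):
                # Usual EUC-style pair: both bytes have the high bit stripped.
                out.append(("g1", c1 & 0x7F, c2 & 0x7F))
                i += 2
                continue
            if is_gl(c2):
                # Proposed extension: GR lead, GL trail -> (c1 & 0x7F, c2) in G2.
                out.append(("g2", c1 & 0x7F, c2))
                i += 2
                continue
        # Anything else is passed through as a single byte.
        out.append(("single", c1))
        i += 1
    return out
```

For example, classify(b"\xb0\xa1") gives a G1 pair (0x30, 0x21), while classify(b"\xb0\x41") gives a G2 pair (0x30, 0x41). A real decoder would then look those pairs up in the respective charsets.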
Then one would just add this property to the gb coding system, set g2
to be a new charset representing the lower columns of GBK, and Bob's
your uncle.
(This is also, IMHO, how Big5 should have been handled.)
Thoughts? I might well do it for myself, since nowadays I compile my own
XEmacs everywhere anyway, but I wondered if it might be worth having
in the mainstream - in which case, I'd propose using one of the
remaining free official 2D charsets for the lower half of GBK.