[Bug: 21.5-b25] Problems with latin-unity and VM
kehoea at parhasard.net
Wed Feb 7 07:45:47 EST 2007
On the seventh of February, Joachim Schrod wrote:
> >>>>> "AK" == Aidan Kehoe <kehoea at parhasard.net> writes:
> AK> On the seventh of February, Joachim Schrod wrote:
> >> I would have expected that latin-unity does NOT attempt to change
> >> the encoding at all for such files -- after all, they are declared
> >> as binary and the notion of Latin characters in binary files makes
> >> no sense.
> AK> We (XEmacs) don’t distinguish iso-8859-1 and binary in your sense;
> Ah -- that I didn't know. Reading the Coding System section of the
> XEmacs manual, it didn't seem so; the differences between binary and
> iso-8859-1 are explicitly named.
They are not--‘no character code conversion [...] for non-Latin-1 byte
values’ is what it says. It is badly and unclearly put, though.
> In contrast, the coding system `binary' specifies no character
> code conversion at all--none for non-Latin-1 byte values and none
> for end of line. This is useful for reading or writing binary
> files, tar files, and other files that must be examined verbatim.
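To illustrate why treating octets as iso-8859-1 loses nothing, here is a minimal sketch in Python rather than XEmacs Lisp. It uses Python's own latin-1 codec as an analogy; it does not show XEmacs internals, and the variable names are made up for the example.

```python
# Sketch (Python's latin-1 codec, used as an analogy for binary vs. iso-8859-1).
# Latin-1 maps every octet 0x00-0xFF to the code point with the same value,
# so decoding arbitrary bytes and encoding them again is lossless.
all_octets = bytes(range(256))

as_text = all_octets.decode("latin-1")     # never fails: every octet is valid
round_tripped = as_text.encode("latin-1")  # back to the original octets

# Each character's code point equals the octet it came from.
code_points_match = all(ord(ch) == b for ch, b in zip(as_text, all_octets))
```

In that sense a buffer of Latin-1 characters and a buffer of raw octets carry the same information, which is why no separate conversion step is needed to tell them apart.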
> But with that information your explanation gets clearer. Though I have
> to say that I would have naively answered your question
> AK> Consider; how can you interpret a sequence of octets on disk as
> AK> U+5357, the Han character for ‘southwards,’ without abandoning the
> AK> treatment as ‘binary’--a sequence of octets--and checking instead
> AK> for ISO-2022-1 or UTF-8 sequences?
> as follows: In buffers with coding system 'binary there must not be
> the character U+5357, by definition, because no such octet exists.
> When the buffer-file-coding-system-for-read is set to 'binary, such a
> character would not be constructed at all. Yanking that character in
> such a buffer would signal an error. I also would have expected any
> attempt to set buffer-file-coding-system to 'binary in a buffer with
> such a character to signal an error.
I’m not aware of any environment that implements that behaviour--though it
would seem more correct. Are you? Non-Unicode Windows apps, for example,
trash data when people type in or paste characters that don’t occur in the
app’s code page.
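The two behaviours under discussion can be sketched in Python (again an analogy using Python's codecs, not XEmacs code; the helper names encode_strict and encode_lossy are hypothetical). The strict variant refuses a character that has no octet, which is roughly what Joachim expected; the lossy variant silently substitutes, which is roughly what the code-page applications do.

```python
# Sketch: strict vs. lossy handling of a character with no single-octet form.
HAN_SOUTH = "\u5357"  # the Han character for 'southwards'

def encode_strict(text):
    """Return Latin-1 octets, or None if some character has no octet."""
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        return None  # a real editor might signal an error here instead

def encode_lossy(text):
    """Replace unencodable characters with '?', destroying the original."""
    return text.encode("latin-1", errors="replace")

strict_result = encode_strict("a" + HAN_SOUTH)  # None: refused
lossy_result = encode_lossy("a" + HAN_SOUTH)    # b"a?": data trashed

# On disk the character only exists as a multi-octet sequence, for example
# in UTF-8. Read back as Latin-1, the same octets are three unrelated
# characters, not U+5357.
utf8_octets = HAN_SOUTH.encode("utf-8")
misread = utf8_octets.decode("latin-1")
```

Either way, recognising U+5357 means interpreting the octets under some multi-byte encoding, which is no longer treating them as binary.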
On the quay of the little Black Sea port, where the rescued pair came once
more into contact with civilization, Dobrinton was bitten by a dog which was
assumed to be mad, though it may only have been indiscriminating. (Saki)