[Q21.5] Handle UTF-8 more robustly;
pass through information about incorrect sequences
Aidan Kehoe
kehoea at parhasard.net
Mon Jul 23 05:29:14 EDT 2007
On the twenty-third day of July, Stephen J. Turnbull wrote:
> Aidan Kehoe writes:
>
> > You know that our internal string encoding is not exposed to Lisp,
> > except via CCL, right? I don’t object to your asking to document
> > it, but I wonder what provokes the question.
>
> I'm not talking about the internal encoding. I want to know what
> happens if you edit a buffer containing a representation of
> non-UTF-8 stuff, and then use/save the result. The AUCTeX processing
> of TeX error messages described by David Kastrup would be a use case.
> Another would be people trying to recover text from a core dump.
Yes, this solves David’s use-case, and Joachim’s from here:
http://mid.gmane.org/f2g834$sds$1@sea.gmane.org . I’m not sure I want to
document that right now, since I use the jit-ucs character sets for encoding
the error octets, which makes it virtually impossible to actually search for
such octets and be sure that you’ve found them.
I also am not sure I want to add a new character set for them (which would
make searching trivial) before committing another patch on my hard disk,
which removes the ordering from the leading bytes, so that any leading byte
can be used for a charset of any dimension.
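
For illustration only, here is a minimal sketch of the general idea in Python. It is not XEmacs code and says nothing about how the patch is implemented. It assumes Python's built-in "surrogateescape" error handler as a stand-in for a dedicated character set for error octets: each invalid octet maps to a code point in one reserved range, so the octets survive an edit-and-save cycle and are trivial to search for. The helper name error_octets is hypothetical.

```python
# Illustration only: uses Python's "surrogateescape" handler, not XEmacs internals.

# Valid UTF-8 ("café") followed by two octets that can never appear in UTF-8.
raw = b"caf\xc3\xa9 \xff\xfe ok"

# Decode without losing the invalid octets: each undecodable octet 0xNN
# becomes the lone surrogate U+DC00 + 0xNN (range U+DC80..U+DCFF).
text = raw.decode("utf-8", errors="surrogateescape")


def error_octets(s):
    """Return (index, octet) pairs for every escaped invalid octet in s.

    Hypothetical helper: because all error octets live in one dedicated
    range, finding them is a simple range check.
    """
    return [
        (i, ord(ch) - 0xDC00)
        for i, ch in enumerate(s)
        if 0xDC80 <= ord(ch) <= 0xDCFF
    ]


# Edit the well-formed part of the text, leaving the error octets alone.
edited = text.replace("ok", "done")

# Saving with the same handler writes the original invalid octets back out.
saved = edited.encode("utf-8", errors="surrogateescape")
```

With a scheme like this, the result of editing and saving is predictable: the well-formed text is re-encoded normally and each error octet is written back unchanged.
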
> If the result is predictable and documented, that could be useful to
> people who are deliberately working with buffers that do not contain
> well-formed encoded text.
Yes.
--
On the quay of the little Black Sea port, where the rescued pair came once
more into contact with civilization, Dobrinton was bitten by a dog which was
assumed to be mad, though it may only have been indiscriminating. (Saki)