how to save œ other than utf8

Stephen J. Turnbull stephen at xemacs.org
Tue Dec 2 23:02:26 EST 2008


Aidan Kehoe writes:

 > Heuristics can come close for this particular pair. ISO-8859-1 text that
 > includes "¤", for example, is rare, while ISO-8859-15 text that includes
 > "€", with the same octet value, is close.

The problem with (one-character frequency-based) heuristics is that
they often fail spectacularly.  For example, I have seen plain-text
documents by Americans that use CURRENCY SIGN as a fancy bullet in
lists.  No Euros there.  And the Japanese are great fans of using
non-word-forming characters in, er, "expressive" ways.
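
To make the failure mode concrete, here is a minimal sketch of a
one-character heuristic for this pair of charsets.  The function name
and the rule itself ("any 0xA4 octet means ISO-8859-15") are
hypothetical, chosen only for illustration; the one solid fact is that
octet 0xA4 is CURRENCY SIGN in ISO-8859-1 and EURO SIGN in ISO-8859-15.

```python
# Octet 0xA4 is CURRENCY SIGN in ISO-8859-1 and EURO SIGN in ISO-8859-15.
AMBIGUOUS_OCTET = 0xA4


def guess_by_single_octet(data: bytes) -> str:
    """Hypothetical one-character heuristic: any 0xA4 means ISO-8859-15."""
    return "iso-8859-15" if AMBIGUOUS_OCTET in data else "iso-8859-1"


# Plausible case: a price written in ISO-8859-15 (U+20AC is the euro sign).
price = "Prix : 10 \u20ac".encode("iso-8859-15")

# Failure case: ISO-8859-1 text using U+00A4 (currency sign) as a bullet.
bullets = "\u00a4 first item\n\u00a4 second item\n".encode("iso-8859-1")

print(guess_by_single_octet(price))    # the heuristic happens to be right
print(guess_by_single_octet(bullets))  # same answer, but wrong this time
```

Both inputs get the same answer, so the bulleted list would be
displayed with a euro sign in front of every item.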

Frequency-based heuristics (including dyad and triad heuristics) would
of course be a good idea to implement.  The hard part is going to be a
UI that allows people who don't need them or are actively hindered by
them to disable them.
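
As a rough illustration, the sketch below is a crude stand-in for a dyad
heuristic.  Instead of a trained frequency table it hard-codes a single
assumed pair pattern (a euro sign usually sits next to a digit), and it
takes a flag so that callers who are hindered by guessing can turn it
off.  The names guess_latin, use_heuristics and default are all
hypothetical, not an existing XEmacs interface.

```python
import re

# Assumed rule of thumb: a euro sign usually sits next to a digit,
# optionally separated by one space.  Real dyad tables would be trained.
_EURO_CONTEXT = re.compile(rb"\d ?\xa4|\xa4 ?\d")


def guess_latin(data: bytes, use_heuristics: bool = True,
                default: str = "iso-8859-1") -> str:
    """Guess between ISO-8859-1 and ISO-8859-15 from the context of 0xA4."""
    if not use_heuristics or b"\xa4" not in data:
        return default  # guessing disabled, or nothing to decide on
    total = data.count(b"\xa4")
    near_digit = len(_EURO_CONTEXT.findall(data))
    # Call it a euro sign only if at least half the occurrences touch a digit.
    return "iso-8859-15" if near_digit * 2 >= total else default


print(guess_latin(b"Prix : 10 \xa4"))                        # iso-8859-15
print(guess_latin(b"\xa4 first item\n\xa4 second item\n"))   # iso-8859-1
print(guess_latin(b"Prix : 10 \xa4", use_heuristics=False))  # iso-8859-1
```

This still fails on a bulleted list whose items begin with a number,
which is exactly why the off switch matters.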

Note that another advantage of frequency-based heuristics is that they
allow language detection, which mere character usage doesn't, really.
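
A toy sketch of why frequencies help here: both sample texts below use
nothing but ASCII letters, so the set of characters used says nothing
about the language, yet their triad profiles differ clearly.  The
sample sentences, the cosine similarity measure and every name in the
code are assumptions made for illustration; a usable detector would need
profiles built from much larger corpora.

```python
from collections import Counter


def trigrams(text: str) -> Counter:
    """Count character triads in lowercased, whitespace-normalized text."""
    t = " " + " ".join(text.lower().split()) + " "
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two triad count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = (sum(v * v for v in a.values())
            * sum(v * v for v in b.values())) ** 0.5
    return dot / norm if norm else 0.0


# Tiny made-up training samples, ASCII only on purpose.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog "
          "and then the cat sat on the mat",
    "fr": "le renard brun saute par-dessus le chien paresseux "
          "et puis le chat est sur le tapis",
}
PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}


def guess_language(text: str) -> str:
    """Return the language whose triad profile is closest to the text."""
    probe = trigrams(text)
    return max(PROFILES, key=lambda lang: similarity(probe, PROFILES[lang]))


print(guess_language("the dog sat on the mat"))  # en
print(guess_language("le chat et le chien"))     # fr
```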


More information about the XEmacs-Beta mailing list