how to save œ other than utf8
Stephen J. Turnbull
stephen at xemacs.org
Tue Dec 2 23:02:26 EST 2008
Aidan Kehoe writes:
> Heuristics can come close for this particular pair. ISO-8859-1 text that
> includes "¤", for example, is rare, while ISO-8859-15 text that includes
> "€", with the same octet value, is close.
The problem with (one-character frequency-based) heuristics is that
they often fail spectacularly. For example, I have seen plain-text
documents by Americans that use CURRENCY SIGN as a fancy bullet in
lists. No Euros there. And the Japanese are great fans of using
non-word-forming characters in, er, "expressive" ways.
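To make the failure concrete, here is a minimal Python sketch of a one-character heuristic of the kind described above. The function name and the sample strings are invented for illustration; nothing here is XEmacs code. It treats any occurrence of octet 0xA4 as evidence of ISO-8859-15 and so misreads a bulleted list.

```python
# Octets whose meaning differs between ISO-8859-1 and ISO-8859-15.
DIFFERING_OCTETS = bytes([0xA4, 0xA6, 0xA8, 0xB4, 0xB8, 0xBC, 0xBD, 0xBE])

def guess_by_single_octet(data: bytes) -> str:
    """Naive one-character heuristic (hypothetical): any 0xA4 means ISO-8859-15."""
    return "iso8859-15" if 0xA4 in data else "iso8859-1"

# A plain-text list that uses the currency sign as a bullet (ISO-8859-1).
bullets = "¤ first item\n¤ second item\n".encode("iso8859-1")
# A price that uses the euro sign (ISO-8859-15); same octet, different meaning.
price = "Prix : 10 €\n".encode("iso8859-15")

for sample in (bullets, price):
    charset = guess_by_single_octet(sample)
    print(charset, repr(sample.decode(charset)))
```

Both samples are reported as ISO-8859-15, so the bullets come out as euro signs.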
Frequency-based heuristics (including dyad and triad heuristics) would
of course be a good idea to implement. The hard part is going to be a
UI that allows people who don't need them or are actively hindered by
them to disable them.
Note that another advantage of frequency-based heuristics is that they
allow language detection, which mere character usage doesn't, really.
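A dyad heuristic could be sketched roughly as follows, again in Python and again purely as an illustration. The training text is a toy invented for this example (a real model would need a large corpus), and the names guess_charset and use_heuristics are hypothetical. Each candidate decoding is scored by how familiar its two-character sequences are, and the use_heuristics flag stands in for the off switch that some users will need.

```python
import math
from collections import Counter

CANDIDATES = ("iso8859-1", "iso8859-15")

# Toy training text, invented for this sketch: euro signs after amounts,
# currency signs as bullets at the start of a line.
TRAINING_TEXT = (
    "Prix : 10 €\nTotal : 25 €\n\n"
    "¤ item one\n¤ item two\n"
)

def dyads(text):
    """Return all overlapping two-character sequences in text."""
    return [text[i:i + 2] for i in range(len(text) - 1)]

DYAD_COUNTS = Counter(dyads(TRAINING_TEXT))
TOTAL = sum(DYAD_COUNTS.values())

def score(text):
    """Log-likelihood of text under the dyad counts, with add-one smoothing."""
    vocab = len(DYAD_COUNTS) + 1
    return sum(math.log((DYAD_COUNTS[d] + 1) / (TOTAL + vocab))
               for d in dyads(text))

def guess_charset(data, use_heuristics=True, default="iso8859-1"):
    """Pick the candidate decoding whose dyads look most familiar."""
    if not use_heuristics:
        # Users who are hindered by guessing get the plain default.
        return default
    return max(CANDIDATES, key=lambda cs: score(data.decode(cs)))

print(guess_charset("Le prix est 30 €\n".encode("iso8859-15")))
print(guess_charset("¤ apples\n¤ pears\n".encode("iso8859-1")))
```

With this toy model the price line is read as ISO-8859-15 and the bulleted list as ISO-8859-1, because the octet is judged by its neighbours rather than by its mere presence. Scoring the same text against one model per language would be the analogous approach to language detection.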