[PATCH] Pay more attention to the locale's coding systems on
startup
Aidan Kehoe
kehoea at parhasard.net
Mon Nov 13 03:56:56 EST 2006
On the thirteenth day of November, stephen at xemacs.org wrote:
> VETO
>
> Correctness before convenience, please. I agree with you that the
> language environment stuff is broken with regards to understanding
> about various environments, especially those using UTF-8, but this is
> patching an inconvenience without fixing the real problem.
I don’t think “correct” is possible for this problem. “Better” is
possible, but the POSIX locale format lacks standardisation, and the various
platforms have taken full advantage of that to create needlessly
incompatible locale names. I mean, there’s an en_IE.iso885915@euro locale on
this Debian box; what have they got against dashes?
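To make the naming problem concrete, here is a rough sketch (in Python rather than Lisp, purely for illustration) of splitting a locale name of the common language[_territory][.codeset][@modifier] shape and folding the codeset spelling so that the dashed and dashless variants compare equal. The names `LOCALE_RE` and `parse_locale` are made up for this example and are not part of the patch or of XEmacs; real platforms have spellings this does not cover.

```python
import re

# Common locale name shape: language[_territory][.codeset][@modifier]
# (hypothetical helper, not taken from mule-cmds.el)
LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def parse_locale(name):
    """Split a locale name into its parts; return None if it does not match."""
    m = LOCALE_RE.match(name)
    if m is None:
        return None
    parts = m.groupdict()
    if parts["codeset"] is not None:
        # Fold case and drop punctuation so that "UTF-8" and "UTF8",
        # or "ISO8859-15" and "iso885915", end up spelled the same.
        parts["codeset"] = re.sub(r"[^a-z0-9]", "", parts["codeset"].lower())
    return parts


if __name__ == "__main__":
    for name in ("en_IE.iso885915@euro", "en_US.UTF-8", "ja_JP.UTF8", "C"):
        print(name, parse_locale(name))
```

The modifier is returned untouched, since what it means is left to the platform or site; a hook could decide what to do with it.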
> Aidan Kehoe writes:
>
> > 1. It doesn’t deal with the @modifier syntax in Unix locale specifications,
> > which are structured like so:
>
> To deal with the @modifier syntax you'll need to provide a hook. At
> least in POSIX.1 the @modifier stuff was basically implementation
> and/or site-specific. There are common ones, you can provide a
> default implementation if you like.
>
> > 2. It’s possible for me to start up in, for example, the directory
> > "/tmp/aidan/за родину!" with a LC_CTYPE setting of en_US.UTF-8, and have
> > XEmacs treat the current directory’s name as being encoded in UTF-8.
>
> LC_CTYPE is a user preference on a multilingual system, but the file
> system encoding is typically a site-wide parameter, sometimes enforced
> by (some subsystem of) the OS. As I understand it, under Mac OS X
> with HFS+ file names are UTF-8 with the maximally decomposed
> canonicalization. Period. IIRC Japanese VFAT file systems are always
> Shift JIS. If file-name-coding-system is reset to something else,
> these systems will generate very confusing errors. On other systems
> I've used, the file system is EUC-JP even though I used ja_JP.UTF8 or
> en_US.ISO8859-1 for the default content encoding. Depending on what
> I'm doing, it's often convenient to change locales. If XEmacs were to
> pay attention to the locale for this parameter, hilarity (not to
> mention annoyance) would ensue.
Hilarity ensues as-is. When I start up in "/tmp/aidan/за родину!", I can’t
open the current directory, even though my init files set the file name
coding systems appropriately, and even when I have
XEMACSDEBUG="(define-coding-system-alias 'native 'utf-8)" in my environment.
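A small illustration of how this sort of failure can arise, again in Python and only as a sketch: the same bytes give a different name depending on which coding system decodes them. The choice of ISO 8859-1 as the wrong coding system is an assumption for the example; I am not claiming that is what XEmacs uses at this point in startup, and `decode_name` is a made-up helper.

```python
def decode_name(raw, coding):
    """Decode raw file name bytes with the given coding system."""
    return raw.decode(coding)


name = "за родину!"
raw = name.encode("utf-8")  # the bytes a UTF-8 file system would hand back

as_utf8 = decode_name(raw, "utf-8")         # round-trips to the real name
as_latin1 = decode_name(raw, "iso-8859-1")  # assumed wrong guess: mojibake

if __name__ == "__main__":
    print(as_utf8 == name)
    print(as_latin1 == name)
    print(len(name), len(raw), len(as_latin1))
```

Once the name has been decoded the wrong way, re-encoding it with a different coding system produces bytes that no longer name the directory on disk.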
> This should be handled by setting file-name-coding-system in the
> site-start.el file, not by paying attention to user locale settings.
?!?! There should be code to work it out for each platform. In these days of
personal Linux distributions, and Mac OS X, the system administrator implied
in the site-start.el concept is becoming rarer and rarer.
> > 2006-11-12 Aidan Kehoe <kehoea at parhasard.net>
> >
> > * mule/mule-cmds.el (get-language-environment-from-locale):
> > If the Unix locale matches, and the language environment doesn't
> > use the coding system specified in that locale, create a new one
> > on the fly that does, and return that language environment.
> > <== #### NOTE NOTE NOTE bogus trailing whitespace
>
> I wish you'd not do this. Some of my private tools expect the regexp
> "\n\n\d{4}-\d{2}-\d{2} " to match start-of-log-entry, and for "\n\n"
> to separate stanzas in a log entry.
Okay, noted.
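For the record, a quick Python sketch of why the stray whitespace matters to tools like that. The start-of-entry pattern is the one quoted above; the sample log text, `ENTRY_START` and `split_stanzas` are invented for the example and are not Stephen's actual tools.

```python
import re

# Start-of-log-entry pattern as quoted above.
ENTRY_START = re.compile(r"\n\n\d{4}-\d{2}-\d{2} ")


def split_stanzas(entry):
    """Split one log entry into stanzas on truly empty lines."""
    return entry.strip("\n").split("\n\n")


# Invented sample entry: a header line followed by two stanzas.
clean = "2006-11-12  A. Hacker\n\n\t* a.el (foo): Fix.\n\n\t* b.el (bar): Fix.\n"
# Same entry, but the separator line between the stanzas holds a stray tab.
dirty = clean.replace("\n\n\t* b.el", "\n\t\n\t* b.el")

if __name__ == "__main__":
    print(len(split_stanzas(clean)))  # header plus two stanzas
    print(len(split_stanzas(dirty)))  # the two stanzas run together
    print(len(ENTRY_START.findall("preamble\n\n" + clean + "\n" + clean)))
```

With the stray tab the "\n\n" separator never occurs between the two stanzas, so they are read as one.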
--
Santa Maradona, priez pour moi!