[Q] Non-Latin-1 escapes can lead to corrupted ELC code.

Stephen J. Turnbull stephen at xemacs.org
Mon May 7 02:25:10 EDT 2007


QUERY

Aidan Kehoe writes:

 > Without this patch, the following test file is compiled incorrectly: 
 > 
 > (defvar Pravda "\u05bf\u05e0\u05d0\u05d2\u05d4\u05d0")

I VETO that, you Capitalist Lackey!  That should be DEFCONST!!

Non-mandatory suggestion/question:  How hard would it be for non-Mule
to recognize the new Unicode escapes and signal an `unimplemented'
error?  If that can be done correctly, it would be one small step to a
non-Mule Unicode-enabled XEmacs.

 > +            (let ((case-fold-search nil))
 > +              (re-search-forward 
 > +               (concat "[^\000-\377]" 
 > +                       #r"\\u[0-9a-fA-F]\{4,4\}\|\\U[0-9a-fA-F]\{8,8\}")
 > +               nil t)))

Don't you need an OR in the regexp?

               (concat "[^\000-\377]" 
                       #r"\|\\u[0-9a-fA-F]\{4,4\}\|\\U[0-9a-fA-F]\{8,8\}")
                          ^
HERE ---------------------+
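
To make the difference concrete, here is a minimal sketch, not the patch
itself.  It assumes plain strings with doubled backslashes in place of
the #r raw strings, and every `my-' name is hypothetical.  The sample
source is ASCII-only, so the first form needs a non-Latin-1 character
directly in front of the \u escape and finds nothing, while the second
form sees the escape on its own.

```elisp
;; Hypothetical names throughout; a sketch, not the code in the patch.
;; The sample is the text of a source line containing a literal \u escape.
(defconst my-sample-source "(defvar Pravda \"\\u05bf\")")

;; The two escape alternatives, written without #r raw-string syntax.
(defconst my-escapes-re
  "\\\\u[0-9a-fA-F]\\{4,4\\}\\|\\\\U[0-9a-fA-F]\\{8,8\\}")

;; As in the patch: the character class is glued onto the first alternative.
(defconst my-concatenated-re
  (concat "[^\000-\377]" my-escapes-re))

;; With the extra \| the character class is an alternative of its own.
(defconst my-alternated-re
  (concat "[^\000-\377]" "\\|" my-escapes-re))

(defun my-needs-escape-quote-p (regexp source)
  "Return t if REGEXP matches anywhere in the string SOURCE."
  (let ((case-fold-search nil))
    (and (string-match regexp source) t)))
```

(my-needs-escape-quote-p my-concatenated-re my-sample-source) returns
nil, and the same call with my-alternated-re returns t.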

 > +            ;; Look for any non-Latin-1 literals or Unicode character
 > +            ;; escapes. Also catches them in comments, which is actually
 > +            ;; irrelevant to us, but implementing a more complex algorithm
 > +            ;; is not worth the trade-off.

Non-mandatory suggestion:  Wouldn't

  (let ((case-fold-search nil)
        (mule-re (concat "[^\000-\377]" 
                         #r"\|\\u[0-9a-fA-F]\{4,4\}\|\\U[0-9a-fA-F]\{8,8\}")))
    (catch 'need-to-escape-quote
      (while (re-search-forward mule-re nil t)
        (skip-chars-backward ";" (point-at-bol))
        (if (bolp)
            (throw 'need-to-escape-quote t))
        (forward-line 1))))

do the trick for avoiding triggering on comments?  Since it's
compile-time and essentially one-pass, performance is really not that
big an issue.  Whether this is a good idea is another question; I'm of
two minds about that.
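
For comparison, a different technique from the loop above: ask the
syntax parser whether each match sits inside a comment.  This is only a
sketch under assumptions.  The function name is hypothetical, the
buffer is assumed to use the Emacs Lisp syntax table, and only the
escape alternatives are checked (the non-Latin-1 character class is
left out for brevity).

```elisp
;; Hypothetical helper; a sketch, not part of the patch under review.
(defun my-escape-outside-comment-p ()
  "Return t if the current buffer has a Unicode escape outside a comment.
Assumes the buffer's syntax table treats `;' as a comment starter."
  (save-excursion
    (goto-char (point-min))
    (let ((case-fold-search nil)
          (re "\\\\u[0-9a-fA-F]\\{4,4\\}\\|\\\\U[0-9a-fA-F]\\{8,8\\}")
          (found nil))
      (while (and (not found) (re-search-forward re nil t))
        ;; Element 4 of the parse state is non-nil inside a comment.
        (if (nth 4 (parse-partial-sexp (point-min) (match-beginning 0)))
            ;; The match is in a comment: skip the rest of that line.
            (forward-line 1)
          (setq found t)))
      found)))
```

Reparsing from the start of the buffer for every match is quadratic in
the worst case, which fits the point above that performance is not
much of a concern at compile time.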


