XEmacs cannot display U+FFFD (REPLACEMENT CHARACTER) correctly

Mike FABIAN mfabian at suse.de
Fri Jul 20 13:32:54 EDT 2007


XEmacs 21.5.x cannot display U+FFFD correctly. In "normal"
text files, a wrong glyph is shown; in web pages viewed
with w3m.el, only garbage Chinese characters are shown after
the first occurrence of U+FFFD.

The problem seems to be that XEmacs maps U+FFFD to Big5:

(split-char (string-to-char (decode-coding-string "\357\277\275" 'utf-8)))
=> (chinese-big5-1 35 110)

and the reason for this seems to be that BIG5.TXT in the XEmacs
sources (which comes originally from Unicode.org) maps several Big5
characters to U+FFFD.
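
As a cross-check outside XEmacs: the octal escapes in the Lisp string
above are the bytes EF BF BD, which is the UTF-8 encoding of U+FFFD.
A minimal sketch in Python (used here only for illustration, it is not
part of the original report):

```python
# The octal escapes \357\277\275 from the Lisp expression, as raw bytes.
raw = b"\357\277\275"

# Decode them as UTF-8; the result should be a single character.
decoded = raw.decode("utf-8")

print(raw.hex())                 # hex form of the three bytes
print(f"U+{ord(decoded):04X}")   # code point of the decoded character
```

So the input to decode-coding-string really is U+FFFD; the Big5 result
comes from XEmacs's own mapping, not from the test string.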


#       WARNING!  It is currently impossible to provide round-trip compatibility
#               between BIG5 and Unicode.
#
#       A number of characters are not currently mapped because
#               of conflicts with other mappings.  They are as follows:
#
#       BIG5        Description                    Comments
#
#       0xA15A      SPACING UNDERSCORE             duplicates A1C4
#       0xA1C3      SPACING HEAVY OVERSCORE        not in Unicode
#       0xA1C5      SPACING HEAVY UNDERSCORE       not in Unicode
#       0xA1FE      LT DIAG UP RIGHT TO LOW LEFT   duplicates A2AC
#       0xA240      LT DIAG UP LEFT TO LOW RIGHT   duplicates A2AD
#       0xA2CC      HANGZHOU NUMERAL TEN           conflicts with A451 mapping
#       0xA2CE      HANGZHOU NUMERAL THIRTY        conflicts with A4CA mapping
#
#       We currently map all of these characters to U+FFFD REPLACEMENT CHARACTER.
#               It is also possible to map these characters to their duplicates, or to
#               the user zone.
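
Why a many-to-one table can produce the behaviour above can be sketched
as follows. This is only an illustration under an assumption: it
supposes that a Unicode-to-charset lookup is derived by inverting the
Big5-to-Unicode table, which is not confirmed against the XEmacs
sources. The seven Big5 codes are the ones listed in the header quoted
above; build_reverse is a hypothetical helper.

```python
# The seven Big5 codes that the BIG5.TXT header says are mapped to U+FFFD.
BIG5_TO_UNICODE = {
    0xA15A: 0xFFFD,
    0xA1C3: 0xFFFD,
    0xA1C5: 0xFFFD,
    0xA1FE: 0xFFFD,
    0xA240: 0xFFFD,
    0xA2CC: 0xFFFD,
    0xA2CE: 0xFFFD,
}


def build_reverse(forward):
    """Invert a Big5->Unicode table (hypothetical; not XEmacs code)."""
    reverse = {}
    for big5, ucs in forward.items():
        # Several Big5 codes share U+FFFD, so each one overwrites the last.
        reverse[ucs] = big5
    return reverse


reverse = build_reverse(BIG5_TO_UNICODE)

# U+FFFD now resolves to some Big5 character instead of staying unmapped.
print(hex(reverse[0xFFFD]))
```

Under that assumption, decoding U+FFFD yields a Big5 character rather
than a replacement character, which would match the split-char result.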

To verify this, I made the attached patch which comments out
all lines which map Big5 characters to U+FFFD.
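
The attached patch itself is not reproduced here. As a rough sketch of
the same idea, the following Python comments out every table line whose
Unicode column is 0xFFFD. The line layout ("<Big5 hex> <Unicode hex>
[# comment]") is an assumption about BIG5.TXT, and comment_out_fffd is
a hypothetical helper, not the patch.

```python
import re

# Assumed entry layout: "0xXXXX<whitespace>0xFFFD[ # comment]".
FFFD_ENTRY = re.compile(r"^\s*0x[0-9A-Fa-f]{4}\s+0xFFFD\b", re.IGNORECASE)


def comment_out_fffd(text):
    """Prefix every entry that maps to U+FFFD with '# '."""
    out = []
    for line in text.splitlines(keepends=True):
        if FFFD_ENTRY.match(line):
            line = "# " + line
        out.append(line)
    return "".join(out)


# Illustrative input in the assumed layout.
sample = (
    "# header comment\n"
    "0xA140\t0x3000\t# IDEOGRAPHIC SPACE\n"
    "0xA15A\t0xFFFD\t# SPACING UNDERSCORE\n"
)

print(comment_out_fffd(sample), end="")
```

Lines that are already comments, or that map to any other code point,
pass through unchanged.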

With this patch, U+FFFD is displayed correctly in plain text files
(tested with an Xft build of XEmacs and a suitable font which has
a glyph for U+FFFD).

With this patch, web pages containing U+FFFD are "mostly" displayed
correctly with w3m.el.  "Mostly" because U+FFFD itself is shown as '???'
(three question marks) in that case, which is still not correct.  But the
rest of such web pages displays correctly.

I'm not sure whether commenting out the lines in BIG5.TXT which
map characters to U+FFFD is the right solution.

Maybe these characters should be mapped to the user zone instead
as suggested in the comment at the top of BIG5.TXT?
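
A sketch of what that alternative could look like, assuming "user zone"
means the Unicode Private Use Area (U+E000..U+F8FF). The starting code
point 0xE000, the line layout, and the helper remap_fffd_to_pua are all
hypothetical choices made for illustration; whether XEmacs would handle
such mappings correctly is untested.

```python
import re

PUA_START = 0xE000  # hypothetical first private-use code point to assign

# Assumed entry layout: "0xXXXX<whitespace>0xFFFD[ # comment]".
FFFD_ENTRY = re.compile(r"^(\s*0x[0-9A-Fa-f]{4}\s+)0xFFFD\b", re.IGNORECASE)


def remap_fffd_to_pua(text, first=PUA_START):
    """Give each entry that maps to U+FFFD its own private-use code point."""
    next_cp = first
    out = []
    for line in text.splitlines(keepends=True):
        m = FFFD_ENTRY.match(line)
        if m:
            # Keep the Big5 column and the trailing comment, swap the target.
            line = f"{m.group(1)}0x{next_cp:04X}{line[m.end():]}"
            next_cp += 1
        out.append(line)
    return "".join(out)


# Illustrative input in the assumed layout.
sample = (
    "0xA15A\t0xFFFD\t# SPACING UNDERSCORE\n"
    "0xA1C3\t0xFFFD\t# SPACING HEAVY OVERSCORE\n"
)

print(remap_fffd_to_pua(sample), end="")
```

Unlike commenting the lines out, this keeps every Big5 code decodable
and gives each one a distinct target, so U+FFFD is no longer claimed by
any Big5 character.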

But at least this patch should help to illustrate where the problem
comes from.

For more details, please have a look at

    http://bugzilla.novell.com/show_bug.cgi?id=293109

-- 
Mike FABIAN   <mfabian at suse.de>   http://www.suse.de/~mfabian
睡眠不足はいい仕事の敵だ。
I ♥ Unicode

-------------- next part --------------
A non-text attachment was scrubbed...
Name: bugzilla-293109-w3m-el-under-xemacs-cannot-display-utf-8-encoded-web-pages-containing-fffd.patch
Type: text/x-patch
Size: 1915 bytes
Desc: not available
Url : http://lists.xemacs.org/pipermail/xemacs-beta/attachments/20070720/8cc16524/bugzilla-293109-w3m-el-under-xemacs-cannot-display-utf-8-encoded-web-pages-containing-fffd.bin

