[PATCH] Handle bytes in the range 0x80-0xC0 better when dealing with ISO-IR 196.

Aidan Kehoe kehoea at parhasard.net
Tue Nov 21 06:01:42 EST 2006




src/ChangeLog addition:

2006-11-21  Aidan Kehoe  <kehoea at parhasard.net>

	* mule-coding.c (iso2022_decode):
	Only take the lower seven bits of any eight-bit character that
	would be illegal in UTF-8, when handling ISO/IR 196 escapes. 


tests/ChangeLog addition:

2006-11-21  Aidan Kehoe  <kehoea at parhasard.net>

	* automated/mule-tests.el (featurep):
	Add a test that ISO/IR 196 escape handling in ISO-2022-based
	charsets doesn't choke on invalid bytes in UTF-8 text.
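
As an illustration of the change described above, here is a minimal
sketch of the masking, not the actual code. It assumes the patched branch
is reached only for bytes below 0xc0 while the decoder is expecting the
start of a UTF-8 sequence; masked_byte is a hypothetical name that does
not appear in mule-coding.c.

```c
/* Hypothetical helper, not part of mule-coding.c.  Models the value the
   patched else-branch stores, assuming it only sees bytes below 0xc0. */
unsigned char
masked_byte (unsigned char c)
{
  /* ASCII (0x00-0x7f) passes through unchanged; a stray byte in
     0x80-0xbf loses its high bit. */
  return (unsigned char) (c & 0x7f);
}
```

Under that assumption, the stray byte ?\215 (0x8d) in the new test would
be stored as 0x0d, and plain ASCII is unaffected.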


XEmacs Trunk source patch:
Diff command:   cvs -q diff -u
Files affected: tests/automated/mule-tests.el src/mule-coding.c

Index: src/mule-coding.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/mule-coding.c,v
retrieving revision 1.38
diff -u -r1.38 mule-coding.c
--- src/mule-coding.c	2006/06/03 17:50:55	1.38
+++ src/mule-coding.c	2006/11/21 10:54:28
@@ -1949,8 +1949,11 @@
 		  counter = 1;
 		}
 	      else
-		/* ASCII, or the lower control characters. */
-		Dynarr_add (dst, c);
+		/* ASCII, or the lower control characters.
+                   
+                   Perhaps we should signal an error if the character is in
+                   the range 0x80-0xc0; this is illegal UTF-8. */
+                Dynarr_add (dst, (c & 0x7f));
 
 	      break;
 	    case 1:
Index: tests/automated/mule-tests.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/tests/automated/mule-tests.el,v
retrieving revision 1.11
diff -u -r1.11 mule-tests.el
--- tests/automated/mule-tests.el	2006/11/20 19:21:56	1.11
+++ tests/automated/mule-tests.el	2006/11/21 10:54:28
@@ -441,6 +441,12 @@
                (eq (aref ccl-vector 4)  
                    (encode-char (make-char 'control-1 31) 'ucs)))))
 
+  ;; This used to crash, at least in debug builds:
+
+  (Assert (decode-coding-string 
+           (string ?\33 ?\45 ?\107 ?\306 ?\222 ?\215 ?\306)
+           'iso-2022-jp))
+
   ;;---------------------------------------------------------------
   ;; Test charset-in-* functions
   ;;---------------------------------------------------------------

-- 
Santa Maradona, priez pour moi!


