[PATCH] Update the docstring for make-char

Aidan Kehoe kehoea at parhasard.net
Fri Nov 17 11:26:52 EST 2006


src/ChangeLog addition:

2006-11-17  Aidan Kehoe  <kehoea at parhasard.net>

	* text.c (Fmake_char):
	`Octet' is incorrect; only the low seven bits are used. Mention
	the MIME use of the term `charset' and cross-reference to
	coding-system-p for the XEmacs implementation of the same
	idea. Add an example call to (decode-char 'ucs ...), move the
	decode-big5-char arguments to hex. 
	* text.c (Fsplit_char):
	BREAKUP_ICHAR assigns charset the character set object, never its
	name, so the get-charset call is superfluous.
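
The arithmetic behind the docstring changes can be checked with a small
standalone C sketch. It uses no XEmacs code; the function names are
hypothetical and exist only for illustration, and "row" and "cell" are
my labels for the two decimal numbers that get 32 added to them. It
shows that #xB9 and 57 share their low seven bits, that a 16-bit table
value such as 0x406E (the 64 and 110 of the japanese-jisx0208 example,
packed into one short) splits into HIGH and LOW, and that adding 32 to
32 and 78 gives the same pair.

```c
#include <assert.h>

/* Hypothetical helper: keep only the low seven bits of an argument,
   which is all the ChangeLog entry says make-char uses. */
int low_seven_bits(int arg)
{
  return arg & 0x7F;
}

/* Hypothetical helper: turn a 1-based row or cell number of a 94x94
   charset into a make-char argument by adding 32. */
int from_row_or_cell(int n)
{
  assert(n >= 1 && n <= 94);
  return n + 32;
}

/* Hypothetical helpers: split a 16-bit translation-table value into
   the HIGH and LOW arguments for a two-dimensional charset. */
int high_byte(unsigned value)
{
  return (int) ((value >> 8) & 0xFF);
}

int low_byte(unsigned value)
{
  return (int) (value & 0xFF);
}
```

With these, low_seven_bits(0xB9) and low_seven_bits(57) agree, and
high_byte(0x406E), low_byte(0x406E) match from_row_or_cell(32),
from_row_or_cell(78).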
	

XEmacs Trunk source patch:
Diff command:   cvs -q diff -u
Files affected: src/text.c

Index: src/text.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/text.c,v
retrieving revision 1.29
diff -u -r1.29 text.c
--- src/text.c	2006/08/24 21:21:36	1.29
+++ src/text.c	2006/11/17 16:18:32
@@ -4840,24 +4840,33 @@
 /************************************************************************/
 
 DEFUN ("make-char", Fmake_char, 2, 3, 0, /*
-Make a character from CHARSET and octets ARG1 and ARG2.
+Make a character from CHARSET and integers ARG1 and ARG2.
 ARG2 is required only for characters from two-dimensional charsets.
 
-Each octet should be in the range 32 through 127 for a 96 or 96x96
-charset and 33 through 126 for a 94 or 94x94 charset. (Most charsets
-are either 96 or 94x94.) Note that this is 32 more than the values
-typically given for 94x94 charsets.  When two octets are required, the
-order is "standard" -- the same as appears in ISO-2022 encodings,
-reference tables, etc.
+An XEmacs `charset' is a very distinct concept from a MIME charset,
+and unfortunately for the XEmacs documentation, the MIME
+interpretation is today the better known of the two.  For information
+on how we implement the concept that MIME describes using the term,
+see `coding-system-p'.  XEmacs takes its terminology from ISO/IEC
+2022, a much older standard that was mostly implemented in East Asia,
+and which the ECMA makes freely available as Ecma-035.pdf.
+
+The low-order seven bits of each integer should be in the range 32
+through 127 for a 96- or 96x96-character charset and 33 through 126
+for a 94- or 94x94-character charset--most XEmacs charsets are either
+96 or 94x94.  This is 32 more than the values typically given for
+94x94 charsets.  When two integers are required, the order is the
+one that appears in ISO-2022 encodings and in the standard documents'
+reference tables.
 
 \(Note the following non-obvious result: Computerized translation
-tables often encode the two octets as the high and low bytes,
-respectively, of a hex short, while when there's only one octet, it
+tables often encode the two integers as the high and low bytes,
+respectively, of a hex short, while when there's only one integer, it
 goes in the low byte.  When decoding such a value, you need to treat
 the two cases differently when calling make-char: One is (make-char
 CHARSET HIGH LOW), the other is (make-char CHARSET LOW).)
 
-For example, (make-char 'latin-iso8859-2 185) or (make-char
+For example, \(make-char 'latin-iso8859-2 #xB9) or \(make-char
 'latin-iso8859-2 57) will return the Latin 2 character s with caron.
 
 As another example, the Japanese character for "kawa" (stream), which
@@ -4882,20 +4891,19 @@
 
 These are equivalent to:
 
+\(decode-char 'ucs #x5DDD)
 \(make-char 'chinese-gb2312 52 40)
 \(make-char 'japanese-jisx0208 64 110)
 \(make-char 'korean-ksc5601 116 57)
+\(decode-big5-char '(#xA4 . #x74))
 \(make-char 'chinese-cns11643-1 76 87)
-\(decode-big5-char '(164 . 116))
 
-\(All codes above are two decimal numbers except for Big Five and ANSI
-Z39.64, which we don't support.  We add 32 to each of the decimal
-numbers.  Big Five is split in a rather hackish fashion into two
-charsets, `big5-1' and `big5-2', due to its excessive size -- 94x157,
-with the first codepoint in the range 0xA1 to 0xFE and the second in
-the range 0x40 to 0x7E or 0xA1 to 0xFE.  `decode-big5-char' is used to
-generate the char from its codes, and `encode-big5-char' extracts the
-codes.)
+\(We add 32 to each of the decimal numbers passed to make-char.  Big
+Five is split in a rather hackish fashion into two charsets, `big5-1'
+and `big5-2', due to its excessive size -- 94x157, with the first
+codepoint in the range 0xA1 to 0xFE and the second in the range 0x40
+to 0x7E or 0xA1 to 0xFE.  `decode-big5-char' is used to generate the
+char from its codes, and `encode-big5-char' extracts the codes.)
 
 When compiled without MULE, this function does not do much, but it's
 provided for compatibility.  In this case, the following CHARSET symbols
@@ -4933,7 +4941,7 @@
     {
       if (!NILP (arg2))
         invalid_argument
-          ("Charset is of dimension one; second octet must be nil", arg2);
+          ("Charset is of dimension one; second integer must be nil", arg2);
       return make_char (make_ichar (charset, a1, 0));
     }
 
@@ -5005,7 +5013,13 @@
 */
        (character))
 {
-  /* This function can GC */
+  /* [ This function can GC ]
+
+        Apart from argument checking, I disagree, but that's mostly
+        irrelevant, because if you're calling this function often
+        enough that it matters, talk to us, and we'll probably
+        implement something for you in C. Aidan Kehoe, Fri Nov 17
+        16:24:30 2006. */
   struct gcpro gcpro1, gcpro2;
   Lisp_Object charset = Qnil;
   Lisp_Object rc = Qnil;
@@ -5016,7 +5030,7 @@
 
   BREAKUP_ICHAR (XCHAR (character), charset, c1, c2);
 
-  if (XCHARSET_DIMENSION (Fget_charset (charset)) == 2)
+  if (XCHARSET_DIMENSION (charset) == 2)
     {
       rc = list3 (XCHARSET_NAME (charset), make_int (c1), make_int (c2));
     }

-- 
Santa Maradona, priez pour moi!