Braille regression in ben-unicode-internal (and some other unicode char groups)
Jeff Sparkes
jsparkes at gmail.com
Fri Jun 18 09:17:38 EDT 2010
I grabbed the attached HELLO file from gnu.emacs.devel a few months ago.
Somebody had encoded hello into quite a few languages using UTF-8. I
think it's a good test for Ben's unicode branch, which came out quite well.
I was comparing the trunk of XEmacs-21.5b29 against ben-unicode-internal
viewing this file. The unicode branch displayed quite a few more
characters, but there were regressions in various character sets. Since
they displayed in b29, the fonts must be available.
I tested on cygwin 1.7, with similarly configured binaries:
ben-unicode-internal: ./configure -C --with-unicode-internal --with-mule
xemacs: ./configure -C --with-mule
I copied and pasted this from the b29 binary into gmail on Chrome. XEmacs
displayed it as all ~, but it obviously had the right characters. I've
attached the braille file here:
⠁⠃⠄⠅⠧⣙
Some of the other samples also didn't display in the unicode branch,
although they could be copied and pasted into Chrome:
Braille ⠓⠑⠇⠇⠕
Cyrillic Supplement ԀԁԂԃԄԅԆԇԈԉԊԋԌԍԎԏ
Ogham ᚛ᚁᚂᚃᚄᚅᚆᚇᚈᚉᚊᚋᚌᚍᚎᚏᚐᚑᚒᚒᚔ᚜
Runic ᛒᛁᛏᚱᛅᛁᛋ᛬ᛚᛅᛁᚠᛅ᛬ᚠᚢᛋᛏᚱᛅ᛬ᚴᚢᚦᛅᚾ᛬ᚦᛅᚾ᛬ᛋᚭᚾ᛬ᛁᛚᛅᚾ᛭
Dingbats ✁✆✇✈✉✌✍✐✒✓✟✠
Letterlike Symbols ℀℁ℂ℃℄℅℆ℇ℈℉ℊℋℌℍℎℏ
Optical Character Recognition ⑀⑁⑂⑃⑄⑅⑆⑇⑈⑉⑊
Technical Symbols ⌀⌁⌂⌃⌐⌑⌒⌠⌡⌰⌱⌲⌳⍀⍁⍂⍐⍑⍒
Yijing Hexagram Symbols ☰☱☲☳☴☵☶☷
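One way to pin down exactly which code points are affected is to list them
by number and name. The sketch below is an illustration only, not something
from the original test: it assumes Python 3 and its standard unicodedata
module, and the describe helper is a hypothetical name.

```python
import unicodedata


def describe(text):
    """Return a (code point, Unicode name) pair for each character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>"))
        for ch in text
    ]


# A few of the samples quoted above; extend the table as needed.
samples = {
    "Braille": "⠓⠑⠇⠇⠕",
    "Ogham": "᚛ᚁᚂ",
    "Runic": "ᛒᛁᛏ",
}

for label, text in samples.items():
    for code_point, name in describe(text):
        print(f"{label:8} {code_point} {name}")
```

Comparing this listing against what each binary renders should show whether
whole blocks fail or only individual characters.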
Chrome doesn't even display ogham and runic. GNU Emacs 23.50 does.
I don't know if XEmacs
--
Jeff Sparkes
jsparkes at gmail.com
-------------- next part --------------
From: kawabata.taichi at gmail.com
Subject: new Emacs HELLO file??
Newsgroups: gmane.emacs.devel
To: emacs-devel at gnu.org
Date: Sat, 29 Aug 2009 06:27:06 +0900
Dear sirs,
This is just a quick thought and a proposal to replace the Emacs etc/HELLO
file. As Emacs 23 now fully supports UCS/Unicode, the etc/HELLO file could
be extended so that it covers the languages and scripts that were not
covered by previous Emacs versions.
Also, the HELLO file could contain symbols and rare or ancient scripts, so
that it contains at least one character from each UCS/Unicode block.
That would help Emacs users quickly check which fonts their Emacsen are
missing.
I've collected the "Hello" entries from various on-line dictionaries
on the Internet (omniglot.com is especially useful). Languages are
selected so that at most two or three languages are shown for each
script (except Latin script, which has several significant languages).
I gathered the scripts for which I could not find "Hello" entries into
separate entries, and classified them by writing direction.
The proposed etc/HELLO file attached below may contain some inappropriate
entries or mistakes; I would appreciate it if someone could point them
out or fix them. Also, most of the script samples merely line up
characters. Refining them into meaningful text would also be appreciated.
=====================
The quickest way to view all characters in this HELLO.txt file is to
install the "Code2000" font, Mr. George Douros's Unicode Symbol font, the
"MPH Damase 2D" font and the Cyrillic font by BukyVede. That would cover
most Unicode 5.1 characters. Besides these, specific Tibetan, Tagalog,
Arabic Supplement, Sinhalese and Sundanese fonts fill the gaps. Still,
fonts for "New Tai Lue", "Balinese", "Lepcha" and a few other scripts do
not seem to be freely available for now.
This is a list of ways to say hello in various languages.
(For symbols and some scripts, only sample texts or characters are shown.)
Languages and scripts are classified by writing directions.
1. Scripts written from left to right.
LANGUAGE (NATIVE NAME) HELLO
---------------------- -----
Ainu (アイヌ イタㇰ) イランカラㇷ゚テ
Amharic (አማርኛ) ሠላም
Armenian (Հայերէն) բարև
Bengali (বাংলা) নমস্কার
Braille ⠓⠑⠇⠇⠕
Burmese (မ္ရန္မာ) မင္ဂလာပာ
C printf ("Hello, world!\n");
Cantonese (粵語 / 廣東話) 早晨 / 你好 / 喂
Cherokee (ᏣᎳᎩ ᎧᏬᏂᎯᏍᏗ) ᎣᏏᏲ
Chinese (中文 / 普通话 / 汉语 / 漢語) 你好, 您好
Comanche (Nu̶mu̶ tekwapu̶̲) Marú̶awe!
Cree (ᓀᐦᐃᔭᐍᐏᐣ / ᓀᐦᐃᔭᐤ) ᐊᑕᒥᐢᑳᑐᐃᐧᐣ
Czech (čeština) Dobrý den
Danish (dansk) Hej / Goddag / Halløj
Deseret (𐐼𐐯𐑅𐐨𐑉𐐯𐐻 𐐰𐑊𐑁𐐩𐐺𐐯𐐻) 𐐸𐑩𐑊𐐬
Dutch (Nederlands) Hallo / Dag
Dzongkha (རྫོང་ཁ) སྐུ་གཟུགས་བཟང་པོ
Emacs emacs --no-splash -f view-hello-file
English /ˈɪŋɡlɪʃ/ Hello
French (français) Bonjour / Salut
Georgian (ქართველი) გამარჯობა
German (Deutsch) Guten Tag / Grüß Gott
Greek (ελληνικά) Γειά σας
Greek, Polytonic Ἐμπρός! (on phone)
Gujarati (ગુજરાતી) નમસ્તે
Hindi (हिंदी) नमस्ते ।
Inuktitut (ᐃᓄᒃᑎᑐᑦ) ᐊᐃ / ᐊᐃᓐᖓᐃ
Italian (italiano) Ciao / Buon giorno
Japanese (日本語) 今日は。 / コンニチハ
Javanese (Jawa) System.out.println("Sugeng siang!");
Kannada (ಕನ್ನಡ) ನಮಸ್ಕಾರ
Kanza (Kaáⁿze) ho / hawé
Khmer (ភាសាខ្មែរ) ជំរាបសួរ / សួស្ដី
Koasati (Kowassá:tit) Cikáʔnó!
Korean (한글 / 韓國語) 안녕하세요 / 안녕하십니까 / 안녕
Lao (ພາສາລາວ) ສະບາຍດີ / ຂໍໃຫ້ໂຊກດີ
Malayalam (മലയാളം) നമസ്കാരം
Marathi (मराठी) नमस्कार ।
Mathematics ∀ p ∈ world • hello p □
Oriya (ଓଡ଼ିଆ) ଶୁଣିବେ
Punjabi (ਪੰਜਾਬੀ) ਸਤ ਸ੍ਰੀ ਅਕਾਲ.
Russian (русский) Здра́вствуйте!
Shavian (𐑖𐑭𐑝𐑾𐑯) 𐑣𐑩𐑤𐑴
Sinhala (සිංහල) ආයුබෝවන්
Spanish (español) ¡Hola!
Swedish (på svenska) Hej / Goddag / Hallå
Tagalog (ᜊᜌ᜔ᜊᜌᜒᜈ᜔) ᜋᜊᜓᜑᜌ᜔
Tamil (தமிழ்) வணக்கம்
Telugu (తెలుగు) నమస్కారం
Thai (ภาษาไทย) สวัสดีครับ / สวัสดีค่ะ
Tibetan (བོད་སྐད་) བཀྲ་ཤིས་བདེ་ལེགས༎
Tigrigna (ትግርኛ) ሰላማት
Vietnamese (tiếng Việt) Chào bạn
Yoruba (Yorùbá) Ẹ n lẹ
SCRIPT NAME SAMPLES
---------------------- -----
Balinese (ᬩᬲ ᬩᬮᬶ) ᬓᬔᬕᬖᬗᬘᬙᬚᬛᬜᬝᬞᬟ
Buginese (ᨅᨔ ᨕᨘᨁᨗ) ᨕᨗᨕᨊᨕᨙ ᨔᨛᨄᨒᨚ
Buhid (ᝊᝓᝑᝒᝇ) ᝀᝁᝂᝃᝄᝅ
Carian 𐊠𐊥𐊣𐊹𐊮𐊸 𐊲𐊥𐊰𐊴𐊣𐊺𐊸 𐊽𐊹𐊾𐊩𐊰𐊹𐊸
Carrier (ᑐᑊᘁᗕᑋᗸ) ᗺᗹᗵᗷᗶ
Cham ꨁꨗꨩꨈꨮ
CJK Radicals ⼀⼁⼂⼃⼄⼅⼆⼇⼈⼉ / ⺀⺁⺂⺃⺄⺅⺆⺇⺈⺉
CJK Unified Ideograph Extension-A 㐀㐁㐂㐃㐄㐅㐆㐇㐈㐉㐊㐋㐌㐍㐎㐏
CJK Unified Ideograph Extension-B 𠀀𠀁𠀂𠀃𠀄𠀅𠀆𠀇𠀈𠀉𠀊𠀋𠀌𠀍𠀎𠀏
CJK Unified Ideograph Extension-C 𪜀𪜁𪜂𪜃𪜄𪜅𪜆𪜇𪜈𪜉𪜊𪜋𪜌𪜍𪜎𪜏
Coptic ⲘⲒⲞ.Ⲕ
Cuneiform 𒀀𒀁𒀂𒀃𒀄𒀅𒀆𒀇𒀈𒀉𒀊𒀋𒀌𒀍𒀎𒀏
Cyrillic Supplement ԀԁԂԃԄԅԆԇԈԉԊԋԌԍԎԏ
Cyrillic Extended-A ⷠⷡⷢⷣⷤⷥⷦⷧⷨⷩⷪⷫⷬⷭⷮⷯ
Cyrillic Extended-B ꙀꙁꙂꙃꙄꙅꙆꙇꙈꙉꙊꙋꙌꙍꙎꙏ
Ethiopic Extended ⶀⶁⶂⶃⶄⶅⶆⶇⶈⶉⶊⶋⶌⶍⶎⶏ
Ethiopic Supplement ᎀᎁᎂᎃᎄᎅᎆᎇᎈᎉᎊᎋᎌᎍᎎᎏ
Georgian Supplement ⴀⴁⴂⴃⴄⴅⴆⴇⴈⴉⴊⴋⴌⴍⴎⴏ
Glagolitic ⰙⰂⰍⰌⰇⰟⰘ
Gothic 𐌰𐍄𐍄𐌰 𐌿𐌽𐍃𐌰𐍂 𐌸𐌿 𐌹𐌽
Hanunoo (ᜱᜨᜳᜨᜳᜢ) ᜣᜫᜨᜳᜰᜲ
Kayah Li ꤁꤂꤃꤄꤅
Latin Extended-C ⱠⱡⱢⱣⱤⱥⱦⱧⱨⱩⱪⱫⱬⱭⱮⱯ
Latin Extended-D ꜠꜡ꜢꜣꜤꜥꜦꜧꜨꜩꜪꜫꜬꜭꜮꜯ
Lepcha ᰣᰕᰧᰅ
Limbu ᤀᤁᤂᤃᤄᤅᤆᤇᤈ
Lycian 𐊀𐊁𐊂𐊃𐊄𐊅𐊆𐊇
Lydian 𐤠𐤡𐤢𐤣𐤤
New Tai Lue ᦀᦁᦂᦃᦄᦅᦆᦇ
Ogham ᚛ᚁᚂᚃᚄᚅᚆᚇᚈᚉᚊᚋᚌᚍᚎᚏᚐᚑᚒᚒᚔ᚜
Old Persian 𐎠𐎡𐎢𐎣𐎤𐎥𐎦𐎧𐎨𐎩𐎪𐎫𐎬𐎭𐎮𐎯
Osmanya 𐒀𐒁𐒂𐒃𐒄𐒅𐒆𐒇𐒈𐒉𐒊𐒋𐒌𐒍𐒎𐒏
Phaistos Disc 𐇑𐇛𐇜𐇐𐇡
Phonetic Extensions ᴀᴁᴂᴃᴄᴅᴆᴇᴈᴉᴊᴋᴌᴍᴎᴏ
Phonetic Extension Supplement ᶀᶁᶂᶃᶄᶅᶆᶇᶈᶉᶊᶋᶌᶍᶎᶏ
Rejang ꤰꤱꤲꤴꤵ
Runic ᛒᛁᛏᚱᛅᛁᛋ᛬ᛚᛅᛁᚠᛅ᛬ᚠᚢᛋᛏᚱᛅ᛬ᚴᚢᚦᛅᚾ᛬ᚦᛅᚾ᛬ᛋᚭᚾ᛬ᛁᛚᛅᚾ᛭
Santali (Ol Chiki) ᱟᱲ.ᱟ.
Saurashtra ꢂꢒꢂꢬꢣꢶ
Sundanese ᮀᮁᮂᮃᮄᮅᮆᮇᮈᮉᮊᮋᮌᮍᮎᮏ
Syloti Nagri ꠀꠇꠣꠌꠤꠐꠥꠔꠦ
Tagbanwa (ᝤᝪᝨᝯ) ᝠᝡᝢᝣᝤᝥᝦᝧᝨᝩᝪᝫᝬ
Tifinagh (ⵜⵉⴼⵉⵏⴰⵖ) ⴰⴱⴲⴳⴴⴵⴶⴷⴸⴹⴺⴻⴼⴽⴾⴿ
Tai Le (ᥖᥭᥰᥖᥬᥳᥑᥨᥒᥰ) ᥐᥑᥒᥓᥔᥕᥖᥗᥘᥙᥚᥛᥜᥝᥞᥟ
Ugaritic 𐎀𐎁𐎂𐎃𐎄𐎅𐎆𐎇𐎈𐎉𐎊𐎋𐎌𐎍𐎎𐎏
Vai ꔀꔁꔂꔃꔄꔅꔆꔇꔈꔉꔊꔋꔌꔍꔎꔏ
Yi (ꆇꉙ) ꉷꆀꅇꌫꏦ
2. Scripts written from Right to Left.
LANGUAGE (NATIVE NAME) HELLO
---------------------- -----
Arabic (ةّيبرعلا) مكيلع مالّسلا
Aramaic, Syriac (ܠܫܢܐ ܤܘܪܝܝܐ) ܐܵܝ! / ܐܳܝ!
Dhivehi (ހިވެދި) ކިހިނެތް؟ / ހާލު ކިހިނެތް؟
Hebrew (תירבע) שלום
Persian (فارسى) سلام / درود
Yiddish (ײִדיש / מאַמע לשון) אַ גוטן טאָג
SCRIPT NAME SAMPLES
---------------------- -----
Arabic Supplement ݐݑݒݓݔݕݖݗݘݙݚݛݜݝݞݟ
Cypriot Syllabary 𐠀𐠁𐠂𐠃𐠄𐠅𐠈𐠊𐠋𐠌𐠍𐠎𐠏
Kharoshthi 𐨠𐨡𐨢𐨣𐨤𐨥𐨦𐨧𐨨𐨩𐨪𐨫𐨬𐨭𐨮𐨯
Linear B 𐀀𐀁𐀂𐀃𐀄𐀅𐀆𐀇 / 𐂀𐂁𐂂𐂃𐂄𐂅𐂆𐂇
N'Ko (ߒߞߏ) ߀߁߂߃߄߅߆߇߈߉ߊߋߌߍߎߏ
Old Italic 𐌀𐌁𐌂𐌃𐌄𐌅𐌆𐌇𐌈𐌉𐌊𐌋𐌌𐌍𐌎𐌏
Phoenician 𐤀𐤁𐤂𐤃𐤄𐤅𐤆𐤇𐤈𐤉𐤊𐤋𐤌𐤍𐤎𐤏
3. Scripts written from Top to Bottom
LANGUAGE (NATIVE NAME) HELLO
---------------------- -----
Japanese (日本語) もし〳〵 (Vertical Repeat Mark)
Mongolian (ᠮᠣᠨᠭᠣᠯ ᠪᠢᠴᠢᠭ) ᠰᠠᠢ᠋ᠨ ᠪᠠᠢ᠋ᠨ ᠤᠦ
SCRIPT NAME SAMPLES
---------------------- -----
Kanbun (漢文) 使㆟籍誠不㆚以㆘蓄㆓妻子㆒憂㆗飢寒㆖乱㆙㆑心、
有㆑銭以済㆞医薬㆝。
Manchurian (ᠮᠠᠨᠵᡠ) ᡶᡠᡯᡳ ᡥᡝᠨᡩᡠᠮᡝ᠈ ᡨᠠᠴᡳᠮᠪᡳᠮᡝ ᡝᡵᡳᠨᡩᡝᡵᡳ
ᡠᡵᡳᠪᡠᠴᡳ᠈ ᡳᠨᡠ ᡠᡵᡤᡠᠨ ᠸᠠᡴᠠ᠉
Phags-pa ꡏꡟ ꡋꡞ ꡏꡟ ꡋꡞ ᠂ ꡏ ꡜꡖ ꡏꡟ ꡋꡞ ᠂ ꡓꡞ ꡏꡟ ᠁
4. Numbers and Symbols
SYMBOL NAMES EXAMPLES
---------------------- -----
Aegean Numbers 𐄀𐄁𐄂𐄇 𐄈𐄉𐄊𐄋𐄌𐄍𐄎𐄏
Ancient Greek Musical Notation 𝈀𝈁𝈂𝈃𝈄𝈅𝈆𝈈𝈉𝈊𝈋𝈌𝈍𝈎𝈏
Ancient Greek Numbers 𐅀𐅁𐅂𐅃𐅄𐅅𐅆𐅇𐅈𐅉𐅊𐅋𐅌𐅍𐅎𐅏
Ancient Symbols 𐆐𐆑𐆒𐆓𐆔𐆕𐆖𐆗𐆘𐆙𐆚𐆛
Arrows ←↑↠↡↰↱⇀⇁⟰⟱⟲⟳⤀⤁⤐⤑⤠⤡⤰⤱⥀⥁
Block Elements ▀▁▂▃▄▅▆▇▐░▒▓▔▕▖▗
Box Drawing ┌┐└┘├┤┬┴
Byzantine Musical Symbols 𝀰𝀱𝀲𝀳𝀴𝀵𝀶𝀷
Combining Diacritical Marks à á â ã ā a̅ ă ȧ ä ả å a̋ ǎ a̍ a̎ ȁ
(For Symbols) a⃐ a⃑ a⃒ a⃓ a⃔ a⃕ a⃖ a⃗
(Marks Supplement) a᷀ a᷁ a᷂ a᷃ a᷄ a᷅ a᷆ a᷇ a᷈ a᷉ a᷊ a᷋ a᷌ a᷍ a᷎ a᷏
Control Pictures ␁␂␃␄␅␆␇␈␉␊␋␌␍␎␏
Counting Rod Numerals 𝍠𝍡𝍢𝍣𝍤𝍥𝍦𝍧𝍨𝍩𝍪𝍫𝍬𝍭𝍮𝍯
Currency Symbols $¢£¤¥₠₡₢₣₤₥₦₧₨₩₪₫₭₮₯
Dingbats ✁✆✇✈✉✌✍✐✒✓✟✠
Domino Tiles 🀰🀲🁂🁒🁛🁢🁤🁴🂄🂍
Enclosed Alphanumerics ①②③④⑴⑵⑶⑷⒈⒉⒊⒋⒜⒝⒞⒟ⒶⒷⒸⒹⓐⓑⓒⓓ⓵⓶⓷⓸
Geometric Shapes ■□▢▣▤▥▦▧▰▱▲△▴▵▶▷◀◁◂◃◄◅◆◇◐◑◒◓◔◕◖◗
Ideographic Description Characters 字=⿱宀子
Khmer Symbols ᧠᧡᧢᧣᧤᧥᧦᧧᧨᧩᧪᧫᧬᧭᧮᧯
Letterlike Symbols ℀℁ℂ℃℄℅℆ℇ℈℉ℊℋℌℍℎℏ
Mahjong Tiles 🀀🀁🀂🀃🀆🀅🀄🀇🀏🀐🀘🀙🀡🀢🀦🀪🀫
Mathematical Alphanumeric Symbols 𝐀𝐁𝐂𝐃𝐄𝐅𝐆𝐇𝐈𝐉𝐊𝐋𝐌𝐍𝐎𝐏
Mathematical Operators ∀∂∃∄∈∊∌∑∓√⨀⨁⨂⨃⨐⨠⨡⨢⨰⨱⨲⩀⩁⩂
Mathematical Symbols ⟀⟁⟂⟃⟐⟑⟒⟓⟠⟡⟢⟣⦀⦁⦂⦃⦄
Miscellaneous Symbols ☀☁☂☃☄★☆☐☑☒☓☔♀♁♂♃♠♡♢♣♨♰♲♳♴⚀⚁⚐⚒⚓⚠⚡⚢
Modifier Tone Letters ꜀꜁꜂꜃꜄꜅꜆꜇꜈꜉꜊꜋꜌꜍꜎꜏
Musical Symbols 𝆰𝆱𝆲𝆳𝆴𝆵𝆶𝆷𝆸𝆹𝆺𝆹𝅥𝆺𝅥𝆹𝅥𝅮𝆺𝅥𝅮𝆹𝅥𝅯
Number Forms ⅓⅔⅕⅖ⅠⅡⅢⅣⅰⅱⅲⅳↀↁↂↃↄ
Optical Character Recognition ⑀⑁⑂⑃⑄⑅⑆⑇⑈⑉⑊
Superscripts and Subscripts ⁰¹²³²⁴⁵⁶⁷⁸⁹ ₀₁₂₃₄₅₆₇₈₉
Supplemental Punctuation ⸀⸁⸂⸃⸄⸅⸆⸇⸈⸉⸊⸋⸌⸍⸎
Tai Xuan Jing Symbols 𝌰𝌱𝌲𝌳𝌴𝌵𝌶𝌷𝌸𝌹𝌺𝌻𝌼𝌽𝌾𝌿
Technical Symbols ⌀⌁⌂⌃⌐⌑⌒⌠⌡⌰⌱⌲⌳⍀⍁⍂⍐⍑⍒
Yijing Hexagram Symbols ☰☱☲☳☴☵☶☷
5. Special Characters
NAME SAMPLES
---------------------- -----
LANGUAGE TAGS
VARIATION SELECTORS 邊 vs. 邊󠄀, 邊󠄁, 邊󠄂, 邊󠄃, 邊󠄄, 邊󠄅, 邊󠄆, 邊󠄇
-------------- next part --------------
-*- coding: utf-8 -*-
⠁⠃⠄⠅⠧⣙