Unicode Standards for Scripts of Asia

Enabling support for regional languages of Asia on digital devices

Anshuman Pandey
Department of Linguistics
University of California, Berkeley




Introduction

You can read these very words on your device because its operating system supports the Latin alphabet. We often take this for granted. Now imagine that your device's software didn't support Latin. You would not have been able to read the previous sentences because they would appear as "□□□ □□□□□□□ ;□□ □□□□ □□□□□ □□□□ □□□□□ ...". You might even be lucky to see these boxes instead of actual letters, because it means that there is at least some support, though something is still amiss, perhaps a font. Now imagine that your device simply does not know about your language or the alphabet it is written in. You wouldn't see "□□□ □□□□□□□ ;□□ □□□□ □□□□□ □□□□ □□□□□ ..."; you wouldn't see anything at all. There are languages around the world for which this is the default state.

The growth of digital technologies has changed the way we conceptualize communication. This technology has also opened up numerous new horizons, several of which were unthinkable some two decades ago. One of these is the potential to provide users of all languages and scripts of the world with the ability to express themselves using digital technology. Another is the promise of technology for preserving visible language, or "alphabets", that are now extinct but nonetheless culturally valuable.

One of my many ambitions is to enable both of the above, which I do through my contributions to the Unicode Standard. For those unfamiliar with Unicode, it is a standard for representing "alphabets" or, more generically, "scripts" on computer platforms. The character repertoire of the Unicode Standard is kept synchronized with that of the International Standard ISO/IEC 10646, "Information technology -- Universal multiple-octet coded character set (UCS)".
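As a rough illustration of what "representing a script" means in practice, the sketch below uses Python's standard `unicodedata` module to look up the code point and character name that Unicode assigns to a letter. The helper name `describe` is my own choice for this example and is not part of any library; which characters have names depends on the Unicode version bundled with the Python installation.

```python
import unicodedata


def describe(text):
    """Return a (character, code point, name) tuple for each character in text.

    `describe` is a hypothetical helper written for this illustration.
    """
    rows = []
    for ch in text:
        # ord() gives the numeric code point; format it in the usual U+XXXX style.
        code_point = f"U+{ord(ch):04X}"
        # Characters unknown to this Python's Unicode database get a placeholder.
        name = unicodedata.name(ch, "<no name in this Unicode version>")
        rows.append((ch, code_point, name))
    return rows


# One Latin letter and one Devanagari letter.
for ch, code_point, name in describe("Aक"):
    print(ch, code_point, name)
```

A script that has not yet been encoded has no code points at all, so there is nothing for a lookup like this to find; that is the gap a script proposal is meant to close.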

Many of the world's major writing systems are now supported in Unicode. But more than 170 scripts around the globe are as yet unsupported in the standard. Several of these are ancient and historical scripts that are no longer used, but which are of significant value for scholars who seek to bring to light the literary history and heritage of humanity. Others are local scripts still used by minority communities as a means of expressing and maintaining their culture. Whether classified as a "major" or "lesser-known" medium, every script is a carrier of linguistic information and contributes to the sum total of human knowledge as it is codified in writing. For this reason, all attested scripts of the world must be adapted for usage on digital platforms.

My contributions to Unicode focus on the languages and scripts of South, Central, and Southeast Asia. During 2015, I was fortunate to serve as a Post-Doctoral Researcher in the Department of Linguistics at the University of California, Berkeley. My position was funded by a Google Research Award granted to the Script Encoding Initiative (SEI) at Berkeley. Through the Google award, I was given the unique opportunity to work on languages and scripts used by minority communities, as well as to expand support for major written languages used by millions of people. Without this grant, my work on enabling digital support for these languages and scripts might not have been possible. Although my work at Berkeley is now complete, I continue to work on encoding scripts in Unicode as a labor of love, as my time allows.

This site presents an overview and bibliography of the scripts and individual characters that I have contributed to Unicode. These documents contain detailed histories of writing systems, descriptions of orthographies, examples of usage, and information related to the technical implementation of the scripts. My work has made it possible for users to view or type any of the scripts or characters listed below on their digital devices. All of my work is based upon original research that I have performed. Several of the documents I have authored are the only English-language resources available for various historical and minority scripts. Additional information on script-encoding projects that I am currently pursuing may be found on my blog. Please direct comments to me at the email address provided at the bottom of this page.



Overview of Projects

Listed below are script blocks and individual characters in Unicode that I have contributed, along with a link to the code chart at the website of the Unicode Consortium.

Characters published or approved for future publication

Projects in progress

  • Final proposals submitted

    ARABIC SIYAQ NUMBER MARK
    Arabic Siyaq Numbers (unification of Diwani and Ottoman Siyaq Numbers)
    BUGINESE SIGN VIRAMA-1
    BUGINESE SIGN VIRAMA-2
    Hanifi Rohingya
    Nandinagari
    Persian Siyaq Numbers
    SOYOMBO SIGN JIHVAMULIYA
    SOYOMBO SIGN UPADHMANIYA

  • Preliminary proposals submitted

    Balti 'B'
    Dhives Akuru
    Jenticha (Koinch Brehs)
    Kawi
    Kerinci
    Kulitan
    Lampung
    Old Sogdian
    Sogdian
    Tolong Siki
    Vatteluttu

  • Research ongoing

    Angka Bejagung
    Bagada
    Balti 'A'
    Bhujinmol
    Bima (Mbojo)
    Buginese additions
    Chalukya
    Chola
    Coorgi-Cox Alphabet
    Devanagari additions
    Dhimal / Dham
    Eskaya
    Gangga Malayu
    Gurung Khema (Tamu Khema Phri)
    Iban
    Kadamba
    Khambu Rai
    Khatt-i Baburi
    Khojki additions
    Kirat Rai
    Landa
    Limbu additions
    Lontara bilang-bilang
    Lota Ende
    Magar Akkha
    Marchung
    Minahasa, Old
    Minangkabau
    Miscellaneous Symbols and Pictographs additions
    Naasioi
    Old Uighur
    Oriya Karani
    Pallava
    Pau Cin Hau Syllabary
    Pungchen
    Pungchung
    Pyu
    Rakhawunna
    Ranjana
    Rejang additions
    Satavahana
    Sharada additions
    Siddham additions
    Sirmauri
    Soyombo additions
    Sumbawa (Satera Jontal)
    Tai Yo (Tai Do)
    Tangsa (Khimhun)
    Tangsa (Mossang)
    Tani Lipi
    Tikamuli
    Wancho
    Zou (Zolai)


Documentation for Script Projects

  • Rohingya
    • See "Hanifi Rohingya"


Presentations

Please see "Technical Presentations" in my list of presentations for talks related to Unicode.


Appreciation



Last updated: 3 May 2016