Unicode Standards for Scripts of Asia
Enabling support for regional languages of Asia on digital devices
Anshuman Pandey
Department of Linguistics
University of California, Berkeley
Introduction
Your ability to read these very words on your device is possible because the operating system supports the Latin alphabet. We often take this for granted. Now imagine if your device's software didn't support Latin. You would not have been able to read the previous sentences because they would appear as "□□□ □□□□□□□ □□ □□□□ □□□□□ □□□□ □□□□□ ...". You might be lucky if you saw these boxes instead of actual letters, because it means that there is at least some support, but something is still amiss, perhaps a font. Now imagine that your device simply does not know about your language or the alphabet in which it is written. You wouldn't see "□□□ □□□□□□□ □□ □□□□ □□□□□ □□□□ □□□□□ ..."; you wouldn't see anything at all. There are languages around the world for which this is the default state.
The expansion and growth of digital technologies have changed the manner in which we conceptualize the notion of communication. These technologies have also opened up numerous new horizons, several of which were unthinkable some two decades ago. One of these is the potential to provide users of all languages and scripts of the world with the ability to express themselves using digital technology. Another is the promise of technology for preserving forms of visible language, or "alphabets", that are now extinct but nonetheless culturally valuable.
One of my many ambitions is to enable both of the above, which I do through my contributions to The Unicode Standard. For those unfamiliar with Unicode, it is a standard for representing "alphabets" or, more generically, "scripts" on computer platforms. The character repertoire of the Unicode Standard is kept synchronized with the International Standard ISO/IEC 10646, "Information technology -- Universal multiple-octet coded character set (UCS)".
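As a rough illustration of what "representing a script" means in practice, the sketch below uses Python's standard `unicodedata` module to look up the code point, name, and general category that Unicode assigns to a character. The helper names `describe` and `is_assigned` are hypothetical and chosen only for this example, as are the two sample characters (Latin "A" and Devanagari "क"). The results reflect whichever version of the Unicode database ships with the Python interpreter in use.

```python
import unicodedata


def describe(text):
    """Return (code point, name, general category) for each character.

    'describe' is a hypothetical helper written for this illustration.
    """
    rows = []
    for ch in text:
        code_point = f"U+{ord(ch):04X}"
        # Characters unknown to the bundled Unicode database have no name,
        # so a placeholder string is supplied as the default.
        name = unicodedata.name(ch, "<no name in this Unicode database>")
        rows.append((code_point, name, unicodedata.category(ch)))
    return rows


def is_assigned(ch):
    """Report whether the bundled database assigns this code point.

    The general category 'Cn' marks code points that are unassigned.
    """
    return unicodedata.category(ch) != "Cn"


# Sample characters (an assumption for this example):
# Latin capital A and Devanagari letter KA.
for row in describe("A\u0915"):
    print(row)
```

A character from a script that has not yet been encoded has no code point at all, so there is nothing to pass to a lookup like this one; that absence is what an encoding proposal sets out to remedy.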
Many of the world's major writing systems are now supported in Unicode. But more than 170 scripts around the globe are as yet unsupported in the standard. Several of these are ancient and historical scripts that are no longer used, but which are of significant value for scholars who seek to bring to light the literary history and heritage of humanity. Others are local scripts still used by minority communities as a means of expressing and maintaining their culture. Whether classified as a "major" or "lesser-known" medium, every script is a carrier of linguistic information and contributes to the sum total of human knowledge as it is codified in writing. For this reason, all attested scripts of the world must be adapted for usage on digital platforms.
My contributions to Unicode are focused upon the languages and scripts of South, Central, and Southeast Asia. During 2015, I was fortunate to serve as a Post-Doctoral Researcher in the Department of Linguistics at the University of California, Berkeley. My position was funded by a Google Research Award granted to the Script Encoding Initiative (SEI) at Berkeley. Through the Google award, I was given the unique opportunity to work on languages and scripts used by minority communities, as well as to expand support for major written languages used by millions of people. Without this grant, my work on enabling digital support for these languages and scripts might not have been possible. Although my work at Berkeley is now complete, I continue to work on encoding scripts in Unicode as a labor of love, as my time allows.
This site presents an overview and bibliography of the scripts and individual characters that I have contributed to Unicode. These documents contain detailed histories of writing systems, descriptions of orthographies, examples of usage, and information related to the technical implementation of the scripts. My work has made it possible for users to view or type any of the scripts or characters listed below on their digital devices. All of my work is based upon original research that I have performed. Several of the documents I have authored are the only English-language resources available for various historical and minority scripts. Additional information on script-encoding projects that I am currently pursuing may be found on my blog. Please direct comments to me at the email address provided at the bottom of this page.
Overview of Projects
Listed below are script blocks and individual characters in Unicode that I have contributed, along with a link to the code chart at the website of the Unicode Consortium.
Characters published or approved for future publication
Projects in progress
Documentation for Script Projects
Presentations
Please see "Technical Presentations" in my list of presentations for talks related to Unicode.
Last updated: 3 May 2016