The Unicode Consortium (Unicode, Inc.) has the honor of being selected as the 2019 winner of the Annual Bruessard Award. This not-for-profit organization was selected in recognition of its tireless effort to simplify and standardize written human communications around the globe. Through its public arm, Unicode.org, the Consortium devised a uniform and universal standard for identifying characters, together with encoding schemes for their electronic transmission, a system known as the Unicode Transformation Formats, or UTF encodings. The Unicode Consortium has assigned, and is poised to keep assigning, a unique identifier to every written character, number, and symbol used by humans in the past, the present, and conceivably the future. Through its work, the Unicode Consortium is making all of the world's characters, numbers, and symbols amenable to electronic entry in a standardized format. Further, by establishing a one-to-one linkage between the characters, numbers, and symbols of different languages, it becomes possible to translate, convert, and exchange documents created in different languages more easily and more consistently. More importantly, the Unicode Consortium has made it possible for humans across the globe to more easily consume the human stock of knowledge in their native languages and tongues. In effect, the Unicode Consortium is further fostering the knowledge-for-all phenomenon made possible by the emergence of the World Wide Web.
Unicode is language. Unicode speaks the world's languages.
Most Commonly Used Languages on the Internet
Some Contemporary Languages
Some of the World's Most Commonly Spoken Languages
Saying "I Love You" in Many Languages: The Wallpaper
Introducing Unicode.org
Check out this bonus multilingual application that demonstrates Unicode and UTF in action. Because this bonus multilingual application is being presented on the World Wide Web, it has been restricted to some of the world's most commonly spoken languages (in terms of millions of speakers), including the six official languages of the United Nations, namely, English, Arabic, Russian, Chinese, French, and Spanish.
Bonus Multilingual Application:
Click This Button to View the Bonus Multilingual Application
Watch (Stevie Wonder, The Secret Life Of Plants)
Watch (Vangelis, Theme From Antarctica)
It should be noted that the above bonus multilingual application was not created in a vacuum. In conjunction with making use of Unicode's CLDR (Common Locale Data Repository) data and leveraging Unicode's UTF-8 encoding scheme, the application made extensive use of other related web resources. For instance, it relies heavily on the JavaScript internationalization and localization library known as Globalize.js. Globalize.js initially was developed by Microsoft Corporation and currently is maintained with Rafael Xavier de Souza as its lead. The application also followed TAU's (Tizen Advanced UI) implementation of Globalize.js. TAU Globalize was released by Samsung Electronics Co., Ltd., with Hosup Choi currently as its primary maintainer. See, for instance, the Credit page of this website for a full list of resources used to create this 2019 Winner page, including the resources used to create the bonus multilingual application. Another example of Unicode's multilingual capabilities can be gleaned by clicking this link. The following graphic summarizes how the bonus multilingual application was implemented; in this instance, the first or top level would be the website itself (namely, Tizen Mobile Web), and the second level would be Globalize.js (namely, its i18n implementation).
Internationalization Layers
Written human communications began humbly enough, with characters or letters. In turn, letters formed words, words formed sentences, and sentences became the building blocks of human communications. Though a complicated human feat, written communication is as simple as that.
What is the origin of human communications? How did humans come to speak many languages? There are two prevailing perspectives: the religious one and the scientific one.
Take the religious perspective, in general, and the Christian perspective, in particular. According to the Christian religious faith and its sacred Holy Bible text, in the beginning, humans only spoke one language. In its explanation of how humans came to speak many languages, the Holy Bible recounts the Tower of Babel story. According to the Tower of Babel story, in effect, human arrogance and egoism greatly disillusioned God. Consequently, these undesirable human traits led God to cause a rift between humans, principally, by making humans speak many languages. The following video provides a simple recounting of the Tower of Babel story.
Watch (animated Kids Bible | TOWER OF BABEL | Latest English Bible Stories For Kids)
Now, consider the scientific perspective. Scientists, generally speaking, think that spoken human languages originated as far back as 100,000 years ago, or around the same time humans first appeared on Earth (see, for instance, the article titled Language on Wikipedia.org). Contrary to Christian Biblical dogma, science contends that humans have never really spoken one universal language. The following images provide a general synopsis of the scientific perspective on the evolution of spoken and written human languages.
Cosmic Calendar
The Age of Humans
Indo-European and Uralic Language Tree
Development of Writing
To be sure, according to Wikipedia.org, cuneiform is perhaps the earliest known human writing system. Cuneiform was created by the Sumerian people around 3200 BC, more than three millennia before the birth of Jesus Christ, in southern Mesopotamia, in what is now recognized as southern Iraq.
Now, fast forward from 100,000 years ago to the 20th century (or, more specifically, to the late 1900s). In 1969, the USA successfully tested ARPANET, the forerunner of the Internet. In 1977, USA micro-computer makers such as Commodore, Apple, and Tandy began introducing personal computers to the world. In 1981, USA computer maker IBM popularized the use of personal computers. In 1989, the World Wide Web (WWW) was proposed by Sir Tim Berners-Lee, and it was implemented in 1990. The Internet, in conjunction with the personal computer and the World Wide Web, later would revolutionize human communications and interactions on a global scale by merging and unifying all kinds of technologies (such as radio, television, telephone, electronic mail, text chats, webcam chats, teleconference chats, file sharing, document collaboration, and so forth) under a single, unified umbrella known as cyberspace.
What is the common denominator here? It is this: the Internet, personal computers, and the World Wide Web predominantly had English-speaking origins. As a result, they initially were launched with English-speaking users in mind, and most of the early websites and computer operating systems appeared in English. It did not take long, however, before the personal computer and the World Wide Web became global phenomena. Non-English character encoding systems and websites began appearing in countries all across the globe, and the World Wide Web suddenly had become multilingual. However, there was one big problem: each country, more or less, had its own unique way of displaying information and implementing websites in its own language. To adapt computers and websites to languages other than English and the Latin alphabet, various encoding schemes were created. This posed a problem because documents and websites created in one language or country could not always easily be converted, viewed, and consumed on a computer configured for a different language. Words and sentences got lost in the conversion. The source of the problem was the incompatible character encoding schemes deployed in different countries across the globe.
How was this problem of multiple language encodings to be resolved? To address it, the Unicode Consortium made its debut in 1991. Unicode.org brought order to the encoding chaos by devising a universal alphabet system and a matching encoding scheme in which each character, number, and symbol, regardless of language, was assigned a unique hexadecimal identifier. Unicode thereby facilitated the emergence of truly multilingual software applications and a truly multilingual World Wide Web, making it possible for the world's population to tap into and benefit from the gigantic stock of human knowledge via personal computers and the World Wide Web.
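To make this one-character-one-identifier idea concrete, here is a minimal sketch in Python (a language chosen here purely for illustration; Unicode itself is language-neutral). Python's standard unicodedata module exposes the unique code point and formal name that Unicode assigns to every character:

```python
# A minimal sketch: every character has exactly one code point and one
# formal name in the Unicode character database.
import unicodedata

for ch in ["A", "é", "Ж", "中", "ع"]:
    code_point = ord(ch)  # the character's unique integer identifier
    print(f"{ch}  U+{code_point:04X}  {unicodedata.name(ch)}")

# A  U+0041  LATIN CAPITAL LETTER A
# é  U+00E9  LATIN SMALL LETTER E WITH ACUTE
# Ж  U+0416  CYRILLIC CAPITAL LETTER ZHE
# 中  U+4E2D  CJK UNIFIED IDEOGRAPH-4E2D
# ع  U+0639  ARABIC LETTER AIN
```

Note how Latin, Cyrillic, Chinese, and Arabic characters all live in one uniform codespace; no per-country encoding scheme is needed to tell them apart.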
To summarize, computers only can understand and manipulate zeroes and ones, known as binary code. Inside computers, the number 1 represents the "on" signal, and the number 0 represents the "off" signal. Each binary digit (0 or 1) is 1 bit. Computers transport, store, and manipulate data in chunks of bytes, whereby 8 bits combine to form a 1-byte unit. Encoding is the process of taking the numbers, characters, and symbols used in everyday life and transforming them into binary equivalents that computers can understand and manipulate. And, as explained by Chris Hager, "Unicode uses 16 bits (2 bytes) per code-point and furthermore associates each code-point with one of 17 planes. Therefore Unicode provides 2¹⁶ = 65,536 unique code-points per plane, with 2¹⁶ * 17 = 1,114,112 maximum total unique code-points." In other words, in its original inception, Unicode was conceived as a 16-bit encoding system.
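Hager's arithmetic is easy to verify directly. The following sketch (again in Python, purely for illustration) reproduces the plane and codespace totals quoted above:

```python
# A minimal sketch verifying the code-point arithmetic quoted above.
bits = 16
code_points_per_plane = 2 ** bits          # 65,536 code points per plane
planes = 17
total_code_points = code_points_per_plane * planes

print(code_points_per_plane)   # 65536
print(total_code_points)       # 1114112
```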
Bits and Bytes
To further illustrate this discussion, the following graphic shows how the word Wikipedia gets translated into its equivalent binary code as read and understood by computers.
Wikipedia in Binary
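The graphic above is straightforward to reproduce. In the sketch below (Python, for illustration), the word Wikipedia is encoded to bytes, and each byte is printed as its 8-bit binary pattern; because every letter of Wikipedia is a plain ASCII character, each one occupies exactly one byte:

```python
# A minimal sketch reproducing the "Wikipedia in Binary" graphic.
word = "Wikipedia"
binary = " ".join(f"{byte:08b}" for byte in word.encode("utf-8"))
print(binary)
# 01010111 01101001 01101011 01101001 01110000 01100101 01100100 01101001 01100001
```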
Again, Unicode originally was envisioned as a 16-bit binary codespace, which would have made it possible to encode 65,536 (2¹⁶) characters. It soon became apparent that 65,536 code points would not be enough to capture the world's languages. So, Unicode.org extended its codespace by an additional 1,048,576 (2²⁰) code points. The original 65,536 code points and the additional 1,048,576 combine to give Unicode its current capacity of 1,114,112 code points (65,536 + 1,048,576 = 1,114,112). These 1,114,112 code points are laid out across 17 planes, with each plane containing 65,536 code points.
There are several ways to view how Unicode is structured or laid out. At a high level, Unicode's layout variously consists of planes, blocks, scripts, charts, and characters. Most of the attention, typically, is focused on Unicode's original Plane 0, where its initial 65,536 code points reside. Emojis, meanwhile, are becoming increasingly popular, and, as they expand, they are expected to consume a greater share of Unicode's overall codespace. The following table and graphics illustrate Unicode's structure.
Unicode's Codespace
Plane  Description                                 Codespace Range        Size (Code Points)  Cumulative
0      Basic Multilingual Plane (BMP)              U+0000 to U+FFFF       65,536              65,536
1      Supplementary Multilingual Plane (SMP)      U+10000 to U+1FFFF     65,536              131,072
2      Supplementary Ideographic Plane (SIP)       U+20000 to U+2FFFF     65,536              196,608
3      Tertiary Ideographic Plane (TIP)            U+30000 to U+3FFFF     65,536              262,144
4      Unassigned                                  U+40000 to U+4FFFF     65,536              327,680
5      Unassigned                                  U+50000 to U+5FFFF     65,536              393,216
6      Unassigned                                  U+60000 to U+6FFFF     65,536              458,752
7      Unassigned                                  U+70000 to U+7FFFF     65,536              524,288
8      Unassigned                                  U+80000 to U+8FFFF     65,536              589,824
9      Unassigned                                  U+90000 to U+9FFFF     65,536              655,360
10     Unassigned                                  U+A0000 to U+AFFFF     65,536              720,896
11     Unassigned                                  U+B0000 to U+BFFFF     65,536              786,432
12     Unassigned                                  U+C0000 to U+CFFFF     65,536              851,968
13     Unassigned                                  U+D0000 to U+DFFFF     65,536              917,504
14     Supplementary Special-purpose Plane (SSP)   U+E0000 to U+EFFFF     65,536              983,040
15     Supplementary Private Use Area-A            U+F0000 to U+FFFFF     65,536              1,048,576
16     Supplementary Private Use Area-B            U+100000 to U+10FFFF   65,536              1,114,112
Total                                                                     1,114,112
Unicode's Codespace Map Spanning 17 Planes
Plane 0: Basic Multilingual Plane (BMP) Snapshot
Plane 0: Basic Multilingual Plane (BMP) Up Close 1 of 2
Plane 0: Basic Multilingual Plane (BMP) Up Close 2 of 2
BMP Datatable
NOTE: Significant portions of the above BMP datatable were taken from EntryLevelProgrammer.com.
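Because every plane holds exactly 65,536 code points, finding the plane of any character is simple integer arithmetic, as the following illustrative Python sketch shows (the example characters are assumptions chosen for variety):

```python
# A minimal sketch: a code point's plane is its value divided by
# 65,536 (0x10000), the size of one plane.
def plane_of(ch: str) -> int:
    return ord(ch) // 0x10000

print(plane_of("A"))            # 0 -> Basic Multilingual Plane (U+0041)
print(plane_of("😀"))           # 1 -> Supplementary Multilingual Plane (U+1F600)
print(plane_of("\U00020000"))   # 2 -> Supplementary Ideographic Plane (U+20000)
```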
The Unicode way of encoding does get a bit complicated. One of the complications arises from the fact that American letters, numbers, and symbols (known as the ASCII standard) can be fully represented with fewer than 128 code points. Therefore, for USA computer makers using English as the base language, an 8-bit (2⁸ = 256 code points) character encoding scheme was more than sufficient for transmitting data to English-oriented computers for processing, with an extra 128 code points to spare. This English orientation of early computers gave rise to the 8-bit character encoding scheme.
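This point about ASCII is easy to demonstrate. In the sketch below (Python, for illustration), an English sentence is shown to stay below code point 128, which is also why, for plain English text, an 8-bit encoding wastes none of its range:

```python
# A minimal sketch: all ASCII characters fall below code point 128,
# so each one fits comfortably in a single 8-bit byte.
text = "Hello, World!"
print(max(ord(c) for c in text))                      # 114, well under 128
assert all(ord(c) < 128 for c in text)                # every character is ASCII
assert text.encode("ascii") == text.encode("utf-8")   # identical bytes
```

The last line hints at a deliberate design choice discussed next: UTF-8 was built so that plain ASCII text is already valid UTF-8, byte for byte.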
Unicode's rendition of an 8-bit encoding scheme is known as UTF-8. The complication relates to UTF-8's clever expansion beyond the range of 256 code points: it transmits data in 1-, 2-, 3-, or 4-byte packets so as to incorporate non-English characters such as those used by the Japanese, Chinese, Korean, Arabic, Hindi, and many other languages. Some refer to this expansion as the UTF-8 hack. Adding to Unicode's complexity was its further adoption of other encoding schemes, namely, its 16-bit and 32-bit methods of encoding known as UTF-16 and UTF-32, not to mention other obscure challenges such as correctly searching and sorting different Unicode characters. The following graphic illustrates the UTF-8 encoding strategy for capturing all code points.
UTF-8 Encoding Strategies
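The variable-width strategy in the graphic can be observed directly. In this Python sketch (for illustration), four characters from different scripts and planes encode to one, two, three, and four bytes respectively:

```python
# A minimal sketch of UTF-8's 1-to-4-byte variable-width encoding.
for ch in ["A", "é", "中", "😀"]:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")

# U+0041 -> 1 byte(s): 41
# U+00E9 -> 2 byte(s): c3 a9
# U+4E2D -> 3 byte(s): e4 b8 ad
# U+1F600 -> 4 byte(s): f0 9f 98 80
```

By contrast, UTF-32 would spend four bytes on every one of these characters, and UTF-16 would spend two bytes on the first three and four bytes (a surrogate pair) on the emoji.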
To confuse the situation further, Unicode adopted the hexadecimal format (generally understood by mathematicians) to represent its code points rather than the decimal format understood by less mathematically inclined humans or the binary format understood by computers. One reason for selecting hexadecimal was that it is a base 16 number system: one hexadecimal digit represents exactly four bits, so a 16-bit code point can be written compactly as just four hexadecimal digits. Since Unicode initially was conceived as a 16-bit encoding system, this natural fit between 16-bit Unicode and base 16 hexadecimal played a role in the selection of hexadecimal as the code point representational format.
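The same code point can, of course, be written in any base. The following Python sketch (for illustration) prints one code point in the three notations mentioned above; note how the hexadecimal form is by far the most compact:

```python
# A minimal sketch: one code point, three notations.
code_point = ord("€")                        # EURO SIGN
print(f"hexadecimal: U+{code_point:04X}")    # U+20AC (chart notation)
print(f"decimal:     {code_point}")          # 8364
print(f"binary:      {code_point:016b}")     # 0010000010101100
```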
I do not purport to be an authority or subject-matter expert on the technical inner workings of Unicode and computers. Therefore, I will leave it to the following videos to explain how Unicode, encoding, and computers work in a little more depth.
Much like sign language exists as a form of communication for the deaf and Braille exists as a form of communication for the blind, Unicode exists as a form of communication for the computer. Suffice it to say that Unicode.org was not the first or only organization to attempt to devise a universal alphabet. For instance, the TRON Project preceded the Unicode Consortium in this multilingual endeavor. The Unicode encoding standard emerged as the popularly accepted universal standard mainly because large, influential, multinational American computer corporations (such as Xerox, Apple, Sun, Microsoft, IBM, etc.) supported and contributed to it, and because those prominent USA hardware and software makers were some of its earliest adopters as they propelled the computer industry forward through countless innovations.
American Sign Language
Braille Alphabet Card
Unicode Guide, 2005
The following slide show provides a brief overview of some key events in contemporary human communications leading up to the debut of Unicode. Also, the next two videos pay tribute to Unicode.
Becky Willrich - Unicode, Oh Unicode
History of Unicode: Slideshow
Unicode's Timeline:
Version 1.0.0 published October 1991 with 7,161 characters assigned of the 1,114,112 total code points
Version 2.0.0 published July 1996 with 38,950 characters assigned of the 1,114,112 total code points
Version 3.0.0 published September 1999 with 49,259 characters assigned of the 1,114,112 total code points
Version 4.0.0 published April 2003 with 96,447 characters assigned of the 1,114,112 total code points
Version 5.0.0 published July 2006 with 99,089 characters assigned of the 1,114,112 total code points
Version 6.0.0 published October 2010 with 109,449 characters assigned of the 1,114,112 total code points
Version 7.0.0 published June 2014 with 113,021 characters assigned of the 1,114,112 total code points
Version 8.0.0 published June 2015 with 120,737 characters assigned of the 1,114,112 total code points
Version 9.0.0 published June 2016 with 128,237 characters assigned of the 1,114,112 total code points
Version 10.0.0 published June 2017 with 136,690 characters assigned of the 1,114,112 total code points
Version 11.0.0 published June 2018 with 137,374 characters assigned of the 1,114,112 total code points
Version 12.0.0 published March 2019 with 137,928 characters assigned of the 1,114,112 total code points
Unicode's Adopt a Character
Siri Poarangan, decodeunicode
According to W3Techs, when it comes to website encoding, as of 2019, Unicode's UTF-8 commanded a whopping 95% share of all character encoding schemes in use on the World Wide Web. The next graphic also illustrates UTF-8's growth trend in terms of its use on the World Wide Web.
Unicode's UTF-8 Growth Trend on the World Wide Web
What is the biggest takeaway from this tribute to Unicode.org? It is that Unicode.org attempts to bridge the human divide by facilitating global human communications. The Unicode endeavor neatly fits into a broader human endeavor of cooperation and unity rather than one of bickering and division. There is so much confusion, disagreement, misunderstanding, and ignorance within the human family. There also is so much crime occurring within the human family, for example, murder, torture, assault, battery, robbery, theft, arson, kidnapping, vandalism, sexual abuse, piracy, hacking, and identity theft. Humans must find a way to overcome these multifarious existential challenges to both human civilization and life on Earth. A starting point would be for humans genuinely to show courtesy and respect for one another despite their multifarious differences. Thanks to the creation of the World Wide Web and the emergence of organizations such as Unicode.org, perhaps a glimmer of light flickers at the end of the tunnel.
Moving beyond multilingual Unicode, it would be remiss of me not to take this opportunity to revisit the big picture. Humans now have moved into the 21st century. A new millennium has begun. Contemporary humans are presented with an array of challenges and disputes to overcome, ranging from natural disasters, climate change, poverty, disease, migration, trafficking, and substance abuse to wars, the prospect of nuclear warfare, violence, hatred, and gunplay. Given the billions of humans on Earth and the racial, religious, political, cultural, and other differences that divide them, how will humans ever accomplish the gargantuan task of getting on the "same page" of communications? How will humans ever accomplish the gargantuan task of understanding one another and fostering an enduring life of harmony and prosperity on Earth?
Another challenge for 21st-century humans involves taking matters a step farther than the existence of a universal alphabet. Humans already have fostered a somewhat universal measurement system known as the metric system. The next step involves the creation of a universal language, thus completing the Tower of Babel language circle, so to speak. With the advent of new technologies such as artificial intelligence and deep learning, computer companies such as the following ones are making giant strides at completing the language circle:
These machine-translation companies presently can take one language, or many languages, and instantaneously convert it into another language. Conceivably, a common human language should lead to increased human understanding. It is widely held that the World Wide Web possesses the potential to be the great human unifier; the thought is that, by giving all humans access to this global informational network, humans would proceed to freely exchange ideas for the betterment of humanity. In reality, there is a lot of good stuff and equally a lot of bad stuff on the World Wide Web. Although originally conceived to be a great social unifier, the social media phenomenon has exposed both the good and bad aspects of human nature. Social media, at times, can be more polarizing than harmonizing. Rather than leading to greater cooperation and unity among humans, some social media activities, in some instances, appear to elicit some of the worst aspects of human behavior, leading to deeper human divisions and schisms.
Here's to unity and peace on Earth:
NESDIS's (NOAA National Environmental Satellite, Data, and Information Service) Golden Sun
Watch (Maze featuring Frankie Beverly, Golden Time Of Day)
Slices of the Sun
Watch (L.T.D., It Must End)
Earth Globe
Watch (Lionel Richie, Love Will Find A Way)
Earth Map
Watch (Maze featuring Frankie Beverly, Love Is)
World Flags
Watch (Herbie Hancock, Sunlight)
Earth System Diagram
Watch (Earth, Wind & Fire, Faces)
Solar System
Watch (S.O.S. Band, Do It Now)
Animation Shows May 9, 2016 Transit of Mercury Across the Face of the Sun
Watch (Stevie Wonder, As If You Read My Mind)
Arecibo Message
Watch (Ramsey Lewis featuring Earth, Wind & Fire, Sun Goddess)
Arrival of the Space Aliens
Watch (Al Jarreau, Mornin')
Beyond the Solar System
Watch (Brick, Southern Sunset)
Essence of Life
The question becomes this: Now that you have arrived into being, how will you choose to make use of the privilege to participate in Earth's grandiose miracle of life? Hopefully, you will choose to live a constructive, productive, and positive span of life. For, as the saying goes, time waits for no one. Always remember that it is never too late to turn your life into something good no matter how old or young you happen to be.
Video top left: The time lapse movie spans fertilization, ooplasmic segregation, and early divisions in the egg of the tunicate Phallusia mammilata | CIL:11962, Phallusia mammilata, egg. cellimagelibrary.org. Dataset
Video top right: The time lapse movie shows mitosis in a cell of the liquid endosperm of the African Blood Lily Haemanthus katherinae observed with phase contrast | CIL:11952, Haemanthus katharinae, endosperm. cellimagelibrary.org. Dataset
Video bottom left: Time lapse movie showing several sequences of mitosis in pollen mother cells of the Easter Lily | CIL:11957, Lilium maritimum, pollen mother cell. cellimagelibrary.org. Dataset
Video bottom right: Phase contrast movie demonstrating how paramecia contractile vacuoles regulate water pressure within the protozoan's body | CIL:40986, Paramecium. cellimagelibrary.org. Dataset
Watch (Hubert Laws, Life Cycles)
P. Lutus' World Clock
⌛
Who will be next? The next Annual Bruessard Award winner will be announced on 1-December-2020. Stay tuned.