Unicode is a standard that assigns every character a unique integer, its code point, so that text is represented consistently everywhere. This provides the basis for processing, storing and interchanging text data in any language. Modern software and information technology protocols have widely adopted the standard because it covers the needs of almost all users. It includes the characters (letters, technical symbols, punctuation marks, etc.) of the modern writing systems of the world as well as many historic ones. It also allows software products and web sites to be used across multiple platforms, languages and countries without re-engineering. Each new version of the standard extends its coverage. For instance, Unicode 6.3, released in September 2013, contains more than 110,000 characters covering 100 scripts and a multitude of symbols.
In Unicode each code point has exactly one General Category property value. The major categories are Letter, Mark, Number, Punctuation, Symbol, Separator and Other, and each is further divided into subcategories (for example, Letter is divided into uppercase, lowercase, titlecase, modifier and other letters). These categories are not useful for every purpose, however. A single value cannot capture every characteristic of a character, whereas legacy encodings have often assigned multiple characteristics to a single code point.
The Unicode Standard encodes scripts rather than languages. A script is a set of symbols with a historically related derivation that is shared by one or more languages. Unicode unifies such symbols into a single collection identified as one script. That script then serves as an inventory of symbols drawn upon to write particular languages. In many cases, a single script serves to write tens or even hundreds of languages (e.g., the Latin script). The Unicode Consortium works in close association with the W3C and with ISO, particularly ISO/IEC JTC 1/SC 2/WG 2. That working group is responsible for keeping the International Standard ISO/IEC 10646 synchronized with the Unicode Standard.
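The following Python snippet is a minimal sketch of the ideas above. It uses the standard-library `unicodedata` module to show three things for each character:

* its code point, which is the integer Unicode assigns to it;
* one possible serialized form for storage or interchange (its UTF-8 bytes);
* its General Category.

The `describe` helper and the `MAJOR_CATEGORIES` table are illustrative names chosen for this example. They are not part of Unicode or of Python. The table includes the major class Other (C), which covers control, format, surrogate, private-use and unassigned code points. Note that `unicodedata` reflects whatever Unicode version the running Python was built with (see `unicodedata.unidata_version`), not necessarily Unicode 6.3.

```python
import unicodedata

# Illustrative mapping from the first letter of a two-letter General
# Category code (e.g. "Lu" -> "L") to the name of its major class.
MAJOR_CATEGORIES = {
    "L": "Letter",
    "M": "Mark",
    "N": "Number",
    "P": "Punctuation",
    "S": "Symbol",
    "Z": "Separator",
    "C": "Other",
}


def describe(ch: str) -> dict:
    """Return the code point, UTF-8 bytes, name and General Category of one character."""
    code = unicodedata.category(ch)  # two-letter General Category, e.g. "Lu"
    return {
        "char": ch,
        "code_point": f"U+{ord(ch):04X}",      # the integer assigned by Unicode
        "utf8": ch.encode("utf-8").hex(" "),   # one storage/interchange encoding
        "name": unicodedata.name(ch, "<unnamed>"),  # control chars have no name
        "category": code,
        "major": MAJOR_CATEGORIES[code[0]],
    }


if __name__ == "__main__":
    # Latin, combining mark, Arabic-Indic digit, Devanagari, symbol, space, comma
    for ch in ["A", "é", "\u0301", "٣", "क", "€", " ", ","]:
        print(describe(ch))
```

Running the script prints one dictionary per sample character. For example, the combining acute accent U+0301 falls under Mark (`Mn`), the Arabic-Indic digit ٣ under Number (`Nd`), and the space under Separator (`Zs`). Because `unicodedata.category` returns exactly one value per code point, the sketch also illustrates the single-valued nature of the General Category property.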
The most widely accepted publications of the Unicode Consortium are the following:

* the Unicode Standard with its annexes;
* the Unicode Character Database (Unicode Character Database, 2014);
* the Unicode Technical Standards and Technical Reports (Unicode Technical Report, 2016);
* the Unicode Technical Notes;
* the Unicode Locales project;
* the Common Locale Data Repository (CLDR).

Original Reference Article:

  • Jain, C. (2017). Evolving a Model for National Digital Repository of Indian Government Publications using Institutional Repository Infrastructure.

