Gujarati/How to use Unicode in creating Gujarati script

The Gujarati alphabet mainly comprises 34 consonants (ornamented sounds), 2 compound characters that are treated as consonants (though not lexically), and 14 vowels (pure sounds). Overall, the writing system has 94 legitimate, recognized, distinct symbols or shapes. In the current Unicode 4.1 implementation, however, only some of these symbols have been incorporated as glyphs or shapes. The remaining shapes are created by conjunctions, that is, by combining the encoded characters.

Introductory knowledge of the Gujarati language and script can be obtained from

A constructed Gujarati syllable can be logically divided into the following parts, based on the position of the shapes involved.

Examples (clockwise from top-left): 1. Post-based (right); 2. Below-based (lower); 3. Pre-based (left); 4. Above-based (upper). These conventions are used in the rest of this discussion.

Substitution, in the sense used here, means replacing a set or group of characters or shapes with a single character or shape. In practical terms, this means that 1) multiple keystrokes generate a single shape; and 2) the resulting shape keeps transforming itself (based on certain rules) as the user continues typing.
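To make the idea concrete, here is a minimal sketch of rule-based substitution, assuming a greedy longest-match strategy over a small lookup table. The table contents, the glyph names such as `glyph.jnya`, and the function `substitute` are all hypothetical and chosen for illustration; a real font stores such rules in its own tables, and a shaping engine applies them in a more elaborate way.

```python
# Hypothetical substitution table: code point sequence -> single glyph name.
# The glyph names are invented for illustration; a real font defines its own.
SUBSTITUTIONS = {
    "\u0A95\u0ACD\u0AB7": "glyph.kssa",  # ક + ્ + ષ
    "\u0A9C\u0ACD\u0A9E": "glyph.jnya",  # જ + ્ + ઞ
    "\u0A9C\u0AC0": "glyph.ja_ii",       # જ + ી
}
MAX_LEN = max(len(key) for key in SUBSTITUTIONS)


def substitute(text: str) -> list[str]:
    """Replace each longest matching sequence with a single glyph name."""
    glyphs: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible sequence first, then shorter ones.
        for n in range(min(MAX_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in SUBSTITUTIONS:
                glyphs.append(SUBSTITUTIONS[chunk])
                i += n
                break
        else:
            # No rule matched: the character is shown as its own glyph.
            glyphs.append(text[i])
            i += 1
    return glyphs


# Re-running the substitution after every keystroke makes the shape
# "keep transforming" as the user types જ, then ્, then ઞ.
typed = "\u0A9C\u0ACD\u0A9E"
for k in range(1, len(typed) + 1):
    print(substitute(typed[:k]))
```

The loop at the end shows point 2) above: after the first two keystrokes the output is still two separate shapes, and only the third keystroke collapses the whole sequence into one glyph.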

Substitution can occur when one or more shapes are added in any position other than the baseline area (see illustration above).

The Unicode range for Gujarati script is from U+0A80 to U+0AFF. The ISCII Code-page identifier for Gujarati script is 57010.
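The block boundaries above are enough to check whether text falls inside the Gujarati range. The sketch below is a minimal illustration. The helper names `is_gujarati` and `all_gujarati` are hypothetical, and spaces are skipped on the assumption that ordinary whitespace may appear between Gujarati words. ISCII conversion (code page 57010) is not shown, because Python's standard codecs do not, as far as we know, include an ISCII codec.

```python
GUJARATI_FIRST = 0x0A80  # start of the Gujarati block
GUJARATI_LAST = 0x0AFF   # end of the Gujarati block


def is_gujarati(ch: str) -> bool:
    """True if the single character lies in the Gujarati block U+0A80..U+0AFF."""
    return GUJARATI_FIRST <= ord(ch) <= GUJARATI_LAST


def all_gujarati(text: str) -> bool:
    """True if every non-whitespace character of text is in the Gujarati block."""
    return all(is_gujarati(c) for c in text if not c.isspace())


print(is_gujarati("\u0A95"))                   # ક
print(all_gujarati("\u0A9C\u0AC0 \u0AB2\u0AC0"))  # જી લી
```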

The table below shows the glyphs that are encoded in Unicode Standard 4.0.0. Gray boxes indicate code points that are reserved or unused.
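A similar table can be regenerated from a Unicode character database. The sketch below uses Python's standard `unicodedata` module; `gujarati_block_table` is a hypothetical helper name. Note the assumption: Python ships a much newer Unicode database than 4.0.0, so it may list code points that were assigned after 4.0.0 and are gray (unused) in the table above. Code points without a name are treated as reserved.

```python
import unicodedata


def gujarati_block_table() -> dict[int, str]:
    """Map each assigned code point in U+0A80..U+0AFF to its Unicode name."""
    table = {}
    for cp in range(0x0A80, 0x0B00):
        name = unicodedata.name(chr(cp), None)  # None means unassigned/reserved
        if name is not None:
            table[cp] = name
    return table


if __name__ == "__main__":
    print("Unicode database version:", unicodedata.unidata_version)
    for cp, name in gujarati_block_table().items():
        print(f"U+{cp:04X}  {chr(cp)}  {name}")
```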

Half-forms of consonants are used in the pre-base position. For consonants that do not have a distinct glyph for the half-form, a Halant (્) is used to create the half-form, as follows:
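At the encoding level, a half-form is never a separate code point. It is simply a consonant followed by the Halant (U+0ACD, named GUJARATI SIGN VIRAMA in Unicode), followed by the next consonant. The font then chooses the half-form shape. The minimal sketch below builds such a sequence. The helper `conjunct` is a hypothetical name used for illustration.

```python
import unicodedata

HALANT = "\u0ACD"  # GUJARATI SIGN VIRAMA, the Halant (્)


def conjunct(*consonants: str) -> str:
    """Join consonants with a Halant between each adjacent pair."""
    return HALANT.join(consonants)


MYA = conjunct("\u0AAE", "\u0AAF")  # મ + ્ + ય -> મ્ય

# List the code points actually stored for the conjunct.
for ch in MYA:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```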

(Note the half-form of મ, which is used here in conjunction with ય.) Note: A half-form is not created for the base glyph, even if the syllable ends with a Halant.
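The rule above can be sketched as follows. A consonant takes its half-form only when a Halant follows it and another consonant comes after the Halant. The final (base) consonant therefore stays full, even if the syllable ends with a Halant. This is a simplified sketch under stated assumptions. It treats U+0A95 (ક) through U+0AB9 (હ) as the consonant range. It keeps a consonant full when the following consonant is ર, which is drawn as a below-base mark instead, as in ડ્ર. It also ignores special ligatures such as ક્ષ. The function name `half_form_indices` is hypothetical.

```python
HALANT = "\u0ACD"  # Halant (્)
RA = "\u0AB0"      # ર


def is_consonant(ch: str) -> bool:
    # Gujarati consonants occupy U+0A95 (KA) through U+0AB9 (HA).
    return "\u0A95" <= ch <= "\u0AB9"


def half_form_indices(syllable: str) -> list[int]:
    """Return positions of consonants that would be drawn as half-forms."""
    result = []
    for i, ch in enumerate(syllable):
        if not is_consonant(ch):
            continue
        followed_by_halant = i + 1 < len(syllable) and syllable[i + 1] == HALANT
        next_is_consonant = i + 2 < len(syllable) and is_consonant(syllable[i + 2])
        # Halant + ર gives a below-base form of ર; the first consonant stays full.
        if followed_by_halant and next_is_consonant and syllable[i + 2] != RA:
            result.append(i)
    return result


print(half_form_indices("\u0AAE\u0ACD\u0AAF"))  # મ્ય: મ takes its half-form
print(half_form_indices("\u0AAE\u0ACD"))        # મ્: base glyph, no half-form
```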

(Special glyph ડ્ર. Notice the two below-based marks, compared with only one in the previous example.)

The following characters, which are part of the Gujarati alphabet but are not explicitly encoded as glyphs in the Unicode character set, can be generated as indicated below:
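The two compound characters that are treated as consonants, ક્ષ and જ્ઞ, are commonly cited examples of such characters. The sketch below assumes they are among those meant here and shows that each is stored as three code points. It also checks that Unicode normalization (NFC) does not merge the sequence into a single code point, so the single shape must come from the font's substitution rules.

```python
import unicodedata

HALANT = "\u0ACD"  # Halant (્)

KSSA = "\u0A95" + HALANT + "\u0AB7"  # ક + ્ + ષ -> ક્ષ
JNYA = "\u0A9C" + HALANT + "\u0A9E"  # જ + ્ + ઞ -> જ્ઞ

for seq in (KSSA, JNYA):
    # NFC leaves the sequence unchanged: there is no precomposed character.
    same_after_nfc = unicodedata.normalize("NFC", seq) == seq
    print(seq, [f"U+{ord(c):04X}" for c in seq], same_after_nfc)
```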

The following are the main character substitutions required to handle the complexity of the script and to generate its various character forms:

The half-form conjunctions, one of the most common occurrences in the script, are created by pre-base substitutions.

This substitution also has a special use: creating the I-Matra (and its appropriately aligned shape), as shown below:
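The I-Matra (િ, U+0ABF) is typed and stored after the consonant, but it is displayed to the left of it. Before any substitution, the rendering step therefore has to move it to the front of the syllable. The sketch below shows only this reordering, under simplifying assumptions. It takes one already-segmented syllable as input. It moves the matra in front of the whole consonant cluster. It does not model the choice of an appropriately aligned I-Matra shape, which is left to the font. The name `visual_order` is hypothetical.

```python
I_MATRA = "\u0ABF"  # GUJARATI VOWEL SIGN I (િ)


def visual_order(syllable: str) -> str:
    """Return the syllable in display order, with the I-Matra moved to the front."""
    if I_MATRA not in syllable:
        return syllable
    # Stored order: consonant(s) + િ ; displayed order: િ + consonant(s).
    return I_MATRA + syllable.replace(I_MATRA, "", 1)


print([f"U+{ord(c):04X}" for c in visual_order("\u0A95\u0ABF")])  # કિ
```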

Consonants of the Gujarati script do not have post-based forms. Post-based substitution is used primarily to create visarga out of vowels. It is also applied to "I-Matra" substitutions, as follows; these take place before any above-based substitution that is also applied:

(Compare the special shape જી, which is the result of post-based substitution, with the result of a similar combination using a character such as લ, which generates: લ + ી = લી)
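At the code point level, જી and લી have exactly the same structure: a consonant followed by the vowel sign ી (U+0AC0). The special shape of જી therefore comes entirely from the font's post-based substitution and not from the encoding. The sketch below makes this visible by listing the Unicode names of both sequences. `describe` is a hypothetical helper name.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """List the Unicode name of every code point in text."""
    return [unicodedata.name(c) for c in text]


JI = "\u0A9C\u0AC0"  # જ + ી -> જી (special shape)
LI = "\u0AB2\u0AC0"  # લ + ી -> લી (regular shape)

print(describe(JI))
print(describe(LI))
```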