Arabic alphabet
The
Arabic alphabet is the principal script used for writing the
Arabic language. As the
alphabet of the language of the
Quran, the holy book of
Islam, its influence spread with that of Islam and it has been, and still is, used to write other languages without any linguistic roots in Arabic, such as
Persian and
Urdu. (See fuller list below.)
It is often necessary to add or modify certain letters in order to adapt this alphabet to the phonology of the target languages.
The Arabic alphabet is composed of 28 basic letters and is written from right to left. There is no difference between written and printed letters; the concept of upper and lower case letters does not exist (thus the writing is unicase). On the other hand, most of the letters are attached to one another, even when printed, and their appearance changes depending on whether they are preceded or followed by other letters or stand alone (that is, there is contextual variation). The Arabic alphabet is an
abjad, a term describing writing in which the vowels are not explicitly written; so the reader must know the language in order to restore them. However, in editions of the Quran or in didactic works a vocalization notation in the form of
diacritic marks is used. Moreover, vocalized texts use a series of other diacritics, of which the most common are the indication of vowel omission
(sukūn) and the gemination (doubling) of consonants
(šadda).
This alphabet can be traced back to the alphabet used to write the Nabataean dialect of
Aramaic, itself descended from
Phoenician (which, among others, gave rise to the
Greek alphabet and, thence, to Latin letters, etc.). The first example of a text in the Arabic alphabet appeared in
512 A.D. It wasn't until the
7th century that dots were added above and below the letters to differentiate them (the Aramaic model had fewer phonemes than the Arabic, so in the early writings a single letter might represent several phonemes).
The Arabic alphabet can be
transliterated and
transcribed in various ways. The preferred method in this document will be
DIN 31635. The alphabet can be encoded using several
character sets, including
ISO-8859-6 and
Unicode, thanks to the "Arabic" block, code points U+0600 to U+06FF. However, neither set indicates the contextual form that each character should take. It is left to the rendering engine to select the proper
glyph to display for each character.
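As a rough illustration of the code-point range just mentioned, the following Python sketch checks whether a character lies in U+0600 to U+06FF and prints its Unicode name. The helper name `in_arabic_block` is invented for this example; `unicodedata` is part of the Python standard library.

```python
import unicodedata

ARABIC_BLOCK = range(0x0600, 0x0700)  # U+0600 through U+06FF

def in_arabic_block(ch):
    """Return True if the single character ch lies in the Arabic block."""
    return ord(ch) in ARABIC_BLOCK

# The word bāb ("door") in logical order: bāʾ, ʾalif, bāʾ.
for ch in "\u0628\u0627\u0628":
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), in_arabic_block(ch))
```

Both occurrences of bāʾ are stored as the same code point, U+0628, although they are drawn differently; the choice of shape is not part of the encoded text.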
When one wants to encode a particular written form of a character, Unicode provides extra code points that express the exact written form desired. The
Arabic presentation forms A (U+FB50 to U+FDFF) and
Arabic presentation forms B (U+FE70 to U+FEFF) contain most of the characters with contextual variation as well as the extended characters appropriate for other languages. These effects are better achieved in Unicode by using the zero-width joiner and non-joiner, as the presentation forms are deprecated in Unicode and should generally be used only within the internals of text-rendering software, or for backwards compatibility with implementations that rely on hard-coded glyph forms.
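One possible way to handle text that was stored with hard-coded presentation forms is to fold it back to the ordinary letters with Unicode compatibility normalization. The sketch below assumes NFKC is acceptable for the text at hand (it also folds other compatibility characters); the function name `to_base_letters` is made up for illustration.

```python
import unicodedata

ZWJ = "\u200D"   # ZERO WIDTH JOINER
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

def to_base_letters(text):
    """Fold presentation-form code points back to ordinary letters (NFKC)."""
    return unicodedata.normalize("NFKC", text)

# U+FE91 is the initial form of bāʾ, U+FE8E the final form of ʾalif,
# U+FE8F the isolated form of bāʾ.
hard_coded = "\uFE91\uFE8E\uFE8F"
print([f"U+{ord(c):04X}" for c in to_base_letters(hard_coded)])

# Instead of a presentation form, a joiner can request a shape from the
# renderer: hāʾ followed by ZWJ asks for the joining (initial) shape.
initial_ha = "\u0647" + ZWJ
print(len(initial_ha))
```

The joiner characters carry no shape themselves; whether the requested form actually appears still depends on the rendering engine and font.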
Finally, the Unicode encoding of Arabic is
in logical order, that is, the characters are entered, and stored in computer memory, in the order that they are written and pronounced without worrying about the direction in which they will be displayed on paper or on the screen. Again, it is left to the rendering engine to present the characters in the correct direction. In this regard, if the Arabic words on this page are written left to right, it is an indication that the Unicode rendering engine used to display them is out-of-date. For more information about encoding Arabic, consult the Unicode manual available at
http://www.unicode.org/
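To make the idea of logical order concrete, here is a small Python sketch. It shows that the first character in memory is the first letter read, and that each character carries a bidirectional class which a rendering engine can use to decide display direction. The helper `bidi_classes` is a name chosen for this example only.

```python
import unicodedata

def bidi_classes(text):
    """Return the Unicode bidirectional class of each character."""
    return [unicodedata.bidirectional(ch) for ch in text]

word = "\u0643\u062A\u0628"  # kāf, tāʾ, bāʾ in reading order

print(f"first stored character: U+{ord(word[0]):04X}")  # the kāf
print(bidi_classes(word))   # Arabic letters have class "AL"
print(bidi_classes("abc"))  # Latin letters have class "L"
```

Nothing in the string itself says "right to left"; the direction is derived from these per-character classes at display time.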
Presentation of the alphabet
The transcription and the transliteration mainly follow the DIN 31635 standard; the alternatives belonging to other standards are indicated after the oblique bar.
Notice that the horizontal-line diacritic (macron) above the long vowels is often replaced by a circumflex, because the circumflex is easier to type on many keyboards.
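The macron-to-circumflex substitution can be sketched mechanically. The Python snippet below is only an illustration, with an invented function name; it decomposes the text, swaps the combining macron for a combining circumflex, and recomposes. As a side effect it normalizes the whole string to NFC.

```python
import unicodedata

MACRON = "\u0304"      # combining macron
CIRCUMFLEX = "\u0302"  # combining circumflex accent

def macron_to_circumflex(text):
    """Replace every macron with a circumflex, e.g. long a, i and u."""
    decomposed = unicodedata.normalize("NFD", text)
    swapped = decomposed.replace(MACRON, CIRCUMFLEX)
    return unicodedata.normalize("NFC", swapped)

print(macron_to_circumflex("suk\u016Bn"))  # sukûn
```

Other diacritics used in the transliteration, such as the dot below or the caron, are left untouched.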
A transliteration from Arabic must allow the reconstruction of the original Arabic letters, so it shows the characters which are not pronounced or which are pronounced as others. A phonemic transcription indicates only the pronunciation. See below for more details. The phonemic transcription (somewhat simplified here) follows the conventions of the International Phonetic Alphabet: for more details concerning the pronunciation of Arabic, consult the article on Arabic pronunciation.
SATTS, the Standard Arabic Technical Transliteration System, is a US military standard transliteration of Arabic letters to the Latin alphabet.
Primary letters
| Stand-alone | Initial | Medial | Final | Name | Trans. | Value |
| ﺀ | أ ؤ إ ئ ٵ ٶ ٸ ځ, etc. | hamza | ʾ / ’ and ‚ | [ʔ] |
| ﺍ | — | — | ﺎ | ʾalif | ā / â | [aː] |
| ﺏ | ﺑ | ﺒ | ﺐ | bāʾ | b | [b] |
| ﺕ | ﺗ | ﺘ | ﺖ | tāʾ | t | [t] |
| ﺙ | ﺛ | ﺜ | ﺚ | ṯāʾ | ṯ / th | [θ] |
| ﺝ | ﺟ | ﺠ | ﺞ | ǧīm | ǧ / j / dj | [ʤ] |
| ﺡ | ﺣ | ﺤ | ﺢ | ḥāʾ | ḥ | [ħ] |
| ﺥ | ﺧ | ﺨ | ﺦ | ḫāʾ | ḫ / ẖ / kh | [x] |
| ﺩ | — | — | ﺪ | dāl | d | [d] |
| ﺫ | — | — | ﺬ | ḏāl | ḏ / dh | [ð] |
| ﺭ | — | — | ﺮ | rāʾ | r | [r] |
| ﺯ | — | — | ﺰ | zāy | z | [z] |
| ﺱ | ﺳ | ﺴ | ﺲ | sīn | s | [s] |
| ﺵ | ﺷ | ﺸ | ﺶ | šīn | š / sh | [ʃ] |
| ﺹ | ﺻ | ﺼ | ﺺ | ṣād | ṣ | [sˁ] |
| ﺽ | ﺿ | ﻀ | ﺾ | ḍād | ḍ | [dˁ], [ðˤ] |
| ﻁ | ﻃ | ﻄ | ﻂ | ṭāʾ | ṭ | [tˁ] |
| ﻅ | ﻇ | ﻈ | ﻆ | ẓāʾ | ẓ | [zˁ], [ðˁ] |
| ﻉ | ﻋ | ﻌ | ﻊ | ʿayn | ʿ / ‘ | [ʔˤ] |
| ﻍ | ﻏ | ﻐ | ﻎ | ġayn | ġ / gh | [ɣ] |
| ﻑ | ﻓ | ﻔ | ﻒ | fāʾ | f | [f] |
| ﻕ | ﻗ | ﻘ | ﻖ | qāf | q / ḳ | [q] |
| ﻙ | ﻛ | ﻜ | ﻚ | kāf | k | [k] |
| ﻝ | ﻟ | ﻠ | ﻞ | lām | l | [l] |
| ﻡ | ﻣ | ﻤ | ﻢ | mīm | m | [m] |
| ﻥ | ﻧ | ﻨ | ﻦ | nūn | n | [n] |
| ﻩ | ﻫ | ﻬ | ﻪ | hāʾ | h | [h] |
| ﻭ | — | — | ﻮ | wāw | w | [w] |
| ﻱ | ﻳ | ﻴ | ﻲ | yāʾ | y | [j] |
Letters lacking an initial or medial version are never tied to the following letter, even within a word. As to
ﺀ hamza, it has only a single graphic form, since it is never tied to a preceding or following letter.
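The joining rule described above can be sketched as a small Python function. This is a simplified illustration under stated assumptions, not a full shaping algorithm: it assumes the input contains only the basic letters and hamza (no diacritics, spaces or ligatures), and the names `RIGHT_JOINING`, `NON_JOINING` and `contextual_forms` are invented for the example.

```python
# Letters that never connect to the following letter: the rows of the table
# that lack initial and medial forms (ʾalif, dāl, ḏāl, rāʾ, zāy, wāw).
RIGHT_JOINING = set("\u0627\u062F\u0630\u0631\u0632\u0648")
# hamza connects on neither side.
NON_JOINING = {"\u0621"}

def contextual_forms(word):
    """Return 'isolated', 'initial', 'medial' or 'final' for each letter."""
    forms = []
    for i, ch in enumerate(word):
        prev_ch = word[i - 1] if i > 0 else None
        next_ch = word[i + 1] if i + 1 < len(word) else None
        # Tied to the previous letter only if that letter connects forward.
        joins_prev = (prev_ch is not None
                      and ch not in NON_JOINING
                      and prev_ch not in RIGHT_JOINING
                      and prev_ch not in NON_JOINING)
        # Tied to the next letter only if this letter connects forward.
        joins_next = (next_ch is not None
                      and ch not in RIGHT_JOINING
                      and ch not in NON_JOINING
                      and next_ch not in NON_JOINING)
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms

print(contextual_forms("\u0643\u062A\u0628"))  # kāf, tāʾ, bāʾ
print(contextual_forms("\u0628\u0627\u0628"))  # bāʾ, ʾalif, bāʾ
```

In the second word the ʾalif takes its final form and the last bāʾ stands alone, because ʾalif does not connect to the letter that follows it.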
Other letters
| Stand-alone | Initial | Medial | Final | Name | Trans. | Value |
| ﺁ | — | — | ﺂ | ʾalif madda | ʾā | [ʔaː] |
| ﺓ | — | — | ﺔ | tāʾ marbūṭa | h or t / Ø / h / ẗ | [a], [at] |
| ﻯ | — | — | ﻰ | ʾalif maqṣūra | ā / ỳ | [aː] |
| ﻻ | — | — | ﻼ | lām ʾalif | lā | [laː] |
Notes
Writing the hamza
Initially, the letter ʾalif indicated a glottal occlusive, or glottal stop, transcribed [ʔ], which confirms that the alphabet shares the same Phoenician origin. Now it is used in the same manner as in other abjads, with
yāʾ and
wāw, as a
mater lectionis, that is to say, a consonant standing in for a long vowel (see below). In fact, over the course of time its phonetic value has been obscured, since
ʾalif now serves principally to replace phonemes or as a graphic support for certain diacritics.
The Arabic alphabet now mainly uses the
hamza to indicate a
glottal stop, which can appear anywhere in a word. This letter, however, does not function like the others: it can be written alone or on a support, in which case it becomes a diacritic:
* alone: ء;
* with a support: أ and إ (above and under an ʾalif), ؤ (above a wāw), ئ (above a yāʾ without points, or yāʾ hamza).
The details of writing of the
hamza are discussed below, after that of the vowels and syllable-division marks, because their functions are related.
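Unicode reflects the "hamza on a support" idea in its canonical decompositions, which the following Python sketch prints. The function name `split_support` is made up for this illustration; the decompositions themselves come from the Unicode character database via the standard `unicodedata` module.

```python
import unicodedata

def split_support(ch):
    """Return the NFD code points of a letter: support, then combining hamza."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]

# hamza above ʾalif, below ʾalif, above wāw, above yāʾ, and stand-alone.
for ch in "\u0623\u0625\u0624\u0626\u0621":
    print(unicodedata.name(ch), split_support(ch))
```

The supported forms split into a carrier letter plus a combining hamza (U+0654 above, U+0655 below), while the stand-alone hamza U+0621 stays a single code point.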
Ligatures
The only compulsory ligature is lām + ʾalif. All other ligatures (yāʾ + mīm, etc.) are optional.
Some fonts include a
ﷺ
(Sall-allahu alayhi wasallam) glyph and a
ﷲ
(Allah) glyph. The former is used after any mention of the name of the Prophet; it reads "may Allah bless him and give him peace". The latter is a workaround for the many text processors that cannot display the name correctly because of their faulty rendering of vowel marks.
Diacritics
Vowels
Arabic short vowels are generally not written, except sometimes in sacred texts (such as the Quran) and didactic works, which are known as vocalised texts.
Short vowels may be written with diacritics placed above or below the consonant that precedes them in the syllable. (All Arabic vowels, long and short, follow a consonant; contrary to appearances, there
is a consonant at the start of a name like Ali, in Arabic
ʿAlī, or of a word like
ʾalif.)
Long "a" following a consonant other than hamza is written with a short-"a" mark on the consonant plus an ʾalif after it. Long "i" is a mark for short "i" plus a yāʾ, and long "u" is a mark for short "u" plus a wāw (so aā = ā, iy = ī and uw = ū).
Long "a" following a hamza sound may be represented by an ʾalif madda or by a floating hamza followed by an ʾalif.
In an unvocalised text (one in which the short vowels are not marked), the long vowels are represented by the consonant in question (ʾalif, yāʾ, wāw). Long vowels written in the middle of a word are treated like consonants taking
sukūn (see below) in a text that has full diacritics.
For clarity, vowels
will be placed above or below the letter د
dāl, so the results should be read [da], [di], [du], etc. Please note that د
dāl is one of the six letters that do not connect to the left, and is used in this demonstration for clarity. Most other letters connect to
ʾalif,
wāw and yāʾ.
| Simple vowels | Name | Trans. | Value |
| دَ | fatḥa | a | [a] |
| دِ | kasra | i | [i] |
| دُ | ḍamma | u | [u] |
| دَا | fatḥa ʾalif | ā | [aː] |
| دَى | fatḥa ʾalif maqṣūra | ā / aỳ | [aː] |
| دِي | kasra yāʾ | ī / iy | [iː] |
| دُو | ḍamma wāw | ū / uw | [uː] |
| Tanwīn letters |
| ً , ٍ , ٌ | used to produce the grammatical endings /an/, /in/, and /un/ respectively. ً is usually used in combination with ا ( اً ). |
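The vowel tables above can be read as a lookup from mark (plus optional lengthening letter) to transliteration. The Python sketch below encodes just those rows for the dāl examples; it is a toy reader with invented names, not a general transliterator, and it raises an error for anything outside the tables.

```python
FATHA, KASRA, DAMMA = "\u064E", "\u0650", "\u064F"
ALIF, ALIF_MAQSURA, YA, WAW = "\u0627", "\u0649", "\u064A", "\u0648"
DAL = "\u062F"

SHORT = {FATHA: "a", KASRA: "i", DAMMA: "u"}
LONG = {
    (FATHA, ALIF): "\u0101",          # long a
    (FATHA, ALIF_MAQSURA): "\u0101",  # long a written with ʾalif maqṣūra
    (KASRA, YA): "\u012B",            # long i
    (DAMMA, WAW): "\u016B",           # long u
}
TANWIN = {"\u064B": "an", "\u064D": "in", "\u064C": "un"}

def read_dal_syllable(s):
    """Transliterate dāl + vowel mark (+ lengthening letter) as in the tables."""
    if not s.startswith(DAL) or len(s) not in (2, 3):
        raise ValueError("expected dāl, one mark, and at most one letter")
    mark = s[1]
    if len(s) == 3:
        return "d" + LONG[(mark, s[2])]
    if mark in TANWIN:
        return "d" + TANWIN[mark]
    return "d" + SHORT[mark]

for syllable in (DAL + FATHA, DAL + KASRA + YA, DAL + DAMMA + WAW):
    print(read_dal_syllable(syllable))
```

Keeping short vowels, long vowels and tanwīn in separate tables mirrors the layout of the article and makes it easy to see that a long vowel is always "short mark plus letter".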
Syllabification signs and others
Shadda
The shadda (ّ) marks the gemination of a consonant; a kasra (see above), when present, moves to between the shadda and the geminated consonant.
Sukūn
An Arabic syllable can be open (ended by a vowel) or closed (ended by a consonant).
* open: C[onsonant]V[owel];
* closed: CVC(C).
When the syllable is closed, we can indicate that the consonant that closes it carries no vowel by marking it with a sign called sukūn, which takes the form "°". This removes any ambiguity, especially when the text is not vocalised. Remember that a standard text is composed only of series of consonants; thus the word qalb, "heart", is written qlb. Sukūn tells us where not to place a vowel: qlb could, in effect, be read /qVlVbV/, but written with a sukūn over the l and the b it can only be interpreted as /qVlb/ (as for which vowel to use, the word has to be memorised). We write this قلْبْ.
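The ambiguity of the bare consonant skeleton can be shown by stripping the marks from two different vocalisations. The following Python sketch assumes that "diacritics" means the code points U+064B to U+0652 (tanwīn, short vowels, shadda and sukūn); the names `HARAKAT` and `strip_harakat` are chosen for this example.

```python
# U+064B..U+0652: tanwīn marks, fatḥa, ḍamma, kasra, shadda, sukūn.
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}

def strip_harakat(text):
    """Remove vowel marks, shadda and sukūn, leaving the consonant skeleton."""
    return "".join(ch for ch in text if ch not in HARAKAT)

qalb = "\u0642\u064E\u0644\u0652\u0628"          # q + a, l + sukūn, b
qalaba = "\u0642\u064E\u0644\u064E\u0628\u064E"  # q + a, l + a, b + a

print(strip_harakat(qalb) == strip_harakat(qalaba))  # True
```

Both strings reduce to the same three letters, which is why an unvocalised qlb leaves the reader to supply the vowels and why a sukūn over the l rules out the second reading.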
You might think that in a vocalised text sukūn is not necessary, because the lack of a vowel after a consonant could be signalled by simply not writing any mark above it, so that the sukūn marks in قَلْبْ would be redundant. That is not so, because no such convention ("lack of any vowel mark means lack of vowel sound") exists: k + u + t + b may indeed be read "kutib". Such a rule would make sense only if everybody who writes one vowel mark were obliged to write all the vowel marks in the same word, and that is not the case. In fact, you may write as many or as few of the vowel marks as you like.
In the Quran, however, all vowel marks must be written: there, a sukūn over a letter (other than the ʾalif indicating long "a") indicates that it is pronounced but not followed by a short vowel, while the lack of any sign over a letter (other than ʾalif) indicates that the consonant is not pronounced.
Outside the Quran, putting a sukūn above a yāʾ that indicates long "ee", or above a wāw that stands for long "oo", is extremely rare, to the point that a yāʾ with sukūn will be unambiguously read as the diphthong "ai" (as in English "eye") and a wāw with sukūn will be read "au" (as in English "cow").
So the word zauǧ, "husband", can be written simply zwǧ: زوج (which might also be read "zooj" if such a word existed); or with sukūn, زوْجْ, which is unambiguously "zowj"; or with sukūn and vowels: زَوْج.
The letters mwsyqā (موسيقى, with an ʾalif maqṣūra at the end of the word) will be read most naturally as the word "mooseeqaa" ("music"). If you were to write sukūns above the wāw, yāʾ and ʾalif, you would get موْسيْقىْ, which looks like "mowsayqay". (Note that an ʾalif maqṣūra is an ʾalif and never takes sukūn, so when you put a sukūn above it, it looks like a yāʾ deprived of its two dots below.)
You cannot place a sukūn on the final letter j of "zawj" even if you do not pronounce a vowel there, because fully vocalised texts are always written as if the iʿrāb (case-ending) vowels were in fact pronounced, and this word can never have a sukūn as its iʿrāb. Take the sentence "ahmad zawj sharr", meaning "Ahmed is a bad husband". The theoretical pronunciation with the iʿrāb vowels is "ahmadu zawjun sharr". Although most people say "ahmad zawj sharr", you cannot write the mark for sukūn over that j; you either leave it unmarked or use the mark for "un". By the same token, you can leave the final r of this sentence either completely unmarked or topped with a shadda plus "un", but a sukūn never belongs there, even though the only correct pronunciation of "sharrun" at the end of an utterance is "shar".
Arabic numerals
There are two kinds of numerals used in Arabic writing: standard Arabic numerals, and "East Arabic" numerals, used in Arabic-script writing in Iran,
Pakistan and
India. In Arabic, these numbers are referred to as "Indian numbers" (أرقام هندية). In most of present-day North Africa, the usual Western numerals are used; in medieval times, a slightly different set (from which, via Italy, Western "Arabic numerals" derive) was used.
| Standard numerals |
| ٠ | 0 |
| ١ | 1 |
| ٢ | 2 |
| ٣ | 3 |
| ٤ | 4 |
| ٥ | 5 |
| ٦ | 6 |
| ٧ | 7 |
| ٨ | 8 |
| ٩ | 9 |

| East Arabic numerals |
| ۰ | 0 |
| ۱ | 1 |
| ۲ | 2 |
| ۳ | 3 |
| ۴ | 4 |
| ۵ | 5 |
| ۶ | 6 |
| ۷ | 7 |
| ۸ | 8 |
| ۹ | 9 |
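Both digit sets in the tables above map one-to-one onto the Western digits, so conversion is a simple table lookup. The Python sketch below assumes the standard set is encoded at U+0660 to U+0669 and the East Arabic set at U+06F0 to U+06F9; the function name `to_western_digits` is invented for this example.

```python
# Map both digit sets onto "0".."9".
DIGIT_TABLE = {}
for value in range(10):
    DIGIT_TABLE[0x0660 + value] = str(value)  # standard numerals
    DIGIT_TABLE[0x06F0 + value] = str(value)  # East Arabic numerals

def to_western_digits(text):
    """Replace Arabic-script digits with Western digits, leaving the rest."""
    return text.translate(DIGIT_TABLE)

print(to_western_digits("\u0661\u0669\u0662\u0668"))  # 1928
print(int("\u06F4\u06F2"))  # 42
```

Python's built-in `int` already accepts these digits directly, so the explicit table is mainly useful when the text must be rewritten rather than parsed.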
Arabic alphabets of other languages
Arabic script is not used solely for writing Arabic, but for a variety of languages. In each language that uses it, the script has been modified to fit the language's sound system. There are
phonemes not found in Arabic but found in, for instance, Persian, Malay and Urdu, especially since those three languages are not related to Arabic. For example, Arabic lacks a "P" sound and therefore a letter for it, so many languages add their own "P" to the script, though the symbol used may differ between languages. These modifications tend to fall into groups: the Indian and Turkic languages written in Arabic script tend to use the Persian modified letters, whereas West African languages tend to imitate those of Ajami, and Indonesian ones those of
Jawi.
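A rough way to spot such added letters in a text is to look for Arabic-block letters that are neither among the 28 basic letters nor among the hamza forms, tāʾ marbūṭa and ʾalif maqṣūra described earlier. The Python sketch below does this; the set names and `added_letters` are invented for illustration, and the sample word (Persian "pedar", "father", which begins with the added letter for "P") is an example chosen here, not taken from the article.

```python
import unicodedata

# The 28 basic letters, ʾalif through yāʾ.
BASIC_28 = set(
    "\u0627\u0628\u062A\u062B\u062C\u062D\u062E\u062F\u0630\u0631"
    "\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063A\u0641"
    "\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u064A"
)
# hamza and its supported forms, ʾalif madda, tāʾ marbūṭa, ʾalif maqṣūra.
OTHER_ARABIC = set("\u0621\u0622\u0623\u0624\u0625\u0626\u0629\u0649")

def added_letters(text):
    """Return Arabic-block letters in text that Arabic itself does not use."""
    return sorted({
        ch for ch in text
        if 0x0600 <= ord(ch) <= 0x06FF
        and unicodedata.category(ch) == "Lo"
        and ch not in BASIC_28
        and ch not in OTHER_ARABIC
    })

pedar = "\u067E\u062F\u0631"  # Persian: peh, dāl, rāʾ
for ch in added_letters(pedar):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Because each language chose its own additions, the result only says that a letter is not part of the Arabic inventory, not which language the text is in.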
The Arabic alphabet is currently used for:
* Dari, Pashto, and Uzbek in Afghanistan
* Persian (Farsi) and Azeri in Iran (though Azeri is written in Latin and Cyrillic scripts in Azerbaijan)
* Malay, known as Jawi, in Brunei and formerly in Malaysia and Indonesia
* Urdu, Kashmiri, Sindhi, and Baluchi in Pakistan
* Punjabi in Pakistan, where it is known as Shahmukhi
* Kurdish and Turkmen in Northern Iraq, while in Turkey Roman script is used for Kurdish
* Uyghur and Kazakh in northwest China (Xinjiang)
* Wolof (at zaouias), known as Wolofal
* Hausa for many purposes, especially religious (known as Ajami)
* Comorian in the Comoros, side by side with the Latin alphabet (neither is official)
In the past, it has also been used to represent other languages:
* Fulani, known as Ajami
* Sanskrit has also been written in Arabic script, though it is better known as using the Devanagari script, the same script used for writing the Hindi language
* Somali
* Swahili
* Turkish in the Ottoman Empire was written in Arabic script until Atatürk declared the change to Roman script in 1928. This form of Turkish is now known as Ottoman Turkish and is held by many to be a different language, due to its much higher percentage of Persian and Arabic loanwords.
* Turkmen in Turkmenistan
* Chaghatay across Central Asia
* Songhay in West Africa, particularly in Timbuktu
* Berber in North Africa, particularly Tachelhit in Morocco
* Nubian
* Afrikaans (among the "Cape Malays")
* Bosnian
* Belarusian (among ethnic Tatars)
* Spanish, when the Moors ruled Spain
See also:
* Jawi
* Arabic numerals
* Unicode characters for the Arabic alphabet
* Arabic alphabet/from the French Wikipedia
* Arabic calligraphy, considered an art form in its own right.
External links
* Arab writing and calligraphy
* Article about Arabic alphabet
* Arabic alphabet and calligraphy
* aralpha (freeware) to learn the characters
This article contains major sections of text from the very detailed article Arabic alphabet/from the French Wikipedia, which has been partially translated into English. Further translation of that page, and its incorporation into the text here, are welcomed.