Arabic is one of the major languages that has received attention from machine translation (MT) researchers since the earliest days of MT, particularly in the U.S. The language has long been considered, "due to its morphological, syntactic, phonetic and phonological properties[,] one of the most difficult languages for written and spoken language processing."[1] Arabic "differs tremendously in terms of its characters, morphology and diacritization from other languages."[1] Researchers therefore cannot always import solutions developed for other languages, and Arabic machine translation still requires further work, especially on semantic representation systems, which are essential for achieving high-quality translation.
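One concrete source of the difficulty mentioned above is diacritization: short vowels are written as optional combining marks, and most Arabic text omits them. A minimal Python sketch (not from the source; the example words are illustrative) shows how distinct diacritized forms collapse to the same undiacritized string, leaving an MT system to disambiguate from context:

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Remove Arabic diacritics (harakat), which are Unicode combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

# kataba ("he wrote") and kutiba ("it was written") differ only in
# their diacritics, so both collapse to the same bare consonant skeleton.
active = strip_diacritics("كَتَبَ")   # kataba
passive = strip_diacritics("كُتِبَ")  # kutiba
print(active == passive)  # True: the surface forms are now identical
```

Because the undiacritized form is ambiguous between (at least) an active and a passive reading, a translation system cannot rely on the orthography alone and must recover the intended analysis from surrounding words.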
In 2022, Abu Dhabi's Technology Innovation Institute (TII) unveiled 'Noor,' the world's largest natural language processing model for Arabic. Prior to this, the largest Arabic model was AraGPT, with 1.5 billion parameters; Noor has 10 billion parameters.
Particularistic approaches describe the linguistic features of Arabic and build local processing around the internal linguistic system of the language, focusing on its morphological and semantic aspects. Sakhr is one of the Arabic-speaking groups systematically developing machine processing of Arabic.[2]
Universalist approaches reuse methods and systems that have proven useful for other languages such as English or French, adapting them where necessary. The focus here is on the syntactic aspects of the linguistic system in general. Most companies producing software applications for Arabic follow this approach.
Original source: https://en.wikipedia.org/wiki/Arabic_machine_translation