Project Description

Natural language processing is an active research area in computer science. With the
abundance of electronic data available today, it is important to be able to analyze
and interpret this data in real time. Although English has been studied extensively,
the same cannot be said of many other languages, including Arabic.

The Arabic language has received considerable news coverage in recent years, and tools
for analyzing and translating Arabic text are in high demand. Unfortunately, very few
such tools are currently available.

Part-of-speech (POS) tagging is the process of assigning grammatical parts-of-speech to words
in running text. For example, in the sentence “the cat sat on the mat”, ‘the’ is a definite
article, ‘cat’ is a singular noun, ‘sat’ is a past tense verb, ‘on’ is a preposition, ‘the’ is a
definite article and ‘mat’ is a singular noun. The parts-of-speech are determined from the
context of the word within the sentence. It is usually possible to determine the POS tag of
a word from the POS tags of the two or three surrounding words.
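
The example above can be written out as data. The following is a minimal sketch in Python; the tag names (DET, NOUN, VERB, PREP) and the helper context_tags are illustrative assumptions, not a standard tagset or part of the proposed tagger. It shows the tagged sentence and how the tags of the surrounding words can be collected as context for a given position.

```python
# Hand-assigned tags for the example sentence.
# The tag labels are illustrative assumptions, not a standard tagset.
tagged = [
    ("the", "DET"),
    ("cat", "NOUN"),
    ("sat", "VERB"),
    ("on", "PREP"),
    ("the", "DET"),
    ("mat", "NOUN"),
]


def context_tags(tagged_sentence, index, width=2):
    """Return the tags of up to `width` words on each side of position `index`."""
    tags = [tag for _, tag in tagged_sentence]
    left = tags[max(0, index - width):index]
    right = tags[index + 1:index + 1 + width]
    return left, right


# Context available when deciding the tag of 'sat' (position 2).
left, right = context_tags(tagged, 2)
print("left context:", left)
print("right context:", right)
```

A tagger that relies on context would use windows like these as input features, with shorter windows at the start and end of a sentence.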

Annotating raw text with grammatical tags has many uses in various areas of natural
language processing such as information retrieval, speech recognition and machine
translation. For example, it is important for machine translation to know that “chair” is a
verb and not a noun in the sentence “I chair the PUCC meeting once a week”.

Many techniques have been used to develop automatic POS taggers for different
languages. These techniques include rule-based taggers that use hand-written language
rules and statistical taggers that rely on statistics gathered from a large amount of
training data. In recent years, taggers have also been developed using machine learning
techniques, including neural networks.
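
To make the difference between rule-based and statistical tagging concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the three training sentences, the tag labels, the single hand-written rule, and the function names. It is not the method proposed for this project, and a real statistical tagger would need a large annotated corpus.

```python
from collections import Counter, defaultdict

# Tiny hand-made training set (hypothetical, for illustration only).
training = [
    [("I", "PRON"), ("chair", "VERB"), ("the", "DET"), ("meeting", "NOUN")],
    [("the", "DET"), ("chair", "NOUN"), ("is", "VERB"), ("old", "ADJ")],
    [("the", "DET"), ("chair", "NOUN"), ("broke", "VERB")],
]


def train_most_frequent(sentences):
    """Statistical part: remember the most frequent tag seen for each word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def rule_based_tag(previous_tag):
    """Rule-based part: one toy rule, a word right after a pronoun is a verb."""
    if previous_tag == "PRON":
        return "VERB"
    return None


def tag(words, model):
    """Apply the rule first, then fall back to word statistics, then to NOUN."""
    tags = []
    for word in words:
        previous = tags[-1] if tags else None
        guess = rule_based_tag(previous) or model.get(word.lower(), "NOUN")
        tags.append(guess)
    return tags


model = train_most_frequent(training)
print(tag(["I", "chair", "the", "meeting"], model))
print(tag(["the", "chair", "broke"], model))
```

In this toy data "chair" is most often a noun, so the statistics alone would mislabel it in "I chair the meeting"; the context rule corrects that case. Learned taggers aim to pick up such context patterns from data rather than from hand-written rules.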

For this project, we will develop an Arabic POS tagger using neural networks.