Project Description
Natural language processing is an active research area in computer science. With the abundance of electronic data available today, it is important to be able to analyze and interpret this data in real time. Although much research has been devoted to analyzing English, the same cannot be said of other languages, including Arabic, the focus of this project.
The Arabic language has received a lot of news coverage in recent years, and it is clear that tools for analyzing and translating Arabic texts are in high demand. Unfortunately, very few such tools are currently available.
Part-of-speech (POS) tagging is the process of assigning grammatical parts of speech to words in running text. For example, in the sentence “the cat sat on the mat”, ‘the’ is a definite article, ‘cat’ is a singular noun, ‘sat’ is a past-tense verb, ‘on’ is a preposition, ‘the’ is a definite article and ‘mat’ is a singular noun. The part of speech of a word is determined from its context within the sentence. It is usually possible to determine the POS tag of a word from the POS tags of the two or three surrounding words.
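To make the idea of context concrete, the sketch below pairs each word of the example sentence with a tag and collects the tags of its neighbours within a fixed window. The tag labels (DET, NOUN, VERB, PREP), the window size of two, and the helper name context_window are illustrative assumptions, not part of the project design.

```python
# Example sentence with hypothetical tag labels (the tag set is an assumption).
tagged = [
    ("the", "DET"), ("cat", "NOUN"), ("sat", "VERB"),
    ("on", "PREP"), ("the", "DET"), ("mat", "NOUN"),
]


def context_window(tagged_sentence, index, size=2):
    """Return the tags of up to `size` words on each side of position `index`."""
    tags = [tag for _, tag in tagged_sentence]
    left = tags[max(0, index - size):index]
    right = tags[index + 1:index + 1 + size]
    return left, right


# Show each word, its tag, and the neighbouring tags a tagger could use as context.
for i, (word, tag) in enumerate(tagged):
    left, right = context_window(tagged, i)
    print(f"{word:>4} {tag:<5} left={left} right={right}")
```

A tagger that relies on context would treat the left and right tag lists as evidence when choosing the tag for the word in the middle.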
Annotating raw text with grammatical tags has many uses in various areas of natural language processing, such as information retrieval, speech recognition and machine translation. For example, it is important for machine translation to know that “chair” is a verb and not a noun in the sentence “I chair the PUCC meeting once a week”.
Many techniques have been used to develop automatic POS taggers for different languages. These include rule-based taggers, which apply linguistic rules, and statistical taggers, which rely on statistics gathered from a large amount of training data. In recent years, taggers have also been developed using artificial intelligence techniques such as machine learning and neural networks.
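As a rough illustration of the first two families, the sketch below combines a very simple statistical tagger (it picks the tag seen most often for each word in the training data) with a rule-based fallback for unknown words. The tiny English training set, the suffix rules, the tag labels and all function names are assumptions made for this example; it is not the tagger proposed in this project.

```python
from collections import Counter, defaultdict

# Tiny hand-made training set; a real statistical tagger needs a large tagged corpus.
training = [
    [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB")],
    [("the", "DET"), ("chair", "NOUN"), ("broke", "VERB")],
    [("I", "PRON"), ("chair", "VERB"), ("the", "DET"), ("meeting", "NOUN")],
    [("the", "DET"), ("chair", "NOUN"), ("fell", "VERB")],
]


def train_unigram(sentences):
    """Statistical part: map each word to the tag it carries most often."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def rule_based_tag(word):
    """Rule-based part: illustrative English suffix rules for unseen words."""
    if word.endswith("ing"):
        return "VERB"
    if word.endswith("ly"):
        return "ADV"
    return "NOUN"


def tag(words, model):
    """Use the learned tag when the word is known, otherwise fall back to the rules."""
    return [(w, model.get(w.lower()) or rule_based_tag(w)) for w in words]


model = train_unigram(training)
print(tag(["I", "chair", "the", "meeting"], model))
print(tag(["the", "dog", "ran", "quickly"], model))
```

Because this baseline ignores context, it labels “chair” as a noun even in “I chair the meeting”, where it is a verb. Errors of this kind are one reason to prefer models that take the surrounding words into account.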
For this project, we will develop an Arabic POS tagger using neural networks.