Arabic Text Classification Using Support Vector Machines

Tarek F. Gharib, Mena B. Habib, Zaki T. Fayed

Research output: Contribution to journal, Article, Academic, peer-review

Abstract

Text classification (TC) is the process of classifying documents into a predefined set of categories based on their content. Arabic is a highly inflectional and derivational language, which makes text mining a complex task. In this paper we apply the Support Vector Machines (SVM) model to the classification of Arabic text documents. The results are compared with those of other traditional classifiers: the Bayes classifier, the K-Nearest Neighbor classifier, and the Rocchio classifier. Two experiments are used to test the different classifiers. The first uses the training set as the test set, and the second uses the leave-one-out testing method. Experimental results on a set of 1132 documents show that the Rocchio classifier gives better results when the feature set is small, while SVM outperforms the other classifiers when the feature set is large enough. The classification rate exceeds 90% when more than 4000 features are used. The leave-one-out method gives more realistic results than using the training set as the test set.
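
The contrast between the two testing methods can be illustrated with a minimal sketch. It is not the authors' setup: the toy English corpus, the labels, the function names, and the choice of a 1-nearest-neighbor classifier with cosine similarity over raw term counts are all assumptions made for illustration. With a 1-nearest-neighbor rule, every document is its own nearest neighbor when the training set doubles as the test set, so that score is optimistic by construction, while leave-one-out withholds each document before classifying it.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy corpus (not from the paper): (text, label) pairs.
DOCS = [
    ("team wins match goal", "sports"),
    ("player scores goal match", "sports"),
    ("team player coach training", "sports"),
    ("bank market price stock", "economy"),
    ("market stock trade price", "economy"),
    ("bank loan interest rate", "economy"),
    ("team market price transfer", "sports"),  # shares more terms with "economy"
]


def vectorize(text):
    """Bag-of-words term counts for a whitespace-tokenized document."""
    return Counter(text.split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def predict_1nn(query, training):
    """Return the label of the training vector most similar to the query."""
    best = max(training, key=lambda item: cosine(query, item[0]))
    return best[1]


def resubstitution_accuracy(data):
    """Test on the training set itself: each document can match itself."""
    hits = sum(predict_1nn(vec, data) == label for vec, label in data)
    return hits / len(data)


def leave_one_out_accuracy(data):
    """Hold out each document in turn and classify it from all the others."""
    hits = 0
    for i, (vec, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += predict_1nn(vec, rest) == label
    return hits / len(data)


data = [(vectorize(text), label) for text, label in DOCS]
print("training set as test set:", resubstitution_accuracy(data))
print("leave-one-out:", leave_one_out_accuracy(data))
```

On this toy corpus the last document is misclassified only under leave-one-out, which is the kind of error that testing on the training set hides.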
Original language: English
Pages (from-to): 192-199
Number of pages: 8
Journal: International Journal of Computers and Their Applications
Volume: 16
Issue number: 4
Publication status: Published - 1 Dec 2009
Externally published: Yes

Keywords

  • Text mining
  • Text categorization
  • Arabic language
  • Support Vector Machines
