Text classification (TC) is the process of assigning documents to a predefined set of categories based on their content. Arabic is a highly inflectional and derivational language, which makes text mining a complex task. In this paper, we apply the Support Vector Machine (SVM) model to the classification of Arabic text documents and compare the results with those of other traditional classifiers: the Bayes classifier, the K-Nearest Neighbor classifier, and the Rocchio classifier. Two experiments were used to test the classifiers: the first uses the training set as the test set, and the second uses the leave-one-out testing method. Experimental results on a set of 1,132 documents show that the Rocchio classifier gives better results when the feature set is small, whereas SVM outperforms the other classifiers when the feature set is large enough. The classification rate exceeds 90% when more than 4,000 features are used. The leave-one-out method gives more realistic results than using the training set as the test set.
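The abstract contrasts two evaluation protocols: testing on the training set (resubstitution) and the "Leave one" method, read here as leave-one-out cross-validation. The sketch below illustrates why the second gives more realistic estimates, using a Rocchio (nearest-centroid, cosine similarity) classifier because it is the simplest of the four compared classifiers. It is not the paper's SVM setup. The corpus, the English placeholder tokens standing in for (stemmed) Arabic terms, the whitespace tokenizer, and the raw term-frequency weighting are all hypothetical assumptions chosen for illustration. The single-document "weather" class is deliberately contrived: under resubstitution that document is classified against a centroid built from itself, while leave-one-out must classify it with no training example of its class.

```python
import math
from collections import Counter

# Hypothetical toy corpus: placeholder tokens, not data from the paper.
CORPUS = [
    ("goal match team score", "sports"),
    ("team player coach match", "sports"),
    ("score goal player league", "sports"),
    ("market stock price trade", "economy"),
    ("price bank trade profit", "economy"),
    ("stock bank market profit", "economy"),
    ("weather rain cloud", "weather"),  # contrived single-document class
]


def tokenize(text):
    # Whitespace tokenization only; real Arabic TC would also need
    # normalization and stemming or root extraction (not shown).
    return text.lower().split()


def to_vector(tokens):
    # Raw term-frequency vector, L2-normalized.
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def train_rocchio(vecs, labels):
    # One centroid per class: the mean of that class's document vectors.
    sums, sizes = {}, Counter(labels)
    for vec, y in zip(vecs, labels):
        total = sums.setdefault(y, Counter())
        for t, w in vec.items():
            total[t] += w
    return {y: {t: w / sizes[y] for t, w in total.items()} for y, total in sums.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def predict(centroids, vec):
    # Most similar centroid wins; ties go to the alphabetically first label.
    return max(sorted(centroids), key=lambda y: cosine(vec, centroids[y]))


def resubstitution_accuracy(vecs, labels):
    # "Training set as test set": train and test on the same documents.
    centroids = train_rocchio(vecs, labels)
    hits = sum(predict(centroids, v) == y for v, y in zip(vecs, labels))
    return hits / len(vecs)


def leave_one_out_accuracy(vecs, labels):
    # Hold out each document in turn and train on the remaining n - 1.
    hits = 0
    for i in range(len(vecs)):
        centroids = train_rocchio(vecs[:i] + vecs[i + 1:], labels[:i] + labels[i + 1:])
        hits += predict(centroids, vecs[i]) == labels[i]
    return hits / len(vecs)


if __name__ == "__main__":
    vecs = [to_vector(tokenize(text)) for text, _ in CORPUS]
    labels = [label for _, label in CORPUS]
    print(f"resubstitution: {resubstitution_accuracy(vecs, labels):.3f}")  # 1.000
    print(f"leave-one-out:  {leave_one_out_accuracy(vecs, labels):.3f}")   # 0.857
```

Resubstitution scores every document correctly here, but leave-one-out misclassifies the "weather" document because no other example of its class remains in training, so the estimate drops to 6/7. Leave-one-out costs one training run per document, which is affordable for a corpus of about a thousand documents.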
|Field|Value|
|---|---|
|Number of pages|8|
|Journal|International Journal of Computers and Their Applications|
|Publication status|Published - 1 Dec 2009|
- Text mining
- Text categorization
- Arabic language
- Support Vector Machines