Arabic Text Classification Using Support Vector Machines

Tarek F. Gharib; Mena B. Habib; Zaki T. Fayed

doi:10.7603/s40601-016-0016-9

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Text classification (TC) is the process of classifying documents into a predefined set of categories based on their content. Arabic is a highly inflectional and derivational language, which makes text mining a complex task. In this paper, we apply the Support Vector Machines (SVM) model to classify Arabic text documents. The results are compared with those of other traditional classifiers: the Bayes classifier, the K-Nearest Neighbor classifier, and the Rocchio classifier. Two experiments are used to test the different classifiers. The first uses the training set as the test set, and the second uses the leave-one-out testing method. Experimental results on a set of 1132 documents show that the Rocchio classifier gives better results when the feature set is small, while SVM outperforms the other classifiers when the feature set is large enough. The classification rate exceeds 90% when more than 4000 features are used. The leave-one-out method gives more realistic results than using the training set as the test set.
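The two evaluation protocols named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the classifier is a simplified Rocchio-style nearest-centroid model with cosine similarity, the five-document corpus and its English tokens are invented placeholders, and all function names (`train_rocchio`, `predict`, `resubstitution_accuracy`, `leave_one_out_accuracy`) are hypothetical. It only shows why scoring on the training set tends to look better than leave-one-out testing.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (an assumption for illustration only, not the paper's data).
DOCS = [
    (["goal", "match", "team"], "sport"),
    (["team", "player", "goal"], "sport"),
    (["market", "bank", "price"], "economy"),
    (["bank", "price", "stock"], "economy"),
    (["player", "price", "transfer"], "sport"),  # shares "price" with economy
]


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_rocchio(docs):
    """Build one prototype per class by summing term counts.

    With cosine similarity, the summed vector points in the same
    direction as the class mean, so no division is needed.
    """
    centroids = defaultdict(Counter)
    for tokens, label in docs:
        centroids[label].update(tokens)
    return centroids


def predict(centroids, tokens):
    """Return the class whose prototype is most similar to the document."""
    vec = Counter(tokens)
    return max(sorted(centroids), key=lambda c: cosine(vec, centroids[c]))


def resubstitution_accuracy(docs):
    """First protocol: train on all documents, test on the same documents."""
    centroids = train_rocchio(docs)
    hits = sum(predict(centroids, toks) == label for toks, label in docs)
    return hits / len(docs)


def leave_one_out_accuracy(docs):
    """Second protocol: hold out each document once and train on the rest."""
    hits = 0
    for i, (toks, label) in enumerate(docs):
        centroids = train_rocchio(docs[:i] + docs[i + 1:])
        hits += predict(centroids, toks) == label
    return hits / len(docs)


if __name__ == "__main__":
    print("training set as test set:", resubstitution_accuracy(DOCS))
    print("leave-one-out:", leave_one_out_accuracy(DOCS))
```

On this toy corpus the first protocol scores 1.0 and leave-one-out scores 0.8: the ambiguous fifth document is classified correctly only when it has already contributed to its own class prototype. That gap is the optimistic bias the abstract refers to.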

Original language	English
Pages (from-to)	192-199
Number of pages	8
Journal	International Journal of Computers and Their Applications
Volume	16
Issue number	4
DOIs	https://doi.org/10.7603/s40601-016-0016-9
Publication status	Published - 1 Dec 2009
Externally published	Yes

Keywords

Text mining
text categorization
Arabic language
Support Vector Machines

Access to Document

10.7603/s40601-016-0016-9
Licence: CC BY

Cite this

@article{88e52388a25140eab08b3116e290a204,
title = "Arabic Text Classification Using Support Vector Machines",
abstract = "Text classification (TC) is the process of classifying documents into a predefined set of categories based on their content. Arabic is a highly inflectional and derivational language, which makes text mining a complex task. In this paper, we apply the Support Vector Machines (SVM) model to classify Arabic text documents. The results are compared with those of other traditional classifiers: the Bayes classifier, the K-Nearest Neighbor classifier, and the Rocchio classifier. Two experiments are used to test the different classifiers. The first uses the training set as the test set, and the second uses the leave-one-out testing method. Experimental results on a set of 1132 documents show that the Rocchio classifier gives better results when the feature set is small, while SVM outperforms the other classifiers when the feature set is large enough. The classification rate exceeds 90% when more than 4000 features are used. The leave-one-out method gives more realistic results than using the training set as the test set.",
keywords = "Text mining, text categorization, Arabic language, Support Vector Machines",
author = "Gharib, {Tarek F.} and Habib, {Mena B.} and Fayed, {Zaki T.}",
note = "http://eprints.eemcs.utwente.nl/19331/",
year = "2009",
month = dec,
day = "1",
doi = "10.7603/s40601-016-0016-9",
language = "English",
volume = "16",
pages = "192--199",
journal = "International Journal of Computers and Their Applications",
issn = "1076-5204",
publisher = "International Society for Computers and Their Applications",
number = "4",
}

TY - JOUR
T1 - Arabic Text Classification Using Support Vector Machines
AU - Gharib, Tarek F.
AU - Habib, Mena B.
AU - Fayed, Zaki T.
N1 - http://eprints.eemcs.utwente.nl/19331/
PY - 2009/12/1
Y1 - 2009/12/1
AB - Text classification (TC) is the process of classifying documents into a predefined set of categories based on their content. Arabic is a highly inflectional and derivational language, which makes text mining a complex task. In this paper, we apply the Support Vector Machines (SVM) model to classify Arabic text documents. The results are compared with those of other traditional classifiers: the Bayes classifier, the K-Nearest Neighbor classifier, and the Rocchio classifier. Two experiments are used to test the different classifiers. The first uses the training set as the test set, and the second uses the leave-one-out testing method. Experimental results on a set of 1132 documents show that the Rocchio classifier gives better results when the feature set is small, while SVM outperforms the other classifiers when the feature set is large enough. The classification rate exceeds 90% when more than 4000 features are used. The leave-one-out method gives more realistic results than using the training set as the test set.
KW - Text mining
KW - text categorization
KW - Arabic language
KW - Support Vector Machines
U2 - 10.7603/s40601-016-0016-9
DO - 10.7603/s40601-016-0016-9
M3 - Article
SN - 1076-5204
VL - 16
SP - 192
EP - 199
JO - International Journal of Computers and Their Applications
JF - International Journal of Computers and Their Applications
IS - 4
ER -