# A Novel Active Learning Method Using SVM for Text Classification

Author Biography:
Mouloud Koudil received the Ph.D. degree in computer science from l'Ecole nationale Supérieure d'Informatique (ESI), Algeria, in 2002. He is currently a full-time professor and rector of the same institution.
His research interests include wireless sensor networks, networks on chips, and hardware/software codesign.
E-mail: m_koudil@esi.dz

Mouldi Bedda received the Ph.D. degree in electrical engineering from the University Nancy 2, France, in 1985. From 1985 to 2006, he worked with the University Badji Mokhtar Annaba, Algeria. He was the director of the Automatic and Signals Laboratory from 2001 to 2006. Since 2006, he has been a full professor at the College of Engineering of Al Jouf University, KSA. He has supervised several Ph.D. students in speech processing, biomedical signals, handwriting recognition and image processing.
His research interests include speech processing, biomedical signals, handwriting recognition and image processing.
E-mail: mouldi_bedda@yahoo.fr

Noureddine Ghoggali received the State Engineer degree in electronics from the University of Batna, Algeria, in 2000, and the Ph.D. degree in information and communication technologies from the Department of Information Engineering and Computer Science, University of Trento, Italy. He is currently an assistant professor at the University of Batna, Algeria.
His research interests include pattern recognition and evolutionary computation methodologies for remote sensing image analysis.
E-mail: ghoggalinour@gmail.com

Corresponding author: Mohamed Goudjil received the M.Sc. degree in computer engineering from Boumerdes University, Algeria, in 2008. He is currently a Ph.D. candidate in computer engineering at Ecole nationale Supérieure d'Informatique (ESI), Algeria. From 2005 to 2008, he was a researcher at Advanced Technologies & Research Centre, and he lectured for seven years at different universities.
His research interests include text classification, Arabic language processing and machine learning.
E-mail: m_goudjil@esi.dz (Corresponding author)
ORCID iD: 0000-0003-1712-7617
• Accepted: 2015-06-03
• Published Online: 2018-07-25

Abstract: Support vector machines (SVMs) are a popular class of supervised learning algorithms, and are particularly applicable to large and high-dimensional classification problems. Like most machine learning methods for data classification and information retrieval, they require manually labeled data samples in the training stage. However, manual labeling is a time-consuming and error-prone task. One possible solution to this issue is to exploit the large number of unlabeled samples that are easily accessible via the internet. This paper presents a novel active learning method for text categorization. The main objective of active learning is to reduce the labeling effort, without compromising classification accuracy, by intelligently selecting which samples should be labeled. The proposed method selects a batch of informative samples using the posterior probabilities provided by a set of multi-class SVM classifiers, and these samples are then manually labeled by an expert. Experimental results indicate that the proposed active learning method significantly reduces the labeling effort, while simultaneously enhancing the classification accuracy.
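
As an illustration only, the batch selection step mentioned in the abstract might be sketched as follows. The abstract does not state the selection criterion, so this sketch assumes a smallest-margin rule (the gap between the two highest class posteriors) as a stand-in; it is not necessarily the criterion used in the paper. The function name `select_batch`, the variable `pool_posteriors`, and the probability values are hypothetical, and the posteriors are assumed to have been computed beforehand by the SVM classifiers.

```python
from typing import List, Sequence


def select_batch(posteriors: Sequence[Sequence[float]], batch_size: int) -> List[int]:
    """Return indices of the `batch_size` most ambiguous unlabeled samples.

    Assumption (not from the paper): ambiguity is measured as the gap
    between the two highest class posteriors; a smaller gap means the
    classifier is less certain. Each row must hold at least two classes.
    """
    def margin(probs: Sequence[float]) -> float:
        top_two = sorted(probs, reverse=True)[:2]
        return top_two[0] - top_two[1]

    # Rank pool indices from smallest margin (most ambiguous) to largest.
    ranked = sorted(range(len(posteriors)), key=lambda i: margin(posteriors[i]))
    return ranked[:batch_size]


# Hypothetical posteriors for four unlabeled documents over three classes.
pool_posteriors = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # ambiguous
    [0.34, 0.33, 0.33],  # most ambiguous
    [0.70, 0.20, 0.10],  # fairly confident
]

# Indices of the documents that would be sent to the expert for labeling.
to_label = select_batch(pool_posteriors, batch_size=2)
```

In a full active learning loop, the expert-labeled samples would be added to the training set, the classifiers retrained, and the selection repeated until the labeling budget is exhausted.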

