Volume 15 Number 3
June 2018
Mohamed Goudjil, Mouloud Koudil, Mouldi Bedda and Noureddine Ghoggali. A Novel Active Learning Method Using SVM for Text Classification. International Journal of Automation and Computing, vol. 15, no. 3, pp. 290-298, 2018. doi: 10.1007/s11633-015-0912-z

# A Novel Active Learning Method Using SVM for Text Classification

Author Biography:
• Mouloud Koudil received the Ph.D. degree in computer science from l'Ecole nationale Supérieure d'Informatique (ESI), Algeria in 2002. He is currently a full-time professor and rector of the same institution.
His research interests include wireless sensor networks, networks on chips, and hardware/software codesign.
E-mail: m_koudil@esi.dz

Mouldi Bedda received the Ph.D. degree in electrical engineering from the University Nancy 2, France in 1985. From 1985 to 2006, he worked with the University Badji Mokhtar Annaba, Algeria. He was the director of the Automatic and Signals Laboratory from 2001 to 2006. Since 2006, he has been a full professor at the College of Engineering of Al Jouf University, KSA. He has supervised several Ph.D. students in speech processing, biomedical signals, handwriting recognition and image processing.
His research interests include speech processing, biomedical signals, handwriting recognition and image processing.
E-mail: mouldi_bedda@yahoo.fr

Noureddine Ghoggali received the State Engineer degree in electronics from the University of Batna, Algeria in 2000, and the Ph.D. degree in information and communication technologies from the Department of Information Engineering and Computer Science, University of Trento, Italy. He is currently an assistant professor at the University of Batna, Algeria.
His research interests include pattern recognition and evolutionary computation methodologies for remote sensing image analysis.
E-mail: ghoggalinour@gmail.com

• Corresponding author: Mohamed Goudjil received the M.Sc. degree in computer engineering from Boumerdes University, Algeria in 2008. He is currently a Ph.D. degree candidate in computer engineering at Ecole nationale Supérieure d'Informatique (ESI), Algeria. From 2005 to 2008, he was a researcher at Advanced Technologies & Resarchs Centre, and he has been a lecturer for seven years in different universities.
His research interests include text classification, Arabic language processing and machine learning.
E-mail: m_goudjil@esi.dz (Corresponding author)
ORCID iD: 0000-0003-1712-7617
• Accepted: 2015-06-03
• Published Online: 2018-07-25
• Abstract: Support vector machines (SVMs) are a popular class of supervised learning algorithms, and are particularly applicable to large and high-dimensional classification problems. Like most machine learning methods for data classification and information retrieval, they require manually labeled data samples in the training stage. However, manual labeling is a time-consuming and error-prone task. One possible solution to this issue is to exploit the large number of unlabeled samples that are easily accessible via the internet. This paper presents a novel active learning method for text categorization. The main objective of active learning is to reduce the labeling effort, without compromising the accuracy of classification, by intelligently selecting which samples should be labeled. The proposed method selects a batch of informative samples using the posterior probabilities provided by a set of multi-class SVM classifiers, and these samples are then manually labeled by an expert. Experimental results indicate that the proposed active learning method significantly reduces the labeling effort, while simultaneously enhancing the classification accuracy.
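
The abstract does not state the exact selection rule, so the following is only a minimal sketch of batch selection from posterior probabilities. It assumes that a sample is "informative" when the gap between its two largest class posteriors is small, a common uncertainty heuristic that is not necessarily the authors' criterion. The function names and the posterior values are hypothetical; in practice the posteriors would come from the trained multi-class SVM classifiers.

```python
from typing import List, Sequence


def top_two_margin(probs: Sequence[float]) -> float:
    """Gap between the two largest class posteriors of one sample."""
    ranked = sorted(probs, reverse=True)
    return ranked[0] - ranked[1]


def select_batch(posteriors: Sequence[Sequence[float]], batch_size: int) -> List[int]:
    """Return the indices of the batch_size samples with the smallest margin.

    A small margin means the classifier hesitates between two classes,
    which is the (assumed) notion of informativeness used in this sketch.
    """
    order = sorted(range(len(posteriors)),
                   key=lambda i: top_two_margin(posteriors[i]))
    return order[:batch_size]


# Hypothetical posteriors for five unlabeled documents over three classes.
pool_posteriors = [
    [0.90, 0.05, 0.05],
    [0.45, 0.35, 0.20],
    [0.50, 0.45, 0.05],
    [0.34, 0.33, 0.33],
    [0.70, 0.20, 0.10],
]

# Indices of the two documents that would be sent to the expert for labeling.
to_label = select_batch(pool_posteriors, batch_size=2)
print(to_label)  # [3, 2]
```

In an active learning loop, the selected documents would be labeled by the expert, added to the training set, and the classifiers retrained before the next batch is chosen.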

