IJCCE 2018 Vol.7(3): 45-57 ISSN: 2010-3743
DOI: 10.17706/IJCCE.2018.7.3.45-57
Contextual Feature Weighting Using Knowledge beyond the Repository Knowledge
Kazem Qazanfari, Abdou Youssef
Abstract—Bag of words, bigrams, and more complex combinations of words are among the most general and widely used features in text classification. However, in almost all real-world text classification problems, the distribution of the available training data for each class does not match the real distribution of the class concept, which reduces classifier accuracy. Let W(f) and R(f) be the discriminating power of feature f based on world knowledge and repository knowledge, respectively. Ideally, W(f) = R(f); in most situations, however, the two differ, sometimes substantially, because the repository knowledge and the world knowledge do not have the same statistics about the discriminating power of f. In this paper, this phenomenon is called inadequacy of knowledge, and we show how it can reduce the performance of text classifiers. To address it, we propose a novel feature weighting method that combines the two bodies of knowledge, world knowledge and repository knowledge, using a particular transformation T. In this method, if both the world knowledge and the repository knowledge indicate a significantly high (resp., low) discriminating power for feature f, the weight of the feature is increased (resp., decreased); otherwise, the weight is determined by a linear combination of the two weights. Experimental results show that the performance of classifiers such as SVM, KNN, and Bayes improves significantly when the proposed feature weighting method is applied to contextual features such as bigrams and unigrams. It is also shown that pruning some words from the dataset using the proposed feature weighting method can improve the performance of a text classifier when the feature sets are created using Doc2vec.
Index Terms—Feature weighting, feature extraction, text classification, transfer learning.
Kazem Qazanfari and Abdou Youssef are with The George Washington University, Washington, DC, USA.
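The weighting rule described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the transformation T, so everything concrete here is an assumption made for illustration: the function name `combine_weights`, the thresholds `high` and `low`, the mixing coefficient `alpha`, and the `boost` and `damp` factors are all hypothetical, and both W(f) and R(f) are assumed to be normalized to [0, 1].

```python
def combine_weights(w, r, high=0.7, low=0.3, alpha=0.5, boost=1.5, damp=0.5):
    """Combine a world-knowledge weight w = W(f) with a repository weight r = R(f).

    Illustrative only: the thresholds and factors are hypothetical stand-ins
    for the paper's transformation T, which the abstract does not define.
    Both inputs are assumed to lie in [0, 1].
    """
    if not (0.0 <= w <= 1.0 and 0.0 <= r <= 1.0):
        raise ValueError("weights are assumed to be normalized to [0, 1]")

    # Default case: linear combination of the two weights.
    blended = alpha * w + (1.0 - alpha) * r

    # Both sources agree the feature is highly discriminating: increase it.
    if w >= high and r >= high:
        return min(1.0, blended * boost)

    # Both sources agree the feature is weakly discriminating: decrease it.
    if w <= low and r <= low:
        return blended * damp

    # The sources disagree or are inconclusive: keep the linear combination.
    return blended


if __name__ == "__main__":
    for w, r in [(0.9, 0.8), (0.1, 0.2), (0.9, 0.2)]:
        print(f"W={w}, R={r} -> {combine_weights(w, r):.3f}")
```

The three calls in the usage section cover the three branches: agreement on high discriminating power, agreement on low discriminating power, and disagreement, where only the linear combination applies.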
Cite: Kazem Qazanfari, Abdou Youssef, "Contextual Feature Weighting Using Knowledge beyond the Repository Knowledge," International Journal of Computer and Communication Engineering, vol. 7, no. 3, pp. 45-57, 2018.