Volume 7 Number 3 (Jul. 2018)
IJCCE 2018 Vol.7(3): 45-57 ISSN: 2010-3743
DOI: 10.17706/IJCCE.2018.7.3.45-57

Contextual Feature Weighting Using Knowledge beyond the Repository Knowledge

Kazem Qazanfari, Abdou Youssef
Abstract—Bag of words, bigrams, and more complex combinations of words are among the most general and widely used features in text classification. However, in almost all real-world text classification problems, the distribution of the available training dataset for each class does not match the real distribution of the class concept, which reduces the accuracy of the classifiers. Let W(f) and R(f) be the discriminating power of feature f based on the world knowledge and the repository knowledge, respectively. Ideally, W(f) = R(f); in most situations, however, W(f) and R(f) are not equal, and sometimes they are quite different, because the repository knowledge and the world knowledge do not have the same statistics about the discriminating power of feature f. In this paper, this phenomenon is called inadequacy of knowledge, and we show how it can reduce the performance of text classifiers. To solve this issue, a novel feature weighting method is proposed that combines two bodies of knowledge, world knowledge and repository knowledge, using a particular transformation T. In this method, if both the world knowledge and the repository knowledge indicate a significantly high (resp., low) discriminating power of feature f, the weight of this feature is increased (resp., decreased); otherwise, the weight of the feature is determined by a linear combination of the two weights. Experimental results show that the performance of classifiers such as SVM, KNN, and Bayes improves significantly when the proposed feature weighting method is applied to contextual features such as bigrams and unigrams. It is also shown that pruning some words from the dataset using the proposed feature weighting method can improve the performance of the text classifier when the feature sets are created using Doc2vec.
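The abstract does not spell out the transformation T, so the following is only a minimal Python sketch of the combination rule it describes, under stated assumptions: W(f) and R(f) are taken as precomputed scores normalised to [0, 1], and the thresholds (`high`, `low`), the mixing weight `alpha`, and the `boost`/`damp` factors are hypothetical values, not the paper's. The helper `prune_tokens` is likewise a hypothetical illustration of how low-weighted words might be dropped from the text before training Doc2vec.

```python
from typing import Dict, List


def combine_weights(
    w: Dict[str, float],
    r: Dict[str, float],
    high: float = 0.8,   # hypothetical "significantly high" threshold
    low: float = 0.2,    # hypothetical "significantly low" threshold
    alpha: float = 0.5,  # hypothetical mixing weight for W(f)
    boost: float = 1.5,  # hypothetical factor when both sources agree f is strong
    damp: float = 0.5,   # hypothetical factor when both sources agree f is weak
) -> Dict[str, float]:
    """Illustrative stand-in for the transformation T (not the paper's exact formula).

    w maps each feature f to W(f) (world knowledge); r maps f to R(f)
    (repository knowledge). Both are assumed to lie in [0, 1].
    Only features scored by both sources are returned.
    """
    combined: Dict[str, float] = {}
    for f in w.keys() & r.keys():
        wf, rf = w[f], r[f]
        base = alpha * wf + (1 - alpha) * rf  # linear combination of the two weights
        if wf >= high and rf >= high:
            combined[f] = base * boost  # both indicate high discriminating power: increase
        elif wf <= low and rf <= low:
            combined[f] = base * damp   # both indicate low discriminating power: decrease
        else:
            combined[f] = base          # sources disagree: keep the linear combination
    return combined


def prune_tokens(tokens: List[str], weights: Dict[str, float], threshold: float) -> List[str]:
    """Drop tokens whose combined weight is below threshold (hypothetical pruning rule).

    Tokens that have no weight are kept unchanged.
    """
    return [t for t in tokens if weights.get(t, threshold) >= threshold]


if __name__ == "__main__":
    W = {"goal": 0.9, "the": 0.1, "match": 0.6}
    R = {"goal": 0.85, "the": 0.15, "match": 0.2}
    weights = combine_weights(W, R)
    print(weights)
    print(prune_tokens(["the", "goal", "match"], weights, threshold=0.2))
```

With these toy scores, "goal" (high in both sources) is boosted to 0.875 × 1.5 = 1.3125, "the" (low in both) is damped to 0.125 × 0.5 = 0.0625, and "match" (the sources disagree) keeps the plain linear combination 0.4; pruning at 0.2 then removes only "the". Multiplicative boost and damp factors were chosen here simply because they preserve the ordering within each agreement group; the paper's T may behave differently.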

Index Terms—Feature weighting, feature extraction, text classification, transfer learning.

Kazem Qazanfari and Abdou Youssef are with The George Washington University, Washington, DC, USA.

Cite: Kazem Qazanfari, Abdou Youssef, "Contextual Feature Weighting Using Knowledge beyond the Repository Knowledge," International Journal of Computer and Communication Engineering, vol. 7, no. 3, pp. 45-57, 2018.

General Information

ISSN: 2010-3743 (Online)
Abbreviated Title: Int. J. Comput. Commun. Eng.
Frequency: Quarterly
Editor-in-Chief: Dr. Maode Ma
Abstracting/Indexing: EI (INSPEC, IET), Google Scholar, Crossref, EBSCO, ProQuest, and Electronic Journals Library
E-mail: ijcce@iap.org