Using Unlabeled Data to Improve Author Identification - Volume 2 Number 3 (May 2013) - IJCCE
IJCCE 2013 Vol.2(3): 236-240 ISSN: 2010-3743
DOI: 10.7763/IJCCE.2013.V2.179

Using Unlabeled Data to Improve Author Identification

R. Guzmán Cabrera, J. R. Guzmán Sepúlveda, J. A. Gordillo Sosa, M. Torres Cisneros, and J. Herrera Cabral
Abstract—Authorship attribution may be considered a text categorization problem. Text categorization requires a large number of training examples, which are particularly difficult to obtain for the authorship attribution task. In this paper, we investigate the possibility of using Web-based text-mining methods to identify the author of a given poem. In particular, we propose a semi-supervised method that is especially suited to working with just a few training examples, in order to tackle the lack of data with the same writing style. The results obtained on poem categorization show that this method may significantly improve classification accuracy and that it is appropriate for the attribution of short documents.

Index Terms—Authorship attribution, text classification, machine learning.
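
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common semi-supervised scheme, self-training, and should not be read as the authors' Web-based method. It assumes a small multinomial Naive Bayes classifier over whitespace tokens; the function names (`train_nb`, `predict`, `self_train`), the `min_margin` threshold, and the toy texts are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: lowercase whitespace tokens stand in for real features.
    return text.lower().split()

def train_nb(labeled):
    """Fit a multinomial Naive Bayes model from (text, author) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, author in labeled:
        tokens = tokenize(text)
        word_counts[author].update(tokens)
        doc_counts[author] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return (best_author, margin); margin is the log-score gap to the runner-up."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for author in doc_counts:
        total_words = sum(word_counts[author].values())
        score = math.log(doc_counts[author] / total_docs)
        for tok in tokenize(text):
            # Laplace smoothing, with one extra slot for unseen words.
            score += math.log((word_counts[author][tok] + 1)
                              / (total_words + len(vocab) + 1))
        scores[author] = score
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    margin = ranked[0][1] - ranked[1][1] if len(ranked) > 1 else float("inf")
    return ranked[0][0], margin

def self_train(labeled, unlabeled, rounds=3, min_margin=1.0):
    """Repeatedly add confidently classified unlabeled texts to the training set."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        model = train_nb(labeled)
        confident, rest = [], []
        for text in pool:
            author, margin = predict(model, text)
            (confident if margin >= min_margin else rest).append((text, author))
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = [text for text, _ in rest]
    return train_nb(labeled), labeled

# Toy, made-up data: one labeled text per author plus two unlabeled texts.
seed = [("sea wave salt tide", "A"), ("stone mountain snow peak", "B")]
extra = ["salt tide sea", "snow peak stone"]
model, augmented = self_train(seed, extra)
print(len(augmented), predict(model, "wave tide")[0])
```

The margin threshold controls how cautious the loop is: a higher value admits fewer unlabeled texts per round, which limits the risk of reinforcing early mistakes when the seed set is very small.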

R. Guzmán-Cabrera and M. Torres Cisneros are with the Grupo de NanoBioFotónica, DICIS, Universidad de Guanajuato, Salamanca, Gto., México (e-mail: guzmanc@ugto.mx, mtorres@ugto.mx).
J. R. Guzmán-Sepúlveda is with the Departamento de Electrónica, UAM Reynosa-Rodhe, Universidad Autónoma de Tamaulipas, Carr. Reynosa-San Fernando S/N, Reynosa, Tamaulipas 88779, México (e-mail: jrafael_guzmans@yahoo.com.mx).
J. A. Gordillo-Sosa and Joel Herrera Cabral are with the Depto. de TIC. Univ. Tecnológica del Suroeste de Gto. Carr. Valle-Huanímaro km.1.2, Valle de Santiago, Gto. México (e-mail: antgor@antoniogordillo.com, jherrera@utsoe.edu.mx).
A. González Parada is with Universidad de Guanajuato, DICIS, Salamanca, Gto., México (e-mail: gonzaleza@ugto.mx).

Cite: R. Guzmán Cabrera, J. R. Guzmán Sepúlveda, J. A. Gordillo Sosa, M. Torres Cisneros, and J. Herrera Cabral, "Using Unlabeled Data to Improve Author Identification," International Journal of Computer and Communication Engineering, vol. 2, no. 3, pp. 236-240, 2013.
