Volume 3 Number 3 (May 2014)
IJCCE 2014 Vol.3(3): 184-188 ISSN: 2010-3743
DOI: 10.7763/IJCCE.2014.V3.316

Detection of Intra-Sentential Code-Switching Points Using Word Bigram and Unigram Frequency Count

Arianna Clarisse R. Bacatan, Bryan Loren D. Castillo, Marjorie Janelle T. Majan, Verlia F. Palermo, and Ria A. Sagum
Abstract—Detecting code-switching points is increasingly important given growing globalization and multilingualism. It is a challenging task, but computational methods can make it considerably easier. This paper presents an approach for detecting code-switching points in Tagalog-English text, particularly text in which English and Tagalog words alternate. The approach uses word bigram and unigram frequency counts from language models trained on an existing, available corpus. Three categories of test data were used: Twitter posts, conversations, and short stories, comprising 3,088 English and Tagalog words in total. The system's accuracy in correctly identifying English and Tagalog words ranged from 81% to 95%, while its F-measure ranged from 72% to 95%. The research could be extended and improved using other n-grams, stemming, and search algorithms.
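The abstract describes the method only at a high level. The following is a minimal Python sketch of one way word bigram and unigram frequency counts could be used to label each word as English or Tagalog and then locate switch points. It assumes a simple decision rule that is not taken from the paper: prefer the language whose model has the higher count for the bigram (previous word, word), fall back to unigram counts, and on a tie carry over the previous word's label. The class `FreqModel`, the functions `tag_words` and `switch_points`, and the toy corpora are hypothetical illustrations, not the authors' implementation or data.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


class FreqModel:
    """Hypothetical per-language model: word unigram and bigram counts."""

    def __init__(self, sentences: List[List[str]]) -> None:
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for sentence in sentences:
            words = [w.lower() for w in sentence]
            self.unigrams.update(words)
            # zip(words, words[1:]) yields adjacent word pairs (bigrams).
            self.bigrams.update(zip(words, words[1:]))

    def score(self, prev: Optional[str], word: str) -> Tuple[int, int]:
        # Assumed rule: tuple ordering compares the bigram count first and
        # only falls back to the unigram count when bigram counts are equal.
        bigram = self.bigrams[(prev, word)] if prev is not None else 0
        return (bigram, self.unigrams[word])


def tag_words(words: List[str], models: Dict[str, FreqModel], default: str) -> List[str]:
    """Label each word with the language whose model scores it highest.

    Ties (e.g. a word unseen by every model) reuse the previous word's
    label, or `default` for the first word of the input.
    """
    labels: List[str] = []
    prev: Optional[str] = None
    for raw in words:
        word = raw.lower()
        scores = {lang: m.score(prev, word) for lang, m in models.items()}
        best = max(scores.values())
        winners = [lang for lang, s in scores.items() if s == best]
        if len(winners) == 1:
            labels.append(winners[0])
        else:
            labels.append(labels[-1] if labels else default)
        prev = word
    return labels


def switch_points(labels: List[str]) -> List[int]:
    """Indices i where word i is in a different language than word i - 1."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


# Toy training data, invented purely for illustration.
english = FreqModel([["i", "want", "to", "go", "home"],
                     ["she", "is", "going", "to", "the", "mall"]])
tagalog = FreqModel([["gusto", "ko", "umuwi", "na"],
                     ["pupunta", "siya", "sa", "mall"]])

sentence = ["Gusto", "ko", "to", "go", "home", "na"]
labels = tag_words(sentence, {"English": english, "Tagalog": tagalog}, default="Tagalog")
print(labels)  # ['Tagalog', 'Tagalog', 'English', 'English', 'English', 'Tagalog']
print(switch_points(labels))  # [2, 5]
```

In this toy run the switch points fall before "to" (Tagalog to English) and before "na" (English back to Tagalog). Backing off from bigrams to unigrams lets context decide when a word pair has been seen, while still labelling words that only appear in isolation in the training data.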

Index Terms—Code-switching point detection, intra-sentential code-switching, word bigram, word unigram.

A. C. Bacatan, B. L. Castillo, M. J. Majan, and V. Palermo are with the University of Santo Tomas, Manila, Philippines (e-mail: ariannabacatan@gmail.com, bryan.loren.castillo@gmail.com, marjoriemajan@gmail.com, lian.palermo@gmail.com).
R. Sagum is with the Polytechnic University of the Philippines, Sta. Mesa, Manila and is also with the University of Santo Tomas, Manila (e-mail: riasagum31@yahoo.com).

Cite: Arianna Clarisse R. Bacatan, Bryan Loren D. Castillo, Marjorie Janelle T. Majan, Verlia F. Palermo, and Ria A. Sagum, "Detection of Intra-Sentential Code-Switching Points Using Word Bigram and Unigram Frequency Count," International Journal of Computer and Communication Engineering, vol. 3, no. 3, pp. 184-188, 2014.
