Volume 2 Number 3 (May 2013)
IJCCE 2013 Vol.2(3): 309-313 ISSN: 2010-3743
DOI: 10.7763/IJCCE.2013.V2.194

A Very Fast Algorithm for Detecting Partially Plagiarized Documents Using FM-Index

Chang Seok Ock, Jong Kyu Seo, Sung-Hwan Kim, and Hwan-Gue Cho
Abstract—Sequence alignment and fingerprinting are two of the most common methods for plagiarism detection because of their strong performance. Their disadvantage is that the string processing cost increases as the size of the target document increases. To overcome this disadvantage, we use disk-based techniques and the genome assembly techniques used in Next Generation Sequencing (NGS). By combining the two methods, we propose a method for very fast plagiarism detection in a large Korean corpus. The method is based on the Burrows-Wheeler Transform (BWT) and the FM-index for BWT search. For efficient detection, we extract the initial consonants from the Korean corpus and build index data structures over the extracted initial consonants. We then split the suspected plagiarized query document into several pieces and search for each piece in the index. Finally, we analyze the search results to detect the plagiarized sections. Our proposed method achieves a maximum precision of 0.96 and a maximum recall of 1.0. In the future, we plan to investigate ways of improving the search algorithm through optimization, as well as user-specific visualization methods.
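
The pipeline outlined in the abstract (initial-consonant extraction, BWT/FM-index construction, and piecewise query search) can be illustrated with a minimal Python sketch. This is not the authors' implementation: all function names, the fixed `piece_len` value, and the naive suffix sorting used to build the index are assumptions made for illustration, and the disk-based techniques mentioned in the abstract are not shown.

```python
# Illustrative sketch only; names and parameters are hypothetical.

# The 19 Korean initial consonants (choseong), in Unicode syllable order.
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"


def initial_consonants(text):
    """Keep only the initial consonant of each precomposed Hangul syllable."""
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00          # offset into the Hangul syllable block
        if 0 <= code < 11172:            # 19 * 21 * 28 precomposed syllables
            out.append(CHOSEONG[code // 588])  # 588 = 21 vowels * 28 finals
    return "".join(out)


def build_fm_index(text):
    """Build a small FM-index (suffix array, C table, occurrence table).

    Assumption: naive suffix sorting, which is only suitable for short texts.
    """
    s = text + "\0"                      # sentinel sorts before every symbol
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    bwt = "".join(s[i - 1] for i in sa)  # character preceding each suffix
    alphabet = sorted(set(s))
    # C[c]: number of symbols in s that are strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += s.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(s) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, C, occ


def locate(index, pattern):
    """Backward search: return sorted start positions of pattern in the text."""
    sa, C, occ = index
    lo, hi = 0, len(sa)
    for c in reversed(pattern):
        if c not in C:
            return []
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


def find_matching_pieces(index, query, piece_len=4):
    """Split the query's initial consonants into pieces and search each one."""
    q = initial_consonants(query)
    hits = []
    for start in range(0, len(q) - piece_len + 1, piece_len):
        positions = locate(index, q[start:start + piece_len])
        if positions:
            hits.append((start, positions))
    return hits


corpus_index = build_fm_index(initial_consonants("대한민국은 민주공화국이다"))
matches = find_matching_pieces(corpus_index, "민주공화국")
```

Each entry in `matches` pairs a piece's offset in the query with its positions in the indexed consonant string. Because reducing syllables to initial consonants is lossy, a hit in this sketch only marks a candidate region; a later analysis step, as the abstract describes, would be needed to decide which regions are actually plagiarized.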

Index Terms—Burrows-Wheeler transform, FM-index, plagiarism detection.

The authors are with the Dept. of Computer Engineering, Pusan National University, Busan, South Korea (e-mail: csock@pusan.ac.kr, maniasjk@pusan.ac.kr, sunghwan@pusan.ac.kr, hgcho@pusan.ac.kr).

Cite: Chang Seok Ock, Jong Kyu Seo, Sung-Hwan Kim, and Hwan-Gue Cho, "A Very Fast Algorithm for Detecting Partially Plagiarized Documents Using FM-Index," International Journal of Computer and Communication Engineering, vol. 2, no. 3, pp. 309-313, 2013.
