Typo handling in searching of Quran verse based on phonetic similarities
DOI:
https://doi.org/10.26594/register.v6i2.2065
Keywords:
autocomplete, Damerau–Levenshtein distance, phonetic similarity, Quran, typographical error
Abstract
The Quran search system was built to help Indonesians find a verse by typing its text as it is pronounced in Indonesian, which serves users who have difficulty writing or typing Arabic characters. Lafzi was one of the systems that introduced this phonetic-similarity search, and it was later developed further under the name Lafzi+. Lafzi+ can handle queries that contain typos, but it covers only a limited range of typing error types. This research presents Lafzi++, which extends the previous development to handle more typographical error types. It applies typo correction using the autocomplete method to correct erroneous queries and the Damerau–Levenshtein distance to calculate the edit distance, so that the system can offer query suggestions when a user mistypes a search through substitution, insertion, deletion, or transposition. Users can also search easily because queries are written in Latin characters according to Indonesian pronunciation. The evaluation shows that Lafzi++ improves on the previous system: the accuracy for each tested query can surpass that of the previous system, with the highest recall reaching 96.20% and the highest Mean Average Precision (MAP) reaching 90.69%.
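The four error types named in the abstract (substitution, insertion, deletion, and transposition) can be illustrated with a minimal Python sketch. It is not the Lafzi++ implementation: it uses the restricted form of the Damerau–Levenshtein distance (often called optimal string alignment), and the function names `osa_distance` and `suggest`, the `max_distance` threshold, and the romanized sample strings are all assumptions made for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts substitutions, insertions, deletions, and transpositions of
    two adjacent characters, each with cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "hm" <-> "mh"
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def suggest(query: str, candidates: list[str], max_distance: int = 2) -> list[str]:
    """Return candidates within max_distance of the query, closest first.

    max_distance is a hypothetical threshold chosen for this sketch.
    """
    scored = sorted((osa_distance(query, c), c) for c in candidates)
    return [c for dist, c in scored if dist <= max_distance]


# Hypothetical romanized terms, used only to exercise the functions.
terms = ["alhamdu", "arrahman", "maliki"]
print(suggest("alhmadu", terms))
```

In this sketch a transposed query such as "alhmadu" is one edit away from "alhamdu", so it is returned as the only suggestion. A plain Levenshtein distance would count the same typo as two edits, which is why the transposition case is handled separately.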
License
Please find the rights and licenses in Register: Jurnal Ilmiah Teknologi Sistem Informasi. By submitting the article/manuscript of the article, the author(s) agree with this policy. No specific document sign-off is required.
1. License
Non-commercial use of the article is governed by the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.