Query Expansion using Word Embedding and Pseudo Relevance Feedback
DOI: https://doi.org/10.26594/register.v5i1.1385
Keywords: Pseudo Relevance Feedback, Query Expansion, Word Embedding
Abstract
Keywords are the most important element when searching for information, and using the right keywords yields relevant information. When users enter keywords as a query, however, they write in natural language, so the query may contain words that do not appear in the answer documents prepared by the system. The system cannot process this natural-language input directly. Each word the user enters must therefore be processed by expanding it, a technique known as Query Expansion (QE). The QE method in this research uses word embedding, because word embedding can supply words that frequently appear together with the words in the query. The output of the word embedding step is used as input to pseudo relevance feedback, which enriches it based on the existing answer documents. The QE method was implemented and tested in a chatbot application, yielding recall, precision, and F-measure values of 100%, 70%, and 82.35%, respectively. This is 1.49 percentage points higher than the 68.51% accuracy achieved in earlier work by a chatbot without QE (70% precision versus 68.51% accuracy). Based on these measurements, QE using word embedding and pseudo relevance feedback enables the chatbot to handle ambiguous, natural-language user queries and thereby return relevant answers.
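The two-stage pipeline in the abstract, word-embedding expansion followed by pseudo relevance feedback over the answer documents, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. All of the following are illustrative choices: the toy three-dimensional vectors in `EMBEDDINGS`, the toy `ANSWER_DOCS`, the hypothetical helpers `expand_with_embeddings` and `prf_expand`, cosine similarity for neighbour search, term-overlap scoring for the initial ranking, and appending the most frequent new terms from the top-ranked documents. A real system would use embeddings trained on a corpus (for example with word2vec) and a proper retrieval score.

```python
import math
from collections import Counter

# Hypothetical toy embeddings; a real system would load vectors trained on a corpus.
EMBEDDINGS = {
    "pregnant":  [0.90, 0.10, 0.00],
    "expecting": [0.85, 0.20, 0.05],
    "eat":       [0.10, 0.90, 0.10],
    "food":      [0.15, 0.85, 0.10],
    "nutrition": [0.20, 0.80, 0.30],
    "medicine":  [0.10, 0.20, 0.90],
}

# Hypothetical tokenized answer documents prepared by the chatbot.
ANSWER_DOCS = [
    ["expecting", "mothers", "need", "iron", "rich", "food", "iron"],
    ["medicine", "dosage", "for", "children"],
    ["eat", "breakfast", "daily"],
]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_with_embeddings(query_terms, embeddings, k=1):
    """Stage 1: append the k nearest embedding neighbours of each query term."""
    expanded = list(query_terms)
    for term in query_terms:
        if term not in embeddings:
            continue  # out-of-vocabulary terms are kept but not expanded
        neighbours = sorted(
            (w for w in embeddings if w != term),
            key=lambda w: cosine(embeddings[term], embeddings[w]),
            reverse=True,
        )
        for w in neighbours[:k]:
            if w not in expanded:
                expanded.append(w)
    return expanded


def overlap_score(query_terms, doc_tokens):
    """Initial ranking score: total occurrences of query terms in the document."""
    counts = Counter(doc_tokens)
    return sum(counts[t] for t in set(query_terms))


def prf_expand(query_terms, documents, top_docs=1, n_terms=2):
    """Stage 2: assume the top-ranked documents are relevant and append their
    most frequent terms not already in the query (ties keep first-seen order)."""
    ranked = sorted(documents, key=lambda d: overlap_score(query_terms, d), reverse=True)
    feedback = Counter()
    for doc in ranked[:top_docs]:
        feedback.update(t for t in doc if t not in query_terms)
    return list(query_terms) + [t for t, _ in feedback.most_common(n_terms)]


query = ["pregnant", "eat"]
expanded = expand_with_embeddings(query, EMBEDDINGS, k=1)
final_query = prf_expand(expanded, ANSWER_DOCS, top_docs=1, n_terms=2)
print(expanded)     # ['pregnant', 'eat', 'expecting', 'food']
print(final_query)  # ['pregnant', 'eat', 'expecting', 'food', 'iron', 'mothers']
```

With `k=1`, the embedding step adds one neighbour per query term. Pseudo relevance feedback then treats the top-ranked answer document as relevant, without asking the user for judgments, and adds its most frequent new terms to the query.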
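The reported F-measure can be checked directly from the reported precision and recall. This check assumes the balanced F-measure (F1, the harmonic mean of precision and recall), and the helper name `f_measure` is illustrative. The same snippet also shows that the 1.49 figure matches the difference between the 70% precision and the earlier 68.51% accuracy.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


precision, recall = 0.70, 1.00
f1 = f_measure(precision, recall)
print(f"F-measure: {f1:.2%}")  # F-measure: 82.35%

# Gap between the QE chatbot's precision and the earlier chatbot's accuracy (in points).
gap = round(70.00 - 68.51, 2)
print(f"Gap: {gap} percentage points")  # Gap: 1.49 percentage points
```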
License
The rights and licenses that apply in Register: Jurnal Ilmiah Teknologi Sistem Informasi are set out below. By submitting the article manuscript, the author(s) agree to this policy. No specific document sign-off is required.
1. License
Non-commercial use of the article is governed by the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
2. Author(s)' Warranties
The author warrants that the article is original, written by the stated author(s), has not been published before, contains no unlawful statements, does not infringe the rights of others, is subject to copyright that is vested exclusively in the author and free of any third-party rights, and that any necessary written permissions to quote from other sources have been obtained by the author(s).
3. User/Public Rights
Register aims to disseminate published articles as freely as possible. Under the Creative Commons license, Register permits users to copy, distribute, display, and perform the work for non-commercial purposes only. Users must also credit the authors and Register when distributing works from the journal in other publication media. Unless otherwise stated, the authors become public entities as soon as their articles are published.
4. Rights of Authors
Authors retain all their rights to the published works, including (but not limited to) the following:
Copyright and other proprietary rights relating to the article, such as patent rights;
The right to use the substance of the article in their own future works, including lectures and books;
The right to reproduce the article for their own purposes;
The right to self-archive the article (please read our deposit policy);
The right to enter into separate, additional contractual arrangements for the non-exclusive distribution of the article's published version (e.g., posting it to an institutional repository or publishing it in a book), with an acknowledgment of its initial publication in this journal (Register: Jurnal Ilmiah Teknologi Sistem Informasi).
5. Co-Authorship
If the article was jointly prepared by more than one author, the author submitting the manuscript warrants that he/she has been authorized by all co-authors to agree to this copyright and license notice (agreement) on their behalf, and agrees to inform his/her co-authors of the terms of this policy. Register will not be held liable for anything that may arise from the author(s)' internal disputes. Register will communicate only with the corresponding author.
6. Royalties
As an open-access journal that disseminates articles free of charge under the Creative Commons license terms mentioned above, Register entitles the author(s) to no royalties or other fees, and the author(s) are aware of this.
7. Miscellaneous
Register will publish the article (or have it published) in the journal if the article's editorial process is successfully completed. Register's editors may modify the article to the style of punctuation, spelling, capitalization, referencing, and usage that they deem appropriate. The author acknowledges that the article may be published so that it is publicly accessible, and that such access will be free of charge for readers, as mentioned in point 3.