Summarization of Indonesian-language news documents using the Cross Latent Semantic Analysis method
DOI:
https://doi.org/10.26594/register.v3i2.1161
Keywords:
cross latent semantic analysis, document summarization, latent semantic analysis, news, RSS-Feed
Abstract
Summarizing Indonesian-language news documents helps readers find the main ideas or other important information in a news article. Because news articles generally consist of many paragraphs, a system is needed to extract that information, so that readers receive the main ideas or important points without having to read the whole article in detail; such summaries can also be used for Really Simple Syndication feeds (RSS-Feed). This study presents the summarization of Indonesian-language news documents using Cross Latent Semantic Analysis (CLSA) and Latent Semantic Analysis (LSA). To test how good the CLSA summaries are, the study uses 240 news articles retrieved from the portal www.kompas.com and two experts from different fields. CLSA summaries at a compression rate of 30% obtain an F-Measure of 0.72%. The study also finds that CLSA performs better than LSA, the method from which CLSA was derived, although the two F-Measure scores differ only slightly.
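The two evaluation quantities named in the abstract, the compression rate and the F-Measure, can be illustrated with a minimal sketch. The abstract does not say how either was computed, so everything here is an assumption: the compression rate is taken as the fraction of sentences kept, the F-Measure is computed over the sentence indices shared by a system summary and one expert summary, and the function names `summary_length` and `f_measure` and the sample data are hypothetical.

```python
def summary_length(num_sentences, compression_rate=0.30):
    """Sentences to keep at the given compression rate (assumed: fraction kept, at least one)."""
    return max(1, round(num_sentences * compression_rate))


def f_measure(system, reference):
    """Return (precision, recall, F-measure) over sets of selected sentence indices."""
    overlap = len(system & reference)
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(system)    # share of system sentences the expert also chose
    recall = overlap / len(reference)    # share of expert sentences the system recovered
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical 10-sentence article; the indices below are made-up illustration data.
k = summary_length(10)        # 3 sentences at a 30% compression rate
system_pick = {0, 3, 7}       # sentences a summarizer might select
expert_pick = {0, 3, 5}       # sentences an expert might select
precision, recall, f1 = f_measure(system_pick, expert_pick)
```

With several experts, one plausible choice would be to average the per-expert scores, but the abstract does not state how the two experts' judgments were combined.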
License
Please find the rights and licenses in Register: Jurnal Ilmiah Teknologi Sistem Informasi. By submitting the article/manuscript of the article, the author(s) agree with this policy. No specific document sign-off is required.
1. License
Non-commercial use of the article is governed by the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
2. Author(s)' Warranties
The author warrants that the article is original, was written by the stated author(s), has not been published before, contains no unlawful statements, does not infringe the rights of others, is subject to copyright that is vested exclusively in the author and free of any third-party rights, and that any necessary written permissions to quote from other sources have been obtained by the author(s).
3. User/Public Rights
Register aims to disseminate published articles as freely as possible. Under the Creative Commons license, Register permits users to copy, distribute, display, and perform the work for non-commercial purposes only. Users must also attribute the authors and Register when distributing works in the journal and other publication media. Unless otherwise stated, the authors are public entities as soon as their articles are published.
4. Rights of Authors
Authors retain all their rights to the published works, such as (but not limited to) the following rights:
Copyright and other proprietary rights relating to the article, such as patent rights,
The right to use the substance of the article in own future works, including lectures and books,
The right to reproduce the article for own purposes,
The right to self-archive the article (please read our deposit policy),
The right to enter into separate, additional contractual arrangements for the non-exclusive distribution of the article's published version (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal (Register: Jurnal Ilmiah Teknologi Sistem Informasi).
5. Co-Authorship
If the article was jointly prepared by more than one author, any author submitting the manuscript warrants that he/she has been authorized by all co-authors to agree to this copyright and license notice (agreement) on their behalf, and agrees to inform his/her co-authors of the terms of this policy. Register will not be held liable for anything that may arise from an internal dispute among the author(s). Register will only communicate with the corresponding author.
6. Royalties
Because Register is an open-access journal that disseminates articles for free under the Creative Commons license terms mentioned above, the author(s) are aware that Register entitles the author(s) to no royalties or other fees.
7. Miscellaneous
Register will publish the article (or have it published) in the journal if the article’s editorial process is successfully completed. Register's editors may modify the article to a style of punctuation, spelling, capitalization, referencing, and usage that they deem appropriate. The author acknowledges that the article may be published so that it will be publicly accessible, and that such access will be free of charge for readers, as mentioned in point 3.