Incident type classification using a combination of NeuroNER and Recurrent Convolutional Neural Network on Twitter data
DOI:
https://doi.org/10.26594/register.v4i2.1242
Keywords:
incident detection, information extraction, NeuroNER, RCNN
Abstract
An incident detection system based on Twitter data aims to obtain data in real time as a low-cost alternative incident detection system. Research on incident detection systems has been carried out before. One of the main modules of an incident detection system is the incident type classification module. Information can be classified as an important incident if it contains an entity that represents where the incident took place. Several previous studies still rely on 'handcrafted' features or pipeline-based feature models such as n-grams to determine the key classification features, which is ineffective and yields suboptimal performance. Therefore, a combination of Neuro Named Entity Recognition (NeuroNER) and a Recurrent Convolutional Neural Network (RCNN) classifier is proposed, which is expected to detect incidents effectively and optimally. First, the system performs named entity recognition on the tweet data to recognize the location entities in the tweet text, since incident information must contain at least one location entity. Second, if a tweet is found to contain a location entity, it is classified by incident type using the RCNN classifier. Based on the experimental results, the incident detection system combining NeuroNER and RCNN performs very well, with average precision, recall, and f-measure of 94.87%, 92.73%, and 93.73%, respectively.
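The two-stage flow described above (location NER acting as a gate, then incident classification only for tweets that pass it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_tagger` is a hypothetical gazetteer-based stand-in for NeuroNER, `toy_classifier` is a hypothetical keyword-based stand-in for the RCNN classifier, and the BIO tag names (`B-LOC`, `I-LOC`, `O`) are an assumed tagger output format.

```python
from typing import Callable, List, Optional, Tuple

# (token, tag) pairs in an assumed BIO scheme: B-LOC / I-LOC / O.
Tagged = List[Tuple[str, str]]

# Hypothetical gazetteer used only by the toy tagger (lower-cased token tuples).
GAZETTEER = {("jalan", "ahmad", "yani"), ("surabaya",)}

# Hypothetical keyword -> incident-type map used only by the toy classifier.
KEYWORDS = {"accident": "accident", "crash": "accident",
            "flood": "flood", "flooding": "flood", "jam": "traffic jam"}


def toy_tagger(tokens: List[str]) -> Tagged:
    """Hypothetical stand-in for NeuroNER: greedy longest match against GAZETTEER."""
    lower = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Length of the longest gazetteer phrase starting at position i (0 if none).
        match = max((len(p) for p in GAZETTEER if tuple(lower[i:i + len(p)]) == p),
                    default=0)
        if match:
            tags[i] = "B-LOC"
            for j in range(i + 1, i + match):
                tags[j] = "I-LOC"
            i += match
        else:
            i += 1
    return list(zip(tokens, tags))


def toy_classifier(tokens: List[str]) -> str:
    """Hypothetical stand-in for the RCNN: first keyword-matched incident type."""
    for t in tokens:
        label = KEYWORDS.get(t.lower())
        if label:
            return label
    return "other"


def extract_locations(tagged: Tagged) -> List[str]:
    """Join a B-LOC token and the I-LOC tokens that follow it into one span."""
    spans, current = [], []
    for token, tag in tagged:
        if tag == "B-LOC":
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I-LOC" and current:
            current.append(token)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


def detect_incident(tokens: List[str],
                    tagger: Callable[[List[str]], Tagged],
                    classifier: Callable[[List[str]], str]
                    ) -> Optional[Tuple[str, List[str]]]:
    """Stage 1: location-entity gate. Stage 2: classify only tweets that pass it."""
    locations = extract_locations(tagger(tokens))
    if not locations:
        return None  # no location entity -> not treated as an incident report
    return classifier(tokens), locations


print(detect_incident("Accident on Jalan Ahmad Yani Surabaya".split(),
                      toy_tagger, toy_classifier))
# ('accident', ['Jalan Ahmad Yani', 'Surabaya'])
print(detect_incident("Traffic jam again today".split(), toy_tagger, toy_classifier))
# None
```

Because the gate and the classifier are passed in as functions, a trained NER model and a trained classifier could replace the toy stand-ins without changing `detect_incident`.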
An incident detection system based on Twitter data aims to obtain real-time information as a low-cost alternative incident detection system. One of the main modules of the incident detection system is the classification module. Information is classified as an important incident if it contains an entity that represents where the incident occurred. Some previous studies still use 'handcrafted' features or pipeline-based feature models such as n-grams as the key classification features, which are considered ineffective. Therefore, this research proposes a combination of Neuro Named Entity Recognition (NeuroNER) and a Recurrent Convolutional Neural Network (RCNN) as an effective classification method for incident detection. First, the system performs named entity recognition to identify the location mentioned in the tweet text, because incident information should contain at least one location entity. Then, if a location is successfully identified, the incident is classified using the RCNN. Experimental results show that the incident detection system combining NeuroNER and RCNN works very well, with average precision, recall, and f-measure of 92.44%, 94.76%, and 93.53%, respectively.
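Both abstracts report averaged precision, recall, and f-measure without naming the averaging scheme. As an illustration only, the sketch below assumes macro-averaging, a common choice for multi-class evaluation: each metric is computed per incident class (one-vs-rest) and the per-class values are then averaged without weighting. Under this scheme the averaged f-measure is generally not the harmonic mean of the averaged precision and recall. The labels and predictions are invented toy data, not results from the study.

```python
from typing import Dict, List, Sequence, Tuple

Scores = Dict[str, Tuple[float, float, float]]


def per_class_scores(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: List[str]) -> Scores:
    """Precision, recall and F1 for each class, treating it one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    scores: Scores = {}
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = (precision, recall, f1)
    return scores


def macro_average(scores: Scores) -> Tuple[float, float, float]:
    """Unweighted mean of the per-class precision, recall and F1 (assumed scheme)."""
    n = len(scores)
    p = sum(s[0] for s in scores.values()) / n
    r = sum(s[1] for s in scores.values()) / n
    f = sum(s[2] for s in scores.values()) / n
    return p, r, f


# Toy labels and predictions for illustration only (not data from the study).
y_true = ["accident", "accident", "flood", "flood"]
y_pred = ["accident", "flood", "flood", "flood"]
p, r, f = macro_average(per_class_scores(y_true, y_pred, ["accident", "flood"]))
print(round(p, 4), round(r, 4), round(f, 4))  # 0.8333 0.75 0.7333
print(round(2 * p * r / (p + r), 4))          # 0.7895, differs from f
```

Here the per-class F1 values are 0.6667 (accident) and 0.8 (flood), so their mean (0.7333) is lower than the harmonic mean of the macro-averaged precision and recall (0.7895).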
References
Bharti, S. K., Vachha, B., Pradhan, R. K., Babu, K. S., & Jena, S. K. (2016). Sarcastic sentiment detection in tweets streamed in real time: a big data approach. Digital Communications and Networks, 2(3), 108-121.
Dernoncourt, F., Lee, J. Y., & Szolovits, P. (2017). NeuroNER: an easy-to-use program for named-entity recognition based on neural networks. Proceedings of the 2017 EMNLP System Demonstrations (pp. 97-102). Copenhagen: Association for Computational Linguistics.
Gelernter, J., & Balaji, S. (2013). An algorithm for local geoparsing of microtext. GeoInformatica, 17(4), 635-667.
Gu, Y., Qian, Z. S., & Chen, F. (2016). From Twitter to detector: Real-time traffic incident detection using social media data. Transportation Research Part C: Emerging Technologies, 67, 321-342.
Hasby, M., & Khodra, M. L. (2013). Optimal path finding based on traffic information extraction from Twitter. International Conference on ICT for Smart Society. Jakarta: IEEE.
He, K., Li, Y., Soundarajan, S., & Hopcroft, J. E. (2018). Hidden Community Detection in Social Networks. Information Sciences, 425, 92-106.
Huang, Z., Xu, W., & Yu, K. (2015). Bidirectional LSTM-CRF models for sequence tagging. CoRR.
Jianqiang, Z., & Xiaolin, G. (2017). Comparison research on text pre-processing methods on twitter sentiment analysis. IEEE Access, 5, 2870-2879.
Lai, S., Xu, L., Liu, K., & Zhao, J. (2015). Recurrent Convolutional Neural Networks for Text Classification. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (pp. 2267-2273). Austin: AAAI.
Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., & Dyer, C. (2016). Neural architectures for named entity recognition. Proceedings of NAACL-HLT 2016 (pp. 260-270). San Diego: Association for Computational Linguistics.
Liu, X., & Zhou, M. (2013). Two-stage NER for tweets with clustering. Information Processing & Management, 49(1), 264-273.
Najibullah, A., & Mingyan, W. (2015). Otomatisasi Peringkasan Dokumen Sebagai Pendukung Sistem Manajemen Surat [Automated document summarization to support a correspondence management system]. Register: Jurnal Ilmiah Teknologi Sistem Informasi, 1(1), 1-6.
Nidhi, R. H., & Annappa, B. (2017). Twitter-user recommender system using tweets: A content-based approach. 2017 International Conference on Computational Intelligence in Data Science (ICCIDS). Chennai: IEEE.
Perdana, R. S., Fatichah, C., & Purwitasari, D. (2015). Pemilihan kata kunci untuk deteksi kejadian trivial pada dokumen Twitter menggunakan Autocorrelation Wavelet Coefficients [Keyword selection for trivial event detection in Twitter documents using Autocorrelation Wavelet Coefficients]. JUTI, 13(2), 152-159.
Putra, F. N., Effendi, A., & Arifin, A. Z. (2018). Pembobotan Kata pada Query Expansion dengan Tesaurus dalam Pencarian Dokumen Bahasa Indonesia [Term weighting in query expansion with a thesaurus for Indonesian-language document retrieval]. Jurnal Linguistik Komputasional (JLK), 1(1), 17-22.
License
The rights and licenses that apply to articles in Register: Jurnal Ilmiah Teknologi Sistem Informasi are set out below. By submitting the manuscript of the article, the author(s) agree to this policy. No specific document sign-off is required.
1. License
Non-commercial use of the article is governed by the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
2. Author(s)' Warranties
The author warrants that the article is original, written by the stated author(s), has not been published before, contains no unlawful statements, does not infringe the rights of others, is subject to copyright that is vested exclusively in the author and free of any third-party rights, and that any necessary written permissions to quote from other sources have been obtained by the author(s).
3. User/Public Rights
Register aims to disseminate its published articles as freely as possible. Under the Creative Commons license, Register permits users to copy, distribute, display, and perform the work for non-commercial purposes only. When distributing works from the journal in other publication media, users must attribute the authors and Register. Unless otherwise stated, the authors become public entities as soon as their articles are published.
4. Rights of Authors
Authors retain all their rights to the published works, including (but not limited to) the following:
Copyright and other proprietary rights relating to the article, such as patent rights,
The right to use the substance of the article in own future works, including lectures and books,
The right to reproduce the article for own purposes,
The right to self-archive the article (please read our deposit policy),
The right to enter into separate, additional contractual arrangements for the non-exclusive distribution of the article's published version (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal (Register: Jurnal Ilmiah Teknologi Sistem Informasi).
5. Co-Authorship
If the article was jointly prepared by more than one author, the author submitting the manuscript warrants that he/she has been authorized by all co-authors to agree to this copyright and license notice (agreement) on their behalf, and agrees to inform his/her co-authors of the terms of this policy. Register will not be held liable for anything that may arise from internal disputes among the author(s). Register will only communicate with the corresponding author.
6. Royalties
As Register is an open-access journal that disseminates articles free of charge under the Creative Commons license terms mentioned above, the author(s) acknowledge that Register entitles them to no royalties or other fees.
7. Miscellaneous
Register will publish the article (or have it published) in the journal if the article’s editorial process is successfully completed. Register's editors may modify the article to a style of punctuation, spelling, capitalization, referencing, and usage that they deem appropriate. The author acknowledges that the article may be published so that it will be publicly accessible and such access will be free of charge for the readers as mentioned in point 3.