Classification of incident types using a combination of NeuroNER and Recurrent Convolutional Neural Network on Twitter data

Fatra Nonggala Putra
Chastine Fatichah

Abstract


An incident detection system based on Twitter data aims to obtain data in real time as a low-cost alternative for incident detection. Incident detection systems have been studied in previous research. One of the main modules of such a system is the incident type classification module. Information can be classified as an important incident if it contains an entity representing the location where the incident occurred. Several previous studies still rely on 'handcrafted' features, or on pipeline-based model features such as n-grams, to determine the key features for classification, which is ineffective and yields suboptimal performance. Therefore, this study proposes combining Neuro Named Entity Recognition (NeuroNER) with a Recurrent Convolutional Neural Network (RCNN) classifier, which is expected to detect incidents effectively and optimally. First, the system performs named entity recognition on the tweet data to identify location entities in the tweet text, because incident information must contain at least one location entity. Second, if a tweet is found to contain a location entity, the incident is classified using the RCNN classifier. Based on the experimental results, the incident detection system combining NeuroNER and RCNN works very well, with average precision, recall, and f-measure of 94.87%, 92.73%, and 93.73%, respectively.

An incident detection system based on Twitter data aims to obtain real-time information as a low-cost alternative for incident detection. One of the main modules in an incident detection system is the classification module. Information is classified as an important incident if it contains an entity that represents where the incident occurred. Some previous studies still use 'handcrafted' features, as well as pipeline-based model features such as n-grams, as the key features for classification, which are deemed ineffective. Therefore, this research proposes a combination of Neuro Named Entity Recognition (NeuroNER) and Recurrent Convolutional Neural Network (RCNN) as an effective classification method for incident detection. First, the system performs named entity recognition to identify the location contained in the tweet text, because incident information should have at least one location entity. Then, if a location is successfully identified, the incident is classified using RCNN. Experimental results show that the incident detection system using the combination of NeuroNER and RCNN works very well, with average precision, recall, and f-measure of 92.44%, 94.76%, and 93.53%, respectively.
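The two-stage flow described in the abstract (a location-entity gate followed by incident-type classification) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function `classify_incident`, the callable interfaces `Tagger` and `Classifier`, the `"LOC"` label, and the toy stand-ins are all assumptions introduced here. In practice the tagger would be a trained NeuroNER model and the classifier a trained RCNN.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces standing in for a trained NeuroNER tagger
# and a trained RCNN classifier; neither is the real library API.
Entity = Tuple[str, str]                 # (text span, entity label)
Tagger = Callable[[str], List[Entity]]
Classifier = Callable[[str], str]


def classify_incident(tweet: str, tag_entities: Tagger, classify: Classifier,
                      location_label: str = "LOC") -> Optional[str]:
    """Return an incident type, or None when no location entity is found."""
    entities = tag_entities(tweet)
    has_location = any(label == location_label for _, label in entities)
    if not has_location:
        return None                      # stage 1 gate: tweet is not classified
    return classify(tweet)               # stage 2: incident-type classification


# Toy stand-ins so the sketch runs end to end; they are NOT the trained models.
def toy_tagger(tweet: str) -> List[Entity]:
    known_places = {"Surabaya", "Jakarta"}
    words = [w.strip(".,") for w in tweet.split()]
    return [(w, "LOC") for w in words if w in known_places]


def toy_classifier(tweet: str) -> str:
    return "traffic" if "jam" in tweet.lower() else "other"


if __name__ == "__main__":
    print(classify_incident("Traffic jam in Surabaya.", toy_tagger, toy_classifier))
    print(classify_incident("Traffic jam again today", toy_tagger, toy_classifier))
```

Passing the tagger and classifier in as callables keeps the gating logic independent of either model, so each stage could be swapped or evaluated separately.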


Keywords


incident detection; information extraction; NeuroNER; RCNN

DOI: https://doi.org/10.26594/register.v4i2.1242

Creative Commons License
Register: Jurnal Ilmiah Teknologi Sistem Informasi is licensed under a Creative Commons Attribution 4.0 International License.