Community detection in Twitter based on tweet similarities in Indonesian using cosine similarity and Louvain algorithms

Authors

  • Akhmad Irsyad, Institut Teknologi Sepuluh Nopember, Surabaya
  • Nur Aini Rakhmawati, Institut Teknologi Sepuluh Nopember, Surabaya

DOI:

https://doi.org/10.26594/register.v6i1.1595

Keywords:

community detection, Louvain algorithm, social network, text similarity, Twitter

Abstract

Twitter is now considered one of the fastest and most popular communication media and is often used to track current events and news. Many tweets contain semantically identical information, and people who follow the same activity or news story sometimes tweet about it in groups. A technique for grouping users based on the similarity of their tweets is therefore needed. In this study, the cosine similarity method is used to measure the similarity of tweets between accounts, and a graph-based approach is proposed to detect communities. A graph is first constructed from the similarities between tweets, and a community detection technique is then applied to the graph to group accounts that have similar tweets. These two methods were chosen because, compared with other methods, cosine similarity gives higher accuracy and the Louvain algorithm yields better modularity. The study concludes that cosine similarity and the Louvain algorithm can be used for community detection on social media.
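
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one tweet per account, lowercase whitespace tokenization with raw term counts (no stemming or other preprocessing), and a hypothetical similarity threshold of 0.5. The account names and tweets are invented for illustration. For brevity, only the first phase of the Louvain algorithm (greedy local moving of nodes) is shown; the community aggregation phase that the full algorithm repeats is omitted.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(tweets, threshold):
    """Weighted undirected graph: accounts are nodes, and an edge joins
    two accounts whose tweets have cosine similarity >= threshold."""
    vectors = {user: Counter(text.lower().split()) for user, text in tweets.items()}
    graph = {user: {} for user in tweets}
    for u, v in combinations(tweets, 2):
        sim = cosine_similarity(vectors[u], vectors[v])
        if sim >= threshold:
            graph[u][v] = graph[v][u] = sim
    return graph


def local_moving(graph):
    """First phase of Louvain only: repeatedly move each node to the
    neighbouring community with the largest modularity gain."""
    degree = {n: sum(nbrs.values()) for n, nbrs in graph.items()}
    two_m = sum(degree.values())      # twice the total edge weight
    community = {n: n for n in graph}  # every node starts alone
    total = dict(degree)               # summed degree per community
    if two_m == 0:
        return community
    improved = True
    while improved:
        improved = False
        for node in graph:
            current = community[node]
            total[current] -= degree[node]  # take the node out of its community
            links = Counter()               # edge weight from node to each community
            for nbr, weight in graph[node].items():
                links[community[nbr]] += weight
            # Gain is proportional to k_in - sigma_tot * k_i / (2m).
            best = current
            best_gain = links[current] - total[current] * degree[node] / two_m
            for com, weight in links.items():
                gain = weight - total[com] * degree[node] / two_m
                if gain > best_gain:
                    best, best_gain = com, gain
            total[best] += degree[node]
            if best != current:
                community[node] = best
                improved = True
    return community


# Hypothetical accounts and tweets, invented for this example.
tweets = {
    "akun_a": "banjir jakarta hari ini parah",
    "akun_b": "banjir jakarta hari ini makin parah",
    "akun_c": "timnas indonesia menang besar",
    "akun_d": "timnas indonesia menang besar malam ini",
}
graph = build_graph(tweets, threshold=0.5)
communities = local_moving(graph)
print(communities)
```

With these example tweets, the two accounts tweeting about the flood fall into one community and the two accounts tweeting about the national team into another. Each community is labelled by one of its member accounts.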

Author Biographies

Akhmad Irsyad, Institut Teknologi Sepuluh Nopember, Surabaya

Information System Department

Nur Aini Rakhmawati, Institut Teknologi Sepuluh Nopember, Surabaya

Information System Department

Published

2020-01-01

How to Cite

[1]
A. Irsyad and N. A. Rakhmawati, “Community detection in twitter based on tweets similarities in indonesian using cosine similarity and louvain algorithms”, regist. j. ilm. teknol. sist. inf., vol. 6, no. 1, pp. 22–31, Jan. 2020.

Section

Article