Community detection in Twitter based on tweets similarities in Indonesian using cosine similarity and Louvain algorithms

Akhmad Irsyad(1*), Nur Aini Rakhmawati(2)

(1) Institut Teknologi Sepuluh Nopember, Surabaya
(2) Institut Teknologi Sepuluh Nopember, Surabaya
(*) Corresponding Author

Abstract


Twitter is now considered one of the fastest and most popular communication media and is often used to follow current events and news. Many tweets contain semantically identical information, and people following an event or news story often tweet about it in groups. A technique for grouping users by the similarity of their tweets is therefore useful. In this study, cosine similarity is used to measure the similarity of tweets between accounts, and a graph-based approach is proposed to detect communities. A graph is first constructed from the similarities between tweets, and a community detection technique is then applied to the graph to group accounts that have similar tweets. These two methods were chosen because, compared with other methods, cosine similarity gives higher accuracy and the Louvain algorithm yields better modularity. The study concludes that cosine similarity and the Louvain algorithm can be used for community detection on social media.
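
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the account names, sample tweets, whitespace tokenization, raw term counts, and the 0.5 edge threshold are all assumptions made for illustration. The sketch computes cosine similarity between accounts, keeps sufficiently similar pairs as weighted edges, and evaluates modularity, the quantity that the Louvain algorithm greedily maximizes, for a hand-chosen partition instead of running Louvain itself.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_graph(tweets, threshold):
    """Return {(u, v): similarity} for account pairs at or above threshold."""
    # Assumption: lowercase + whitespace split stands in for real preprocessing.
    vectors = {acc: Counter(text.lower().split()) for acc, text in tweets.items()}
    edges = {}
    for u, v in combinations(sorted(vectors), 2):
        s = cosine_similarity(vectors[u], vectors[v])
        if s >= threshold:
            edges[(u, v)] = s
    return edges


def modularity(edges, community_of):
    """Weighted modularity Q = sum_c [ w_in(c)/m - (deg(c)/(2m))^2 ]."""
    m = sum(edges.values())
    if m == 0:
        return 0.0
    degree = Counter()    # weighted degree of each node
    internal = Counter()  # total edge weight inside each community
    for (u, v), w in edges.items():
        degree[u] += w
        degree[v] += w
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += w
    total = Counter()     # summed degree of each community
    for node, d in degree.items():
        total[community_of[node]] += d
    return sum(internal[c] / m - (total[c] / (2 * m)) ** 2 for c in total)


# Made-up sample data: one short tweet per hypothetical account.
tweets = {
    "akun_a": "banjir jakarta hari ini",
    "akun_b": "banjir jakarta parah hari ini",
    "akun_c": "harga cabai naik lagi",
    "akun_d": "harga cabai naik di pasar",
}

edges = build_graph(tweets, threshold=0.5)

# Two candidate partitions: split by topic versus everything in one group.
split = {"akun_a": 0, "akun_b": 0, "akun_c": 1, "akun_d": 1}
merged = {acc: 0 for acc in tweets}
q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
print(sorted(edges), round(q_split, 3), round(q_merged, 3))
```

On this toy data the topic-based partition scores a higher modularity than the single-group partition, which is the kind of improvement Louvain searches for by repeatedly moving nodes between communities. A real experiment would add Indonesian-specific preprocessing and use a full Louvain implementation from a graph library.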

Keywords


community detection; Louvain algorithm; social network; text similarity; Twitter

DOI: https://doi.org/10.26594/register.v6i1.1595

Creative Commons License
Register: Jurnal Ilmiah Teknologi Sistem Informasi is licensed under a Creative Commons Attribution 4.0 International License.