Community detection in Twitter based on tweets similarities in Indonesian using cosine similarity and Louvain algorithms
DOI:
https://doi.org/10.26594/register.v6i1.1595
Keywords:
community detection, Louvain algorithm, social network, text similarity, Twitter
Abstract
Twitter is now considered one of the fastest and most popular communication media and is often used to track current events or news. Many tweets contain semantically identical information, and people who follow an event or news story often tweet about it in groups. A technique for grouping users based on the similarity of their tweets is therefore useful. In this study, cosine similarity is used to measure the similarity of tweets between accounts, and a graph-based approach is proposed to detect communities. A graph is first constructed from the similarities between tweets, and a community detection technique is then applied to the graph to group accounts that have similar tweets. These two methods were chosen because, compared with other methods, cosine similarity offers higher accuracy and the Louvain algorithm can yield better modularity. The study concludes that cosine similarity and the Louvain algorithm can be used for community detection on social media.
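The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the account names, the sample tweets, the raw term-frequency weighting, and the 0.5 edge threshold are all assumptions made for illustration. The sketch computes cosine similarity between accounts, keeps sufficiently similar pairs as weighted graph edges, and evaluates the modularity of a candidate partition, which is the quantity the Louvain algorithm greedily maximizes. The Louvain optimization itself is not implemented here.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(text_a, text_b):
    """Cosine similarity of two texts using raw term-frequency vectors (assumed weighting)."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_similarity_graph(tweets_by_account, threshold=0.5):
    """Return weighted edges {(u, v): weight} for account pairs at or above the threshold."""
    edges = {}
    for u, v in combinations(sorted(tweets_by_account), 2):
        weight = cosine_similarity(tweets_by_account[u], tweets_by_account[v])
        if weight >= threshold:
            edges[(u, v)] = weight
    return edges


def modularity(edges, community_of):
    """Weighted modularity Q of a partition: sum over communities of L_c/m - (d_c/2m)^2."""
    total_weight = sum(edges.values())  # m, the total edge weight
    if total_weight == 0:
        return 0.0
    internal = Counter()  # L_c: weight of edges inside each community
    degree = Counter()    # d_c: summed weighted degree of each community
    for (u, v), w in edges.items():
        degree[community_of[u]] += w
        degree[community_of[v]] += w
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += w
    return sum(
        internal[c] / total_weight - (degree[c] / (2 * total_weight)) ** 2
        for c in degree
    )


# Hypothetical accounts and tweets, invented for this example.
tweets = {
    "akun_a": "banjir jakarta hari ini",
    "akun_b": "banjir jakarta pagi ini",
    "akun_c": "timnas menang tadi malam",
    "akun_d": "timnas menang besar malam ini",
}
edges = build_similarity_graph(tweets, threshold=0.5)

# Two candidate partitions: grouped by topic versus everything in one community.
by_topic = {"akun_a": 0, "akun_b": 0, "akun_c": 1, "akun_d": 1}
single = {account: 0 for account in tweets}
print(edges)
print(modularity(edges, by_topic), modularity(edges, single))
```

In this toy data the topic-based partition scores a higher modularity than the single-community partition, which is the kind of improvement Louvain searches for by moving nodes between communities. With the third-party networkx package (version 2.8 or later), `networkx.algorithms.community.louvain_communities` could be applied to the same weighted graph to perform that search.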
License
Please find the rights and licenses in Register: Jurnal Ilmiah Teknologi Sistem Informasi. By submitting the article/manuscript of the article, the author(s) agree with this policy. No specific document sign-off is required.
1. License
The non-commercial use of the article is governed by the Creative Commons license currently displayed, the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
2. Author(s)' Warranties
The author warrants that the article is original, was written by the stated author(s), has not been published before, contains no unlawful statements, does not infringe the rights of others, is subject to copyright that is vested exclusively in the author and free of any third-party rights, and that any necessary written permissions to quote from other sources have been obtained by the author(s).
3. User/Public Rights
Register's spirit is to disseminate published articles as freely as possible. Under the Creative Commons license, Register permits users to copy, distribute, display, and perform the work for non-commercial purposes only. Users must also attribute the authors and Register when distributing works in the journal and other publication media. Unless otherwise stated, the authors are public entities as soon as their articles are published.
4. Rights of Authors
Authors retain all their rights to the published works, such as (but not limited to) the following rights:
Copyright and other proprietary rights relating to the article, such as patent rights,
The right to use the substance of the article in own future works, including lectures and books,
The right to reproduce the article for own purposes,
The right to self-archive the article (please read our deposit policy),
The right to enter into separate, additional contractual arrangements for the non-exclusive distribution of the article's published version (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal (Register: Jurnal Ilmiah Teknologi Sistem Informasi).
5. Co-Authorship
If the article was jointly prepared by more than one author, the author submitting the manuscript warrants that he/she has been authorized by all co-authors to agree to this copyright and license notice (agreement) on their behalf, and agrees to inform his/her co-authors of the terms of this policy. Register will not be held liable for anything that may arise from an internal dispute among the author(s). Register will only communicate with the corresponding author.
6. Royalties
Because Register is an open-access journal that disseminates articles for free under the Creative Commons license terms mentioned above, the author(s) are aware that Register entitles the author(s) to no royalties or other fees.
7. Miscellaneous
Register will publish the article (or have it published) in the journal if the article’s editorial process is successfully completed. Register's editors may modify the article to a style of punctuation, spelling, capitalization, referencing, and usage that they deem appropriate. The author acknowledges that the article may be published so that it will be publicly accessible, and that such access will be free of charge for readers, as mentioned in point 3.