Twitter Bot Spammer Detection Based on Time Interval Entropy and Global Vectors for Word Representations of Tweet Hashtags
DOI: https://doi.org/10.26594/register.v5i1.1382
Keywords: bot spammer, CNN, GloVe, hashtag, Twitter
Abstract
Bot spammers misuse Twitter accounts to spread spam messages at the will of the account owner. Their goal is to push a topic they want to promote into the trending topics. This study proposes detecting bot spammers on Twitter based on Time Interval Entropy and Global Vectors for Word Representations (GloVe). Time Interval Entropy is used to classify bot accounts from the time series of their tweet creation times. GloVe captures the co-occurrence of words in tweets that carry hashtags, and the resulting representations are classified with a Convolutional Neural Network (CNN). The study uses Twitter API data from 18 bot accounts and 14 legitimate accounts, with 1,000 tweets per account. The best recall, precision, and F-measure obtained were 100%, 100%, and 100%, respectively. This shows that GloVe and Time Interval Entropy detect bot spammers very well, and that hashtags help improve bot spammer detection.
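The abstract does not say how the time-interval entropy is computed. One common reading is the Shannon entropy of the gaps between an account's consecutive tweets. Regularly scheduled, automated posting puts almost every gap in the same bin, which gives low entropy. The sketch below assumes that reading. The bin width (`bin_width`), the base-2 logarithm, and the decision rule `looks_automated` with its `threshold` are hypothetical choices, not values taken from the paper.

```python
import math
from collections import Counter
from typing import Sequence


def time_interval_entropy(timestamps: Sequence[float], bin_width: float = 60.0) -> float:
    """Shannon entropy (bits) of the binned gaps between consecutive tweets.

    timestamps: tweet creation times in seconds (e.g. Unix epoch), any order.
    bin_width: assumed bin size in seconds used to discretize the gaps.
    """
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        return 0.0  # no interval to measure
    # Gap between each pair of consecutive tweets, mapped to an integer bin.
    bins = [int((later - earlier) // bin_width) for earlier, later in zip(ordered, ordered[1:])]
    counts = Counter(bins)
    total = len(bins)
    # H = sum over bins of p * log2(1 / p), with p the empirical bin frequency.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def looks_automated(timestamps: Sequence[float], threshold: float = 1.0,
                    bin_width: float = 60.0) -> bool:
    """Hypothetical rule: flag accounts whose interval entropy is below threshold."""
    return time_interval_entropy(timestamps, bin_width) < threshold
```

As a usage example, an account that posts exactly every 300 seconds (`[0, 300, 600, 900, 1200]`) has all of its gaps in one bin, so its entropy is `0.0`. The irregular sequence `[0, 45, 400, 460, 3000]` has four gaps that fall into four different one-minute bins, so its entropy is `2.0`. In practice the threshold would be tuned on labeled bot and legitimate accounts.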
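GloVe learns word vectors by factorizing a word-word co-occurrence matrix. In the original GloVe formulation, two words that appear d tokens apart inside a context window add 1/d to each other's count. The sketch below builds that matrix from tokenized tweets and keeps hashtags as ordinary tokens, so words end up co-occurring with the hashtags they accompany. The function name `cooccurrence_counts` and the default `window=5` are assumptions made for illustration. Fitting the GloVe vectors to these counts and the CNN classifier are not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def cooccurrence_counts(tweets: List[List[str]],
                        window: int = 5) -> Dict[Tuple[str, str], float]:
    """GloVe-style co-occurrence counts X[(w_i, w_j)] over tokenized tweets.

    Two tokens d positions apart (1 <= d <= window) in the same tweet each
    add 1/d to the other's count. Hashtags such as "#promo" are kept as
    regular tokens, so they co-occur with the words around them.
    """
    counts: Dict[Tuple[str, str], float] = defaultdict(float)
    for tokens in tweets:
        for i, center in enumerate(tokens):
            # Scan only to the right; the symmetric update covers the left side.
            for d in range(1, window + 1):
                j = i + d
                if j >= len(tokens):
                    break
                counts[(center, tokens[j])] += 1.0 / d
                counts[(tokens[j], center)] += 1.0 / d
    return dict(counts)
```

For the single tweet `["free", "followers", "#trending"]`, the adjacent pairs get a count of 1.0 in both directions. `("free", "#trending")` is two positions apart and gets 0.5. With `window=1`, that pair does not appear at all.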
License
Copyright (c) 2019 Register: Jurnal Ilmiah Teknologi Sistem Informasi
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The rights and licenses of Register: Jurnal Ilmiah Teknologi Sistem Informasi are set out below. By submitting the article/manuscript, the author(s) agree to this policy. No specific document sign-off is required.
1. License
The non-commercial use of the article is governed by the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
2. Author(s)' Warranties
The author warrants that the article is original, written by the stated author(s), has not been published before, contains no unlawful statements, does not infringe the rights of others, is subject to copyright that is vested exclusively in the author and free of any third-party rights, and that any necessary written permissions to quote from other sources have been obtained by the author(s).
3. User/Public Rights
Register's aim is to disseminate published articles as freely as possible. Under the Creative Commons license, Register permits users to copy, distribute, display, and perform the work for non-commercial purposes only. Users must also attribute the authors and Register when distributing works from the journal in other publication media. Unless otherwise stated, the authors are public entities as soon as their articles are published.
4. Rights of Authors
Authors retain all their rights to the published works, such as (but not limited to) the following rights:
Copyright and other proprietary rights relating to the article, such as patent rights,
The right to use the substance of the article in own future works, including lectures and books,
The right to reproduce the article for own purposes,
The right to self-archive the article (please read our deposit policy),
The right to enter into separate, additional contractual arrangements for the non-exclusive distribution of the article's published version (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal (Register: Jurnal Ilmiah Teknologi Sistem Informasi).
5. Co-Authorship
If the article was jointly prepared by more than one author, the author submitting the manuscript warrants that he/she has been authorized by all co-authors to agree to this copyright and license notice (agreement) on their behalf, and agrees to inform his/her co-authors of the terms of this policy. Register will not be held liable for anything that may arise from the author(s)' internal disputes. Register will only communicate with the corresponding author.
6. Royalties
Because Register is an open-access journal that disseminates articles free of charge under the Creative Commons license terms mentioned above, the author(s) acknowledge that Register entitles them to no royalties or other fees.
7. Miscellaneous
Register will publish the article (or have it published) in the journal if the article's editorial process is successfully completed. Register's editors may modify the article to conform to the style of punctuation, spelling, capitalization, referencing, and usage that they deem appropriate. The author acknowledges that the article may be published so that it is publicly accessible, and that such access will be free of charge for readers, as mentioned in point 3.