Comparative analysis of the Naïve Bayes, k-Nearest Neighbor and Neural Network algorithms for the class-imbalanced data problem in a credit card fraud dataset

Authors

  • Mita Yuanika Sahroni, Universitas Islam Negeri Sunan Ampel
  • Niken Ayu Setifani, Universitas Islam Negeri Sunan Ampel
  • Devinta Nurul Fitriana, Universitas Islam Negeri Sunan Ampel

DOI:

https://doi.org/10.26594/teknologi.v11i2.2393

Abstract

High public interest in credit card transactions in the banking sector raises the risk of credit card fraud. This study uses a credit card fraud dataset of 284,807 records obtained from Kaggle. The dataset is class-imbalanced: the majority class accounts for 99.8% of the records and the minority class for only 0.2%. The imbalance is addressed by applying undersampling. To determine which classification algorithm is best suited to class-imbalanced data, the Naïve Bayes, k-Nearest Neighbor (k-NN) and Neural Network algorithms are compared. Performance is evaluated using accuracy and AUC (area under the curve), and a t-test is used to determine whether the differences between algorithms are statistically significant. The results show that the Neural Network performs better than the other algorithms, with the highest accuracy (93.59%) and an AUC of 0.977. According to the t-test, the difference between the Neural Network and k-NN is significant, whereas the difference between the Neural Network and Naïve Bayes is not.
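The evaluation pipeline described in the abstract (balance the classes by undersampling, score each classifier by accuracy and AUC, then compare classifiers with a t-test) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the tool, the undersampling variant or the t-test variant, so plain random undersampling and a paired t-test on per-fold scores are assumed here, and all function names are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev


def random_undersample(X, y, seed=0):
    """Randomly keep as many rows of each class as the smallest class has.
    Assumption: plain random undersampling; the abstract does not say
    which undersampling variant the study used."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Sample without replacement, so no row is duplicated.
        for row in rng.sample(rows, n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC in its pairwise (Mann-Whitney) form: the probability that a
    random positive (label 1) scores higher than a random negative (label 0),
    with ties counted as one half. O(n_pos * n_neg), fine for a sketch."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def paired_t_statistic(a, b):
    """t statistic of a paired t-test on per-fold scores of two models.
    Assumption: the study compared models fold by fold. Compare |t| with the
    two-sided critical value for len(a) - 1 degrees of freedom (about 2.262
    for 10 folds at alpha = 0.05)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


if __name__ == "__main__":
    # Toy data: 8 legitimate (0) and 2 fraudulent (1) transactions.
    X = [[i] for i in range(10)]
    y = [0] * 8 + [1] * 2
    X_bal, y_bal = random_undersample(X, y)
    print(sorted(y_bal))  # [0, 0, 1, 1]
    print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Accuracy alone is misleading on the raw data: a model that labels every transaction as legitimate would already score about 99.8%, which is why balancing the classes and reporting AUC alongside accuracy matters here.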

Author Biographies

Mita Yuanika Sahroni, Universitas Islam Negeri Sunan Ampel

Information Systems

Niken Ayu Setifani, Universitas Islam Negeri Sunan Ampel

Information Systems

Devinta Nurul Fitriana, Universitas Islam Negeri Sunan Ampel

Information Systems

Published

2021-06-09

Section

Articles