The Application of Modified K-Nearest Neighbor Algorithm for Classification of Groundwater Quality Based on Image Processing and pH, TDS, and Temperature Sensors


  • Hasna Shafa Amalia, Institut Teknologi Telkom Purwokerto
  • Ummi Athiyah, Institut Teknologi Telkom Purwokerto
  • Arif Wirawan Muhammad, Universiti Tun Hussein Onn Malaysia



image processing, Modified K-Nearest Neighbor, classification, groundwater


Because water is scarce in remote areas, rural communities tend to pay little attention to the quality of the water they use. This study analyzes groundwater quality with the Modified K-Nearest Neighbor (MKNN) algorithm to help minimize exposure to disease. The data consisted of images combined with readings from pH (potential of hydrogen), TDS (total dissolved solids), and temperature sensors. Classes were assigned by weight voting, with the class holding the highest weight taken as the prediction, and the model was evaluated using K-fold cross-validation and a multiclass confusion matrix. The highest accuracy, 78%, was obtained at 2, 9, and 10 folds. In a separate test of the effect of the number of neighbors K, the highest accuracy was 67.90% at K = 5, with a precision of 0.32, a recall of 0.37, and an F1-score of 0.33. The test results indicate that most of the water samples are suitable for use.
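
The abstract does not spell out the MKNN computation, so the following is only a minimal sketch of one commonly described formulation: each training sample gets a validity score (the share of its h nearest training neighbors with the same label), and each of the query's K nearest neighbors votes with weight validity / (distance + alpha). The function names, the smoothing constant alpha = 0.5, the class labels, and the toy pH, TDS, and temperature readings are all assumptions made for illustration; they are not the study's data, and the image features used in the study are omitted here.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def validity(train_X, train_y, h):
    """For each training sample, the fraction of its h nearest
    training neighbors (itself excluded) that share its label."""
    scores = []
    for i, (xi, yi) in enumerate(zip(train_X, train_y)):
        others = [
            (euclidean(xi, xj), yj)
            for j, (xj, yj) in enumerate(zip(train_X, train_y))
            if j != i
        ]
        others.sort(key=lambda pair: pair[0])
        nearest = others[:h]
        same = sum(1 for _, yj in nearest if yj == yi)
        scores.append(same / len(nearest))
    return scores


def mknn_predict(train_X, train_y, query, k, h=None, alpha=0.5):
    """Predict a label by weight voting over the k nearest neighbors.
    alpha is an assumed smoothing constant that avoids division by zero."""
    h = k if h is None else h
    val = validity(train_X, train_y, h)
    ranked = sorted(range(len(train_X)),
                    key=lambda i: euclidean(query, train_X[i]))
    votes = defaultdict(float)
    for i in ranked[:k]:
        dist = euclidean(query, train_X[i])
        votes[train_y[i]] += val[i] / (dist + alpha)
    # The class with the largest summed weight wins.
    return max(votes, key=votes.get)


# Hypothetical readings: (pH, TDS in ppm, temperature in degrees Celsius).
train_X = [
    (7.0, 150.0, 25.0), (7.2, 160.0, 26.0), (6.9, 155.0, 25.5),
    (5.0, 900.0, 30.0), (4.8, 950.0, 31.0), (5.2, 920.0, 29.5),
]
train_y = ["suitable"] * 3 + ["unsuitable"] * 3

print(mknn_predict(train_X, train_y, (7.1, 158.0, 25.5), k=3))
print(mknn_predict(train_X, train_y, (5.0, 910.0, 30.0), k=3))
```

In this toy data the TDS values dominate the distance because the features are left unscaled; a real pipeline would likely normalize the features first, which is a design choice the abstract does not specify.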

Author Biographies

Hasna Shafa Amalia, Institut Teknologi Telkom Purwokerto

Department of Informatics Engineering

Ummi Athiyah, Institut Teknologi Telkom Purwokerto

Department of Data Science

Arif Wirawan Muhammad, Universiti Tun Hussein Onn Malaysia

Department of Information Security and Web Technology






How to Cite

H. S. Amalia, U. Athiyah, and A. W. Muhammad, “The Application of Modified K-Nearest Neighbor Algorithm for Classification of Groundwater Quality Based on Image Processing and pH, TDS, and Temperature Sensors”, Register: Jurnal Ilmiah Teknologi Sistem Informasi, vol. 9, no. 1, pp. 42–54, Mar. 2023.