K Nearest Neighbor OveRsampling Approach: An Open Source Python Package for Data Augmentation
Published in Software Impacts, 2022
This paper presents the K Nearest Neighbor OveRsampling (KNNOR) algorithm, a data augmentation technique for imbalanced datasets. Implemented as an open-source Python package, KNNOR takes the distribution of the data into account and uses the k nearest neighbors to generate artificial data points intended to improve classifier accuracy. The algorithm also identifies critical data points so that the augmented data does not introduce noise or outliers.
The paper reports that KNNOR outperforms state-of-the-art augmentation algorithms, particularly on health datasets, where class imbalance is common. It also applies to other domains, including lower-dimensional image datasets, which makes it useful to data scientists and researchers working with imbalanced data.
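To make the neighbor-based generation step concrete, here is a minimal sketch of the general idea: pick a minority-class point, pick one of its k nearest minority neighbors, and place a synthetic point on the segment between them. This is a generic SMOTE-style illustration, not the KNNOR algorithm or the package's API. In particular, it omits KNNOR's selection of critical points, and the function name `knn_oversample` and its parameters are hypothetical.

```python
import math
import random


def knn_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbors.

    Hypothetical helper for illustration only; not part of the KNNOR package.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbor.
        t = rng.random()
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Toy minority class with four 2-D points (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = knn_oversample(minority, n_new=5, k=2)
print(len(new_points))
```

Because each synthetic point is a convex combination of two existing minority points, it stays inside the region the minority class already occupies. A method that screens out unreliable base points first, as the abstract describes for KNNOR, would add a filtering step before the interpolation loop.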
Recommended citation: Islam, A., Belhaouari, S. B., Rehman, A. U., & Bensmail, H. (2022). "K Nearest Neighbor OveRsampling Approach: An Open Source Python Package for Data Augmentation." Software Impacts, 12, 100272. https://doi.org/10.1016/j.simpa.2022.100272