For any educational project, it is both important and difficult to know, at the moment of enrollment, whether a given student is likely to pass the academic year. The task is far from simple, because many factors contribute to college failure. Inferring how likely an enrolled student is to have promotion problems is therefore an interesting challenge for both data mining and education. In this paper, we propose the use of data mining techniques to predict how likely a student is to succeed in the academic year. Normally, more students succeed than fail, which results in an imbalanced data representation. To cope with imbalanced data, we introduce a new algorithm based on probabilistic Rough Set Theory (RST). It rests on two ideas. The first is the use of two different threshold values for the similarity between objects, one for minority examples and one for majority examples. The second combines the original distribution of the data with the probabilities predicted by the RST method. Our experimental analysis shows that the proposed method obtains better results than a range of state-of-the-art algorithms.
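The two ideas in this abstract can be illustrated with a minimal sketch. It is not the algorithm of the paper: the similarity measure, the threshold names `t_min` and `t_maj`, the mixing weight `alpha`, and the linear blend with the class prior are all assumptions made for illustration only.

```python
from math import sqrt


def similarity(a, b):
    """Assumed similarity in (0, 1]: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + sqrt(sum((x - y) ** 2 for x, y in zip(a, b))))


def minority_probability(query, X, y, minority_label,
                         t_min=0.3, t_maj=0.6, alpha=0.7):
    """Estimate the probability that `query` belongs to the minority class.

    Idea 1 (assumed form): a training object joins the similarity class of
    `query` if its similarity reaches a class-specific threshold, with a
    looser threshold for minority objects than for majority objects.
    Idea 2 (assumed form): the resulting rough membership is blended
    linearly with the original class distribution (the prior).
    """
    prior = sum(1 for label in y if label == minority_label) / len(y)

    hits_min = hits_maj = 0
    for features, label in zip(X, y):
        s = similarity(query, features)
        if label == minority_label:
            if s >= t_min:
                hits_min += 1
        elif s >= t_maj:
            hits_maj += 1

    total = hits_min + hits_maj
    if total == 0:
        # No object is similar enough: fall back to the prior alone.
        return prior

    rough_membership = hits_min / total
    return alpha * rough_membership + (1.0 - alpha) * prior


# Toy data: four majority examples (label 0) and one minority example (label 1).
X = [[0.0], [0.1], [0.2], [0.3], [2.0]]
y = [0, 0, 0, 0, 1]
p_near_minority = minority_probability([1.5], X, y, minority_label=1)
p_near_majority = minority_probability([0.1], X, y, minority_label=1)
```

With a lower threshold for the minority class, a minority example can enter the similarity class of a query from farther away than a majority example can, which is one plausible way to counteract the imbalance. The weight `alpha` controls how strongly the prior pulls the estimate back toward the original class distribution.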
Imbalanced classification is one of the most challenging research problems in machine learning. Techniques for two-class imbalanced classification are now relatively mature, yet multi-class imbalance learning remains an open problem. Moreover, the community lacks a suitable software tool that integrates the major works in the field. In this paper, we present Multi-Imbalance, an open source software package for multi-class imbalanced data classification. It provides users with seven different categories of multi-class imbalance learning algorithms, including the latest advances in the field. The source code and documentation for Multi-Imbalance are publicly available at https://github.com/chongshengzhang/Multi_Imbalance.