1. Introduction
Classification is, at its core, a deceptively simple task: assign an unseen instance to one of several predefined categories based on patterns learned from labeled training data. In carefully controlled settings — balanced benchmarks with clean features — most modern algorithms perform reasonably well. But real-world data rarely cooperates. Datasets in domains such as credit fraud, rare disease detection, network intrusion, and mechanical fault prediction are almost always skewed, sometimes dramatically so: the class of primary interest may represent fewer than one in forty instances (Farid et al., 2016; Farid et al., 2014; Farid et al., 2013). Under these conditions, standard classifiers tend to do the safe thing — predict the majority class almost exclusively — and still report impressive overall accuracy. The minority class, which often carries the most practical significance, is quietly ignored.
Two broad categories of solutions have emerged in response to this problem: internal methods, which modify the learning algorithm itself to be less sensitive to distributional skew, and external methods, which adjust the dataset prior to training. Among external approaches, sampling-based techniques are perhaps the most widely adopted. Under-sampling methods reduce majority class instances either randomly or through heuristic selection — approaches such as the neighborhood cleaning rule (Laurikkala, 2001), near miss (Mani & Zhang, 2003), and one-sided selection (Kubat & Matwin, 1997) are well-established examples. Over-sampling works in the opposite direction, generating additional minority instances; SMOTE (Chawla et al., 2002) and AdaSyn (He & Garcia, 2009) remain among the most cited techniques in this space. Neither approach is without cost: under-sampling risks discarding genuinely informative majority-class examples, while over-sampling can lead to overfitting when synthetic minority instances too closely cluster around existing ones (Sun et al., 2015).
The challenge is compounded further when features interact in complex ways. Traditional classifiers — k-nearest neighbors, decision trees, support vector machines, random forests (Farid et al., 2014; Cortes & Vapnik, 1995; Liaw & Wiener, 2002) — optimize for global accuracy, meaning they are structurally blind to the asymmetric cost of misclassifying minority instances. Cost-sensitive learning attempts to address this by assigning differential penalties to errors across classes, but calibrating appropriate misclassification weights is non-trivial and often requires domain-specific knowledge that may not be available in practice.
Ensemble methods — and boosting in particular — have attracted considerable attention as an alternative path. The core idea in boosting is to adaptively reweight training instances, forcing successive weak learners to concentrate on the examples that earlier learners found most difficult (Freund et al., 1996). This adaptive focus turns out to be surprisingly well-suited to imbalanced problems, even though boosting was not originally designed with class imbalance in mind. Building on this observation, a number of hybrid approaches have been developed: RUSBoost integrates random under-sampling with AdaBoost (Seiffert et al., 2010); SMOTEBoost couples AdaBoost with SMOTE-based over-sampling (Chawla et al., 2003); EUSBoost incorporates evolutionary under-sampling (Galar et al., 2013); DataBoost-IM generates synthetic difficult instances during training (Guo & Viktor, 2004); and Easy Ensemble creates multiple balanced subsets via repeated random under-sampling (Liu et al., 2009). Each of these approaches extends AdaBoost in a different direction, and each carries its own limitations.
One aspect of boosting design that has received comparatively less attention is the choice of base learner itself — and more specifically, whether using a single fixed weak learner throughout training is actually optimal. De Souza and Matwin (2011) explored the use of multiple alternating estimators within AdaBoost, finding that the diversity introduced by varying the base classifier could benefit ensemble performance. However, their implementation included computationally intensive models such as Random Forests, SVMs, and neural networks, making the overall framework substantially more expensive to train.
This paper proposes KMFusionNet, a hybrid boosting framework that addresses this gap directly. Rather than relying on a single weak estimator, KMFusionNet alternates between two structurally complementary tree-based classifiers — the C4.5 Decision Tree (Quinlan, 1996) and the Extra Tree (Geurts et al., 2006) — within an AdaBoost-style architecture. The alternating strategy is intended to exploit the well-documented instability of tree-based learners: because small perturbations to training data can produce substantially different tree structures, combining two tree algorithms that partition feature space differently should increase hypothesis diversity without requiring computationally expensive models. The framework is further stabilized through an early stopping criterion based on validation auROC (Bühlmann & Yu, 2003; Jiang, 2004), which prevents unnecessary estimator accumulation and guards against overfitting.
The proposed method was benchmarked against AdaBoost, RUSBoost, SMOTEBoost, EUSBoost, DataBoost, and Easy Ensemble across 12 imbalanced datasets from the KEEL repository (Alcalá-Fdez et al., 2011), spanning imbalance ratios from 1.87 to 41.03. Evaluation relied on auROC, a threshold-independent metric that is particularly informative under class imbalance (He & Garcia, 2009). Experimental results indicate that KMFusionNet achieves superior or competitive auROC performance on most benchmark datasets — with the most pronounced advantages at higher imbalance ratios — while remaining computationally more tractable than approaches that incorporate complex base learners.
The remainder of this article is organized as follows. Section 2 reviews relevant prior work on imbalanced learning, boosting hybrids, and ensemble diversity. Section 3 describes the KMFusionNet algorithm in detail. Section 4 presents the experimental setup, datasets, evaluation protocol, and comparative results. Section 5 discusses findings in the broader context of the literature. Section 6 concludes with a summary and directions for future work.