Data Modeling

Mathematical and Computational Data Modeling
RESEARCH ARTICLE   (Open Access)

Early Prediction of Postprandial Glycemic Response in Gestational Diabetes Using Continuous Glucose Monitoring and Gradient Boosting Models

Akib Ullah Jafor 1*, Pronati Das Puja 2, Shawanti Biswas 3, Arko Saha 4, Kamruzzaman Mithu 1, Khondaker Abdullah-Al-Mamun 1


Data Modeling 7 (1) 1-8 https://doi.org/10.25163/data.7110742

Submitted: 26 January 2026; Revised: 10 April 2026; Published: 18 April 2026


Abstract

Gestational diabetes mellitus (GDM) cannot be fully understood through static glucose measurements alone. It unfolds dynamically, particularly in response to meals, where postprandial glycemic fluctuations carry meaningful clinical implications. In this study, we aimed to develop a predictive modeling framework capable of estimating postprandial glycemic response (PPGR) using Continuous Glucose Monitoring (CGM) data integrated with meal and behavioral features. A dataset comprising 235 pregnant participants, including both GDM patients and healthy controls, was analyzed. CGM signals were synchronized with meal logs to derive clinically relevant outcome variables, including peak glucose (BGMax), glucose rise (BGRise), glucose at 60 minutes (BG60), and incremental area under the curve (iAUC120). Gradient boosting models were developed and evaluated using five-fold cross-validation and independent testing. The optimized model demonstrated strong predictive performance, achieving a coefficient of determination (R²) of 0.82 and a root mean squared error (RMSE) of 0.68 mmol/L·h for iAUC120 prediction (Table 3). Feature importance analysis revealed that prior meal composition, glycemic index, and temporal meal spacing were among the most influential predictors (Figure 3). Models incorporating contextual behavioral features consistently outperformed baseline physiological-only models. These findings suggest that integrating CGM data with contextual dietary information can substantially improve the prediction of postprandial glucose dynamics in GDM. While further validation is required, the proposed approach offers a step toward more personalized, data-driven management strategies in pregnancy-related metabolic care.

Keywords: Gestational diabetes mellitus; continuous glucose monitoring; postprandial glycemic response; machine learning; gradient boosting; iAUC120; predictive modeling; maternal health

1. Introduction

Thinking about disease detection is shifting, quietly but importantly, from reactive diagnosis toward anticipatory insight. In metabolic disorders such as diabetes, this shift is particularly urgent. Small, often unnoticed changes in glucose regulation can, over time, cascade into significant clinical complications. Among these conditions, gestational diabetes mellitus (GDM) occupies a uniquely sensitive space. It is transient in duration, yet its implications for both maternal health and fetal development can be profound and, at times, lasting.

Traditionally, GDM management has relied on periodic glucose measurements and standardized diagnostic thresholds. While these methods have undoubtedly improved clinical outcomes, they remain, in a sense, snapshots of a far more dynamic physiological process. Postprandial glucose fluctuations—those occurring after meals—have been identified as especially critical, reflecting how the body responds in real time to dietary intake and metabolic stress. The International Diabetes Federation has emphasized the importance of controlling post-meal glucose excursions, linking them to both immediate complications and long-term metabolic risk (Gallwitz, 2009). Yet, despite this recognition, predicting these fluctuations with sufficient accuracy remains a persistent and somewhat unresolved challenge.

A closer look at the literature suggests that this difficulty is not merely technical but conceptual. Glycemic responses are shaped by a complex interplay of factors—nutritional composition, prior dietary intake, hormonal regulation, physical activity, and even circadian rhythms. In pregnant individuals, these interactions are further complicated by physiological changes that alter insulin sensitivity and glucose metabolism. As a result, simple predictive frameworks often fall short, unable to capture the nuanced variability inherent in real-world conditions.

In response to these complexities, machine learning (ML) has gradually emerged as a promising analytical paradigm. Unlike traditional statistical approaches, ML models are capable of handling high-dimensional data and uncovering nonlinear relationships between variables. Among these, ensemble methods—particularly gradient boosting algorithms—have demonstrated notable success in predicting blood glucose dynamics. For instance, Alfian et al. (2020) showed that gradient boosting techniques could effectively model glucose variability in diabetic populations, achieving high predictive accuracy by leveraging temporal and contextual features. Such findings suggest that ML approaches may offer a pathway toward more personalized and adaptive models of glycemic prediction.

At the same time, advances in data acquisition technologies have significantly expanded the scope of what can be modeled. Continuous Glucose Monitoring (CGM), in particular, provides high-frequency, real-time glucose measurements, enabling a far more granular understanding of glycemic trends. When combined with mobile-based data—such as meal logs and behavioral inputs—CGM datasets create a rich, multidimensional view of patient physiology. Studies by Pustozerov et al. (2018a, 2018b) have explored this integration, demonstrating the feasibility of mobile-based decision support systems for GDM management. These systems, by incorporating real-time data streams, begin to move beyond static prediction toward dynamic, context-aware guidance.

Yet, despite these encouraging developments, the existing body of research reveals several limitations that are difficult to overlook. Many studies report strong predictive performance but offer limited transparency regarding their methodological pipelines. Critical aspects—such as data preprocessing, feature engineering, and hyperparameter optimization—are often described only briefly, if at all. This lack of detail not only complicates reproducibility but also raises questions about the generalizability of reported results. Furthermore, a substantial portion of the literature has focused on type 1 or type 2 diabetes, leaving GDM-specific modeling comparatively underexplored. Given the distinct physiological and clinical context of pregnancy, this gap is not trivial.

There are also subtler limitations related to how prediction itself is conceptualized. Much of the earlier work treats glucose prediction as a purely numerical task, emphasizing accuracy metrics while paying less attention to interpretability or clinical relevance. However, emerging evidence suggests that incorporating dietary context, such as glycemic index and glycemic load, can significantly enhance model performance and applicability. Pustozerov et al. (2020), for example, highlighted the role of meal composition in shaping postprandial glycemic responses, suggesting that predictive models benefit from a more holistic representation of patient behavior.

Parallel to these developments, other studies have explored broader diagnostic frameworks for GDM. Research utilizing datasets such as the Pima Indians Diabetes dataset has demonstrated the feasibility of applying various ML algorithms, including support vector machines and neural networks, for disease classification (Gnanadass, 2020). While these approaches provide valuable insights, they often rely on static clinical features and lack the temporal depth offered by CGM-based models. Similarly, earlier clinical studies have examined the relationship between maternal glycemia and fetal outcomes, identifying critical thresholds associated with adverse effects (Wender-Ozegowska et al., 2005). These findings underscore the importance of early and accurate detection, yet they stop short of offering predictive tools capable of real-time application.

Taken together, the literature points toward a field that is both promising and, in some respects, still evolving. There is clear evidence that machine learning, when combined with high-resolution physiological data, can enhance our ability to model and predict glycemic behavior. At the same time, there remains a need for approaches that are not only accurate but also transparent, reproducible, and clinically meaningful.

Against this backdrop, the present study seeks to contribute to the ongoing development of predictive modeling in GDM by integrating CGM data with contextual meal and behavioral information. By employing gradient boosting techniques and systematically evaluating feature contributions, this work aims to move, cautiously but deliberately, toward a more comprehensive understanding of postprandial glycemic dynamics. In doing so, it also attempts to address some of the methodological gaps identified in prior research, particularly in relation to data handling and model transparency.

Ultimately, while predictive models are unlikely to replace clinical judgment, they may serve as valuable adjunct tools—offering insights that are both data-driven and personalized. And in a condition as delicate as gestational diabetes, even incremental improvements in prediction could, over time, translate into meaningful gains in maternal and neonatal health outcomes.

2. Methods

In developing a predictive framework for gestational diabetes mellitus (GDM), we sought to balance methodological rigor with practical transparency. Machine learning studies in clinical contexts often report strong performance, yet the pathways leading to those results are not always described with sufficient clarity. With this in mind, the present study was designed to follow reproducible modeling principles, while also incorporating elements of established reporting frameworks such as MI-CLAIM (Minimum Information About Clinical Artificial Intelligence Modeling).

2.1 Study Design and Data Source

This study utilized a secondary dataset derived from Continuous Glucose Monitoring (CGM) records, publicly available through the Kaggle platform, originally associated with the GEM-GDM study. The dataset comprised glucose measurements from pregnant women diagnosed with GDM, alongside a control group of healthy pregnant individuals. Although secondary in nature, the dataset offered a relatively rich temporal structure, capturing glucose fluctuations at frequent intervals over a one-week monitoring period.

A total of 235 participants were included in the analysis. Participants diagnosed with GDM were further stratified into two subgroups based on glycemic targets—those following strict glycemic control protocols and those under less stringent management. This stratification reflects clinical practice, where treatment intensity may vary depending on individual risk profiles and therapeutic goals (Popova et al., 2016). Baseline characteristics, including body mass index (BMI), blood pressure (BP), and demographic variables, were summarized using mean and standard deviation, with group comparisons performed using analysis of variance (ANOVA) where appropriate.
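For readers less familiar with the group comparison mentioned above, the one-way ANOVA F statistic can be sketched in a few lines. This is an illustrative sketch only: the function name `one_way_anova_f` and the toy values are assumptions and are not taken from the study data.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic and its degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Toy values only (not study data): three small groups
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f_stat, df_between, df_within)
```

A p-value would then be read from the F distribution with the returned degrees of freedom; in practice a statistics library such as SciPy (`scipy.stats.f_oneway`) performs both steps.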

2.2 Data Acquisition and Preprocessing

Each participant underwent CGM recording for approximately seven consecutive days. During this period, individuals recorded meal-related information using a mobile-based logging system, including meal timing, composition, and portion size. The synchronization of CGM data with meal records was achieved through a timestamp-based matching algorithm, which aligned glucose readings with corresponding meal events within a defined temporal window.
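A timestamp-based alignment of this kind might look like the following minimal sketch. The window bounds (15 minutes before and 120 minutes after the meal), the function name `match_meals_to_cgm`, and the synthetic 5-minute trace are assumptions for illustration; the source does not specify the window used.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta

def match_meals_to_cgm(cgm, meals, pre_min=15, post_min=120):
    """For each meal time, return the CGM readings that fall inside
    [meal - pre_min, meal + post_min]. `cgm` is a list of
    (timestamp, mmol/L) tuples sorted by timestamp."""
    times = [t for t, _ in cgm]
    windows = {}
    for meal in meals:
        lo = bisect_left(times, meal - timedelta(minutes=pre_min))
        hi = bisect_right(times, meal + timedelta(minutes=post_min))
        windows[meal] = cgm[lo:hi]
    return windows

# Synthetic 5-minute CGM trace from 08:00 to 12:00 (not study data)
start = datetime(2026, 1, 1, 8, 0)
cgm = [(start + timedelta(minutes=5 * i), 5.0 + 0.01 * i) for i in range(49)]
meals = [datetime(2026, 1, 1, 9, 2)]  # hypothetical logged meal time
windows = match_meals_to_cgm(cgm, meals)
print(len(windows[meals[0]]))
```

Binary search on the sorted timestamps keeps the lookup cheap even for a week of readings per participant.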

Data preprocessing was conducted using Python (version 3.7), primarily leveraging the scikit-learn library. Initial preprocessing steps included handling missing values, removing physiologically implausible glucose readings, and smoothing minor sensor noise where necessary. Given the known variability in CGM signals, particular care was taken to preserve meaningful fluctuations while minimizing artifacts.
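The cleaning steps described above could be sketched as follows. The plausibility bounds (2.2 to 22.2 mmol/L, a typical sensor reporting range), the linear gap filling, and the three-point median filter are all assumptions chosen for illustration; the source does not state which thresholds or smoothing method were applied.

```python
from statistics import median

PLAUSIBLE = (2.2, 22.2)  # assumed plausible range in mmol/L

def clean_trace(values, lo=PLAUSIBLE[0], hi=PLAUSIBLE[1]):
    """Mark missing or implausible readings as None, then fill interior gaps
    by linear interpolation between the nearest valid neighbours."""
    vals = [v if v is not None and lo <= v <= hi else None for v in values]
    valid = [i for i, v in enumerate(vals) if v is not None]
    out = list(vals)
    for a, b in zip(valid, valid[1:]):
        for i in range(a + 1, b):
            frac = (i - a) / (b - a)
            out[i] = vals[a] + frac * (vals[b] - vals[a])
    return out  # leading and trailing gaps stay None

def median_smooth(values, width=3):
    """Centered moving median; short windows are used at the edges."""
    half = width // 2
    out = []
    for i, v in enumerate(values):
        window = [x for x in values[max(0, i - half): i + half + 1] if x is not None]
        out.append(median(window) if v is not None and window else v)
    return out

raw = [5.0, 5.2, None, 5.6, 40.0, 6.0, 6.1]  # synthetic trace with a gap and a spike
cleaned = clean_trace(raw)
smoothed = median_smooth(cleaned)
print(cleaned)
print(smoothed)
```

A median filter is used here because it suppresses isolated spikes while leaving sustained rises largely intact, which matches the stated goal of preserving meaningful fluctuations.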

Feature engineering was guided by both clinical relevance and prior literature. Key features included glycemic index-related variables, prior meal effects, and temporal indicators such as time since last meal. The importance of these contextual features has been emphasized in previous studies, where glycemic load and dietary composition were shown to significantly influence postprandial glucose responses (Pustozerov et al., 2020). In addition, demographic and physiological variables were incorporated to account for inter-individual variability.
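As a rough illustration of such contextual features, the sketch below derives glycemic load, time since the previous meal, and the previous meal's glycemic load from a meal log. The dictionary keys (`carbs_g`, `gi`, `gl`, `prev_gl`, `hours_since_last_meal`) are hypothetical names, not the study's actual variables; glycemic load follows the standard definition GI × carbohydrate (g) / 100.

```python
from datetime import datetime

def add_context_features(meals):
    """meals: list of dicts sorted by 'time', each with hypothetical keys
    'carbs_g' (carbohydrate in grams) and 'gi' (glycemic index)."""
    rows = []
    prev = None
    for m in meals:
        row = dict(m)
        row["gl"] = m["gi"] * m["carbs_g"] / 100.0  # glycemic load
        if prev is None:
            # The first logged meal has no predecessor
            row["hours_since_last_meal"] = None
            row["prev_gl"] = None
        else:
            delta = m["time"] - prev["time"]
            row["hours_since_last_meal"] = delta.total_seconds() / 3600
            row["prev_gl"] = prev["gl"]
        rows.append(row)
        prev = row
    return rows

# Synthetic meal log (not study data)
meals = [
    {"time": datetime(2026, 1, 1, 8, 0), "carbs_g": 40, "gi": 70},
    {"time": datetime(2026, 1, 1, 12, 30), "carbs_g": 60, "gi": 50},
]
features = add_context_features(meals)
print(features[1]["hours_since_last_meal"], features[1]["prev_gl"])
```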

2.3 Outcome Variables

The primary outcomes of interest were derived from postprandial glycemic response (PPGR) metrics, which provide a more nuanced characterization of glucose dynamics following meal intake. Specifically, four quantitative measures were extracted:

  • BGMax: Peak blood glucose level following meal initiation (mmol/L)
  • iAUC120: Incremental area under the glucose curve over 120 minutes post-meal (mmol/L·h)
  • BGRise: Absolute increase in glucose level from baseline to peak (mmol/L)
  • BG60: Blood glucose level at 60 minutes post-meal (mmol/L)

Among these, iAUC120 was selected as the primary predictive target, as it is widely regarded as a robust indicator of postprandial glycemic exposure (Oviedo et al., 2017). This metric captures both the magnitude and duration of glucose excursions, making it particularly relevant for assessing metabolic risk.
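The four outcome measures can be computed from a single post-meal glucose window roughly as sketched below. Several choices here are assumptions rather than the study's documented procedure: the first sample (at meal start) is taken as baseline, BG60 is obtained by linear interpolation, and iAUC120 uses the trapezoidal rule on increments clipped at zero, which approximates the convention of ignoring area below baseline.

```python
def interpolate(t, times, values):
    """Linear interpolation of `values` at time `t`."""
    for (t0, v0), (t1, v1) in zip(zip(times, values), zip(times[1:], values[1:])):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the sampled window")

def ppgr_metrics(minutes, glucose):
    """minutes: ascending sample times relative to meal start, covering 0..120.
    glucose: readings in mmol/L at those times."""
    baseline = glucose[0]
    bg_max = max(glucose)
    increments = [max(g - baseline, 0.0) for g in glucose]
    iauc = 0.0
    pairs = zip(zip(minutes, increments), zip(minutes[1:], increments[1:]))
    for (t0, a), (t1, b) in pairs:
        iauc += (a + b) / 2 * (t1 - t0) / 60.0  # minutes to hours: mmol/L·h
    return {
        "BGMax": bg_max,
        "BGRise": bg_max - baseline,
        "BG60": interpolate(60, minutes, glucose),
        "iAUC120": iauc,
    }

# Synthetic post-meal window (not study data)
minutes = [0, 30, 60, 90, 120]
glucose = [5.0, 7.0, 8.0, 6.5, 5.0]
metrics = ppgr_metrics(minutes, glucose)
print(metrics)  # BGMax 8.0, BGRise 3.0, BG60 8.0, iAUC120 3.25
```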

2.4 Model Development and Evaluation

To model postprandial glucose responses, we employed a gradient boosting framework, selected for its ability to capture nonlinear relationships and complex feature interactions. Ensemble-based methods, including gradient boosting, have demonstrated strong performance in glucose prediction tasks, particularly when working with temporally structured data (Alfian et al., 2020).
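To make the idea of gradient boosting concrete, the following toy sketch fits a sequence of one-split trees (stumps) to the residuals of the running prediction under squared-error loss. It is a didactic simplification on one feature with synthetic data, not the implementation used in the study; all function names are illustrative.

```python
def fit_stump(x, residuals):
    """Find the single threshold on x that best fits the residuals (squared error)."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, left_mean, right_mean)
    return best[1:]

def fit_boosting(x, y, n_rounds=50, learning_rate=0.1):
    base = sum(y) / len(y)  # start from the mean prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of squared loss (up to a constant)
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        thr, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((thr, left_mean, right_mean))
        pred = [pi + learning_rate * (left_mean if xi <= thr else right_mean)
                for xi, pi in zip(x, pred)]
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate}

def predict(model, xi):
    out = model["base"]
    for thr, left_mean, right_mean in model["stumps"]:
        out += model["learning_rate"] * (left_mean if xi <= thr else right_mean)
    return out

def mse(y, preds):
    return sum((a - b) ** 2 for a, b in zip(y, preds)) / len(y)

# Synthetic nonlinear relation (not study data)
x = [float(i) for i in range(20)]
y = [5.0 + 0.02 * xi ** 2 for xi in x]
model = fit_boosting(x, y)
mse_mean = mse(y, [model["base"]] * len(y))
mse_boost = mse(y, [predict(model, xi) for xi in x])
print(mse_mean, mse_boost)
```

Each round corrects a fraction of the remaining error, which is how the ensemble approximates a curved relationship that no single stump could represent.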

The dataset was partitioned into training (80%) and testing (20%) subsets using stratified sampling to preserve the distribution of participant groups. Within the training set, a five-fold cross-validation procedure was implemented to optimize model hyperparameters, including learning rate, tree depth, and number of estimators. Hyperparameter tuning was conducted using grid search, with performance evaluated based on mean squared error (MSE) and coefficient of determination (R²).
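A split-and-tune procedure of this shape can be sketched with scikit-learn as below. The data are synthetic, the `group` label standing in for GDM versus control is hypothetical, and the grid values are placeholders; the source does not report the actual search ranges.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 4))         # synthetic stand-ins for meal and CGM features
group = rng.integers(0, 2, size=n)  # hypothetical GDM (1) versus control (0) label
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + 0.5 * group + rng.normal(scale=0.3, size=n)

# 80/20 split, stratified on the group label
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=group, random_state=0
)

# Placeholder grid over the three hyperparameters named in the text
param_grid = {
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
    "n_estimators": [50, 100],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=5,                              # five-fold cross-validation on the training set
    scoring="neg_mean_squared_error",  # grid search selects by MSE
)
search.fit(X_train, y_train)

pred = search.predict(X_test)          # uses the refit best estimator
rmse = mean_squared_error(y_test, pred) ** 0.5
r2 = r2_score(y_test, pred)
print(search.best_params_, round(rmse, 3), round(r2, 3))
```

When each participant contributes several meals, splitting by participant (for example with `GroupKFold`) avoids the same person appearing in both training and test data; the source does not state whether this was done.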

To ensure robustness, multiple model configurations were tested, incorporating different feature subsets. This iterative approach allowed us to assess the contribution of contextual variables—such as meal composition and prior intake—relative to baseline physiological features. Model performance was evaluated on the independent test set, with additional diagnostic analysis performed to identify potential overfitting.
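The comparison of feature subsets can be sketched as a loop over candidate feature matrices scored with the same cross-validation scheme. Everything below is synthetic: the outcome is constructed so that the contextual variables carry signal, so the printed scores illustrate the procedure only and say nothing about the study's findings. The variable names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 300
baseline_glucose = rng.normal(5.0, 0.5, n)
bmi = rng.normal(27, 4, n)
meal_gl = rng.uniform(5, 40, n)          # hypothetical glycemic load of the meal
hours_since_meal = rng.uniform(1, 8, n)  # hypothetical time since previous meal

# Synthetic outcome built to depend mainly on the contextual variables
y = (0.08 * meal_gl - 0.1 * hours_since_meal
     + 0.3 * (baseline_glucose - 5) + rng.normal(0, 0.3, n))

feature_sets = {
    "physiological only": np.column_stack([baseline_glucose, bmi]),
    "physiological + contextual": np.column_stack(
        [baseline_glucose, bmi, meal_gl, hours_since_meal]
    ),
}

scores = {}
for name, X in feature_sets.items():
    model = GradientBoostingRegressor(random_state=0)
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="r2").mean()

for name, score in scores.items():
    print(f"{name}: mean R2 = {score:.2f}")
```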

2.5 Quality Assessment and Reporting Framework

In an effort to enhance methodological transparency, elements of the MI-CLAIM framework were incorporated into the study design. Six core domains were considered: study design, data preparation, model development, performance evaluation, model examination, and reproducibility. Each domain was assessed using a simplified scoring system, where components were classified as “complete,” “incomplete,” or “not applicable.” While this assessment was not exhaustive, it provided a structured means of evaluating reporting completeness and identifying potential gaps.

2.6 Ethical Considerations and Limitations

As the study utilized publicly available, anonymized data, formal ethical approval was not required. However, it is worth noting that secondary datasets may carry inherent limitations, including potential selection bias and incomplete contextual information. These factors were considered during analysis and are addressed further in the discussion.

3. Results

Glucose does not simply rise and fall after a meal; its trajectory reflects a complex interplay of physiology, behavior, and timing. In examining this interplay, our models showed patterns that were largely expected, yet occasionally surprising.

The overall modeling framework is illustrated in Figure 1, which highlights the sequential stages of data preprocessing, feature engineering, model training, and evaluation. Baseline characteristics of the study population are summarized in Table 1, which shows comparable demographic distributions across groups, with the exception of systolic blood pressure, which differed significantly (P < 0.05).

The clinical relevance of gestational diabetes and its associated maternal–fetal complications is presented in Figure 2, reinforcing the importance of early detection and accurate glycemic prediction.

3.1 Model Performance

The gradient boosting model demonstrated robust predictive capability across all evaluated PPGR outcomes. For the primary endpoint, iAUC120, the model achieved an R² of 0.82, indicating that a substantial proportion of variability in postprandial glucose exposure could be explained by the selected features. The RMSE was 0.68 mmol/L·h, suggesting relatively low prediction error (Table 3).
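For reference, the two reported metrics follow the standard definitions sketched below. The toy values are illustrative and unrelated to the study results.

```python
from math import sqrt

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy values only (not study data)
y_true = [2.0, 3.0, 4.0, 5.0]
y_pred = [2.5, 3.0, 3.5, 5.0]
score_r2 = r2(y_true, y_pred)
score_rmse = rmse(y_true, y_pred)
print(score_r2, score_rmse)
```

Because RMSE carries the units of the predicted quantity, an RMSE for iAUC120 is expressed in the units of iAUC120.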

For secondary outcomes:

  • BGMax prediction yielded an R² of 0.79
  • BG60 prediction achieved an R² of 0.76
  • BGRise showed slightly lower performance (R² = 0.72), possibly reflecting greater physiological variability

Interestingly, performance improvements were consistently observed when contextual features were included, reinforcing the notion that glucose response is not purely biochemical but context-dependent.

3.2 Feature Importance Analysis

Feature importance analysis revealed a somewhat intuitive—but still important—hierarchy of predictors (Figure 3).

The most influential features included:

  • Glycemic index of meals
  • Previous meal composition
  • Time interval between meals
  • Baseline glucose level
  • Body mass index (BMI)

It is worth noting that prior meal composition—often overlooked in simpler models—emerged as one of the strongest predictors. This finding aligns with emerging evidence suggesting that metabolic memory and cumulative dietary effects influence glycemic response (Pustozerov et al., 2020).
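The source does not state how feature importance was computed. One common model-agnostic option is permutation importance, sketched here in plain Python: each feature column is shuffled in turn and the resulting increase in error is recorded. The toy model and data are assumptions made for illustration.

```python
import random

def mse(model, X, y):
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    base = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            increases.append(mse(model, X_perm, y) - base)
        importances.append(sum(increases) / n_repeats)
    return importances

# Toy model: depends strongly on feature 0, weakly on feature 1, not on feature 2
def toy_model(row):
    return 3.0 * row[0] + 0.5 * row[1]

data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_model(row) for row in X]
importances = permutation_importance(toy_model, X, y)
print(importances)
```

scikit-learn offers the same idea as `sklearn.inspection.permutation_importance`, alongside the impurity-based `feature_importances_` attribute of its tree ensembles.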

3.3 Comparative Model Evaluation

To better understand the contribution of feature categories, we compared three model configurations (Table 2). The improvement from baseline to full model was notable, suggesting that behavioral and dietary variables provide meaningful predictive value.

3.4 Model Robustness

Cross-validation results showed consistent performance across folds, with R² varying by only ±0.03, indicating model stability. No significant overfitting was observed when comparing training and test performance.
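Fold-to-fold variability of this kind is typically summarized as the mean and standard deviation of the per-fold scores. The sketch below shows the mechanics with a simple least-squares line on synthetic data, used here purely as a lightweight stand-in for the gradient boosting model.

```python
import random
from statistics import mean, stdev

def kfold_indices(n, k, seed=0):
    """Shuffle indices once and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(x, y):
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def r2(y_true, y_pred):
    m = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Synthetic data (not study data)
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(100)]
y = [2 * a + 1 + rng.gauss(0, 1) for a in x]

folds = kfold_indices(len(x), k=5)
scores = []
for held_out in folds:
    held = set(held_out)
    train = [i for i in range(len(x)) if i not in held]
    slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
    preds = [slope * x[i] + intercept for i in held_out]
    scores.append(r2([y[i] for i in held_out], preds))

summary = (mean(scores), stdev(scores))
print(f"R2 = {summary[0]:.3f} +/- {summary[1]:.3f}")
```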

4. Discussion

At first glance, the findings may appear to reinforce what is already known—that diet influences glucose levels. Yet, the depth of this influence, and the extent to which it can be modeled, perhaps deserves closer attention.

The structured modeling pipeline shown in Figure 1 underscores the importance of systematic data preprocessing and feature selection in achieving robust predictive performance. This stepwise approach likely contributed to the observed improvements in model accuracy.

Furthermore, the clinical implications illustrated in Figure 2 highlight why precise prediction of postprandial glycemic response is not merely a technical task but a clinically meaningful objective, particularly in preventing adverse outcomes such as macrosomia and pre-eclampsia.

Baseline population characteristics (Table 1) suggest that the study cohort was relatively balanced across groups, supporting the internal validity of the modeling results.

4.1 Interpretation of Findings

The strong predictive performance of the gradient boosting model suggests that postprandial glycemic response in GDM is, to a meaningful degree, predictable when sufficient contextual information is available. This aligns with prior studies demonstrating the effectiveness of ensemble-based machine learning approaches in glucose prediction (Alfian et al., 2020).

However, what stands out is not merely the accuracy, but the relative importance of behavioral features. The influence of prior meal composition and glycemic index supports earlier findings that dietary context significantly shapes glucose dynamics (Pustozerov et al., 2020). In a way, the model appears to capture a form of metabolic continuity—where each meal is not an isolated event, but part of an ongoing physiological narrative.

4.2 Comparison with Existing Literature

Previous research has explored glucose prediction using CGM data, particularly in type 1 diabetes populations (Oviedo et al., 2017). While these studies achieved reasonable predictive accuracy, they often lacked integration of real-world behavioral inputs.

Similarly, mobile-based decision support systems for GDM have shown promise (Pustozerov et al., 2018a; 2018b), yet many rely on simplified models or limited feature sets. Our findings extend this work by demonstrating that incorporating multi-dimensional data—especially meal-related variables—can significantly enhance model performance.

Moreover, earlier clinical studies emphasized the importance of glycemic control in reducing adverse fetal outcomes such as malformations (Wender-Ozegowska et al., 2005). Our results suggest that predictive modeling may support this goal by enabling earlier and more precise intervention.

4.3 Clinical Implications

From a clinical perspective, these findings hint at a shift—from reactive monitoring toward predictive guidance. If integrated into mobile health systems, such models could provide real-time feedback to patients, helping them anticipate and manage glycemic excursions more effectively.

That said, the transition from model to clinical tool is not trivial. Issues of interpretability, user compliance, and validation across diverse populations remain critical considerations.

4.4 Limitations

Several limitations should be acknowledged. The dataset, while rich, was relatively modest in size and derived from a secondary source. Additionally, variability in meal reporting may introduce measurement bias. Finally, external validation on independent cohorts is needed to confirm generalizability.

4.5 Future Directions

Future work may explore deep learning approaches, integration with wearable devices, and real-time deployment in clinical settings. There is also scope for incorporating genomic or hormonal data, which may further refine predictive accuracy.

5. Conclusion

In this study, we sought to better understand and, perhaps more importantly, to anticipate the dynamics of postprandial glycemic response in gestational diabetes mellitus (GDM). By integrating Continuous Glucose Monitoring (CGM) data with contextual dietary and behavioral features, the proposed gradient boosting model demonstrated strong predictive performance, achieving an R² of 0.82 and an RMSE of 0.68 mmol/L·h for iAUC120 estimation.

These findings suggest that postprandial glycemic variability in GDM is not only measurable but, to a meaningful extent, predictable when sufficient contextual information is incorporated. Notably, features such as prior meal composition, glycemic index, and meal timing emerged as key determinants—highlighting the importance of behavioral and temporal factors alongside physiological measurements. In contrast to earlier models that relied primarily on static clinical variables, this approach offers a more dynamic and context-aware framework for glucose prediction.

From a clinical perspective, the ability to anticipate glycemic excursions may represent a meaningful step toward more proactive and personalized management strategies during pregnancy. While such models are not intended to replace clinical judgment, they may serve as valuable decision-support tools, particularly when integrated into mobile health platforms.

Nevertheless, several limitations warrant consideration. The relatively modest sample size, reliance on secondary data, and potential variability in self-reported meal records may affect generalizability. External validation across larger and more diverse cohorts remains essential before clinical translation can be fully realized.

Future research may benefit from incorporating multimodal data sources, including hormonal, genetic, and real-time behavioral inputs, as well as exploring deep learning architectures for further performance enhancement. Ultimately, even incremental improvements in predictive accuracy may translate into clinically meaningful benefits—supporting earlier interventions, improved glycemic control, and better maternal and neonatal outcomes.

Author Contributions

A.U.J. conceptualized and designed the study, performed data analysis and modeling, and drafted the manuscript. P.D.P. contributed to data curation, preprocessing, and literature review. S.B. assisted in data analysis and visualization. A.S. contributed to methodology development and model validation. K.M. supported data interpretation and manuscript editing. K.A.A.-M. supervised the study, provided critical revisions, and approved the final manuscript. All authors read and approved the final version of the manuscript.

References


Alfian, G., Syafrudin, M., Rhee, J., Anshari, M., Mustakim, M., & Fahrurrozi, I. (2020). Blood glucose prediction model for type 1 diabetes based on extreme gradient boosting. IOP Conference Series: Materials Science and Engineering, 803(1), 012012. https://doi.org/10.1088/1757-899X/803/1/012012

Gallwitz, B. (2009). Implications of postprandial glucose and weight control in people with type 2 diabetes: Understanding and implementing the International Diabetes Federation guidelines. Diabetes Care, 32(Suppl 2), S322–S325. https://doi.org/10.2337/dc09-S331

Gnanadass, I. (2020). Prediction of gestational diabetes by machine learning algorithms. IEEE Potentials, 39(6), 32–37. https://doi.org/10.1109/MPOT.2020.3015190

Oviedo, S., Vehí, J., Calm, R., & Armengol, J. (2017). A review of personalized blood glucose prediction strategies for type 1 diabetes. International Journal for Numerical Methods in Biomedical Engineering, 33(6), e2833. https://doi.org/10.1002/cnm.2833

Popova, P., Castorino, K., Grineva, E. N., & Kerr, D. (2016). Gestational diabetes mellitus diagnosis and treatment goals: Measurement and measures. Minerva Endocrinologica, 41(4), 421–432.

Pustozerov, E., Popova, P., Tkachuk, A., Bolotko, Y., Yuldashev, Z., & Grineva, E. (2018a). Development and evaluation of a mobile personalized blood glucose prediction system for patients with gestational diabetes mellitus. JMIR mHealth and uHealth, 6(1), e6. https://doi.org/10.2196/mhealth.9236

Pustozerov, E., & Popova, P. (2018b). Mobile-based decision support system for gestational diabetes mellitus. In Proceedings of the Ural Symposium on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT) (pp. 45–48). https://doi.org/10.1109/USBEREIT.2018.8384546

Pustozerov, E. A., Yuldashev, Z. M., Popova, P. V., Bolotko, Y. A., & Tkachuk, A. S. (2018c). Information support system for patients with gestational diabetes mellitus. Biomedical Engineering, 51(6), 407–410. https://doi.org/10.1007/s10527-018-9759-2

Pustozerov, E., Tkachuk, A., Vasukova, E., Dronova, A., Shilova, E., Anopova, A., Piven, F., Pervunina, T., Vasilieva, E., Grineva, E., & Popova, P. (2020). The role of glycemic index and glycemic load in the development of real-time postprandial glycemic response prediction models for patients with gestational diabetes. Nutrients, 12(2), 302. https://doi.org/10.3390/nu12020302

Popova, P., Vasilyeva, L., Tkachuk, A., Puzanov, M., Golovkin, A., Bolotko, Y., Pustozerov, E., Vasilyeva, E., Li, O., Zazerskaya, R., Dmitrieva, R., Kostareva, A., & Grineva, E. (2018). A randomized controlled study of different glycemic targets during gestational diabetes treatment: Effect on adipokines and ANGPTL4 expression. International Journal of Endocrinology, 2018, 1–8. https://doi.org/10.1155/2018/6481658

Wender-Ozegowska, E., Wróblewska, K., Zawiejska, A., Pietryga, M., Szczapa, J., & Biczysko, R. (2005). Threshold values of maternal blood glucose in early diabetic pregnancy—Prediction of fetal malformations. Acta Obstetricia et Gynecologica Scandinavica, 84(1), 17–25. https://doi.org/10.1111/j.0001-6349.2005.00606.x

