Integrative Biomedical Research

Integrative Biomedical Research (Journal of Angiotherapy) | Online ISSN  3068-6326
RESEARCH ARTICLE   (Open Access)

Predicting Parkinson’s Disease Progression Using Statistical and Neural Mixed Effects Models: A Comparative Study on Longitudinal Biomarkers

Ran Tong 1*, Lanruo Wang 2, Tong Wang 3, Wei Yan 4

+ Author Affiliations

Integrative Biomedical Research 10 (1) 1-18 https://doi.org/10.25163/biomedical.10110648

Submitted: 17 January 2026 Revised: 18 February 2026  Published: 27 February 2026 


Abstract

Predicting Parkinson’s Disease (PD) progression is crucial for personalized treatment, and voice biomarkers offer a promising non-invasive method for tracking symptom severity through telemonitoring. However, analyzing this longitudinal data is challenging due to inherent within-subject correlations, the small sample sizes typical of clinical trials, and complex patient-specific progression patterns. While deep learning offers high theoretical flexibility, its application to small-cohort longitudinal studies remains under-explored compared to traditional statistical methods. This study presents an application of the Neural Mixed Effects (NME) framework to Parkinson’s telemonitoring, benchmarking it against Generalized Neural Network Mixed Models (GNMM) and semi-parametric Generalized Additive Mixed Models (GAMMs). Using the Oxford Parkinson’s telemonitoring voice dataset (N = 42), we demonstrate that while neural architectures offer flexibility, they are prone to significant overfitting in small-sample regimes. Our results indicate that GAMMs provide the optimal balance, achieving superior predictive accuracy (MSE 6.56) compared to neural baselines (MSE > 90) while maintaining clinical interpretability. We discuss the critical implications of these findings for developing robust, deployable telemonitoring systems where data scarcity is a constraint, highlighting the necessity for larger, diverse datasets for neural model validation.

Keywords: Parkinson’s Disease · Biostatistics · Longitudinal Data · Neural Networks · Artificial Intelligence

1. Introduction

Parkinson’s disease (PD) is a progressive neurodegenerative disorder marked by motor symptoms such as tremor, rigidity, and postural instability, each of which lowers quality of life. The condition affects millions worldwide, and prevalence rises with age, which makes modeling the progression of Parkinson’s disease increasingly important and urgent.

In order to measure the progression of Parkinson’s disease, researchers have explored several objective digital biomarkers. Wearable inertial sensors quantify gait impairment, bradykinesia, and tremor during everyday activity Del Din et al. (2016). Smartphone accelerometers, gyroscopes, and touch-screen interactions capture movement patterns and tapping speed that relate to symptom severity Arora et al. (2015). Handwriting and drawing tasks recorded on digitizing tablets reveal micrographia and fine-motor deficits Drotar et al. (2016). Among these methods, voice biomarkers have emerged as particularly promising for tracking PD progression Tsanas et al. (2012), as they are objective, non-invasive, and easy to obtain. Subtle shifts in pitch, loudness, and stability may appear as the disease progresses, and telemonitoring makes it possible to collect frequent longitudinal voice data that complement clinic visits. These voice features are typically used to predict scores from clinical assessments like the Unified Parkinson’s Disease Rating Scale (UPDRS) Fahn et al. (1987), which serves as the primary response variable for quantifying symptom severity and progression in many PD studies.

Because the same individual is measured many times, longitudinal voice data contain within-subject correlation, and patients differ in baseline severity and rate of change. Models therefore need to represent both population trends and subject-specific variation Mandel et al. (2023). Furthermore, the link between high-dimensional voice features and UPDRS may be highly nonlinear Mandel et al. (2023).

Classical statistical methods like Linear Mixed Models (LMMs) Laird & Ware (1982) and Generalized Linear Mixed Models (GLMMs) Breslow & Clayton (1993) have long been common tools for analyzing such longitudinal data, as they effectively use random effects to capture within-subject correlations and individual differences. However, their primary limitation lies in the inherent assumption of linear relationships for the fixed effects component, which may inadequately capture the complex, nonlinear patterns of change often observed in PD progression Mandel et al. (2023).

While Nonlinear Mixed Effects (NLME) models Lindstrom & Bates (1990) provide greater flexibility for nonlinear trends, their traditional optimization algorithms can be computationally demanding and may not scale efficiently to the high-dimensional parameter spaces characteristic of modern machine learning approaches Wörtwein et al. (2023). Similarly, semi-parametric extensions like Generalized Additive Mixed Models (GAMMs) Lin & Zhang (1999); Wood (2017), which use smoothing splines for time trends, can also face challenges with intricate interactions among predictors.

Deep neural networks (DNNs) offer a powerful alternative for modeling complex, nonlinear structures within large datasets. However, standard DNNs typically presuppose that observations are independent. Applying them naively to longitudinal data by disregarding these inherent correlations can result in biased estimates and suboptimal predictive performance Mandel et al. (2023). Early adaptations, such as incorporating subject identifiers as input features in ANNs Maity & Pal (2013), aimed to address this but often encountered scalability problems as the parameter space increased with the number of subjects Mandel et al. (2023).

More recent advancements have focused on combining neural networks with mixed-effects frameworks. A prevalent strategy has been the development of Neural Networks with Linear Mixed Effects (NN-LME), where a neural network learns nonlinear data representations, and a linear mixed model is subsequently applied to these features or forms the final layer Xiong et al. (2019). Although these NN-LME models can capture nonlinear population-level trends, they frequently restrict person-specific (random) effects to be linear and may inherit the scalability constraints of conventional LME optimization methods Wörtwein et al. (2023).

To overcome these limitations, models such as the Generalized Neural Network Mixed Model (GNMM) Mandel et al. (2023) were introduced. The GNMM replaces the linear fixed-effect component of a GLMM with a neural network, thereby enhancing the ability to capture nonlinear associations while retaining the GLMM structure for random effects. Further extending this method, the Neural Mixed Effects (NME) model Wörtwein et al. (2023) allows nonlinear person-specific parameters to be optimized at any point within the neural network architecture, offering more flexibility and scalability for modeling individual-specific nonlinear trends.

In this study, we bridge the gap between deep learning and clinical statistics by presenting, to the best of our knowledge, the first application of the Neural Mixed Effects (NME) framework Wörtwein et al. (2023) to Parkinson’s Disease telemonitoring. Unlike prior studies that often treat voice recordings as independent samples or rely solely on linear random effects, we implement a fully non-linear mixed-effects architecture designed to disentangle complex subject-specific disease trajectories from population-level trends. To rigorously evaluate this approach, we benchmark the NME framework against statistical baselines, addressing a critical gap regarding the trade-off between neural flexibility and statistical stability in the small-sample regime (N < 50) typical of clinical pilot studies.

Our contributions are threefold: (1) Methodological Integration: We introduce a novel modeling pipeline that integrates the NME Wörtwein et al. (2023) and Generalized Neural Network Mixed Model (GNMM) Mandel et al. (2023) architectures into PD telemonitoring. This approach allows for the non-linear optimization of subject-specific parameters, moving beyond the linear random effects typically used in prior studies. (2) Rigorous Benchmarking: We perform a comprehensive benchmark against semi-parametric statistical standards (GAMMs). We provide empirical evidence that for typical clinical trial cohort sizes (N = 42), the high parameter space of deep neural mixed models leads to overfitting, despite their theoretical advantages. (3) Clinical Guidance: We provide strategic guidance for biomedical model selection, demonstrating that while NME offers a powerful framework for large-scale data, Generalized Additive Mixed Models remain the superior choice for interpretability, stability, and predictive accuracy in current telemonitoring regimes.

2. Related Work

Even though some studies have leveraged the UCI Parkinson’s Telemonitoring dataset UCI Machine Learning Repository (2012) to model disease severity, many of them do not account for the longitudinal structure in the data. For example, Eskidere et al. Eskidere et al. (2012) applied various linear and nonlinear regression techniques like Support Vector Machines (SVM) and Least Squares SVM (LS-SVM) to predict UPDRS scores based on acoustic features. However, their approaches treated each observation as an independent sample, neglecting the repeated measures structure of the dataset. Similarly, Nilashi et al. Nilashi et al. (2016) proposed a hybrid system combining noise removal, clustering, and prediction methods like Adaptive Neuro-Fuzzy Inference System (ANFIS) and Support Vector Regression (SVR), but their approach did not incorporate random effects to model individual differences in progression. Moreover, combining multiple techniques complicates model interpretability, which limits clinical utility.

Beyond static regression, recent research has increasingly adopted temporal deep learning architectures to capture disease trajectories. Long Short-Term Memory (LSTM) networks have been widely deployed to predict PD severity from voice and wearable sensors, often outperforming static baselines by retaining memory of long-term dependencies Elkholy et al. (2025); Senturk (2024). More recently, Transformer-based models, such as the Temporal Fusion Transformer (TFT), have shown promise in forecasting cognitive decline in PD by leveraging self-attention mechanisms to weigh the importance of different time points Hassan et al. (2024). However, these deep sequence models typically require massive datasets (e.g., PPMI with hundreds of subjects) to generalize effectively. In small-N cohorts typical of telemonitoring pilots, they are prone to overfitting and often lack an explicit mechanism to decompose variance into population-level trends versus subject-specific deviations Mandel et al. (2023).

Alternative non-parametric Bayesian approaches, particularly Gaussian Processes (GPs), have also been explored for modeling PD trajectories. Comparisons of Gaussian Process Regression (GPR) with other machine learning methods have demonstrated GPR’s efficacy in predicting UPDRS scores from speech signals, particularly when combined with dimensionality reduction techniques like Laplacian scores Filali et al. (2023). Further advancements using Deep Gaussian Processes have attempted to model personalized progression patterns through multi-task learning frameworks Wang et al. (2024). While GPs offer excellent uncertainty quantification, their computational complexity typically scales cubically with the number of observations (O(N³)), making them less scalable for high-frequency telemonitoring data compared to the linear scalability of neural mixed-effects approximations.

Finally, the "black box" nature of deep learning has spurred interest in Explainable AI (XAI) for Parkinson’s applications. Recent studies have integrated Layer-wise Relevance Propagation (LRP) Rezaei et al. (2024) or SHapley Additive exPlanations (SHAP) Alshammari et al. (2025) to interpret complex LSTM or CNN predictions. While these post-hoc methods provide valuable insights, they differ fundamentally from mixed-effects models, which offer intrinsic interpretability by explicitly estimating fixed effects (population trends) and random effects (individual heterogeneity) as part of the model structure.

To improve the methods, in our study, we model the longitudinal structure of the data by comparing traditional linear mixed models (LMMs) with two recent neural extensions: the GNMM and NME model. Relatedly, we have studied longitudinal clinical time series modeling in other settings and found that compact recurrent models can remain competitive with more complex transformer-based approaches, which supports the idea that strong temporal inductive bias can matter as much as model size Tong et al. (2025b). By incorporating both fixed and random effects, these models are better equipped to capture both population-level trends and subject-specific variations in disease progression. However, to date, the scalability and stability of the NME framework Wörtwein et al. (2023) have not been evaluated on Parkinson’s telemonitoring data, where the ratio of repeated measures to subjects is high, but the total subject count is typically low. This study fills that gap by systematically comparing these neural architectures against established statistical baselines.

3. Methodology

3.1 Dataset

This study utilizes the UCI Parkinson’s Telemonitoring Dataset UCI Machine Learning Repository (2012), which consists of longitudinal data collected from 42 patients (28 males and 14 females) diagnosed with Parkinson’s disease (Table 1). These patients were in early-stage Parkinson’s disease and recruited to a six-month trial of a telemonitoring device for remote symptom progression monitoring. The recordings, comprising a range of biomedical voice measurements, were automatically collected in patients’ homes. The dataset contains a total of 5,875 records, capturing repeated measures. Each record includes 22 variables, described below:

 

Table 1: Variable descriptions for the Parkinson’s Telemonitoring Dataset.

Category | Variable Name | Type | Description
Demographics | subject | Integer | Unique identifier for each subject
Demographics | age | Integer | Age of the subject
Demographics | sex | Binary | Subject sex (0 = male, 1 = female)
Clinical Scores | total_UPDRS | Continuous | Total UPDRS score, linearly interpolated
Clinical Scores | motor_UPDRS | Continuous | Motor UPDRS score, linearly interpolated
Clinical Scores | test_time | Continuous | Time since recruitment (in days)
Voice Biomarkers | Jitter(%), Jitter(Abs), Jitter:RAP, Jitter:PPQ5, Jitter:DDP | Continuous | Several measures of variation in fundamental frequency
Voice Biomarkers | Shimmer, Shimmer(dB), Shimmer:APQ3, Shimmer:APQ5, Shimmer:APQ11, Shimmer:DDA | Continuous | Several measures of variation in amplitude
Voice Biomarkers | NHR, HNR | Continuous | Ratio of noise to tonal components in the voice
Voice Biomarkers | RPDE | Continuous | Nonlinear dynamical complexity measure
Voice Biomarkers | DFA | Continuous | Fractal scaling exponent
Voice Biomarkers | PPE | Continuous | Nonlinear measure of fundamental frequency variation

In the early stages of Parkinson’s disease (PD), many patients show noticeable changes in their speech, such as unstable pitch, uneven loudness, hoarseness, and unclear pronunciation. Clinically, this cluster of symptoms is often referred to as hypokinetic dysarthria. PD affects the brain’s ability to control the muscles used for speaking, including those in the throat and chest. These impairments are largely driven by underlying bradykinesia (slowness of movement) and muscular rigidity. As telemedicine and remote health tools become more common, voice recordings have become a useful and non-invasive way to monitor how the disease changes over time; they can also aid early diagnosis.

The dataset contains many voice-based measurements, which can be grouped into four types: frequency changes (Jitter), amplitude changes (Shimmer), noise features (NHR and HNR), and nonlinear patterns (RPDE, DFA, and PPE).

Jitter features, including Jitter (%), Jitter (Abs), RAP, PPQ5, and DDP, measure small changes in pitch between voice cycles. Because the vocal folds of people with PD do not move smoothly, owing to laryngeal rigidity and micro-tremors, they cannot maintain a steady pitch, leading to higher jitter values.
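As an illustration, local jitter can be computed from a sequence of estimated pitch periods as the mean absolute difference between consecutive periods divided by the mean period. This is a simplified sketch of the common (Praat-style) definition; the period values below are simulated, not taken from the dataset.

```python
import numpy as np

def local_jitter(periods):
    """Local jitter (%): mean absolute difference of consecutive
    pitch periods, divided by the mean period."""
    periods = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)

# A perfectly steady 125 Hz voice (8 ms period) vs. an unstable one
steady = np.full(100, 0.008)
rng = np.random.default_rng(0)
unstable = 0.008 + rng.normal(0.0, 2e-4, 100)

print(local_jitter(steady))    # exactly 0.0 for a constant period
print(local_jitter(unstable))  # noticeably larger for unstable phonation
```

The unstable sequence yields a markedly higher jitter value, mirroring how laryngeal instability in PD inflates these features.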

Shimmer features, such as Shimmer in percent and decibels, APQ3, APQ5, APQ11, and DDA, show how the loudness of the voice changes from one cycle to the next. Muscle control problems and reduced respiratory support (hypophonia) make speech volume less steady, so these values tend to be higher in PD patients.

NHR (Noise-to-Harmonics Ratio) and HNR (Harmonics-to-Noise Ratio) are used to check how much noise is in the voice. PD patients tend to have more noise and less clear voice sounds due to incomplete glottic closure, which makes these values worse than in healthy individuals.

Furthermore, Recurrence Period Density Entropy (RPDE), Detrended Fluctuation Analysis (DFA), and Pitch Period Entropy (PPE) are nonlinear dynamic measures that capture complexity and unpredictability in vocal patterns. People with PD may speak in a way that is less regular or harder to predict. These features can help identify subtle issues in voice control.

Overall, these voice features can help to study speech problems in Parkinson’s disease. They are easy to measure and closely related to motor symptoms, making them valuable digital biomarkers for tracking disease progression and supporting remote healthcare systems.

In this study, we use total_UPDRS as the response variable to explore the impact of voice biomarkers on disease progression.

Figure 1(a) shows the pairwise Pearson correlations among all measured variables. Several voice-based features, particularly jitter and shimmer measures, show moderate positive correlations with UPDRS scores. This indicates that voice instability is related to the severity of the disease.

Figure 1(b) illustrates the relationship between patient age and total_UPDRS scores. Although the scatterplot reveals variability within the age range, the fitted regression line shows a clear positive association, indicating that older individuals tend to have more severe symptoms.

Figure 1(c) presents longitudinal trends in log-transformed total_UPDRS scores for individual patients. The patterns differ among patients, with some showing progressive worsening while others remain stable or even slightly improve. The heterogeneity in progression patterns shows the necessity for individualized modeling approaches like mixed effects models.

Figure 1: Data analysis of the dataset

3.2 Traditional Methods: Linear Mixed–Effects Model (LMM)

We first apply the Linear Mixed-Effects model Laird & Ware (1982) to our dataset. Let Y_{ij} denote the UPDRS score for subject i (i = 1, …, m) at time t_{ij} (j = 1, …, n_i), and let X_{ijk} be the k-th voice feature (k = 1, …, K). We use a linear mixed-effects model with a subject-specific random intercept b_{0i} and random slope b_{1i} for time:

\[
Y_{ij} = \beta_0 + \beta_1 t_{ij} + \sum_{k=1}^{K} \beta_{k+1} X_{ijk} + b_{0i} + b_{1i} t_{ij} + \varepsilon_{ij} \tag{1}
\]

In this model, the terms β_0, β_1 t_{ij}, and Σ_{k=1}^{K} β_{k+1} X_{ijk} represent the fixed effects. Specifically, β_0 is the overall intercept, β_1 is the average slope for time t_{ij} across all subjects, and the β_{k+1} are the coefficients for the K voice features X_{ijk}, representing their average effects on Y_{ij}. These fixed effects describe the population-average relationships.

The terms b_{0i} and b_{1i} t_{ij} represent the random effects for subject i. Here, b_{0i} is the subject-specific random intercept, showing how subject i’s baseline UPDRS score deviates from the overall intercept β_0. Similarly, b_{1i} is the subject-specific random slope for time, indicating how subject i’s rate of change in UPDRS score over time t_{ij} deviates from the average time slope β_1. These random effects capture individual heterogeneity around the population-average trends.

Finally, ε_{ij} is the residual error term for subject i at time j, representing within-subject variability not explained by the fixed or random effects.

Distributional assumptions.

The subject-specific random effects (b_{0i}, b_{1i})^T are assumed to be drawn from a bivariate normal distribution with mean zero and covariance matrix D:

\[
\begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix} \sim N(\mathbf{0}, D),
\qquad
D = \begin{pmatrix} \sigma_{b_0}^2 & \rho\,\sigma_{b_0}\sigma_{b_1} \\ \rho\,\sigma_{b_0}\sigma_{b_1} & \sigma_{b_1}^2 \end{pmatrix}
\]

The residual errors ε_{ij} are assumed to be independent and identically distributed (i.i.d.) normal random variables with mean zero and variance σ²:

\[
\varepsilon_{ij} \sim N(0, \sigma^2).
\]

Furthermore, the random effects b_i and the residual errors ε_i are assumed to be independent of each other.

Matrix Formulation

Let

\[
\mathbf{y}_i = (Y_{i1}, \ldots, Y_{i n_i})^T
\]

be the vector of n_i UPDRS scores for subject i.

The fixed-effects design matrix X_i and the random-effects design matrix Z_i for subject i are defined as:

\[
X_i = \begin{pmatrix}
1 & t_{i1} & X_{i1,1} & \cdots & X_{i1,K} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & t_{i n_i} & X_{i n_i,1} & \cdots & X_{i n_i,K}
\end{pmatrix},
\qquad
Z_i = \begin{pmatrix}
1 & t_{i1} \\
\vdots & \vdots \\
1 & t_{i n_i}
\end{pmatrix}
\]

The matrix X_i contains:

  • A column of ones for the intercept,

  • A column for time t_{ij},

  • K columns for the voice features X_{ij,k}.

The matrix Z_i contains:

  • A column of ones for the random intercept,

  • A column for time t_{ij}, corresponding to the random slope.

Subject-Specific Model

The model for subject i can be written as:

\[
\mathbf{y}_i = X_i \boldsymbol{\beta} + Z_i \mathbf{b}_i + \boldsymbol{\varepsilon}_i
\]

where:

  • β = (β_0, β_1, …, β_{K+1})^T is the vector of fixed-effects coefficients,

  • b_i = (b_{0i}, b_{1i})^T is the vector of random effects for subject i,

  • ε_i = (ε_{i1}, …, ε_{i n_i})^T is the vector of residual errors.

Assumptions:

\[
\mathbf{b}_i \sim N(\mathbf{0}, D), \qquad \boldsymbol{\varepsilon}_i \sim N(\mathbf{0}, \sigma^2 I_{n_i})
\]

Marginal Distribution

The marginal distribution of y_i is:

\[
\mathbf{y}_i \sim N(X_i \boldsymbol{\beta},\; V_i), \qquad V_i = Z_i D Z_i^T + \sigma^2 I_{n_i}.
\]

Stacking all subjects gives:

\[
\mathbf{y} \sim N(X\boldsymbol{\beta},\; V), \qquad V = \mathrm{blockdiag}(V_1, \ldots, V_m).
\]
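To make the covariance structure concrete, the per-subject marginal covariance V_i can be assembled directly. The visit times and variance components below are illustrative toy values, not estimates from the dataset:

```python
import numpy as np

# Toy values: four visits for one subject, random intercept + slope
t = np.array([0.0, 30.0, 60.0, 90.0])          # test_time (days)
Z = np.column_stack([np.ones_like(t), t])      # random-effects design Z_i
D = np.array([[0.150, -0.010],                 # assumed covariance of (b_0i, b_1i)
              [-0.010, 0.007]])
sigma2 = 0.06                                  # residual variance

V = Z @ D @ Z.T + sigma2 * np.eye(len(t))      # V_i = Z_i D Z_i^T + sigma^2 I

# Off-diagonal entries are nonzero: repeated measures on the same
# subject are correlated even though the residuals are i.i.d.
print(V.round(3))
```

The nonzero off-diagonal entries of V show explicitly how the random effects induce within-subject correlation in the marginal model.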

Log-Likelihood

The log-likelihood can be written as:

\[
\ell(\boldsymbol{\beta}, \boldsymbol{\theta})
= -\tfrac{1}{2} \left\{ \log |V|
+ (\mathbf{y} - X\boldsymbol{\beta})^T V^{-1} (\mathbf{y} - X\boldsymbol{\beta})
+ n \log(2\pi) \right\} \tag{2}
\]

where

\[
\boldsymbol{\theta} = (\sigma_{b_0}^2, \sigma_{b_1}^2, \rho, \sigma^2)
\]

represents the vector of variance components.

Maximum Likelihood Estimation

Estimation of β and θ via Maximum Likelihood (ML) proceeds as follows.

Setting ∂ℓ/∂β = 0 gives the generalized least squares (GLS) estimator for β:

\[
\hat{\boldsymbol{\beta}} = (X^T V^{-1} X)^{-1} X^T V^{-1} \mathbf{y}. \tag{3}
\]
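Numerically, the GLS estimator is best computed with linear solves rather than explicit matrix inverses. The following self-contained sketch uses simulated data with a known toy covariance V:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])

# Correlated errors with a known covariance V (toy construction)
A = rng.normal(size=(n, n))
V = A @ A.T / n + np.eye(n)
y = X @ beta_true + np.linalg.cholesky(V) @ rng.normal(size=n)

# beta_hat = (X^T V^{-1} X)^{-1} X^T V^{-1} y, computed via solves
Vinv_X = np.linalg.solve(V, X)
Vinv_y = np.linalg.solve(V, y)
beta_hat = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)
print(beta_hat)
```

In practice V is unknown, so this step is iterated with updates of the variance components, as described below.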

For each variance component θ_j in θ, the ML estimate θ̂_j is found by solving the score equation ∂ℓ/∂θ_j = 0, where

\[
\frac{\partial \ell}{\partial \theta_j}
= -\tfrac{1}{2} \left\{
\mathrm{tr}\!\left( V^{-1} \frac{\partial V}{\partial \theta_j} \right)
- (\mathbf{y} - X\boldsymbol{\beta})^T V^{-1} \frac{\partial V}{\partial \theta_j} V^{-1} (\mathbf{y} - X\boldsymbol{\beta})
\right\} = 0.
\]

The REML log-likelihood is:

\[
\ell_{\mathrm{REML}}(\boldsymbol{\theta})
= -\tfrac{1}{2} \left\{
\log |V|
+ \log |X^T V^{-1} X|
+ (\mathbf{y} - X\hat{\boldsymbol{\beta}})^T V^{-1} (\mathbf{y} - X\hat{\boldsymbol{\beta}})
+ (n - p) \log(2\pi)
\right\} \tag{4}
\]

These equations are typically solved numerically (e.g., using Newton–Raphson), often by iterating between estimating β given θ and estimating θ given β, until convergence to obtain β̂ and θ̂.

While ML provides estimates for all parameters, its estimates of the variance components θ can be biased, particularly in smaller samples, because ML does not fully account for the degrees of freedom used to estimate the fixed effects β.

Restricted Maximum Likelihood (REML) is preferred for estimating variance components, as it yields less biased estimates. REML achieves this by maximizing a likelihood function based on linear combinations of y that are invariant to the fixed effects, effectively adjusting for the estimation of β.

Here, p = dim(β), and β̂ is the GLS estimator obtained from Equation (3).

The REML estimates θ̂_REML are found by solving

\[
\frac{\partial \ell_{\mathrm{REML}}}{\partial \theta_j} = 0
\]

numerically. Subsequently, β is estimated using GLS with V evaluated at θ̂_REML.

Numerical Estimation in Practice

Closed-form solutions for the variance components θ do not exist, so one must rely on iterative algorithms. Two standard approaches are:

  1. Newton or Fisher scoring methods applied to the log-likelihood, and

  2. The Expectation–Maximization (EM) algorithm, which treats the random effects b as latent variables.

These methods can be implemented using standard statistical software, such as lme4 in R or statsmodels in Python.
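As a sketch of the random intercept plus random slope specification in statsmodels, the snippet below fits such a model by REML. The data are simulated stand-ins (not the telemonitoring records); column names merely follow the dataset’s conventions:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate a small longitudinal dataset with subject-specific
# intercepts and time slopes (hypothetical parameter values)
rng = np.random.default_rng(0)
n_subj, n_obs = 15, 20
subject = np.repeat(np.arange(n_subj), n_obs)
test_time = np.tile(np.linspace(0, 180, n_obs), n_subj)
HNR = rng.normal(20.0, 3.0, n_subj * n_obs)
b0 = rng.normal(0.0, 0.3, n_subj)[subject]       # random intercepts
b1 = rng.normal(0.0, 0.005, n_subj)[subject]     # random time slopes
log_updrs = (3.3 + 0.002 * test_time + 0.004 * HNR
             + b0 + b1 * test_time
             + rng.normal(0.0, 0.06, n_subj * n_obs))
df = pd.DataFrame({"subject": subject, "test_time": test_time,
                   "HNR": HNR, "log_UPDRS": log_updrs})

# Random intercept and random slope for test_time, grouped by subject;
# reml=True estimates the variance components by REML
model = smf.mixedlm("log_UPDRS ~ test_time + HNR", df,
                    groups=df["subject"], re_formula="~test_time")
result = model.fit(reml=True)
print(result.summary())
```

Passing `reml=False` to `fit` instead uses ML, which is needed when comparing models with different fixed effects, as in the selection steps below.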

Variable Selection

In modeling Parkinson’s disease progression, the presence of multicollinearity among voice biomarkers poses a significant challenge. Several acoustic features, such as the various jitter and shimmer measures, are known to be highly correlated (e.g., as shown in Figure 1 (a), correlations exceeding 0.9 between some shimmer metrics). To address this and ensure model interpretability and parsimony, we employed a two-stage variable selection strategy: first using the Least Absolute Shrinkage and Selection Operator (LASSO) for dimensionality reduction, followed by backward stepwise selection on a linear mixed-effects model. Variance Inflation Factor (VIF) diagnostics were also computed to assess residual multicollinearity after selection.

The LASSO was applied to the linear model ignoring random effects, focusing solely on the fixed effects; it shrinks less informative coefficients to zero, yielding a sparse set of candidate predictors. Importantly, the LASSO helped identify redundant jitter and shimmer variables, retaining only the most informative features for further modeling.

Subsequently, we performed stepwise backward selection using the lmer model, starting from a full linear mixed-effects model with all LASSO-selected predictors. This iterative process removed non-significant fixed effects based on AIC, leading to a reduced yet effective model. During this step, we also checked VIF values to confirm that no remaining variable exhibited severe multicollinearity (all VIF < 5.0).

The final model retained five predictors: age, test_time, Jitter_PPQ5, NHR, and HNR. This subset balances interpretability, predictive performance, and model stability, and serves as the foundation for further modeling, including transformation, interaction terms, and random slopes. Note that interaction effects and nonlinear terms were considered in later modeling stages rather than during variable selection.
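The first-stage LASSO screening can be sketched as follows. The feature matrix is simulated, with two nearly collinear columns mimicking a redundant jitter pair; the column indices and coefficients are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 10))
X[:, 1] = X[:, 0] + rng.normal(scale=0.05, size=n)   # redundant "jitter" pair
y = 2.0 * X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=n)

# Stage 1: LASSO on the fixed-effects-only model; the L1 penalty
# shrinks less informative / redundant coefficients toward zero
Xs = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5, random_state=0).fit(Xs, y)
selected = np.flatnonzero(lasso.coef_ != 0)
print("retained columns:", selected)
```

The surviving columns would then enter the mixed-effects model for backward stepwise selection, with VIF checks on the retained set.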

 

Model Refinement via Interaction and Random Slopes

After identifying a subset of relevant predictors through variable selection, we further refined the linear mixed-effects model by incorporating interaction terms and evaluating the inclusion of subject-specific random slopes. These enhancements were designed to capture individual variation more flexibly and to model potential time-varying effects, thereby improving overall model fit and predictive performance.

To ensure the validity of the model assumptions, we examined diagnostic plots of residuals and normality. Figure 2 (top row) presents diagnostic plots from the initial model using the raw outcome variable total_UPDRS. The residual plot reveals heteroscedasticity, with increasing spread at higher fitted values, while the Q-Q plots for both fixed and random effects show noticeable deviations from normality.

To mitigate these issues, we applied a logarithmic transformation to the outcome variable. This transformation significantly stabilized the variance and improved the normality of residuals, as shown in the bottom row of Figure 2. It also compressed the scale of extreme values, reducing the influence of high-leverage tail points and producing more symmetric residuals overall.

Figure 2: Top row: Residuals, fixed effect Q-Q, and random effect Q-Q plots from the original model using total_UPDRS. Bottom row: Diagnostics after log-transforming the response. Transformation improves variance stabilization and normality.

In addition to transforming the response, we explored potential interactions among the selected covariates. A systematic evaluation of all pairwise interactions using likelihood-based model comparison revealed that the interaction between test_time and HNR was the most impactful. This interaction was statistically significant and led to a notable improvement in model fit, with the AIC decreasing from −10231.6 to −10243.3. The result suggests that the effect of HNR on disease progression varies over time. Other interactions provided marginal improvement or introduced unnecessary complexity and were therefore excluded from the final model.

We also investigated whether to include random slopes in addition to random intercepts for each subject. As shown in Figure 1 (c), subject-specific log(UPDRS) trajectories over time exhibited heterogeneous slopes, motivating the inclusion of a random slope for test_time. The addition of this random slope further reduced the AIC to −10261.4, resulting in the final model:

\[
\log(\mathrm{UPDRS}_{ij}) = \beta_0 + \beta_1\,\mathrm{age}_i + \beta_2\,\mathrm{test\_time}_{ij} + \beta_3\,\mathrm{HNR}_{ij} + \beta_4\,(\mathrm{test\_time}_{ij} \times \mathrm{HNR}_{ij}) + b_{0i} + b_{1i}\,\mathrm{test\_time}_{ij} + \varepsilon_{ij}
\]

where (b_{0i}, b_{1i})^T ~ N(0, G) are the subject-specific random intercept and slope for time, and ε_{ij} ~ N(0, σ²) denotes the residual error.

Table 2 summarizes the model refinement process using AIC as the selection criterion. Each modification, i.e., the selection of variables, the transformation, the inclusion of interactions and the random slope, improved the model fit, culminating in the final model specification above.

Table 2: Model selection and refinement steps based on AIC comparison

Step | AIC | Comment
Full model (all predictors) | 28124.1 | Initial LMM
After LASSO (fixed effects only) | 27880.6 | Removed highly correlated terms
After stepwise (LMM) & VIF | 27869.8 | Dropped non-significant effects
After log-transformation | −10231.6 | Improved residual normality and variance
Add interaction (test_time × HNR) | −10243.3 | Included significant time-varying HNR effect
Add random slope for test_time | −10261.4 | Final model with subject-specific slopes

This finalized model serves as a baseline for comparison against more flexible methods, such as generalized additive and deep mixed-effects models, in subsequent sections.

3.3 Generalized Additive Mixed Model (GAMM)

To capture nonlinear temporal effects in Parkinson’s disease progression, we adopted a Generalized Additive Mixed Model (GAMM), which extends the linear mixed-effects framework by allowing smooth, data-driven functions of continuous covariates (Wood, 2017). This formulation maintains the interpretability of linear effects while introducing the flexibility necessary to model nonlinear trends over time.

In our application, the log-transformed total UPDRS score is modeled as a smooth function of test_time, along with linear terms for other covariates. The model is expressed as:

\[
\log(\mathrm{UPDRS}_{ij}) = \beta_0 + \beta_1\,\mathrm{age}_i + \beta_2\,\mathrm{HNR}_{ij} + f(\mathrm{test\_time}_{ij}) + b_{0i} + b_{1i}\,\mathrm{test\_time}_{ij} + \varepsilon_{ij} \tag{5}
\]

where f(·) is a smooth function of time, and b_{0i}, b_{1i} ~ N(0, G) are the subject-specific random intercept and slope. The residual errors ε_{ij} ~ N(0, σ²) are assumed to be independent and homoscedastic.

Spline Basis Representation and Estimation

The smooth function f (test_time) is approximated via a linear combination of basis functions:

\[
f(\mathrm{test\_time}) = \sum_{k=1}^{K} \alpha_k B_k(\mathrm{test\_time}) \tag{6}
\]

where B_k(·) are predefined spline basis functions (e.g., cubic regression splines, B-splines, or thin plate splines (Ruppert et al., 2003)), and α_k are coefficients estimated from the data. To control smoothness and avoid overfitting, a roughness penalty is imposed on the second derivative of the function:

\[
\text{Penalized log-likelihood} = \ell(\boldsymbol{\beta}, \boldsymbol{\alpha}) - \tfrac{\lambda}{2} \int \left[ f''(t) \right]^2 dt \tag{7}
\]

Here, λ is a smoothing parameter that balances model fit and regularity. The complexity of the spline is quantified through its effective degrees of freedom (edf).

Estimation proceeds using Penalized Iteratively Reweighted Least Squares (P-IRLS), an efficient approach for maximizing the penalized likelihood. The gamm() function in the mgcv R package is used to jointly estimate the fixed effects, the smooth term, and the random effects. Internally, gamm() delegates the estimation of random effects to the lme() function from the nlme package, allowing flexible modeling of subject-specific deviations via random intercepts and slopes.

The smoothing parameter λ is selected by optimizing the marginal Restricted Maximum Likelihood (REML) criterion. REML balances goodness of fit with smoothness and has been shown to provide stable and efficient estimation in practice (Wood, 2011).
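The role of the smoothing parameter in the penalized fit of (6)–(7) can be illustrated outside of mgcv. The sketch below is a deliberately simplified stand-in for P-IRLS, written under two assumptions not taken from the paper: a truncated-power cubic basis rather than the bases named above, and a ridge penalty on the knot coefficients as a crude surrogate for the ∫f''² roughness penalty.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated nonlinear trajectory: smooth trend plus noise
t = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * t) + rng.normal(0.0, 0.2, t.size)

# Truncated-power cubic basis with interior knots (illustrative choice)
knots = np.linspace(0.1, 0.9, 8)
B = np.column_stack([np.ones_like(t), t, t**2, t**3] +
                    [np.clip(t - k, 0.0, None) ** 3 for k in knots])

def penalized_fit(B, y, lam):
    """Minimize ||y - B a||^2 + lam * ||knot coefficients||^2.

    Penalizing only the knot terms (not the global cubic part) mimics the
    spirit of the roughness penalty: larger lam forces a smoother curve.
    """
    P = np.eye(B.shape[1])
    P[:4, :4] = 0.0  # leave the unpenalized polynomial part alone
    return np.linalg.solve(B.T @ B + lam * P, B.T @ y)

a_smooth = penalized_fit(B, y, lam=10.0)   # heavy penalty -> smoother fit
a_rough  = penalized_fit(B, y, lam=1e-8)   # tiny penalty  -> wigglier fit
fit = B @ a_smooth
print("training MSE (smooth fit):", float(np.mean((y - fit) ** 2)))
```

In mgcv, this trade-off is resolved automatically by choosing λ via REML rather than by hand.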

Comparison of LMM and GAMM

To assess model performance, we compared the final Linear Mixed-Effects Model (LMM) and the Generalized Additive Mixed Model (GAMM) using both model fit criteria and predictive accuracy. Table 3 summarizes the estimated fixed and random effects, along with model fit and test set performance.

Table 3: Comparison of LMM and GAMM Estimates and Performance. The AIC value for LMM reported here differs from Table 2 because the final models in this comparison were fitted using Restricted Maximum Likelihood (REML) for unbiased variance estimation, whereas the model selection process in Table 2 utilized Maximum Likelihood (ML) to compare fixed effects.

| Section | Term | LMM | GAMM | Notes |
|---|---|---|---|---|
| Fixed Effects (Est. (p-val)) | Intercept | 3.316 (<2e-16) | 3.316 (<2e-16) | |
| | Age | 0.136 (0.0219) | 0.136 (0.0145) | Significant in both |
| | test_time | 0.0358 (0.0099) | spline | Nonlinear in GAMM |
| | HNR | 0.0036 (0.0035) | 0.0043 (0.0006) | Consistent positive effect |
| | test_time:HNR | -0.0077 (1e-08) | n.s. | Only significant in LMM |
| Smooth Terms (GAMM only) | s(test_time) | | edf = 6.17, p < 2e-16 | |
| Random Effects (Std. Dev (Corr)) | Intercept | 0.382 | 0.373 | Similar |
| | Slope (test_time) | 0.085 (-0.08) | 0.084 (-0.061) | Similar |
| | Residual | 0.058 | 0.057 | Similar |
| Model Fit (AIC) | | -16062.39 | -16162.05 | GAMM better (fit) |
| Test MSE (orig. scale) | | 7.70 | 6.56 | GAMM better (predictive) |

The GAMM demonstrated a superior model fit based on the Akaike Information Criterion (AIC), achieving a lower AIC value (−16162.05) than the LMM (−16062.39). This improvement can be attributed to GAMM’s flexibility in capturing nonlinear structures, particularly through a spline term applied to test_time. As shown in Figure 3, the estimated smooth function of time deviates notably from linearity, reinforcing the presence of nonlinear progression patterns in UPDRS scores over time.

Figure 3: Estimated spline effect of test_time from GAMM, showing nonlinear progression over time.

To compare predictive performance, we conducted a hold-out evaluation: the last observation from each of the 42 subjects was set aside as the test set, while the remaining data were used for training. On this test set, GAMM achieved a slightly lower mean squared error (MSE = 6.56) compared to LMM (MSE = 7.70), indicating marginally better prediction accuracy in the original scale.

Notably, the interaction term test_time × HNR was statistically significant in the LMM but became non-significant under GAMM. This suggests that the nonlinear main effect of test_time in GAMM may account for variation previously explained by the interaction term in the LMM. The estimated standard deviations of the random effects and residual terms were similar across models, indicating consistent subject-level variation.

In summary, GAMM provided a more flexible fit by capturing nonlinear temporal patterns, as evident in its superior AIC and spline visualization. However, LMM offered comparable generalization performance on the test set and more interpretable fixed effects. The choice between models thus depends on whether the goal prioritizes interpretability or modeling flexibility.
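The hold-out construction used throughout this comparison (the last observation of each subject as the test set) is straightforward to reproduce. The sketch below uses a handful of hypothetical records with illustrative field names to perform the split in plain Python.

```python
# Split longitudinal records so each subject's final visit forms the test set,
# mirroring the evaluation used for the LMM and GAMM. Values are made up.
records = [
    {"subject": 1, "test_time": 0.0,  "total_updrs": 28.0},
    {"subject": 1, "test_time": 12.7, "total_updrs": 29.5},
    {"subject": 1, "test_time": 25.3, "total_updrs": 31.0},
    {"subject": 2, "test_time": 0.0,  "total_updrs": 18.0},
    {"subject": 2, "test_time": 11.9, "total_updrs": 18.4},
]

def last_visit_split(records):
    """Return (train, test): test holds each subject's latest visit."""
    last = {}
    for r in records:
        if r["subject"] not in last or r["test_time"] > last[r["subject"]]:
            last[r["subject"]] = r["test_time"]
    train = [r for r in records if r["test_time"] < last[r["subject"]]]
    test  = [r for r in records if r["test_time"] == last[r["subject"]]]
    return train, test

train, test = last_visit_split(records)
print(len(train), len(test))   # 3 2
```

With the full dataset this yields one test row per subject (42 in total), matching the evaluation protocol described above.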

 

3.4 Generalized Neural Network Mixed Model (GNMM) for Non-linear Longitudinal Modeling

Building on Mandel et al. (2023), we use a Generalized Neural Network Mixed Model (GNMM) to predict the longitudinal Total_UPDRS scores collected in the tele-monitoring study of Parkinson’s disease.

We retain the notation introduced earlier. Let i = 1, ..., m label the m = 42 patients in the Oxford telemonitoring study and j = 1, ..., n_i their successive visits. At visit j we record the response Y_ij (the Total_UPDRS score) and a p = 17-dimensional predictor vector

X_ij = (test_time, 16 voice features)^T ∈ R^17,

where test_time is the elapsed study time and the remaining 16 entries are acoustic measures extracted from the voice recording.

Following the mixed-effects formulation of Mandel et al. (2023), we allow observations from the same subject to be correlated through a cluster-specific random-effect vector b_i ∈ R^q, where q ≥ 1. Conditional on b_i, the outcomes Y_ij are assumed to follow an exponential-family distribution with

E(Y_ij | b_i) = μ_ij^b,
Var(Y_ij | b_i) = φ a_ij v(μ_ij^b)

with known variance function v(·), fixed dispersion φ, and known constants a_ij.

Generalized Neural Network Mixed Model (GNMM) On Our Case

Consider a feed-forward artificial neural network (ANN) with L hidden layers, the predictor vector X_ij ∈ R^p (p = 17) as input, and a univariate output μ_ij^b representing the conditional mean of Y_ij (Total_UPDRS) for subject i at visit j. Following Mandel et al. (2023), the network output is built up through a sequence of nested activation functions g_l(·), l = 0, ..., L.

Network layers.

The input X_ij enters the L-th (bottom) hidden layer with k_L nodes:

a_ij^(L) = g_L( ω^(L) X_ij + δ^(L) ) ------(8)

where ω^(L) is a k_L × p weight matrix and δ^(L) is a bias vector of length k_L.

For hidden layer l = L − 1, ..., 1 with k_l nodes,

a_ij^(l) = g_l( ω^(l) a_ij^(l+1) + δ^(l) ) ------(9)

with ω^(l) of size k_l × k_(l+1) and δ^(l) ∈ R^(k_l).

 

Output layer and random effects

The univariate network output determines the conditional mean through

μ_ij^b = g_0( ω^(0) a_ij^(1) + δ^(0) + Z_ij^T b_i ) ------(10)

where ω^(0) is a 1 × k_1 weight vector, δ^(0) is a scalar bias,
Z_ij ∈ R^q is the design vector for the cluster-specific random effect, and
b_i ~ N(0, D).
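Equations (8)–(10) amount to a short forward pass. The sketch below (randomly initialized parameters, purely for illustration) evaluates the conditional mean for one visit with a single hidden layer, a ReLU hidden activation, and an identity output activation, i.e. a random intercept added to the network output.

```python
import numpy as np

rng = np.random.default_rng(1)

p, k1 = 17, 3                 # 17 predictors, 3 hidden nodes (as in Section 3.4)
X_ij = rng.normal(size=p)     # one visit's scaled predictor vector
b_i = 0.4                     # this subject's random intercept (q = 1, Z_ij = 1)

# Parameters: omega^(1) (k1 x p), delta^(1) (k1,), omega^(0) (k1,), delta^(0)
omega1 = rng.normal(scale=0.1, size=(k1, p))
delta1 = np.zeros(k1)
omega0 = rng.normal(scale=0.1, size=k1)
delta0 = 0.0

relu = lambda z: np.maximum(z, 0.0)   # hidden activation g_1

# Eqs. (8)-(9): hidden activations; Eq. (10): conditional mean with random
# effect, using the identity g_0 appropriate for a continuous response.
a1 = relu(omega1 @ X_ij + delta1)
mu_ij_b = omega0 @ a1 + delta0 + b_i
print(float(mu_ij_b))
```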

In a classical generalized linear mixed model (Breslow & Clayton, 1993):

E[ Y_ij | b_i ] = h( X_ij^T α + Z_ij^T b_i ) ------(11)

where:

  • h(·) is the inverse link,

  • X_ij^T α is the fixed-effect component,

  • Z_ij^T b_i is the random effect.

In the GNMM we replace the fixed-effect term by the nonlinear network output and use the final activation g_0(·) in place of h(·):

E[ Y_ij | b_i ] = g_0( ω^(0) a_ij^(1) + δ^(0) + Z_ij^T b_i ) ------(12)

 

Quasi-likelihood

Let ω = vec(ω^(0), ω^(1)) and δ = (δ^(0), δ^(1)) collect all weights and biases of the single-hidden-layer network.

With q = 1 (a random intercept), we set Z_ij = 1 and b_i ~ N(0, D).

The quasi-likelihood used to estimate (ω, δ, D) is:

exp{ q(ω, δ, D) }
∝ |D|^(-1/2) ∫ exp {
Σ_{i=1}^{42} Σ_{j=1}^{n_i} ∫_{μ_ij^b}^{y_ij} (y_ij − u) / [ a_ij v(u) ] du
− b_i² / (2D)
− λ( ω^T ω + δ^T δ )
} db. ------(13)

Laplace approximation

Following Breslow & Clayton (1993) and Mandel et al. (2023), write

ρ(b) =
− Σ_{i=1}^{42} Σ_{j=1}^{n_i} ∫_{μ_ij^b}^{y_ij} (y_ij − u) / [ φ a_ij v(u) ] du
+ b² / (2D)
+ (λ / 2)( ω^T ω + δ^T δ ),

so that equation (13) is proportional to

|D|^(-1/2) ∫ exp{ −ρ(b) } db.

Let b̂ be the mode obtained from solving

∂ρ(b) / ∂b = 0.

The first- and second-order derivatives are:

ρ'(b) =
− Σ_{j=1}^{n_i} (Y_ij − μ_ij^b) g_0'(η_ij^b) / [ φ a_ij v(μ_ij^b) ]
+ b / D,

ρ''(b) = Z_i^T W_i Z_i + D^(-1),

where

Z_i = 1_{n_i},

and

W_i = diag( φ^(-1) a_ij^(-1) v(μ_ij^b)^(-1) [ g_0'(η_ij^b) ]² ).

Ignoring the remainder term yields the Laplace approximation of the quasi-likelihood function:

q_L(ω, δ, λ) ≈
− (1/2) log |D|
− (1/2) log | Z_i^T W_i Z_i + D^(-1) |
− (λ / 2)( ω^T ω + δ^T δ ). ------(14)

Training objective

Treating the W_i term in Equation (14) as negligible (Mandel et al., 2023) leads to the objective maximized during network training:

q(ω, δ, λ) ∝
(1 / φ) Σ_{i=1}^{m} Σ_{j=1}^{n_i}
∫_{μ_ij^b}^{y_ij} (y_ij − u) / [ a_ij v(u) ] du
− b² / (2D)
− λ( ω^T ω + δ^T δ ). ------(15)

Explanation of Terms

  • The first term corresponds to the quasi-likelihood contribution from the observations.

  • The second term represents the Gaussian prior penalty on the random effect b ~ N(0, D).

  • The third term is the L2 regularization penalty applied to the network weights and biases.



Quasi-score equations (single hidden layer)

Differentiating (15) with respect to the network parameters yields the quasi-score system:

(QS1)

∂q_L / ∂ω_k^(0)
= (1/φ) Σ_{i=1}^{42} Σ_{j=1}^{n_i}
(Y_ij − μ_ij^b) g_0'(η_ij^b) a_{ij,k}^(1)
− 2λ ω_k^(0)

(QS2)

∂q_L / ∂δ^(0)
= (1/φ) Σ_{i=1}^{42} Σ_{j=1}^{n_i}
(Y_ij − μ_ij^b) g_0'(η_ij^b)
− 2λ δ^(0)

(QS3)

∂q_L / ∂ω_{kl}^(1)
= (1/φ) Σ_{i=1}^{42} Σ_{j=1}^{n_i}
(Y_ij − μ_ij^b) g_0'(η_ij^b) ω_k^(0) g_1'(s_{ij,k}) X_{ij,l}
− 2λ ω_{kl}^(1)

(QS4)

∂q_L / ∂δ_k^(1)
= (1/φ) Σ_{i=1}^{42} Σ_{j=1}^{n_i}
(Y_ij − μ_ij^b) g_0'(η_ij^b) ω_k^(0) g_1'(s_{ij,k})
− 2λ δ_k^(1)

where:

η_ij^b = ω^(0) a_ij^(1) + δ^(0) + b_i

s_{ij,k} = (ω_k^(1))^T X_ij + δ_k^(1)

a_{ij,k}^(1) = g_1(s_{ij,k})

Equation (15) is maximized by solving (QS1)–(QS4) jointly with

∂ρ(b) / ∂b = 0.

We use backpropagation for the network parameters and stochastic gradient descent on b_i, updating b̂_i at each epoch while keeping D fixed at its REML estimate (as in a GLMM, with X_ij^T β replaced by ω^(0) a_ij^(1) + δ^(0)).

 

Summary of the Algorithm

The GNMM network transforms the 17-dimensional vector of voice features into a latent disease score, allowing nonlinear interactions and saturation effects to influence the predicted Total_UPDRS. The random intercept b_i absorbs persistent patient-level deviations, so estimates borrow strength across subjects (partial pooling). An L2 penalty λ(ω^T ω + δ^T δ) shrinks large weights and biases, mitigating overfitting.

The algorithm flow used is summarized as below (Algorithm 1):

Algorithm 1

Stochastic training of GNMM on the Parkinson data (adapted from Mandel et al. (2023))

Require: scaled features X, scaled targets Y, subject indices; epochs E, batch size B, learning rate η

  1. Initialize parameters θ (Xavier), b_i = 0, σ² ← 1, τ² ← 1
  2. for e = 1, ..., E do
  3.   Shuffle the training set
  4.   for each mini-batch B^e of size B do
  5.     Compute μ_ij^b for (i, j) ∈ B^e via (10)
  6.     Evaluate the mini-batch loss L_{B^e} from (15)
  7.     Back-propagate: compute ∇_θ L_{B^e}
  8.     Update parameters: θ ← θ − η ∇_θ L_{B^e}
  9.     Update each b_i that appears in B^e
  10.   end for
  11.   Update σ² ← mean squared residual over the full training set
  12.   Update τ² ← sample variance of {b_i}
  13. end for
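The flow of Algorithm 1 can be sketched numerically. The toy implementation below is our own simplification, not the authors' gnmm.sgd code: it assumes a Gaussian response with identity link (so the quasi-likelihood term reduces to squared error, φ = a_ij = v = 1), treats each subject's visits as one mini-batch, and trains on synthetic clustered data. The gradient expressions are the Gaussian-case analogues of (QS1)–(QS4).

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic clustered data: m subjects, n visits each, p predictors
m, n, p, k1 = 8, 30, 4, 3
X = rng.normal(size=(m, n, p))
b_true = rng.normal(size=m)                       # subject random intercepts
y = np.tanh(X @ rng.normal(size=p)) + b_true[:, None] \
    + rng.normal(scale=0.1, size=(m, n))

# Parameters (rough Xavier-style init); random intercepts start at zero
W1 = rng.normal(scale=1 / np.sqrt(p), size=(k1, p)); d1 = np.zeros(k1)
w0 = rng.normal(scale=1 / np.sqrt(k1), size=k1);     d0 = 0.0
b = np.zeros(m)
lam, lr, D = 1e-4, 0.05, 1.0

def forward(Xb, bi):
    s = Xb @ W1.T + d1            # pre-activations s_ij,k
    a1 = np.maximum(s, 0.0)       # ReLU hidden layer, Eq. (8)
    mu = a1 @ w0 + d0 + bi        # identity g_0 plus random intercept, Eq. (10)
    return s, a1, mu

for epoch in range(200):
    for i in rng.permutation(m):              # one "mini-batch" = one subject
        s, a1, mu = forward(X[i], b[i])
        r = y[i] - mu                         # residuals (Y_ij - mu_ij^b)
        # Gaussian-case quasi-score gradients (cf. QS1-QS4), ascent direction
        g_w0 = a1.T @ r - 2 * lam * w0
        g_d0 = r.sum() - 2 * lam * d0
        back = np.outer(r, w0) * (s > 0)      # error pushed through the ReLU
        g_W1 = back.T @ X[i] - 2 * lam * W1
        g_d1 = back.sum(axis=0) - 2 * lam * d1
        w0 += lr / n * g_w0; d0 += lr / n * g_d0
        W1 += lr / n * g_W1; d1 += lr / n * g_d1
        b[i] += lr / n * (r.sum() - b[i] / D)  # random-intercept update

mse = float(np.mean((y - np.stack([forward(X[i], b[i])[2]
                                   for i in range(m)])) ** 2))
print("training MSE:", mse)
```

Because the random intercepts absorb the between-subject offsets, the fitted MSE falls well below the raw variance of y, which is exactly the partial-pooling behavior the GNMM is designed to exploit.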

 

Implementation and Results

We implemented the GNMM in R using the gnmm.sgd and gnmm.predict routines provided in the Supplementary Material of Mandel et al. (2023), tailored to our own dataset. Following the evaluation strategy defined for the LMM and GAMM, the final visit for each of the 42 subjects was held out as the test set.

We compare the following models:

  • 1-layer GNMM: one hidden layer with three ReLU nodes, ridge penalty ? = 0.001, learning rate 0.005, random intercept included.
  • 2-layer GNMM: two hidden layers (three and two nodes), ? = 0.002, learning rate 0.005, random intercept included.
  • ANN baseline: one hidden layer with three nodes, ? = 0.001, learning rate 0.001, no random effect.

Evaluation Strategy and Limitations. Predictive accuracy was measured on the held-out visits using mean squared error (MSE) and mean absolute error (MAE). To account for initialization variability in high-capacity models, each model was trained five times with independent random seeds; Table 6 reports the average performance.

Following the strategy of Mandel et al. (2023), the final visit for every subject was held out for testing. It is important to clarify that this split evaluates the model’s ability to forecast the disease trajectory of patients already under monitoring (interpolation), rather than generalization to entirely new subjects (extrapolation). We also acknowledge that the data magnitude presents a specific challenge for stability. While the total number of observations is substantial (N_obs = 5,875), the effective sample size for estimating subject-specific random effects is constrained to the number of patients (N = 42). This creates a high parameter-to-sample ratio for the neural mixed-effects models, particularly the NME, which optimizes non-linear parameters across a multi-layer architecture. To rigorously assess robustness in this small-N regime, we report results averaged over five independent training runs to mitigate initialization bias. The single-layer GNMM attains the lowest prediction error, reducing test-set MSE by 8.7% relative to the two-layer variant and by 15.2% relative to the feed-forward network without random effects. The average predictive performance across the five runs is summarized in Table 4.

 

Table 4: Average test-set error over five independent runs

| Model | MSE | MAE |
|---|---|---|
| 1-layer GNMM | 96.82 | 6.96 |
| 2-layer GNMM | 106.09 | 7.56 |
| ANN (no random effect) | 114.20 | 8.47 |

3.5 Neural Mixed-Effects (NME) Model for Longitudinal UPDRS Prediction

Another recently introduced neural network model that can be applied to our case is the Neural Mixed-Effects (NME) model.

Classical mixed-effects models (LMM, GLMM) effectively handle subject heterogeneity in longitudinal data but are typically restricted to linear fixed effects. Conversely, standard neural networks can learn rich nonlinear relationships but often ignore the within-subject correlation inherent in repeated measures. The Neural Mixed Effects (NME) framework, as proposed by Wörtwein et al. (2023), combines these strengths. It permits the inclusion of nonlinear subject-specific parameters at any layer of the network and utilizes stochastic gradient descent for optimization, which ensures scalability with both the number of patients (m) and the total number of visits (Σ_{i=1}^{m} n_i).

Applying the NME approach to our Parkinson’s tele-monitoring study offers several advantages. First, it allows the model to learn complex relationships between voice features and disease severity without the need to pre-specify interaction terms.

Additionally, the NME model employs partial pooling for its parameter estimates. This approach allows the model  to share information across different patients, leading to more robust and reliable individual-specific parameters, particularly for patients with fewer observations, by balancing individual data with overall population trends.

NME Parameterization.

Let i = 1, …, m index the m = 42 participants in the Parkinson’s tele-monitoring study, and let j = 1, …, n_i index their repeated visits.

At each visit j for participant i, we observe the response Y_ij, representing the UPDRS score, and a p-dimensional predictor vector X_ij ∈ R^17, consisting of test_time (time of assessment) and 16 scaled voice features.

Following Wörtwein et al. (2023), the NME model decomposes the network parameters into two components:

  1. A person-generic component θ, which is shared across all participants and captures common population-level trends.

  2. A person-specific component δ_i, unique to participant i, capturing individual deviations from the generic trend.

The effective parameters for participant i are therefore:

θ_i = θ + δ_i.

The person-specific components δ_i are typically regularized by assuming they follow a multivariate normal distribution:

δ_i ~ N(0, S),

where S is a covariance matrix, often assumed to be diagonal for scalability (e.g., S = τ² I).

Network Architecture for UPDRS Prediction.

To capture the complex mapping between voice features and disease severity, we employ a multilayer perceptron (MLP) architecture. Unlike standard MLPs, our implementation integrates the mixed-effects structure directly into the network weights. This enables the model to learn a global population trend (via the person-generic parameters θ) while simultaneously adjusting weights and biases for each individual (via the person-specific deviations δ_i), thereby personalizing predictions for each patient.

Specifically, we implement a two-hidden-layer MLP with:

  • k_1 = 32 units in the first hidden layer,

  • k_2 = 16 units in the second hidden layer.

The parameters of this network are decomposed into:

  • Person-generic components (elements of θ), and

  • Person-specific deviations (elements of δ_i),

as defined in the NME parameterization.

For predicting Total UPDRS, the network operations involving these decomposed parameters are defined layer by layer.

First Hidden Layer

The first hidden layer maps the input feature vector X_ij ∈ R^17 to k_1 hidden units:

a_ij^(1) = g_{1a}( (O^(1) + δ_{O^(1),i}) X_ij + (d^(1) + δ_{d^(1),i}) ) ------(16)

where:

  • O^(1) are the person-generic weights,

  • δ_{O^(1),i} are the subject-specific weight deviations for participant i,

  • d^(1) are the generic biases,

  • δ_{d^(1),i} are the subject-specific bias deviations,

  • g_{1a}(·) is the ReLU activation function.

Second Hidden Layer

The second hidden layer maps a_ij^(1) to k_2 hidden units:

a_ij^(2) = g_{1b}( (O^(2) + δ_{O^(2),i}) a_ij^(1) + (d^(2) + δ_{d^(2),i}) ) ------(17)

where:

  • O^(2), d^(2) are generic parameters,

  • δ_{O^(2),i}, δ_{d^(2),i} are subject-specific deviations,

  • g_{1b}(·) is also ReLU.

Output Layer

The final prediction is:

Ŷ_ij = μ_ij^{NME} = g_0( (ω^(0) + δ_{ω^(0),i}) a_ij^(2) + (d^(0) + δ_{d^(0),i}) ) ------(18)

Since Total UPDRS is a continuous outcome, the output activation function is the identity:

g_0(x) = x.

This architecture enables the model to simultaneously:

  • learn global nonlinear relationships between voice features and disease severity, and

  • capture patient-specific nonlinear adjustments through δ_i.

In these equations, X_ij represents the input features for subject i at visit j. The term a_ij^(1) denotes the activations of the first hidden layer, computed from the person-generic input-to-hidden weights O^(1) and the person-specific weight deviations δ_{O^(1),i}, together with the corresponding biases d^(1) and their subject-specific deviations δ_{d^(1),i}, followed by the activation function g_{1a}(·).

Similarly, a_ij^(2) represents the activations of the second hidden layer, taking a_ij^(1) as input. This layer uses person-generic weights O^(2) and biases d^(2), with their respective person-specific deviations δ_{O^(2),i} and δ_{d^(2),i}, followed by its activation function g_{1b}(·).

For both hidden layers, the activation function used is the Rectified Linear Unit (ReLU).

The final prediction Ŷ_ij (equivalently μ_ij^{NME}) is obtained from the output layer. This layer takes a_ij^(2) as input and applies the person-generic output weights ω^(0) and output bias d^(0), along with their person-specific deviations δ_{ω^(0),i} and δ_{d^(0),i}. The output activation g_0(·) is the identity function (g_0(x) = x), since Total_UPDRS is a continuous response.

Collectively, the person-generic parameters are:

{ O^(1), d^(1), O^(2), d^(2), ω^(0), d^(0) }

and the person-specific deviations for subject i are:

{ δ_{O^(1),i}, δ_{d^(1),i}, δ_{O^(2),i}, δ_{d^(2),i}, δ_{ω^(0),i}, δ_{d^(0),i} }

All these parameters are estimated during training. If a specific parameter (or an entire layer’s parameters) is not intended to have a patient-specific component, its corresponding entries in δ_i are fixed at zero.
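Equations (16)–(18) can be sketched directly. The snippet below (randomly initialized, purely illustrative) personalizes only the output bias, matching the configuration reported later in the implementation section; as a consequence, two subjects' predictions for the same input differ exactly by their bias deviations.

```python
import numpy as np

rng = np.random.default_rng(3)

p, k1, k2, m = 17, 32, 16, 42   # dimensions from the text; m subjects

# Person-generic parameters (elements of theta)
O1 = rng.normal(scale=0.1, size=(k1, p)); d1 = np.zeros(k1)
O2 = rng.normal(scale=0.1, size=(k2, k1)); d2 = np.zeros(k2)
w0 = rng.normal(scale=0.1, size=k2);       d0 = 0.0

# Person-specific deviations delta_i: here only the output bias is
# personalized; all other entries of delta_i are fixed at zero.
delta_d0 = rng.normal(scale=0.5, size=m)

relu = lambda z: np.maximum(z, 0.0)

def predict(X_ij, i):
    """Forward pass of Eqs. (16)-(18) with identity output activation."""
    a1 = relu(O1 @ X_ij + d1)             # Eq. (16)
    a2 = relu(O2 @ a1 + d2)               # Eq. (17)
    return w0 @ a2 + (d0 + delta_d0[i])   # Eq. (18), g_0 = identity

x = rng.normal(size=p)
y_subject0 = predict(x, 0)
y_subject1 = predict(x, 1)
# Same input, different subjects: predictions differ by the bias deviations
print(float(y_subject0 - y_subject1), float(delta_d0[0] - delta_d0[1]))
```

Allowing deviations on weights as well (nonzero δ_{O^(1),i}, δ_{O^(2),i}, δ_{ω^(0),i}) would make the personalization nonlinear rather than a simple additive shift.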

Loss Function and Optimization.

The NME objective function is optimized per epoch. For our regression task with squared error loss,

l(Y_ij, Ŷ_ij) = (Y_ij − Ŷ_ij)²,

and assuming a diagonal person-specific parameter covariance matrix
S = τ² I (so S⁻¹ = (1/τ²) I),
the objective function, adapted from Equation (1) of Wörtwein et al. (2023), is:

L(θ, δ) =
Σ_{i=1}^{m} Σ_{j=1}^{n_i} (1/σ²) (Y_ij − Ŷ_ij^{NME})²
+ Σ_{i=1}^{m} (1/τ²) ‖δ_i‖² ------(19)

where:

  • σ² is the observational (residual) variance, typically estimated as the mean squared error on the training data after each epoch.

  • The first term encourages fidelity to the data.

  • The second term penalizes large deviations of the person-specific parameters δ_i from zero, effectively shrinking them toward the person-generic parameters θ.

Mini-batch Training

For stochastic gradient descent using mini-batches, the loss for a mini-batch B of size |B| (containing observations from a set of unique subjects B_subjects) is defined as follows.

The data fidelity part is the average loss over the batch. The regularization penalty is applied per subject within the batch, scaled by the proportion of that subject’s total observations present in the current batch (Wörtwein et al., 2023).

Thus, the mini-batch loss is:

L_B =
(1 / |B|) Σ_{(i,j) ∈ B} (1/σ²) (Y_ij − Ŷ_ij^{NME})²
+ Σ_{k ∈ B_subjects} (N_k^B / n_k) (1/τ²) ‖δ_k‖² ------(20)

where:

  • N_k^B is the number of observations for subject k in the current mini-batch B.

  • n_k is the total number of training observations for subject k.

This scaling ensures that the regularization term for each subject is weighted according to its representation in the current batch relative to its total contribution to the dataset.
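The per-subject scaling in Equation (20) can be illustrated with a tiny hypothetical batch; every number below is made up purely for arithmetic.

```python
# Mini-batch loss of Eq. (20), sketched with hypothetical values.
# Each batch entry: (subject k, squared error for that observation).
batch = [(1, 4.0), (1, 1.0), (2, 9.0)]
n_total = {1: 10, 2: 5}           # n_k: total training observations per subject
delta_sq = {1: 0.25, 2: 1.00}     # ||delta_k||^2 for subjects in the batch
sigma2, tau2 = 2.0, 0.5           # current variance components

# Data fidelity: average (1/sigma^2)-scaled squared error over the batch
fidelity = sum(se / sigma2 for _, se in batch) / len(batch)

# Per-subject penalty, scaled by N_k^B / n_k
counts = {}
for k, _ in batch:
    counts[k] = counts.get(k, 0) + 1
penalty = sum((counts[k] / n_total[k]) * delta_sq[k] / tau2 for k in counts)

loss = fidelity + penalty
print(round(loss, 4))   # 2.8333
```

Over a full epoch the N_k^B / n_k factors for each subject sum to one, so the expected mini-batch penalty matches the full-data penalty in Equation (19).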

 

Gradient Updates

The parameters {θ, δ_i}_{i=1}^m are updated using gradients derived from the loss function L (Equation 19). For any parameter (a component of θ or of some δ_i), the update uses its partial derivative.

Let

E_ij = Y_ij − Ŷ_ij^{NME}

denote the prediction error for subject i at visit j.

For the squared error loss l(Y_ij, Ŷ_ij) = (Y_ij − Ŷ_ij)², the derivative of the data fidelity part with respect to the model output Ŷ_ij^{NME} is

∂l / ∂Ŷ_ij^{NME} = −2 (Y_ij − Ŷ_ij^{NME}) = −2 E_ij.

Thus, the common error signal propagated backward from the loss, scaled by the residual variance, is

d_ij^{out} = (2 / σ²) E_ij.

For our two-hidden-layer network, where µ_ij^{NME} is defined by Equations (16)–(18), we have:

  • g_0'(z) = 1 (identity output activation),

  • g_{1a}'(·) and g_{1b}'(·) are derivatives of the ReLU activation functions for the first and second hidden layers, respectively.

Let:

  • s_ij^{(1)} be the pre-activation for the first hidden layer,

  • s_ij^{(2)} be the pre-activation for the second hidden layer.

These pre-activations are the linear combinations before applying the ReLU nonlinearity.

 

Output Layer Parameters

The output layer directly computes the prediction Ŷ_ij^{NME}.

For an element ω_p^(0) of the generic output weights (connecting the p-th unit of the second hidden layer to the output), the gradient is:

∂L / ∂ω_p^(0)
= Σ_{i=1}^{m} Σ_{j=1}^{n_i} d_ij^{out} a_{ij,p}^{(2)}

This gradient aggregates the product of the output error signal and the corresponding activation from the second hidden layer across all observations.

For the person-specific deviation of an output weight δ_{ω_p^(0),k} for subject k:

∂L / ∂δ_{ω_p^(0),k}
= Σ_{j=1}^{n_k} d_{kj}^{out} a_{kj,p}^{(2)}
+ 2 [S^{-1} δ_k]_{ω_p^(0)}

where:

  • n_k is the number of observations for subject k,

  • the first term accumulates subject-specific contributions,

  • the second term is the regularization term that penalizes large deviations from the person-generic parameter.

 

Second Hidden Layer Parameters

Gradients for the second hidden layer involve backpropagating the error signal through the output layer weights.

For an element O_{pq}^{(2)} of the generic weights of the second hidden layer (connecting the q-th unit of the first hidden layer to the p-th unit of the second hidden layer):

∂L / ∂O_{pq}^{(2)}
= Σ_{i=1}^{m} Σ_{j=1}^{n_i}
d_{ij}^{out} (ω_p^{(0)} + δ_{ω_p^{(0)},i})
g_{1b}'(s_{ij,p}^{(2)})
a_{ij,q}^{(1)}

This expression shows that the gradient is formed by propagating the output error through:

  • the output weight (including its subject-specific deviation),

  • the derivative of the second hidden layer activation,

  • and the activation from the first hidden layer.

For an element δ_{O_{pq}^{(2)},k} of a person-specific deviation of a second hidden layer weight for subject k:

∂L / ∂δ_{O_{pq}^{(2)},k}
= Σ_{j=1}^{n_k}
d_{kj}^{out} (ω_p^{(0)} + δ_{ω_p^{(0)},k})
g_{1b}'(s_{kj,p}^{(2)})
a_{kj,q}^{(1)}
+ 2 [S^{-1} δ_k]_{O_{pq}^{(2)}}

where:

  • n_k is the number of observations for subject k,

  • the first term accumulates subject-specific backpropagated error contributions,

  • the second term is the regularization penalty shrinking deviations toward zero.

First Hidden Layer Parameters

For an element O_{lq}^{(1)} of the generic weights of the first hidden layer (connecting the l-th input feature to the q-th unit of the first hidden layer):

∂L / ∂O_{lq}^{(1)}
= Σ_{i=1}^{m} Σ_{j=1}^{n_i} Σ_{p=1}^{k_2}
d_{ij}^{out}
(ω_p^{(0)} + δ_{ω_p^{(0)},i})
g_{1b}'(s_{ij,p}^{(2)})
(O_{pq}^{(2)} + δ_{O_{pq}^{(2)},i})
g_{1a}'(s_{ij,q}^{(1)})
X_{ij,l}

This expression reflects full backpropagation through:

  • the output layer weights,

  • the second hidden layer activation,

  • the second hidden layer weights,

  • the first hidden layer activation,

  • and finally the input feature X_{ij,l}.

For an element δ_{O_{lq}^{(1)},k} of a person-specific deviation of a first hidden layer weight for subject k:

∂L / ∂δ_{O_{lq}^{(1)},k}
= Σ_{j=1}^{n_k} Σ_{p=1}^{k_2}
d_{kj}^{out}
(ω_p^{(0)} + δ_{ω_p^{(0)},k})
g_{1b}'(s_{kj,p}^{(2)})
(O_{pq}^{(2)} + δ_{O_{pq}^{(2)},k})
g_{1a}'(s_{kj,q}^{(1)})
X_{kj,l}
+ 2 [S^{-1} δ_k]_{O_{lq}^{(1)}}

where:

  • k_2 is the number of units in the second hidden layer,

  • n_k is the number of observations for subject k,

  • the final term is the regularization penalty shrinking subject-specific deviations.

The pre-activations are:

  • s_{ij,q}^{(1)} for the q-th unit of the first hidden layer,

  • s_{ij,p}^{(2)} for the p-th unit of the second hidden layer.

Gradients for all bias terms (d^(1), δ_{d^(1),i}, d^(2), δ_{d^(2),i}, d^(0), δ_{d^(0),i}) follow analogously by applying the chain rule, where the input to a bias is 1.

During mini-batch optimization, these sums are taken over observations (i, j) in the current mini-batch B, and the regularization term is applied only for subjects k whose parameters δ_k are being updated.

Summary of the Algorithm

The Neural Mixed Effects (NME) model is trained using an iterative, optimization-based procedure. Within each epoch, stochastic gradient descent (e.g., the Adam optimizer) is employed to update the person-generic parameters θ and the person-specific deviations {δ_i}_{i=1}^m.

The variance components, namely the observational (residual) variance σ² and the covariance matrix of the person-specific parameters S, are generally updated between epochs. For example, σ² can be estimated from the mean squared error computed on the training data using the current parameter estimates. The covariance matrix S is often assumed to be diagonal (e.g., S = τ² I) for scalability and is updated from the sample covariance of the current person-specific deviations.

This iterative training procedure allows the NME model to learn both:

  • the overall population trend (via θ), and

  • subject-specific nonlinear deviations (via δ_i)

simultaneously.

The person-specific deviations are regularized by their prior distribution, typically governed by the estimated covariance structure S. This regularization helps prevent overfitting and enables robust estimation, even for subjects with limited data.

The overall iterative process is outlined below.

 

Algorithm 2

General Training Procedure for the Neural Mixed Effects (NME) Model

Require:
Scaled training features X' = {X'_ij},
Scaled training targets Y' = {Y'_ij},
Subject indices for observations.

Require:
Architectural choices (e.g., number of layers, number of units),
Number of epochs E,
Learning rate η,
Batch size B.

  1. Initialize person-generic parameters θ (e.g., Xavier initialization).
  2. Initialize person-specific deviations {δ_i}_{i=1}^m (e.g., zeros or small random values).
  3. Initialize the covariance matrix S (e.g., a scaled identity matrix).
  4. Initialize the residual variance σ² (e.g., set to 1 or estimated from an initial pass over the data).
  5. Initialize the optimizer (e.g., Adam with learning rate η).
  6. for epoch e = 1 to E do
  7.   Shuffle the training data (X', Y').
  8.   for each mini-batch b of size B do
  9.     For each observation (k, j) in batch b (subject k, observation j), compute the prediction Ŷ'_kj = f(X'_kj ; θ, δ_k).
  10.     Compute the mini-batch loss L_b (e.g., based on Equation (20)), using the current σ² and S.
  11.     Compute gradients with respect to θ and the relevant δ_k for subjects appearing in the batch: ∇_θ L_b, ∇_{δ_k} L_b.
  12.     Update θ and the relevant δ_k using the optimizer step.
  13.   end for (mini-batches)
  14.   Update σ² using the average squared residuals over the full training set with the current θ and {δ_i}.
  15.   Update S based on the sample covariance of the current person-specific deviations {δ_i}_{i=1}^m.
  16.   Adjust the learning rate and/or check early-stopping criteria, if applicable.
  17. end for (epochs)
  18. Return the learned parameters: θ̂, {δ̂_i}_{i=1}^m, Ŝ, σ̂².
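The between-epoch variance-component updates near the end of Algorithm 2 reduce to two one-line estimates. The sketch below uses hypothetical residuals and scalar person-specific deviations purely for illustration.

```python
# Between-epoch variance-component updates from Algorithm 2, sketched with
# hypothetical residuals and scalar deviations (all values made up).
residuals = [0.5, -1.0, 0.25, 0.75, -0.5]   # y - y_hat over the training set
deviations = [0.2, -0.1, 0.4, 0.0]          # one scalar delta_i per subject

# sigma^2 <- mean squared residual with the current parameter estimates
sigma2 = sum(r * r for r in residuals) / len(residuals)

# With S = tau^2 I, S is updated from the sample variance of the deviations
mean_dev = sum(deviations) / len(deviations)
tau2 = sum((d - mean_dev) ** 2 for d in deviations) / (len(deviations) - 1)

print(sigma2, tau2)
```

These updated values of σ² and τ² then reweight the fidelity and penalty terms of the loss in the next epoch.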

 

Implementation and Results

Our application of the Neural Mixed Effects (NME) model to predict Total UPDRS scores was based on the publicly available PyTorch implementation provided by Wörtwein et al. (2023), adapted and tailored to our specific dataset.

Input voice features and test_time were standardized prior to model training. We configured the NME model using a two-hidden-layer multilayer perceptron (MLP) as the base network, with:

  • 32 units in the first hidden layer,

  • 16 units in the second hidden layer,

  • ReLU activation functions in both hidden layers.

Person-specific random effects (δ_i) were applied to the output layer bias.

The model was trained for 4000 epochs using the Adam optimizer with a batch size of 512. During training:

  • The observational (residual) variance σ² was updated iteratively.

  • A diagonal person-specific parameter covariance matrix S = τ² I was maintained and updated based on the sample variance of the current subject-specific deviations.

Predictive performance was evaluated using:

  • Mean Squared Error (MSE), and

  • Mean Absolute Error (MAE).

Table 5 summarizes the key performance metrics on the test set.

 

Table 5: Performance of the NME Model on the Test Set for Total UPDRS Prediction.

| Model | MSE | MAE |
|---|---|---|
| NME-MLP | 103.4075 | 8.1786 |

4. Analysis of the Results

To ensure a rigorous comparison, all predictive metrics reported in Table 6 were calculated on the original total_UPDRS scale. Predictions from models trained on transformed targets (log-transformed LMM/GAMM and standardized Neural Networks) were inverse-transformed to the clinical domain before error computation.

Table 6 reveals a stark contrast in performance. The traditional statistical models (LMM and GAMM) achieved low error rates (MSE 6.56 to 7.70), corresponding to a root mean squared error (RMSE) of approximately 2.6 to 2.8 points. This indicates these models successfully utilized the subject-specific random effects to interpolate each patient’s individual trajectory.

In contrast, the neural architectures (GNMM, NME, ANN) yielded MSE values in the range of 96–114. These values are approximately equal to the global variance of the dataset, implying that the deep learning models failed to effectively learn the subject-specific deviations (b_i) from the limited sample of N = 42 subjects. Instead of personalizing the predictions, the neural networks essentially reverted to predicting the population mean, resulting in errors roughly four times higher than those of the mixed-effects models. This validates our hypothesis that while neural models offer theoretical flexibility, they require significantly larger cluster counts to outperform efficient semi-parametric methods in longitudinal telemonitoring.

In this study, we developed and compared a suite of modeling approaches for predicting Parkinson’s disease progression, using a rich longitudinal voice dataset and the total UPDRS score as the clinical outcome. We evaluated both traditional statistical models, Linear Mixed Effects Models (LMM) and Generalized Additive Mixed Models (GAMM), as well as machine learning-based extensions, Generalized Neural Network Mixed Models (GNMM) and Neural Mixed Effect Models (NME-MLP).

To assess predictive performance, we constructed a test set composed of each subject’s last available time point (42 in total), while the remaining data were used for model training. This setup reflects a realistic clinical use case: forecasting future UPDRS values for already-observed patients, rather than for entirely new individuals.
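The last-time-point split described above can be sketched with pandas. The column names follow the Oxford telemonitoring dataset's conventions; the toy data frame is illustrative only.

```python
import pandas as pd

def last_visit_split(df, subject_col="subject#", time_col="test_time"):
    """Hold out each subject's final recording as the test set; all
    earlier recordings form the training set."""
    idx_last = df.groupby(subject_col)[time_col].idxmax()
    test = df.loc[idx_last]
    train = df.drop(index=idx_last)
    return train, test

toy = pd.DataFrame({"subject#": [1, 1, 2, 2, 2],
                    "test_time": [0.0, 5.0, 1.0, 3.0, 9.0],
                    "total_UPDRS": [20, 22, 30, 31, 35]})
train, test = last_visit_split(toy)
print(len(train), len(test))  # 3 2
```

With 42 subjects this yields exactly 42 test points, one forecast per already-observed patient.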

Table 6 reports the mean squared error (MSE) and mean absolute error (MAE) of each model. Among all methods, GAMM achieved the best performance with the lowest MSE (6.56) and MAE (2.00), indicating that the spline-based temporal effect captured meaningful nonlinear disease progression patterns. The LMM, although simpler, performed nearly as well (MSE = 7.70), confirming the value of mixed-effects modeling with carefully selected covariates and interactions.

In contrast, the deep learning models (the 1-layer and 2-layer GNMMs, the ANN without random effects, and the NME-MLP) performed substantially worse, with MSEs exceeding 96 and MAEs exceeding 6. This result is counter-intuitive at first, since newer, more complex models are often assumed to outperform older, simpler ones. It is nevertheless consistent with our recent evidence that larger, more complex biomedical deep models do not automatically improve clinical prediction over simpler baselines when the dataset and signal structure do not support the added capacity (Tong et al., 2025c). For datasets with many observations but only a modest number of predictors (n ≫ p), a simple linear or spline-based model can already approximate the input–output mapping well, so the added capacity of deep networks does not translate into lower error unless it is strongly regularized. While these architectures are expressive, their complexity and lack of explicit structure for within-subject correlation hinder their predictive accuracy in settings like ours.

As shown in Table 6, the GAMM significantly outperforms the neural baselines. To visualize this performance gap, we refer back to the trajectory complexity in Figure 1(c). The GAMM's spline components effectively capture these smooth, non-linear subject trends without the parameter bloat of the NME or GNMM. This performance disparity is further illustrated in Figure 4, which compares the MSE and MAE across all evaluated models. The neural models, by trying to learn these non-linearities from scratch with limited subjects, failed to converge to a generalizable solution, resulting in high variance across the five training runs.

This phenomenon underscores the critical role of sample size efficiency in clinical modeling. The LMM and GAMM impose structural assumptions (linearity and smoothness) that act as effective regularizers in small-cohort regimes  (N = 42). Conversely, the NME architecture is designed to optimize non-linear random effects, a task that requires a denser sampling of subject-specific trajectories to converge effectively. Therefore, the observed performance gap is not a failure of the neural architecture per se, but rather a demonstration that semi-parametric statistical methods provide a more robust and data-efficient baseline for current telemonitoring cohort sizes. While this study provides a rigorous benchmark on the Oxford dataset, the primary limitation is the cohort size (N = 42). The poor performance of the NME model is directly attributable to the lack of sufficient subject-level clusters to map the manifold of random effects. Future work must focus on external validation using larger, multi-center datasets such as the Parkinson’s Progression Markers Initiative (PPMI). Only with datasets exceeding hundreds of subjects can the theoretical flexibility of the NME framework be properly calibrated against the risk of overfitting.

The Role of Genetics, Lifestyle, and Medication

While our study focused on acoustic biomarkers due to the constraints of the telemonitoring dataset, we acknowledge that PD progression is multifactorial. Recent literature highlights the significant impact of lifestyle factors and genetics on disease trajectories. For instance, Paul et al. (2021) demonstrated that lifestyle factors, including physical activity and diet, substantially modulate PD risk and progression.

Genetic heterogeneity also plays a critical role in progression rates, which our current models absorb into the subject-specific random effects (b_i) rather than modeling explicitly. Studies have shown that patients with GBA mutations often exhibit a "malignant" phenotype characterized by more rapid motor and cognitive decline compared to idiopathic PD (Ortega et al., 2021; Simuni et al., 2020). In contrast, LRRK2 carriers, particularly those with the G2019S mutation, often display a milder motor phenotype and slower progression of UPDRS scores (Saunders-Pullman et al., 2018). The omission of these genetic covariates likely inflates the unexplained variance (σ²) in our mixed models, suggesting that future NME architectures should include static genetic embeddings to refine individual trajectory predictions.

Furthermore, the current dataset does not detail the "On/Off" medication status or specific levodopa equivalent daily doses (LEDD) for every recording. This is a critical limitation because therapeutic response introduces non-linear fluctuations in voice features that pure acoustic models may struggle to distinguish from disease progression. Systematic reviews indicate that while levodopa significantly improves limb rigidity and bradykinesia, its effect on axial symptoms like speech is inconsistent; it may modify fundamental frequency (F0) and jitter but often fails to improve vocal intensity or articulation (Pinho et al., 2018). This discrepancy means that a patient might show improvement in motor-UPDRS (limb scores) due to medication while their voice biomarkers remain unchanged or degrade, confusing the model's mapping function. Finally, distinct clinical subtypes, such as the "diffuse malignant" versus "mild motor-predominant" phenotypes identified by Fereshtehnejad et al. (2017), suggest that progression is not just rate-variable but structurally different across subgroups. Future telemonitoring frameworks should aim to integrate these multimodal data streams, combining voice, genetics, lifestyle factors, and pharmacodynamics, into the mixed-effects architecture.

5. Conclusion and Future Work

In summary, this study presented the first benchmarking of the Neural Mixed Effects (NME) framework against traditional statistical baselines for Parkinson’s telemonitoring. Our results demonstrate that while neural architectures offer theoretical flexibility, classical semi-parametric models—specifically the GAMM—outperformed deep learning approaches (MSE 6.56 vs. 103.41) in the small-sample regime typical of clinical trials (N = 42). Our findings indicate that incorporating smooth effects and subject-level random structures remains the most robust and interpretable strategy for the near-term forecasting of disease severity when high-volume data is unavailable.

However, our analysis of residuals and literature review (Section 4.1) identified critical sources of unmodeled variance that define the roadmap for future research. First, the impact of genetic heterogeneity (e.g., the rapid progression of GBA carriers versus the slower trajectory of LRRK2 carriers) suggests that future NME architectures must move beyond pure time-series inputs. A promising direction is the development of static-dynamic hybrid networks, where static genetic embeddings and demographic factors are fused with dynamic voice vectors in the lower layers of the NME to inform the subject-specific parameter deviations (b_i). Similarly, the confounding effect of levodopa cycles requires the integration of pharmacokinetic/pharmacodynamic (PK/PD) layers that can explicitly model the non-linear "On/Off" oscillations in voice features, distinct from the long-term neurodegenerative trend.

To further delineate the scalability of these architectures, future work will also focus on external validation using larger, multi-center datasets such as the Parkinson’s Progression Markers Initiative (PPMI). While our current study establishes GAMM as the optimal choice for pilot cohorts, assessing NME on datasets exceeding hundreds of subjects will allow us to empirically map the sample size threshold where the theoretical flexibility of neural networks begins to overtake the data efficiency of statistical baselines.

Another key technical limitation of the current neural approaches is their lack of an explicit variable-selection mechanism. Neither the GNMM nor the NME-MLP currently implements automated feature pruning, forcing reliance on trial-and-error feature selection, which is inefficient for high-dimensional acoustic sets. Future work should focus on integrating sparsity-inducing penalties, such as the ℓ1-based Group LASSO or Bayesian spike-and-slab priors, directly into the NME loss function. This would allow the model to automatically discard uninformative biomarkers, enhancing both generalization and clinical interpretability. While post-hoc explainability methods such as SHAP or LRP have been applied to PD models (Alshammari et al., 2025; Rezaei et al., 2024), developing mixed-effects models with intrinsic sparsity would provide more reliable trust mechanisms for clinicians. Building on the benchmark guidelines summarised by Tong et al. (2025a), we plan to construct a transparent test bed that rigorously compares these sparse neural extensions against classical methods.
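One concrete form of such a penalty is a group lasso over the rows of the first-layer weight matrix, with one group per input feature: features whose entire row shrinks to zero are effectively pruned. This NumPy sketch is illustrative; the grouping choice and the penalty weight `lam` are assumptions, not part of the current NME implementation.

```python
import numpy as np

def group_lasso_penalty(W1, lam):
    """Group-lasso term added to the training loss: lam times the sum of
    Euclidean norms of the per-feature rows of the first-layer weights."""
    return float(lam * np.sum(np.sqrt(np.sum(W1 ** 2, axis=1))))

W1 = np.array([[0.0, 0.0],    # feature 1: whole row zero -> pruned
               [3.0, 4.0]])   # feature 2: row norm 5, kept
print(round(group_lasso_penalty(W1, lam=0.1), 4))  # 0.5
```

In training, this term would simply be added to the NME objective, so gradient updates push uninformative feature groups toward exact zero.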

Finally, the ultimate goal is to embed these models into a seamless telemedicine workflow (Li et al., 2025). However, clinical translation faces significant hurdles regarding standardization and privacy, particularly concerning device heterogeneity. In a real-world Bring-Your-Own-Device (BYOD) scenario, variations in smartphone microphones and background-noise processing can introduce non-biological variance in jitter and shimmer features. To address this, future models must incorporate domain adaptation techniques or be trained on data augmented with simulated channel noise to ensure robustness across different recording environments.

In parallel with standardization efforts, privacy concerns dictate the need for secure deployment strategies. To comply with strict data privacy regulations such as HIPAA and GDPR, transmitting raw voice data to the cloud presents a substantial risk. A deployable system should therefore utilize edge computing, where feature extraction and the forward pass of the light-weight GAMM occur locally on the patient’s device. In this architecture, only the computed risk scores—rather than the raw voice recordings—would be transmitted to the clinician’s dashboard. Achieving this vision requires robust calibration procedures and clear data-governance protocols, and pilot studies integrating these components will be essential to demonstrate clinical utility before large-scale deployment.

Author contributions

R.T. conceptualized the study, developed the Neural Mixed Effects (NME) modeling framework, performed statistical analyses, and drafted the manuscript. L.W. contributed to methodological design, data analysis validation, and comparative benchmarking of statistical and neural models. T.W. provided biological interpretation of Parkinson’s disease progression markers and assisted in result contextualization. W.Y. contributed clinical expertise, supported interpretation of telemonitoring implications, and critically revised the manuscript for important intellectual content. All authors reviewed, edited, and approved the final version of the manuscript and agree to be accountable for all aspects of the work.

Acknowledgment

The authors gratefully acknowledge the University of Texas at Dallas, Duke University, and The Second Affiliated Hospital of Zhejiang University School of Medicine for providing academic and research support. The authors also acknowledge the publicly available Oxford Parkinson’s telemonitoring dataset and the patients whose contributions enabled this research.

Supporting Information

Web Appendices referenced in Sections 3 and 4 are provided in the Supporting Information. Python and R code, along with a simulated example, are available at https://github.com/RanTongUTD/Parkinson-Prediction/.

Open Access and Copyright Policy

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Journal: INTEGRATIVE BIOMEDICAL RESEARCH, a publication of Eman Research, USA.

References


Alshammari, H., et al. (2025). Parkinson's disease progression prediction using transformer-based time-series models and explainable AI. IEEE Access, 13, 147819–147830.

Ananthanarayanan, A., Senivarapu, S., & Murari, A. (2025). Towards causal interpretability in deep learning for Parkinson's detection from voice data. medRxiv. https://doi.org/10.1101/2025.04.25.25326311

Arora, S., Vetek, E. V., Hargrave, Z. B., et al. (2015). Detecting and monitoring the symptoms of Parkinson's disease using smartphones: A pilot study. Parkinsonism & Related Disorders, 21(6), 650–653. https://doi.org/10.1016/j.parkreldis.2015.02.026

Bloem, B. R., Post, M. R., & Dorsey, R. (2021). The expanding burden of Parkinson's disease. Journal of Parkinson's Disease, 11(2), 403–413.

Breslow, N. E., & Clayton, D. G. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association, 88(421), 9–25. https://doi.org/10.1080/01621459.1993.10594284

Del Din, S., Godfrey, A., & Rochester, L. (2016). Free-living gait characteristics in ageing and Parkinson's disease: Impact of environment and ambulatory bout length. Journal of NeuroEngineering and Rehabilitation, 13, 46. https://doi.org/10.1186/s12984-016-0154-5

Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 1–22.

Dorsey, E. R., Bloem, B. R., et al. (2018). Global, regional, and national burden of Parkinson's disease, 1990–2016. The Lancet Neurology, 17(11), 939–953. https://doi.org/10.1016/S1474-4422(18)30295-3

Drotar, P., Mekyska, M., & Ruzicka, I. (2016). Evaluation of handwriting kinematics and pressure for differential diagnosis of Parkinson's disease. Artificial Intelligence in Medicine, 67, 39–46. https://doi.org/10.1016/j.artmed.2016.01.004

Elkholy, G. R., et al. (2025). Enhanced LSTM with attention mechanism for early detection of Parkinson's disease through voice signals. arXiv Preprint. arXiv:2502.08672.

Eskidere, Ö., Ertas, F., & Hanilçi, C. (2012). A comparison of regression methods for remote tracking of Parkinson's disease progression. Expert Systems with Applications, 39(5), 5523–5528. https://doi.org/10.1016/j.eswa.2011.11.067

Fahn, S., Elton, R. L., & Members of the UPDRS Development Committee. (1987). Unified Parkinson's disease rating scale. In S. Fahn, C. D. Marsden, D. B. Calne, & M. Goldstein (Eds.), Recent developments in Parkinson's disease (Vol. 2, pp. 153–163). Macmillan Healthcare Information.

Fereshtehnejad, S.-M., et al. (2017). Clinical criteria for subtyping Parkinson's disease: Biomarkers and longitudinal progression. Brain, 140(7), 1959–1976. https://doi.org/10.1093/brain/awx118

Filali, Y., et al. (2023). Parkinson's disease diagnosis using Laplacian score, Gaussian process regression and self-organizing maps. Diagnostics, 13(8), 1369. https://doi.org/10.3390/brainsci13040543

Gilmour, A. R., Thompson, R., & Cullis, B. R. (1995). Average information REML: An efficient algorithm for variance parameter estimation in linear mixed models. Biometrics, 51(4), 1440–1450. https://doi.org/10.2307/2533274

Goetz, C. G., Nguyen, S. T., et al. (2008). Movement Disorder Society-sponsored revision of the Unified Parkinson's Disease Rating Scale (MDS-UPDRS). Movement Disorders, 23(15), 2129–2170. https://doi.org/10.1002/mds.22340

Hassan, T., et al. (2024). Comparing machine learning and deep learning models to predict cognition progression in Parkinson's disease. NPJ Digital Medicine, 7(1), 1–12.

Laird, N. M., & Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics, 38(4), 963–974. https://doi.org/10.2307/2529876

Li, Y., Liu, S., Tong, R., Zhang, P., Bian, J., Wang, T., & Gu, P. (2025). Revolutionizing healthcare: The role of artificial intelligence in drug discovery and delivery. Journal of Angiotherapy, 9(1), 1–8.

Lin, X., & Zhang, D. (1999). Inference in generalized additive mixed models by using smoothing splines. Journal of the Royal Statistical Society: Series B, 61(2), 381–400. https://doi.org/10.1111/1467-9868.00183

Lindstrom, M. J., & Bates, D. M. (1990). Nonlinear mixed effects models for repeated measures data. Biometrics, 46, 673–687. https://doi.org/10.2307/2532087

Maity, T. K., & Pal, A. K. (2013). Subject-specific treatment to neural networks for repeated measures analysis. In Proceedings of the International MultiConference of Engineers and Computer Scientists (Vol. 1, pp. 60–65).

Mandel, F., Ghosh, R. P., & Barnett, I. (2023). Neural networks for clustered and longitudinal data using mixed effects models. Biometrics, 79(2), 711–721. https://doi.org/10.1111/biom.13615

Nilashi, M., Ibrahim, O., & Ahani, A. (2016). Accuracy improvement for predicting Parkinson's disease progression. Scientific Reports, 6, 34181. https://doi.org/10.1038/srep34181

Ortega, R. A., et al. (2021). Association of dual LRRK2 G2019S and GBA variations with Parkinson disease progression.

Patterson, H. D., & Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, 58(3), 545–554. https://doi.org/10.1093/biomet/58.3.545

Paul, K. C., Chuang, Y. H., Bronstein, J. M., & Ritz, B. (2021). Lifestyle factors and Parkinson's disease risk. Pharmacological Research, 173, 105911.

