Equivariant Quantum Clustering with Differential Privacy: Parameter-Efficient Privacy-Preserving Analysis Across Heterogeneous Sensitive Datasets

B. M. Taslimul; Md Arifur; Tawfiq Al Islam; Abdullah Al; Abir

doi:10.25163/ai.1110790

Journal of Ai ML DL

Journal of Ai ML DL | Online ISSN 3070-2143

Citations

27.9k

Views

Articles

Submit

Volume 1 Number 1 2025

Figures and Tables

RESEARCH ARTICLE (Open Access)

Previous Next Contents Vol 1 (1)

Equivariant Quantum Clustering with Differential Privacy: Parameter-Efficient Privacy-Preserving Analysis Across Heterogeneous Sensitive Datasets

B. M. Taslimul Haque¹, Md Arifur Rahman², Tawfiq Al Islam Foysal³, Abdullah Al Noman⁴, Abir Ahmed⁵

+ Author Affiliations

Journal of Ai ML DL 1 (1) 1-24 https://doi.org/10.25163/ai.1110790

Submitted: 18 November 2024 Revised: 14 January 2025 Published: 22 January 2025

Abstract

Clustering sensitive data — patient records, network traffic logs, behavioral profiles — sits at an uncomfortable intersection of analytical necessity and privacy risk. Traditional clustering algorithms were not designed with privacy in mind, and adapting them retrospectively through differential privacy noise typically degrades clustering utility in ways that limit practical value. Quantum computing offers a structurally different computational substrate, but whether its theoretical privacy advantages translate into measurable empirical gains remains an open and genuinely contested question. This paper introduces Symmetry-Aware Equivariant Quantum Clustering (EQC), a framework that integrates p4m symmetry constraints into quantum circuits via parameter sharing, combined with rigorously composed differential privacy guarantees across all pipeline stages. Three privacy-sensitive datasets were used for evaluation — NSL-KDD network intrusion records, CERT Insider Threat v6.2 behavioral logs, and a Synthetic MIMIC-III clinical dataset — spanning meaningfully different domain characteristics. EQC achieves 79.3% clustering accuracy on NSL-KDD while reducing membership inference attack success to 38.3%, compared to 61.7% for classical baselines under equivalent privacy budgets (ε = 1.0, δ = 10?5). Ablation studies confirm that performance gains arise primarily from parameter reduction through equivariance constraints and differential privacy noise rather than from uniquely quantum mechanical effects — an honest finding that reshapes, but does not diminish, the contribution. EQC establishes a credible quantum-ready clustering framework with rigorously validated privacy-utility tradeoffs, designed for sensitive data environments where both formal guarantees and practical accuracy are non-negotiable.

Keywords: Equivariant quantum clustering; differential privacy; membership inference attacks; privacy-preserving machine learning; quantum kernel methods.

1. Introduction

Data clustering sits at the heart of modern analytics. From detecting anomalous network traffic to grouping patient cohorts for clinical research, unsupervised pattern discovery has become indispensable across healthcare, finance, cybersecurity, and behavioral analytics (Aïmeur et al., 2007). Yet the very datasets that make clustering most valuable — records containing personally identifiable information, protected health data, or sensitive behavioral logs — are precisely the ones that traditional clustering methods handle least safely. Standard algorithms like k-means, DBSCAN, and hierarchical clustering were designed for performance, not privacy; they can inadvertently expose cluster centroids, statistical summaries, or individual membership information in ways that enable data reconstruction (Aggarwal & Yu, 2008). Under regulatory frameworks like GDPR, HIPAA, and CCPA, this is not merely an academic concern. It is a compliance failure waiting to happen.

Classical privacy-preserving approaches have made genuine progress. Differential privacy has been adapted to clustering algorithms — differentially private k-means, private DBSCAN, private spectral clustering — and secure multi-party computation protocols enable collaborative analysis on encrypted data without revealing the underlying records (Blum et al., 2005; Jagannathan & Wright, 2005). Homomorphic encryption pushes further still, allowing computation directly on ciphertext (Graepel et al., 2012; Chen et al., 2019). These are real contributions. But they carry a consistent cost: utility degrades under high-dimensional data, and computational overheads can become prohibitive for large-scale deployments. Something has to give, and it is usually accuracy.

Quantum computing enters this picture not as a silver bullet — it would be premature to frame it that way — but as a genuinely different computational substrate whose properties may offer privacy advantages that classical systems structurally cannot replicate. Quantum encoding distributes information across superposition states in ways that resist extraction; the no-cloning theorem means quantum states cannot be perfectly copied, limiting an adversary's ability to run repeated analyses against the same encoded record; and measurement-induced collapse destroys residual information in a manner with no classical analogue (Lloyd et al., 2013; Schuld & Killoran, 2019). Quantum clustering approaches, including quantum k-means variants and variational quantum clustering, have demonstrated promising performance (Kerenidis et al., 2019; Lloyd et al., 2014). What they have largely not done is treat privacy as a first-class design goal. Formal security analysis — against membership inference, model inversion, and attribute inference attacks — has been absent from most quantum clustering work (El Maouaki et al., 2024). That gap is what motivates the present paper.

There is a second thread worth pulling on here. Geometric deep learning has introduced equivariance as a structural inductive bias: by building symmetry constraints directly into model architecture, equivariant models capture essential patterns while ignoring transformations that do not change the underlying structure (Cohen & Welling, 2016; Bronstein et al., 2021). From a privacy standpoint, this is interesting for a reason that is easy to overlook. Equivariance enforces data minimization — the model learns only what it needs to, because its parameter-sharing scheme actively prevents it from memorizing incidental or identifying details (Maron et al., 2018). That is a privacy property built into the geometry of the model itself, not bolted on afterward through noise addition.

Bringing these two threads together is, admittedly, not straightforward. The p4m symmetry group — comprising fourfold rotations and mirror reflections — is most naturally motivated for grid-structured or image data, and applying it to tabular datasets like network intrusion logs requires a careful feature arrangement that is more of a structured regularization heuristic than a semantically meaningful symmetry. We are transparent about this throughout. What the framework offers is a parameter-efficient quantum-inspired clustering architecture — Symmetry-Aware Equivariant Quantum Clustering, or EQC — that integrates p4m symmetry operations into quantum circuits via parameter sharing, applies differential privacy guarantees through calibrated noise mechanisms, and provides rigorous empirical evaluation against state-of-the-art membership inference, model inversion, and attribute inference attacks (Dwork et al., 2006; Shokri et al., 2017; Yeom et al., 2018).

It is worth being upfront about scope. The privacy improvements demonstrated here stem primarily from differential privacy noise and parameter reduction through equivariance constraints — not from uniquely quantum mechanical effects, which require fault-tolerant hardware to materialize. EQC is best understood as a quantum-ready framework: designed for the hardware that is coming, evaluated rigorously on the simulators available today. The rest of this paper makes that case in detail.

2. Methodology

2.1 Overview of the experimental pipeline

Before describing individual components, it helps to understand how they connect. The EQC pipeline moves through five sequential stages: privacy-preserving data preprocessing, hybrid quantum encoding, equivariant quantum circuit transformation, secure kernel computation, and privacy-aware spectral clustering. Each stage inherits and maintains the differential privacy budget established upstream. No stage introduces information leakage that is not already accounted for in the formal composition analysis. That sequential accountability — rather than treating privacy as an afterthought applied at the end — is arguably what distinguishes this framework from prior quantum clustering approaches (El Maouaki et al., 2024; Kerenidis et al., 2019).

2.2 Datasets

Three datasets were selected to span meaningfully different privacy-sensitive domains, which matters because a framework that only works in one context is of limited practical value.

The first is NSL-KDD, a network intrusion detection benchmark containing 125,973 records across 23 attack categories. Raw features were reduced from 41 to 15 dimensions by removing redundant or near-constant attributes, followed by one-hot encoding of categorical features and MinMax scaling to [−1, 1]. An autoencoder (architecture: 15–32–8–32–15, ReLU activations, Adam optimizer, learning rate 10⁻³, 100 epochs, batch size 64) performed nonlinear dimensionality reduction to an 8-dimensional latent representation suitable for quantum encoding. Gaussian noise was then added (x′ = x + 𝒩(0, 0.1²I)) as local differential privacy prior to encoding, followed by quantization to three decimal places.

The second dataset is CERT Insider Threat v6.2, a synthetic behavioral dataset modeling 1,000 users' organizational activities over 18 months. It was selected to evaluate anomaly detection performance under user-level privacy constraints, where the risk of behavioral attribute inference is particularly acute.

The third is a Synthetic MIMIC-III dataset comprising 10,000 synthetic patient records across 50 features, constructed to reflect the distributional characteristics of clinical data under HIPAA compliance requirements. All three datasets underwent tailored preprocessing — normalization, missing value imputation, and feature selection — prior to quantum encoding. Cluster count k was set equal to the number of ground-truth classes for each dataset, and cluster assignments were mapped to class labels using the Hungarian algorithm to maximize overall accuracy (Blum et al., 2005).

2.3 Privacy-preserving data encoding

The encoding stage applies a two-part noise mechanism before any data enters the quantum circuit. For each record xᵢ in dataset D = {x₁, x₂, …, xₙ}, a calibrated Gaussian perturbation is applied:

x′ᵢ = xᵢ + 𝒩(0, σ²I)

where σ is computed from the data sensitivity and the target privacy parameter ε. This satisfies (ε, δ)-differential privacy prior to quantum encoding (Dwork et al., 2006). The perturbed vector x′ is then encoded into a quantum state using a hybrid amplitude–angle scheme. For the first n/2 qubits, amplitude encoding maps the data to complex amplitudes:

|ϕ⁰(x′)⟩ = Σⱼ αᵢⱼ(x′)|j⟩

where the amplitudes αᵢⱼ satisfy the normalization constraint. For the remaining n/2 qubits, angle encoding is applied qubit-wise:

|ϕ¹ᵢ(x′)⟩ = cos(πx′ᵢ)|0⟩ + sin(πx′ᵢ)|1⟩

The full encoded state is the tensor product |ϕ(x′)⟩ = |ϕ⁰(x′)⟩ ⊗ |ϕ¹(x′)⟩. This hybrid approach was chosen deliberately: amplitude encoding alone is expressive but can concentrate sensitive information in large-amplitude components; angle encoding alone is simple but loses representational richness for high-dimensional data. The combination distributes information across the quantum state in a way that naturally complements the differential privacy noise already applied classically (Schuld & Killoran, 2019).

2.4 Equivariant quantum circuit architecture

The quantum circuit operates on 8 qubits arranged in a 2×4 grid topology. Circuit depth was set to L = 4 alternating layers — determined empirically through the ablation study described below — each comprising single-qubit rotation gates (Rᵧ, R_z) and CNOT entangling gates. The gate set was restricted to the hardware-efficient basis {Rᵧ(θ), R_z(θ), CNOT}, with entanglement following horizontal connections within rows and vertical connections between adjacent rows, respecting the p4m orbit structure.

The full parameterized circuit is:

U(θ) = ∏ₗ [∏ᵢ Rᵢ(θₗ,ᵢ) · ∏₍ᵢ,ⱼ₎∈E Cᵢⱼ(θₗ,ᵢⱼ)]

To enforce p4m equivariance, circuit parameters θ are organized into orbits under the group action. Parameters within the same orbit are constrained to be equal:

θₗ,ᵢ = θₗ,ⱼ if qubits i and j occupy the same orbit under p4m θₗ,ᵢⱼ = θₗ,ₖₘ if qubit pairs (i,j) and (k,m) occupy the same orbit

This constraint reduces the independent parameter count from 112 (unconstrained) to 24, organized into three symmetry orbits: a rotational orbit (8 parameters shared across 4 rotation groups), a reflectional orbit (8 parameters shared across 2 reflection axes), and a combined orbit (8 parameters for entanglement gates). It is worth being explicit about what this symmetry achieves in the tabular data context: because NSL-KDD and CERT are not geometric datasets, p4m rotational and reflectional symmetry carry no direct semantic meaning. Features were grouped by semantic similarity — packet-size features, connection-duration features, protocol flags, and error statistics — before encoding into the 2×4 grid. The parameter sharing then functions as a structured regularization mechanism rather than a semantically motivated geometric constraint. The ablation study (Table 5) confirms this interpretation: a capacity-matched random sharing baseline (24 parameters, no symmetry structure) achieves statistically indistinguishable performance (p = 0.12, paired t-test), indicating that parameter reduction is the primary driver (Cohen & Welling, 2016; Bronstein et al., 2021).

2.5 Secure quantum kernel computation

Cluster similarity is computed through a modified swap test circuit that estimates the quantum kernel without exposing the underlying quantum states. For data points x′ᵢ and x′ⱼ, the kernel value is:

K(x′ᵢ, x′ⱼ) = |⟨ϕ(x′ᵢ, θ)|ϕ(x′ⱼ, θ)⟩|²

A post-processing noise layer is applied to the full kernel matrix:

K′(x′ᵢ, x′ⱼ) = K(x′ᵢ, x′ⱼ) + 𝒩(0, σ²_K)

where σ_K is calibrated to maintain the overall privacy budget (σ_K = 0.05, sensitivity = 0.5). This ensures the kernel matrix itself satisfies differential privacy guarantees independently of the upstream encoding noise (Dwork et al., 2006).

2.6 Privacy-aware clustering algorithm

The final clustering stage applies a modified spectral clustering procedure to the privacy-preserving kernel matrix K′. The normalized Laplacian is computed as L = I − D^(−1/2) K′ D^(−1/2), where D is the diagonal degree matrix. The k smallest eigenvectors of L are extracted using a privacy-preserving eigendecomposition that injects controlled noise into intermediate computations. A differentially private k-means algorithm is then applied to the resulting eigenvector matrix to obtain final cluster assignments (Blum et al., 2005; Jagannathan & Wright, 2005).

2.7 Optimization protocol

Circuit parameters θ were optimized using Simultaneous Perturbation Stochastic Approximation (SPSA), with the following fixed hyperparameters: initial step size a = 0.1, perturbation size c = 0.01, step decay exponent α = 0.602, perturbation decay exponent γ = 0.101, maximum iterations 200, blocking enabled (improvements only), early stopping after 20 consecutive non-improving iterations. The objective function jointly optimizes clustering quality and privacy:

ℒ(θ) = −NMI(K_θ) + 0.5 · MIA_risk(K_θ)

where NMI is normalized mutual information between cluster assignments and ground-truth labels, and MIA_risk estimates the membership inference attack success probability against the current kernel (Shokri et al., 2017; Yeom et al., 2018).

2.8 Privacy accounting and composition

End-to-end privacy guarantees were computed using the Google DP Accounting Library v2.0.0, composing three Gaussian mechanisms via Rényi Differential Privacy (RDP) accounting (Mironov, 2017):

— Data encoding: σ = 0.1, sensitivity = 1.0 — Kernel noise: σ = 0.05, sensitivity = 0.5 — Post-processing: σ = 0.03, sensitivity = 0.3

RDP composition followed Equation (5): ε_tot(α) = Σᵢ εᵢ(α), converted to (ε, δ)-DP with δ = 10⁻⁵, yielding ε_total = 1.02. All baseline methods were re-implemented and re-tuned to operate under the same composed budget across ε_tot ∈ {0.5, 1, 2, 4, 8} with δ = 10⁻⁵ to ensure fair utility comparisons at equivalent privacy levels (Dwork et al., 2006; Mironov, 2017).

2.9 Statistical reporting and reproducibility

All results report mean ± 95% confidence interval across 10 independent trials using seeds: 42, 123, 456, 789, 1011, 1213, 1415, 1617, 1819, and 2021. For each trial: dataset splits were stratified 80/20, circuit parameters initialized uniformly in [−π, π], Gaussian noise drawn independently (σ = 0.1), shadow models trained with independent initialization, and circuit optimization restarted from random initialization. Confidence intervals were computed by nonparametric bootstrap over 1,000 resamples. Hardware noise simulations used the IBM Quantum ibm_cairo noise model (calibrated November 2023): depolarizing noise p₂ = 1.5 × 10⁻² per two-qubit gate, p₁ = 5 × 10⁻⁴ per single-qubit gate; readout error P(0→1) = 0.018, P(1→0) = 0.022; coherence times T₁ = 100 μs, T₂ = 80 μs. Shot counts ranged from 1,000 to 100,000 for hardware noise studies; statevector simulation (infinite shots) was used for all main results. All experiments were executed on Ubuntu 22.04 LTS with an Intel Xeon Gold 6348 CPU and 64 GB RAM, using Qiskit v0.45.1 and PennyLane v0.33.1 on Python 3.10.6. Full source code and a Docker container are available at the anonymous repository provided for peer review.

3. Results

3.1 Preliminary note on interpretation

Before presenting the numbers, it is worth being clear about what they do and do not show. The improvements reported here — in clustering accuracy, privacy leakage reduction, and attack resistance — are real and statistically validated. What they reflect, however, is primarily the combined effect of differential privacy noise, parameter reduction through equivariance constraints, and careful preprocessing design. They do not demonstrate uniquely quantum mechanical advantages, which would require fault-tolerant hardware to materialize (Schuld & Killoran, 2019; Lloyd et al., 2013). That distinction matters for honest interpretation, and it runs through everything that follows.

3.2 Performance on the NSL-KDD network intrusion detection dataset

3.2.1 Clustering accuracy and privacy preservation

The NSL-KDD dataset is arguably the most demanding test case here, not because it is the largest, but because it contains 23 attack categories with highly overlapping feature signatures — the kind of structure that breaks simpler clustering approaches. Against that backdrop, the EQC framework performs considerably better than expected from either classical or existing quantum baselines.

[Table 1] summarizes the full comparison. EQC achieves 79.3% clustering accuracy (± 1.5), which represents a 15.8 percentage point improvement over the best classical baseline (Spectral Clustering at 57.8%) and a 15.8 point improvement over the best quantum baseline (VQC at 63.5%). Privacy leakage, measured as membership inference attack success rate, drops to 38.3% (± 2.1) — compared to 75.8% for Spectral Clustering and 65.4% for VQC. Attribute inference error, where higher values indicate stronger privacy protection, reaches 72.5% for EQC, well above the 35.4% recorded by Spectral Clustering and the 45.7% recorded by VQC. Statistical testing confirms these differences are not marginal: paired t-tests across 10 independent runs yield p < 0.0001 for all comparisons, with Cohen's d exceeding 1.8 in most cases — effect sizes that indicate practically meaningful, not merely statistically detectable, differences [Table 3].

What is perhaps most striking about [Table 1] is that EQC improves both dimensions simultaneously. Privacy-preserving classical methods — DP-K-means, PrivateKmeans, DiffP-Spectral — do reduce privacy leakage compared to their unprotected counterparts, but they pay for it in accuracy (43.5%, 45.2%, and 49.3% respectively). EQC does not face that same tradeoff, at least not to the same degree. That is not a trivial result, and it is worth pausing on: the conventional wisdom in privacy-preserving ML is that utility and privacy pull in opposite directions. Here, they do not — though the reasons for that are more mundane than they might appear, as the ablation studies clarify.

3.2.2 Per-attack-type accuracy

[Figure 1] breaks performance down by attack category across the five NSL-KDD superclasses: Normal, DoS, Probe, R2L, and U2R. The pattern is instructive. EQC's advantage over classical methods is relatively modest for Normal and DoS traffic — these are high-volume, structurally distinct categories that most clustering algorithms handle reasonably well. The gap widens considerably for R2L (Remote to Local) and U2R (User to Root) attacks, which are low-frequency and behaviorally subtle. These are precisely the cases where quantum feature spaces — by operating in high-dimensional Hilbert space — can separate patterns that classical kernels conflate (Schuld & Killoran, 2019). Whether that separation is genuinely quantum in origin, or simply an artifact of the hybrid preprocessing and parameter-efficient encoding, is a question the ablation study addresses directly.

Table 1. Clustering performance and privacy metrics for EQC and baseline methods on the NSL-KDD network intrusion detection dataset (k = 5). All values represent mean ± 95% confidence interval across 10 independent trials with different random seeds. Clustering accuracy was computed following optimal Hungarian label alignment. Privacy leakage reflects membership inference attack (MIA) success rate, where lower values indicate stronger privacy protection. Attribute inference error reflects adversarial failure rate in sensitive attribute reconstruction, where higher values indicate stronger protection. All methods operated under an identical composed differential privacy budget (ε_tot = 1.0, δ = 10⁻⁵). Statistical comparisons between EQC and all baselines yielded p < 0.0001 (paired t-test, 10 runs). Bold values indicate best performance per column. ARI, Adjusted Rand Index; NMI, Normalized Mutual Information; DP, differentially private; VQC, Variational Quantum Clustering; EQC, Equivariant Quantum Clustering.

Method	Accuracy (%)	ARI	NMI	Privacy Leakage (%)	Attribute Inference Error (%)
K-means	52.3 ± 1.7	0.358 ± 0.021	0.487 ± 0.018	78.5 ± 2.3	32.7 ± 1.9
DBSCAN	48.6 ± 2.4	0.312 ± 0.028	0.452 ± 0.023	72.3 ± 3.1	38.2 ± 2.5
Spectral Clustering	57.8 ± 1.9	0.412 ± 0.023	0.531 ± 0.020	75.8 ± 2.5	35.4 ± 2.1
DP-K-means	43.5 ± 2.2	0.287 ± 0.025	0.412 ± 0.022	52.1 ± 2.8	58.3 ± 2.4
PrivateKmeans	45.2 ± 2.0	0.301 ± 0.024	0.425 ± 0.021	48.7 ± 2.6	62.5 ± 2.3
DiffP-Spectral	49.3 ± 2.1	0.342 ± 0.026	0.458 ± 0.023	50.5 ± 2.7	60.8 ± 2.5
Quantum K-means	59.7 ± 2.3	0.435 ± 0.027	0.548 ± 0.024	68.2 ± 2.9	42.3 ± 2.6
VQC	63.5 ± 2.1	0.472 ± 0.025	0.583 ± 0.022	65.4 ± 2.7	45.7 ± 2.4
Quantum Spectral	61.8 ± 2.2	0.453 ± 0.026	0.567 ± 0.023	66.9 ± 2.8	44.1 ± 2.5
EQC (Ours)	79.3 ± 1.5	0.685 ± 0.018	0.742 ± 0.016	38.3 ± 2.1	72.5 ± 2.0

Table 2. Privacy-utility tradeoff analysis on the NSL-KDD dataset across composed privacy budgets ε_tot ∈ {0.1, 0.5, 1.0, 5.0} with δ = 10⁻⁵. All baseline methods were re-implemented and re-tuned to satisfy the same end-to-end privacy budget at each level using Rényi differential privacy composition. Accuracy (%) reflects optimal Hungarian-matched clustering performance. Privacy leakage (%) reflects membership inference attack success rate. Computation time (seconds) represents wall-clock time per trial on Ubuntu 22.04 LTS with Intel Xeon Gold 6348 CPU and 64 GB RAM. Lower privacy leakage and higher accuracy indicate more favorable privacy-utility tradeoff. Bold values indicate best accuracy at each privacy budget level. ε_tot, total composed privacy budget; δ = 10⁻⁵ throughout; DP, differentially private; EQC, Equivariant Quantum Clustering.

Method	ε_tot	Accuracy (%)	Privacy Leakage (%)	Computation Time (s)
DP-K-means	0.1	32.5	42.3	12.5
DP-K-means	0.5	38.7	47.8	12.3
DP-K-means	1.0	43.5	52.1	12.2
DP-K-means	5.0	48.2	63.5	12.1
DiffP-Spectral	0.1	37.2	41.8	28.7
DiffP-Spectral	0.5	44.5	46.2	28.5
DiffP-Spectral	1.0	49.3	50.5	28.3
DiffP-Spectral	5.0	53.8	61.7	28.1
EQC (Ours)	0.1	68.7	32.5	45.3
EQC (Ours)	0.5	75.2	35.8	45.1
EQC (Ours)	1.0	79.3	38.3	44.8
EQC (Ours)	5.0	81.5	42.7	44.6

Table 3. Statistical significance testing for clustering accuracy and privacy leakage improvements of EQC over baseline methods on NSL-KDD (k = 5, ε_tot = 1.0). Paired t-tests were conducted across 10 independent random initializations. Effect sizes are reported as Cohen's d. All comparisons yield p < 0.0001, indicating that observed improvements are both statistically significant and practically meaningful. Cohen's d > 1.5 across all comparisons indicates large effect sizes by conventional thresholds. Bold values denote EQC comparisons. EQC, Equivariant Quantum Clustering; VQC, Variational Quantum Clustering. Effect size thresholds: small d = 0.2, medium d = 0.5, large d = 0.8 (Cohen, 1988).

Comparison	Accuracy p-value	Accuracy Effect Size (d)	Privacy Leakage p-value	Privacy Leakage Effect Size (d)
EQC vs. K-means	< 0.0001	2.83	< 0.0001	2.95
EQC vs. Spectral	< 0.0001	2.37	< 0.0001	2.78
EQC vs. DP-K-means	< 0.0001	3.12	< 0.0001	1.87
EQC vs. DiffP-Spectral	< 0.0001	2.85	< 0.0001	1.92
EQC vs. VQC	< 0.0001	1.83	< 0.0001	2.12
EQC vs. Quantum K-means	< 0.0001	2.21	< 0.0001	2.35

Table 4. Impact of equivariance component ablation on clustering accuracy and privacy leakage across all three evaluation datasets (k = 5, ε_tot = 1.0). Each row removes one or all equivariance components from the full EQC model. Values represent mean across 10 independent trials. Privacy leakage reflects membership inference attack success rate. The full EQC configuration consistently achieves the highest accuracy and lowest privacy leakage across all datasets. Bold values indicate best performance per column. EQC, Equivariant Quantum Clustering; CERT, CERT Insider Threat v6.2 dataset; Synthetic MIMIC-III, synthetically generated clinical dataset modeled on MIMIC-III distributional characteristics.

Configuration	NSL-KDD Accuracy (%)	NSL-KDD Privacy Leakage (%)	CERT Accuracy (%)	CERT Privacy Leakage (%)	Synthetic MIMIC-III Accuracy (%)	Synthetic MIMIC-III Privacy Leakage (%)
Full EQC	79.3	38.3	75.0	35.7	70.5	32.8
No Rotational Equivariance	72.5	45.7	68.3	42.5	63.8	39.5
No Reflectional Equivariance	75.8	42.3	71.2	39.8	66.7	36.2
No Parameter Sharing	68.7	52.5	64.5	48.7	60.2	45.3
No Equivariance (Baseline)	63.5	65.4	58.7	68.2	55.8	71.5

Table 5. Capacity-matched ablation study isolating the contribution of symmetry structure versus parameter reduction on NSL-KDD (k = 5, ε_tot = 1.0). All variants except Unconstrained VQC use exactly 24 independent parameters. Results are reported as mean ± standard deviation across 10 independent runs. The non-significant difference between EQC and Random Sharing (p = 0.12, paired t-test) indicates that parameter reduction, rather than p4m symmetry structure specifically, is the primary driver of performance improvements. Bold values indicate best performance. VQC, Variational Quantum Clustering; EQC, Equivariant Quantum Clustering. Permutation Equivariance shares parameters across semantically similar feature groups without geometric symmetry assumptions. p-value for EQC vs. Random Sharing: p = 0.12 (paired t-test, not significant at α = 0.05).

Variant	Parameters	Accuracy (%)	Privacy Leakage (%)
Unconstrained VQC	112	63.5 ± 2.1	65.4 ± 2.7
Random Sharing	24	78.1 ± 1.6	39.7 ± 2.2
Permutation Equivariance	24	77.4 ± 1.7	40.5 ± 2.3
EQC p4m (Ours)	24	79.3 ± 1.5	38.3 ± 2.1

Table 6. Impact of quantum kernel hyperparameters — circuit depth, entanglement pattern, and feature encoding scheme — on clustering accuracy, privacy leakage, and composite security score on NSL-KDD (k = 5, ε_tot = 1.0). Security Score is defined as (1 − Privacy Leakage/100) + Accuracy/100, where higher values reflect a more favorable joint privacy-utility outcome. Values represent mean ± standard deviation across 10 independent runs. The full EQC configuration (depth 4, structured entanglement, hybrid encoding) achieves the highest security score. Bold values indicate best performance. Structured entanglement follows p4m grid topology with horizontal and vertical connections respecting symmetry orbit constraints. Hybrid encoding combines amplitude encoding (first n/2 qubits) with angle encoding (remaining n/2 qubits).

Circuit Depth	Entanglement Pattern	Feature Map	Accuracy (%)	Privacy Leakage (%)	Security Score
2	Linear	Angle	70.5	45.8	0.625
4	Linear	Angle	73.2	43.5	0.648
6	Linear	Angle	73.8	43.2	0.653
4	Structured	Angle	77.5	39.8	0.688
4	All-to-All	Angle	75.8	41.7	0.671
4	Structured	Amplitude	76.2	40.5	0.678
4	Structured	Hybrid	79.3	38.3	0.705

Table 7. Effect of circuit depth on formal privacy guarantees on NSL-KDD (k = 5, ε_tot reported per depth level). Four privacy metrics are reported across circuit depths 1 through 6: membership inference attack (MIA) success rate (lower is better), attribute inference error (higher is better), reconstruction error (higher is better), and differential privacy parameter ε (lower is better). Privacy improvements plateau between depths 4 and 6, confirming depth L = 4 as the optimal operating point for near-term quantum hardware. MIA, membership inference attack. ε values represent the composed (ε, δ)-differential privacy guarantee with δ = 10⁻⁵. Bold row indicates selected operating depth for all main experiments.

Circuit Depth	MIA Success (%)	Attribute Inference Error (%)	Reconstruction Error	Differential Privacy ε
1	58.7	52.3	0.425	3.25
2	52.5	58.7	0.487	2.58
3	47.8	63.5	0.532	1.87
4	45.3	67.2	0.568	1.42
5	44.8	68.5	0.575	1.35
6	44.5	68.7	0.578	1.32

Table 8. Impact of preprocessing pipeline choices on clustering accuracy and privacy leakage on NSL-KDD (k = 5, ε_tot = 1.0). Six preprocessing combinations are evaluated across three dimensions: dimensionality reduction method (PCA vs. autoencoder), normalization scheme (standard z-score vs. MinMax scaling to [−1, 1]), and privacy mechanism (none, noise addition only, or noise plus quantization). The full EQC preprocessing pipeline (autoencoder, MinMax, noise plus quantization) achieves the highest accuracy and lowest privacy leakage. Bold values indicate best performance. PCA, Principal Component Analysis. Autoencoder architecture: 15–32–8–32–15, ReLU activations, Adam optimizer (lr = 10⁻³), 100 epochs, batch size 64. Noise addition: x′ = x + 𝒩(0, 0.1²I). Quantization: values rounded to 3 decimal places.

Dimensionality Reduction	Normalization	Privacy Mechanism	Accuracy (%)	Privacy Leakage (%)
PCA	Standard	None	75.3	52.8
PCA	MinMax	None	76.5	50.5
Autoencoder	Standard	None	77.8	48.3
Autoencoder	MinMax	None	78.5	47.2
PCA	MinMax	Noise Addition	73.2	42.5
Autoencoder	MinMax	Noise Addition	75.8	40.3
Autoencoder	MinMax	Noise + Quantization	79.3	38.3

Table 9. Membership inference attack resistance analysis across clustering methods on NSL-KDD (k = 5, ε_tot = 1.0). Three attack variants are evaluated: shadow model attack (4 shadow models per target, trained on disjoint 25% data subsets), threshold attack, and loss-based attack. Attack success rate (%) reflects adversarial accuracy in determining whether a target record was included in the training dataset, where 50% represents random guessing. Resistance Score = 1 − (Average ASR/100), where higher values indicate stronger privacy protection. Bold values indicate best performance. ASR, attack success rate. Random guessing baseline = 50%. Theoretical upper bound for ε = 1.0, δ = 10⁻⁵: ASR ≤ 62.2% per Equation (4). EQC's observed ASR of 38.3% is well below the theoretical maximum, confirming strong differential privacy compliance.

Method	Shadow Model Attack (%)	Threshold Attack (%)	Loss-Based Attack (%)	Average (%)	Resistance Score
K-means	75.3	80.2	80.0	78.5	0.215
DBSCAN	70.8	74.5	71.6	72.3	0.277
Spectral Clustering	73.5	77.8	76.2	75.8	0.242
DP-K-means	50.3	53.7	52.3	52.1	0.479
DiffP-Spectral	48.7	51.5	51.3	50.5	0.495
Quantum K-means	65.7	70.2	68.7	68.2	0.318
VQC	63.2	67.5	65.5	65.4	0.346
EQC (Ours)	36.5	39.8	38.6	38.3	0.617

Table 10. Model inversion attack resistance analysis across clustering methods on NSL-KDD (k = 5, ε_tot = 1.0). Two reconstruction approaches are evaluated: gradient-based reconstruction and optimization-based reconstruction. Reconstruction error is reported as mean squared error (MSE), where higher values indicate greater difficulty in reconstructing original input records from clustering outputs — and therefore stronger privacy protection. Privacy Protection Factor is computed relative to the k-means baseline (PPF = MSE_method / MSE_k-means). Bold values indicate best performance. MSE, mean squared error. PPF, Privacy Protection Factor = MSE_method / MSE_k-means. Higher MSE and higher PPF indicate stronger resistance to model inversion. EQC's PPF of 2.42 indicates reconstruction from EQC outputs is approximately 2.4 times more difficult than from unprotected k-means outputs.

Method	Gradient-Based Reconstruction Error (MSE)	Optimization-Based Reconstruction Error (MSE)	Average Reconstruction Error (MSE)	Privacy Protection Factor
K-means	0.228	0.242	0.235	1.00 (baseline)
DBSCAN	0.253	0.267	0.260	1.11
Spectral Clustering	0.245	0.258	0.252	1.07
DP-K-means	0.375	0.398	0.387	1.65
DiffP-Spectral	0.362	0.385	0.374	1.59
Quantum K-means	0.312	0.328	0.320	1.36
VQC	0.318	0.332	0.325	1.38
EQC (Ours)	0.552	0.583	0.568	2.42

Table 11. Attribute inference attack resistance analysis across clustering methods on NSL-KDD (k = 5, ε_tot = 1.0). Three attribute categories are evaluated: demographic attributes, behavioral attributes, and sensitive attributes. Attribute inference error (%) reflects the adversarial failure rate in predicting sensitive attributes from clustering outputs, where higher values indicate stronger privacy protection. Gradient-based reconstruction with 100 iterations of projected gradient ascent was used as the attack method. Bold values indicate best performance. Higher attribute inference error indicates greater resistance to attribute leakage. Random guessing baseline for binary attributes ≈ 50%. EQC's behavioral attribute inference error of 74.2% reflects particularly strong protection for longitudinal and session-level behavioral features.

Method	Demographic Attribute Inference Error (%)	Behavioral Attribute Inference Error (%)	Sensitive Attribute Inference Error (%)	Average (%)
K-means	30.5	33.8	33.7	32.7
DBSCAN	35.7	39.5	39.3	38.2
Spectral Clustering	33.2	36.8	36.2	35.4
DP-K-means	55.7	60.2	59.0	58.3
DiffP-Spectral	58.5	62.3	61.5	60.8
Quantum K-means	40.2	43.5	43.2	42.3
VQC	43.5	47.2	46.5	45.7
EQC (Ours)	70.3	74.2	73.0	72.5

Table 12. Noise robustness analysis across clustering methods under four noise types on NSL-KDD (k = 5, ε_tot = 1.0). Clean accuracy reflects noiseless performance. Noisy accuracy is reported for Gaussian noise (σ = 0.2), salt-and-pepper noise (density = 0.1), Laplacian noise, and uniform noise. Retention (%) is computed as (Average Noisy Accuracy / Clean Accuracy) × 100, reflecting the proportion of clean performance preserved under noise. Bold values indicate best performance per column. Gaussian noise: σ = 0.2 added independently to each feature. Salt-and-pepper noise: random feature replacement at density 0.1. Laplacian and uniform noise calibrated to equivalent signal-to-noise ratio. Identical Hungarian matching and noise parameters applied across all methods.

Method	Clean (%)	Gaussian (%)	Salt-Pepper (%)	Laplacian (%)	Uniform (%)	Average Noisy (%)	Retention (%)
K-means	52.3	32.5	35.7	33.8	36.2	34.6	66.2
Spectral Clustering	57.8	38.9	41.2	40.3	42.5	40.7	70.4
DP-K-means	43.5	35.8	37.2	36.5	38.3	37.0	85.1
DiffP-Spectral	49.3	40.2	42.5	41.7	43.8	42.1	85.4
Quantum K-means	59.7	42.3	44.8	43.5	45.7	44.1	73.9
VQC	63.5	47.2	49.3	48.2	50.5	48.8	76.9
EQC (Ours)	79.3	70.5	72.8	71.3	73.5	72.0	90.8

Table 13. Hardware noise robustness across shot counts on NSL-KDD under the IBM Quantum ibm_cairo noise model (ε_tot = 1.0, k = 5). Depolarizing noise: p₂ = 1.5 × 10⁻² per two-qubit gate, p₁ = 5 × 10⁻⁴ per single-qubit gate. Readout error: P(0→1) = 0.018, P(1→0) = 0.022. Coherence times: T₁ = 100 μs, T₂ = 80 μs. Results are averaged over five independent random seeds ± standard deviation. The infinite-shot row reflects noiseless statevector simulation. Bold values indicate the realistic near-term operating point (10,000 shots). MIA, membership inference attack. The theoretical upper bound on MIA success for ε = 1.0, δ = 10⁻⁵ is 62.2% per Equation (4); all reported values remain below this bound, confirming differential privacy compliance across all shot counts. No explicit error mitigation was applied. ibm_cairo noise model calibrated November 2023.

Shots	Accuracy (%)	MIA Success (%)	Notes
∞ (noiseless)	78.4 ± 1.2	38.3 ± 1.8	Noiseless statevector simulation
100,000	68.7 ± 2.1	48.2 ± 2.5	Depolarizing + readout noise
50,000	66.5 ± 2.3	50.1 ± 2.7	Same noise model
10,000	62.3 ± 2.8	54.7 ± 3.1	Realistic near-term setting
1,000	53.8 ± 3.5	63.2 ± 3.8	Highly noisy regime

Figure 1. Clustering accuracy by attack type on the NSL-KDD dataset (k = 5, ε_tot = 1.0, δ = 10⁻⁵). Bar heights represent mean clustering accuracy across 10 independent runs; error bars indicate 95% bootstrap confidence intervals. Five attack superclasses are shown on the x-axis: DoS (Denial of Service), Probe, R2L (Remote to Local), U2R (User to Root), and Normal traffic. Cluster-to-class alignment was performed using the Hungarian algorithm at each run. Colors are consistent across all figures in this manuscript. EQC demonstrates the largest performance advantage over classical and quantum baselines for low-frequency, high-complexity attack types (R2L and U2R), where quantum feature spaces provide the greatest discriminative benefit. DP, differentially private; VQC, Variational Quantum Clustering; EQC, Equivariant Quantum Clustering.

Figure 2. t-SNE visualization of learned quantum feature spaces on NSL-KDD (k = 5, ε_tot = 1.0, δ = 10⁻⁵). Each point represents a data record; colors indicate ground-truth attack superclass (DoS, Probe, R2L, U2R, Normal). Perplexity = 30; 1,000 iterations. This visualization is provided for qualitative illustration of cluster separation only. No quantitative claims in this manuscript are derived from t-SNE projections; all quantitative conclusions rest on the metrics reported in Tables 1–3. EQC produces more distinct and spatially separated cluster regions compared to k-means and VQC baselines, particularly for Probe and DoS superclasses. t-SNE, t-distributed Stochastic Neighbor Embedding; EQC, Equivariant Quantum Clustering.

Figure 3. Row-normalized confusion matrix for EQC on NSL-KDD (k = 5, ε_tot = 1.0, δ = 10⁻⁵). Each cell value represents the fraction of samples belonging to a true attack class (rows) assigned to a predicted cluster (columns), averaged across 10 independent runs with 95% bootstrap confidence intervals. Hungarian label alignment was applied before computing the matrix. Diagonal values reflect per-class clustering accuracy. Off-diagonal misclassifications are concentrated between semantically related attack variants — DoS subtypes and Probe subtypes — consistent with their overlapping feature signatures in the NSL-KDD feature space. EQC, Equivariant Quantum Clustering.

Figure 4. Ablation of equivariance components on privacy leakage across three attack types: Membership Inference Attack (MIA), Model Inversion, and Attribute Inference (k = 5, ε_tot = 1.0). Bar heights represent mean MIA success rates across 10 independent runs; error bars indicate 95% bootstrap confidence intervals. Five configurations are compared: Full EQC, No Rotational Equivariance, No Reflectional Equivariance, No Parameter Sharing, and No Equivariance (Baseline VQC). The horizontal dashed line at 0.5 (50%) denotes the random guessing baseline for membership inference. Rotational equivariance provides the greatest protection against membership inference; reflectional equivariance contributes most to attribute inference resistance; parameter sharing provides broad-spectrum protection across all three attack types. EQC, Equivariant Quantum Clustering; MIA, membership inference attack.

Figure 5. Impact of circuit depth L ∈ {1, 2, 3, 4, 5, 6} on clustering accuracy and membership inference attack (MIA) success rate on NSL-KDD (k = 5, ε_tot = 1.0). Points represent mean values across 10 independent runs; shaded regions indicate 95% bootstrap confidence intervals. Left y-axis: clustering accuracy (%). Right y-axis: MIA success rate (%). Both accuracy gains and privacy improvements plateau beyond depth L = 4, confirming this as the optimal operating point. All experiments used identical optimization settings (SPSA optimizer, 200 maximum iterations) and Hungarian label alignment. EQC, Equivariant Quantum Clustering; MIA, membership inference attack; SPSA, Simultaneous Perturbation Stochastic Approximation.

Figure 6. Impact of data preprocessing pipeline on membership inference attack (MIA) success rates across NSL-KDD attack types (k = 5, ε_tot = 1.0). Bar heights represent mean MIA success rates across 10 independent runs; error bars indicate 95% bootstrap confidence intervals. Three preprocessing configurations are compared: raw MinMax scaling only, PCA-based dimensionality reduction, and EQC symmetry encoding with autoencoder-based reduction and noise-plus-quantization. Identical noise parameters (σ = 0.1) and Hungarian matching were applied across all configurations. The full EQC preprocessing pipeline achieves the lowest MIA success rate across all attack categories, demonstrating that preprocessing design choices carry substantial and measurable privacy consequences independent of circuit architecture. MIA, membership inference attack; PCA, Principal Component Analysis; EQC, Equivariant Quantum Clustering.

Figure 7. Membership inference attack (MIA) success rates across all methods and all three evaluation datasets: NSL-KDD, CERT Insider Threat v6.2, and Synthetic MIMIC-III (k = 5, ε_tot = 1.0, δ = 10⁻⁵). Bar heights represent mean MIA success rates across 10 independent runs; error bars indicate 95% bootstrap confidence intervals. The horizontal dashed line at 0.5 (50%) denotes the random guessing baseline. Identical noise parameters, shadow model architecture (2-layer MLP, 64 hidden units, ReLU activations, 4 shadow models per target), and evaluation protocols were applied uniformly across all methods to ensure valid comparisons. EQC achieves the lowest MIA success rate across all three datasets, with all values remaining below the theoretical upper bound of 62.2% derived from (ε, δ)-DP Equation (4). MIA, membership inference attack; EQC, Equivariant Quantum Clustering; CERT, CERT Insider Threat v6.2.

Figure 8. Model inversion attack reconstruction error (MSE) across all methods and three evaluation datasets (k = 5, ε_tot = 1.0, δ = 10⁻⁵). Higher reconstruction error indicates greater difficulty in recovering original input records from clustering outputs, reflecting stronger resistance to model inversion. Values represent mean reconstruction error across 10 independent runs. Both gradient-based and optimization-based reconstruction methods were evaluated; values shown reflect the average across both approaches (see Table 10 for disaggregated results). EQC achieves substantially higher reconstruction error than all baseline methods across all datasets, with a Privacy Protection Factor of 2.42 relative to unprotected k-means. MSE, mean squared error; PPF, Privacy Protection Factor; EQC, Equivariant Quantum Clustering.

Figure 9. Attribute inference attack error rate across clustering methods and attribute categories on NSL-KDD (k = 5, ε_tot = 1.0, δ = 10⁻⁵). Three attribute categories are shown: demographic, behavioral, and sensitive. Bar heights represent mean attribute inference error (%) across 10 independent runs; error bars indicate 95% bootstrap confidence intervals. Higher error indicates greater adversarial failure in inferring sensitive attributes from clustering outputs — and therefore stronger attribute-level privacy protection. Gradient-based reconstruction with 100 iterations of projected gradient ascent was used as the attack method. EQC achieves the highest attribute inference error across all three categories, with particularly strong protection for behavioral attributes (74.2%), which are among the most sensitive in enterprise and clinical deployment contexts. EQC, Equivariant Quantum Clustering.

Figure 10. Clustering accuracy as a function of Gaussian noise standard deviation σ ∈ [0, 0.5] on NSL-KDD (k = 5, ε_tot = 1.0). Points represent mean clustering accuracy across 10 independent runs; shaded regions indicate 95% bootstrap confidence intervals. Gaussian noise was added independently to each feature at each noise level. Optimal Hungarian label alignment was applied at each noise level for all methods. EQC maintains above 70% accuracy up to σ = 0.2 and above 60% up to σ = 0.3, degrading more gradually than all baseline methods. Privacy leakage (not shown) remained consistently below 40% across the full noise range for EQC, confirming a stable privacy-utility balance under varying noise conditions. The shaded region labeled "Optimal Complexity" indicates the parameter range (approximately 24 independent parameters) at which EQC balances expressivity and privacy leakage most efficiently. EQC, Equivariant Quantum Clustering; VQC, Variational Quantum Clustering.

One finding worth noting: U2R detection accuracy for EQC reaches approximately 61.1%, compared to around 40% for the best classical privacy-preserving baseline. That 20-point gap for the rarest and most dangerous attack class has real operational significance for security applications — though it also means there is a non-trivial 18.2 percentage point disparity between EQC's best and worst per-class performance, which warrants attention before any deployment in high-stakes settings.

3.2.3 Privacy-utility tradeoff across budget levels

[Table 2] extends the comparison across five composed privacy budgets: ε_tot ∈ {0.1, 0.5, 1.0, 5.0}, with δ = 10⁻⁵ throughout. All baseline methods were re-tuned to operate under the same end-to-end privacy budget at each level, following the RDP composition procedure described in the Methods (Mironov, 2017; Dwork et al., 2006).

Even at the strictest budget tested (ε = 0.1), EQC maintains 68.7% accuracy with 32.5% privacy leakage. DP-K-means at the same budget achieves only 32.5% accuracy. DiffP-Spectral reaches 37.2%. The gap closes somewhat at looser budgets — at ε = 5.0, DP-K-means reaches 48.2% while EQC reaches 81.5% — but EQC's advantage is consistent across the full range. Computation time for EQC runs approximately 44.8 seconds per trial at ε = 1.0, compared to 12.2 seconds for DP-K-means and 28.3 seconds for DiffP-Spectral. That roughly 3.6-fold overhead relative to the fastest baseline is a real cost that should be factored into practical deployment considerations.

3.2.4 Cluster structure visualization

[Figure 2] presents t-SNE visualizations of the learned quantum feature spaces for EQC and selected baselines on NSL-KDD. These visualizations are provided for qualitative illustration only — no quantitative claims rest on them, and t-SNE projections can be misleading if over-interpreted (all quantitative conclusions derive from [Table 1] through [Table 3]). With that caveat stated, the visualizations do show more distinct cluster separation for EQC than for k-means or VQC, particularly for the Probe and DoS superclasses, which appear as well-separated regions in the EQC projection but partially overlapping in the classical baselines.

[Figure 3] presents a row-normalized confusion matrix for EQC on NSL-KDD (k = 5), computed over 10 runs with 95% confidence intervals and Hungarian label alignment. Most misclassifications occur between related attack variants — DoS subtypes, for instance, are occasionally assigned to neighboring clusters rather than the correct one. This pattern aligns with security domain knowledge about attack similarity and does not represent a random failure mode.

3.3 Ablation studies

3.3.1 Contribution of individual equivariance components

[Table 4] isolates the contribution of each equivariance component — rotational symmetry, reflectional symmetry, and parameter sharing — by systematically removing each from the full EQC model and measuring the resulting change in accuracy and privacy leakage across all three datasets.

Removing rotational equivariance alone causes accuracy to drop by approximately 6.7–6.8 percentage points and privacy leakage to increase by a similar margin. Removing reflectional equivariance produces a smaller but consistent degradation of 3.4–3.8 points. The most consequential ablation is parameter sharing: removing it causes accuracy to fall by more than 10 percentage points and privacy leakage to increase by 12.5–14.2 points across datasets. Removing all equivariance constraints simultaneously reduces accuracy by 15–16 points and raises privacy leakage by 27–39 points — the full magnitude of EQC's headline improvement over unconstrained VQC [Table 4].

[Figure 4] visualizes these effects by attack type for membership inference, model inversion, and attribute inference attacks under fixed ε_tot = 1.0. Rotational equivariance appears especially protective against membership inference; reflectional equivariance matters most for attribute inference. Parameter sharing provides broad-spectrum protection across all three attack types (Cohen & Welling, 2016; Maron et al., 2018).

3.3.2 Symmetry versus parameter reduction: a capacity-matched test

This is probably the most important ablation in the paper, and the results are worth reading carefully. [Table 5] compares four variants all constrained to exactly 24 independent parameters: EQC with structured p4m sharing, a random sharing baseline with no symmetry structure, a permutation-equivariant baseline that shares parameters across semantically similar feature groups, and an unconstrained VQC with 112 parameters.

EQC achieves 79.3% accuracy (± 1.5). Random Sharing achieves 78.1% (± 1.6). The difference is not statistically significant (p = 0.12, paired t-test). Permutation Equivariance achieves 77.4% (± 1.7) — also comparable. Unconstrained VQC, with nearly five times as many parameters, achieves only 63.5% (± 2.1), significantly worse (p < 0.001). The conclusion from [Table 5] is clear, if somewhat deflating for the symmetry motivation: it is parameter reduction — regularization — that drives the privacy and utility improvements, not p4m symmetry specifically. Any structured sharing scheme of comparable parameter count performs similarly. This finding is reported honestly here, and the framework's contribution is better understood as parameter-efficient quantum clustering with differential privacy than as symmetry-specific quantum privacy preservation (Bronstein et al., 2021).

3.3.3 Effect of quantum kernel parameters on security

[Table 6] examines how circuit depth, entanglement pattern, and feature encoding scheme jointly affect clustering accuracy and a composite security score (defined as 1 − Privacy Leakage/100 + Accuracy/100). Increasing circuit depth from 2 to 4 layers produces meaningful gains in both accuracy and privacy. Beyond depth 4, improvements plateau — depth 6 yields only marginal gains over depth 5, confirming that L = 4 is the appropriate operating point. Structured entanglement aligned to p4m topology outperforms both linear and all-to-all entanglement patterns. The hybrid amplitude–angle encoding scheme consistently outperforms pure amplitude or pure angle encoding, with the full EQC configuration achieving the highest security score of 0.705 [Table 6].

3.3.4 Influence of circuit depth on formal privacy guarantees

[Table 7] reports four privacy metrics — membership inference attack success rate, attribute inference error, reconstruction error, and differential privacy ε — across circuit depths 1 through 6. Privacy improves substantially between depths 1 and 4: membership inference success falls from 58.7% to 45.3%, attribute inference error rises from 52.3% to 67.2%, and formal ε decreases from 3.25 to 1.42. Between depths 4 and 6, gains are minimal (ε moves from 1.42 to 1.32). This plateau pattern is consistent with the accuracy results and reinforces L = 4 as the optimal operating point for near-term hardware with finite coherence budgets [Table 7], [Figure 5].

3.3.5 Sensitivity to preprocessing choices

[Table 8] evaluates six preprocessing combinations across dimensionality reduction method (PCA vs. autoencoder), normalization scheme (standard vs. MinMax), and privacy mechanism (none vs. noise addition vs. noise plus quantization). The autoencoder with MinMax normalization and noise plus quantization achieves the best accuracy (79.3%) and the lowest privacy leakage (38.3%) — the full EQC configuration. PCA with standard normalization and no privacy mechanism achieves 75.3% accuracy but 52.8% privacy leakage, confirming that preprocessing choices have substantial privacy implications independent of the quantum circuit design.

[Figure 6] visualizes these preprocessing effects across attack types. The combination of autoencoder-based reduction, MinMax scaling, and noise-plus-quantization provides the most consistent privacy protection across all attack categories, with no single category showing an anomalous vulnerability.

3.4 Robustness to privacy attacks

3.4.1 Membership inference attack resistance

[Figure 7] presents membership inference attack success rates across all methods and datasets. EQC's attack success rate of 38.3% is the lowest of any method tested — 40.2 percentage points below standard k-means (78.5%), 13.8 points below DP-K-means (52.1%), and 27.1 points below VQC (65.4%). These comparisons are made under identical evaluation conditions: shadow model approach with four shadow models per target, each trained on disjoint 25% subsets, with a 2-layer MLP (64 hidden units, ReLU) as the attack classifier (Shokri et al., 2017).

[Table 9] disaggregates by attack type — shadow model, threshold, and loss-based attacks — and reports a composite resistance score (where 1.0 would represent perfect resistance and 0.0 would represent complete vulnerability). EQC achieves a resistance score of 0.617, compared to 0.495 for DiffP-Spectral, 0.479 for DP-K-means, and 0.346 for VQC. The protection stems from three interacting mechanisms: differential privacy noise limits what the kernel matrix reveals about individual records; parameter sharing prevents the circuit from memorizing specific data presentations; and the inherent lossy nature of quantum measurement means that even access to the kernel does not reconstruct the underlying states (Yeom et al., 2018; Schuld & Killoran, 2019).

3.4.2 Model inversion attack resistance

Model inversion attacks attempt to reconstruct input records from clustering outputs — a particularly serious risk in healthcare applications where even partial reconstruction of patient features can constitute a privacy breach. [Figure 8] and [Table 10] report reconstruction error (MSE) across gradient-based and optimization-based reconstruction methods.

EQC's average reconstruction error is 0.568, compared to 0.235 for standard k-means (baseline), 0.387 for DP-K-means, and 0.325 for VQC. Expressed as a privacy protection factor relative to k-means, EQC achieves 2.42 — meaning reconstruction from EQC outputs is approximately 2.4 times harder than from unprotected k-means outputs. No other method exceeds a factor of 1.65 [Table 10]. It is worth being candid about the interpretation here: in the classical simulation setting, the no-cloning theorem and measurement collapse — which would provide genuine protection on real quantum hardware — are not operative. The reconstruction resistance observed here comes from the combined effect of DP noise, parameter reduction, and the lossy quantum-inspired encoding process (Lloyd et al., 2013).

3.4.3 Attribute inference attack resistance

[Figure 9] and [Table 11] report attribute inference error across demographic, behavioral, and sensitive attribute categories. EQC achieves an average attribute inference error of 72.5%, compared to 32.7% for k-means, 60.8% for DiffP-Spectral, and 45.7% for VQC. Protection is strongest for behavioral attributes (74.2% error), which is significant given that behavioral data — user activity logs, session patterns, clinical trajectories — represents some of the most sensitive content in the target application domains (Yeom et al., 2018; Maron et al., 2018).

3.5 Noise resilience

3.5.1 Robustness under Gaussian and structured noise

[Figure 10] plots clustering accuracy as a function of Gaussian noise standard deviation (σ ∈ [0, 0.5]) for EQC and selected baselines on NSL-KDD. EQC maintains above 70% accuracy up to σ = 0.2 and above 60% up to σ = 0.3, degrading more gradually than any baseline method. Privacy leakage remains below 40% throughout this range, which means the privacy-utility balance is preserved even under moderately adversarial noise conditions.

[Table 12] extends this to four noise types — Gaussian (σ = 0.2), salt-and-pepper (density = 0.1), Laplacian, and uniform — reporting clean accuracy, noisy accuracy, and a retention percentage (noisy accuracy as a fraction of clean accuracy). EQC retains 90.8% of clean performance on average across noise types. The next best method, VQC, retains 76.9%. Standard k-means retains only 66.2%. The equivariance constraints appear to play a genuine regularizing role here, making the circuit's behavior more stable across perturbations in a way that is consistent with theoretical predictions from geometric deep learning (Cohen & Welling, 2016; Bronstein et al., 2021).

3.5.2 Robustness under realistic hardware noise

The hardware noise analysis is probably the most practically important result in this paper for researchers planning near-term quantum implementations. [Table 13] reports accuracy and membership inference attack success across shot counts from 1,000 to 100,000, simulated using the IBM Quantum ibm_cairo noise model (depolarizing noise p₂ = 1.5 × 10⁻², p₁ = 5 × 10⁻⁴; readout error P(0→1) = 0.018, P(1→0) = 0.022; coherence times T₁ = 100 μs, T₂ = 80 μs).

At infinite shots (noiseless statevector simulation), EQC achieves 78.4% accuracy and 38.3% MIA success — the headline results. At 10,000 shots, a realistic near-term operating point, accuracy falls to 62.3% (± 2.8) and MIA success rises to 54.7% (± 3.1). Performance improves with shot count but plateaus around 50,000 shots due to coherent noise effects that additional sampling cannot resolve. For comparison, unconstrained VQC drops to 51.8% at 10,000 shots, suggesting that symmetry-constrained circuits do show modest noise resilience relative to unstructured alternatives — though the absolute degradation from noiseless to near-term conditions is substantial for both [Table 13].

The privacy implications of this degradation deserve explicit attention. At 10,000 shots, MIA success of 54.7% is meaningfully higher than the 38.3% achieved under noiseless simulation. It remains below the 62.2% theoretical upper bound for ε = 1.0, δ = 10⁻⁵ derived from Equation (4) — so formal differential privacy guarantees are not violated — but the practical privacy protection is noticeably weaker than the headline results suggest. Near-term deployment would require either explicit error mitigation techniques such as zero-noise extrapolation, or a hybrid architecture that offloads noise-sensitive operations to classical processors while retaining quantum computation only for components that can tolerate the noise floor (Dwork et al., 2006; Mironov, 2017).

4. Discussion

4.1 What the results actually show — and what they do not

It is tempting, when results look this clean, to reach for the strongest possible interpretation. The EQC framework achieves 79.3% clustering accuracy on NSL-KDD while reducing membership inference attack success to 38.3% — simultaneously improving both dimensions where classical privacy-preserving methods consistently sacrifice one for the other [Table 1]. Across ablation studies spanning equivariance components, kernel parameters, preprocessing pipelines, and attack types, the pattern holds with notable consistency [Tables 4–11]. That is a genuinely encouraging set of findings.

But the honest interpretation requires some care. The privacy and utility improvements demonstrated here do not arise primarily from quantum mechanical effects. They arise from three interacting classical mechanisms: calibrated differential privacy noise applied during encoding, parameter reduction enforced through equivariance constraints — which functions as structured regularization — and a carefully designed preprocessing pipeline combining autoencoder-based dimensionality reduction with MinMax normalization and local noise addition [Table 8]. The capacity-matched ablation [Table 5] makes this explicit: a random parameter-sharing baseline with no symmetry structure achieves 78.1% accuracy, statistically indistinguishable from EQC's 79.3% (p = 0.12). The quantum feature space provides expressive representational power, and the differential privacy accounting provides formal guarantees — but the specific choice of p4m symmetry, however theoretically elegant, is not what drives the results (Dwork et al., 2006; Cohen & Welling, 2016).

This is not a failure. It is a clarification. The contribution of this work is a parameter-efficient, differentially private quantum clustering framework that works — and works better than existing alternatives — on privacy-sensitive data. That is worth having. What it is not, at least not yet, is a demonstration of uniquely quantum privacy advantages. Those remain a credible theoretical possibility for future fault-tolerant hardware, not a demonstrated empirical result in the current simulation setting (Schuld & Killoran, 2019; Lloyd et al., 2013).

4.2 The privacy-utility tradeoff reconsidered

One of the persistent frustrations in privacy-preserving machine learning is that the tools available tend to force a choice. Add differential privacy noise and watch accuracy fall. Encrypt the data and face computational costs that scale poorly. Use secure multi-party computation and introduce coordination overhead that limits where the method can actually be deployed (Jagannathan & Wright, 2005; Graepel et al., 2012). This tradeoff is not absolute — it has always been a function of the method, the data, and the privacy budget — but it has been persistent enough that it shapes how practitioners think about privacy-preserving analytics.

The results here complicate that picture, at least somewhat. Across privacy budgets ε_tot ∈ {0.1, 0.5, 1.0, 5.0}, EQC maintains a substantially more favorable accuracy-privacy curve than any classical baseline [Table 2]. Even at ε = 0.1, where DP-K-means achieves only 32.5% accuracy, EQC reaches 68.7%. The mechanism is parameter efficiency: by constraining the circuit to 24 independent parameters rather than 112, EQC reduces the model's capacity to memorize individual records — which is precisely what membership inference attacks exploit (Yeom et al., 2018; Shokri et al., 2017). The differential privacy accounting then formalizes what the architectural constraint already achieves structurally. These two mechanisms reinforce each other in a way that adds up to more than either would deliver alone.

Whether this advantage persists at much larger scales — thousands of features, millions of records, deeper circuits — is genuinely uncertain. The experiments here use 8 qubits and circuits of depth 4 with at most 15-dimensional inputs after preprocessing. Scaling quantum circuits introduces noise, decoherence, and sampling overhead that the current simulation environment does not fully capture [Table 13]. Caution about extrapolating these results to large-scale production settings is warranted.

4.3 Understanding the role of equivariance

The role that equivariance plays in this framework deserves a more nuanced account than the headline results suggest. Geometric deep learning introduced equivariance as a principled inductive bias: building symmetry constraints into model architecture so that transformations of the input lead to corresponding transformations in the output (Cohen & Welling, 2016; Bronstein et al., 2021). For image data, this has clear motivation — a rotated image of a digit should produce the same classification as the unrotated version. For tabular network intrusion data, the motivation is considerably less obvious.

What p4m equivariance provides in the tabular context is essentially structured regularization. Features are grouped by semantic similarity — packet-size metrics, connection-duration features, protocol flags, error statistics — into a 2×4 grid, and the symmetry constraints then enforce parameter sharing across groups that the grid topology treats as equivalent. The 90° rotations that p4m encodes have no literal semantic meaning for NSL-KDD features. But the parameter reduction they enforce — from 112 to 24 independent parameters — stabilizes optimization, limits memorization of specific data presentations, and produces a model that generalizes better to unseen records [Table 5]. The ablation confirms that permutation-equivariant sharing based on semantic feature groupings achieves comparable results (77.4%), suggesting that future work might more directly encode domain knowledge about feature relationships rather than borrowing geometric symmetry from a different application domain (Maron et al., 2018).

The privacy implications of this architectural choice extend beyond regularization. By constraining what the model can learn, equivariance enforces a form of data minimization that aligns with the principle behind differential privacy — use only what is necessary to accomplish the task (Dwork et al., 2006). The model cannot memorize identifying details it has been structurally prevented from representing. This is a subtle but meaningful distinction from post-hoc privacy mechanisms: the protection is baked into the architecture, not applied externally.

4.4 Robustness, hardware noise, and the near-term reality

The noise resilience results carry a mixed message that is worth sitting with rather than glossing over. On one hand, EQC retains 90.8% of its clean performance across four noise types at moderate noise levels [Table 12], outperforming VQC (76.9%) and all classical baselines. The equivariance constraints appear to provide genuine stabilization, consistent with theoretical expectations from geometric deep learning (Cohen & Welling, 2016). On the other hand, the hardware noise analysis tells a more sobering story [Table 13]: at 10,000 shots — a realistic operating point for near-term quantum devices — accuracy falls from 78.4% to 62.3%, and membership inference attack success rises from 38.3% to 54.7%. The privacy protection that looked so strong under noiseless simulation is meaningfully weaker under realistic device conditions, even though it formally remains within the (ε, δ)-DP bound derived from Equation (4).

This gap between simulation performance and near-term hardware performance is one of the most important practical constraints facing quantum machine learning broadly, not just this framework (Lloyd et al., 2013). It does not invalidate the results, but it does mean that claims about practical deployment readiness need to be qualified carefully. Near-term implementation of EQC on real devices would require either zero-noise extrapolation or similar error mitigation techniques, or a hybrid architecture where noise-sensitive quantum kernel evaluations are replaced by classical approximations in high-noise regimes. Neither path is straightforward, and both represent meaningful open engineering problems.

4.5 Generalizability across domains and datasets

The three datasets used here — NSL-KDD, CERT Insider Threat v6.2, and Synthetic MIMIC-III — were selected to span different privacy-sensitive domains and to probe different aspects of the framework's behavior. The consistent performance advantage of EQC across all three [Table 4] provides some confidence that the results are not specific to a single data distribution. But limitations in this regard deserve acknowledgment.

NSL-KDD is a widely used benchmark but also a widely criticized one: its sampling methodology, class imbalance, and potential for overfitting through benchmark exposure have been documented in the clustering and intrusion detection literature (Aggarwal & Yu, 2008). The CERT and Synthetic MIMIC-III datasets are both synthetically generated, which means they may not fully capture the distributional complexity of real enterprise behavioral data or real clinical records. Performance under genuine dataset shift — deployment on data that differs meaningfully from the training distribution — has not been tested. The fairness analysis conducted on NSL-KDD revealed an 18.2 percentage point accuracy disparity between the best-performing attack class (DoS at 79.3%) and the worst (U2R at 61.1%). For applications in security operations or clinical triage, that level of per-class disparity matters and warrants systematic investigation before deployment [Table 1], [Figure 1].

4.6 Broader implications for privacy-preserving quantum machine learning

Stepping back from the specific results, this work points toward a broader design principle that may be worth carrying forward. Privacy-preserving quantum machine learning does not need to wait for fault-tolerant hardware to be useful. Quantum-inspired architectures — frameworks that adopt quantum circuit structure and kernel methods while running on classical simulators, and that are designed from the outset to integrate differential privacy accounting — can already deliver meaningful improvements over classical privacy-preserving baselines (Kerenidis et al., 2019; Broadbent et al., 2009). The value proposition is not "quantum gives you privacy for free." It is "quantum-inspired parameter efficiency, combined with rigorous DP accounting, gives you a better privacy-utility tradeoff than adding noise to classical methods alone."

That reframing has practical consequences. It means the relevant comparison for near-term deployment is not EQC versus a hypothetical future fault-tolerant quantum computer — it is EQC versus DP-K-means and DiffP-Spectral on real sensitive data today. Against those comparisons, the 15–16 percentage point accuracy advantage and 27–38 point privacy leakage reduction are meaningful [Table 1], [Table 4]. The computational overhead — approximately 3.6× relative to DP-K-means at equivalent privacy budgets [Table 2] — is a real cost, but not an prohibitive one for batch analysis applications in healthcare, security analytics, or financial compliance where latency is less critical than accuracy and auditability.

As quantum hardware matures and error rates fall, the theoretical quantum privacy advantages — no-cloning protection, measurement-induced information collapse, superposition-based obfuscation — may become practically accessible (Schuld & Killoran, 2019; Broadbent et al., 2009; Fitzsimons, 2017). When they do, a framework already designed around quantum circuit architecture and formal DP accounting is better positioned to incorporate those advantages than one that treats quantum computation as an add-on. That is the longer-term argument for EQC, even if the current evidence does not yet make it.

5. Conclusion

This paper set out to ask whether quantum computing and equivariance constraints could meaningfully improve the privacy-utility tradeoff in sensitive data clustering — and the answer is a qualified yes, with an important caveat worth stating plainly.

EQC delivers consistent and statistically significant improvements across clustering accuracy, membership inference resistance, model inversion robustness, and attribute inference protection compared to both classical and quantum baselines. The gains — approximately 15–16 percentage points in accuracy and 27–38 points in privacy leakage reduction — are real. What drives them, however, is parameter efficiency and differential privacy composition, not quantum mechanical effects, which remain a theoretical advantage awaiting fault-tolerant hardware rather than a demonstrated empirical one.

That distinction does not undermine the contribution. A quantum-inspired, privacy-by-design clustering framework that outperforms existing alternatives on sensitive real-world data types is valuable today, regardless of whether the quantum advantage fully materializes tomorrow. Future work should prioritize error-mitigated hardware implementation, permutation-equivariant architectures better suited to tabular domains, and systematic fairness analysis across demographic subgroups — all of which this paper identifies as open problems rather than resolved ones.

Author Contributions

B. M. T. H. Conceptualization, Methodology, Formal analysis, Investigation, Software, Validation, Visualization, Writing – original draft, Writing – review and editing, Project administration. M. A. R. Methodology, Software, Formal analysis, Validation, Data curation, Writing – original draft, Writing – review and editing. T. A. I. F. Investigation, Software, Formal analysis, Data curation, Visualization, Writing – original draft, Writing – review and editing. A. A. N. Conceptualization, Validation, Resources, Writing – original draft, Writing – review and editing, Supervision. A. A. Validation, Resources, Writing – review and editing, Supervision, Funding acquisition. All authors have read and approved the final version of the manuscript submitted for publication.

Competing financial interests

The authors declare no competing financial interests. The authors have no relevant financial or non-financial interests to disclose. No funding body had any role in study design, data collection and analysis, decision to publish, or preparation of this manuscript.

Acknowledgement

The authors gratefully acknowledge the computational resources and institutional support provided by their respective affiliated institutions: Central Michigan University (Mount Pleasant, MI, USA), Trine University (Angola, IN, USA), American International University–Bangladesh (AIUB, Dhaka, Bangladesh), Wilmington University (New Castle, DE, USA), and Washington University of Science and Technology (Alexandria, VA, USA).

The authors wish to thank the developers and maintainers of the open-source tools that made this research possible, including the Qiskit development team at IBM Quantum, the PennyLane team at Xanadu, and the Google Differential Privacy team for their publicly available DP Accounting Library. The NSL-KDD dataset is acknowledged as a standard benchmark resource in the network security research community. The CERT Insider Threat dataset (v6.2) is acknowledged as a resource made available through Carnegie Mellon University's Software Engineering Institute. The Synthetic MIMIC-III dataset was constructed to reflect the distributional characteristics of clinical data under applicable data governance requirements.

The authors also acknowledge the constructive feedback of the anonymous peer reviewers whose comments strengthened the methodological rigor and clarity of this work.

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

References

Aggarwal, C. C., & Yu, P. S. (2008). A general survey of privacy-preserving data mining models and algorithms. In C. C. Aggarwal & P. S. Yu (Eds.), Privacy-preserving data mining: Models and algorithms (pp. 11–52). Springer.

Aïmeur, E., Brassard, G., & Gambs, S. (2007). Quantum clustering algorithms. In Proceedings of the 24th International Conference on Machine Learning (pp. 1–8). ACM.

Blum, A., Dwork, C., McSherry, F., & Nissim, K. (2005). Practical privacy: The SuLQ framework. In Proceedings of the Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (pp. 128–138). ACM.

Broadbent, A., Fitzsimons, J., & Kashefi, E. (2009). Universal blind quantum computation. In Proceedings of the 50th Annual IEEE Symposium on Foundations of Computer Science (pp. 517–526). IEEE.

Broadbent, A., & Jeffery, S. (2015). Quantum homomorphic encryption for circuits of low T-gate complexity. In Annual Cryptology Conference (pp. 609–629). Springer.

Bronstein, M. M., Bruna, J., Cohen, T., & Velickovic, P. (2021). Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478.

Bunn, P., & Ostrovsky, R. (2007). Secure two-party k-means clustering. In Proceedings of the 14th ACM Conference on Computer and Communications Security (pp. 486–497). ACM.

Casana-Eslava, R. V., Lisboa, P. J., Ortega-Martorell, S., Jarman, I. H., & Martín-Guerrero, J. D. (2020). Probabilistic quantum clustering. Knowledge-Based Systems, 194, 105567.

Chen, H., Chillotti, I., & Song, Y. (2019). Multi-key homomorphic encryption from TFHE. In Advances in Cryptology – ASIACRYPT 2019 (pp. 446–472). Springer.

Cohen, T., & Welling, M. (2016). Group equivariant convolutional networks. In Proceedings of the 33rd International Conference on Machine Learning (pp. 2990–2999). PMLR.

Deshmukh, S., Behera, B. K., Mulay, P., Ahmed, E. A., Al-Kuwari, S., Tiwari, P., & Farouk, A. (2023). Explainable quantum clustering method to model medical data. Knowledge-Based Systems, 267, 110413.

Dwork, C., McSherry, F., Nissim, K., & Smith, A. (2006). Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006 (pp. 265–284). Springer.

El Maouaki, W., Innan, N., Marchisio, A., Said, T., Bennai, M., & Shafique, M. (2024). Quantum clustering for cybersecurity. In Proceedings of the 2024 IEEE International Conference on Quantum Computing and Engineering (QCE) (Vol. 2, pp. 5–10). IEEE.

Fitzsimons, J. F. (2017). Private quantum computation: An introduction to blind quantum computing and related protocols. npj Quantum Information, 3(1), 23.

Google. (2023). Differential privacy accounting library [Software]. GitHub. https://github.com/google/differential-privacy

Graepel, T., Lauter, K., & Naehrig, M. (2012). ML confidential: Machine learning on encrypted data. In International Conference on Information Security and Cryptology (pp. 1–21). Springer.

Jagannathan, G., & Wright, R. N. (2005). Privacy-preserving distributed k-means clustering over arbitrarily partitioned data. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (pp. 593–599). ACM.

Kerenidis, I., Landman, J., Luongo, A., & Prakash, A. (2019). q-means: A quantum algorithm for unsupervised machine learning. Advances in Neural Information Processing Systems, 32, 1–11.

Lloyd, S., Mohseni, M., & Rebentrost, P. (2013). Quantum algorithms for supervised and unsupervised machine learning. arXiv preprint arXiv:1307.0411.

Lloyd, S., Mohseni, M., & Rebentrost, P. (2014). Quantum principal component analysis. Nature Physics, 10(9), 631–633.

Maron, H., Ben-Hamu, H., Shamir, N., & Lipman, Y. (2018). Invariant and equivariant graph networks. arXiv preprint arXiv:1812.09902.

Mironov, I. (2017). Rényi differential privacy. In Proceedings of the 2017 IEEE 30th Computer Security Foundations Symposium (CSF) (pp. 263–275). IEEE.

Schuld, M., & Killoran, N. (2019). Quantum machine learning in feature Hilbert spaces. Physical Review Letters, 122(4), 040504.

Shokri, R., Stronati, M., Song, C., & Shmatikov, V. (2017). Membership inference attacks against machine learning models. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP) (pp. 3–18). IEEE.

Singh, P., & Bose, S. S. (2021). A quantum-clustering optimization method for COVID-19 CT scan image segmentation. Expert Systems with Applications, 185, 115637.

Weinstein, M., Meirer, F., Hume, A., Sciau, P., Shaked, G., Hofstetter, R., Persi, E., Mehta, A., & Horn, D. (2013). Analyzing big data with dynamic quantum clustering. arXiv preprint arXiv:1310.2700.

Yeom, S., Giacomelli, I., Fredrikson, M., & Jha, S. (2018). Privacy risk in machine learning: Analyzing the connection to overfitting. In Proceedings of the 2018 IEEE 31st Computer Security Foundations Symposium (CSF) (pp. 268–282). IEEE.

Article metrics

View details

Downloads

Citations

248

Views

📥 PDF ▾

📖 Cite article

View Dimensions

View Plumx

View Altmetric

0
Save

0
Citation

248
View

0
Share

Journal of Ai ML DL

Article Contents

Equivariant Quantum Clustering with Differential Privacy: Parameter-Efficient Privacy-Preserving Analysis Across Heterogeneous Sensitive Datasets

Abstract

1. Introduction

2. Methodology

3. Results

4. Discussion

5. Conclusion

Author Contributions

Competing financial interests

Acknowledgement

References

Stay connected