This research, originally published in the IEEE Open Journal of Engineering in Medicine and Biology, applies topological data analysis (TDA) to breast cancer gene expression data from 1,224 primary samples. The study demonstrates how analyzing data's shape and structure can reveal clinically meaningful cancer subtypes beyond conventional classification methods.
Key Findings
The researchers identified a previously unrecognized subgroup within luminal B breast cancer. Their analysis distinguished HER2-low from HER2-high luminal B tumors, suggesting that some patients who do not respond to trastuzumab may belong to this distinct subtype and may require different therapeutic approaches.
The TDA methodology simultaneously modeled both sample and feature spaces, enabling the team to pinpoint specific genes (including ERBB2, GRB7, STARD3, and others) that characterize different cancer subtypes.
Methodology
Rather than relying on traditional dimensionality reduction techniques such as PCA, the authors used graph-based modeling to represent the topology of the data. They constructed networks in which each node represented a cluster of samples or genes and was colored to indicate its subtype association. This approach preserved biological interpretability and ultimately allowed the findings to be visualized in clinically understandable 3D scatter plots that use actual gene expression values as axes.
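To make the idea of a graph whose nodes are clusters more concrete, here is a minimal sketch of a Mapper-style construction, one common way of building such networks in TDA. The summary does not say which algorithm or parameters the authors used, so everything here is an assumption for illustration: the function names (`cluster_indices`, `mapper_graph`), the choice of lens, the interval count, the overlap, and the distance threshold are hypothetical, and the expression values are toy numbers, not data from the study.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_indices(indices, points, eps):
    """Single-linkage clustering: connected components of the
    graph that links any two samples at distance <= eps."""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = remaining.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            near = {j for j in remaining
                    if euclidean(points[current], points[j]) <= eps}
            remaining -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, lens, n_intervals=4, overlap=0.3, eps=1.5):
    """Build a Mapper-style graph.

    points: one expression vector per sample
    lens:   one scalar per sample used to slice the data
    Returns (nodes, edges): nodes are sets of sample indices,
    edges join nodes that share at least one sample.
    """
    lo, hi = min(lens), max(lens)
    width = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Overlapping intervals covering the range of the lens.
        start = lo + k * width - overlap * width
        end = lo + (k + 1) * width + overlap * width
        members = [i for i, v in enumerate(lens) if start <= v <= end]
        nodes.extend(cluster_indices(members, points, eps))
    edges = [(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]]
    return nodes, edges


# Toy data (hypothetical): each tuple is (gene_a, gene_b) expression.
points = [(0.0, 0.0), (1.0, 0.2), (2.0, 0.1), (3.0, 0.0),
          (4.0, 0.3), (5.0, 0.1), (12.0, 5.0), (13.0, 5.2)]
lens = [p[0] for p in points]  # slice the data by gene_a expression

nodes, edges = mapper_graph(points, lens)
for i, node in enumerate(nodes):
    print(i, sorted(node))
print("edges:", edges)
```

On this toy input the six low-expression samples form two overlapping nodes joined by an edge, while the two high-expression samples form a separate, disconnected node. That kind of separation in the graph is what a subtype split would look like in this simplified setting. Applying the same procedure to the transposed matrix would give gene clusters instead of sample clusters.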
Clinical Implications
The discovery that luminal B cancers segregate into HER2-low and HER2-high groups has significant treatment ramifications, as these subgroups respond differently to hormone therapies and HER2-targeted interventions.