A groundbreaking new AI tool, named Delphi-2M, has been developed to model the progression of more than 1,000 human diseases simultaneously, marking a significant leap in predictive healthcare. The research, published in the journal Nature, presents a “generative pretrained transformer” (GPT) model that can forecast a person’s future health trajectory, offering unprecedented potential for personalized medicine and public health planning.
Delphi-2M was trained at massive scale, using anonymized patient data from 400,000 UK Biobank participants and validated on an external dataset of 1.9 million Danish individuals. The model learns the complex “grammar” of health, analyzing a patient’s medical history of diagnoses together with lifestyle factors such as obesity, smoking, and alcohol consumption. By understanding the sequence and timing of these events, the AI can predict the rates of more than 1,000 diseases with an accuracy comparable to existing single-disease models.
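To make the “grammar” analogy concrete, the sketch below shows one way a medical history could be represented as a time-stamped token sequence for an autoregressive (“GPT-style”) model to learn to continue. The event codes, ages, and encoding here are illustrative assumptions, not the actual Delphi-2M input format.

```python
# Illustrative sketch only: a health history as a time-ordered sequence of
# event tokens. The codes and encoding are assumptions for illustration,
# not the paper's actual data representation.

from dataclasses import dataclass

@dataclass
class HealthEvent:
    age: float   # age in years at which the event was recorded
    code: str    # diagnosis code (e.g. an ICD-10 code) or lifestyle token

# A hypothetical patient history: lifestyle factors and diagnoses in time order.
history = [
    HealthEvent(30.2, "LIFESTYLE_SMOKER"),
    HealthEvent(45.7, "ICD10_E66"),   # obesity
    HealthEvent(52.1, "ICD10_I10"),   # hypertension
    HealthEvent(58.4, "ICD10_I21"),   # myocardial infarction
]

# Toy vocabulary mapping each event code to an integer token id.
vocab = {code: i for i, code in enumerate(sorted({e.code for e in history}))}

# The model would see interleaved (token id, age) pairs and be trained to
# predict both the next event and the time until it occurs.
tokens = [(vocab[e.code], e.age) for e in sorted(history, key=lambda e: e.age)]
print(tokens)
```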
Researchers, including EMBL interim executive director Ewan Birney, hope to see the tool integrated into clinical practice within the next five to ten years. Birney envisions a future where doctors can use these sophisticated AI models to advise patients on their specific major health risks and suggest concrete preventative actions. For example, the tool could show a patient their personal probability of developing heart disease and highlight the positive impact of a lifestyle change, such as quitting smoking, on that risk.
Beyond individual patient care, the model’s generative nature allows it to sample synthetic future health trajectories, providing meaningful estimates of potential disease burdens for up to 20 years. This capability is critical for healthcare leaders and policymakers who need to plan and allocate resources more effectively to address the needs of an aging population.
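As a rough illustration of how sampling trajectories can yield population-level estimates, the sketch below rolls a toy stand-in model forward many times over a 20-year horizon and averages the simulated events. The `sample_next_event` function and its probabilities are invented placeholders, not the Delphi-2M model or its API.

```python
# Illustrative sketch only: Monte Carlo estimation of future disease burden
# by repeatedly sampling synthetic health trajectories from a generative
# model. The transition probabilities below are made up for illustration.

import random
from collections import Counter

def sample_next_event(history, current_age):
    """Toy stand-in for a trained model's conditional next-event distribution."""
    events = ["ICD10_I21", "ICD10_E11", "ICD10_C34", "NO_EVENT"]
    weights = [0.05, 0.10, 0.02, 0.83]
    event = random.choices(events, weights)[0]
    return event, current_age + random.uniform(0.5, 3.0)

def simulate_trajectory(history, start_age, horizon_years=20):
    """Roll the model forward, collecting sampled events until the horizon."""
    age, events = start_age, []
    while age < start_age + horizon_years:
        event, age = sample_next_event(history + events, age)
        if event != "NO_EVENT":
            events.append((event, age))
    return events

# Average over many synthetic futures for one starting history.
counts = Counter()
n_samples = 1000
for _ in range(n_samples):
    for event, _age in simulate_trajectory(history=[], start_age=60.0):
        counts[event] += 1

for code, n in counts.most_common():
    print(f"{code}: expected {n / n_samples:.2f} events per person over 20 years")
```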
A key finding of the research is that a version of Delphi-2M trained entirely on synthetic data—that is, data that does not correspond to any real individual—performed almost as well as the original model. This demonstrates a powerful new method for protecting patient privacy while still creating datasets that are valuable for training other AI models and advancing biomedical research.
While the model shows remarkable promise, the researchers acknowledge its limitations. It works best for diseases with predictable progression patterns like certain cancers and heart attacks, but is less reliable for conditions with highly variable causes, such as mental health disorders. The model also reflects biases from its training data, including a “healthy volunteer bias” in the UK Biobank cohort. For this reason, the developers stress that Delphi-2M is intended as a tool to support, not replace, the clinical judgment of healthcare professionals.
Reference:
Shmatko, A., Jung, A.W., Gaurav, K., Brunak, S., Mortensen, L.H., Birney, E., Fitzgerald, T. & Gerstung, M. (2025). Learning the natural history of human disease with generative transformers. Nature. https://doi.org/10.1038/s41586-025-09529-3.