
AI for Glaucoma Detection and Prognosis: Performance, Bias, and Clinical Integration in 2025


Glaucoma is a leading cause of irreversible blindness, affecting over 90 million people worldwide (pmc.ncbi.nlm.nih.gov). Worryingly, more than 70% of glaucoma cases remain undiagnosed (pmc.ncbi.nlm.nih.gov). Early detection is crucial because vision loss from glaucoma cannot be undone. In recent years, artificial intelligence (AI), especially deep learning, has shown great promise in analyzing eye data (such as retinal photographs, OCT scans, and visual field tests) for glaucoma. For example, a recent meta-analysis found that AI algorithms on retinal photographs reach high accuracy: pooled sensitivity 0.92 and specificity 0.93 (AUC≈0.90) (pubmed.ncbi.nlm.nih.gov). Another review noted that many AI systems now routinely achieve AUC around 0.95, matching or even exceeding experienced glaucoma specialists in side-by-side tests (pmc.ncbi.nlm.nih.gov).
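To make the metrics quoted throughout this article concrete, here is a minimal sketch of how sensitivity, specificity, and AUC are computed for a screening model. The labels and scores are illustrative toy values, not data from any cited study.

```python
# Toy computation of screening metrics: sensitivity (fraction of true
# glaucoma cases the model catches), specificity (fraction of healthy
# eyes it correctly clears), and AUC (threshold-free discrimination).

def sensitivity_specificity(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, y_score):
    # Probability that a randomly chosen diseased eye scores higher than
    # a randomly chosen healthy eye (ties count half).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]             # 1 = glaucoma
y_score = [0.9, 0.8, 0.75, 0.3, 0.2, 0.1, 0.4, 0.15, 0.05, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]     # operating threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} "
      f"AUC={auc(y_true, y_score):.2f}")
```

Note that sensitivity and specificity depend on the chosen operating threshold, while AUC summarizes performance across all thresholds, which is why papers often report both.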

In this article, we review the very latest deep learning tools (2023–2025) for glaucoma diagnosis and progression prediction from fundus photos, OCT, and visual fields. We compare how well they work on new patient groups and devices, and how bias can be reduced. We also cover the regulatory landscape (FDA, CE Mark, etc.), legal issues, and practical ways AI might fit into real eye clinics (from optometry offices to specialist centers). Where possible, we include head-to-head comparisons against doctors and look at cost-effectiveness data.

AI Models for Glaucoma Detection and Prognosis

Modern AI models for glaucoma use large datasets of eye images and clinical data to learn patterns of disease. Three main data types are used: fundus photography (color images of the back of the eye showing the optic nerve), optical coherence tomography (OCT) scans (3D images of the retina layers), and visual field (VF) tests (maps of a patient’s vision). We discuss AI tools built for each of these inputs, and some that combine them.

Fundus Photography Models

Fundus images of the optic nerve head are a classic way to look for glaucoma signs (like an enlarged optic cup). Many deep learning models have been trained on thousands of fundus photos. In high-quality tests, these models show excellent results. For example, an AI system trained on retinal photos achieved 93–96% accuracy (AUC ≈0.95) in finding glaucomatous optic neuropathy in large datasets (pmc.ncbi.nlm.nih.gov). In that study, the AI was actually more sensitive than human graders: it caught 91–92% of true cases, versus lower sensitivity by experts (pmc.ncbi.nlm.nih.gov). Other teams reported similar success: one multi-model AI (called AI-GS) achieved 93.5% sensitivity at 95% specificity in controlled testing on fundus images (www.nature.com). Even when the same AI was tested on real-world images from diverse smartphones, it still kept a high 80–94% sensitivity (www.nature.com).

Crucially, these fundus AI models often perform on par with clinicians. In one comparison, an AI called Pegasus (based on deep learning) was tested against six ophthalmologists on the same fundus data. It achieved an AUC of 0.926, about the same as the doctors, with no significant difference in accuracy (pmc.ncbi.nlm.nih.gov). Another review reported that AI accuracy is now “at the level of retinal specialists” for glaucoma detection (www.sciencedirect.com). These results suggest that deep neural networks can learn to recognize optic nerve damage as well as human experts in many cases.

OCT-Based Models

OCT is a laser scan that measures the thickness of the retinal nerve fiber layer (RNFL) and other retinal layers. Changes in these layers are early markers of glaucoma. Deep learning can analyze OCT scans for glaucoma. In the same meta-analysis above, the pooled sensitivity and specificity of AI on OCT for glaucoma were 0.90 and 0.87 (AUC≈0.86) (pubmed.ncbi.nlm.nih.gov). Some standalone OCT-AI tools have been built using convolutional networks. For example, researchers at Stanford used over a million OCT images to train a CNN to detect glaucoma – they reported high accuracy (AUC ≈0.99 internal, and 0.92 on external validation) (pubmed.ncbi.nlm.nih.gov) (pmc.ncbi.nlm.nih.gov).

Deep OCT analysis can also predict prognosis. Some advanced AI models use 3D OCT scans and past visits to forecast vision loss. For instance, one “FairDist” model used an OCT volume at baseline to predict future progression speed. It trained with a “knowledge distillation” method and explicitly weighted subgroups (gender, race) for equity (www.nature.com). FairDist achieved the highest AUC and fairness (equity-scaled AUC) for predicting fast progression among all tested methods (www.nature.com). This shows OCT-based AI can flag which patients may worsen quickly, while also mitigating bias.

Visual Field Models

Standard Automated Perimetry (SAP) tests map visual field loss in glaucoma. New AI approaches analyze field data (either alone or with images) to detect or predict progression. For example, sequence-learning models (recurrent neural nets) have been able to detect visual field worsening up to 1.7 years earlier than traditional methods (pmc.ncbi.nlm.nih.gov). Another recent study even used a quick portable VF (called Imo/TEMPO Screening) and trained an AI to predict full Humphrey VF results. This AI (“DeepISP”) could forecast the next VF values with reasonable accuracy (mean absolute error around 1.8–5.1 dB for summary metrics) (pmc.ncbi.nlm.nih.gov). In practice, visual field AI can serve two roles: (1) filtering unreliable fields and (2) forecasting future vision, which may help personalize follow-up.
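Forecast accuracy for visual field values is typically judged by mean absolute error (MAE), the average absolute gap between predicted and measured sensitivity in decibels. The sketch below shows that computation on invented toy values, not study data.

```python
# Mean absolute error between forecast and measured visual field summary
# values (in dB) -- the metric behind figures like the 1.8-5.1 dB range
# cited above. All numbers here are illustrative.

def mae(predicted, measured):
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)

predicted_md = [-2.1, -4.0, -6.5, -1.0]   # forecast mean deviation, dB
measured_md = [-2.5, -3.2, -7.0, -0.4]    # actual follow-up values, dB
print(f"MAE = {mae(predicted_md, measured_md):.2f} dB")
```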

Multimodal and Progression Prediction

Rather than using a single type of data, researchers are also building multimodal networks. One example in the literature combined fundus, OCT, and clinical data into one “baseline risk” model to forecast who might need glaucoma surgery. It achieved ~0.92 AUC for predicting eventual need for surgical intervention (pmc.ncbi.nlm.nih.gov). Other teams have tried to synthesize images: one group trained a generative adversarial network (GAN) to create predicted future OCT scans and used them to improve progression forecasts. In their study, using synthetic images gave an AUC of 0.81 for predicting progression 9 months ahead (versus only ~0.68–0.72 using one data source alone) (www.nature.com). These hybrid approaches are still experimental, but they hint at more powerful models that combine structure (OCT), function (VF), and patient history.

Validation Across Populations and Bias Mitigation

A key issue for any medical AI is external validation: testing the model on patient groups different from its training data. Many glaucoma AI models have not been widely validated on diverse datasets, which can hide biases or performance drops (pmc.ncbi.nlm.nih.gov) (pubmed.ncbi.nlm.nih.gov). For instance, an algorithm trained on hospital images may perform well on internal test data, but researchers have found that accuracy often falls on truly independent sets (pubmed.ncbi.nlm.nih.gov). One review emphasized that internal validation often overestimates performance, and that independent validation is “crucial” yet lacking in many studies (pmc.ncbi.nlm.nih.gov).

Indeed, breakthrough performance in one setting may not generalize: a fundus AI could achieve 93% sensitivity in Chinese hospitals, but in a high-myopia population it dropped to ~82% (pmc.ncbi.nlm.nih.gov). The high-myopia study noted that myopia changes the optic disc appearance and made glaucoma much harder for the AI to detect (pmc.ncbi.nlm.nih.gov). Likewise, differences in retina color or optic nerve structure between ethnicities can skew results. For example, one fundus model was accurate in Japanese eyes, but might underperform in African or European eyes if not retrained.

Researchers are actively developing bias mitigation strategies. One promising method is to explicitly balance feature importance across groups. A project called Fair Identity Normalization (FIN) did this for race and ethnicity in OCT-based glaucoma models (www.nature.com). Adding FIN lifted the overall AUC from 0.82 to 0.85, but more importantly it raised the AUC for Black patients from 0.77 to 0.82 (closing the gap with others) (www.nature.com). In another example, the FairDist progression model introduced an “equity-aware” learning layer so that gender and racial subgroups were weighted fairly. FairDist achieved the highest equity-scaled AUC among competing methods, meaning it was both accurate and balanced across subgroups (www.nature.com).
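The "equity-scaled AUC" mentioned above penalizes models whose subgroup performance diverges from overall performance. Exact formulations vary by paper; the sketch below follows one common definition (ES-AUC = overall AUC divided by one plus the summed absolute subgroup gaps), with toy numbers loosely echoing the FIN example in the text.

```python
# Hedged sketch of an equity-scaled AUC: overall discrimination is
# discounted by how far each demographic subgroup's AUC strays from it.
# Assumed formulation: ES-AUC = AUC / (1 + sum_g |AUC - AUC_g|).

def equity_scaled_auc(auc_overall, subgroup_aucs):
    gap = sum(abs(auc_overall - a) for a in subgroup_aucs.values())
    return auc_overall / (1.0 + gap)

# Illustrative numbers: a model before and after fairness training.
before = equity_scaled_auc(0.82, {"group_a": 0.83, "group_b": 0.77,
                                  "group_c": 0.82})
after = equity_scaled_auc(0.85, {"group_a": 0.86, "group_b": 0.82,
                                 "group_c": 0.85})
print(f"ES-AUC before={before:.3f}, after={after:.3f}")
```

Because the divisor grows with subgroup disparity, a model can only score well on this metric by being both accurate overall and consistent across groups.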

In practice, deploying glaucoma AI should involve continual auditing. Teams may retrain their networks on new data from underrepresented groups and use data augmentation to mimic different devices or conditions. Public datasets like REFUGE and EyePACS help by providing images from multiple camera types and ethnicities (www.nature.com). Going forward, the field recognizes that bias and overfitting are major challenges (pmc.ncbi.nlm.nih.gov) (www.nature.com). Models must be validated on varied real-world data and tested for any performance gaps. Tools like equity-scaled metrics (as above) can quantify fairness and guide improvements (www.nature.com) (www.nature.com).

Regulatory Status (2025)

By 2025, the regulatory landscape for ophthalmic AI is evolving but still sparse for glaucoma. In the US, the FDA has cleared several AI tools for retina diseases (e.g., autonomous diabetic retinopathy screening), but no FDA-cleared device specifically for autonomous glaucoma detection exists yet (pmc.ncbi.nlm.nih.gov). In contrast, Europe has been more permissive. Medical AI devices there receive CE marks according to risk class. The EU’s system (now governed by the Medical Device Regulation) places such software in Class IIa or higher, with conformity assessment carried out by private notified bodies rather than a central agency (pmc.ncbi.nlm.nih.gov). For example, an Indian company’s smartphone-based AI (“Medios HI Glaucoma AI”) received a CE mark in late 2025 as a Class II software medical device (www.biospectrumindia.com). Globally, CE-marked tools for referable glaucoma screening do exist, indicating they meet EU safety/efficacy standards.

Other countries are moving similarly. In early 2025, India’s drug/device regulator (CDSCO) granted pan-India approval for an offline AI tool to detect glaucoma (and AMD) on a portable fundus camera (www.pharmabiz.com). This builds on India’s earlier approvals of AI for diabetic retinopathy. Essentially, the tool is now cleared for use in clinics across India, reflecting official confidence in its performance (www.pharmabiz.com). China has also approved AI devices (though mostly DR-focused to date) and Japan/others are considering similar pathways under their national rules.

Regulators are also developing guidance. The FDA’s Good Machine Learning Practice (GMLP) principles and draft guidance lay out expectations (data quality, performance tracking, etc.). The EU AI Act (focused on transparency and risk categories) will also apply to medical imaging AI as its provisions phase in. All this means future glaucoma AI will need documented safety studies, bias assessments, and post-market monitoring. For now, most products are developers’ own solutions (or in pilot programs); official oversight is limited but growing. The key point: by 2025 there is no simple “FDA-approved glaucoma AI”; instead, various national approvals (CE, CDSCO) are appearing for screening or decision-support systems.

Medico-Legal Considerations

Introducing AI into glaucoma care raises complicated legal questions. If an AI misses a glaucoma diagnosis, who is responsible? Current medical law expects doctors to meet the standard of care. Some experts suggest that an FDA-cleared AI could itself come to define that standard, serving as a benchmark against which courts judge physician decisions (pmc.ncbi.nlm.nih.gov). Liability today is likely shared among multiple parties: the treating physician (negligence), the healthcare institution (vicarious liability), and the AI provider/manufacturer (product liability) (pmc.ncbi.nlm.nih.gov). For example, if a doctor blindly trusts an AI referral that is wrong, should blame fall on the software maker or the doctor who used it? Such questions are unsettled.

Many ethicists advise caution: AI output should not be treated as consequence-free advice. Most recommendations state that physicians must remain in the loop, reviewing AI findings and making the final call. Healthcare systems are also wary of unproven vendor claims, requiring evidence before extending trust. In practice, liability coverage (malpractice insurance) will likely treat AI support like any other diagnostic tool. Some have even proposed new legal concepts (e.g., “common enterprise liability” or special AI product indemnity) to share responsibility (pmc.ncbi.nlm.nih.gov).

For now, doctors using AI must document it properly. This means keeping logs of AI results, how they were interpreted, and discussing AI’s limitations with patients. Data privacy is another concern: models often require storing images and clinical data, triggering HIPAA or GDPR rules. Moreover, as the technology evolves, providers should follow guidelines from professional bodies. In fact, the American Academy of Ophthalmology and others have begun publishing principles on AI use (emphasizing transparency, validation, and accountability). Ultimately, no legal framework is complete yet, but the field recognizes that clear guidance and liability rules are needed (www.sciencedirect.com). Until then, the safest practice is to use AI as an aid – augmenting clinician judgement rather than replacing it (pmc.ncbi.nlm.nih.gov) (pmc.ncbi.nlm.nih.gov).

Clinical Integration and Workflows

To deliver value, glaucoma AI must fit into real clinical workflows – which vary widely by setting. Below we sketch two ends of the spectrum: primary care/screening environments and tertiary care (specialized glaucoma clinics).

Primary Care Screening

Most people see optometrists, general practitioners, or health-fair screeners for eye checks. These settings lack glaucoma specialists and often lack expensive OCT machines, so AI-driven screening is most likely to land here first. For example, one Australian review notes that glaucoma detection by optometrists can be uneven, and that AI analysis of optic nerve photos could help them quickly flag high-risk eyes (pmc.ncbi.nlm.nih.gov). Clinical trials are underway: a 2025 study tested automated retinal photography plus AI triage in primary care (part of an ongoing pragmatic trial). Early reports suggest such AI can substantially increase the number of cases detected with minimal extra cost or time.

Low-resource contexts use smartphone cameras. For instance, one group developed a tele-screening workflow: health workers take a fundus photo on a mobile device, AI gives an immediate referral suggestion (normal vs glaucoma signs). In trials of these systems, sensitivity was often in the 90% range. A recent report found a smartphone AI achieved ~94% sensitivity at ~90% specificity during community screenings (pmc.ncbi.nlm.nih.gov). That means it caught almost all true cases with few false alarms. Because this runs offline, it can work in rural clinics without internet.

Another primary-care AI approach is non-image triage. The “Multi-Glau” system in China included a dedicated screening model for rural clinics without any imaging. It used only simple exam data (age, gender, eye pressure, vision score, and an estimated optic disc cup-to-disc ratio) to decide whom to refer (www.nature.com). This XGBoost model achieved AUC ≈0.93 for flagging likely glaucoma in a primary-care population (www.nature.com) (www.nature.com). Patients who screen positive then get a full eye exam. Such tiered workflows (screen with simple data, then refer positives for a full examination) could scale well in areas lacking equipment.
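A tabular triage model of this kind is straightforward to sketch. The original system used XGBoost; here scikit-learn's GradientBoostingClassifier stands in, and the data is synthetic, generated only to make the example runnable (it does not reproduce Multi-Glau's features, weights, or performance).

```python
# Sketch of an image-free triage model in the spirit of Multi-Glau's
# screening tier: gradient boosting over simple exam features. All data
# below is synthetic; the risk rule is invented for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
age = rng.uniform(40, 85, n)
iop = rng.normal(16, 4, n)                            # eye pressure, mmHg
cdr = np.clip(rng.normal(0.4, 0.15, n), 0.1, 0.95)    # cup-to-disc ratio
acuity = rng.normal(0.8, 0.2, n)                      # decimal visual acuity

# Synthetic ground truth: risk grows with pressure, cupping, and age.
risk = 0.15 * (iop - 16) + 8.0 * (cdr - 0.4) + 0.01 * (age - 60)
y = (risk + rng.normal(0, 1, n) > 1.0).astype(int)    # 1 = refer

X = np.column_stack([age, iop, cdr, acuity])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"referral-triage AUC on held-out synthetic data: {auc:.2f}")
```

The appeal of this design is operational: every input can be collected in a village clinic without a camera, and the model's probability output maps directly to a refer/do-not-refer decision.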

Tertiary Eye Care

In specialized glaucoma clinics, doctors already use OCT and VFs routinely. Here AI tools might refine diagnosis or predict outcomes. For example:

  • Decision support: AI models can highlight suspicious areas. In a busy clinic, an OCT-AI could automatically flag scans showing thinning, letting doctors review those more carefully. One study found that when glaucoma specialists reviewed AI heatmaps for OCT, they improved their own reading speed and consistency.
  • Progression monitoring: In follow-up, combining past data into risk scores can guide treatment. One large trial showed an AI flagging fast progressors years earlier than humans (pmc.ncbi.nlm.nih.gov). Clinics might integrate such predictions into electronic health records: if an AI flags a patient as high risk, the doctor could intensify treatment sooner.
  • Tele-ophthalmology: In some health systems, tertiary centers review images remotely. An AI could pre-screen cases before a specialist even sees them. For example, the European REFLECT study (2024) used AI to triage referrals: those judged glaucoma-positive by AI were bumped up on the specialist list. Preliminary data suggest AI-assistance can improve overall detection rates without much extra cost.

Each model above must fit the workflow. Ophthalmology departments are experimenting with “AI-in-box” solutions that attach to OCT or cameras and give a second opinion. The practice changes slowly, but initial pilots indicate that combining AI with clinician review yields the best results. In fact, one systematic review noted that when specialists used AI assistance along with their judgement, diagnostic accuracy improved compared to either alone (pmc.ncbi.nlm.nih.gov).

Cost-Effectiveness

Will AI save money? Many studies suggest yes, even though results vary. On the one hand, automated screening can detect disease earlier (saving future treatment costs). On the other hand, running an AI program needs investment in cameras, software, and follow-up care.

A recent systematic review looked at economic evaluations of AI screening for eye diseases. It found that most studies (about 11 of 15) concluded AI screening was cost-effective or cost-saving in some way (www.sciencedirect.com). For glaucoma specifically, one Chinese community program modeled a 15-year horizon of AI-assisted screening for seniors. They found the incremental cost was roughly $1464 per new glaucoma case detected (www.sciencedirect.com). In their setting this was not strictly cost-saving, but it helped avoid some blindness and was still a modest cost per case. Whether that is "worth it" depends on local healthcare budgets and values for preventing vision loss.
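The cost-per-case figure above comes from straightforward arithmetic: the incremental cost of the screening program divided by the extra cases it detects. The sketch below walks through that calculation with invented inputs; only the formula mirrors how such numbers are derived, and none of the values correspond to the cited Chinese study.

```python
# Back-of-envelope cost-effectiveness arithmetic for an AI screening
# program. All inputs are hypothetical, chosen only for illustration.

def cost_per_case_detected(program_cost, followup_cost_per_referral,
                           screened, referral_rate, extra_cases_found):
    # Total incremental cost = fixed program cost plus follow-up exams
    # for everyone the AI refers; divide by cases routine care misses.
    referrals = screened * referral_rate
    total_cost = program_cost + followup_cost_per_referral * referrals
    return total_cost / extra_cases_found

# Hypothetical program: 10,000 seniors screened, 8% referred for full
# exams at $50 each, 120 glaucoma cases found that would otherwise be
# missed, on top of $100,000 in fixed program costs.
cpc = cost_per_case_detected(100_000, 50, 10_000, 0.08, 120)
print(f"cost per extra case detected: ${cpc:,.2f}")
```

Even this toy version shows why targeting matters: raising the prevalence in the screened population (more cases found per dollar of follow-up) is the fastest way to push the ratio down.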

More generally, cost-effectiveness tends to improve if the AI is used on large populations (spreading out overhead) and if high-risk patients are targeted. For example, performing an AI fundus screen on everyone over 60 in a clinic is more cost-effective than on younger low-risk people. Mobile AI screening that also catches diabetes and AMD (as with some smartphone tools) spreads the benefit further. Comprehensive analyses also consider intangible gains: reducing patient travel, avoiding emergency glaucoma cases, and so on.

In summary, while exact numbers differ, early models suggest AI “pays off” if implemented at scale in appropriate settings. As one review put it, AI can reduce costs and improve efficiency, especially for mass screening programs (www.sciencedirect.com) (www.sciencedirect.com). Workflow improvements (like shifting testing from scarce doctors to machines) also free up clinician time for complex patients, adding hidden value that is hard to quantify.

Conclusion

By 2025, AI has come a long way in glaucoma care. Sophisticated deep learning models can analyze OCT scans, fundus photos, and visual fields with accuracy comparable to human glaucoma specialists (pmc.ncbi.nlm.nih.gov) (pubmed.ncbi.nlm.nih.gov). Multimodal and time-series models now even attempt to predict who will get worse or need surgery next. Field trials show AI screening tools achieving very high sensitivities (~90–95%) in diverse communities (pmc.ncbi.nlm.nih.gov) (www.nature.com).

However, technical promise is only half the story. Real-world use must navigate data bias and regulatory hurdles. Many published AI systems still need independent testing on different patient groups (pubmed.ncbi.nlm.nih.gov) (pmc.ncbi.nlm.nih.gov). Developers are actively addressing this with fairness techniques (for example, increasing AUC in minority subgroups (www.nature.com) (www.nature.com)). Meanwhile, clear legal frameworks are still being defined. Liability for AI-aided diagnoses remains shared between doctors, hospitals, and device makers (pmc.ncbi.nlm.nih.gov). Clinical guidelines so far urge caution: use AI as a second reader, and keep patients in the loop.

Workflow integration is also key. In primary care, AI-screening can transform detection: optometrists using AI photo analysis can find early glaucoma cases they would have missed (pmc.ncbi.nlm.nih.gov). In tertiary centers, AI can help triage referrals and monitor progression. Unmanned tele-screening (even on smartphones) is practical now, potentially extending care to remote areas. Each setting requires its own model design – for example, China’s “three-tier” strategy uses simple exams in village clinics and full imaging in hospitals (www.nature.com).

Finally, early economic studies are encouraging: AI screening can be done cost-effectively when targeted and well-managed (www.sciencedirect.com) (www.sciencedirect.com). As more data and real-world pilots come in, the picture will clarify.

In the near future, we expect glaucoma AI to function as a clinical decision support rather than an independent doctor. With rigorous external validation, bias audits, and adherence to evolving regulations (e.g. EU AI Act), these tools can safely augment clinicians. In sum, AI has the potential to catch glaucoma earlier and manage it more precisely, improving outcomes and access – as long as we address performance, fairness, and legal issues head-on (pmc.ncbi.nlm.nih.gov) (pmc.ncbi.nlm.nih.gov).
