New research published in Nature Medicine reveals that generic AI chatbots fail in real-world healthcare delivery. The solution? Objective, specialized AI systems like LINDERA's gait analysis.
Artificial intelligence is revolutionizing medicine – or so the promise of recent years suggested. ChatGPT passes medical licensing exams with top scores, generates clinical notes in record time, and answers patient questions with apparent competence. Hospitals and care facilities worldwide are experimenting with AI chatbots as first points of contact for patients.
But a groundbreaking study from the University of Oxford, published in Nature Medicine (February 2026), paints a sobering picture: When real people use ChatGPT to make medical decisions, the human-AI combination fails.
1,298 participants in the United Kingdom were randomized into four groups.
Each participant received one of 10 realistic medical scenarios – from sudden headaches to bloody diarrhea.
Task: Identify the correct diagnosis and assess the urgency of care, from self-care to calling an emergency ambulance.
| Metric | ChatGPT Alone | Human + ChatGPT | Control Group |
|---|---|---|---|
| Correct Diagnosis | 94.9% | 34.5% | 35-40% |
| Correct Triage | 56.3% | 44.2% | 43% |
The central finding: Humans with AI assistance performed no better than without AI – sometimes even worse.
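For readers who want to see where numbers like those in the table come from, here is a minimal sketch (in Python, using invented record and field names rather than the Oxford team's actual data schema or analysis code) of how per-arm accuracy for diagnosis and triage could be tabulated from raw trial records.

```python
from collections import defaultdict

# Hypothetical trial records, one entry per participant.
# Field names are illustrative only, not the study's real schema.
records = [
    {"arm": "human+chatgpt", "diagnosis_correct": False, "triage_correct": True},
    {"arm": "control",       "diagnosis_correct": True,  "triage_correct": False},
    {"arm": "human+chatgpt", "diagnosis_correct": True,  "triage_correct": True},
    # ... 1,298 participants in the real study
]

def accuracy_by_arm(records, field):
    """Share of correct answers per study arm for one metric."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["arm"]] += 1
        hits[r["arm"]] += int(r[field])
    return {arm: hits[arm] / totals[arm] for arm in totals}

print("Correct diagnosis:", accuracy_by_arm(records, "diagnosis_correct"))
print("Correct triage:   ", accuracy_by_arm(records, "triage_correct"))
```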
What the study showed:
Real-world example from the study: Two users describing identical symptoms of a subarachnoid hemorrhage received contradictory recommendations.
The consequence: Text-based AI is only as good as its input – and laypeople are unreliable data providers.
What the study showed:
The consequence: Even when ChatGPT provides the correct answer, it's frequently ignored or misinterpreted.
What the study showed:
The consequence: Unpredictable AI behavior systematically undermines user trust.
While ChatGPT relies on subjective text descriptions, LINDERA uses objective movement data.
| Aspect | Text-based AI (ChatGPT) | LINDERA Gait Analysis |
|---|---|---|
| Data Source | Subjective symptom description | Objective gait parameters (video) |
| User Effort | Active interaction required | Passive: 10-second video |
| Error Susceptibility | High (communication barrier) | Low (automated measurement) |
| Output | Multiple possible diagnoses | Clear risk traffic light + actions |
| Validation | Benchmarks ≠ Real-world | Validated in care facilities |
Oxford Problem: Patients describe their symptoms incompletely or bury them in irrelevant detail.
LINDERA Solution:
Result: Objective, reproducible data – independent of language barriers or medical knowledge.
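What do "objective gait parameters" look like in practice? The sketch below assumes that pose-estimation keypoints (here, just ankle positions over time) have already been extracted from the video; the frame format, the two parameters, and their formulas are illustrative and do not describe LINDERA's actual pipeline or parameter set.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    t: float              # timestamp in seconds
    left_ankle_x: float   # horizontal position in metres (after calibration)
    right_ankle_x: float

def gait_parameters(frames: list[Frame]) -> dict:
    """Derive simple, objective gait parameters from an ankle trajectory.

    Illustrative only: real systems estimate many more parameters
    (step length, symmetry, double-support time, ...) from 3D pose data.
    """
    duration = frames[-1].t - frames[0].t
    distance = abs(frames[-1].left_ankle_x - frames[0].left_ankle_x)

    # Count steps as sign changes of the left-right ankle offset:
    # every change means the leading foot has switched.
    steps, prev_sign = 0, None
    for f in frames:
        sign = (f.left_ankle_x - f.right_ankle_x) > 0
        if prev_sign is not None and sign != prev_sign:
            steps += 1
        prev_sign = sign

    return {
        "gait_speed_m_s": distance / duration if duration else 0.0,
        "cadence_steps_min": steps / duration * 60 if duration else 0.0,
    }
```

Because the input is a measured trajectory rather than a free-text description, the same walk always produces the same numbers, regardless of the resident's language skills or medical vocabulary.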
Oxford Problem: Users struggle to choose between two or more AI suggestions.
LINDERA Solution:
Result: No overwhelming multiple-choice diagnostics.
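To illustrate what a single, unambiguous output can look like, here is a minimal rule-based sketch that maps gait parameters to exactly one risk level. The thresholds are invented placeholders for this example, not LINDERA's validated clinical cut-offs.

```python
def risk_traffic_light(gait_speed_m_s: float, cadence_steps_min: float) -> str:
    """Map gait parameters to one unambiguous risk level.

    Thresholds are illustrative placeholders, not validated cut-offs.
    """
    if gait_speed_m_s < 0.6 or cadence_steps_min < 80:
        return "red"      # high fall risk: act immediately
    if gait_speed_m_s < 0.8 or cadence_steps_min < 100:
        return "yellow"   # elevated risk: start preventive measures
    return "green"        # low risk: continue routine monitoring

# The same input always yields the same single answer.
assert risk_traffic_light(0.7, 105) == "yellow"
```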
Oxford Problem: ChatGPT is trained for everything, specialized in nothing.
LINDERA Solution:
Result: Consistent, reliable assessments instead of "diagnostic roulette."
Researchers also tested ChatGPT on medical exam questions (MedQA), where it once again scored highly.
Authors' Conclusion:
"Standard benchmarks for medical knowledge and simulated patient interactions do not predict the failures we find with human participants."
LINDERA wasn't tested on theoretical exam questions, but in real-world care facilities.
Result: Optimized for practical usability from day one – not for passing exams.
Instead of: "Mrs. Smith, how are you feeling today?" (subjective, inconsistent)
With LINDERA: 30-second gait video → traffic light result → structured action plan
Advantage: Documentable, objective, legally compliant.
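One way such a workflow could turn the traffic-light result into a documentable record is sketched below; the action catalogue, field names, and JSON format are hypothetical examples, not LINDERA's clinical protocol or export format.

```python
import json
from datetime import date

# Hypothetical catalogue of follow-up actions per risk level.
ACTION_PLANS = {
    "green":  ["Continue routine mobility checks"],
    "yellow": ["Schedule physiotherapy assessment",
               "Review footwear and walking aids"],
    "red":    ["Inform responsible physician",
               "Increase supervision during transfers",
               "Re-assess gait within one week"],
}

def action_record(resident_id: str, risk: str) -> str:
    """Produce a structured, timestamped care record as JSON."""
    return json.dumps({
        "resident": resident_id,
        "date": date.today().isoformat(),
        "risk_level": risk,
        "actions": ACTION_PLANS[risk],
    }, indent=2)

print(action_record("resident-042", "yellow"))
```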
Instead of: Time-consuming manual assessments (Timed Up & Go, etc.)
With LINDERA: Automated capture during every hallway walk
Advantage: Continuous monitoring without additional effort.
Instead of: Reactive care after falls (expensive)
With LINDERA: Preventive intervention at yellow signal (cost-effective)
ROI: Each prevented fall saves an average of $18,000-24,000 in treatment costs.
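A back-of-the-envelope model shows how that range translates into facility-level savings. The facility size, baseline fall rate, and reduction effect below are assumed figures for illustration only:

```python
def annual_savings(residents: int,
                   falls_per_resident_year: float,
                   relative_reduction: float,
                   cost_per_fall: float) -> float:
    """Estimated yearly savings from prevented falls (illustrative model)."""
    prevented_falls = residents * falls_per_resident_year * relative_reduction
    return prevented_falls * cost_per_fall

# Assumptions (hypothetical): 100 residents, 0.5 falls per resident per year,
# 20% fewer falls through early intervention, per-fall cost from the range above.
for cost in (18_000, 24_000):
    print(f"${annual_savings(100, 0.5, 0.20, cost):,.0f} per year at ${cost:,} per fall")
```

Under these assumptions the model yields roughly $180,000 to $240,000 in avoided treatment costs per year; actual savings depend on a facility's own fall rate and the intervention effect achieved.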