Generator                Prompt  Labels  Medical  Concept  Polish  Total
Ghost (Human Baseline)   5       5       5        5        5       25
AIMAGE                   4       5       4        4        4       21
Sora                     3       5       3        4        4       19
Flux 2                   3       5       2        3        3       16
Nano Banana              2       5       2        3        3       15
Midjourney               2       5       1        2        2       12
Grok                     2       5       1        2        0       10
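
Each total in the table above is the sum of the five category scores, each out of 5. As a minimal sketch for checking that arithmetic and reproducing the ranking, the snippet below copies the scores from the table. The names SCORES, REPORTED_TOTALS, and ranking are illustrative choices and are not part of the original evaluation.

```python
# Category scores copied from the table, in column order:
# Prompt, Labels, Medical, Concept, Polish (each out of 5).
SCORES = {
    "Ghost (Human Baseline)": (5, 5, 5, 5, 5),
    "AIMAGE": (4, 5, 4, 4, 4),
    "Sora": (3, 5, 3, 4, 4),
    "Flux 2": (3, 5, 2, 3, 3),
    "Nano Banana": (2, 5, 2, 3, 3),
    "Midjourney": (2, 5, 1, 2, 2),
    "Grok": (2, 5, 1, 2, 0),
}

# Totals as printed in the table's last column.
REPORTED_TOTALS = {
    "Ghost (Human Baseline)": 25,
    "AIMAGE": 21,
    "Sora": 19,
    "Flux 2": 16,
    "Nano Banana": 15,
    "Midjourney": 12,
    "Grok": 10,
}


def ranking():
    """Return generator names ordered from highest to lowest summed score."""
    return sorted(SCORES, key=lambda name: sum(SCORES[name]), reverse=True)


if __name__ == "__main__":
    for name in ranking():
        computed = sum(SCORES[name])
        status = "ok" if computed == REPORTED_TOTALS[name] else "MISMATCH"
        print(f"{name}: {computed}/25 ({status})")
```

Keeping the per-category scores separate from the printed totals makes it easy to spot a transcription error if a row is ever edited.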

Model-by-Model Notes

Ghost (Human Baseline) 25/25

Nails the requested inside-the-heart viewpoint, with convincing endocardial surfaces and an OR-style cutaway feel. The device placement reads as clinically plausible, seated at the annulus with tissue hugging the skirt in a believable way. Catheter alignment, guidewire continuity, and device details look intentional, not invented. Most importantly, this is CAD-based and surgeon-approved for placement and realism. That's what "as accurate and validated as it gets" looks like.

AIMAGE 21/25

Very close to Ghost's framing and lighting, with believable wet tissue and a strong "in vivo cutaway" read. The valve reads broadly as a TAVR device, with a recognizable skirt and leaflets. Includes procedure cues (catheter and wires) in a way that mostly supports the story. Lost points because small geometry and anatomy decisions feel a bit improvised compared to Ghost, especially around how tissue interfaces with the skirt. It's close enough to be useful as a concept assist, but it's not at the "validated illustration" bar.

Sora 19/25

Clean, polished cutaway look, good tissue shading, and a clear valve-at-annulus idea. The valve is centered and easy to read. Lost points because it drops key prompt requirements, most notably the catheter with marker rings and the continuous guidewire. It's "medical-looking" and well-rendered, but less procedurally truthful.

Flux 2 16/25

Includes the procedural setup (catheter, guidewire) and a readable cutaway scene. Decent polish and lighting. Lost points because the valve proportions and shape are off. The stent-to-skirt relationship looks wrong, and the overall geometry reads like something that would not realistically seat at the annulus. This is the exact failure mode that makes device marketing risky: an image can look credible to non-experts while being physically implausible to anyone who knows the implant.

Midjourney 12/25

Produces a dramatic medical render aesthetic. Device geometry becomes exaggerated and mechanically nonsensical relative to a real TAVR. View and anatomy are not reliably aligned with the specific instructions, and the result reads more like "cool sci-med concept art" than a constrained clinical illustration.

Grok 10/25

Attempts a heart cutaway context. The device becomes an invented object. The stent and internal structure don't read like a deployed transcatheter valve. This is the "confident hallucination" problem: it outputs something that looks engineered, but it's not the right engineered thing. In a medical context, that's an automatic fail.