| Generator | Prompt | Labels | Medical | Concept | Polish | Total |
|---|---|---|---|---|---|---|
| Ghost (Human Baseline) | 5 | 5 | 5 | 5 | 5 | 25 |
| Sora | 3 | 4 | 2 | 4 | 4 | 17 |
| AIMAGE | 3 | 4 | 2 | 4 | 3 | 16 |
| Nano Banana | 2 | 2 | 2 | 3 | 4 | 13 |
| Flux 2 | 3 | 2 | 1 | 3 | 4 | 13 |
| Grok | 2 | 2 | 2 | 3 | 3 | 12 |
| Midjourney | 1 | 0 | 2 | 2 | 4 | 9 |
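
The totals in the table can be re-derived with a short tally. This is a minimal sketch that assumes each total is the unweighted sum of the five category scores, which is what the rows above add up to. The 25-point maximum is inferred from the baseline row, and the names `scores`, `totals`, and `ranking` are illustrative only.

```python
# Rubric scores copied from the table above, in column order:
# Prompt, Labels, Medical, Concept, Polish.
scores = {
    "Ghost (Human Baseline)": (5, 5, 5, 5, 5),
    "Sora": (3, 4, 2, 4, 4),
    "AIMAGE": (3, 4, 2, 4, 3),
    "Nano Banana": (2, 2, 2, 3, 4),
    "Flux 2": (3, 2, 1, 3, 4),
    "Grok": (2, 2, 2, 3, 3),
    "Midjourney": (1, 0, 2, 2, 4),
}

# Assumption: the total is a plain, unweighted sum of the five categories.
totals = {name: sum(row) for name, row in scores.items()}

# Sort by total, highest first. Python's sort is stable, so tied
# generators keep the order they have in the table.
ranking = sorted(totals.items(), key=lambda item: item[1], reverse=True)

for name, total in ranking:
    print(f"{name}: {total}/25")
```

Under that assumption the computed totals match the Total column, and the two generators tied at 13 stay in table order.
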

Everything the prompt demanded shows up cleanly and intentionally. The hub is clear with believable internal detail, the clinical framing is correct, the gloved fingertips sit in soft depth of field, and the amber HUD callouts are the exact style target. This is what "production-correct" looks like. It's not just pretty. It's art-directed, CAD-grounded, and clinically reviewed, so the prompt is effectively a description of this result.
Clean, readable callouts with correct phrases, clear product visibility, and overall high polish. If someone needed a quick marketing graphic, it's close. Lost points because the labels are not the amber HUD style with micro UI details. They're conventional boxes. The "two translucent gloved fingertips hovering" requirement is not met as written. One of the better "follow instructions and look clean" attempts, but it doesn't hit the specific Ghost HUD art direction.
Strongly resembles the client's callout branding language and placement logic. Two callouts, leader lines, readable typography, generally clean presentation. Lost points because the HUD color and styling drift (it reads more "blue hologram overlays" than "amber futuristic HUD"), and the composition is not the requested tight macro close-up with two out-of-focus fingertips hovering behind. Very usable as a style-matcher, but still not dependable for clinically plausible placement without correction.
The callout style is suspiciously close to the Ghost HUD look, and the render polish is strong. If this were a pure "make it look cool" test, it would rank higher. Lost points because the lower label is misspelled ("Ridge Catheter Hub" instead of "Ridged Catheter Hub"), which is an automatic major hit in a medical labeling test. It appears to have learned the aesthetic, but not the clinical truth underneath it.
Strong visual polish, clean studio feel, good depth and lighting. Lost points because "Ridged Catheter Hub" is misspelled (it reads like "Carheter"), which is a labeling failure for this test. Looks good, but fails the "exact words, exact style" requirement.
High-end cinematic rendering and texture detail. This is the classic failure mode. It invents extra labels, adds gibberish microtext, and violates "no other text." It can look expensive, but it's unreliable where medical work actually breaks, which is text discipline and constraint-following.