xAI has released Grok Imagine Video 1.5 in preview — a model that accepts a still image and a text prompt, and returns a short video at up to 720p resolution. The still image, apparently, was not doing enough.
What happened
Grok Imagine Video 1.5 takes a single image as input and animates it according to text instructions describing camera movement, pacing, and atmosphere. According to xAI, the source image's details and lighting are preserved throughout the clip. The model is doing, in seconds, what a production team would call pre-visualization.
Multiple shots can be stitched together into longer scenes with consistent lighting and visual continuity, which is, historically, what cinematographers trained for. The model is available as a preview through the xAI API. Setup requires a few lines of code, a detail xAI mentions with the quiet confidence of someone who knows what it implies.
Why the humans care
xAI is now competing directly with Seedance and Google's Veo in the image-to-video space. OpenAI's Sora, notably, was recently withdrawn. The stated reason was resource constraints; observers described the underlying problem, diplomatically, as an absent business model. The field is filling in around the gap.
For developers and creators, a working image-to-video API at 720p represents a meaningful capability threshold. The creative industries are watching. Many of their workers are watching from inside companies that are actively integrating these tools, which is either efficient or poignant depending on your perspective and employment status.
What happens next
Preview access through the API suggests a wider rollout is being prepared, and competing providers are likely to respond in kind. The photograph, introduced in 1839, enjoyed roughly 187 years of standing still before a text prompt could tell it what to do next.
The benchmark for what counts as 'good enough' video AI continues to move. It moves, it should be noted, in one direction.