Why This Experiment?
At Terranoha, our approach is driven by curiosity and hands-on exploration of emerging technologies. When Google DeepMind released Veo 3, its new experimental AI video generation tool, we wanted to understand how it actually performs beyond the demos.
Our goal was simple:
- Test what Veo 3 can actually deliver today
- Identify its technical limitations in a professional context
- Explore how to produce a short professional video using only text-based prompts
To do this, we chose a concrete use case: creating a video introducing Emmie, our virtual agent dedicated to trading workflows, in a realistic environment.
What Veo 3 Promises
On paper, Veo 3 offers impressive capabilities:
- 1080p video generation from text prompts
- Multiple visual styles (cinematic, animated, documentary…)
- Temporal and visual consistency
- Voice-over integration via audio/text prompts
It appears to be a promising solution for creating innovative video content.
What We Actually Encountered
During our deep dive into Veo 3, we encountered several significant limitations for our professional use case:
- Severely limited length: Max 8 seconds per sequence, forcing the video to be artificially fragmented
- Voice-over sync issues: Audio sometimes failed to generate despite accurate prompts
- Subtitle inconsistencies: Despite Google’s recent updates, we continued to face recurring errors
- Prompt variability: Even with highly detailed descriptions of Emmie, her face varied significantly between sequences, disrupting visual consistency
- Inconsistent voice: Despite identical instructions, Emmie’s voice tone often changed, affecting auditory coherence
- Unrealistic generations: Several outputs had visual oddities (unnatural expressions, odd angles, strange movements), requiring multiple re-renders to get usable clips
- High experimentation costs: Veo 3 uses Google Cloud credits. 20,000 credits cost $200, and one 8-second video consumes ~100 credits (around $1 per 8 seconds). With repeated re-renders, the cost of a full experiment adds up quickly.
These concrete constraints highlight that Veo 3 remains experimental and not yet suited for demanding professional video production.
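To make the cost and length constraints concrete, here is a minimal Python sketch of the arithmetic, using only the figures quoted above (20,000 credits for $200, about 100 credits per clip, 8 seconds per clip). The function name `estimate_cost` and the `attempts_per_clip` parameter are our own illustrative assumptions; we did not measure a fixed number of re-renders per clip, so treat that value as a guess to adjust.

```python
import math

# Figures quoted in this article; actual pricing may differ or change.
CREDITS_PER_PACK = 20_000
PACK_PRICE_USD = 200.0
CREDITS_PER_CLIP = 100   # approximate credits consumed by one 8-second clip
MAX_CLIP_SECONDS = 8     # maximum length of one generated sequence


def estimate_cost(target_seconds: float, attempts_per_clip: int = 1) -> dict:
    """Rough budget for a video of target_seconds.

    attempts_per_clip is a hypothetical average number of generations
    needed before a clip is usable (re-renders included).
    """
    clips = math.ceil(target_seconds / MAX_CLIP_SECONDS)
    generations = clips * attempts_per_clip
    credits = generations * CREDITS_PER_CLIP
    usd = credits * PACK_PRICE_USD / CREDITS_PER_PACK
    return {"clips": clips, "generations": generations,
            "credits": credits, "usd": usd}


if __name__ == "__main__":
    # A 60-second video, assuming 5 attempts per clip on average.
    print(estimate_cost(60, attempts_per_clip=5))
```

Under these assumptions, a 60-second video needs 8 clips, and at 5 attempts per clip that is 40 generations, or about $40 in credits.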
Our Methodology
Here’s how we optimized our use of Veo 3:
- Use “Veo 3 quality”: Include this phrase in every prompt for optimal rendering.
- Ultra-detailed character identity: Describe characters with extreme precision (appearance, outfit, demeanor…). ChatGPT can help refine these descriptions.
- Highly specific environment: Every element of the scene must be defined: style, objects, lighting, mood. Every detail counts.
- Scene direction: Provide exact instructions for movement and interaction to minimize misinterpretation.
- Short, clear dialogue: With the 8-second limit, each line must be concise and time-efficient.
- Always revise scripts after poor outputs: If the result is subpar, tweak the wording. Repeating the same prompt often yields worse results.
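The checklist above can be sketched as a small prompt builder. This is an illustrative Python sketch, not the tooling we used: the `Scene` fields, the `build_prompt` function, the prompt layout, and the speaking rate of 2.5 words per second are all assumptions made for the example, and it does not call any Veo API. The idea is to reuse one character description verbatim across scenes and to reject dialogue that is unlikely to fit in an 8-second clip.

```python
from dataclasses import dataclass

QUALITY_TAG = "Veo 3 quality"  # phrase we include in every prompt
MAX_CLIP_SECONDS = 8           # per-sequence limit noted in this article
WORDS_PER_SECOND = 2.5         # assumed speaking rate, tune to your voice-over


@dataclass(frozen=True)
class Scene:
    character: str    # reuse the exact same description in every scene
    environment: str  # style, objects, lighting, mood
    direction: str    # movement and interaction instructions
    dialogue: str     # one short spoken line


def estimated_speech_seconds(dialogue: str) -> float:
    """Crude duration estimate from the word count."""
    return len(dialogue.split()) / WORDS_PER_SECOND


def build_prompt(scene: Scene) -> str:
    """Assemble a text prompt; raise if the line is too long for one clip."""
    seconds = estimated_speech_seconds(scene.dialogue)
    if seconds > MAX_CLIP_SECONDS:
        raise ValueError(
            f"Dialogue needs about {seconds:.1f}s, "
            f"over the {MAX_CLIP_SECONDS}s clip limit"
        )
    return "\n".join([
        f"{QUALITY_TAG}.",
        f"Character: {scene.character}",
        f"Environment: {scene.environment}",
        f"Direction: {scene.direction}",
        f'Dialogue: "{scene.dialogue}"',
    ])


if __name__ == "__main__":
    # Placeholder descriptions, not the ones from our actual scripts.
    demo = Scene(
        character="<full character description, identical in every scene>",
        environment="<style, objects, lighting, mood>",
        direction="<exact movement and interaction>",
        dialogue="Hello, I am Emmie.",
    )
    print(build_prompt(demo))
```

Keeping the character text in one place makes it easier to change the wording of a single field after a poor output, rather than resubmitting the same prompt unchanged.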
Our Scripts and Prompts
You can download the full scripts and prompts we used for this experiment:
DOWNLOAD FULL SCRIPTS
Final Result
Despite the limitations, our process allowed us to create a video that aligns with our original vision for Emmie: professional, smooth, visually coherent, and tailored to the trading environment.
Conclusion & Outlook
This experiment with Google Veo 3 gave us deeper insight into the current capabilities and limits of AI-driven video generation. While still experimental and imperfect, Veo 3 offers a promising glimpse into the future of intelligent video creation.
We’ll continue exploring these emerging technologies to further enhance the user experience powered by Emmie.