Text to Video is the fastest way to create B‑roll, concept shots, and scenes you don't already have assets for.
When to use this mode
You want a scene you don't have footage for: a city street, a kitchen, an abstract background
You're generating B‑roll to cut between other clips
You're exploring a concept before committing to a full shoot or product setup
If you already have a product photo you want to animate, use Image to Video instead.
Steps
Open the prompt box, or go to 'Generate'
Make sure the Video tab is selected
Write a prompt that describes the scene you want
Set your aspect ratio, duration, resolution, and model
Click the paper plane icon to generate
Your video appears in the same window in about 2–5 minutes. You can queue another generation while it renders.
Tip: Kling 2.6 Turbo Pro and Google Veo 3.1 can render Text to Video in vertical 1080p with audio.
Writing the prompt
Text to Video has no image to lean on, so your prompt does all the work. Use the full formula:
Subject → Action → Environment → Camera → Style → Constraints
Aim for roughly 60-100 words. Build it in this order:
Name the shot type first: close‑up, wide, tracking, aerial
One clear action, in the present tense
The environment: where it happens and the time of day
One camera move, and only one
A lighting line: this is the highest‑value sentence in the whole prompt
A global style line at the end: cinematic, film grain, muted color
A negative constraint: avoid jitter, avoid bent limbs
Template
[Shot type] of [subject] [doing one action] in [environment]. [One camera movement]. [Lighting description]. [Style line]. Avoid [things to exclude].
Example
Wide tracking shot of a runner crossing an empty bridge at dawn. The camera follows alongside at a steady pace. Soft golden light, long shadows, light mist. Cinematic, shallow depth of field, gentle film grain. Avoid jitter, avoid distorted limbs.
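If you write many prompts (for example, from a script or a spreadsheet), it can help to keep the template's slots as separate fields so none gets skipped. The Python sketch below is a hypothetical helper, not a product feature: ShotSpec and build_prompt are illustrative names. It fills the template in order and reproduces the example prompt above. The environment field carries its own preposition ("in a kitchen", "an empty bridge at dawn") so the first sentence reads naturally.

```python
from dataclasses import dataclass, field


@dataclass
class ShotSpec:
    """Hypothetical container for the slots in the prompt template."""
    shot_type: str    # close-up, wide, tracking, aerial...
    subject: str      # who or what is in the shot
    action: str       # one clear action, present tense
    environment: str  # where it happens and when (include the preposition)
    camera: str       # exactly one camera movement
    lighting: str     # the single lighting sentence
    style: str        # global style line
    avoid: list[str] = field(default_factory=list)  # negative constraints


def _sentence(text: str) -> str:
    """Trim whitespace and make sure the fragment ends with exactly one period."""
    return text.strip().rstrip(".") + "."


def build_prompt(spec: ShotSpec) -> str:
    """Assemble the slots in template order: shot, camera, lighting, style, avoid."""
    parts = [
        _sentence(f"{spec.shot_type} of {spec.subject} {spec.action} {spec.environment}"),
        _sentence(spec.camera),
        _sentence(spec.lighting),
        _sentence(spec.style),
    ]
    if spec.avoid:
        parts.append(_sentence("Avoid " + ", avoid ".join(spec.avoid)))
    return " ".join(parts)


example = ShotSpec(
    shot_type="Wide tracking shot",
    subject="a runner",
    action="crossing",
    environment="an empty bridge at dawn",
    camera="The camera follows alongside at a steady pace",
    lighting="Soft golden light, long shadows, light mist",
    style="Cinematic, shallow depth of field, gentle film grain",
    avoid=["jitter", "distorted limbs"],
)
print(build_prompt(example))
```

Keeping the camera and lighting in their own fields also makes it harder to accidentally stack two camera moves into one sentence.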
Timeline prompting for paced sequences
If you want the shot to evolve over its duration, use timestamps. Add 2–3 timestamps per 5 seconds, with one camera move, one action, and one atmospheric detail per beat:
[0s] Close-up of a coffee cup on a wooden table, steam rising. [3s] Camera pulls back slowly to reveal a quiet café at sunrise. [6s] A hand enters frame and lifts the cup. Warm morning light.
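The timeline format can be assembled the same way. This hypothetical Python helper joins (seconds, beat) pairs into one timestamped prompt, checks that the timestamps start at 0s, increase, and stay inside the clip, and scales the "2–3 timestamps per 5 seconds" guideline to a clip length. The function names, the outward rounding, and the 10‑second duration in the usage lines are assumptions for illustration only.

```python
import math


def recommended_beat_range(duration_s: float) -> tuple[int, int]:
    """Scale the 2-3 timestamps per 5 seconds guideline to a clip length, rounding outward."""
    return math.floor(2 * duration_s / 5), math.ceil(3 * duration_s / 5)


def build_timeline(beats: list[tuple[int, str]], closing_line: str, duration_s: float) -> str:
    """Join (seconds, description) beats into one timestamped prompt."""
    times = [t for t, _ in beats]
    if not beats or times[0] != 0:
        raise ValueError("the first beat should start at 0s")
    if any(later <= earlier for earlier, later in zip(times, times[1:])):
        raise ValueError("timestamps must strictly increase")
    if times[-1] >= duration_s:
        raise ValueError("the last timestamp falls outside the clip")
    parts = [f"[{t}s] {text}" for t, text in beats]
    parts.append(closing_line)  # a global line (e.g. lighting) that applies to every beat
    return " ".join(parts)


beats = [
    (0, "Close-up of a coffee cup on a wooden table, steam rising."),
    (3, "Camera pulls back slowly to reveal a quiet café at sunrise."),
    (6, "A hand enters frame and lifts the cup."),
]
print(recommended_beat_range(5))  # (2, 3)
print(build_timeline(beats, "Warm morning light.", duration_s=10))
```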
Getting the best results
One primary camera move only. A pan, a zoom, and an orbit in one prompt fight each other.
Separate camera motion from subject motion. For example: "The leaves fall slowly. The camera holds a fixed frame."
Lighting is your biggest lever. One specific lighting sentence does more than ten adjectives.
Avoid the word "fast." It's the term most likely to degrade quality. Use "brisk" or "quick" instead, or describe the motion directly.
Never give contradictory directions: "locked tripod with handheld shake" confuses the model.
Common mistakes
A vague prompt. "A nice video of a city" gives the model nothing to anchor to. Name the shot, the action, the light.
Too many actions. Pick one or two. Sequences belong in timeline prompts, not piled into one sentence.
Mixing camera and subject motion in a single instruction. Split them into two sentences.
Asking for "fast." Describe the motion instead.
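If you write prompts in bulk, a rough automated check can catch some of these mistakes before you spend credits. The sketch below is a hypothetical Python linter, not a product feature: the camera‑move word list (pan, zoom, orbit, tilt, dolly) and the contradictory pairs (tripod, fixed frame, or static alongside handheld) are assumptions chosen for illustration. Plain keyword matching will miss paraphrases and can occasionally flag harmless wording, so treat its output as a prompt to reread, not a verdict.

```python
import re

# Hypothetical heuristics drawn from the tips above; not an official checker.
_MOVE_PATTERNS = {
    "pan": r"pan(?:s|ned|ning)?",
    "zoom": r"zoom(?:s|ed|ing)?",
    "orbit": r"orbit(?:s|ed|ing)?",
    "tilt": r"tilt(?:s|ed|ing)?",
    "dolly": r"dolly|dollies|dollied|dollying",
}
CAMERA_MOVES = {
    name: re.compile(rf"\b(?:{pattern})\b", re.IGNORECASE)
    for name, pattern in _MOVE_PATTERNS.items()
}
FAST = re.compile(r"\bfast(?:er|est)?\b", re.IGNORECASE)  # whole word, so "breakfast" is ignored
CONFLICTS = [("tripod", "handheld"), ("fixed frame", "handheld"), ("static", "handheld")]


def lint_prompt(prompt: str) -> list[str]:
    """Return warnings for some of the common mistakes listed above."""
    warnings = []
    moves = sorted(name for name, rx in CAMERA_MOVES.items() if rx.search(prompt))
    if len(moves) > 1:
        warnings.append("more than one camera move: " + ", ".join(moves))
    if FAST.search(prompt):
        warnings.append('uses "fast": describe the motion instead')
    lowered = prompt.lower()
    for a, b in CONFLICTS:
        if a in lowered and b in lowered:
            warnings.append(f'contradictory directions: "{a}" vs "{b}"')
    return warnings


print(lint_prompt("A fast pan and a zoom and an orbit, locked tripod with handheld shake."))
```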
Credits
The cost is shown next to the Generate button before you commit. Text‑to‑video cost scales with model × duration × resolution. Failed generations are refunded automatically. Start with one output to test your prompt before generating variations.
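To budget a batch before you generate it, you can model the cost relationship described above. This Python sketch assumes the three factors simply multiply (a per‑second rate for each model, times duration, times a resolution multiplier) and that refunded failures cost nothing. The Generation class, function names, model name, and all rates in the usage lines are made‑up placeholders, not real prices; the figure shown next to the Generate button is always the one that counts.

```python
from dataclasses import dataclass


@dataclass
class Generation:
    model: str
    duration_s: int
    resolution: str
    succeeded: bool


def estimate_cost(gen: Generation, rate_per_second: dict[str, float],
                  resolution_multiplier: dict[str, float]) -> float:
    """Assumed pricing: per-second model rate x duration x resolution multiplier."""
    return rate_per_second[gen.model] * gen.duration_s * resolution_multiplier[gen.resolution]


def net_spend(gens: list[Generation], rate_per_second: dict[str, float],
              resolution_multiplier: dict[str, float]) -> float:
    """Failed generations are refunded, so only successful ones are counted."""
    return sum(
        estimate_cost(g, rate_per_second, resolution_multiplier)
        for g in gens
        if g.succeeded
    )


# Placeholder numbers for illustration only; not real prices.
rates = {"model-a": 2.0}
multipliers = {"720p": 1.0, "1080p": 1.5}
batch = [
    Generation("model-a", 5, "1080p", succeeded=True),
    Generation("model-a", 10, "720p", succeeded=False),
]
print(net_spend(batch, rates, multipliers))  # 15.0
```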
Next: Image to Video — for animating a still you already have.
