Text to Video | PushOwl Help Center

This is the fastest way to create B‑roll, concept shots, and scenes you don't already have assets for.

When to use this mode

You want a scene you don't have footage for: a city street, a kitchen, an abstract background
You're generating B‑roll to cut between other clips
You're exploring a concept before committing to a full shoot or product setup

If you already have a product photo you want to animate, use Image to Video instead.

Steps

Open the prompt box, or go to 'Generate'
Make sure you've selected the Video tab
Write a descriptive prompt of the scene you want
Set your aspect ratio, duration, resolution, and model
Click the paper plane icon to generate

Your video appears in the same window in about 2–5 minutes. You can queue another while it renders.

Tip: Text to Video renders vertical 1080p with audio on Kling 2.6 Turbo Pro or Google Veo 3.1.

Writing the prompt

Text to Video has no image to lean on, so your prompt does all the work. Use the full formula:

Subject → Action → Environment → Camera → Style → Constraints

Aim for roughly 60–100 words. Build it in this order:

Name the shot type first: close‑up, wide, tracking, aerial
One clear action, in the present tense
The environment: where it happens, time of day
One camera move, and only one
A lighting line: this is the highest‑value sentence in the whole prompt
A global style line near the end: cinematic, film grain, muted color
A negative constraint to close: avoid jitter, avoid bent limbs

Template

[Shot type] of [subject] [doing one action] in [environment]. [One camera movement]. [Lighting description]. [Style line]. Avoid [things to exclude].

Example

Wide tracking shot of a runner crossing an empty bridge at dawn. The camera follows alongside at a steady pace. Soft golden light, long shadows, light mist. Cinematic, shallow depth of field, gentle film grain. Avoid jitter, avoid distorted limbs.
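
If you write many prompts, the template above can be filled in programmatically. The following is a minimal Python sketch, assuming you keep each slot as a separate string. The `ShotPrompt` class and its field names are hypothetical helpers invented for illustration; they are not part of any product API, and the output is simply text you would paste into the prompt box.

```python
from dataclasses import dataclass, field


@dataclass
class ShotPrompt:
    """Hypothetical container for the template slots (illustration only)."""
    shot_type: str
    subject: str
    action: str
    environment: str
    camera: str
    lighting: str
    style: str
    avoid: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Follow the template order: scene, camera, lighting, style, constraints.
        sentences = [
            f"{self.shot_type} of {self.subject} {self.action} in {self.environment}.",
            f"{self.camera}.",
            f"{self.lighting}.",
            f"{self.style}.",
        ]
        if self.avoid:
            # Produces "Avoid x, avoid y." to match the template's last sentence.
            sentences.append("Avoid " + ", avoid ".join(self.avoid) + ".")
        return " ".join(sentences)


shot = ShotPrompt(
    shot_type="Close-up",
    subject="a chef",
    action="slicing fresh herbs",
    environment="a sunlit kitchen",
    camera="The camera pushes in slowly",
    lighting="Warm window light with soft shadows",
    style="Cinematic, shallow depth of field",
    avoid=["jitter", "distorted hands"],
)
rendered = shot.render()
print(rendered)
print(len(rendered.split()), "words")
```

Keeping the slots separate makes it easy to change one element, such as the lighting line, while leaving the rest of the prompt untouched. A short result like this one would still need more descriptive detail to reach the suggested word range.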

Timeline prompting for paced sequences

If you want the shot to evolve over its duration, use timestamps. Add 2–3 timestamps per 5 seconds, with one camera move + one action + one atmospheric detail per beat:

[0s] Close-up of a coffee cup on a wooden table, steam rising. [3s] Camera pulls back slowly to reveal a quiet café at sunrise. [6s] A hand enters frame and lifts the cup. Warm morning light.
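
A timeline prompt is a list of timestamped beats joined into one string, so it can be assembled from structured data. The sketch below is one possible way to do that in Python; `timeline_prompt` is a hypothetical helper name, and the `[Ns]` formatting simply mirrors the example above rather than any documented syntax requirement.

```python
def timeline_prompt(beats: list[tuple[int, str]], closing: str = "") -> str:
    """Join (second, description) beats into a single timestamped prompt."""
    ordered = sorted(beats)  # keep beats in chronological order
    parts = [f"[{second}s] {text.rstrip('.')}." for second, text in ordered]
    if closing:
        # Optional global line (for example, lighting) appended after the beats.
        parts.append(closing.rstrip(".") + ".")
    return " ".join(parts)


beats = [
    (0, "Close-up of a coffee cup on a wooden table, steam rising"),
    (3, "Camera pulls back slowly to reveal a quiet café at sunrise"),
    (6, "A hand enters frame and lifts the cup"),
]
timeline = timeline_prompt(beats, closing="Warm morning light")
print(timeline)
```

With these inputs the function reproduces the café example shown above.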

Get the best result

One primary camera move only. A pan, a zoom, and an orbit in one prompt fight each other.
Separate camera motion from subject motion. For example: "The leaves fall slowly. The camera holds a fixed frame."
Lighting is your biggest lever. One specific lighting sentence does more than ten adjectives.
Avoid the word "fast." It's the term most likely to degrade quality. Use brisk or quick, or describe the motion directly.
Never give contradictory directions: "locked tripod with handheld shake" confuses the model.

Common mistakes

A vague prompt. "A nice video of a city" gives the model nothing to anchor to. Name the shot, the action, the light.
Too many actions. Pick one or two. Sequences belong in timeline prompts, not piled into one sentence.
Mixing camera and subject motion in a single instruction. Split them into two sentences.
Asking for "fast." Describe the motion instead.
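
Some of this guidance can be checked automatically before you spend credits. The following Python sketch flags three of the issues described in this article: prompt length outside roughly 60–100 words, the word "fast", and more than one camera move. The `lint_prompt` function and the `CAMERA_MOVES` keyword list are assumptions made for illustration; the list is not an official vocabulary, and simple keyword matching will miss some phrasings and misfire on others.

```python
import re

# Illustrative keyword list (an assumption, not an official vocabulary).
CAMERA_MOVES = ("pan", "zoom", "orbit", "tilt", "dolly")


def lint_prompt(prompt: str) -> list[str]:
    """Return warnings based on the prompt-writing guidance in this article."""
    warnings = []

    word_count = len(prompt.split())
    if not 60 <= word_count <= 100:
        warnings.append(f"Prompt has {word_count} words; aim for roughly 60-100.")

    # Match "fast" as a whole word only, so "breakfast" is not flagged.
    if re.search(r"\bfast\b", prompt, flags=re.IGNORECASE):
        warnings.append('Contains "fast"; use brisk or quick, or describe the motion.')

    # Count distinct camera-move keywords, allowing forms such as "pans" or "panning".
    found = [
        move
        for move in CAMERA_MOVES
        if re.search(rf"\b{move}(s|ning|ing)?\b", prompt, flags=re.IGNORECASE)
    ]
    if len(found) > 1:
        warnings.append(f"Multiple camera moves found ({', '.join(found)}); keep one.")

    return warnings


issues = lint_prompt("A fast pan and zoom across a city")
for issue in issues:
    print(issue)
```

Treat the warnings as prompts for a second look, not as rules: a check like this cannot judge lighting quality or whether directions contradict each other.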

Credits

The cost is shown next to the Generate button before you commit. Text‑to‑video cost scales with model × duration × resolution. Failed generations are refunded automatically. Start with one output to test your prompt before generating variations.

Next: Image to Video, for animating a still you already have.