Avatar to Video | PushOwl Help Center

Avatar to Video is how you make talking‑head UGC, founder messages, and testimonials where the same face and voice show up in every video.

When to use this mode

You need a spokesperson or creator who looks identical across a whole campaign
You're making UGC‑style content: testimonials, product explainers, founder updates
You want a script spoken aloud, lip‑synced, without filming anyone

Step 1 - Create your avatar

Go to Avatars
Click Create avatar
(Optional) Attach a voice. Pick one, or clone your own.

To create an avatar

Click 'New avatar'
Write a prompt and let AI generate the avatar, or upload up to five high‑resolution images
When you're happy with your selection, click 'Create avatar'

Cloning your own voice

Upload an audio snippet or record directly in the app (10 seconds minimum)
Attach the cloned voice to your custom avatar

Once attached, the voice is pre‑selected whenever you use that avatar to create videos or other UGC

Step 2 - Generate the video

In the prompt box, @‑mention your avatar. This automatically adds both the avatar and its attached voice
Write or paste your script
Set your aspect ratio, duration, and model. Use a model with native audio and lip‑sync — Veo 3.1, Kling 2.6, or Seedance 2.0
Click the paper‑plane icon to generate

Writing a script that lands

Avatar video lives and dies on the script and the framing. Keep both tight.

Structure

Open with a hook in the first 2 seconds. This is the thumb‑stopper; lead with the payoff, not a slow intro.
Tie each beat to a visible on‑screen action. Don't just talk; show.
Close with a soft, creator‑style CTA; "I'll leave this here" lands better than a hard "link in bio."

Formatting the script for lip‑sync

Put dialogue in quotes. This triggers automatic lip‑sync
Assign each line to one speaker, with one voice per line
Add emotion and pace cues in brackets — [whispering], [excited], [slowly]

Example

[warm, conversational] "Okay, I have to show you this. I've been using it every morning for two weeks." [holds product up to camera] "And honestly? I'm not going back." [sets it down, small smile] "I'll leave this right here for you."
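
If you draft scripts in bulk, the formatting rules above can be applied programmatically before pasting the result into the prompt box. The following is a minimal Python sketch, not a product feature: the `Beat` type and `format_script` helper are hypothetical names introduced for illustration, and the only thing taken from this article is the convention itself (cues in brackets, dialogue in quotes).

```python
from dataclasses import dataclass


@dataclass
class Beat:
    """One script beat (hypothetical type): an optional cue plus an optional spoken line."""
    cue: str = ""       # emotion, pace, or on-screen action, e.g. "excited"
    dialogue: str = ""  # the words the avatar speaks


def format_script(beats):
    """Join beats into one string: cues in [brackets], dialogue in "quotes"."""
    parts = []
    for beat in beats:
        if beat.cue:
            parts.append(f"[{beat.cue.strip()}]")
        if beat.dialogue:
            # Strip stray outer quotes so each line is wrapped exactly once.
            parts.append('"' + beat.dialogue.strip().strip('"') + '"')
    return " ".join(parts)


script = format_script([
    Beat("warm, conversational", "Okay, I have to show you this."),
    Beat("holds product up to camera", "And honestly? I'm not going back."),
    Beat("sets it down, small smile", "I'll leave this right here for you."),
])
print(script)
```

Keeping each beat as a cue plus a single spoken line also makes it easy to check that every line belongs to one speaker and stays short.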

Get the best lip‑sync

Lip‑sync is strongest under controlled conditions:

One character at a time. Multiple speakers split the model's attention
Short sentences. Long run‑ons drift out of sync
Stable, locked framing. Avoid big head turns, extreme camera moves, and busy hand gestures when mouth accuracy matters
A front‑facing or 3/4 portrait, well lit. Avoid side profiles — the model needs to see the mouth
Clean audio at a natural pace
Match the language of your audio to your written dialogue

The avatar's portrait actually shapes the mouth movements, so the face and the audio are linked. A clear, front‑facing avatar gives noticeably better sync.

Assembling a multi‑scene UGC video

Magicfit can stitch several avatar scenes into one finished video. The assembly tool takes:

Up to 3 green‑screen avatar videos (MP4): the green screen is chroma‑keyed out automatically
Up to 3 product images (JPG/PNG): composited in behind the avatar
A voiceover or music track (MP3)

You can set the overlay position and the avatar scale (0.5×–2×). The tool then concatenates the scenes into a single video.
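
Before uploading, it can help to check your files against the limits listed above. The following is a minimal Python sketch under stated assumptions: `check_assembly_inputs` is a hypothetical helper, not part of the assembly tool. It assumes at least one avatar video is needed, that the audio track is optional, and that `.jpeg` counts as JPG. It only inspects file extensions, not file contents.

```python
from pathlib import Path

# Limits taken from the list above.
MAX_AVATAR_VIDEOS = 3
MAX_PRODUCT_IMAGES = 3
SCALE_RANGE = (0.5, 2.0)


def check_assembly_inputs(videos, images, audio=None, scale=1.0):
    """Return a list of problems; an empty list means the inputs fit the limits."""
    problems = []
    # Assumption: at least one avatar video is required.
    if not 1 <= len(videos) <= MAX_AVATAR_VIDEOS:
        problems.append(f"expected 1-{MAX_AVATAR_VIDEOS} avatar videos, got {len(videos)}")
    if len(images) > MAX_PRODUCT_IMAGES:
        problems.append(f"expected at most {MAX_PRODUCT_IMAGES} product images, got {len(images)}")
    for name in videos:
        if Path(name).suffix.lower() != ".mp4":
            problems.append(f"{name}: avatar videos must be MP4")
    for name in images:
        # Assumption: .jpeg is accepted alongside .jpg.
        if Path(name).suffix.lower() not in {".jpg", ".jpeg", ".png"}:
            problems.append(f"{name}: product images must be JPG or PNG")
    # Assumption: the voiceover or music track is optional.
    if audio is not None and Path(audio).suffix.lower() != ".mp3":
        problems.append(f"{audio}: audio track must be MP3")
    if not SCALE_RANGE[0] <= scale <= SCALE_RANGE[1]:
        problems.append(f"avatar scale {scale} is outside {SCALE_RANGE[0]}-{SCALE_RANGE[1]}")
    return problems


print(check_assembly_inputs(
    ["scene1.mp4", "scene2.mp4"], ["bottle.png"], "voiceover.mp3", scale=1.25
))  # prints []
```

The helper returns every problem at once rather than stopping at the first, so a whole batch can be fixed in one pass.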

Common mistakes

A side‑profile avatar portrait. The AI model can't see the mouth
Noisy or rushed audio. Clean, natural‑pace audio syncs far better
Long, complex sentences. Short lines stay in sync
Busy framing. Big head turns and hand gestures break mouth accuracy
A hard sales CTA. Soft, creator‑style closes outperform "link in bio."

Credits

The cost is shown next to the Generate button before you commit, and scales with model × duration × resolution. Failed generations are refunded automatically. Voice cloning is a one‑time setup per avatar — after that, the voice is reused at no extra setup cost.