How to create videos using Magicfit

Magicfit turns your products, photos, and ideas into finished videos, ready for ads, social, and your store. This guide covers the four ways to make a video, the reference system that controls how each one looks, and how to get the best result from every model.

If you already know which mode you want, jump straight to its article:

Text to video: describe a scene and get a video, with no assets needed
Image to video: bring a still image to life with first‑ and last‑frame control
Avatar to video: a consistent presenter speaks your script, lip‑synced
Video to video: copy the flow of one video, swap in your own product

Quick start

Open the prompt box, or go straight to Generate
Make sure the video tab is selected
Write a prompt
Set your aspect ratio, duration, resolution, and model
Click the Generate icon

Generation usually takes 2–5 minutes. Your video appears in the same window when it's ready, and you can also view it in the library. You don't have to wait: from the same window, you can queue another generation while the first one renders.

The four ways to make a video

Mode	What you bring	Best for
Text to video	A prompt	Concepts, B‑roll, scenes you don't have footage for
Image to video	One or two images + a prompt	Animating a product shot, before/after transitions
Avatar to video	An avatar + voice + script	Talking‑head UGC, founder messages, testimonials
Video to video	A reference clip + your assets	Recreating an ad's structure with your own product

With Seedance 2.0, you can combine images, video, and audio as references inside a single prompt.

The reference system: first frame, last frame, and reference image

This is the single most important concept in Magicfit video, and the most common source of confusion. These three things are not interchangeable.

Think of it like giving directions to a driver:

Text to video is telling the driver "take me somewhere nice." The model chooses everything.
A first frame is "start at this exact intersection." Your image becomes the literal opening shot.
First + last frame is "start here, end there, and pick the route between." The model animates from your first image to your last.
A reference image is "this is the kind of place I like." It guides identity, style, or content without being pinned to any single frame.

First frame

The literal opening shot. It locks composition, lighting, and subject. The quality of your first frame directly determines the quality of the whole video, so start with a clean, high‑resolution image. Use it when you want the video to open exactly on your image.

Last frame

The target ending the model interpolates toward. Treat it as a strong directional guide, not a pixel‑perfect lock. Use it for transformations, product before/after, branded outros, and loops. Keep the same scene and subject between your first and last frame. Mismatched subjects cause morphing artifacts, and mismatched lighting causes mid‑clip color shifts.

Reference image

Guides identity, style, or content without being tied to a frame. The model decides how to weave it in. Use it for character consistency across shots, style matching, or holding a product's look. Reference images are available only on specific models. If you switch models and a mode disappears, that's expected.

Settings, explained

Every setting sits in the prompt box. What's available depends on the model you pick.

Setting	What it does
×1	How many videos to generate. Use + and − to change it. Start with 1–2 to test your prompt before spending credits on variations.
9:16	Aspect ratio. Click to change. Options depend on the model (common: 16:9, 9:16, 1:1, 4:3, 3:4, 21:9).
4s	Duration. Click to change. Ranges depend on the model; most run 4–15 seconds.
720p	Resolution. Click to change. Options depend on the model (up to 1080p on most).
seedance 2.0 fast	Model. Click to switch. This changes which reference modes, durations, and resolutions are available.

Choosing a model

Magicfit is multi‑model, so you can match the engine to the job.

Model	Strengths	Reference modes	Best for
Seedance 2.0 / Fast	Physics, consistency, and the only model that takes audio + video references; native synced audio incl. beat‑matching to an uploaded track	Image + video + audio (via `@`), first/last frame	Reference‑driven ads, music‑synced content, multi‑shot stories
Google Veo 3.1 / Lite	Cinematic realism, strong prompt adherence, synced dialogue	First + last frame or up to 3 reference images (standard only); native audio	Dialogue‑heavy and narrative content, product close‑ups
OpenAI Sora 2	Best‑in‑class physics and realistic motion	Single reference image (not a frame lock)	Hero shots, complex motion
Kling 2.5 Turbo Pro / 2.6 Pro / v3	Character stability and precise start + end frame control; 2.6 adds native audio + lip‑sync	Start + end frame; single reference frame	Transitions, transformations, character animation
Grok Imagine	Lighter and faster	Start frame; fewer modes	Quick drafts and budget runs

Quick picks:

Animating a product shot → Kling 3 or Veo 3.1
A video that needs to match a real song's beat → Seedance 2.0
A talking presenter → a model with native audio + lip‑sync (Veo 3.1, Kling 3, Seedance 2.0)
Copying another video's camera and motion → Seedance 2.0

Seedance 2.0 and the `@` system

Seedance 2.0 is the flagship "omni‑reference" model. It's the only one that accepts images, video, and audio together in a single generation, and it generates picture and sound synchronized in one pass.

After uploading your assets, you reference each one by its tag in the prompt and give it a job: `@Image1 as the first frame`, `reference @Video1 for camera movement`, or `use @Audio1 for background music`.

Goal	Prompt pattern
Set the opening shot	`@Image1 as the first frame`
Borrow motion or choreography	`Reference @Video1 for the movement`
Copy camera work	`Follow @Video1's camera movements`
Add music or rhythm	`Use @Audio1 for the background music`
Extend a clip	`Extend @Video1 by 5 seconds`
Swap a subject	`Replace the woman in @Video1 with @Image1`

You can trim and pan to select the exact portion of an audio or video reference before tagging it. The rule of thumb is one job per asset: decide which image is the subject, which video drives the camera, and which audio sets the mood.
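If you write many Seedance prompts (for example, one per product in a catalog), the one-job-per-asset rule is easy to enforce with a short script. This is a hypothetical sketch, not a Magicfit feature: `AssetJob` and `build_reference_prompt` are illustrative names, and the result is plain text you would paste into the prompt box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetJob:
    tag: str  # an uploaded asset's tag, e.g. "@Image1"
    job: str  # the single job it does, e.g. "as the first frame"


def build_reference_prompt(scene: str, jobs: list[AssetJob]) -> str:
    """Append one instruction per tagged asset to a scene description.

    Hypothetical helper: raises ValueError if a tag is malformed or is given
    more than one job, mirroring the "one job per asset" rule of thumb.
    """
    seen: set[str] = set()
    for item in jobs:
        if not item.tag.startswith("@"):
            raise ValueError(f"{item.tag!r} is not an @ tag")
        if item.tag in seen:
            raise ValueError(f"{item.tag} already has a job; give each asset one job")
        seen.add(item.tag)
    # Each asset becomes its own sentence so its job is unambiguous.
    instructions = [f"Use {item.tag} {item.job}." for item in jobs]
    return " ".join([scene.strip(), *instructions])


prompt = build_reference_prompt(
    "A barista slides a latte across a wooden counter.",
    [
        AssetJob("@Image1", "as the first frame"),
        AssetJob("@Video1", "for the camera movement"),
        AssetJob("@Audio1", "for the background music"),
    ],
)
print(prompt)
```

Rejecting a second job for the same tag, rather than merging the two, keeps each asset's role clear in the prompt.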

Consistent characters with avatars

For a presenter who looks the same across every video, use avatars.

Go to Avatars
Click Create avatar
Optionally attach a voice: pick one, or clone your own from a 10‑second recording or upload
In the prompt box, @‑mention the avatar. This automatically adds the avatar and its voice.

For the full walkthrough, see Avatar to video.

Writing prompts that work

One formula underpins every mode: Subject → Action → Environment → Camera → Style → Constraints. A few rules that apply everywhere:

One primary camera move per shot. Don't ask for a pan, a zoom, and an orbit at once.
Separate camera motion from subject motion: "The dancer spins slowly. The camera holds a fixed frame."
One lighting line is the highest‑value addition you can make to any prompt.
Add a negative constraint, such as "avoid jitter, avoid bent limbs."
Avoid the word "fast." It's the single term most likely to degrade quality. Use "brisk" or "quick," or describe the motion instead.
Video works best with one or two clear actions. Don't overload a single prompt.
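The formula and rules above can be sketched as a small prompt builder with a rough checker. This is an illustration under assumptions, not a Magicfit feature: `ShotPrompt`, `lint`, and the `CAMERA_MOVES` keyword list are hypothetical, and the checks are simple word matches that will miss many phrasings.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword list used only by this sketch's heuristic check.
CAMERA_MOVES = ("pan", "zoom", "orbit", "dolly", "tilt", "crane")


@dataclass
class ShotPrompt:
    subject: str
    action: str
    environment: str
    camera: str
    style: str
    constraints: str

    def render(self) -> str:
        # Join fields in the formula's order:
        # Subject -> Action -> Environment -> Camera -> Style -> Constraints.
        parts = [self.subject, self.action, self.environment,
                 self.camera, self.style, self.constraints]
        return " ".join(p.strip() for p in parts if p.strip())


def lint(prompt: str) -> list[str]:
    """Return warnings for the checklist rules that a word match can catch."""
    warnings = []
    words = re.findall(r"[a-z]+", prompt.lower())
    if "fast" in words:
        warnings.append('Avoid "fast"; use brisk or quick, or describe the motion.')
    # Crude plural handling: "pans" counts as "pan", "zooms" as "zoom".
    moves = {m for m in CAMERA_MOVES if m in words or m + "s" in words}
    if len(moves) > 1:
        warnings.append(f"More than one camera move: {sorted(moves)}")
    if "avoid" not in words:
        warnings.append("No negative constraint (e.g. 'avoid jitter').")
    return warnings


shot = ShotPrompt(
    subject="A ceramic coffee mug with a matte white glaze",
    action="rotates slowly on a turntable.",
    environment="Soft morning light fills a minimal kitchen.",
    camera="The camera holds a fixed frame.",
    style="Clean commercial product look.",
    constraints="Avoid jitter, avoid warped edges.",
)
print(shot.render())
print(lint(shot.render()))
print(lint("The car drives fast as the camera pans and zooms."))
```

Keeping each part of the formula in its own field makes it easy to change one element (say, the camera line) between test runs while holding the rest constant.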

Each mode's article has a copy‑paste template tuned for that workflow.

Credits

Credits are spent when you generate; uploading and downloading are free
The exact credit cost is shown in the prompt box before you commit
Video cost scales with model × duration × resolution, plus any references used
Failed generations are refunded automatically
Credits don't roll over between billing cycles
Apart from the automatic refunds for failed generations, credits are not refundable

Common mistakes

Overloading the prompt: Too many actions in one generation muddies the result
A low‑quality first frame: The opening image sets the ceiling for the whole video
Mismatched first and last frames: Different subjects morph; different lighting shifts color
Mixing camera and subject motion: Split them into two sentences
Generating five variations before testing one: Test a single output, refine the prompt, then scale