Magicfit turns your products, photos, and ideas into finished video, ready for ads, social, and your store. This guide covers the four ways to make a video, the reference system that controls how each one looks, and how to get the best result from every model.
If you already know which mode you want, jump straight to its article:
Text to video: describe a scene, get a video; no assets needed
Image to video: bring a still image to life with first‑ and last‑frame control
Avatar to video: a consistent presenter speaks your script, lip‑synced
Video to video: copy the flow of one video, swap in your own product
Quick start
Open the prompt box, or go straight to Generate
Make sure the video tab is selected
Write a prompt
Set your aspect ratio, duration, resolution, and model
Click the Generate icon
Generation usually takes 2–5 minutes. Your video appears in the same window when it's ready, and you can also view it in the library. You can queue another generation from the same window while the first one renders.
The four ways to make a video
Mode | What you bring | Best for |
Text to video | A prompt | Concepts, B‑roll, scenes you don't have footage for |
Image to video | One or two images + a prompt | Animating a product shot, before/after transitions |
Avatar to video | An avatar + voice + script | Talking‑head UGC, founder messages, testimonials |
Video to video | A reference clip + your assets | Recreating an ad's structure with your own product |
You can combine images, video, and audio as references inside a single prompt on models that support it (see Seedance 2.0 and the @ system).
The reference system: first frame, last frame, and reference image
This is the single most important concept in Magicfit video, and the most common source of confusion. These three things are not interchangeable.
Think of it like giving directions to a driver:
Text to video is telling the driver "take me somewhere nice." The model chooses everything.
A first frame is "start at this exact intersection." Your image becomes the literal opening shot.
First + last frame is "start here, end there, and pick the route between." The model animates from your first image to your last.
A reference image is "this is the kind of place I like." It guides identity, style, or content without being pinned to any single frame.
First frame
The literal opening shot. It locks composition, lighting, and subject. The quality of your first frame directly determines the quality of the whole video; start with a clean, high‑resolution image. Use it when you want the final video clip to begin exactly on your image.
Last frame
The target ending the model interpolates toward. Treat it as a strong directional guide, not a pixel‑perfect lock. Use it for transformations, product before/after, branded outros, and loops. Keep the same scene and subject between your first and last frame. Mismatched subjects cause morphing artifacts, and mismatched lighting causes mid‑clip color shifts.
Reference image
Guides identity, style, or content without being tied to a frame. The model decides how to weave it in. Use it for character consistency across shots, style matching, or holding a product's look. Reference images are available only on specific models. If you switch models and a mode disappears, that's expected.
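One way to keep the three roles straight is to treat them as separate optional slots. The sketch below is illustrative Python only: `References` and `describe_control` are hypothetical names, not part of Magicfit, and the function simply restates the driver analogy above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class References:
    """Hypothetical container: each slot holds an image path or None."""
    first_frame: Optional[str] = None
    last_frame: Optional[str] = None
    reference_image: Optional[str] = None


def describe_control(refs: References) -> str:
    """Return which kind of 'directions' the model is being given."""
    parts = []
    if refs.first_frame and refs.last_frame:
        parts.append("start here, end there, model picks the route")
    elif refs.first_frame:
        parts.append("start on this exact image")
    elif refs.last_frame:
        parts.append("end near this image (a guide, not a lock)")
    if refs.reference_image:
        parts.append("guide identity or style, not pinned to a frame")
    if not parts:
        return "text only: the model chooses everything"
    return "; ".join(parts)
```

Filling a different slot changes what the image controls, which is why the three are not interchangeable.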
Settings, explained
Every setting sits in the prompt box. What's available depends on the model you pick.
Setting | What it does |
×1 | How many videos to generate. Use + and − to change it. Start with 1–2 to test your prompt before spending credits on variations. |
9:16 | Aspect ratio. Click to change. Options depend on the model (common: 16:9, 9:16, 1:1, 4:3, 3:4, 21:9). |
4s | Duration. Click to change. Ranges depend on the model; most run 4–15 seconds. |
720p | Resolution. Click to change. Options depend on the model (up to 1080p on most). |
seedance 2.0 fast | Model. Click to switch. This changes which reference modes, durations, and resolutions are available. |
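If it helps to see the settings as one bundle, here is a minimal sketch in Python. `VideoSettings` and `check_settings` are hypothetical names used for illustration. The defaults copy the labels in the table above, and the checks only use the "common" values the table mentions, since the real options depend on the model you pick.

```python
from dataclasses import dataclass
from typing import List

# Values the guide lists as common; real options depend on the chosen model.
COMMON_ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "21:9"}
TYPICAL_DURATION_SECONDS = range(4, 16)  # "most run 4-15 seconds"


@dataclass
class VideoSettings:
    """Hypothetical record; defaults copy the labels shown in the table."""
    count: int = 1
    aspect_ratio: str = "9:16"
    duration_seconds: int = 4
    resolution: str = "720p"
    model: str = "seedance 2.0 fast"


def check_settings(s: VideoSettings) -> List[str]:
    """Return human-readable warnings; an empty list means nothing stood out."""
    warnings = []
    if s.count > 2:
        warnings.append("test with 1-2 videos before spending credits on variations")
    if s.aspect_ratio not in COMMON_ASPECT_RATIOS:
        warnings.append(f"{s.aspect_ratio} is not one of the common aspect ratios")
    if s.duration_seconds not in TYPICAL_DURATION_SECONDS:
        warnings.append("duration is outside the typical 4-15 second range")
    return warnings
```

A warning here is only a prompt to double-check. The prompt box itself shows what the selected model actually allows.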
Choosing a model
Magicfit is multi‑model, so you can match the engine to the job.
Model | Strengths | Reference modes | Best for |
Seedance 2.0 / Fast | Physics, consistency, and the only model that takes audio + video references; native synced audio incl. beat‑matching to an uploaded track | Image + video + audio (via | Reference‑driven ads, music‑synced content, multi‑shot stories |
Google Veo 3.1 / Lite | Cinematic realism, strong prompt adherence, synced dialogue | First + last frame or up to 3 reference images (standard only); native audio | Dialogue‑heavy and narrative content, product close‑ups |
OpenAI Sora 2 | Best‑in‑class physics and realistic motion | Single reference image (not a frame lock) | Hero shots, complex motion |
Kling 2.5 Turbo Pro / 2.6 Pro / v3 | Character stability and precise start + end frame control; 2.6 adds native audio + lip‑sync | Start + end frame; single reference frame | Transitions, transformations, character animation |
Grok Imagine | Lighter and faster | Start‑frame; fewer modes | Quick drafts and budget runs |
Quick picks:
Animating a product shot → Kling 3 or Veo 3.1
A video that needs to match a real song's beat → Seedance 2.0
A talking presenter → a model with native audio + lip‑sync (Veo 3.1, Kling 3, Seedance 2.0)
Copying another video's camera and motion → Seedance 2.0
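The same matching logic can be written as a lookup. The sketch below is a simplified, hand-made transcription of the Reference modes column, not an official capability list. The mode names and `models_supporting` are hypothetical, and it ignores version differences (for example, Veo's reference images being standard only).

```python
# Simplified from the "Reference modes" column above; treat as an approximation.
MODEL_REFERENCE_MODES = {
    "Seedance 2.0": {"image_reference", "video_reference", "audio_reference"},
    "Google Veo 3.1": {"first_frame", "last_frame", "image_reference"},
    "OpenAI Sora 2": {"image_reference"},
    "Kling": {"first_frame", "last_frame"},
    "Grok Imagine": {"first_frame"},
}


def models_supporting(*needs: str) -> list:
    """Return the models (alphabetically) whose modes cover every need."""
    wanted = set(needs)
    return sorted(
        name for name, modes in MODEL_REFERENCE_MODES.items() if wanted <= modes
    )


# First + last frame control narrows the field to two model families.
frame_pair_models = models_supporting("first_frame", "last_frame")
```

Asking for `"audio_reference"` leaves only Seedance 2.0, which matches the quick pick for beat-matched videos.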
Seedance 2.0 and the @ system
Seedance 2.0 is the flagship "omni‑reference" model. It's the only one that accepts images, video, and audio together in a single generation, and it generates picture and sound synchronized in one pass.
After uploading your assets, you reference each one by tag in the prompt and give it a job: @Image1 as the first frame, reference @Video1 for camera movement, use @Audio1 for background music
Goal | Prompt pattern |
Set the opening shot | |
Borrow motion or choreography | |
Copy camera work | |
Add music or rhythm | |
Extend a clip | |
Swap a subject | |
You can trim and pan to select the exact portion of an audio or video reference before tagging it. The rule of thumb is one job per asset: decide which image is the subject, which video is the camera, and which audio is the mood.
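To make "one job per asset" concrete, here is a small Python sketch that assembles a tagged prompt. `build_reference_prompt` is a hypothetical helper, not a Magicfit feature. The three job phrases are the examples given earlier in this section, and the scene text is made up.

```python
def build_reference_prompt(scene: str, jobs: dict) -> str:
    """Join a scene description with one instruction per tagged asset.

    `jobs` maps a tag such as "@Image1" to the single job it should do.
    A dict cannot hold the same tag twice, which enforces one job per asset.
    """
    for tag in jobs:
        if not tag.startswith("@"):
            raise ValueError(f"asset tags start with '@', got {tag!r}")
    instructions = [job.format(tag=tag) for tag, job in jobs.items()]
    return scene.rstrip(".") + ". " + ", ".join(instructions) + "."


prompt = build_reference_prompt(
    "A runner laces up a sneaker on a rooftop at sunrise",
    {
        "@Image1": "{tag} as the first frame",
        "@Video1": "reference {tag} for camera movement",
        "@Audio1": "use {tag} for background music",
    },
)
```

Here `prompt` reads as one scene sentence followed by the three tagged instructions, each asset doing exactly one thing.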
Consistent characters with avatars
For a presenter who looks the same across every video, use avatars.
Go to Avatars
Create avatar
Optionally attach a voice: pick one, or clone your own from a 10‑second recording or upload
In the prompt box, @‑mention the avatar. This automatically adds the avatar and its voice
Full walkthrough in Avatar to Video.
Writing prompts that work
One formula underpins every mode: Subject → Action → Environment → Camera → Style → Constraints. A few rules that apply everywhere:
One primary camera move per shot. Don't ask for a pan and a zoom and an orbit
Separate camera motion from subject motion. "The dancer spins slowly. The camera holds a fixed frame."
One lighting line is the highest‑value addition you can make to any prompt
Add a negative constraint: "avoid jitter, avoid bent limbs."
Avoid the word "fast." It's the single most likely term to degrade quality. Use brisk, quick, or describe the motion instead
Video works best with one or two clear actions. Don't overload a single prompt
Each mode's article has a copy‑paste template tuned for that workflow.
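The formula and two of the rules above can be sketched as a small checker. This is an illustration in Python with hypothetical names (`compose_prompt`, `lint_prompt`), not the mode templates themselves. The camera-move list only covers the three moves named above.

```python
import re

FORMULA_ORDER = ["subject", "action", "environment", "camera", "style", "constraints"]


def compose_prompt(**parts: str) -> str:
    """Join the supplied parts in formula order, skipping any left out."""
    unknown = set(parts) - set(FORMULA_ORDER)
    if unknown:
        raise ValueError(f"unknown prompt parts: {sorted(unknown)}")
    sentences = [parts[k].strip().rstrip(".") for k in FORMULA_ORDER if parts.get(k)]
    return ". ".join(sentences) + "."


def lint_prompt(prompt: str) -> list:
    """Flag two rules: the word 'fast' and more than one camera move."""
    issues = []
    if re.search(r"\bfast\b", prompt, flags=re.IGNORECASE):
        issues.append('replace "fast" with brisk, quick, or a description of the motion')
    moves = ["pan", "zoom", "orbit"]
    found = [
        m for m in moves
        if re.search(rf"\b{m}(?:s|ning|ing)?\b", prompt, flags=re.IGNORECASE)
    ]
    if len(found) > 1:
        issues.append(f"more than one camera move requested: {', '.join(found)}")
    return issues


prompt = compose_prompt(
    subject="A dancer in a red dress",
    action="She spins slowly",
    environment="An empty warehouse",
    camera="The camera holds a fixed frame",
    style="Soft window light from the left",
    constraints="Avoid jitter, avoid bent limbs",
)
```

The example keeps subject motion and camera motion in separate sentences, so `lint_prompt(prompt)` finds nothing to flag.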
Credits
Credits are spent when you generate; uploading and downloading are free
The exact credit cost is shown in the prompt box before you commit
Video cost scales with model × duration × resolution, plus any references used
Failed generations are refunded automatically
Credits don't roll over between billing cycles
Credits are not refundable, apart from the automatic refund for failed generations
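To see how the scaling rule plays out, here is a rough Python sketch. Every number in it is an invented placeholder, not Magicfit pricing, and `estimate_credits` is a hypothetical function. It also assumes cost multiplies by the number of videos, which this guide implies but does not state. The prompt box always shows the real cost before you commit.

```python
# All numbers below are invented placeholders, not Magicfit pricing.
PLACEHOLDER_MODEL_RATE = {"model_a": 2.0, "model_b": 5.0}  # credits per second
PLACEHOLDER_RESOLUTION_FACTOR = {"720p": 1.0, "1080p": 1.5}
PLACEHOLDER_CREDITS_PER_REFERENCE = 1.0


def estimate_credits(model, duration_seconds, resolution, references=0, count=1):
    """Model x duration x resolution, plus references, times the video count."""
    base = (
        PLACEHOLDER_MODEL_RATE[model]
        * duration_seconds
        * PLACEHOLDER_RESOLUTION_FACTOR[resolution]
    )
    return count * (base + references * PLACEHOLDER_CREDITS_PER_REFERENCE)


short_draft = estimate_credits("model_a", 4, "720p")
long_final = estimate_credits("model_b", 8, "1080p", references=2, count=2)
```

With these placeholder rates the short draft comes to 8.0 credits and the longer, higher-resolution pair to 124.0, which shows why testing one short output first is cheaper.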
Common mistakes
Overloading the prompt: Too many actions in one generation muddies the result
A low‑quality first frame: The opening image sets the ceiling for the whole video
Mismatched first and last frames: Different subjects morph; different lighting shifts color
Mixing camera and subject motion: Split them into two sentences
Generating five variations before testing one: Test a single output, refine the prompt, then scale
