
How to create videos using Magicfit

Create engaging video content using the latest models

Written by Magicfit

Magicfit turns your products, photos, and ideas into finished videos that are ready for ads, social, and your store. This guide covers the four ways to make a video, the reference system that controls how each one looks, and how to get the best result from every model.

If you already know which mode you want, jump straight to its article:

  • Text to video: describe a scene, get a video; no assets needed

  • Image to video: bring a still image to life with first‑ and last‑frame control

  • Avatar to video: a consistent presenter speaks your script, lip‑synced

  • Video to video: copy the flow of one video, swap in your own product


Quick start

  1. Open the prompt box, or go straight to Generate

  2. Make sure the video tab is selected

  3. Write a prompt

  4. Set your aspect ratio, duration, resolution, and model

  5. Click the Generate icon

Generation usually takes 2–5 minutes. Your video appears in the same window when it's ready, and you can also view it in the library. You don't have to wait: you can queue another generation in the same window while the first one renders.


The four ways to make a video

| Mode | What you bring | Best for |
| --- | --- | --- |
| Text to video | A prompt | Concepts, B‑roll, scenes you don't have footage for |
| Image to video | One or two images + a prompt | Animating a product shot, before/after transitions |
| Avatar to video | An avatar + voice + script | Talking‑head UGC, founder messages, testimonials |
| Video to video | A reference clip + your assets | Recreating an ad's structure with your own product |

You can also combine images, video, and audio as references inside a single prompt.


The reference system: first frame, last frame, and reference image

This is the single most important concept in Magicfit video, and the most common source of confusion. These three things are not interchangeable.

Think of it like giving directions to a driver:

  • Text to video is telling the driver "take me somewhere nice." The model chooses everything.

  • A first frame is "start at this exact intersection." Your image becomes the literal opening shot.

  • First + last frame is "start here, end there, and pick the route between." The model animates from your first image to your last.

  • A reference image is "this is the kind of place I like." It guides identity, style, or content without being pinned to any single frame.

First frame

The literal opening shot. It locks composition, lighting, and subject. The quality of your first frame directly determines the quality of the whole video; start with a clean, high‑resolution image. Use it when you want the final video clip to begin exactly on your image.

Last frame

The target ending the model interpolates toward. Treat it as a strong directional guide, not a pixel‑perfect lock. Use it for transformations, product before/after, branded outros, and loops. Keep the same scene and subject between your first and last frame. Mismatched subjects cause morphing artifacts, and mismatched lighting causes mid‑clip color shifts.

Reference image

Guides identity, style, or content without being tied to a frame. The model decides how to weave it in. Use it for character consistency across shots, style matching, or holding a product's look. Reference images are available only on specific models. If you switch models and a mode disappears, that's expected.


Settings, explained

Every setting sits in the prompt box. What's available depends on the model you pick.

| Setting | What it does |
| --- | --- |
| ×1 | How many videos to generate. Use + and − to change it. Start with 1–2 to test your prompt before spending credits on variations. |
| 9:16 | Aspect ratio. Click to change. Options depend on the model (common: 16:9, 9:16, 1:1, 4:3, 3:4, 21:9). |
| 4s | Duration. Click to change. Ranges depend on the model; most run 4–15 seconds. |
| 720p | Resolution. Click to change. Options depend on the model (up to 1080p on most). |
| seedance 2.0 fast | Model. Click to switch. This changes which reference modes, durations, and resolutions are available. |
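If you plan generations outside the app, for example in a shot list, it can help to write the five settings down as one record and sanity-check them first. The sketch below is only an illustration: `VideoSettings` and its field names are hypothetical and are not a Magicfit API. The checks use the common values from the table above, so a warning means "double-check this against the model", not "this is invalid".

```python
from dataclasses import dataclass

# Common aspect ratios listed in this guide; individual models may differ.
COMMON_ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "21:9"}


@dataclass(frozen=True)
class VideoSettings:
    """Hypothetical record of the prompt-box settings (not a Magicfit API)."""
    count: int = 1                    # the "x1" control
    aspect_ratio: str = "9:16"
    duration_s: int = 4
    resolution: str = "720p"
    model: str = "seedance 2.0 fast"

    def problems(self) -> list[str]:
        """Return human-readable warnings based on the guidance in this guide."""
        issues = []
        if self.count < 1:
            issues.append("count must be at least 1")
        elif self.count > 2:
            issues.append("test with 1-2 videos before generating more variations")
        if self.aspect_ratio not in COMMON_ASPECT_RATIOS:
            issues.append(f"uncommon aspect ratio: {self.aspect_ratio}")
        if not 4 <= self.duration_s <= 15:
            issues.append("most models run 4-15 seconds")
        return issues


if __name__ == "__main__":
    print(VideoSettings().problems())
    print(VideoSettings(count=5, duration_s=30).problems())
```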


Choosing a model

Magicfit is multi‑model, so you can match the engine to the job.

| Model | Strengths | Reference modes | Best for |
| --- | --- | --- | --- |
| Seedance 2.0 / Fast | Physics, consistency, and the only model that takes audio + video references; native synced audio, including beat‑matching to an uploaded track | Image + video + audio (via @), first/last frame | Reference‑driven ads, music‑synced content, multi‑shot stories |
| Google Veo 3.1 / Lite | Cinematic realism, strong prompt adherence, synced dialogue | First + last frame or up to 3 reference images (standard only); native audio | Dialogue‑heavy and narrative content, product close‑ups |
| OpenAI Sora 2 | Best‑in‑class physics and realistic motion | Single reference image (not a frame lock) | Hero shots, complex motion |
| Kling 2.5 Turbo Pro / 2.6 Pro / v3 | Character stability and precise start + end frame control; 2.6 adds native audio + lip‑sync | Start + end frame; single reference frame | Transitions, transformations, character animation |
| Grok Imagine | Lighter and faster | Start‑frame; fewer modes | Quick drafts and budget runs |

Quick picks:

  • Animating a product shot → Kling 3 or Veo 3.1

  • A video that needs to match a real song's beat → Seedance 2.0

  • A talking presenter → a model with native audio + lip‑sync (Veo 3.1, Kling 3, Seedance 2.0)

  • Copying another video's camera and motion → Seedance 2.0
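The same comparison can be expressed as a small lookup, which is handy when you want to filter models by the references a job needs. This is a sketch under stated assumptions: the dictionary layout, the capability labels, and `models_supporting` are made up for illustration, and the entries are a simplified transcription of the model comparison in this section (for example, Kling's "single reference frame" is left out).

```python
# Simplified transcription of the model comparison in this guide.
# The capability labels are illustrative names, not Magicfit identifiers.
MODEL_REFERENCES = {
    "Seedance 2.0": {"image", "video", "audio", "first_frame", "last_frame"},
    "Google Veo 3.1": {"image", "first_frame", "last_frame"},
    "OpenAI Sora 2": {"image"},
    "Kling": {"first_frame", "last_frame"},
    "Grok Imagine": {"first_frame"},
}


def models_supporting(*needed: str) -> list[str]:
    """Return the models whose reference modes include everything in `needed`."""
    wanted = set(needed)
    return [name for name, refs in MODEL_REFERENCES.items() if wanted <= refs]


if __name__ == "__main__":
    print(models_supporting("audio"))
    print(models_supporting("first_frame", "last_frame"))
```

Treat the result as a shortlist only; the prompt box remains the source of truth for what each model currently accepts.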


Seedance 2.0 and the @ system

Seedance 2.0 is the flagship "omni‑reference" model. It's the only one that accepts images, video, and audio together in a single generation, and it generates picture and sound synchronized in one pass.

After uploading your assets, reference each one by tag in the prompt and give it a job, for example: "@Image1 as the first frame, reference @Video1 for camera movement, use @Audio1 for background music."

| Goal | Prompt pattern |
| --- | --- |
| Set the opening shot | @Image1 as the first frame |
| Borrow motion or choreography | Reference @Video1 for the movement |
| Copy camera work | Follow @Video1's camera movements |
| Add music or rhythm | Use @Audio1 for the background music |
| Extend a clip | Extend @Video1 by 5 seconds |
| Swap a subject | Replace the woman in @Video1 with @Image1 |

You can trim and pan to select the exact portion of an audio or video reference before tagging it. The rule of thumb is one job per asset: decide which image is the subject, which video is the camera, and which audio is the mood.
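To check a draft prompt against the one-job-per-asset rule before generating, you can pull out the tags and look for any asset that is mentioned more than once. This is a rough sketch, not Magicfit's own parser: it assumes tags always look like @Image1, @Video1, or @Audio1 as in the examples above, and it treats a repeated mention as a hint that an asset may have been given two jobs.

```python
import re
from collections import Counter

# Assumes tags follow the @Image1 / @Video1 / @Audio1 shape used in this guide.
TAG_PATTERN = re.compile(r"@(Image|Video|Audio)(\d+)")


def tags_in(prompt: str) -> list[str]:
    """List every asset tag in the prompt, in order of appearance."""
    return [f"@{kind}{number}" for kind, number in TAG_PATTERN.findall(prompt)]


def reused_tags(prompt: str) -> list[str]:
    """Return tags mentioned more than once (a hint of more than one job)."""
    counts = Counter(tags_in(prompt))
    return sorted(tag for tag, n in counts.items() if n > 1)


if __name__ == "__main__":
    clean = ("@Image1 as the first frame, reference @Video1 for camera "
             "movement, use @Audio1 for background music")
    crowded = ("Replace the woman in @Video1 with @Image1. "
               "Follow @Video1's camera movements")
    print(tags_in(clean))
    print(reused_tags(crowded))
```

A repeated tag is not always a mistake, so use the result as a prompt to re-read the sentence rather than as a hard error.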


Consistent characters with avatars

For a presenter who looks the same across every video, use avatars.

  1. Go to Avatars

  2. Create avatar

  3. Optionally attach a voice: pick one, or clone your own from a 10‑second recording or upload

  4. In the prompt box, @‑mention the avatar. This automatically adds the avatar and its voice

Full walkthrough in Avatar to Video.


Writing prompts that work

One formula underpins every mode: Subject → Action → Environment → Camera → Style → Constraints. A few rules that apply everywhere:

  • One primary camera move per shot. Don't ask for a pan and a zoom and an orbit

  • Separate camera motion from subject motion. "The dancer spins slowly. The camera holds a fixed frame."

  • One lighting line is the highest‑value addition you can make to any prompt

  • Add a negative constraint: "avoid jitter, avoid bent limbs."

  • Avoid the word "fast". It's the single most likely term to degrade quality. Use brisk, quick, or describe the motion instead

  • Video works best with one or two clear actions. Don't overload a single prompt

Each mode's article has a copy‑paste template tuned for that workflow.
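If you write many prompts, the formula and two of the rules above are easy to mechanize. The sketch below is an assumption-laden illustration rather than anything Magicfit provides: `build_prompt` joins whichever parts you supply in formula order, and `lint_prompt` flags the word "fast" and more than one camera move. The list of camera-move words is a short, made-up sample and only matches simple forms such as "pan" and "pans".

```python
import re

# Order taken from the formula in this guide.
FORMULA = ("subject", "action", "environment", "camera", "style", "constraints")
# Illustrative sample of camera-move words; extend it for your own prompts.
CAMERA_MOVES = ("pan", "zoom", "orbit", "tilt", "dolly")


def build_prompt(**parts: str) -> str:
    """Join the supplied parts in formula order, one sentence per part."""
    sentences = [
        parts[key].strip().rstrip(".") + "."
        for key in FORMULA
        if parts.get(key, "").strip()
    ]
    return " ".join(sentences)


def lint_prompt(prompt: str) -> list[str]:
    """Return warnings for two of the rules in this guide."""
    words = re.findall(r"[a-z]+", prompt.lower())
    warnings = []
    if "fast" in words:
        warnings.append('replace "fast" with brisk, quick, or a description of the motion')
    moves = [m for m in CAMERA_MOVES if m in words or m + "s" in words]
    if len(moves) > 1:
        warnings.append("more than one camera move: " + ", ".join(moves))
    return warnings


if __name__ == "__main__":
    prompt = build_prompt(
        subject="A dancer in a red dress",
        action="She spins slowly",
        camera="The camera holds a fixed frame",
        constraints="Avoid jitter, avoid bent limbs",
    )
    print(prompt)
    print(lint_prompt("A fast pan and a zoom across the product"))
```

Keeping subject motion and camera motion in separate parts also enforces the two-sentence split recommended above.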


Credits

  • Credits are spent when you generate; uploading and downloading are free

  • The exact credit cost is shown in the prompt box before you commit

  • Video cost scales with model × duration × resolution, plus any references used

  • Failed generations are refunded automatically

  • Credits don't roll over between billing cycles

  • Credits are not refundable
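To see how the scaling rule plays out, here is a toy estimate. Every number in it is a placeholder invented for this example, not Magicfit pricing, and the multiplicative form is only one reading of "model × duration × resolution, plus any references used". The real cost is always the one shown in the prompt box.

```python
# All rates below are hypothetical placeholders, not Magicfit pricing.
MODEL_RATE = {"example-model": 2.0}               # credits per second
RESOLUTION_FACTOR = {"720p": 1.0, "1080p": 1.5}   # multiplier per resolution
REFERENCE_FEE = 1.0                               # credits per reference used


def estimate_credits(model: str, duration_s: int, resolution: str,
                     references: int = 0, count: int = 1) -> float:
    """Toy estimate: model rate x duration x resolution, plus references."""
    per_video = MODEL_RATE[model] * duration_s * RESOLUTION_FACTOR[resolution]
    per_video += references * REFERENCE_FEE
    return count * per_video


if __name__ == "__main__":
    # One 4-second 720p test clip versus two 10-second 1080p variations.
    print(estimate_credits("example-model", 4, "720p"))
    print(estimate_credits("example-model", 10, "1080p", references=2, count=2))
```

Even with made-up rates, the comparison shows why testing a single short, lower-resolution clip first is cheaper than scaling up straight away.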


Common mistakes

  • Overloading the prompt: Too many actions in one generation muddies the result

  • A low‑quality first frame: The opening image sets the ceiling for the whole video

  • Mismatched first and last frames: Different subjects morph; different lighting shifts color

  • Mixing camera and subject motion: Split them into two sentences

  • Generating five variations before testing one: Test a single output, refine the prompt, then scale
