Character-driven video from references
Wan 2.7 Reference to Video is a latest-generation AI video model that transforms your reference images, videos, and text prompts into stunning, coherent video content. Designed for creators who need to bring characters and scenes to life with consistency and cinematic quality, this model excels at generating videos that faithfully preserve the appearance of subjects you provide — whether that's a specific character, an object, or even a stylized look drawn from your own visual references.
At its core, this model solves one of the most challenging problems in AI video generation: maintaining visual identity across frames and shots. By uploading reference images or videos of your characters and objects, you give the model a clear visual anchor. Combine that with a descriptive text prompt, and Wan 2.7 produces videos with enhanced motion smoothness, superior scene fidelity, and greater visual coherence than previous generations. The result is video content that feels intentional and polished — not random or inconsistent.
Who Is This For?
Wan 2.7 Reference to Video is built for a wide range of creative professionals. Filmmakers and video producers can use it to rapidly prototype scenes, pre-visualize storyboards, or generate supplementary footage featuring consistent characters. Animators and motion designers can leverage reference images to maintain a character's look across multiple generated clips. Content creators working on social media, music videos, or branded content can produce stylized, eye-catching video from just a handful of reference materials and a written description. Concept artists and designers can explore how their still artwork might translate into motion, testing cinematic ideas before committing to a full production pipeline.
What You Can Create
The model generates video up to 1080p resolution, giving you crisp, high-definition output suitable for professional use. You can also choose 720p if you prefer faster iteration or smaller file sizes. Videos can range from 2 to 10 seconds in duration, making it easy to generate anything from a quick motion snippet to a more developed scene.
One of the standout features is support for multiple aspect ratios. You can generate widescreen 16:9 videos ideal for cinematic and YouTube-style content, vertical 9:16 videos perfect for social platforms like TikTok and Instagram Reels, square 1:1 formats for social media posts, or 4:3 and 3:4 ratios for more traditional or portrait-oriented compositions. This flexibility means you can tailor your output to any platform or creative context without cropping or reformatting.
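The output options above (720p/1080p, 2–10 second duration, five aspect ratios) can be sketched as a small request-building helper. This is an illustrative sketch only — the parameter names `resolution`, `aspect_ratio`, and `duration` are assumptions based on the capabilities described here, not a confirmed API schema.

```python
# Hypothetical sketch of assembling output settings for a Wan 2.7
# request. Field names are assumptions, not a confirmed API schema;
# the value ranges come from the documented capabilities above.

VALID_RESOLUTIONS = {"720p", "1080p"}
VALID_ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}

def output_settings(resolution="1080p", aspect_ratio="16:9", duration=5):
    """Validate and bundle the documented output options."""
    if resolution not in VALID_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(VALID_RESOLUTIONS)}")
    if aspect_ratio not in VALID_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(VALID_ASPECT_RATIOS)}")
    if not 2 <= duration <= 10:
        raise ValueError("duration must be between 2 and 10 seconds")
    return {
        "resolution": resolution,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
    }
```

For example, `output_settings(aspect_ratio="9:16", duration=8)` would describe an 8-second vertical clip for TikTok or Reels.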
Reference-Driven Generation
What sets this model apart is its reference-driven approach. You can upload one or more reference images to define the appearance of characters or objects in your video. Need two distinct characters interacting in a scene? Provide separate reference images for each, and the model handles multi-subject generation. You can also supply reference videos, which inform both the appearance and the motion style of your subjects. This is incredibly powerful for maintaining continuity — imagine generating multiple clips of the same character in different settings, all looking consistent.
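A multi-subject request along these lines might bundle one reference per character alongside the prompt. The field names below (`reference_image_urls`, `reference_video_urls`) are hypothetical, chosen for illustration; check the model's actual API reference for the real schema.

```python
# Hypothetical sketch of a multi-subject, reference-driven request.
# Field names are illustrative assumptions, not the model's real API.

def reference_payload(prompt, image_refs=(), video_refs=()):
    """Attach one or more reference images/videos to a text prompt."""
    payload = {"prompt": prompt}
    if image_refs:
        payload["reference_image_urls"] = list(image_refs)
    if video_refs:
        payload["reference_video_urls"] = list(video_refs)
    return payload

# Two distinct characters, each anchored by its own reference image:
req = reference_payload(
    "The knight and the fox walk through a rain-soaked market",
    image_refs=["knight.png", "fox.png"],
)
```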
The model also supports tags for stylized transformations and lip sync capabilities, opening up creative possibilities for character animation and dialogue-driven scenes.
Creative Controls
Your primary creative tool is the text prompt, which can be up to 5,000 characters long — giving you ample room to describe complex scenes, moods, camera movements, and narrative details. You can also use a negative prompt (up to 500 characters) to steer the model away from undesirable qualities, such as low resolution, visual artifacts, or specific styles you want to avoid.
A particularly exciting feature is multi-shot mode. When enabled, the model intelligently segments your video into multiple shots rather than producing a single continuous take. This is ideal for creating narrative sequences or dynamic edits that feel more like professionally cut footage. When left off, you get a smooth, uninterrupted single shot — perfect for establishing shots, character reveals, or flowing motion pieces.
For projects that require reproducibility, a seed value lets you lock in specific results. If you generate a video you love and want to recreate it exactly — or make slight prompt adjustments while keeping the same visual foundation — using the same seed ensures consistent output. This is invaluable for iterative creative workflows where you're refining a concept step by step.
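The creative controls above — the 5,000-character prompt, 500-character negative prompt, multi-shot toggle, and seed — can be sketched as a small validation helper. The field names `multi_shot` and `seed` are illustrative assumptions; only the character limits come from this page.

```python
# Sketch of prompt-length checks plus seed-based reproducibility,
# using the limits stated above. Field names are assumptions.

MAX_PROMPT = 5000
MAX_NEGATIVE = 500

def creative_controls(prompt, negative_prompt="", multi_shot=False, seed=None):
    """Validate and bundle the documented creative controls."""
    if len(prompt) > MAX_PROMPT:
        raise ValueError(f"prompt exceeds {MAX_PROMPT} characters")
    if len(negative_prompt) > MAX_NEGATIVE:
        raise ValueError(f"negative prompt exceeds {MAX_NEGATIVE} characters")
    controls = {"prompt": prompt, "multi_shot": multi_shot}
    if negative_prompt:
        controls["negative_prompt"] = negative_prompt
    if seed is not None:
        # Reusing the same seed with the same inputs reproduces the result.
        controls["seed"] = seed
    return controls
```

Reusing the returned `seed` across runs is what makes step-by-step prompt refinement reproducible.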
Quality and Coherence
Wan 2.7 represents a generational leap in AI video quality, built on three core strengths: enhanced motion smoothness, meaning characters and objects move naturally without jittering or unnatural transitions; superior scene fidelity, ensuring the environments and settings you describe are rendered with accuracy and detail; and greater visual coherence, so elements within your video maintain their appearance and spatial relationships from frame to frame.
Content Safety
The model includes a built-in content moderation system that is enabled by default, screening both your inputs and the generated output. This helps ensure that the content you create stays within appropriate boundaries.
Practical Considerations
When working with reference images, each file can be up to 20 MB, while reference videos can be up to 100 MB each. These generous limits mean you can provide high-quality source material without heavy compression. Keep in mind that the model works best when your text prompt clearly describes the scene you want, and your reference materials provide clean, well-lit depictions of the subjects you want to feature.
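A pre-upload check against the stated limits (20 MB per reference image, 100 MB per reference video) might look like the following. The helper name is illustrative, not part of any official tooling.

```python
# Sketch of pre-upload size checks matching the documented limits:
# 20 MB per reference image, 100 MB per reference video.
import os

MAX_IMAGE_BYTES = 20 * 1024 * 1024
MAX_VIDEO_BYTES = 100 * 1024 * 1024

def check_reference_size(path, kind="image"):
    """Return the file size in bytes, or raise if it exceeds the limit."""
    limit = MAX_IMAGE_BYTES if kind == "image" else MAX_VIDEO_BYTES
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"{path} is {size} bytes; {kind} limit is {limit}")
    return size
```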
Whether you're building a character-driven narrative, generating stylized social content, prototyping cinematic sequences, or exploring motion design concepts, Wan 2.7 Reference to Video gives you a powerful, flexible tool to turn your creative vision into moving imagery with remarkable consistency and quality.
Example prompt: "A woman kneeling in darkness, illuminated by a warm, radiant beam of light emerging from her raised hand."
Describe your video scene with motion, camera angles, and mood
The model creates cinematic motion with natural physics and lighting
Download and share your production-ready video