
MAGREF

Masked Guidance for Any-Reference Video Generation


Abstract

Video generation has made substantial strides with the emergence of deep generative models, especially diffusion-based approaches. However, video generation based on multiple reference subjects still faces significant challenges in maintaining multi-subject consistency and ensuring high generation quality. In this paper, we propose MAGREF, a unified framework for any-reference video generation that introduces masked guidance to enable coherent multi-subject video synthesis conditioned on diverse reference images and a textual prompt. Specifically, we propose (1) a region-aware dynamic masking mechanism that enables a single model to flexibly handle inference with various subject types, including humans, objects, and backgrounds, without architectural changes, and (2) a pixel-wise channel concatenation mechanism that operates on the channel dimension to better preserve appearance features. Our model delivers state-of-the-art video generation quality, generalizing from single-subject training to complex multi-subject scenarios with coherent synthesis and precise control over individual subjects, outperforming existing open-source and commercial baselines. To facilitate evaluation, we also introduce a comprehensive multi-subject video benchmark. Extensive experiments demonstrate the effectiveness of our approach, paving the way for scalable, controllable, and high-fidelity multi-subject video synthesis. We will open-source the implementation and checkpoints of MAGREF, together with the evaluation benchmark, in the future.
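The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the MAGREF implementation, which this page does not detail. The function names `compose_reference_canvas` and `build_model_input`, the left-to-right canvas layout, and the choice to repeat the reference canvas on every frame are all assumptions made for illustration only.

```python
import numpy as np


def compose_reference_canvas(refs, canvas_hw):
    """Place each reference image in its own region of a blank canvas.

    refs: list of (C, h, w) arrays, one per reference subject.
    Returns the canvas and a binary mask marking the occupied regions.
    The side-by-side layout is a hypothetical choice, not taken from the paper.
    """
    height, width = canvas_hw
    channels = refs[0].shape[0]
    canvas = np.zeros((channels, height, width), dtype=np.float32)
    mask = np.zeros((1, height, width), dtype=np.float32)
    x = 0
    for ref in refs:
        _, h, w = ref.shape
        if h > height or x + w > width:
            raise ValueError("reference does not fit on the canvas")
        canvas[:, :h, x:x + w] = ref   # copy the reference pixels into its region
        mask[:, :h, x:x + w] = 1.0     # mark that region as reference content
        x += w
    return canvas, mask


def build_model_input(noisy_frames, canvas, mask):
    """Concatenate noisy frames, reference canvas, and mask along channels.

    noisy_frames: (T, C, H, W). The canvas and mask are repeated on every
    frame here, which is an assumption of this sketch.
    """
    t = noisy_frames.shape[0]
    canvas_t = np.broadcast_to(canvas, (t,) + canvas.shape)
    mask_t = np.broadcast_to(mask, (t,) + mask.shape)
    return np.concatenate([noisy_frames, canvas_t, mask_t], axis=1)


# Two toy "references" of different widths on a 4x8 canvas, 5 frames.
refs = [np.ones((3, 4, 4), dtype=np.float32),
        2 * np.ones((3, 4, 2), dtype=np.float32)]
canvas, mask = compose_reference_canvas(refs, (4, 8))
model_input = build_model_input(np.zeros((5, 3, 4, 8), dtype=np.float32), canvas, mask)
```

Because the number and type of references only change the contents of the canvas and mask, not the input shape, a scheme of this kind could accept humans, objects, or backgrounds without altering the network, which is the property the abstract attributes to masked guidance.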

Single-ID Video Generation
Given a single facial reference, MAGREF generates identity-consistent videos across diverse scenes and actions. Our method ensures high-fidelity facial appearance preservation while animating the subject in alignment with the prompt, enabling vivid storytelling and expressive motion grounded in a single identity.
A man standing at an easel, focused intently as his brush dances across the canvas. His expression is one of deep concentration, with a hint of satisfaction as each brushstroke adds color and form. He wears a paint-splattered apron, and his hands move with confident precision.
A man with a rugged beard, wearing a leather jacket, riding a vintage motorcycle along a desert highway. His expression is focused, eyes narrowed slightly against the wind, as the setting sun casts a warm glow over the landscape. The highway stretches endlessly, bordered by arid land with occasional cacti and rocky outcrops. The motorcycle roars smoothly, leaving a light trail of dust. In the distance, hazy mountains are silhouetted against the amber sky.
A man is hiking through a dense forest. He is wearing a backpack, hiking boots, and a cap, with a light jacket to protect against the wind. The trail is narrow, surrounded by tall trees and thick underbrush. The man pauses to take in the scenery, breathing in the fresh air while the sounds of birds and rustling leaves fill the background.
A man sitting at a rustic wooden table in a café, sipping his coffee. He has short black hair and is dressed in a casual button-up shirt and jeans. The café has a warm, cozy atmosphere, with soft lighting and a few patrons in the background. The man seems relaxed as he scrolls through his phone, enjoying the peaceful moment away from the hustle of the day.
A girl standing by the beach, her long black hair gently swaying in the breeze. She is wearing a flowy, white sundress and has her hands on her hips. The golden sand stretches out behind her, with the waves softly crashing on the shore. The sun is setting in the background, casting a warm orange glow over the scene.
A man is playing an acoustic guitar on a beach at sunset. He is wearing a simple t-shirt, shorts, and sunglasses, with his feet buried in the warm sand. The sun is dipping below the horizon, casting an orange glow across the sky and reflecting off the ocean. The sound of gentle waves adds to the relaxing atmosphere as the man strums a soft tune.
A man celebrating his birthday, holding a piece of cake while he prepares to blow out the candles. He is smiling warmly, and as he closes his eyes to make a wish, a hint of emotion crosses his face—perhaps a moment of nostalgia or excitement. Around him, soft, ambient lighting from nearby candles casts a warm glow, and there are a few balloons and decorations in the background, suggesting a cozy celebration at home. He then blows out the candles gently, and his smile grows wider, radiating happiness and satisfaction.
A young boy sitting at a table, eating a piece of food. He appears to be enjoying his meal, as he takes a bite and chews it. The boy is wearing a blue shirt and has short hair. The background is dark, with some light coming from the left side of the frame. There is a straw visible on the right side of the frame, suggesting that there may be a drink next to the boy's plate.
A boy walking along a city street, filmed in black and white on a classic 35mm camera. His expression is thoughtful, his brow slightly furrowed as if he's lost in contemplation. The film grain adds a textured, timeless quality to the image, evoking a sense of nostalgia. Around him, the cityscape is filled with vintage buildings, cobblestone sidewalks, and softly blurred figures passing by, their outlines faint and indistinct. Streetlights cast a gentle glow, while shadows play across the boy's path, adding depth to the scene.
A man sitting on a park bench under a large oak tree, reading a book. He has a beard and is wearing a casual sweater and jeans. The park is quiet and green, with sunlight filtering through the tree branches. The man seems completely absorbed in his book, occasionally glancing up to enjoy the peaceful surroundings.
A young man standing outdoors in a snowy park. He is wearing a colorful winter jacket with a floral pattern and a white knit hat. The background shows a snowy landscape with trees, benches, and a metal fence. The ground is covered in snow, and there is a light snowfall in the air. The man appears to be enjoying the winter weather, as he smiles and gives a thumbs-up gesture towards the camera. The overall atmosphere of the video is cheerful and festive, capturing the beauty of a snowy day in a park.
A young man who appears to be a content creator or streamer. He is wearing a green sleeveless top and red headphones. The background is illuminated with vibrant neon lights, predominantly in shades of purple and blue, creating a lively and energetic atmosphere. The man is seated in front of a microphone, suggesting he is recording a podcast, streaming a live broadcast, or engaging in some form of online communication. The setting appears to be a well-lit room with a curtain and a lamp visible in the background, adding to the cozy and inviting ambiance.
A woman sitting at a red plastic table on a lively Beijing night market street. She is casually dressed and uses plastic gloves to peel spicy crayfish while smiling and chatting with local diners. Lanterns sway above, and the sizzling sounds of street food fill the air. The video captures a surprising cultural crossover, blending global celebrity with authentic local experience.
A woman is sitting at a café terrace, enjoying a cup of tea. She is wearing a light blouse and has her hair pulled back into a bun. The table in front of her is set with a small plate of pastries, and she takes a slow sip from her cup, savoring the moment. The bustling city street in the background creates a lively but relaxed atmosphere, as people walk by and cars pass in the distance.
A woman jogging along a trail beside a serene lake. She has short, curly hair and is wearing athletic wear and sneakers. The surrounding trees and the shimmering water create a peaceful atmosphere, while the woman maintains a steady pace, focusing on her exercise. The morning sunlight casts a soft glow on the scene, adding to the sense of calm.
Multi-ID Video Generation
MAGREF enables coherent video generation conditioned on multiple distinct reference identities. Given separate facial images for each subject, our framework preserves individual identity features while synthesizing videos with realistic multi-person interactions, consistent expressions, and contextual scene alignment. It supports diverse configurations including two-person pairs, group scenarios, and subject-to-scene integration.
Two individuals taking a selfie together in an indoor setting. The man is holding a smartphone with his right hand extended forward, capturing the photo. He is dressed in a light gray blazer over a white shirt. The woman beside him has long blonde hair and is wearing a white top. Both are smiling broadly, appearing cheerful and engaged in the moment. The background suggests they are in a modern office or a similar professional environment, with a mix of neutral tones and greenery visible behind them. The lighting is bright and even, likely from overhead fluorescent lights, which illuminates the scene clearly without harsh shadows.
Two men taking a selfie together in an indoor setting. One of them, with a bright and expressive smile, holds the smartphone at arm’s length to frame the shot. He has voluminous, natural-textured hair and appears enthusiastic and energetic. Standing beside him is another man with neatly styled hair and a composed expression, wearing a white athletic jersey with black accents.
The video captures a scene inside a modern café with large glass windows overlooking a busy city street. Two men are seated across from each other at a wooden table, engaged in a thoughtful conversation. The man on the left is dressed in a navy blue suit with a crisp white shirt and a vibrant orange tie, complemented by a red and white boutonnière on his lapel. He smiles warmly as he speaks, his expression animated and enthusiastic.
A group of three young women enjoying a casual outdoor gathering on a patio. They are seated on chairs and a bench, surrounded by greenery and potted plants, suggesting a relaxed, sunny day. The setting appears to be a residential area with white railings and balconies visible in the background. The women are holding drinks in glass bottles, with one containing a light-colored beverage and another a darker, possibly fruit-based drink. They are clinking their bottles together in a toast, indicating camaraderie and celebration. One woman is lying back on a cushioned chair, using her smartphone, while the others sit upright, smiling and engaging with each other.
Three individuals seated outdoors on a bench, engaged with electronic devices. The setting appears to be a park or a similar public space, with trees and greenery visible in the background. The lighting suggests it is daytime, possibly late afternoon given the softness of the light. The person on the left, wearing a green and white checkered shirt, is focused on a smartphone, occasionally gesturing with his hands as if explaining something. The individual in the middle, dressed in a beige sweater, holds a tablet and looks intently at it, occasionally turning his head slightly towards the person next to him.
The video captures a lively scene featuring three women dressed in glamorous, sequined outfits, celebrating with gifts. They stand against a backdrop of twinkling fairy lights, creating a festive atmosphere. Each woman holds a wrapped gift box, which they occasionally toss into the air, adding to the celebratory mood. Their movements are energetic; they dance and jump, their hair bouncing with each step. The camera remains static throughout, focusing on capturing the joyful expressions and interactions among the women. The overall setting suggests a party or holiday celebration, with the bright lights and festive attire enhancing the cheerful ambiance.
The video depicts an indoor scene where two individuals are engaged in a conversation. The setting appears to be a well-lit room with natural light streaming through large windows. The room is furnished with a wheelchair positioned near the back, suggesting it might be a medical or care facility. A desk with a laptop and some other items is visible on the right side of the frame. The person on the left, dressed in a gray uniform, is holding a tablet and gesturing with her hands while speaking. Her posture indicates she is explaining something, possibly related to the tablet's content.
The video captures a cozy indoor scene featuring a couple sitting closely on a beige couch. The woman is holding a tablet and appears engaged with its content, smiling warmly. The man, seated beside her, leans in affectionately, his arm around her shoulders, and gestures towards the tablet screen, possibly discussing something amusing or interesting. Both individuals exhibit relaxed and happy expressions. The background reveals a simple, modern living room setting with neutral tones.
The video captures a serene and static scene of two individuals seated on a couch in a cozy living room setting. The person on the left is holding a blue bottle, possibly a beverage, while the individual on the right is holding a white bowl, which could contain snacks. They are both dressed casually, with the person on the left wearing a light-colored top and the one on the right in a red and black checkered shirt. The room is warmly lit by a floor lamp to their left, casting a soft glow that enhances the intimate atmosphere.
The video captures a scene set indoors, likely in an art studio or classroom, where two individuals are engaged in a creative activity. The primary focus is on a man seated at an easel, actively painting on a canvas. He is wearing a white shirt and a dark apron splattered with paint, indicating his involvement in the artistic process. His posture suggests concentration as he works on his artwork. A woman stands beside him, leaning slightly forward with her arms around his shoulders. She is dressed in a denim jacket and a patterned dress, her hands resting gently on his shoulders and occasionally gesturing towards the canvas.
The video depicts an emotional scene set outdoors, likely in a park or wooded area, given the blurred greenery and trees in the background. The lighting suggests it is daytime, possibly overcast due to the soft shadows. A man and a woman are the central figures. The man, wearing a light blue denim shirt, has his arm around the woman's shoulder, offering comfort. The woman, dressed in a brown leather jacket, appears distressed, covering her face with her hands at one point. Her posture and facial expression suggest she might be crying or overwhelmed by emotion.
The video opens with a serene outdoor setting in a forest, featuring a green tent and two individuals seated on a log. The scene is set during the daytime, with clear weather and sunlight filtering through the trees. The individuals are dressed casually, with one person wearing a white tank top and patterned shorts, and the other in a plaid shirt and dark pants.
ID-Object-Environment Video Generation
MAGREF enables compositional video synthesis by incorporating multiple reference types—including human identities, objects, and background environments. Our model generates coherent and natural interactions across these modalities, supporting scenarios such as pet companionship, character-object interplay, and contextual scene alignment.
A man sitting on the grass in the park, a dog walking around him.
A man sitting in the park, a cat walking around his feet.
A man sitting in the office, a cat sitting on his legs.
A man is standing on the beach, holding a dog in his arms.
A man feeding a bird in the park.
A man standing in the park watching the flying bird.
A man playing with his dog in front of the house.
A woman stands on the bridge, wearing a black one-piece swimsuit with the Nike Air logo across the front. The swimsuit fits snugly, and she poses confidently against the backdrop of the bridge, her arms relaxed by her sides.
A woman stands in front of a shop, wearing a simple black oversized t-shirt with a small logo on the chest. The t-shirt is paired with casual attire, and she stands confidently, holding nothing in her hands as she gazes around the scene.

Ethics Concerns

All images featured in these demos are either generated by models or sourced from publicly available datasets, and are intended solely for the purpose of demonstrating the technical capabilities of our research. If you believe any content infringes upon rights or raises ethical concerns, please contact us. We will address the issue and remove the material promptly.