MAGREF

Masked Guidance for Any-Reference Video Generation with Subject Disentanglement

Intelligent Creation, ByteDance

Episodes Created from MAGREF-generated Videos

Turn on the sound for a better viewing experience

Abstract

We tackle the task of any-reference video generation, which aims to synthesize videos conditioned on arbitrary types and combinations of reference subjects, together with textual prompts. This task faces persistent challenges, including identity inconsistency, entanglement among multiple reference subjects, and copy-paste artifacts. To address these issues, we introduce MAGREF, a unified and effective framework for any-reference video generation. Our approach incorporates masked guidance and a subject disentanglement mechanism, enabling flexible synthesis conditioned on diverse reference images and textual prompts. Specifically, masked guidance employs a region-aware masking mechanism combined with pixel-wise channel concatenation to preserve appearance features of multiple subjects along the channel dimension. This design preserves identity consistency and maintains the capabilities of the pre-trained backbone, without requiring any architectural changes. To mitigate subject confusion, we introduce a subject disentanglement mechanism which injects the semantic values of each subject derived from the text condition into its corresponding visual region. Additionally, we establish a four-stage data pipeline to construct diverse training pairs, effectively alleviating copy-paste artifacts. Extensive experiments on a comprehensive benchmark demonstrate that MAGREF consistently outperforms existing state-of-the-art approaches, paving the way for scalable, controllable, and high-fidelity any-reference video synthesis.
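
The two mechanisms named in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the function names, tensor shapes, the single-frame setting, the additive injection, and the `scale` parameter are all assumptions made for illustration. The first function places each reference on a blank canvas, builds a binary region mask, and concatenates noisy latent, canvas, and mask along the channel axis. The second adds a per-subject vector (standing in for a text-derived value) only inside that subject's region.

```python
import numpy as np


def build_masked_condition(noisy_latent, reference_latents, offsets):
    """Concatenate a noisy latent, a reference canvas, and a region mask on channels.

    Hypothetical shapes (assumptions, not taken from the paper):
      noisy_latent:      (C, H, W) array standing in for one latent frame
      reference_latents: list of (C, h_i, w_i) arrays, one per reference subject
      offsets:           list of (top, left) positions for each reference
    Returns an array of shape (2 * C + 1, H, W).
    """
    c, h, w = noisy_latent.shape
    canvas = np.zeros((c, h, w), dtype=noisy_latent.dtype)
    mask = np.zeros((1, h, w), dtype=noisy_latent.dtype)
    for ref, (top, left) in zip(reference_latents, offsets):
        _, rh, rw = ref.shape
        # Paste the reference into its own region and mark that region in the mask.
        canvas[:, top:top + rh, left:left + rw] = ref
        mask[:, top:top + rh, left:left + rw] = 1.0
    # Pixel-wise concatenation along the channel dimension.
    return np.concatenate([noisy_latent, canvas, mask], axis=0)


def inject_subject_values(visual_tokens, subject_masks, subject_values, scale=1.0):
    """Add each subject's vector only inside that subject's visual region.

    Hypothetical shapes (assumptions):
      visual_tokens:  (H, W, D) visual features
      subject_masks:  (S, H, W) binary masks, one per subject
      subject_values: (S, D) one vector per subject, standing in for text-derived values
    """
    # For every pixel, sum the vectors of the subjects whose mask covers it.
    injected = np.einsum("shw,sd->hwd", subject_masks, subject_values)
    return visual_tokens + scale * injected
```

Because the condition enters only as extra input channels in this sketch, the backbone layers themselves are untouched, which is consistent with the abstract's claim that no architectural changes are required. How the pre-trained model's input projection is adapted to the wider input is not described here and is left out.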

Single-ID Video Generation
Given a single facial reference, MAGREF generates identity-consistent videos across diverse scenes and actions. Our method ensures high-fidelity facial appearance preservation while animating the subject in alignment with the prompt, enabling vivid storytelling and expressive motion grounded in a single identity.
A man standing at an easel, focused intently as his brush dances across the canvas. His expression is one of deep concentration, with a hint of satisfaction as each brushstroke adds color and form. He wears a paint-splattered apron, and his hands move with confident precision.
A man sitting at a rustic wooden table in a café, sipping his coffee. He has short black hair and is dressed in a casual button-up shirt and jeans. The café has a warm, cozy atmosphere, with soft lighting and a few patrons in the background. The man seems relaxed as he scrolls through his phone, enjoying the peaceful moment away from the hustle of the day.
A girl standing by the beach, her long black hair gently swaying in the breeze. She is wearing a flowy, white sundress and has her hands on her hips. The golden sand stretches out behind her, with the waves softly crashing on the shore. The sun is setting in the background, casting a warm orange glow over the scene.
A man is playing an acoustic guitar on a beach at sunset. He is wearing a simple t-shirt, shorts, and sunglasses, with his feet buried in the warm sand. The sun is dipping below the horizon, casting an orange glow across the sky and reflecting off the ocean. The sound of gentle waves adds to the relaxing atmosphere as the man strums a soft tune.
A young boy sitting at a table, eating a piece of food. He appears to be enjoying his meal, as he takes a bite and chews it. The boy is wearing a blue shirt and has short hair. The background is dark, with some light coming from the left side of the frame. There is a straw visible on the right side of the frame, suggesting that there may be a drink next to the boy's plate.
A young man standing outdoors in a snowy park. He is wearing a colorful winter jacket with a floral pattern and a white knit hat. The background shows a snowy landscape with trees, benches, and a metal fence. The ground is covered in snow, and there is a light snowfall in the air. The man appears to be enjoying the winter weather, as he smiles and gives a thumbs-up gesture towards the camera. The overall atmosphere of the video is cheerful and festive, capturing the beauty of a snowy day in a park.
A young man who appears to be a content creator or streamer. He is wearing a green sleeveless top and red headphones. The background is illuminated with vibrant neon lights, predominantly in shades of purple and blue, creating a lively and energetic atmosphere. The man is seated in front of a microphone, suggesting he is recording a podcast, streaming a live broadcast, or engaging in some form of online communication. The setting appears to be a well-lit room with a curtain and a lamp visible in the background, adding to the cozy and inviting ambiance.
A woman sitting at a red plastic table on a lively Beijing night market street. She is casually dressed and uses plastic gloves to peel spicy crayfish while smiling and chatting with local diners. Lanterns sway above, and the sizzling sounds of street food fill the air. The video captures a surprising cultural crossover, blending global celebrity with authentic local experience.
A woman jogging along a trail beside a serene lake. She has short, curly hair and is wearing athletic wear and sneakers. The surrounding trees and the shimmering water create a peaceful atmosphere, while the woman maintains a steady pace, focusing on her exercise. The morning sunlight casts a soft glow on the scene, adding to the sense of calm.
ID-Object Video Generation
MAGREF enables coherent video generation conditioned on a combination of a reference identity and various objects like clothing, hats, eyewear, jewelry, and necklaces. Given separate images of each subject, our framework preserves the unique identity features while synthesizing videos with consistent expressions and accurate contextual scene alignment. It supports diverse configurations, including subject-object interactions, where each subject is integrated with different accessories or environments in a coherent video sequence.
A young woman stands against a gray wall, wearing a pink ribbed dress with short sleeves. She has long dark hair tied back and is wearing round sunglasses with gold frames. Her hand rests on her head as she looks off to the side. She wears a white handbag with blue floral embroidery and a silver chain strap.
A man stands outdoors with a mountainous background. He wears a white tank top, light blue jeans, and a dark cap. His hands are in his pockets, and he has a relaxed smile. The top cloth is a beige Burberry hoodie with a drawstring hood.
A young woman sitting on rocks by the sea, wearing a mint green sheer blouse with a high neckline and long sleeves. The blouse has a ruffled collar and a tie at the neck. The background is a serene blue ocean under a clear sky.
A close-up portrait of a young woman's face and upper body. She is wearing a beige beret, a white shirt with a button-down collar, and a beige cardigan. Her hair is styled in loose waves, and she is wearing large, dangling earrings with a floral design and green accents. The background is blurred, focusing attention on her face and the earrings.
A close-up portrait of a young woman's face and upper body. She is wearing a black turtleneck sweater with intricate floral embroidery on the shoulders. Her hair is styled in a braided headband adorned with sparkling embellishments. She is wearing a green sheer veil with glittering accents. The woman is looking directly at the camera with a slight smile. The background is a dark teal wall with yellow and black abstract designs. The woman is wearing pink-tinted sunglasses with gold frames.
A close-up portrait of a young woman's face and upper body. She is wearing a light blue shirt with a button-down collar and long sleeves. Her hair is styled in loose waves, and she is wearing large hoop earrings. The woman is looking directly at the camera with a neutral expression. She wears black-framed glasses on her face.
A young man stands in an urban setting with a blurred building in the background. He wears a red jacket with white stripes on the shoulders over a gray t-shirt. His curly hair frames his face as he looks directly at the camera with a neutral expression. He wears small diamond earrings. The hat from the second image is worn on his head, featuring a camouflage pattern with the Adidas logo prominently displayed.
A close-up portrait of a young woman's face and upper body. She is wearing a yellow off-the-shoulder top with a thin gold chain necklace around her neck. Her hair is styled in loose waves and she is wearing large white sunglasses. The woman is smiling with her head slightly tilted to the side. The background is blurred, showing a cityscape with buildings and vehicles. The necklace reads "Angel."
A close-up portrait of a young woman leaning on a railing. She is wearing a black leather jacket with a high collar and long sleeves. Her hair is dark and falls over her shoulders. The woman is looking directly at the camera with a neutral expression. The background is blurred, showing an outdoor setting with greenery and a soft light. The woman wears a gold ring with a large emerald stone surrounded by smaller diamonds.
Multi-ID Video Generation
MAGREF enables coherent video generation conditioned on multiple distinct reference identities. Given separate facial images for each subject, our framework preserves individual identity features while synthesizing videos with realistic multi-person interactions, consistent expressions, and contextual scene alignment. It supports diverse configurations including two-person pairs, group scenarios, and subject-to-scene integration.
Two individuals taking a selfie together in an indoor setting. The man is holding a smartphone with his right hand extended forward, capturing the photo. He is dressed in a light gray blazer over a white shirt. The woman beside him has long blonde hair and is wearing a white top. Both are smiling broadly, appearing cheerful and engaged in the moment. The background suggests they are in a modern office or a similar professional environment, with a mix of neutral tones and greenery visible behind them. The lighting is bright and even, likely from overhead fluorescent lights, which illuminates the scene clearly without harsh shadows.
Two men taking a selfie together in an indoor setting. One of them, with a bright and expressive smile, holds the smartphone at arm’s length to frame the shot. He has voluminous, natural-textured hair and appears enthusiastic and energetic. Standing beside him is another man with neatly styled hair and a composed expression, wearing a white athletic jersey with black accents.
The video captures a scene inside a modern café with large glass windows overlooking a busy city street. Two men are seated across from each other at a wooden table, engaged in a thoughtful conversation. The man on the left is dressed in a navy blue suit with a crisp white shirt and a vibrant orange tie, complemented by a red and white boutonnière on his lapel. He smiles warmly as he speaks, his expression animated and enthusiastic.
A group of three young women enjoying a casual outdoor gathering on a patio. They are seated on chairs and a bench, surrounded by greenery and potted plants, suggesting a relaxed, sunny day. The setting appears to be a residential area with white railings and balconies visible in the background. The women are holding drinks in glass bottles, with one containing a light-colored beverage and another a darker, possibly fruit-based drink. They are clinking their bottles together in a toast, indicating camaraderie and celebration. One woman is lying back on a cushioned chair, using her smartphone, while the others sit upright, smiling and engaging with each other.
The video captures a cozy indoor scene featuring a couple sitting closely on a beige couch. The woman is holding a tablet and appears engaged with its content, smiling warmly. The man, seated beside her, leans in affectionately, his arm around her shoulders, and gestures towards the tablet screen, possibly discussing something amusing or interesting. Both individuals exhibit relaxed and happy expressions. The background reveals a simple, modern living room setting with neutral tones.
The video captures a serene and static scene of two individuals seated on a couch in a cozy living room setting. The person on the left is holding a blue bottle, possibly a beverage, while the individual on the right is holding a white bowl, which could contain snacks. They are both dressed casually, with the person on the left wearing a light-colored top and the one on the right in a red and black checkered shirt. The room is warmly lit by a floor lamp to their left, casting a soft glow that enhances the intimate atmosphere.
The video captures a scene set indoors, likely in an art studio or classroom, where two individuals are engaged in a creative activity. The primary focus is on a man seated at an easel, actively painting on a canvas. He is wearing a white shirt and a dark apron splattered with paint, indicating his involvement in the artistic process. His posture suggests concentration as he works on his artwork. A woman stands beside him, leaning slightly forward with her arms around his shoulders. She is dressed in a denim jacket and a patterned dress, her hands resting gently on his shoulders and occasionally gesturing towards the canvas.
The video depicts an emotional scene set outdoors, likely in a park or wooded area, given the blurred greenery and trees in the background. The lighting suggests it is daytime, possibly overcast due to the soft shadows. A man and a woman are the central figures. The man, wearing a light blue denim shirt, has his arm around the woman's shoulder, offering comfort. The woman, dressed in a brown leather jacket, appears distressed, covering her face with her hands at one point. Her posture and facial expression suggest she might be crying or overwhelmed by emotion.
The video opens with a serene outdoor setting in a forest, featuring a green tent and two individuals seated on a log. The scene is set during the daytime, with clear weather and sunlight filtering through the trees. The individuals are dressed casually, with one person wearing a white tank top and patterned shorts, and the other in a plaid shirt and dark pants.
ID-Object-Environment Video Generation
MAGREF enables compositional video synthesis by incorporating multiple reference types, including human identities, objects, and background environments. Our model generates coherent and natural interactions across these modalities, supporting scenarios such as pet companionship, character-object interplay, and contextual scene alignment.
A man sitting on the grass in the park, a dog walking around him.
A man sitting in the park, a cat walking around his feet.
A man sitting in the office, a cat sitting on his legs.
A man is standing on the beach, holding a dog in his arms.
A man feeding a bird in the park.
A man standing in the park watching the flying bird.
A man playing with his dog in front of the house.
A woman stands on the bridge, wearing a black one-piece swimsuit with the Nike Air logo across the front. The swimsuit fits snugly, and she poses confidently against the backdrop of the bridge, her arms relaxed by her sides.
A woman stands in front of a shop, wearing a simple black oversized t-shirt with a small logo on the chest. The t-shirt is paired with casual attire, and she stands confidently, holding nothing in her hands as she gazes around the scene.

Ethics Concerns

All images featured in these demos are either generated by models or sourced from publicly available datasets, and are intended solely for the purpose of demonstrating the technical capabilities of our research. If you believe any content infringes upon rights or raises ethical concerns, please contact us at dengyufan10@stu.pku.edu.cn, and we will address the issue and remove the material promptly.

BibTeX

@article{deng2025magref,
  title={MAGREF: Masked Guidance for Any-Reference Video Generation with Subject Disentanglement},
  author={Deng, Yufan and Guo, Xun and Yin, Yuanyang and Fang, Jacob Zhiyuan and Yang, Yiding and Wang, Yizhi and Yuan, Shenghai and Wang, Angtian and Liu, Bo and Huang, Haibin and others},
  journal={arXiv preprint arXiv:2505.23742},
  year={2025}
}