Text-to-Motion Generation
Designing diffusion-based motion models with stronger semantic alignment, temporal awareness, and frequency-aware supervision.
Undergraduate Researcher in Embodied AI · Shandong University · Qingdao, China
I am an undergraduate researcher at Shandong University working on embodied AI, generative motion models, and robot control. I am interested in a simple but stubborn question: how can we make AI-generated motion not only look good on a screen, but also stand up, stay balanced, and actually move a robot? My work explores diffusion models, flow matching, and dynamics-aligned representations for humanoid motion generation, whole-body control, and cross-modal understanding, aiming to bridge language, motion, and physical execution in the real world.
Studying representation alignment and physically grounded guidance so diffusion models generalize beyond surface statistical shortcuts.
Building efficient radio map reconstruction frameworks with diffusion and flow matching under sparse measurements and noisy conditions.
Developing deployable language-to-motion pipelines for humanoid robots with compact motion representations and edge-cloud orchestration.
Introduces step-aware temporal modulation and late-stage CFG reduction for diffusion motion models, improving semantic alignment and retrieval performance.
Develops an end-to-end diffusion model in DCT space for higher-resolution image generation with stronger efficiency and spectral interpretability.
Proposes REPA-P to align denoising features with physics-aware representations, improving physical consistency and out-of-distribution robustness.
Combines a PINN-based field initializer with a diffusion refiner to reconstruct sparse radio maps accurately under physically constrained settings.
Presents an edge-cloud architecture for language-to-motion humanoid control: a cloud-hosted diffusion model generates 38-DoF motion references, and a lightweight on-device controller performs closed-loop tracking. Validated in MuJoCo and on real hardware.
Proposes DAJI, a hierarchical framework for streaming language-conditioned humanoid control. A future-aware teacher distills Dynamics-Aligned Joint Intent representations that encode support transfer, contact switching, and balance preparation before motion onset. Achieves a 94.42% streaming execution success rate on real humanoid hardware.
Uses temporal semantic anchors and low-frequency motion anchors to improve deep U-Net alignment, gradient flow, and convergence in diffusion motion synthesis.
Introduces OTMS and MMMD, two evaluation metrics designed to better correlate text-to-motion quality with human judgment.
Derives a theoretically grounded projection schedule for diffusion inversion, improving reconstruction quality without substantial extra computation.
Uses deterministic flow matching for fast radio map construction, reducing parameter count and inference time while maintaining reconstruction quality.
Introduces frequency-aware consistency supervision to stabilize motion denoising and improve semantic fidelity in diffusion-based text-to-motion generation.
Applies manifold-aware interpolation and dynamic guidance schedules to keep diffusion sampling closer to valid data trajectories.
LimX Dynamics
Hong Kong University of Science and Technology (Guangzhou)
Feel free to leave a message or discuss collaboration ideas.