Text-to-Motion Generation
Designing diffusion-based motion models with stronger semantic alignment, temporal awareness, and frequency-aware supervision.
Undergraduate Researcher in Embodied AI · Shandong University · Qingdao, China
I am an undergraduate researcher at Shandong University working on embodied AI, generative motion models, and robot control. I am interested in a simple but stubborn question: how can we make AI-generated motion that not only looks good on a screen, but also lets a real robot stand up, stay balanced, and actually move? My work explores diffusion models, flow matching, and dynamics-aligned representations for humanoid motion generation, whole-body control, and cross-modal understanding, with the aim of bridging language, motion, and physical execution in the real world.
Studying representation alignment and physically grounded guidance so diffusion models generalize beyond surface statistical shortcuts.
Building efficient radio map reconstruction frameworks with diffusion and flow matching under sparse measurements and noisy conditions.
Developing deployable language-to-motion pipelines for humanoid robots with compact motion representations and edge-cloud orchestration.
Introduces step-aware temporal modulation and late-stage CFG reduction for diffusion motion models, improving semantic alignment and retrieval performance.
Develops an end-to-end diffusion model in DCT space for higher-resolution image generation with stronger efficiency and spectral interpretability.
Proposes REPA-P to align denoising features with physics-aware representations, improving physical consistency and out-of-distribution robustness.
Combines a PINN-based field initializer with a diffusion refiner to reconstruct sparse radio maps accurately under physically constrained settings.
Builds an edge-cloud language-to-motion system where a diffusion generator proposes robot-native trajectories and a lightweight controller tracks them in simulation and hardware.
Uses temporal semantic anchors and low-frequency motion anchors to improve deep U-Net alignment, gradient flow, and convergence in diffusion motion synthesis.
Introduces OTMS and MMMD, two evaluation metrics designed to better correlate text-to-motion quality with human judgment.
Derives a theoretically grounded projection schedule for diffusion inversion, improving reconstruction quality without substantial extra computation.
Uses deterministic flow matching for fast radio map construction, reducing parameter count and inference time while maintaining reconstruction quality.
Introduces frequency-aware consistency supervision to stabilize motion denoising and improve semantic fidelity in diffusion-based text-to-motion generation.
Applies manifold-aware interpolation and dynamic guidance schedules to keep diffusion sampling closer to valid data trajectories.
Hong Kong University of Science and Technology (Guangzhou)