Haozhe Jia 贾浩哲

Undergraduate Researcher in Embodied AI · Shandong University · Qingdao, China

I am an undergraduate researcher at Shandong University working on embodied AI, generative motion models, and robot control. I am interested in a simple but stubborn question: how can we make AI-generated motion not only look good on a screen, but also stand up, stay balanced, and actually move a robot? My work explores diffusion models, flow matching, and dynamics-aligned representations for humanoid motion generation, whole-body control, and cross-modal understanding, aiming to bridge language, motion, and physical execution in the real world.

Email · GitHub · Google Scholar

Research

Text-to-Motion Generation

Designing diffusion-based motion models with stronger semantic alignment, temporal awareness, and frequency-aware supervision.

Scientific and Physics-Informed Diffusion

Studying representation alignment and physically grounded guidance so diffusion models generalize beyond surface statistical shortcuts.

Wireless Scene Modeling

Building efficient radio map reconstruction frameworks with diffusion and flow matching under sparse measurements and noisy conditions.

Embodied Control Systems

Developing deployable language-to-motion pipelines for humanoid robots with compact motion representations and edge-cloud orchestration.

Selected Publications

ACM MM 2025 · Co-first author · 2025.03 - 2025.06

ANT: Adaptive Neural Temporal-Aware Text-to-Motion Model

Introduces step-aware temporal modulation and late-stage CFG reduction for diffusion motion models, improving semantic alignment and retrieval performance.

View paper

ICML 2025 · Second co-author · 2024.08 - 2024.12

DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space

Develops an end-to-end diffusion model in DCT space for higher-resolution image generation with stronger efficiency and spectral interpretability.

View paper

ICML 2026 · First author · 2025.10 - 2025.12

Learning to Think in Physics: Breaking Shortcut Learning in Scientific Diffusion via Representation Alignment

Proposes REPA-P to align denoising features with physics-aware representations, improving physical consistency and out-of-distribution robustness.

View paper

ACM MM 2025 Oral · First author · 2024.05 - 2024.10

PhyRMDM: Physics-Informed Representation Alignment for Sparse Radio-Map Reconstruction

Combines a PINN-based field initializer with a diffusion refiner to reconstruct sparse radio maps accurately under physically constrained settings.

View paper

Under Review · First author · 2026.01 - 2026.04

ECHO: Edge-Cloud Humanoid Orchestration for Language-to-Motion Control

Presents an edge-cloud architecture for language-to-motion humanoid control: a cloud-hosted diffusion model generates 38-DoF motion references, and a lightweight on-device controller performs closed-loop tracking. Validated in MuJoCo and on real hardware.

View paper

Under Review · First author · 2026.03 - 2026.05

Before the Body Moves: Learning Anticipatory Joint Intent for Language-Conditioned Humanoid Control

Proposes DAJI, a hierarchical framework for streaming language-conditioned humanoid control. A future-aware teacher distills Dynamics-Aligned Joint Intent representations that encode support transfer, contact switching, and balance preparation before motion onset. Achieves a 94.42% streaming execution success rate on real humanoid hardware.

View paper

Under review at ECCV · First author · 2025.06 - 2025.08

LUMA: Low-Dimension Unified Motion Alignment with Dual-Path Anchoring for Text-to-Motion Diffusion Model

Uses temporal semantic anchors and low-frequency motion anchors to improve deep U-Net alignment, gradient flow, and convergence in diffusion motion synthesis.

View paper

WWW 2026 · Co-first author · 2025.05 - 2025.07

Towards Better Evaluation Metrics for Text-to-Motion Generation

Introduces OTMS and MMMD, two evaluation metrics designed to better correlate text-to-motion quality with human judgment.

View paper

arXiv preprint · Collaborating author · 2025.03 - 2025.06

POLARIS: Projection-Orthogonal Least Squares for Robust and Adaptive Inversion in Diffusion Models

Derives a theoretically grounded projection schedule for diffusion inversion, improving reconstruction quality without substantial extra computation.

View paper

Under review at TCNN · First author · 2025.02 - 2025.06

RadioFlow: Efficient Radio Map Construction Framework with Flow Matching

Uses deterministic flow matching for fast radio map construction, reducing parameter count and inference time while maintaining reconstruction quality.

View paper

Under review at ICRA · First author · 2024.09 - 2025.03

Free-T2M: Frequency Enhanced Text-to-Motion Diffusion Model With Consistency Loss

Introduces frequency-aware consistency supervision to stabilize motion denoising and improve semantic fidelity in diffusion-based text-to-motion generation.

View paper

WWW 2026 · Collaborating author · 2024.09 - 2025.02

Guided Path Sampling: Steering Diffusion Models Back on Track with Principled Path Guidance

Applies manifold-aware interpolation and dynamic guidance schedules to keep diffusion sampling closer to valid data trajectories.

View paper

Experience

2025.12 - present · Beijing, China

Embodied AI Algorithm Intern

LimX Dynamics

Developed ECHO, a language-driven humanoid motion control system with a compact 38-DoF action representation.
Built a cloud-edge streaming pipeline in which a cloud-hosted diffusion model generates motion references and a lightweight on-device controller performs closed-loop tracking.
Validated in MuJoCo simulation and on real humanoid hardware.

2024.12 - 2025.12 · Guangzhou, China

Research Assistant

Hong Kong University of Science and Technology (Guangzhou)

Led research on PhyRMDM, Free-T2M, and LUMA, spanning radio map reconstruction and text-driven human motion generation.
Owned the full pipeline from model selection and training through ablation design; all code is open-sourced.
First-author / co-first-author publications at ICML and ACM MM (Oral).

Contact

Email · GitHub · Google Scholar

Leave a Note

Feel free to leave a message or discuss collaboration ideas.