Haozhe Jia 贾浩哲

Undergraduate Researcher in Embodied AI · Shandong University · Qingdao, China

I am an undergraduate researcher at Shandong University working on embodied AI, generative motion models, and robot control. I am interested in a simple but stubborn question: how can we generate motion that not only looks good on a screen, but also keeps a robot upright, balanced, and actually moving? My work explores diffusion models, flow matching, and dynamics-aligned representations for humanoid motion generation, whole-body control, and cross-modal understanding, with the aim of bridging language, motion, and physical execution in the real world.

Research

Text-to-Motion Generation

Designing diffusion-based motion models with stronger semantic alignment, temporal awareness, and frequency-aware supervision.

Scientific and Physics-Informed Diffusion

Studying representation alignment and physically grounded guidance so diffusion models generalize beyond surface statistical shortcuts.

Wireless Scene Modeling

Building efficient radio map reconstruction frameworks with diffusion and flow matching under sparse measurements and noisy conditions.

Embodied Control Systems

Developing deployable language-to-motion pipelines for humanoid robots with compact motion representations and edge-cloud orchestration.

Selected Publications

ACM MM 2025 · Co-first author · 2025.03 - 2025.06

ANT: Adaptive Neural Temporal-Aware Text-to-Motion Model

Introduces step-aware temporal modulation and late-stage CFG reduction for diffusion motion models, improving semantic alignment and retrieval performance.

View paper

ICML 2025 · Second co-author · 2024.08 - 2024.12

DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space

Develops an end-to-end diffusion model in DCT space for higher-resolution image generation with stronger efficiency and spectral interpretability.

View paper

ICML 2026 · First author · 2025.10 - 2025.12

Learning to Think in Physics: Breaking Shortcut Learning in Scientific Diffusion via Representation Alignment

Proposes REPA-P to align denoising features with physics-aware representations, improving physical consistency and out-of-distribution robustness.

ACM MM 2025 Oral · First author · 2024.05 - 2024.10

RMDM: Physics-Informed Representation Alignment for Sparse Radio-Map Reconstruction

Combines a PINN-based field initializer with a diffusion refiner to reconstruct radio maps accurately from sparse measurements under physical constraints.

View paper

Target: IROS · First author · 2025.10 - present

ECHO: Edge-Cloud Humanoid Orchestration for Language-to-Motion Control

Builds an edge-cloud language-to-motion system where a diffusion generator proposes robot-native trajectories and a lightweight controller tracks them in simulation and hardware.

View paper

Under review at ECCV · First author · 2025.06 - 2025.08

LUMA: Low-Dimension Unified Motion Alignment with Dual-Path Anchoring for Text-to-Motion Diffusion Model

Uses temporal semantic anchors and low-frequency motion anchors to improve deep U-Net alignment, gradient flow, and convergence in diffusion motion synthesis.

View paper

WWW 2026 · Co-first author · 2025.05 - 2025.07

Towards Better Evaluation Metrics for Text-to-Motion Generation

Introduces OTMS and MMMD, two evaluation metrics designed to better correlate text-to-motion quality with human judgment.

View paper

arXiv preprint · Collaborating author · 2025.03 - 2025.06

POLARIS: Projection-Orthogonal Least Squares for Robust and Adaptive Inversion in Diffusion Models

Derives a theoretically grounded projection schedule for diffusion inversion, improving reconstruction quality without substantial extra computation.

View paper

Under review at TCNN · First author · 2025.02 - 2025.06

RadioFlow: Efficient Radio Map Construction Framework with Flow Matching

Uses deterministic flow matching for fast radio map construction, reducing parameter count and inference time while maintaining reconstruction quality.

View paper

Under review at ICRA · First author · 2024.09 - 2025.03

Free-T2M: Frequency Enhanced Text-to-Motion Diffusion Model With Consistency Loss

Introduces frequency-aware consistency supervision to stabilize motion denoising and improve semantic fidelity in diffusion-based text-to-motion generation.

View paper

WWW 2026 · Collaborating author · 2024.09 - 2025.02

Guided Path Sampling: Steering Diffusion Models Back on Track with Principled Path Guidance

Applies manifold-aware interpolation and dynamic guidance schedules to keep diffusion sampling closer to valid data trajectories.

View paper

Experience

Guangzhou, China

Research Assistant

Hong Kong University of Science and Technology (Guangzhou)

  • Leading research and implementation across PhyRMDM, Free-T2M, and LUMA, spanning radio map reconstruction and text-driven human motion generation.
  • Responsible for model selection, training pipelines, and ablation design with a strong focus on reproducibility and open implementation.

Contact