Absolute Coordinates Make Motion Generation Easy

Northeastern University

Abstract

State-of-the-art text-to-motion generation models rely on the kinematic-aware, local-relative motion representation popularized by HumanML3D, which encodes motion relative to the pelvis and to the previous frame with built-in redundancy. While this design simplifies training for earlier generation models, it introduces critical limitations for diffusion models and hinders applicability to downstream tasks. In this work, we revisit the motion representation and propose a radically simplified and long-abandoned alternative for text-to-motion generation: absolute joint coordinates in global space. Through systematic analysis of design choices, we show that this formulation achieves significantly higher motion fidelity, improved text alignment, and strong scalability, even with a simple Transformer backbone and no auxiliary kinematic-aware losses. Moreover, our formulation naturally supports downstream tasks such as text-driven motion control and temporal/spatial editing without additional task-specific re-engineering or costly classifier-guidance-based generation from control signals. Finally, we demonstrate promising generalization to directly generating SMPL-H mesh vertex motions from text, laying a strong foundation for future research and motion-related applications.
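The representational shift can be illustrated with a short conversion sketch: given a simplified pelvis-relative, frame-differential encoding in the spirit of HumanML3D (the exact field layout below is an illustrative assumption, not the dataset's precise format), absolute coordinates are recovered by integrating the root trajectory and rotating local offsets into the world frame:

```python
import numpy as np

def to_absolute(root_yaw_vel, root_lin_vel, root_height, local_joints):
    """Recover absolute global joint coordinates from a pelvis-relative,
    frame-differential representation (simplified, HumanML3D-style).

    root_yaw_vel : (T,)      root yaw angular velocity per frame (radians)
    root_lin_vel : (T, 2)    planar (x, z) root velocity in the root-facing frame
    root_height  : (T,)      pelvis height above the ground (y)
    local_joints : (T, J, 3) joint positions relative to the pelvis,
                             expressed in the root-facing frame
    returns      : (T, J, 3) absolute joint positions in world space
    """
    T = local_joints.shape[0]
    yaw = np.cumsum(root_yaw_vel)                 # integrate yaw velocity
    cos, sin = np.cos(yaw), np.sin(yaw)
    # rotate per-frame planar velocity into the world frame, then integrate
    world_vel = np.stack([cos * root_lin_vel[:, 0] + sin * root_lin_vel[:, 1],
                          -sin * root_lin_vel[:, 0] + cos * root_lin_vel[:, 1]],
                         axis=-1)
    root_xz = np.cumsum(world_vel, axis=0)
    # per-frame rotation about the vertical (y) axis
    rot = np.zeros((T, 3, 3))
    rot[:, 0, 0], rot[:, 0, 2] = cos, sin
    rot[:, 1, 1] = 1.0
    rot[:, 2, 0], rot[:, 2, 2] = -sin, cos
    # world[t, k] = R_y(yaw[t]) @ local_joints[t, k] + root position
    world = np.einsum('tij,tkj->tki', rot, local_joints)
    world[:, :, 0] += root_xz[:, 0][:, None]
    world[:, :, 2] += root_xz[:, 1][:, None]
    world[:, :, 1] += root_height[:, None]
    return world
```

The formulation studied in this work skips this integration entirely: the model predicts the (T, J, 3) absolute coordinates directly, so errors in root velocity no longer accumulate along the sequence.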

Architecture Overview

Model and Patch Size Scalability

----------------------------------------------------------------------------------------------------------------------------------
Text-to-Motion Generation
----------------------------------------------------------------------------------------------------------------------------------

Text-to-Motion Generation Gallery

(Our method is capable of generating high-quality 3D human motions that follow textual instructions)


The person puts something on its side and then brings it back to normal.

A person walks forward slowly, their arms swinging slightly, then they turn around.

A person stands still, and then takes one quick step forward.


An individual takes a long slow drink of something.

A person jumps in the air, then abruptly stumbles to his left as if he had been pushed, and finally he regains his balance.

A person runs to their right, then left, then right again, and finally walks back to their starting position.

A person is bent over forward and moves their body left to right like a snake several times.

A person is sitting down, using a phone with their hands, and then puts it up to their ear.

A person who is standing with his hands by his sides turns to the left as he takes four steps and stops.


Comparison with Other Text-to-Motion Methods

(Our method generates motion that is more realistic and more accurately follows the fine details of the textual condition)







A man steps to the right, steps in a small counterclockwise circle, throws his right arm, then steps in a larger backward counterclockwise circle.






MotionLCM V2

Ours

MLD++

MDM

MARDM









A person stands on one leg in a yoga pose.







MotionLCM V2

Ours

MLD++

MDM

MARDM









A person climbs a ladder.







MotionLCM V2

Ours

MLD++

MDM

MARDM









A person turns to the left, takes three steps forward, sits down, and then walks back to the starting place.






MotionLCM V2

Ours

MLD++

MDM

MARDM






A person walks forward, stepping up with their right leg and down with their left, then turns to their left and walks, then turns to their left and starts stepping up.







MotionLCM V2

Ours

MLD++

MDM

MARDM

----------------------------------------------------------------------------------------------------------------------------------
Text-Driven Controllable Motion Generation
----------------------------------------------------------------------------------------------------------------------------------

Comparison with Other Text-Driven Controllable Motion Generation Methods

(Our method generates motion much faster (2.51 seconds) and near-flawlessly follows the user-provided control signals)


A person slowly walks in an S shape while shifting weight between each leg.


Pelvis

Ours + ControlNet

MotionLCM V2 + ControlNet

OmniControl


Left Foot

Ours + ControlNet

MotionLCM V2 + ControlNet

OmniControl


Right Foot

Ours + ControlNet

MotionLCM V2 + ControlNet

OmniControl


Head

Ours + ControlNet

MotionLCM V2 + ControlNet

OmniControl


Left Wrist

Ours + ControlNet

MotionLCM V2 + ControlNet

OmniControl


Right Wrist

Ours + ControlNet

MotionLCM V2 + ControlNet

OmniControl
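Each comparison above constrains a single joint's trajectory (pelvis, a foot, the head, or a wrist). Under absolute coordinates such a control signal can be written into the motion tensor directly; a minimal sketch, where the joint indices and the helper are hypothetical assumptions rather than the paper's implementation:

```python
import numpy as np

# Hypothetical joint indices for a 22-joint SMPL-style skeleton; the exact
# indexing here is an illustrative assumption, not the paper's layout.
JOINTS = {"pelvis": 0, "left_foot": 10, "right_foot": 11,
          "head": 15, "left_wrist": 20, "right_wrist": 21}

def make_control_signal(num_frames, joint_name, trajectory,
                        frames=None, num_joints=22):
    """Build a dense (T, J, 3) control tensor plus a binary mask marking
    which entries are constrained. Because motion is stored in absolute
    world-space coordinates, the desired trajectory is written in directly,
    with no conversion to root-relative velocities.
    """
    signal = np.zeros((num_frames, num_joints, 3))
    mask = np.zeros((num_frames, num_joints, 1))
    j = JOINTS[joint_name]
    frames = range(num_frames) if frames is None else frames
    for f, pos in zip(frames, trajectory):
        signal[f, j] = pos     # target world-space position for joint j
        mask[f, j] = 1.0       # mark this (frame, joint) as constrained
    return signal, mask

# Constrain the pelvis at two keyframes of a 60-frame motion:
sig, m = make_control_signal(60, "pelvis",
                             [[0.0, 0.9, 0.0], [1.0, 0.9, 2.0]],
                             frames=[0, 59])
```

Sparse keyframe constraints and dense full-trajectory constraints are both just different fill patterns of the same mask, which is what lets one conditioning module cover all six joints shown above.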


Text-Driven Controllable Motion Generation Gallery

(Our method is capable of generating high-quality 3D human motions following textual instructions and control signals)


A person slowly walks forward with arms swinging, turns in a clockwise circle.

Person walks quickly down a short incline.

A person takes eight steps forming a complete circle.

A person shifts rightwards and then shifts back.

The toon is walking across the plane in a diagonal pattern, reaching the end of the plane and turning around.

A person walks in a curve to the left.

This person stumbles right and left while moving forward.

A person carefully stepping backwards.

A person picks something up with each hand and then stacks the item from their left hand on top of the item in their right hand.

A person jogging in place.

The person is moving from side to side.

A person jumps in place.

Spatial Editing

(Our method is capable of spatially editing 3D human motions)


Original: A person bends to collect something, turns and goes back.
Editing Upper Body: A person holds his hands up.


Original: A person is sitting on a chair.
Editing Lower Body: A person performs a kick.
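The spatial edits above amount to a per-joint inpainting mask: one body part keeps its original absolute coordinates while the other is regenerated from the new prompt. A minimal sketch, assuming a 22-joint skeleton whose index split is a hypothetical illustration rather than the paper's exact partition:

```python
import numpy as np

# Illustrative lower-body indices for a 22-joint SMPL-style skeleton;
# the exact split is an assumption, not taken from the paper.
LOWER_BODY = [0, 1, 2, 4, 5, 7, 8, 10, 11]   # pelvis, hips, knees, ankles, feet
UPPER_BODY = [j for j in range(22) if j not in LOWER_BODY]

def spatial_edit_mask(num_frames, edit_part="upper"):
    """Binary mask of shape (T, 22, 1): 1 marks joints to regenerate from
    the new text prompt, 0 marks joints whose absolute coordinates are
    copied from the original motion. Because every joint lives in world
    space, the kept joints need no kinematic re-derivation.
    """
    mask = np.zeros((num_frames, 22, 1))
    part = UPPER_BODY if edit_part == "upper" else LOWER_BODY
    mask[:, part] = 1.0
    return mask
```

For the first example above ("A person holds his hands up"), `spatial_edit_mask(T, "upper")` would free the arms and torso while the bending-and-turning lower body is kept as-is; the second example is the complementary `"lower"` mask.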


----------------------------------------------------------------------------------------------------------------------------------
Direct Text-to-SMPL-H Mesh Vertex Motion Generation
----------------------------------------------------------------------------------------------------------------------------------

Benefits of Direct Text-to-SMPL-H Mesh Vertex Motion Generation

(Our direct SMPL-H mesh generation method produces more realistic SMPL-H mesh vertex motions and better captures
natural human movement than fitting SMPL-H meshes to generated joints with a SMPL fitting model)


A person is sitting.

Better Motion, No Self-Penetration,
Better Hand Movements,
Implicitly Modeled DMPLs (See Belly Flesh Movements).


Worse Motion, Self-Penetration,
Unnatural Hand Movements,
Does Not Model DMPLs.



A person walks backwards.

Better Motion, No Jittering Head,
More Natural Hand Movements.


Worse Motion, Jittering Head,
Unnatural Hand Movements.



Direct Text-to-SMPL-H Mesh Vertex Motion Generation Gallery

(Our method is capable of directly generating high-quality SMPL-H mesh vertex motions that follow textual instructions)


A man extends his right arm directly in front of him, moves it in front of his body from left to right, and back down.

A person steps forward and leans over; they grab a cup with their left hand and empty it before putting it down and stepping back to their original position.

A person who is standing with his hands at his sides reaches down to his right, picks up something, moves to his left, places it down, and returns to his standing position.


A person raises his left hand to shoulder height, raises his right hand, and mimics strumming a guitar.

A person is pushed hard to their left and they recover into a standing position.

A man switches his standing position towards the right and then towards the left.

A person stands up from a seated position, then sits back down.

A person is walking upstairs in a straight line.

A person walks forward, then turns to the left.

BibTeX


@article{meng2025absolute,
  title={Absolute Coordinates Make Motion Generation Easy},
  author={Meng, Zichong and Han, Zeyu and Peng, Xiaogang and Xie, Yiming and Jiang, Huaizu},
  journal={arXiv preprint arXiv:2505.19377},
  year={2025}
}