Carnegie Mellon University


An Action (Human Motion) Generation from a High-Level Description

By Alexander Hauptmann

The ability to synthesize realistic human motion is a crucial technique in 3D content creation, with applications in movies, computer games, pedestrian simulation, and more. Although many generative models have been proposed that automatically synthesize human motion from various types of input, they either require low-level descriptive inputs (e.g., a direction at every time step) or can generate only very short motions corresponding to a single atomic action. Instead, we aim to develop a generative model that synthesizes motions matching long-lasting, complicated actions from a user’s high-level description (e.g., natural language).

More specifically, given a high-level action, our method will generate synthetic SMPL sequences. The motivation comes from how humans understand a new action: by factorizing it into a combination of known atomic actions. Our proposed method will learn alignments between atomic verb phrases and SMPL sequence clips without supervision, so it requires no extra temporal annotations. As a result, the model will no longer be limited to particular types of actions. By factorizing complicated actions into combinations of atomic actions, the model can generate synthetic sequences of complicated actions for the first time. We think this method can be widely applied in VR-related fields. For example, it provides a new way to control characters in VR games, it can generate synthetic 3D sequences as training data for other tasks, and it can provide text-to-video explanations in VR chat.
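The factorization idea can be illustrated with a toy sketch. This is not the proposed model: the learned, unsupervised alignment between verb phrases and clips is replaced here by a hand-written lookup table, the factorization is a plain string split, and the transition between clips is a naive linear blend. The names `CLIP_LIBRARY`, `factorize`, `blend`, and `compose` are hypothetical, and each frame is assumed to be a 72-value SMPL pose vector (24 joints, 3 axis-angle values each) filled with placeholder numbers.

```python
from typing import Dict, List

POSE_DIM = 72  # SMPL pose: 24 joints x 3 axis-angle values

Frame = List[float]
Clip = List[Frame]


def make_clip(value: float, n_frames: int) -> Clip:
    """Hypothetical placeholder clip: every pose parameter equals `value`."""
    return [[value] * POSE_DIM for _ in range(n_frames)]


# Hypothetical table standing in for learned phrase-to-clip alignments.
CLIP_LIBRARY: Dict[str, Clip] = {
    "walk forward": make_clip(0.0, 6),
    "bend down": make_clip(1.0, 4),
    "pick up object": make_clip(2.0, 5),
}


def factorize(description: str) -> List[str]:
    """Toy factorization: split the description on ' then '."""
    parts = description.lower().split(" then ")
    return [p.strip() for p in parts if p.strip()]


def blend(a: Frame, b: Frame, t: float) -> Frame:
    """Linear interpolation between two frames (a simplification for rotations)."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def compose(description: str, n_blend: int = 2) -> Clip:
    """Concatenate the clips of each atomic phrase, adding blend frames between them."""
    sequence: Clip = []
    for phrase in factorize(description):
        clip = CLIP_LIBRARY[phrase]  # raises KeyError for an unknown atomic action
        if sequence and n_blend > 0:
            last, first = sequence[-1], clip[0]
            for i in range(1, n_blend + 1):
                sequence.append(blend(last, first, i / (n_blend + 1)))
        sequence.extend(clip)
    return sequence


motion = compose("Walk forward then bend down then pick up object")
print(len(motion), len(motion[0]))  # 19 72
```

The point of the sketch is only the structure: a long action becomes an ordered list of atomic phrases, each phrase maps to a short clip, and the clips are stitched into one sequence. In the approach described above, both the phrase-to-clip mapping and the transitions would be learned rather than hard-coded.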