Teaching AI-Generated Scenes To Obey Physics
Thursday, March 19, 2026

The Breakdown
- From text prompts, PAT3D generates 3D scenes that remain stable and realistic under physical forces like gravity.
- The system significantly reduces the time required to manually design complex virtual environments.
- PAT3D creates simulation-ready environments that can be used for applications such as video game design and robotics training.
***
Researchers in Carnegie Mellon University's School of Computer Science have developed a new scene-generation framework that creates physically realistic, simulation-ready 3D scenes from text prompts.
Physics-Augmented Text-to-3D Scene (PAT3D) produces scenes that not only look convincing but also behave correctly under real-world physical forces. The work could significantly reduce the amount of time it takes to create training and simulation scenes for applications like robotics and video game design.
"PAT3D moves beyond purely visual generation by creating scenes that remain stable when interacted with under physical forces such as gravity and object contact," said Guying Lin, a Ph.D. student in CMU's Computer Science Department (CSD). "This technique makes the generated environments more useful and realistic for applications such as video game design, robotics training and simulation-based research."
Many artificial intelligence systems will instantly generate a 3D environment based on short scene descriptions that users supply: a stack of colorful blocks, a toothbrush resting in a cup or a basket filled with fruit. But those scenes often don't follow the rules of physics, like gravity. Objects may fuse together or float unnaturally in midair, while some scenes may collapse altogether.
PAT3D uses both large language models and vision-language models to turn a simple text prompt into a fully constructed, physically viable 3D scene. First, a large language model generates a visually plausible draft of the scene based on the user's description. Next, a vision-language model analyzes the draft and extracts relationships between objects, such as which objects support others and how they should be positioned relative to one another.
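As a rough illustration of what a draft scene plus extracted object relationships might look like, here is a minimal sketch in Python. The data structures (`SceneObject`, `Relation`, `SceneDraft`) and the example values are hypothetical stand-ins for this article, not PAT3D's actual representation.

```python
# Hypothetical sketch of a drafted scene and its object relationships.
from dataclasses import dataclass, field

@dataclass
class SceneObject:
    name: str                              # e.g. "cup", "toothbrush"
    position: tuple[float, float, float]   # initial placement from the language-model draft
    size: tuple[float, float, float]       # rough bounding-box extents in meters

@dataclass
class Relation:
    kind: str      # e.g. "supports", "inside", "next_to"
    parent: str    # object providing support or containment
    child: str     # object placed relative to the parent

@dataclass
class SceneDraft:
    prompt: str
    objects: list[SceneObject] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

# Example: "a toothbrush resting in a cup"
draft = SceneDraft(
    prompt="a toothbrush resting in a cup",
    objects=[
        SceneObject("cup", (0.0, 0.0, 0.0), (0.08, 0.08, 0.10)),
        SceneObject("toothbrush", (0.0, 0.0, 0.12), (0.02, 0.02, 0.18)),
    ],
    relations=[Relation("inside", parent="cup", child="toothbrush")],
)
```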
These relationships are then evaluated by a physics simulator. This "physics teacher" checks whether the scene follows real-world physical constraints. Based on its observations, the simulator adjusts object placements until the environment becomes physically plausible and stable. The final generated scene is then physically grounded, interactive and ready for multiple applications.
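To make the "physics teacher" idea concrete, the sketch below shows one way a settle-and-check refinement loop could work: simulate the scene settling under gravity and contact, measure how far objects drifted from their current placement, and repeat until the scene barely moves. The function names and the toy `drop_to_table` settling step are illustrative assumptions, not the simulator interface described in the paper.

```python
def refine_scene(initial_positions, simulate_settling, max_iters=20, tol=1e-3):
    """Adjust object placements until a settling simulation leaves them (almost) unmoved."""
    positions = dict(initial_positions)
    for _ in range(max_iters):
        settled = simulate_settling(positions)       # positions after gravity/contact resolution
        drift = max(
            sum((a - b) ** 2 for a, b in zip(settled[name], positions[name])) ** 0.5
            for name in positions
        )
        positions = settled                          # accept the physically consistent placement
        if drift < tol:                              # objects barely moved: scene is stable
            break
    return positions

# Toy settling function for demonstration: everything falls to the table plane z = 0.
def drop_to_table(positions):
    return {name: (x, y, 0.0) for name, (x, y, z) in positions.items()}

stable = refine_scene({"cup": (0.0, 0.0, 0.3), "apple": (0.1, 0.0, 0.5)}, drop_to_table)
print(stable)   # both objects come to rest on the table after the loop converges
```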
The system currently takes 10 to 30 minutes to generate a scene, depending on the number of objects and the complexity of the prompt. This automated process is significantly faster than traditional 3D scene design, which can take designers days or weeks to complete manually.
"During my undergraduate studies, I spent a lot of time manually designing game scenes, so I'm very familiar with how extensive the process can be," Lin said. "Now, during my Ph.D., I'm excited that technology like this can automate parts of that work while also benefiting fields like robotics. Since robots are often trained in simulated environments before being deployed in the real world, those simulations need to closely match physics in real life. PAT3D does just that."
The research was advised by CSD Assistant Professor Minchen Li and Jun-Yan Zhu, the Michael B. Donohue Assistant Professor of Computer Science and Robotics. Along with Lin, Li and Zhu, the PAT3D research team included Michael Liu, a CSD doctoral research assistant; Ruihan Gao, a Robotics Institute doctoral student; Hanke Chen, an SCS undergraduate student; Lyuhao Chen, an Electrical and Computer Engineering Department master's student; and Beijia Lu, a Robotics Institute master's student. Researchers from the University of Hong Kong and the Hong Kong University of Science and Technology were also involved, including Kemeng Huang, Taku Komura and Yuan Liu.
PAT3D was accepted to the 2026 International Conference on Learning Representations. Read the paper to learn more about the project.
Media Contact
Aaron Aupperlee | 412-268-9068 | aaupperlee@cmu.edu