Research Scientist @ Google Brain Robotics
PhD Student @ Princeton with Thomas Funkhouser
BA in Math and Computer Science @ UC Berkeley

Github  ●  G. Scholar  ●  LinkedIn  ●  Twitter  ●  CV
Email: andyzeng at google dot com

Recent Work

Dynamic manipulation and interaction.
Can robot learning benefit from explicit physics models? Project Link

Perception and planning for manipulation.
How can robots self-learn complex dexterous skills? Project Link

Industrial applications, e.g., pick-and-place.
How do we bring deep robot learning technologies to industry? Project Link

Research

I am interested in developing algorithms that enable machines to intelligently interact with the physical world and improve themselves over time. My research lies at the intersection of computer vision, robotics, and machine learning. In particular, I work on deep learning for 3D vision and robotic manipulation.

TossingBot: Learning to Throw Arbitrary Objects with Residual Physics

Throwing is an excellent means of exploiting dynamics to increase the capabilities of a manipulator. In pick-and-place, for example, throwing can enable a robot arm to rapidly place objects into selected boxes outside its maximum kinematic range, improving its physical reachability and picking speed. In this work, we propose an end-to-end formulation that jointly learns to infer control parameters for grasping and throwing motion primitives from visual observations (images of arbitrary objects in a bin) through trial and error. Within this formulation, we investigate the synergies between grasping and throwing (i.e., learning grasps that enable more accurate throws) and between simulation and deep learning (i.e., using deep networks to predict residuals on top of control parameters predicted by a physics simulator). The resulting system, TossingBot, is able to grasp and throw arbitrary objects into boxes located outside its maximum reach range at 500+ mean picks per hour (600+ grasps per hour with 85% throwing accuracy), and it generalizes to new objects and target locations.

Andy Zeng, Shuran Song, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser
Robotics: Science and Systems (RSS) 2019
Featured on the front page of The New York Times Business!
★ Best Systems Paper Award, RSS ★
Webpage  •   PDF  •   Google AI Blog  •   New York Times  •   IEEE Spectrum
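
The residual idea in the TossingBot abstract above can be illustrated with a minimal Python sketch. It assumes a drag-free point-mass projectile, a fixed 45° release angle, and a scalar release speed as the only control parameter. The function names and the `residual` argument (a stand-in for a network output) are hypothetical and are not taken from the TossingBot code.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def ballistic_release_speed(distance, height_diff, angle_deg=45.0):
    """Drag-free release speed (m/s) needed to land on a target.

    distance:    horizontal distance from release point to target (m)
    height_diff: target height minus release height (m)
    angle_deg:   release angle above horizontal (assumed fixed)
    """
    theta = math.radians(angle_deg)
    # From y(x) = x*tan(theta) - G*x^2 / (2*v^2*cos(theta)^2), solved for v.
    drop = distance * math.tan(theta) - height_diff
    if distance <= 0 or drop <= 0:
        raise ValueError("target is not reachable at this release angle")
    return math.sqrt(G * distance ** 2 / (2 * math.cos(theta) ** 2 * drop))


def release_speed(distance, height_diff, residual=0.0, angle_deg=45.0):
    """Physics estimate plus a learned correction.

    `residual` is a hypothetical placeholder for what a trained network
    would predict from images; here it is just a number passed in.
    """
    return ballistic_release_speed(distance, height_diff, angle_deg) + residual


if __name__ == "__main__":
    physics_only = release_speed(1.0, 0.0)
    corrected = release_speed(1.0, 0.0, residual=0.2)
    print(f"physics only: {physics_only:.3f} m/s, with residual: {corrected:.3f} m/s")
```

In this sketch the physics term gives a reasonable first guess for any target location, and the residual only has to absorb what the simple model leaves out (for example, drag or how the object sits in the gripper).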

DensePhysNet: Learning Dense Physical Object Representations via Multi-step Dynamic Interactions

Through vision and interaction, can robots discover the physical properties of objects? In this work, we propose DensePhysNet, a system that actively executes a sequence of dynamic interactions (e.g., sliding and colliding), and uses a deep predictive model over its visual observations to learn dense pixel-wise representations that reflect the physical properties of observed objects. Our experiments in both simulation and real settings demonstrate that the learned representations carry rich physical information, and can directly be used to decode physical object properties such as friction and mass. The use of dense representations enables DensePhysNet to generalize to novel scenes with more objects than in training. With knowledge of object physics, the learned representations also lead to more accurate and efficient manipulation in downstream tasks than state-of-the-art alternatives.

Zhenjia Xu, Jiajun Wu, Andy Zeng, Joshua B. Tenenbaum, Shuran Song
Robotics: Science and Systems (RSS) 2019
Webpage  •   PDF

Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning

Skilled robotic manipulation benefits from complex synergies between non-prehensile (e.g. pushing) and prehensile (e.g. grasping) actions: pushing can help rearrange cluttered objects to make space for arms and fingers; likewise, grasping can help displace objects to make pushing movements more precise and collision-free. In this work, we demonstrate that it is possible to discover and learn these synergies from scratch by combining visual affordance-based manipulation with model-free deep reinforcement learning. Our method is sample efficient and generalizes to novel objects and scenarios.

Andy Zeng, Shuran Song, Stefan Welker, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2018
★ Best Cognitive Robotics Paper Award Finalist, IROS ★
Webpage  •   PDF  •   Code (Github)  •   2 Minute Papers
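
One way to picture how visual affordances and value-based reinforcement learning can fit together is greedy action selection over dense value maps. The sketch below is a minimal illustration under stated assumptions: it supposes one 2D map of predicted values per motion primitive, with one candidate action per pixel. The map layout and the name `best_action` are hypothetical and not taken from the released code.

```python
def best_action(value_maps):
    """Pick the primitive and pixel with the highest predicted value.

    value_maps: dict mapping a primitive name (e.g. "push", "grasp") to a
                2D list of predicted values, one per pixel location.
    Returns (primitive_name, row, col).
    """
    best = None
    for name, grid in value_maps.items():
        for r, row in enumerate(grid):
            for c, value in enumerate(row):
                if best is None or value > best[0]:
                    best = (value, name, r, c)
    if best is None:
        raise ValueError("no candidate actions")
    return best[1:]


if __name__ == "__main__":
    # Toy 2x2 maps standing in for network outputs.
    maps = {
        "push": [[0.1, 0.4], [0.2, 0.3]],
        "grasp": [[0.0, 0.9], [0.5, 0.1]],
    }
    print(best_action(maps))
```

Because both primitives are scored on the same scale, a push is chosen whenever it is predicted to be worth more than any immediate grasp, which is where the synergy between the two can emerge.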

Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching

We built a robo-picker that can grasp and recognize novel objects (appearing for the first time during testing) in cluttered environments without needing any additional data collection or re-training. It achieves this with pixel-wise affordance-based grasping and one-shot learning to recognize objects using only product images (e.g., from the web). The approach was part of the MIT-Princeton Team system that took 1st place in the stowing task at the 2017 Amazon Robotics Challenge.

Andy Zeng, Shuran Song, Kuan-Ting Yu, Elliott Donlon, Francois R. Hogan, Maria Bauza, Daolin Ma, Orion Taylor, Melody Liu, Eudald Romo, Nima Fazeli, Ferran Alet, Nikhil Chavan Dafle, Rachel Holladay, Isabella Morona, Prem Qu Nair, Druck Green, Ian Taylor, Weber Liu, Thomas Funkhouser, Alberto Rodriguez
IEEE International Conference on Robotics and Automation (ICRA) 2018
★ Best Systems Paper Award, Amazon Robotics ★
Project  •   PDF  •   Code (Github)  •   MIT News  •   Amazon News  •   Engadget

Im2Pano3D: Extrapolating 360° Structure and Semantics Beyond the Field of View

We explore the limits of leveraging strong contextual priors learned from large-scale synthetic and real-world indoor scenes. To this end, we trained a network that can generate a dense prediction of 3D structure and a probability distribution of semantic labels for a full 360° panoramic view of an indoor scene when given only a partial observation (≤ 50%) in the form of an RGB-D image -- i.e., it can infer what's behind you.

Shuran Song, Andy Zeng, Angel X. Chang, Manolis Savva, Silvio Savarese, Thomas Funkhouser
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018
★ Oral Presentation, CVPR ★
Project  •   PDF

Matterport3D: Learning from RGB-D Data in Indoor Environments

We introduce Matterport3D, a large-scale RGB-D dataset containing 10,800 panoramic views from 194,400 RGB-D images of 90 building-scale scenes. Annotations are provided with surface reconstructions, camera poses, and 2D and 3D semantic segmentations. The precise global alignment and comprehensive, diverse panoramic set of views over entire buildings enable a variety of supervised and self-supervised computer vision tasks, including keypoint matching, view overlap prediction, normal prediction from color, semantic segmentation, and scene classification.

Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Nießner, Manolis Savva, Shuran Song, Andy Zeng, Yinda Zhang
IEEE International Conference on 3D Vision (3DV) 2017
Project  •   PDF  •   Code (Github)  •   Matterport Blog

3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions

We present a data-driven model that learns a local 3D shape descriptor for establishing correspondences between partial and noisy 3D/RGB-D data. To amass training data for our model, we propose an unsupervised feature learning method that leverages the millions of correspondence labels found in existing RGB-D reconstructions. Our learned descriptor not only matches local geometry in new scenes for reconstruction, but also generalizes to different tasks and spatial scales (e.g., instance-level object model alignment for the Amazon Picking Challenge, and mesh surface correspondence).

Andy Zeng, Shuran Song, Matthias Nießner, Matthew Fisher, Jianxiong Xiao, Thomas Funkhouser
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017
★ Oral Presentation, CVPR ★
Project  •   PDF  •   Code (Github)  •   Talk  •   2 Minute Papers
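
Once local descriptors have been computed, correspondences are typically established by comparing them. The sketch below is a generic mutual nearest-neighbor matcher, not the 3DMatch pipeline: it assumes descriptors are plain lists of floats compared by Euclidean distance, and the function names are hypothetical.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, candidates):
    """Index of the candidate descriptor closest to `query`."""
    return min(range(len(candidates)), key=lambda j: l2(query, candidates[j]))


def mutual_matches(desc_a, desc_b):
    """Pairs (i, j) where desc_a[i] and desc_b[j] are each other's nearest neighbor."""
    if not desc_a or not desc_b:
        return []
    matches = []
    for i, d in enumerate(desc_a):
        j = nearest(d, desc_b)
        if nearest(desc_b[j], desc_a) == i:
            matches.append((i, j))
    return matches


if __name__ == "__main__":
    # Toy 2D "descriptors"; real ones would be much higher-dimensional.
    frag_a = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    frag_b = [[1.1, 0.9], [0.1, -0.1]]
    print(mutual_matches(frag_a, frag_b))
```

The mutual check discards one-sided matches, such as the third descriptor in `frag_a`, which has no close counterpart in `frag_b`. The surviving pairs could then feed a robust alignment step.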

Semantic Scene Completion from a Single Depth Image

We present an end-to-end model that is capable of inferring a complete 3D voxel representation of volumetric occupancy and semantic labels for a scene from a single-view depth map observation. To train our model, we construct SUNCG -- a manually created large-scale dataset of synthetic 3D scenes with dense volumetric annotations.

Shuran Song, Fisher Yu, Andy Zeng, Angel X. Chang, Manolis Savva, Thomas Funkhouser
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017
★ Oral Presentation, CVPR ★
Project  •   PDF  •   SUNCG Dataset  •   Code (Github)  •   Talk  •   2 Minute Papers

Multi-view Self-supervised Deep Learning for 6D Pose Estimation in the Amazon Picking Challenge

We developed a vision system that can recognize objects and estimate their 6D poses under cluttered environments, partial data, sensor noise, multiple instances of the same object, and a large variety of object categories. Our approach leverages fully convolutional networks to segment and label multiple RGB-D views of a scene, then fits pre-scanned 3D object models to the resulting segmentation to estimate their poses. We also propose a scalable self-supervised method that leverages precise and repeatable robot motions to generate a large labeled dataset without tedious manual annotations. The approach was part of the MIT-Princeton Team system that took 3rd place at the 2016 Amazon Picking Challenge.

Andy Zeng, Kuan-Ting Yu, Shuran Song, Daniel Suo, Ed Walker Jr., Alberto Rodriguez, Jianxiong Xiao
IEEE International Conference on Robotics and Automation (ICRA) 2017
Project  •   PDF  •   Shelf & Tote Dataset  •   Code (Github)

Honors

My research has been graciously funded by