“The next wave of AI is robotics and one of the most exciting developments is humanoid robots,” said Jensen Huang, founder and CEO of Nvidia. “We’re advancing the entire Nvidia robotics stack, opening access for worldwide humanoid developers and companies to use the platforms, acceleration libraries and AI models best suited for their needs.”
Among the offerings are new Nvidia NIM microservices and frameworks for robot simulation and learning, the Nvidia Osmo orchestration service for running multi-stage robotics workloads, and an AI- and simulation-enabled teleoperation workflow that allows developers to train robots using small amounts of human demonstration data.
The MimicGen NIM microservice generates synthetic motion data based on recorded teleoperated data from spatial computing devices like Apple Vision Pro. The Robocasa NIM microservice generates robot tasks and simulation-ready environments in OpenUSD, a universal framework for developing and collaborating within 3D worlds.
Nvidia Osmo is a cloud-native managed service that allows users to orchestrate and scale complex robotics development workflows across distributed computing resources, whether on premises or in the cloud. According to Nvidia, Osmo simplifies robot training and simulation workflows, cutting deployment and development cycle times from months to less than a week. Users can visualize and manage a range of tasks, such as generating synthetic data, training models, conducting reinforcement learning and implementing software-in-the-loop testing at scale for humanoids, autonomous mobile robots and industrial manipulators.
An Nvidia AI- and Omniverse-enabled teleoperation reference workflow allows researchers and AI developers to generate massive amounts of synthetic motion and perception data from a minimal number of remotely captured human demonstrations. With this approach, Nvidia is seeking to minimize the cost and time that teleoperation typically requires. This is seen as a key step in humanoid robot development, because training foundation models for humanoid robots requires an enormous amount of data.
To reduce teleoperation time and costs, developers can use Apple Vision Pro to capture a small number of teleoperated demonstrations. Then, they simulate the recordings in Nvidia Isaac Sim and use the MimicGen NIM microservice to generate synthetic datasets from the recordings.
The developers then train the Project GR00T humanoid foundation model on the real and synthetic data, which saves time and reduces costs. Following this step, they use the Robocasa NIM microservice in Isaac Lab, a framework for robot learning, to generate experiences to retrain the robot model. Throughout the workflow, Nvidia Osmo reportedly assigns computing jobs to different resources, saving the developers weeks of administrative tasks.
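To make the ordering of that workflow concrete, the stages described above can be written down as a small dependency graph and sorted into a valid execution order. This is only an illustrative sketch using Python's standard library. The stage names are hypothetical labels for the steps in this article, and nothing here reflects the actual Osmo, MimicGen or Robocasa interfaces.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names for the workflow described in the article.
# Each stage maps to the set of stages that must finish before it starts.
STAGES = {
    "capture_demonstrations": set(),                       # e.g. with Apple Vision Pro
    "replay_in_simulation": {"capture_demonstrations"},    # e.g. in Isaac Sim
    "generate_synthetic_motion": {"replay_in_simulation"}, # e.g. via MimicGen
    "train_foundation_model": {
        "capture_demonstrations",      # real data
        "generate_synthetic_motion",   # synthetic data
    },
    "generate_task_environments": {"train_foundation_model"},  # e.g. via Robocasa
    "retrain_model": {"generate_task_environments"},
}


def execution_order(stages):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(stages).static_order())


order = execution_order(STAGES)
for step, name in enumerate(order, start=1):
    print(f"{step}. {name}")
```

An orchestration service would additionally decide which computing resource runs each stage. That part is left out here because the article does not describe how Osmo makes those assignments.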