Benchmarks

GenManip natively supports a series of benchmarks developed by the GenManip team and community contributors. Below is a detailed introduction to each benchmark.

If you have built your own benchmark based on GenManip, we warmly welcome you to submit an issue to our repository. Please include:

The name of your benchmark
A brief description (you can include your project’s external link)
The corresponding asset links (you can directly modify them in download_assets.py)
The corresponding config file
If your benchmark is from a paper, please include the citation information as well.

For every benchmark, client.py already supports outputting empty actions, so the benchmark runs correctly without a trained model. These empty-action runs also serve as a reference: they show the action output format that each benchmark requires.

GenManip Scaling Pick-and-Place Benchmark

The GenManip Scaling Pick-and-Place Benchmark evaluates a model’s generalization ability across a large number of objects and tasks. It includes 200 randomly generated scenes using assets from Objaverse, each verified to be executable.

This benchmark checks whether a model performs consistently across different scenes and objects, and is an important metric for evaluating General Manipulation Policies.

Package ID

Quick Start

Start Server

python ray_eval_server.py -cfg GenManipSuite/GenManip-Package-OOC_Bench

Run Evaluation

python standalone_tools/client.py --worker_ids 0 --gripper_type robotiq
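
The two commands above can also be driven from a single script. The following is a minimal sketch, not part of GenManip: the helper names (`build_server_cmd`, `build_client_cmd`, `run_quick_start`) and the fixed startup wait are assumptions made for illustration, and only the script paths and flags shown in the commands above are taken from this page. It assumes you run it from the repository root.

```python
import shlex
import subprocess
import sys
import time

# Config path copied from the "Start Server" command above.
SERVER_CONFIG = "GenManipSuite/GenManip-Package-OOC_Bench"


def build_server_cmd(config=SERVER_CONFIG, python=sys.executable):
    """Return the argument list for the evaluation server (hypothetical helper)."""
    return [python, "ray_eval_server.py", "-cfg", config]


def build_client_cmd(worker_ids="0", gripper_type="robotiq", python=sys.executable):
    """Return the argument list for the evaluation client (hypothetical helper)."""
    return [
        python,
        "standalone_tools/client.py",
        "--worker_ids",
        str(worker_ids),
        "--gripper_type",
        gripper_type,
    ]


def run_quick_start(startup_wait_s=30.0):
    """Start the server, wait, run the client, then stop the server.

    The fixed wait is an assumption: this page does not say how to detect
    that the server is ready, so adjust the delay for your machine.
    """
    server = subprocess.Popen(build_server_cmd())
    try:
        time.sleep(startup_wait_s)
        return subprocess.run(build_client_cmd()).returncode
    finally:
        server.terminate()
        server.wait()


if __name__ == "__main__":
    # Print the commands instead of launching them; call run_quick_start()
    # explicitly once the environment is set up.
    print(shlex.join(build_server_cmd(python="python")))
    print(shlex.join(build_client_cmd(python="python")))
```

Keeping the command construction separate from the launch step makes it easy to inspect or log the exact commands before starting a long evaluation run.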

Training Dataset: Axi404/GenManip-Dataset-OOC_Bench

Citations

GenManip: Scaling Data-Driven Robot Manipulation with Large-Scale Simulation and Generative Models

GenManip Team

CVPR, 2025

InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy

InternRobotics Team

TechReport, 2025

GenManip Tabletop10 Benchmark

The IROS 2025 Challenge of Multimodal Robot Learning in InternUtopia and Real World is built on top of GenManip, which forms the foundation of its Manipulation Track. It also supports the InternManip framework. GenManip natively includes these benchmarks.