Benchmarks
GenManip natively supports a series of benchmarks developed by the GenManip team and community contributors. Below is a detailed introduction to each benchmark.
If you have built your own benchmark based on GenManip, we warmly welcome you to submit an issue to our repository. Please include:
- The name of your benchmark
- A brief description (you can include your project’s external link)
- The corresponding asset links (you can directly modify them in download_assets.py)
- The corresponding config file
- If your benchmark is from a paper, please include the citation information as well.
For every benchmark, client.py already supports outputting empty actions, so you can confirm that the benchmark runs correctly before plugging in a model. These empty-action outputs are also a useful reference: they show the action output format that each benchmark requires.
GenManip Scaling Pick-and-Place Benchmark
The GenManip Scaling Pick-and-Place Benchmark evaluates a model’s generalization ability across a large number of objects and tasks. It includes 200 randomly generated scenes using assets from Objaverse, each verified to be executable.
This benchmark checks whether a model performs consistently across different scenes and objects, and is an important metric for evaluating General Manipulation Policies.
python ray_eval_server.py -cfg GenManipSuite/GenManip-Package-OOC_Bench
python standalone_tools/client.py --worker_ids 0 --gripper_type robotiq
GenManip Tabletop10 Benchmark
The IROS 2025 Challenge of Multimodal Robot Learning in InternUtopia and Real World is built on top of GenManip, which forms the foundation of its Manipulation Track. It also supports the InternManip framework. GenManip natively includes these benchmarks.
python ray_eval_server.py -cfg GenManipSuite/GenManip-Package-TableTop10Aloha
python standalone_tools/client.py --worker_ids 0 --gripper_type piper --arm_type aloha
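Each benchmark above is launched with the same pair of commands, differing only in the config name and the client flags. The following is a minimal sketch of how that pairing could be kept in one place. It is not part of GenManip: the dictionary keys and the `build_commands` helper are hypothetical names introduced for illustration, and only the script paths, config names, and flags are taken from the commands shown above. The sketch only builds and prints the command lines; it does not start the server or the client.

```python
import shlex

# Config names and client flags are copied from the commands in this page.
# The keys ("scaling_pick_and_place", "tabletop10") are hypothetical labels.
BENCHMARKS = {
    "scaling_pick_and_place": {
        "config": "GenManipSuite/GenManip-Package-OOC_Bench",
        "client_args": ["--worker_ids", "0", "--gripper_type", "robotiq"],
    },
    "tabletop10": {
        "config": "GenManipSuite/GenManip-Package-TableTop10Aloha",
        "client_args": [
            "--worker_ids", "0",
            "--gripper_type", "piper",
            "--arm_type", "aloha",
        ],
    },
}


def build_commands(name):
    """Return (server_cmd, client_cmd) as argument lists for one benchmark."""
    entry = BENCHMARKS[name]
    server = ["python", "ray_eval_server.py", "-cfg", entry["config"]]
    client = ["python", "standalone_tools/client.py", *entry["client_args"]]
    return server, client


if __name__ == "__main__":
    # Print both command lines for every known benchmark.
    for name in BENCHMARKS:
        server, client = build_commands(name)
        print(f"# {name}")
        print(shlex.join(server))
        print(shlex.join(client))
```

Keeping the commands as argument lists means they can be passed to `subprocess` directly if you later decide to launch them from a script; a contributed benchmark would then only need one new dictionary entry.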