Steer3D, our feedforward model that injects text steering into pretrained image-to-3D models, edits diverse objects. Below we show Steer3D's predictions on examples from Edit3D-Bench.
Despite being trained only on synthetic data built from Objaverse assets, Steer3D generalizes to "in-the-wild" objects, such as objects from iPhone photos or online images.
Steer3D adapts ControlNet to 3D generation, injecting text steerability into pretrained image-to-3D models. As shown below, given an image (e.g. of a crab), existing image-to-3D models can generate a 3D crab that looks like the image. Steer3D lets the user edit the 3D crab with language, such as "replacing its legs with sleek robotic limbs colored silver". The new crab aligns with the editing text while remaining consistent with the original crab. Steer3D is trained on 100k-scale synthetic data generated by our automated data engine, which combines existing image-to-3D models and vision-language models to produce editing pairs that are diverse, consistent, and correct. Both our scalable data engine and our data-efficient architecture design contribute to a strong editing model.
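To make the ControlNet-style injection concrete, here is a minimal PyTorch sketch of how a trainable, zero-initialized control branch could feed an edit-text embedding into a frozen image-to-3D backbone. The module names (`FrozenBackboneBlock`, `ControlBranch`), the dimensions, and the single-block setup are illustrative assumptions, not the actual Steer3D implementation.

```python
import copy
import torch
import torch.nn as nn

class FrozenBackboneBlock(nn.Module):
    """Stand-in for one transformer block of a pretrained image-to-3D model."""
    def __init__(self, dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h)[0]
        return x + self.mlp(self.norm2(x))

class ControlBranch(nn.Module):
    """Trainable copy of a backbone block plus a zero-initialized output projection,
    so the text-conditioned edit signal starts out as a no-op (hypothetical sketch)."""
    def __init__(self, block: FrozenBackboneBlock, dim: int, text_dim: int):
        super().__init__()
        self.block = copy.deepcopy(block)           # trainable copy of the frozen block
        for p in self.block.parameters():
            p.requires_grad_(True)
        self.text_proj = nn.Linear(text_dim, dim)   # maps the edit-text embedding into latent space
        self.zero_out = nn.Linear(dim, dim)         # zero-init: contributes nothing at initialization
        nn.init.zeros_(self.zero_out.weight)
        nn.init.zeros_(self.zero_out.bias)

    def forward(self, x, text_emb):
        # Broadcast-add the text conditioning, run the trainable copy, project through the zero layer.
        h = x + self.text_proj(text_emb).unsqueeze(1)
        return self.zero_out(self.block(h))

# Usage: frozen backbone block + control branch, with the edit signal injected residually.
dim, text_dim = 256, 512
backbone_block = FrozenBackboneBlock(dim)
for p in backbone_block.parameters():
    p.requires_grad_(False)                         # pretrained backbone stays frozen
control = ControlBranch(backbone_block, dim, text_dim)

latents = torch.randn(2, 1024, dim)                 # hypothetical 3D latent tokens
text_emb = torch.randn(2, text_dim)                 # hypothetical edit-text embedding
out = backbone_block(latents) + control(latents, text_emb)
print(out.shape)  # torch.Size([2, 1024, 256])
```

Because the output projection starts at zero, the pretrained model's behavior is preserved at initialization and the text-driven edit is learned purely as a residual on top of it.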
To facilitate data-efficient training, we design a ControlNet-based architecture that leverages the shape and geometry priors of pretrained image-to-3D models. The architecture is shown below. We design a two-stage training recipe based on flow-matching training and Direct Preference Optimization (DPO) to avoid the trivial local minimum of "no edit". More details can be found in the paper!
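Below is a simplified sketch of what the two-stage recipe could look like as training objectives: a conditional flow-matching loss for stage one, and a diffusion/flow-style DPO loss for stage two in which the "win" sample is a genuinely edited output and the "lose" sample a near-identity one. The function signatures, the specific DPO formulation, and the stand-in models are assumptions for illustration, not the exact objectives from the paper.

```python
import torch
import torch.nn.functional as F

def flow_matching_loss(velocity_model, x0, x1, cond):
    """Stage 1 (sketch): conditional flow matching on editing pairs.
    x0 is noise, x1 the edited 3D latent; the model regresses the velocity x1 - x0."""
    t = torch.rand(x0.shape[0], device=x0.device).view(-1, 1, 1)
    xt = (1 - t) * x0 + t * x1                   # linear interpolation path between noise and target
    v_pred = velocity_model(xt, t.flatten(), cond)
    return F.mse_loss(v_pred, x1 - x0)

def dpo_loss(velocity_model, ref_model, xt, t, cond, v_win, v_lose, beta=0.1):
    """Stage 2 (sketch): preference optimization in the diffusion-DPO style.
    Preferring a genuinely edited target over a near-identity one discourages
    the trivial "no edit" minimum; a frozen reference model anchors the comparison."""
    def err(m, target):
        return ((m(xt, t, cond) - target) ** 2).flatten(1).mean(dim=1)
    pi_w, pi_l = err(velocity_model, v_win), err(velocity_model, v_lose)
    with torch.no_grad():
        ref_w, ref_l = err(ref_model, v_win), err(ref_model, v_lose)
    # Lower error on the preferred target, relative to the reference, is rewarded.
    logits = -beta * ((pi_w - ref_w) - (pi_l - ref_l))
    return -F.logsigmoid(logits).mean()

# Minimal usage with stand-in models (shapes only; not the actual Steer3D networks).
model = lambda x, t, c: x * 0.0                  # hypothetical velocity predictor
ref = lambda x, t, c: x * 0.0                    # hypothetical frozen reference copy
x0, x1 = torch.randn(2, 64, 16), torch.randn(2, 64, 16)
print(flow_matching_loss(model, x0, x1, cond=None))
print(dpo_loss(model, ref, x0, torch.rand(2), None, v_win=x1, v_lose=x0))
```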
We build a data engine that generates synthetic data and applies a two-stage filter to provide diverse, consistent, and correct editing pairs as our training data. Check out the paper for our scaling analysis, which demonstrates the importance of this scalable data strategy!
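As a rough illustration, the two-stage filter could be organized as below, with one stage scoring consistency with the source object and one stage scoring correctness against the editing text. The criteria, thresholds, and the `EditingPair` structure are hypothetical; a real pipeline would obtain the scores from a vision-language model rather than the dummy scorers used here.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class EditingPair:
    source_renders: list      # renders of the original 3D asset
    edited_renders: list      # renders of the candidate edited asset
    edit_text: str            # the editing instruction

def two_stage_filter(
    pairs: List[EditingPair],
    consistency_score: Callable[[EditingPair], float],  # hypothetical VLM scorer: are unedited regions preserved?
    correctness_score: Callable[[EditingPair], float],  # hypothetical VLM scorer: does the change match the text?
    consistency_thresh: float = 0.7,
    correctness_thresh: float = 0.7,
) -> List[EditingPair]:
    """Sketch of a two-stage filter over candidate editing pairs:
    stage 1 keeps pairs that stay consistent with the source object,
    stage 2 keeps pairs whose edit is correct with respect to the editing text."""
    stage1 = [p for p in pairs if consistency_score(p) >= consistency_thresh]
    return [p for p in stage1 if correctness_score(p) >= correctness_thresh]

# Usage with dummy scorers standing in for VLM queries.
dummy = lambda p: 0.9
pairs = [EditingPair([], [], "replacing its legs with sleek robotic limbs colored silver")]
print(len(two_stage_filter(pairs, dummy, dummy)))  # 1
```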
@misc{ma2025feedforward3deditingtextsteerable,
title={Feedforward 3D Editing via Text-Steerable Image-to-3D},
author={Ziqi Ma and Hongqiao Chen and Yisong Yue and Georgia Gkioxari},
year={2025},
eprint={2512.13678},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.13678},
}