
CAD-Editor: A Locate-then-Infill Framework with Automated Training Data Synthesis for Text-Based CAD Editing


¹University of Science and Technology of China, ²Microsoft Research

Abstract

Computer-Aided Design (CAD) is indispensable across various industries. Text-based CAD editing, which automates the modification of CAD models based on textual instructions, holds great potential but remains underexplored. Existing methods primarily focus on design variation generation or text-based CAD generation, either lacking support for text-based control or neglecting existing CAD models as constraints. We introduce CAD-Editor, the first framework for text-based CAD editing. To address the challenge that training demands triplet data with accurate correspondence, we propose an automated data synthesis pipeline. This pipeline utilizes design variation models to generate pairs of original and edited CAD models and employs Large Vision-Language Models (LVLMs) to summarize their differences into editing instructions. To tackle the composite nature of text-based CAD editing, we propose a locate-then-infill framework that decomposes the task into two focused sub-tasks: locating regions requiring modification and infilling these regions with appropriate edits. Large Language Models (LLMs) serve as the backbone for both sub-tasks, leveraging their capabilities in natural language understanding and CAD knowledge. Experiments show that CAD-Editor achieves superior performance both quantitatively and qualitatively.

Task Formulation



We formulate text-based CAD editing as a seq2seq generation problem. To achieve this, both the editing instruction and the CAD models are represented as sequences of textual tokens. The editing instruction naturally consists of textual tokens. For both the original and edited CAD models, we adopt the sequence format introduced by FlexCAD.
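To make the formulation concrete, the sketch below shows one way a training example could be assembled from an instruction and a pair of CAD sequences. The prompt template and the placeholder token strings are illustrative assumptions, not the exact format used by CAD-Editor or FlexCAD.

```python
# A minimal sketch of the seq2seq formulation, assuming a simple prompt template.
# The CAD sequences are placeholder strings; the real format follows FlexCAD's
# textual sketch-and-extrude representation.

def build_seq2seq_example(instruction: str, original_cad_seq: str, edited_cad_seq: str):
    """Turn an (instruction, original, edited) triplet into an input/target text pair."""
    source = (
        "Instruction: " + instruction + "\n"
        "Original CAD: " + original_cad_seq + "\n"
        "Edited CAD:"
    )
    target = " " + edited_cad_seq
    return source, target

# Hypothetical example (token strings are illustrative only).
src, tgt = build_seq2seq_example(
    instruction="Make the cylinder taller",
    original_cad_seq="<sketch> circle ... <extrude> height 10",
    edited_cad_seq="<sketch> circle ... <extrude> height 20",
)
```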

Here are some text-based CAD editing examples produced by CAD-Editor. Each sub-figure shows the editing instruction at the top, the original CAD model on the left, and the edited CAD model on the right. Rendered images are shown for easier comprehension; the actual editing is performed on the sketch-and-extrude operations of the CAD model, which preserves editability and reusability.

Automated Data Synthesis Pipeline



Our data synthesis pipeline comprises three key steps:
  1. Paired CAD Models Generation: We create paired CAD models by starting from an existing CAD model and applying design variation models to produce its variations.
  2. Editing Instruction Generation: We generate editing instructions by using LVLMs to summarize the differences between the original and edited CAD models.
  3. Assembling: Finally, we assemble the CAD pairs from the first step and the editing instructions from the second step into triplets (sketched in code below).
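Putting the three steps together, the pipeline can be sketched roughly as follows. The objects variation_model, lvlm, and renderer are hypothetical stand-ins for a design variation model, a Large Vision-Language Model, and a CAD renderer; their interfaces are assumptions for illustration.

```python
# A rough sketch of the automated data synthesis pipeline.
# `variation_model`, `lvlm`, and `renderer` are placeholders with assumed interfaces.

def synthesize_triplets(seed_cad_models, variation_model, lvlm, renderer, n_variations=3):
    triplets = []
    for original in seed_cad_models:
        # Step 1: paired CAD models via design variation generation.
        for edited in variation_model.generate_variations(original, n=n_variations):
            # Step 2: editing instruction by summarizing the rendered difference with an LVLM.
            instruction = lvlm.summarize_difference(
                image_before=renderer.render(original),
                image_after=renderer.render(edited),
            )
            # Step 3: assemble the triplet.
            triplets.append({
                "instruction": instruction,
                "original": original,
                "edited": edited,
            })
    return triplets
```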

Locate-then-Infill Framework



We decompose text-based CAD editing by explicitly introducing a masked CAD sequence to indicate potential modification regions.
  • Locating Stage: This stage aims to generate a masked CAD sequence, where regions requiring modification are marked with the special token <mask> while unchanged parts are copied from the original CAD sequence.
  • Infilling Stage: This stage focuses on generating the final edited sequence by precisely filling in the masked regions while preserving the unmodified parts.
    • Improving Performance with Selective Data: To further improve performance, we introduce a selective dataset curated with human annotations.
  • Training and Inference:
    • Training: In the locating stage, we fine-tune LLMs with Low-Rank Adaptation (LoRA) on ground-truth masked CAD sequences constructed via the longest common subsequence (LCS) between the original and edited sequences (a sketch of this construction follows this list). For the infilling stage, we first fine-tune LLMs with LoRA on the synthetic dataset, then further refine the model with LoRA on the selective dataset.
    • Inference: The locating and infilling stages operate sequentially: the locating model first produces the masked CAD sequence, which the infilling model then completes into the final edited CAD sequence (see the inference sketch below).
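For the locating-stage supervision mentioned above, one plausible construction is sketched below: tokens of the original sequence that fall outside the longest common subsequence with the edited sequence are replaced by <mask>, and consecutive masks are collapsed. The exact procedure used by CAD-Editor may differ; this is an illustrative assumption.

```python
# A minimal sketch of building a ground-truth masked CAD sequence via LCS.
# Original tokens outside the LCS with the edited sequence are treated as
# "to be modified" and replaced by <mask>; consecutive masks are collapsed.
# Note: edits that only insert new tokens would additionally need a <mask>
# placed at the insertion point, which is omitted here for brevity.

def lcs_keep_flags(a, b):
    """Return a boolean list marking which tokens of `a` lie on an LCS of a and b."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]  # dp[i][j] = LCS length of a[i:], b[j:]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            dp[i][j] = dp[i + 1][j + 1] + 1 if a[i] == b[j] else max(dp[i + 1][j], dp[i][j + 1])
    keep, i, j = [False] * m, 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            keep[i] = True
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return keep

def build_masked_sequence(original_tokens, edited_tokens, mask_token="<mask>"):
    keep = lcs_keep_flags(original_tokens, edited_tokens)
    masked = []
    for tok, kept in zip(original_tokens, keep):
        if kept:
            masked.append(tok)
        elif not masked or masked[-1] != mask_token:
            masked.append(mask_token)  # collapse runs of masked tokens
    return masked
```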
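At inference time, the two stages can be chained as in the sketch below. Here locator and infiller stand for the two LoRA-fine-tuned LLMs, and the prompt templates are illustrative assumptions rather than the exact prompts used by CAD-Editor.

```python
# A rough sketch of sequential locate-then-infill inference.
# `locator` and `infiller` are placeholders for the two fine-tuned LLMs,
# each assumed to expose a text-in, text-out `generate` method.

def edit_cad(instruction: str, original_cad_seq: str, locator, infiller) -> str:
    # Locating stage: mark regions that need modification with <mask>.
    masked_cad_seq = locator.generate(
        f"Instruction: {instruction}\nOriginal CAD: {original_cad_seq}\nMasked CAD:"
    )
    # Infilling stage: fill the masked regions, conditioned on the instruction,
    # the original sequence, and the masked sequence.
    edited_cad_seq = infiller.generate(
        f"Instruction: {instruction}\nOriginal CAD: {original_cad_seq}\n"
        f"Masked CAD: {masked_cad_seq}\nEdited CAD:"
    )
    return edited_cad_seq
```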

Qualitative Results


Comparison with Baselines


Qualitative comparison of CAD-Editor with GPT-4o-Basic (which receives only an explanation of CAD operation sequences in the prompt) and GPT-4o-IC (which additionally receives three in-context examples retrieved by cosine similarity between editing instructions). The text below each example shows the editing instruction.
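The retrieval step of the GPT-4o-IC baseline can be approximated as below; the choice of the sentence-transformers library and the all-MiniLM-L6-v2 embedding model is an assumption for illustration, not necessarily what was used in the experiments.

```python
# A minimal sketch of retrieving in-context examples by cosine similarity
# between editing instructions. The embedding model is a hypothetical choice.
import numpy as np
from sentence_transformers import SentenceTransformer

def retrieve_in_context_examples(query_instruction, example_pool, k=3):
    """example_pool: list of dicts, each with an 'instruction' field plus its CAD pair."""
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode([ex["instruction"] for ex in example_pool] + [query_instruction])
    pool_emb, query_emb = embeddings[:-1], embeddings[-1]
    # Cosine similarity between the query instruction and every pool instruction.
    sims = pool_emb @ query_emb / (
        np.linalg.norm(pool_emb, axis=1) * np.linalg.norm(query_emb) + 1e-8
    )
    top_idx = np.argsort(-sims)[:k]
    return [example_pool[i] for i in top_idx]
```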

Various Examples

Diverse Instructions

Additional results from CAD-Editor with various editing instructions. In each sub-figure, the left image shows the original CAD model, the right image displays the edited CAD model, and the text below provides the editing instruction.

One Model, Multiple Instructions

Multiple Instructions

Given one CAD model and various instructions, CAD-Editor produces different outcomes.

Same Input, Diverse Outcomes

One to More

Given the same CAD model and instruction, CAD-Editor produces diverse outcomes.

Iterative Editing Process

Continuous Editing

CAD-Editor can be applied iteratively to edit a CAD model until it meets user requirements.

Quantitative Results


Comparison with baselines. VR denotes valid ratio, JSD Jensen-Shannon divergence, CD Chamfer distance, D-CLIP directional CLIP score, and H-Eval human evaluation; ↑ means higher is better, ↓ means lower is better, and a dash indicates the metric is not reported for that method.

Method         VR ↑    JSD ↓   CD ↓    D-CLIP ↑   H-Eval ↑
SkexGen        74.3    1.94    -       -          -
Hnc-CAD        77.4    1.77    -       -          -
FlexCAD        82.1    1.72    -       -          -
Text2CAD       84.8    2.39    1.91    -          -
GPT-4o-Basic   63.2    1.10    2.30    -1.08      7.22
GPT-4o-IC      84.5    0.70    1.55    -0.11      15.6
CAD-Editor     95.6    0.65    1.18     0.11      43.2