GenCAD: Transforming Images into Editable CAD Models with AI-Powered Generation

Creating manufacturable 3D shapes through Computer-Aided Design (CAD) remains a time-consuming, manual process that requires specialized expertise. While most 3D shape generation research focuses on meshes or point clouds, engineering applications demand the editability and manufacturability that only parametric CAD models provide.

GenCAD addresses this challenge by introducing the first generative model that transforms images into complete CAD command sequences, producing editable 3D shapes that engineers can modify and manufacture.

The CAD Generation Challenge

Traditional CAD modeling requires engineers to manually create sequences of parametric commands—lines, arcs, circles, and extrusions—that define 3D solid geometry. This process is:

  • Highly manual: Each design requires step-by-step command input
  • Time-intensive: Complex shapes demand lengthy iterative refinement
  • Expertise-dependent: CAD software has steep learning curves

Unlike meshes or voxels, CAD models use boundary representations (B-rep) that encode geometry as parametric surfaces, edges, and vertices. This complexity makes generating CAD models significantly more challenging than generating other 3D representations.

GenCAD’s Four-Step Framework

GenCAD uses a four-step approach that treats CAD generation as a language modeling problem:

1. Command Sequence Reconstruction (CSR)

An autoregressive transformer learns to encode and decode CAD command sequences. Each CAD command becomes a 17-dimensional vector containing:

  • Command type (line, arc, circle, extrude)
  • Geometric parameters (coordinates, angles, distances)
  • Boolean operations for solid modeling
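
As a minimal sketch of this representation, the snippet below packs one command into a 17-slot integer vector: slot 0 for the command type and the remaining 16 for parameters, with -1 marking slots a command does not use. Only the vector length comes from the description above. The slot layout, the 256-level quantization, the -1 sentinel, and the names `COMMAND_TYPES`, `PARAM_SLOTS`, `quantize`, and `encode_command` are illustrative assumptions, not GenCAD's actual schema.

```python
from typing import Dict, List

VECTOR_DIM = 17   # from the text: one command = one 17-dimensional vector
NUM_LEVELS = 256  # assumption: continuous parameters quantized to 8 bits
UNUSED = -1       # assumption: sentinel for slots a command does not use

# Hypothetical vocabulary: slot 0 holds the command type.
COMMAND_TYPES = {"line": 0, "arc": 1, "circle": 2, "extrude": 3}

# Hypothetical layout: which of slots 1..16 each named parameter occupies.
PARAM_SLOTS = {"x": 1, "y": 2, "sweep_angle": 3, "radius": 4,
               "distance": 5, "boolean_op": 6}

# Categorical parameters are stored as integer indices, not quantized.
CATEGORICAL = {"boolean_op"}


def quantize(value: float, lo: float = -1.0, hi: float = 1.0) -> int:
    """Map a value in [lo, hi] to an integer level in [0, NUM_LEVELS - 1]."""
    clipped = min(max(value, lo), hi)
    return round((clipped - lo) / (hi - lo) * (NUM_LEVELS - 1))


def encode_command(kind: str, params: Dict[str, float]) -> List[int]:
    """Pack one CAD command into a fixed-length integer vector."""
    vec = [UNUSED] * VECTOR_DIM
    vec[0] = COMMAND_TYPES[kind]
    for name, value in params.items():
        slot = PARAM_SLOTS[name]
        vec[slot] = int(value) if name in CATEGORICAL else quantize(value)
    return vec


# A circle centered at (-1, 1) with radius 1, then an extrusion that uses
# a (hypothetical) Boolean operation with index 2.
circle = encode_command("circle", {"x": -1.0, "y": 1.0, "radius": 1.0})
extrude = encode_command("extrude", {"distance": 1.0, "boolean_op": 2})
```

Because every command has the same length and integer entries, a whole CAD program becomes a sequence of tokens that a transformer can consume like text.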

2. Contrastive CAD-Image Pre-training (CCIP)

A contrastive learning framework creates joint representations of CAD commands and images using:

  • ResNet-18 image encoder for visual features
  • Frozen CAD encoder from step 1
  • Temperature-scaled cross-entropy loss for alignment
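
The alignment objective can be sketched in plain Python as follows. This is a CLIP-style symmetric loss written under assumptions: the temperature of 0.07, the averaging of the image-to-CAD and CAD-to-image directions, and all function names are illustrative choices rather than details confirmed for GenCAD. Row k of each batch is treated as a matching image/CAD pair.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-softmax of the target entry, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def contrastive_loss(image_embs: Sequence[Vector], cad_embs: Sequence[Vector],
                     temperature: float = 0.07) -> float:
    """Symmetric temperature-scaled cross-entropy over a batch of paired embeddings."""
    imgs = [l2_normalize(v) for v in image_embs]
    cads = [l2_normalize(v) for v in cad_embs]
    n = len(imgs)
    # logits[r][c] = similarity of image r and CAD c, sharpened by the temperature.
    logits = [[sum(a * b for a, b in zip(i, c)) / temperature for c in cads]
              for i in imgs]
    image_to_cad = sum(cross_entropy(logits[k], k) for k in range(n)) / n
    cad_to_image = sum(cross_entropy([logits[r][k] for r in range(n)], k)
                       for k in range(n)) / n
    return 0.5 * (image_to_cad + cad_to_image)


# Toy batch: correctly paired embeddings versus the same embeddings mispaired.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 3.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 3.0], [2.0, 0.0]])
```

The loss is near zero when each image is most similar to its own CAD embedding and large when the pairs are mismatched, which is the signal that pulls the two encoders into a shared space.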

3. CAD Diffusion Prior (CDP)

A conditional latent diffusion model generates CAD representations from image inputs using:

  • ResNet-MLP denoising architecture
  • 500-step diffusion process
  • Image conditioning for user-guided generation
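
To illustrate what a 500-step diffusion process means for a latent vector, here is a minimal sketch of the forward (noising) side in the standard DDPM formulation. Only the step count comes from the description above. The linear noise schedule, its endpoints, the 4-dimensional toy latent, and all names are assumptions; the image-conditioned denoiser that learns to reverse this process is omitted.

```python
import math
import random
from typing import List, Tuple

NUM_STEPS = 500  # from the text: 500-step diffusion process


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Assumed schedule: noise variances rise linearly from beta_start to beta_end."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: List[float]) -> List[float]:
    """alpha_bar[t] = product of (1 - beta[s]) for s <= t: the signal kept at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(latent: List[float], t: int, alpha_bars: List[float],
              rng: random.Random) -> Tuple[List[float], List[float]]:
    """Sample the noised latent at step t in closed form; also return the noise used."""
    keep = math.sqrt(alpha_bars[t])
    spread = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in latent]
    noisy = [keep * x + spread * e for x, e in zip(latent, noise)]
    return noisy, noise


betas = linear_beta_schedule(NUM_STEPS)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
cad_latent = [0.5, -0.25, 1.0, 0.0]  # hypothetical toy latent; a real one is larger
noisy_latent, true_noise = add_noise(cad_latent, t=250, alpha_bars=alpha_bars, rng=rng)
```

During training, a denoiser would receive `noisy_latent`, the step index, and the image embedding, and be trained to undo the corruption. At generation time the process runs in reverse, starting from pure noise and ending at a CAD latent.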

4. CAD Decoder

The pre-trained transformer decoder converts generated latents back into executable CAD command sequences that geometry kernels can process into 3D solids.

Superior Performance Results

GenCAD reports gains over prior baselines on three tasks, with margins that range from fractions of a percentage point on reconstruction to 15× on retrieval:

CAD Reconstruction Accuracy:

  • Command accuracy: 99.51% vs 99.36% (DeepCAD)
  • Parameter accuracy: 97.78% vs 97.59% (DeepCAD)
  • Lower chamfer distance and a lower invalid shape ratio than DeepCAD

Image-Based CAD Retrieval:

  • 98.49% accuracy when retrieving from a pool of 10 candidates
  • 60.77% accuracy when retrieving from a pool of 2,048 candidates
  • 15× more accurate than image-to-image search methods
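
A retrieval accuracy of this kind can be computed along the following lines. This is a hedged sketch: it assumes retrieval ranks candidate CAD embeddings by cosine similarity to the image embedding and counts a hit when the image's own CAD model ranks first. The function names and the toy embeddings are made up for illustration.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query: Vector, candidates: Sequence[Vector]) -> int:
    """Index of the candidate CAD embedding most similar to the image embedding."""
    return max(range(len(candidates)),
               key=lambda j: cosine_similarity(query, candidates[j]))


def retrieval_accuracy(image_embs: Sequence[Vector],
                       cad_embs: Sequence[Vector]) -> float:
    """Fraction of images whose own CAD model (same index) ranks first in the pool."""
    hits = sum(retrieve(img, cad_embs) == i for i, img in enumerate(image_embs))
    return hits / len(image_embs)


# Toy pool of three CAD embeddings and three image queries (all values made up).
cad_pool = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_queries = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.7, 0.1, 0.6]]
accuracy = retrieval_accuracy(image_queries, cad_pool)
```

In this toy pool the third query is pulled toward the wrong candidate, so two of three queries succeed. A larger pool offers more chances for such near misses, which is one plausible reading of why accuracy falls as the pool grows from 10 to 2,048 items.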

Conditional Generation Quality:

  • Coverage: 81.37% (image) and 82.59% (sketch)
  • Outperforms unconditional models in diversity and fidelity
  • Maintains statistical alignment with ground truth data
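
For readers unfamiliar with the coverage metric, the sketch below follows its common definition in 3D generation work: each generated shape is matched to its nearest reference shape, and coverage is the fraction of reference shapes matched at least once. Using chamfer distance on sampled point clouds as the shape distance is an assumption here, as are the function names and the toy point clouds.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
PointCloud = Sequence[Point]


def squared_distance(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(a: PointCloud, b: PointCloud) -> float:
    """Symmetric chamfer distance: mean nearest-neighbor squared distance, both ways."""
    def one_way(src: PointCloud, dst: PointCloud) -> float:
        return sum(min(squared_distance(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


def coverage(generated: Sequence[PointCloud], reference: Sequence[PointCloud]) -> float:
    """Fraction of reference shapes that are the nearest reference of some generated shape."""
    matched = set()
    for shape in generated:
        nearest = min(range(len(reference)),
                      key=lambda j: chamfer_distance(shape, reference[j]))
        matched.add(nearest)
    return len(matched) / len(reference)


def shifted(cloud: PointCloud, dx: float) -> List[Point]:
    """Translate a point cloud along x to create a distinct toy shape."""
    return [(x + dx, y, z) for x, y, z in cloud]


unit = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
reference = [unit, shifted(unit, 10.0), shifted(unit, 20.0)]
collapsed = [shifted(unit, 0.1), shifted(unit, -0.1)]  # both resemble reference[0]
diverse = [shifted(unit, 0.1), shifted(unit, 10.1), shifted(unit, 19.9)]

low_coverage = coverage(collapsed, reference)
full_coverage = coverage(diverse, reference)
```

A generator that keeps producing variations of one shape scores low even if each output is accurate, so coverage rewards diversity across the reference set rather than per-sample fidelity.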

Real-World Applications

GenCAD enables several practical engineering workflows:

Reverse Engineering: Convert product images into editable CAD models for analysis or modification

Design Acceleration: Generate initial CAD models from sketches or reference images, reducing manual modeling time

CAD Database Search: Retrieve existing designs from large repositories using visual queries instead of text descriptions

Design Iteration: Create multiple CAD variations from a single input image for rapid prototyping

Implementation Considerations

GenCAD processes CAD commands as fixed-dimensional vectors with quantized parameters, making them compatible with standard neural network architectures. The system uses:

  • Dataset: 168,674 CAD models from the DeepCAD dataset
  • Training: Multi-stage approach with frozen components for efficiency
  • Output: Complete CAD programs executable in commercial software like Onshape

Current Limitations and Future Directions

GenCAD currently supports basic CAD operations (sketches and extrusions) but lacks advanced features like:

  • Revolve operations
  • Edge operations (fillets, chamfers)
  • Complex assembly modeling

The system also requires clean, isometric CAD images rather than real-world photographs with complex backgrounds.

Getting Started

GenCAD represents a significant step toward automating the design-to-production pipeline. The framework’s ability to generate editable CAD models from images opens new possibilities for AI-assisted engineering design.

For developers interested in CAD generation, GenCAD demonstrates how combining transformer architectures with contrastive learning and diffusion models can tackle complex engineering challenges that traditional 3D generation methods cannot address.

Because GenCAD outputs complete CAD programs rather than meshes, its results can be opened and edited in existing CAD tools. Within the limitations noted above, this makes the approach practical for engineering applications that need both automation and human editability.