Abstract

Despite the power of large language models (LLMs) in various cross-modal generation tasks, their ability to generate 3D computer-aided design (CAD) models from text remains underexplored due to the scarcity of suitable datasets. In particular, there is a lack of multimodal CAD datasets that include both reconstruction parameters and text descriptions, which are essential for quantitatively evaluating the CAD generation capabilities of multimodal LLMs. To address these challenges, we developed a dataset of CAD models, sketches, and images for representative mechanical components such as gears, shafts, and springs, along with natural language descriptions collected via Amazon Mechanical Turk. Using CAD programs as a bridge, we convert the textual output of LLMs into precise 3D CAD designs. To enhance the text-to-CAD generation capabilities of GPT models and demonstrate the utility of our dataset, we developed a pipeline to generate fine-tuning training data for GPT-3.5 and fine-tuned four GPT-3.5 models with different data sampling strategies based on CAD program length. We evaluated these models using parsing rate and intersection over union (IoU) metrics, comparing their performance to that of GPT-4 without fine-tuning. The comparative study of the four fine-tuned models provides guidance on selecting sampling strategies when building fine-tuning datasets for text-to-CAD generation, considering the trade-off between part complexity, model performance, and cost.
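The two evaluation metrics named above can be made concrete. The paper's exact implementation is not given here, so the following is a minimal sketch under stated assumptions: parsing rate is taken as the fraction of generated CAD programs that pass a syntactic check (Python's built-in `compile` stands in for whatever CAD-program parser is actually used), and IoU is computed on boolean voxel grids, assuming both the generated and ground-truth models have been voxelized onto a common grid.

```python
import numpy as np

def voxel_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two boolean voxel grids of equal shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Two empty shapes are considered identical.
    return float(inter) / float(union) if union else 1.0

def parsing_rate(programs: list[str]) -> float:
    """Fraction of generated programs that are syntactically valid.

    Uses Python's compile() as a stand-in syntactic check; a real pipeline
    would invoke the target CAD program interpreter instead.
    """
    ok = 0
    for src in programs:
        try:
            compile(src, "<cad-program>", "exec")
            ok += 1
        except SyntaxError:
            pass
    return ok / len(programs) if programs else 0.0
```

For example, a generated model that recovers one of two occupied voxels scores an IoU of 0.5, and a batch in which half the generated programs fail to parse scores a parsing rate of 0.5.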
