Abstract
Access to a large number of diverse design solutions can support designers during the early stages of the design process. In this article, we explored the efficacy of large language models (LLMs) in producing diverse design solutions, investigating how strongly parameter tuning and various prompt engineering techniques affect the diversity of LLM-generated design solutions. Specifically, we used an LLM (GPT-4) to generate a total of 4000 design solutions across five distinct design topics, eight combinations of parameters, and eight different types of prompt engineering techniques, yielding 50 LLM-generated solutions for each combination of method (16 methods in total) and design topic. These LLM-generated design solutions were compared against 100 human-crowdsourced solutions in each design topic using the same set of diversity metrics. Results indicated that, across the five design topics tested, human-generated solutions consistently had greater diversity scores. Using a post hoc logistic regression analysis, we also found a meaningful semantic divide between human-generated and LLM-generated solutions in some design topics, but not in others. Taken together, these results contribute to the understanding of LLMs’ capabilities and limitations in generating a large volume of diverse design solutions and offer insights for future research that leverages LLMs to generate diverse design solutions for a broad range of design tasks (e.g., as inspirational stimuli).