Professors at the University of Montana announced that anonymized standard testing of ChatGPT-4 showed it scores in the top one percent on a standard creativity test. Erik Guzik, Research: AI Tests Into Top 1% For Original Creative Thinking (July 3, 2023). The Scholastic Testing Service, oblivious to the involvement of AI, scored all the submissions. This blog tells the whole story and follows up on two recent articles: Creativity and How Anyone Can Adjust ChatGPT’s Creativity Settings To Limit Its Mistakes and Hallucinations, and Creativity Test of GPT’s Story Telling Ability Based on an Image Alone.

I tried to poke holes in the standard tests used, the Torrance Tests of Creative Thinking, and in the research of Professor Guzik, but ended up with respect for the TTCT, for its creator, Professor E. Paul Torrance, and for Professor Guzik. My conclusion is that the standard testing research to date strongly supports the conclusion many others have already reached, that Generative AI has extraordinary abilities of creative thinking.
University of Montana Research on Humans and GPT-4 Using Standard Creativity Test
On July 5, 2023, the University of Montana announced research by Erik Guzik, PhD, a professor at its College of Business, and his colleagues, which found that ChatGPT scored in the top 1% of human thinkers on a standard creativity test. UM Research: AI Tests Into Top 1% For Original Creative Thinking. The scientific paper on the research has not yet been released, so I wrote Professor Guzik for more information. He confirmed that his group made a formal presentation of the details of their findings in May 2023 at the Creativity Conference 2023, Southern Oregon University, and that a paper should be published in August 2023. The conference session presentation was entitled: The creative potential of ChatGPT: An exploratory study of the Torrance Tests of Creative Thinking and the fluency, flexibility and originality of ChatGPT (GPT-4). Professor Guzik graciously sent me the slide deck his group used and a link to the video recording of the presentation, including questions and answers. The presentation to this group of creativity experts is persuasive.
The essential claims by Professor Erik Guzik, as summarized in the University announcement, UM Research: AI Tests Into Top 1% For Original Creative Thinking, are:
- Guzik’s team utilized the Torrance Tests of Creative Thinking (TTCT), a widely recognized tool employed for decades to evaluate human creativity. Eight test responses were generated by ChatGPT, each different. The short essay answers were submitted to The Scholastic Testing Service for assessment, along with answers from 24 students in Professor Guzik’s entrepreneurship and personal finance classes. The ChatGPT-4 responses to the creativity questions were compared with those of Guzik’s human students, as well as with 2,700 college students who took the TTCT in 2016. The Scholastic Testing Service was oblivious to the involvement of the AI and independently scored all of the submissions.
- ChatGPT ranked in the top percentile for fluency, showcasing its ability to generate a large volume of ideas, and for originality, demonstrating the capacity to produce new ideas. On flexibility, the capacity to create different types and categories of ideas, the AI scored in the 97th percentile. The overall ranking by Scholastic put ChatGPT in the top 1%.
- The test results are strong evidence of AI developing creative abilities comparable or even superior to human capabilities. Professor Guzik emphasized his surprise at how well ChatGPT performed in generating original ideas, typically considered a unique characteristic of human imagination. With the advanced GPT-4, ChatGPT has now reached the top 1% of all human responses, surpassing previous research on GPT-3, where it did not score as well as humans on tasks involving original thinking.
Erik Guzik, Research: AI Tests Into Top 1% For Original Creative Thinking.

Torrance Tests of Creative Thinking
Erik Guzik reported that he later told ChatGPT of the test results and asked for its comment. ChatGPT-4 responded by saying human creativity may not be fully understood and more sophisticated assessment tools may be needed “to differentiate between human and AI-generated ideas.” I agree with ChatGPT, but right now, all we have is the TTCT. In view of the way the test operates, which is explained in the conference presentation, it is not possible for GPT-4 to have been trained on the TTCT test results. The test itself is proprietary, and students’ short-essay answers are scored by individual experts at The Scholastic Testing Service.

To fact check and help prepare this article, I also asked ChatGPT-4 about the TTCT and, as usual, checked the answers with independent Google-assisted research. I learned the TTCT is a series of standardized tests developed by psychologist E. Paul Torrance in the mid-20th century. The TTCT includes both verbal and figural sections, but according to Professor Guzik, only the verbal tests were given to ChatGPT. This was because the version they used could not review the drawings used in the figural sections of the test. The verbal section involves tasks such as asking the test-taker to think of problems that could arise from an unusual situation. Scoring considered not only the originality of responses, but also their viability, meaning their practical value. The TTCT tests have been used and refined over decades to measure human creativity along what they call four “dimensions” of creativity:
- Fluency: This measures the quantity of ideas produced. It involves tests like listing as many ideas as possible within a given timeframe. Again, the experts scoring the answers require practical value in all of the responses and evaluate them for it.
- Flexibility: This refers to the diversity of ideas and the ability to shift between different types of ideas or approaches to problems.
- Originality: This assesses the uniqueness of the ideas generated. An original idea is one that is statistically infrequent among the responses of people of the same age and background. Again, the ideas have to be more than just original. Gibberish Mad Hatter type responses will not score well.
- Elaboration: This measures the amount of detail in the responses, or the ability to develop ideas and ‘build’ on them to create complex structures or plans.
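To make the first three dimensions concrete, here is a toy sketch in Python. It is only an illustration of the definitions above and not the Scholastic Testing Service's method, which is proprietary and relies on human experts. The function name, the example ideas, the category labels, and the 5% rarity cutoff are all made-up assumptions, and the sketch ignores the practical-value judgment that the real scorers apply.

```python
from collections import Counter


def score_responses(responses, norm_counts, norm_size, rare_cutoff=0.05):
    """Toy scores for one test-taker (illustrative only, not the TTCT method).

    responses:   list of (idea, category) pairs given by the test-taker.
    norm_counts: Counter of how often each idea appeared in a comparison group.
    norm_size:   number of people in that comparison group.
    rare_cutoff: hypothetical threshold; an idea given by a smaller share of
                 the comparison group than this counts as original.
    """
    ideas = [idea for idea, _ in responses]
    categories = {category for _, category in responses}

    fluency = len(ideas)            # how many ideas were produced
    flexibility = len(categories)   # how many different kinds of ideas
    # Originality as statistical infrequency within the comparison group.
    originality = sum(
        1 for idea in ideas if norm_counts[idea] / norm_size < rare_cutoff
    )
    return {
        "fluency": fluency,
        "flexibility": flexibility,
        "originality": originality,
    }


# Made-up comparison data: how many of 100 people gave each idea.
norm_counts = Counter({"doorstop": 40, "paperweight": 35, "build a wall": 60})

# Made-up answers from one test-taker, each tagged with a category.
responses = [
    ("doorstop", "household"),
    ("build a wall", "construction"),
    ("grind into pigment", "art"),
]

scores = score_responses(responses, norm_counts, norm_size=100)
print(scores)
```

In this toy data only the third idea is rare in the comparison group, so it is the only one counted toward originality, while all three ideas count toward fluency.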
The Torrance Tests of Creative Thinking are commonly used in a variety of settings, from versions for children in vocational counseling at schools to versions for adults and for businesses evaluating executives’ creative entrepreneurial capabilities. That is why the test is well known and commonly used now in business schools, such as the school at the University of Montana where Professor Erik Guzik works.

The TTCT appears to be, as ChatGPT reports, one of the most reliable and valid measures of creativity and is used widely around the world. But ChatGPT also observes that some researchers question whether the TTCT, or any other test, can fully capture the complexity and multifaceted nature of creativity. ChatGPT reports debates among experts about the cultural bias of such tests and whether they measure creativity or some other related construct like divergent thinking, or in ChatGPT’s case, improbable thinking.
E. Paul Torrance and His Creativity Test
I was very skeptical about the creativity test results before I corresponded with Erik Guzik and studied the conference presentation. As part of my test vetting, I also dug deeper and researched the life and work of the man who created the TTCT, Ellis Paul Torrance. Torrance’s pre-graduate school background in the 1930s and 1940s was working as a counselor and high school teacher in rural south Georgia. (This is the same neck of the woods as former President Jimmy Carter, whom he reminds me of.) Paul Torrance later earned a Ph.D. from the University of Michigan, and eventually became a professor at the University of Georgia, in Athens, Georgia, where he wrote 88 books!
In one of his first books on education and creativity, Torrance explained that some children are like wild colts who must have their energy directed positively to be useful. Broadley, M., Square Pegs in Square Holes (1943). This insight enabled Torrance to see that “difficult” students had potential, and that we should not just write them off, as most teachers in the 1940s would do. What do educators need to know about the Torrance Tests of Creative Thinking: A comprehensive review (National Institutes of Health, 2022) (TTCT “allows schools to identify gifted children from under-represented groups who may not be identified based on standard IQ and achievement tests.”)

This NIH-published article also explained that Torrance had a personal vision of the definition of creativity, which shaped the way he measured creativity. Professor Torrance defined creativity as:
A process of becoming sensitive to problems, deficiencies, gaps in knowledge, missing elements, disharmonies, and so on; identifying the difficulty; searching for solutions, making guesses, or formulating hypotheses about the deficiencies; testing and retesting these hypotheses and possibly modifying and retesting them; and finally communicating the results.
Torrance (1966, p. 6)

That strikes me as a more thoughtful definition than any of the answers that ChatGPT provided in my last blog to the question “what is creativity.” How Anyone Can Adjust ChatGPT’s Creativity Settings To Limit Its Mistakes and Hallucinations. The answer provided at the default Temperature of 0.7 was:
Creativity is a mental spark that illuminates originality, crafting fresh ideas and unique solutions that hold value. It’s an intellectual dance, weaving thoughts together in ways unseen before. Not confined to art or literature, it flourishes in science, technology, and daily life, offering new perspectives and innovations. It’s our ability to venture beyond known boundaries, reimagining and reshaping our world. Creativity is the birthmark of human ingenuity and a key driver of evolution and progress.
ChatGPT-4 (Temperature setting 0.7)
I guess that is a top 1% worthy answer, although, as a longtime student of creativity, I prefer ChatGPT’s conservative answer with a 0.0 Temperature setting.
Creativity is the human ability to generate unique and innovative ideas, products, or solutions that are both novel and valuable. It involves original thinking, imagination, and the capacity to transcend traditional ideas to create meaningful new ones. It’s present in arts, science, technology, and everyday problem-solving, reflecting our capability to envision, invent, and express in distinctive ways.
ChatGPT-4 (Temperature setting 0.0)
I wonder how the Scholastic Testing Service would have scored ChatGPT’s answers at that 0.0 Temperature setting? Professor Erik Guzik in our correspondence advised that they used all of the default settings of ChatGPT-4, including the relatively high Temperature setting of 0.7. He is looking forward to experimenting with other settings in the near future. See: Creativity and How Anyone Can Adjust ChatGPT’s Creativity Settings To Limit Its Mistakes and Hallucinations.
It is impressive to see that the tests first developed by E. Paul Torrance in mid-century rural Georgia have evolved and been successfully employed in 2023 to test the creativity of ChatGPT-4, a difficult wild colt student if ever there was one. Hats off to Professor Torrance, whose TTCT works even on robots. Can you imagine how thrilled he would have been to evaluate robot creativity in south Georgia?
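For readers curious why a Temperature of 0.0 gives conservative answers and 0.7 gives more varied ones, here is a minimal Python sketch of temperature scaling as it is commonly described for language models. It is an assumption about the general technique, not a description of OpenAI's internal code, which is not public. The function name and the four scores are made up for illustration.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Turn raw word scores into probabilities, adjusted by temperature.

    A lower temperature concentrates probability on the top-scoring word;
    a higher temperature spreads it across more of the candidates.
    """
    if temperature <= 0:
        # Treat temperature 0 as always picking the single top-scoring word.
        best = max(range(len(scores)), key=lambda i: scores[i])
        return [1.0 if i == best else 0.0 for i in range(len(scores))]
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for four candidate next words.
scores = [2.0, 1.0, 0.5, 0.1]

for temperature in (0.0, 0.7, 1.5):
    probabilities = softmax_with_temperature(scores, temperature)
    print(temperature, [round(p, 3) for p in probabilities])
```

At 0.0 the top-scoring word is always chosen, which is why the answer reads as safe and repeatable. At 0.7 the lower-scoring words get a real chance of being picked, which is one plausible source of the more colorful phrasing.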
Conclusion
It is important to remember that the last version of OpenAI’s chatbot, ChatGPT-3.5, flunked this same TTCT creativity test. The chatbot was upgraded to version 4.0 only in March 2023. Also remember there is no possibility that GPT-4 had seen and memorized the “best” creativity answers. Plus, the results were anonymously scored by the Scholastic Testing Service and compared with 2,700 college students who took the TTCT in 2016. Finally, recall this was not just a one-time fluke of creativity by GPT-4. It took the test eight times.

All this makes me think these first tests of Generative AI are accurate and will be confirmed over time. It does not get much better than the top 1%, but if AI does keep improving in this area, it might become impossible to test it anonymously. AI super-intelligence is likely to be pretty obvious when it arrives, especially to the human experts in creativity at the Scholastic Testing Service who evaluate the answers.
These first test results are consistent with my ad hoc studies and use of ChatGPT-4 over hundreds of hours. They are also consistent with the general reactions of most other users of GPT, who were surprised by its many creative abilities, including a very high level of creativity with visual images.
As a Photoshop user since its early days in the 90s, and to a lesser extent, an amateur videographer and Final Cut software user since the early 2000s, I am blown away at the images generative AI software like Midjourney can now be prompted to create. Plus, it keeps getting better and better every day. It is very hard to keep up with the new software features.
Generative AI is still far from the quality of the best human artists. For instance, I had an opportunity recently to show Billy Collins a poem that I prompted GPT-4 to write in his style. Billy politely smiled at its amateur effort. But then he gave me a suggestion on how to improve the prompt in any subsequent efforts. Other forms of generative AI output are not world class either, including music and, especially, non-digital arts like sculpture and ceramics.
Still, the day may come when AI can compete with the greatest human creatives in all fields, including the creativity required for successful entrepreneurship, as Business School Professor Erik Guzik teaches. More likely, the top 1% in all fields will be humans and AI working together in a hybrid manner. Each will synergistically boost the other’s abilities and productive output. That has been my experience in a small way, as reflected in the changes to my blog since GPT struck in November 2022.
Copyright Ralph Losey 2023 ALL RIGHTS RESERVED