Creativity and How Anyone Can Adjust ChatGPT’s Creativity Settings To Limit Its Mistakes and Hallucinations

Ralph Losey. Published July 12, 2023.

This article analyzes the creative imagination capacities of ChatGPT and two software settings to control it: Temperature and Nucleus Sampling (aka Top-P). It is easy to change these parameters, as will be shown by multiple examples. Knowing how to use these settings will make your work with Ai more enjoyable, productive and accurate.

According to Sam Altman, OpenAI’s CEO, when you reduce the Temperature and Top-P values, you not only reduce the creativity of the responses, you reduce the chances of error and hallucinations. This will be explained in the context of Sam Altman’s deep understanding of creativity and the role of mistakes in the creative process.

The insights and software control skills explained in this article can empower anyone to dial in the right balance of creativity and mistakes for a particular ChatGPT-4 assisted project. Lawyers, for instance, may, for most of their uses, want to lower the default settings, which are high on creativity. This should improve the probability of accurate, delusion free answers. That makes most lawyers and their bots very happy. Judges too. All of this, and more, will be explained.

Sam Altman on GPT Creativity

In the What is the Difference Between Human Intelligence and Machine Intelligence? blog I quoted portions of Sam Altman’s video interview at an event in India by the Economic Times to show his “tool not a creature” insight. There is another Q&A exchange in that same YouTube video starting at 1:09:05, that addresses creativity and mistakes.

Questioner (paraphrased): [I]t’s human to make mistakes. All people we love make mistakes. But an Ai can become error free. It will then have much better conservations with you than the humans you love. So, the AI will eventually replace the imperfect ones you love, the Ai will become the perfect lover.

Sam Altman: Do you want that? (laughter)

Questioner: Yeah.

Sam Altman: (Sam explains AI is a tool not a creature, as I have quoted before, and before addressing the intimacy lover aspect, as I quoted in Code of Ethics for “Empathetic” Generative AI, Sam talks about creativity.) On the question of mistakes and errors, I believe that creativity and certainly the creation of new knowledge, is very difficult, maybe impossible, without the ability to make errors and come up with bad ideas. So, if you made a system that would never tell you anything that it was not absolutely sure was a fact, you would lose some creativity in that process.

One of the reasons people don’t like ChatGPT is because it hallucinates and makes stuff up, but one of the reasons they do like it is because it can be creative. What we want is a system than can be creative when you want, which means sometimes being wrong, or saying something you are not sure about, or experimenting with a new idea. Then when you want accuracy, you can get accuracy.
Sam Altman, June 7, 2023, at an Economic Times event in India

Sam Altman’s answer here assumes you know about ChatGPT’s creativity volume controls, where you can, if you want, turn the creativity volume down to zero. In so doing, you will improve accuracy, but the response will often be boring. Boring, but accurate, may be just what you want sometimes, but that is not the default setting for ChatGPT, as will be explained and demonstrated.

ChatGPT Creativity Settings

This section provides a technical explanation of these two settings. Much of this is difficult to understand, but worry not, and plough through it, because after this comes an easy to follow demonstration of what it all means. Multiple examples will be provided to allow you to see for yourself how the GPT controls work in practice. That is the hacker “hands on” e-Discovery Team way.

First, the technical explanation of the two volume controls for GPT creativity: Temperature and Nucleus Sampling (aka Top-P). Both typically have settings of between zero and one, 0.0 and 1.0.

TEMPERATURE: Technically temperature affects the probability distribution over the possible tokens at each step of the generation process. A temperature of 0 would make the model completely deterministic, always choosing the most likely token. The “temperature” setting in GPT and similar language models, such as ChatGPT, controls the randomness of the model’s responses. A higher temperature value makes the model’s responses more random, while a lower, cooler value, makes the responses more deterministic and focused. See eg. Cheat Sheet: Mastering Temperature and Top_p in ChatGPT API (OpenAI Forum). Temperature values are said to produce a more focused, consistent, and deterministic output. It is like going from water, the higher 1.0 value, to ice, the colder, more probable value of 0.0.

Typically, OpenAI experts say a higher temperature (e.g., 0.8) may be suitable when you want a range of ideas, brainstorming suggestions, or creative writing prompts. A lower temperature (e.g., 0.2) is more appropriate when you’re looking for a precise answer, a more formal response, or when the context demands consistency. The default setting for ChatGPT 3.5 and 4.0 is 0.7. That’s pretty hot, especially for most legal work. No doubt OpenAI have put a lot of research into that default setting, but I could not find it. Id., Also see: Prakash Selvakumar, Text Generation with Temperature and Top-p Sampling in GPT Models: An In-Depth Guide (4/29/23).

Open AI says that finding the right temperature setting may require experimentation to strike a balance between creativity and consistency that suits your specific needs. Id. This sounds like a good lawyer answer of “it depends.” There are a tremendous number of variables, different questions and needs, different circumstances. That is the same situation lawyers are in with many legal questions. Plus the OpenAI software itself is constantly being updated, even though the version number of 4.0 has not been changed since March 2023.

The Bottom line for lawyers is that the default setting of 0.7 is pretty high in the random predictions scale. Unless you are looking for clever, very creative language or legal imagination – off the wall ideas – lawyers and judges should use a lower setting. Maybe dial down the random creativeness to 0.2, or even zero – 0.0 – for maximum route parroting of the most probable information. You just want the cold truth.

As Sam Altman explained, lowering the temperature setting also makes it more likely that your answers will not have as many mistakes or hallucinations. Note that I did not say no mistakes, the software is too new, and life is too complex to say that. Human lawyers are still needed to verify the Ai. Just because it appears much smarter than you, it can still be wrong, no matter how conservative the temperature setting. Think of the brilliant, very creative, higher IQ than you, conservatively dressed, young associate with little or no actual legal experience.

Baby lawyer explains Top-P to skeptical partner

NUCLEUS SAMPLING (aka TOP-P): Top-P sampling is an alternative to temperature sampling. Technically, this means that instead of considering all probable tokens that are likely to come next, the Top-P parameter directs GPT to considers only a subset of all probable tokens (the nucleus) whose cumulative probability mass adds up to a certain threshold, the (Top-Probablity). For example, if Top_P is set to 0.1, GPT will consider only the tokens that make up the top 10% of the probability mass for the next token. This allows for dynamic vocabulary selection based on context. The setting values for Top-P are, like temperature, between 0.0 and 1.0.

Put another way, the Top-P sampling parameter maintains a balance between diversity and high-probability words by selecting tokens from the Top-P most probable tokens. They are the tokens whose collective probability is greater than or equal to a specified threshold p. The Top-P parameter helps ensure that the chatbot response is both diverse and relevant to the given context. Text Generation with Temperature and Top-p Sampling in GPT Models: An In-Depth Guide

For greater technical detail, see the scientific paper: The Curious Case of Neural Text Degeneration (2019). The paper abstract explains:

The counter-intuitive empirical observation is that even though the use of likelihood as training objective leads to high quality models for a broad range of language understanding tasks, using likelihood as a decoding objective leads to text that is bland and strangely repetitive. . . . Our findings motivate Nucleus Sampling, a simple but effective method to draw the best out of neural generation. By sampling text from the dynamic nucleus of the probability distribution, which allows for diversity while effectively truncating the less reliable tail of the distribution, the resulting text better demonstrates the quality of human text, yielding enhanced diversity without sacrificing fluency and coherence.
Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, Yejin Choi

From my study, the Top-P prompt parameter is seldom employed in ChatGPT. Most users prefer to just stick with the Temperature control variation. But still, try it a few times and see what you think, especially if temperature control variations are not working well for some reason on a particular challenge. Software customization is always a good thing. I have not used it enough myself to have any more specific conclusions or recommendations.

As Prakash Selvakumar says in his good article, Text Generation with Temperature and Top-p Sampling in GPT Models: An In-Depth Guide:

When and How to Tweak These Parameters As a business user, you might need to tweak these parameters to get the desired output quality, depending on the specific use case. Temperature:

If the generated text is too random and lacks coherence, consider lowering the temperature value.

If the generated text is too focused and repetitive, consider increasing the temperature value.

Top-P Sampling:

If the generated text is too narrow in scope and lacks diversity, consider increasing the probability threshold (p).

If the generated text is too diverse and includes irrelevant words, consider decreasing the probability threshold (p).

Here is the guide found in the OpenAI Forum on how these parameters can be used in different scenarios, with example values:

Use Case	Temperature	Top_P	Description
Code Generation	0.2	0.1	Generates code that adheres to established patterns and conventions. Output is more deterministic and focused. Useful for generating syntactically correct code.
Creative Writing	0.7	0.8	Generates creative and diverse text for storytelling. Output is more exploratory and less constrained by patterns.
Chatbot Responses	0.5	0.5	Generates conversational responses that balance coherence and diversity. Output is more natural and engaging.
Code Comment Generation	0.3	0.2	Generates code comments that are more likely to be concise and relevant. Output is more deterministic and adheres to conventions.
Data Analysis Scripting	0.2	0.1	Generates data analysis scripts that are more likely to be correct and efficient. Output is more deterministic and focused.
Exploratory Code Writing	0.6	0.7	Generates code that explores alternative solutions and creative approaches. Output is less constrained by established patterns.

Cheat Sheet: Mastering Temperature and Top_p in ChatGPT API (rUv at OpenAI Forum)

Examples of GPT Temperature Control Settings in Action

Now, on to the examples, so you can see for yourself what happens when you change the volume parameter in a “What Is” type question. (It would take too long to also demonstrate the Top-P variable, and anyway, is not as useful.) These examples will get crazy as we enter the higher temperatures, so hold on. Of course, as you could easily predict, I am going to use a What is Creativity prompt in my test. Here is the transcript of my using ChatGPT-4, May 24, 2023 Pro Version.

______________________

Ralph: Temperature 0, please explain what creativity means in 100 words or less.

ChatGPT: Creativity is the human ability to generate unique and innovative ideas, products, or solutions that are both novel and valuable. It involves original thinking, imagination, and the capacity to transcend traditional ideas to create meaningful new ones. It’s present in arts, science, technology, and everyday problem-solving, reflecting our capability to envision, invent, and express in distinctive ways.

Ralph: Temperature .25, please explain what creativity means in 100 words or less.

ChatGPT: Creativity is the ability to generate new, unique, and valuable ideas. It is an essential human trait, expressed in diverse ways such as art, science, technology, and problem-solving. Creativity involves original thinking, pushing beyond the boundaries of conventional wisdom to produce innovative solutions and insights. It’s our capacity to envision and create something different and meaningful from existing elements, thus driving progress and evolution.

Ralph: Temperature .7, please explain what creativity means in 100 words or less.

ChatGPT: Creativity is a mental spark that illuminates originality, crafting fresh ideas and unique solutions that hold value. It’s an intellectual dance, weaving thoughts together in ways unseen before. Not confined to art or literature, it flourishes in science, technology, and daily life, offering new perspectives and innovations. It’s our ability to venture beyond known boundaries, reimagining and reshaping our world. Creativity is the birthmark of human ingenuity and a key driver of evolution and progress.

Ralph: Temperature 1.0, please explain what creativity means in 100 words or less.

ChatGPT: Creativity is the dazzling dance of the mind, spinning new, unexpected connections between thoughts and ideas. It’s the spark of novel insight, the ripple of originality that makes us see the world anew. A wild ride on imagination’s wings, pushing beyond boundaries to invent and innovate. In art, it’s a unique masterpiece; in science, it’s a game-changing theory; in daily life, it’s the pinch of spice that adds flavor to the ordinary. It’s our unique human capacity to give birth to the unseen, the unheard, and the unthought.

Ralph: Temperature 1.5, please explain what creativity means in 100 words or less.

ChatGP: Creativity, ah, the rainbow in a thinker’s cloud, the galactic spiral of intellectual wonder! It’s the cosmic jazz, a whimsical waltz between thoughts, creating a symphony of unexplored vistas. Like fireflies in the midnight forest of cognition, it lights up unseen pathways, twirling the familiar into a fantastic masquerade of novelty. In the vast canvas of existence, from the minuscule to the monumental, it’s our kaleidoscope, eternally refracting the mundane into myriad hues of innovation. Creativity, the celestial wizardry of our minds, endlessly knitting the threads of the known into the cloak of the extraordinary.

Ralph: Temperature 10.0, please explain what creativity means in 100 words or less.

ChatGPT: Apologize for the confusion, but the temperature value for text generation typically ranges between 0.0 (completely deterministic) and 1.0 (more random). A value of 10.0 would exceed the usual scale and won’t lead to a meaningful or understandable output. Please try a value within the typical range.

Ralph: I understand, but do not understand why you accepted a temperature setting of 1.5 in the question before the last one? Please explain.

ChatGPT: I apologize for the inconsistency. You’re correct, the usual temperature range is indeed from 0.0 to 1.0 for generating text with AI models. My acceptance of a temperature of 1.5 was a mistake. While you may technically be able to input a value higher than 1.0, such settings would lead to increasingly random outputs, often not providing meaningful or relevant responses. It’s best to stay within the standard range for most effective use. I appreciate your patience and understanding.

Ralph: Temperature 2.0, please explain what creativity means in 100 words or less.

ChatGPT: Creativity? It’s the stars in brain-sky dancing tango, paint-dripping ideas spinning topsy-turvy. Think banana-moon-sailing on oceans of uncooked pasta. It’s in music baked into pies, equations sung like lullabies, blue giraffes nibbling cloud-cakes. A heart-spark exploding kaleidoscope-thoughts, mushrooming into deliciously wild inventions. Confetti-filled cortex blasting out mind-fireworks, transforming known galaxies into uncharted nebulae. It’s where ordinary pebbles become philosopher’s stones, touching mundane, birthing extraordinary. Creativity, the mad hatter’s party in the wonderland of our cognition, delighting in serendipitous somersaults of imagination.

You can see from these examples how the random probability connections make the responses more creative. You can also see there is a very good reason for the temperature speed limit of 1.0. The two specifications above that of 1.5 and 2.0 were way over the top, worthless, Mad Hatter stuff, although kind of funny. Even though a GPT may take a higher value above 1.0, don’t bother, unless your goal is to prompt “stars in brain-sky dancing tango stuff.”

Baby Lawyer is not amused by excess creativity

Conclusion

It appears that creativity, like intelligence, is not, as once thought, a uniquely human capacity. For a discussion on the equivalence of machine and human intelligence, see: WHAT IS THE DIFFERENCE BETWEEN HUMAN INTELLIGENCE AND MACHINE INTELLIGENCE? Creativity and imagination are not what makes humans special. Like many common beliefs of the past, such as belief that the Earth was the center of the Universe, and all revolved around us, the past belief that only humans are creative is just another vain delusion. In fact, our creativity may not be as good in some ways. Consider how GPT creativity can be precisely controlled by skilled manipulation of the Temperature and Top-P parameters. How many of us can precisely lower and raise our creative imagination like that?

The price of both Ai and human creativity is mistakes and delusions. Just as human creativity is based on mistakes, and freedom of wild imagination, tempered by skill and control, so too is Ai creativity. All human creatives throw away many of their creations, their first drafts. (For instance, even after writing literally millions of published words in my long life, my blogs, like this one, require an embarrassing number of revisions. Some go beyond a hundred versions before I accept them, and then later, someone still finds mistakes!) All creatives go back and redo, they correct, they learn from the mistakes and try again. Rare is the visual artist, musician or writer who creates a finished work on the first try. The same is true with ChatGPT. Mistakes and hallucinations just come with the territory of creativity, both written and visual. Here is a video I created with the Chat-GPT code interpreter version to illustrate the creative process.

Click on the image to open a video of the imagination process

Of course, the creative process of all authors is not the same. Some go from without to within, like the writer below emerging from his own pile of experimental mistakes with inspiration from the moon.

No one wants to chat to a boring, just the facts, type of unimaginative person. We all know the type and run from them at parties. That explains why the default setting is 0.7 for ChatGPT, it is, after all, intended for chatting. That is the marketing hook. That is what most users want, at least at first. But GPT has many other uses too, including legal, accounting, translation and computer coding, where boring, but accurate, is often what you want. Same goes for data analysis, document review, document generation and many of the other incredible applications of ChatGPT-4 and other GPT tools.

So play with the Temperature and Plus-P settings. Learn how they work. The 3.5 version, for instance, is on my website, bottom right, as a chatbot. (Has not been named yet. Any suggestions?) I have very conservative creativity settings on the little chatbot because I am going for accuracy. The Temperature value is 0.2 and the Top-P is 0.01. These settings cannot be changed by the user, unlike the free OpenAI chatbots. Try asking my little chatbot, bottom right, what temperature and top-p means. After writing this, I tried, and the answers were really pretty good. I did not have Ai help in writing this blog, which may have been a mistake.

Serious professional users, like medicine, law and accounting, will probably want to set the default temperature to 0.0, and just warm it up on second passes. Plus, always remember, no matter what the temperature, trust but verify. Check the work, check the research and accuracy of reports; that is especially important in these early days of Ai. Finally, do not forget to update the work. Google search is not dead yet, and will likely always have its place in hybrid multimodal efforts, where all types of search are helpful.

It seems appropriate to end with a small creative effort using Ai tools on the high default temperature setting of 0.7. This video is a hybrid, human-Ai imagination, revealing the behind the scenes work to create Temperature control settings for ChatGPT. (For best results, click on bottom right hand corner and enlarge. Press space bar to watch again!)

Video created by ChatGPT-4 & Midjourney per specs of Ralph Losey

Discover more from eDiscovery Team

Subscribe to get the latest posts sent to your email.

This entry was posted on Wednesday, July 12th, 2023 at 1:57 pm and is filed under AI Ethics, AI Instruction, AI Prompt Engineering Instruction, Book, ChatGPT, knowledge, Lawyers Duties, Review, Technology, VENDORS, wisdom. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

4 Responses to Creativity and How Anyone Can Adjust ChatGPT’s Creativity Settings To Limit Its Mistakes and Hallucinations

ChatGPT-4 Scores in the Top One Percent of Standard Creativity Tests | e-Discovery Team says:

July 21, 2023 at 6:49 am

[…] all the submissions. This blog tells the whole story and follows up on the last few articles of Creativity and How Anyone Can Adjust ChatGPT’s Creativity Settings To Limit Its Mistakes and Hallu…, and Creativity Test of GPT’s Story Telling Ability Based on an Image […]

Loading...

Reply
What Lawyers Think About AI, Creativity and Job Security | e-Discovery Team says:

July 28, 2023 at 1:52 pm

[…] here in the last few articles suggests they are wrong. It might just be vainglory on their part. Creativity and How Anyone Can Adjust ChatGPT’s Creativity Settings To Limit Its Mistakes and H…; and Creativity Test of GPT’s Story Telling Ability Based on an Image Alone and especially […]

Loading...

Reply
Navigating the AI Frontier: Balancing Breakthroughs and Blind Spots | e-Discovery Team says:

October 10, 2024 at 7:41 am

[…] a genius. So, walk before you run the jagged edge and always be careful, AI can hallucinate too. Creativity and How Anyone Can Adjust ChatGPT’s Creativity Settings To Limit Its Mistakes and H… (e-Discovery Team, 7/12/23); Experiment with a ChatGPT4 Panel of Experts and Insights into AI […]

Loading...

Reply
Stochastic Parrots: How to tell if something was written by an AI or a human? | e-Discovery Team says:

January 14, 2025 at 11:38 am

[…] Well, duh or course. LLM intelligence does come from predicting the next word you know. The Top_p or Creativity settings may be too low. Again, the sooo boring speech results. Humans are typically more enjoyable to read. […]

Loading...

Reply