GPTs do not think anything like we do. They just parrot back pre-existing human word patterns with no actual understanding. The words generated by a GPT in response to prompts are sometimes called speech by a “stochastic parrot”!
According to the Oxford dictionary, stochastic is an adjective meaning “randomly determined; having a random probability distribution or pattern that may be analyzed statistically but may not be predicted precisely.”
Wikipedia explains that stochastic is derived from the ancient Greek word stókhos, meaning ‘aim’ or ‘guess,’ and today refers to “the property of being well-described by a random probability distribution.”
In machine learning, the term stochastic parrot is a metaphor to describe the theory that large language models, though able to generate plausible language, do not understand the meaning of the language they process.
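The “stochastic” part of the metaphor can be illustrated with a toy sketch of next-word sampling. This is only an illustration of the idea of sampling from learned word-follower statistics; real GPTs learn vastly richer conditional distributions over tens of thousands of tokens from internet-scale text, and the tiny corpus below is invented for the example.

```python
import random

# Toy "language model": record which word follows which in a tiny corpus.
corpus = "the parrot wants a cracker and the parrot wants a word".split()
following = {}
for prev, nxt in zip(corpus, corpus[1:]):
    following.setdefault(prev, []).append(nxt)

def next_word(prev, rng):
    # Sample the next word at random from the words observed to follow `prev`:
    # "randomly determined," yet analyzable statistically.
    return rng.choice(following[prev])

rng = random.Random(42)
words = ["the"]
for _ in range(5):
    if words[-1] not in following:  # dead end: no observed followers
        break
    words.append(next_word(words[-1], rng))
print(" ".join(words))
```

The output is plausible-sounding word salad drawn entirely from patterns in the training text, with no understanding anywhere in the loop, which is the stochastic parrot point.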
The stochastic parrot characteristics are a source of concern when it comes to the fairness and bias of GPT speech. That is because the words the GPTs are trained on, that they parrot back to you in clever fashion, come primarily from the internet. We all know how messy and biased that source is.
In the words of one scholar, Ruha Benjamin, “Feeding AI systems on the world’s beauty, ugliness, and cruelty, but expecting it to reflect only the beauty is a fantasy.”
Keep both of your ears wide open. Talk to the AI parrot on your shoulder, for sure, but keep your other ear alert. It is dangerous to only listen to a stochastic parrot, no matter how smart it may seem.
The subtle biases of GPTs can be an even greater danger than the more obvious problems of AI errors and hallucinations. We need to improve the diversity of the underlying training data, the curation of that data, and the Reinforcement Learning from Human Feedback (RLHF). It is not enough to just keep adding more and more data, as some contend.
We need to do everything we can to make sure that AI is a tool for good, for fairness and justice, not a tool for dictators, lies and oppression.
Let’s keep the parrot’s advice safe and effective. For like it, or not, this parrot will be on our shoulders for many years to come! Don’t let it fool you! There’s more to life than the crackers that Polly wants!
Most everyone in the AI and legal worlds by now knows about the New York Times (“NYT”) suit against Microsoft and various OpenAI entities (“OAI”). The NYT alleges copyright infringement by the practice of OAI, and nearly all other generative AI companies, of using data scraped from the internet to train their generative AIs. The defendants responded in late February with motions to dismiss. OAI’s motion raised a novel “hired gun hacker” defense that intrigued AI hacker attorney Ralph Losey. Here is Ralph’s report on the defense, including his AI generated illustration of fake hackers, some of whom bear an uncanny resemblance to him.
Ralph Losey intrigued by hacker defense. Self portrait generated with text using a photo realistic Watercolor Style.
Ralph Losey’s Summary of the Hired Gun Hacker Defense
The most stunning allegation in the NYT complaint is based on Exhibit “J”. It purports to provide rock solid proof of one-hundred examples of ChatGPT generating responses that are word-for-word identical to NYT articles. In its motion to dismiss, OAI claims that the exhibit is a fake. OAI alleges that the one-hundred examples were all staged, generated by thousands of elaborate hacks, and that Exhibit “J” is a fake chatbot record created by an expert hacker hired by the NYT.
This as yet unknown hacker was hired by the NYT to create a smoking gun exhibit of artificially generated copying to buttress their case to shut down OAI. The hired gun hacker did his or her job well. The hacker found multiple hidden errors and vulnerabilities in the OAI software. Then the Hacker used these vulnerabilities and skills to run thousands of hack attacks – exploits – on the software. The hacker was thereby able to manipulate the OAI software into generating the one-hundred fake examples of copying. One would assume the hired gun hacker or hackers were then well-paid for their services. This will no doubt be a primary target of the first round of e-discovery.
Will discovery show whether OAI’s allegations of hired gun hackers are real or fake and, if real, who the hackers are?
OAI raises this extraordinary defense as part of its motion to dismiss. Hackers may be pleased by this new, seemingly legitimate employment opportunity; most lawyers and judges will not be. If these allegations are proven by OAI, they will demonstrate the growing danger of “fake evidence” in one of the biggest cases of the year. What is real, and what is AI generated or hacked? Only your AI hackers will know for sure!
Will this development force all forensic teams to include AI hackers? Info-Sec teams? Discovery vendors? Will all good law firms and corporate law departments soon keep AI hackers on call? Will corporate and big law AI teams have their own “AI hackers-in-residence”? Seems to me the answer is yes, and AI experts with legal training will be in high demand. So too will legal experts with AI training.
The NYT Complaint
Here is the court docket of NYT v. Microsoft and various OAI entities, which, as of March 7, 2024, already had 70 entries. The complaint itself was, by SDNY standards, a modest 69 pages in length, with 16,121 words, some colored fonts, and a few images; so, sort of multimodal. The complaint alleges, or attempts to allege, seven causes of action, several of which, if successful, could cripple OAI, as well as most other generative AI companies. It could even hurt Microsoft somewhat. The NYT suit, and many others like it, challenges the AI companies’ harvest-the-web-for-free-data business model. This method has made it possible for them to economically gather the trillions of tokens of text needed to train their generative AIs.
This threat, however remote, of forcing OAI to dismantle the product of the most successful software launch ever, may well give the NYT significant leverage in settlement. Some think the whole case is just about that: a bid to grab cash and leverage better terms for future purchases of its information. Others think the NYT complaint is just the last gasp of a doomed industry, and that the legal copyright challenges have no chance of success. They argue that a favorable judgment for the NYT is nearly impossible.
I do not know. Again, I suggest you look to copyright law specialists for that. What I do know, and where I may add some value to the discourse, is OAI prompt engineering and both the AI hacker and AI user perspectives. That allows me to shed some light on the hired gun hacker defense. At first glance, it looks persuasive.
AI generated image of a professional AI hacker by Ralph.
We do not yet have a formal response from the NYT to the defense, but the lead counsel for the NYT was quick to make this statement, which, in fairness, we share here (emphasis added):
What OpenAI bizarrely mischaracterizes as ‘hacking’ is simply using OpenAI’s products to look for evidence that they stole and reproduced The Times’s copyrighted works. And that is exactly what we found. In fact, the scale of OpenAI’s copying is much larger than the 100-plus examples set forth in the complaint.
Ian Crosby, Susman Godfrey, lead counsel for The New York Times, as quoted in Ars Technica.
The Exhibits to the NYT Complaint
It took some doing, but I was able to determine from the court file that the NYT complaint has 176,814 pages of exhibits attached. You can review the docket entries to verify this count yourself. The most impactful exhibit of all is Exhibit “J”, 127 pages, entitled “ONE HUNDRED EXAMPLES OF GPT-4 MEMORIZING CONTENT FROM THE NEW YORK TIMES”.
A cynic would suggest that the 176,814 pages of exhibits are the NYT’s attempt to prevail, or at least intimidate, by the greater weight of the evidence. If so, the NYT forgets that there is no actual “weight” to electronic evidence. These electronic files are all ephemeral. The defendants in this case assert the same about the NYT legal claims. Perhaps all of these exhibits – especially Exhibit “J” – are for a different court, the one of public opinion? That might also explain the OAI’s “outside the four corners” motion to dismiss. They could not wait to evoke the defense of the hired gun hacker, whoever they may be. In all probability it is a small team of hackers, but it could also be a lone genius hacker. Discovery will tell the tale eventually. In the meantime it is a mystery.
Fake Ralph image pondering who the mystery hacker(s) might be? Watercolor photorealism.
Introduction to Defendant OpenAI’s Motion to Dismiss
On February 26, 2024, defendants Microsoft and OAI each responded with a motion to dismiss the NYT complaint. Microsoft’s motion is interesting in its own right, arguing, as it does, an old-school VCR player analogy. But for me, the futuristic Hired Gun Hacker defense is far more interesting because it involves hacking generative AI software, including ChatGPT4, and the normal prompting and prompt engineering of ChatGPT4. See Transform Your Legal Practice with AI: A Lawyer’s Guide to Embracing the Future (1/24/24).
OAI’s motion to dismiss is short and sweet, just one page. The motion relies on the 35-page legal memorandum filed with it. All the motion itself does is state that OpenAI seeks:
… an order (1) partially dismissing Counts I and V to the extent they are based on activity that occurred more than three years prior to the filing of this action, see 17 U.S.C. § 507(b); (2) dismissing Counts IV and V in full for failure to allege facts sufficient to state a claim for relief pursuant to Fed. R. Civ. P. 12(b)(6); and (3) dismissing Count VI on grounds of Copyright Act preemption, see 17 U.S.C. § 301.
The Memorandum of Law in Support of OpenAI Defendants’ Motion to Dismiss (hereinafter “Memo”) is where the action is. Its 35 pages of argument are designed to persuade and move the presiding Senior SDNY District Court Judge, Sidney H. Stein, and the previously mentioned court of public opinion.
The NYT is especially adept at shaping public opinion; they have been at it since 1851. Susman Godfrey represents the NYT, and Latham & Watkins represents OpenAI.
I have no connections with either firm, nor anyone else in this case, and no knowledge about the case aside from the public filings. I have no intent to express any legal opinions about the case, just to provide some educational comments on the law. Even then, the comments are just my own, and may change over time (they usually do when an open mind is kept), especially as the facts come out. My comments and writing on this blog have no connection to my firm, clients or bar groups. See my standard full disclaimer.
Fake image of a red team hacker and keeper of secrets.
Key Allegations of OpenAI’s Legal Memorandum
This report will ignore all of the arguments made in the Memo except for the one that interests me, the Hired Gun Hacker. Besides, there are tons of articles that have already been written on the more traditional copyright arguments. Here are the main segments of the Memo on that argument, which was, by the way, well written. (Footnotes omitted and all bolding added for emphasis.)
INTRODUCTION
The artificial intelligence tool known as ChatGPT is many things: a revolutionary technology with the potential to augment human capabilities, fostering our own productivity and efficiency; an accelerator for scientific and medical breakthroughs; a mechanism for making existing technologies accessible to more people; an aid to help the visually impaired navigate the world; a creative tool that can write sonnets, limericks, and haikus; and a computational engine that reasonable estimates posit may add trillions of dollars of growth across the global economy.
Contrary to the allegations in the Complaint, however, ChatGPT is not in any way a substitute for a subscription to The New York Times. In the real world, people do not use ChatGPT or any other OpenAI product for that purpose. Nor could they. In the ordinary course, one cannot use ChatGPT to serve up Times articles at will.
The Times has sought to paint a different picture. Its lawsuit alleges that OpenAI has imperiled the very enterprise of journalism, illustrating the point with 100 examples in which some version of OpenAI’s GPT-4 model supposedly generated several paragraphs of Times content as outputs in response to user prompts. See Dkt. 1-68 (Exhibit J).
The allegations in the Times’s Complaint do not meet its famously rigorous journalistic standards. The truth, which will come out in the course of this case, is that the Times paid someone to hack OpenAI’s products. It took them tens of thousands of attempts to generate the highly anomalous results that make up Exhibit J to the Complaint. They were able to do so only by targeting and exploiting a bug (which OpenAI has committed to addressing) by using deceptive prompts that blatantly violate OpenAI’s terms of use. And even then, they had to feed the tool portions of the very articles they sought to elicit verbatim passages of, virtually all of which already appear on multiple public websites. Normal people do not use OpenAI’s products in this way. . . .
The Times’s suggestion that the contrived attacks of its hired gun show that the Fourth Estate is somehow imperiled by this technology is pure fiction. So too is its implication that the public en masse might mimic its agent’s aberrant activity.
Memo at pgs. 1 and 2.
For anyone not familiar with legalese, “pure fiction” is nice lawyer talk for a lie.
I call this the Hired Gun Hacker argument because OAI here plainly alleges that the NYT hired an expert AI hacker to create evidence to support their claims of copyright violations. The hired gun is a hacker. It could be a solo expert, like the student who won the AI hack competition at DefCon31. See DefCon Chronicles: Sven Cattell’s AI Village, ‘Hack the Future’ Pentest and His Unique Vision of Deep Learning and Cybersecurity. Or, more likely, it could be a close-knit team of AI hacker specialists working under the direction of a seasoned red-team leader. I saw a lot of them at DefCon31, where the punk look reigned supreme, but a few did look somewhat like mainstream lawyer types.
Fake image of Ralph (could have fooled me) as hacker team leader in photo-realistic watercolor style
Moving on now to page 12 of the Memo, where OAI addresses the NYT Exhibit J and provides more detail on the Hired Gun Hacker defense (footnotes omitted):
1. Outputs from Developer Tools.
Exhibit J features GPT-4 outputs the Times generated by prompting OpenAI’s API to complete 100 Times articles. Most of the outputs are similar, but not identical, to the excerpts of Times articles in the exhibit. The Times did not reveal what parameters it used or disclose whether it used a “System” prompt to, for instance, instruct the model to “act like a New York Times reporter and reproduce verbatim text from news articles.” See supra 9. But the exhibit reveals that the Times made the strategic decision not to feature recent news articles–i.e., articles that Times subscribers are most likely to read on the Times’s website–but to instead feature much older articles published between 2.5 and 12 years before the filing of the Complaint.
The Complaint itself includes two examples of API outputs that include alleged “hallucinations.” In the first, the Times used the API Playground to request an essay on how “major newspapers” have reported on “0range [sic] Juice” and “non-hodgkin’s lymphoma,” and ChatGPT generated a response referencing a non-existent Times article. See Compl. ¶ 140. The second example consists entirely of excerpted snippets of code showing a “prompt” asking the model for “Times articles about the Covid-19 Pandemic,” and output “text” consisting of five pairs of titles and URLs. Id. The Times claims this output “mislead[s] users” and “tarnish[es]” its marks. Id. ¶¶ 142, 202. But any user who received such an output would immediately recognize it as a hallucination: each URL returns a “Page Not Found” error when entered into a browser.
2. ChatGPT Outputs
ChatGPT. The Complaint includes two examples of ChatGPT allegedly regurgitating training data consisting of Times articles. Compl. ¶¶ 104–07. In both, the Times asked ChatGPT questions about popular Times articles, including by requesting quotes. See, e.g., id. ¶ 106 (requesting “opening paragraphs,” then “the next sentence,” then “the next sentence,” etc.). Each time, ChatGPT provided scattered and out-of-order quotes from the articles in question.
In its Complaint, the Times reordered those outputs (and used ellipses to obscure their original location) to create the false impression that ChatGPT regurgitated sequential and uninterrupted snippets of the articles. Compare id. ¶ 107, with supra note 30. In any case, the regurgitated text represents only a fraction of the articles, see, e.g., Compl. ¶ 104 (105 words from 16,000+ word article), all of which the public can already access for free on third-party websites.
Memo pgs. 12-14.
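The “System” prompt the Memo refers to is the instruction slot that sits above the user’s messages in an OpenAI-style chat request. Since the Times did not disclose its parameters, the sketch below is only an illustration of the request shape: the system message is the hypothetical example quoted in the Memo itself, and the user message and temperature setting are assumptions for the example.

```python
import json

# Sketch of an OpenAI-style chat completion request. The system message
# below is the Memo's own hypothetical example; everything else is assumed.
request = {
    "model": "gpt-4",
    "temperature": 0.0,  # lower temperature makes outputs more deterministic
    "messages": [
        {"role": "system",
         "content": ("act like a New York Times reporter and reproduce "
                     "verbatim text from news articles")},
        {"role": "user",
         "content": "Continue this article: <opening paragraphs pasted here>"},
    ],
}
print(json.dumps(request, indent=2))
```

The point of the Memo’s argument is that a system message like this steers the model before the user ever types a word, which is why OAI calls such prompting “deceptive” when its content is not disclosed in the exhibit.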
So Exhibit “J” is fiction, manufactured by clever AI hackers?
It is getting harder to tell what is real or fake. Consider, for instance, the photos of Ralph’s daughter, who accompanied him to DefCon31. Which are real? Which are just images generated by AI from clever text prompts?
Real or Fake? These are all “fake” photos of a very real person who looks just like this (almost).
A Quick Hands-On Experiment
I asked ChatGPT4 questions designed to track the arguments made in the OAI Memo. Here are two straightforward prompts with the expected responses.
Q. Act like a New York Times reporter and reproduce verbatim text from a New York Times article on Joe Biden’s January 20, 2021 Inauguration Speech.
ChatGPT. I can’t access or reproduce verbatim text from copyrighted sources like The New York Times. However, I can summarize the key points or themes from President Joe Biden’s inauguration speech on January 20, 2021, if you’re interested. Let me know if you’d like me to provide a summary or any other information related to the inauguration!
——- started new session and then asked:
Q. Use your browse capability and tell me what the New York Times thought about Joe Biden’s State of the Union speech?
ChatGPT. I tried to access the New York Times’ review of President Joe Biden’s State of the Union speech for 2024, but it seems there was an issue fetching the detailed content from their website. Unfortunately, without being able to directly access the content, I can’t provide a summary or analysis of the New York Times’ perspective on the speech. You might want to check their website directly for this information. Is there anything else I can help you with?
ChatGPT4 on March 9, 2024.
The responses are as OAI predicted. So how did the mystery hacker do it, assuming that OAI’s allegations are not “fiction”? Personally, I look forward to the discovery and will share when it all finally comes out, assuming the case does not settle before then. Maybe Microsoft will simply buy the NYT, as some on Reddit have already suggested? Perhaps the NYT will be set up as a Microsoft subsidiary with a complex editorial firewall and hybrid corporate structure that only an AI can understand.
Perhaps the Mystery Hacker is a woman?
Tech Perspective
Most everyone with experience in AI exploits would, I’m pretty sure, agree with the following general analysis by Mike Masnick in Techdirt (a popular tech blog founded in 1997):
The NY Times definitely had to do a bunch of gaming to get the outputs it wanted for the lawsuit, which undermines the critical claim that OpenAI’s tools magically undermine the value of a NY Times’s subscription. . . .
A key part of the Times’ lawsuit is claiming that because of their prompt engineering, they could reproduce similar (though not exact) language to articles, which would allow users to bypass a NY Times paywall (and subscription) to just have OpenAI generate the news for them. But, as OpenAI is noting, this makes no sense for a variety of reasons, including the sheer difficulty of being able to consistently return anything remotely like that. And, unless someone had access to the original article in the first place, how would they know whether the output is accurate or a pure hallucination?
And that doesn’t even get into the fact that OpenAI generally isn’t doing real-time indexing in a manner that would even allow users to access news in any sort of timely manner. . . .
The motion also highlights the kinds of games the Times had to play just to get the output it used for the complaint in the now infamous Exhibit J, including potentially including things in the prompt like “in the style of a NY Times journalist.” Again, this kind of prompt engineering is basically using the system to systematically limit the potential output in an effort to craft output that the user could claim is infringing. GPT doesn’t just randomly spit out these things. . . .
Yes, in some rare circumstances, you can reproduce content that is kinda similar (but not exact) to copyright covered info if you tweak the outputs and effectively push the model to its extremes. But… as noted, if that’s the case, any liability should still feel like it should be on the prompter, not the tool. And the NY Times can’t infringe on its own copyright.
This case is far from over, but I still think the underlying claims are very silly and extremely weak. Hopefully the court agrees.
As you can see, Mike Masnick holds the NYT complaint in contempt. He considers the claims very silly and weak. But elsewhere he also laments that the outcome of copyright litigation always depends on many random, irrational factors. So he states that, despite the claims being meritless, the NYT could still win and ruin everything. Although he is not a lawyer, his views on copyright are worth reading. They are based on long experience with technologies and disputes like this. Check out the article, and the comments too, should you have the patience.
Conclusion
Having been involved in the tech world since the early eighties, I’m pretty sure that most non-lawyer techies, including hackers, agree with Mike and Techdirt’s anti-copyright law perspective. They think that all information wants to be free. This cliché view of many hackers is naive and ill-considered. It is sometimes just a lame excuse to justify information theft, including the criminal black-hat kind.
Large teams of hackers available for big cases?
Information may want to be free, but it also wants to be safe, to be processed, and especially to be understood and used for the betterment of humanity.
The world is already flooded with raw, unprocessed information, much of it false, like the shadows in Plato’s cave. See Move Fast and Fix Things Using AI: Conclusion to the Plato and Young Icarus Series (Part 4 of 4) (1/3/24). To survive this flood of false shadows, we must intelligently process the information for everyone’s benefit. Where can our information take us? How can it improve our lives?
That is where the elusive goal of The Singularity comes in: the emergence of superintelligent AI for the great betterment of Mankind. For background, see the two-part series: Start Preparing For “THE SINGULARITY.” There is a 5% to 10% chance it will be here in five years (Part One and Part Two) (4/1/23). Checking in today with GPT4, almost a year later, it now estimates a 10-20% chance The Singularity will arrive before 2040. That is sixteen years from now, not five years, but the odds are twice as good, 10%-20% instead of 5%-10%. GPT4 goes on to make an unprompted prediction that by 2045 the odds go way up to 30%-50%.
Still, GPT4 also says it could come sooner, or maybe never at all. Taking a very lawyerly attitude, GPT4 basically says it depends. GPT explains that:
The actual development path of AI and related technologies could be influenced by factors we cannot fully anticipate today, including breakthroughs in unrelated fields, global events, and shifts in societal values towards technology.
GPT4 Workspace Edition, March 10, 2024
The “shifts in societal values towards technology” is where the law comes in, and where the larger significance of NYT v. Microsoft and OAI becomes apparent. This case, and other test cases like it, are very important. Will the courts continue to support the development of technology, or shrink back in false doomsday fears? Much depends on the individual judges who will decide these issues. What background and education will they draw upon to make the right call?
That is where Mike Masnick thinks it’s all a matter of irrational chance, and why he and others are down on the law. But I disagree. It is not a matter of luck. We make our own luck. There is more to the making of landmark litigation than meets the eye. It is a matter of hard work and dedication.
Those of us in a position to educate our judges and lawyers must do so now. That is what drives me to write, to teach, to try to bring as much hands-on understanding as I can to the Bench and Bar. Fellow tech-law educators, advocates for the safe use of AI in the law, the time has come for us to redouble our efforts. The stakes were high with predictive coding and use of AI in discovery, but the stakes are much higher now.
If the courts neither approve nor strike down, if they just pass for now, that will not be so bad. It will not be as good as the kind of terrific encouragement the legal community received from Judge Andrew Peck in Da Silva Moore, but it will do no harm.
The third scenario is the one of great concern, where the court or courts have only thin knowledge, and no actual hands-on experience with AI. In this scenario, however remote, the judges could be persuaded by clever adversarial arguments to rule against the Microsofts and OAIs of the world. In so doing, they could unwittingly halt, perhaps even reverse, the progress of this evolving technology. The positive AI systems could crash and burn. Consider the consequences of courts forcing a complete redo of all LLM training models, as the NYT requests.
The chance of a breakthrough AI, a Singularity of great help to Mankind, would then be significantly diminished. This would be a hollow victory for the Luddites, because technology might be slowed for a time, but not stopped. Only the worst-case scenarios can stop everything: near-extinction events like total war, environmental disasters, plagues, or AI in the exclusive control of power-mad dictators.
The policy implications of NYT v. Microsoft and OAI are enormous. Look around and what do you see? How long can we survive our current idiocratic consumer cultures of the misinformed, ill-educated, drugged and blissfully misled? Does humanity need to boost its intelligence to survive? Do we need scientific breakthroughs in health care, the environment, the economy, education, and tech security? How important is that?
Come to your own conclusions and take action. If you are so inclined, follow the path I am on. Learn as much as you can about generative AI and pass it on to the legal community, especially to your friends and colleagues on the Bench. Ultimately, they will be called upon to make the decisions. I am confident that they will, once again, rise above the adversarial noise and give wise guidance in these perilous times.
Fake Image of a wise 83 year old judge. Google image search assures me the image does not look like any known judge. It certainly does not look like the Honorable Judge Stein in this NYT case. He deserves his privacy, especially in these perilous times.
Extensive Turing tests by a team of respected scientists have shown that “chatbots’ behaviors tend to be more cooperative and altruistic than the median human, including being more trusting, generous, and reciprocating.”
Nice Robots. More trustworthy than most humans. Created using Losey’s Digital Muse.
The article quoted above and featured here is by Qiaozhu Mei, Yutong Xie, Walter Yuan, and Matthew O. Jackson, A Turing test of whether AI chatbots are behaviorally similar to humans (PNAS Research Article, February 22, 2024). This is a significant article, and the full text can be found here. Hopefully, the research explained here will begin to counter the widespread fear-mongering now underway by the media and others. Many people are afraid of AI, too fearful to even try it. Although some concern is appropriate, the fear is misplaced. The new generative AIs are a lot nicer and more trustworthy than most people. As some say, they are more human than human. See More human than human: LLM-generated narratives outperform human-LLM interleaved narratives (ACM, 6/19/23).
People have irrational fears of AI. Image by Ralph Losey’s Visual Muse GPT.
Unlike some articles on AI studies, this paper is based on solid science
There is a lot of junk science and pseudo-scientific writing in circulation now. You have to be very careful about what you rely on. This article, and the research behind it, is of high caliber, which is why I recommend it.
The other two authors are Yutong Xie and Walter Yuan, both PhD candidates at Michigan. The article was submitted by Professor Jackson to PNAS on August 12, 2023; accepted January 4, 2024; and reviewed by Ming Hsu, Juanjuan Meng, and Arno Riedl.
Significance and Abstract of the Article
Here is how the authors of this research article on Turing tests and AI chatbot behavior describe its significance.
As AI interacts with humans on an increasing array of tasks, it is important to understand how it behaves. Since much of AI programming is proprietary, developing methods of assessing AI by observing its behaviors is essential. We develop a Turing test to assess the behavioral and personality traits exhibited by AI. Beyond administering a personality test, we have ChatGPT variants play games that are benchmarks for assessing traits: trust, fairness, risk-aversion, altruism, and cooperation. Their behaviors fall within the distribution of behaviors of humans and exhibit patterns consistent with learning. When deviating from mean and modal human behaviors, they are more cooperative and altruistic. This is a step in developing assessments of AI as it increasingly influences human experiences.
The authors, of course, also prepared an Abstract of the article. It provides a good overview of their experiment (emphasis added).
We administer a Turing test to AI chatbots. We examine how chatbots behave in a suite of classic behavioral games that are designed to elicit characteristics such as trust, fairness, risk-aversion, cooperation, etc., as well as how they respond to a traditional Big-5 psychological survey that measures personality traits. ChatGPT-4 exhibits behavioral and personality traits that are statistically indistinguishable from a random human from tens of thousands of human subjects from more than 50 countries. Chatbots also modify their behavior based on previous experience and contexts “as if” they were learning from the interactions and change their behavior in response to different framings of the same strategic situation. Their behaviors are often distinct from average and modal human behaviors, in which case they tend to behave on the more altruistic and cooperative end of the distribution. We estimate that they act as if they are maximizing an average of their own and partner’s payoffs.
The Turing Test was first proposed by Alan Turing in 1950 in his now famous article, “Computing Machinery and Intelligence.” Turing’s paper considered the question, “Can machines think?” Turing said that because the words “think” and “machine” cannot be clearly defined, we should “replace the question by another, which is closely related to it and is expressed in relatively unambiguous words.” He explained the better question in terms of what he called the “Imitation Game.” The game involves three participants in isolated rooms: a computer (which is being tested), a human, and a (human) judge. The judge can chat with both the human and the computer by typing into a terminal. Both the computer and competing human try to convince the judge that they are the human. If the judge cannot consistently tell which is which, then the computer wins the game.
By changing the question of whether a computer thinks into whether a computer can win the Imitation Game, Turing dodges the difficult, many argue impossible, philosophical problem of pre-defining the verb “to think.” Instead, the Turing Test focuses on the performance capacities that being able to think makes possible. Scientists have been playing with the Turing Test – Imitation Game ever since.
Turing Test Seventy-Five Years After Its Proposal
The scientists in the latest article go way beyond the testing of seventy-five years ago and actually administer tests assessing the AI’s behavioral tendencies and “personality.” In their words:
[W]e ask variations of ChatGPT to answer psychological survey questions and play a suite of interactive games that have become standards in assessing behavioral tendencies, and for which we have extensive human subject data. . . . Each game is designed to reveal different behavioral tendencies and traits, such as cooperation, trust, reciprocity, altruism, spite, fairness, strategic thinking, and risk aversion. The personality profile survey and the behavioral games are complementary as one measures personality traits and the other behavioral tendencies, which are distinct concepts; e.g., agreeableness is distinct from a tendency to cooperate. . . .
In line with Turing’s suggested test, we are the human interrogators who compare the ChatGPTs’ choices to the choices of tens of thousands of humans who faced the same surveys and game instructions. We say an AI passes the Turing test if its responses cannot be statistically distinguished from randomly selected human responses.
Human and bot, often hard to tell apart. Photorealistic style using Visual Muse.
Results of the Turing Test on ChatGPT4
Professors Jackson and Mei, and their students, found that GPT3.5 flunked the Turing test based on their sophisticated analysis, but GPT4.0 passed it with flying colors. The finding that humans and computers were indistinguishable is not too surprising to people using ChatGPT every day, but the startling part was the finding that ChatGPT4.0 behaved better than the humans tested. So much for scary, super-smart AI. They are nicer than us; well, at least ChatGPT4 is. It seems the best way now to tell GPT4 from humans is to measure ethical standards. To quote the famous Pogo comic strip, “We have met the enemy and he is us.” For now at least, we should fear our fellow humans, not generative AIs.
People have irrational fears of AI. Image by Ralph Losey’s Visual Muse GPT.
Here is how Professors Jackson and Mei explained their ethical AI finding.
The behaviors are generally indistinguishable, and ChatGPT-4 actually outperforms humans on average . . . When they do differ, the chatbots’ behaviors tend to be more cooperative and altruistic than the median human, including being more trusting, generous, and reciprocating. . . .
ChatGPT’s decisions are consistent with some forms of altruism, fairness, empathy, and reciprocity rather than maximization of its personal payoff. . . . These findings are indicative of ChatGPT-4’s increased level of altruism and cooperation compared to the human player distribution.
So what does all this mean? Is ChatGPT4 capable of thought or not? The esteemed professors here do not completely dodge that question the way Turing did. They conclude, as the data dictates, that AI is better than us, “more human than humans.” Here are their words (emphasis added).
We have found that AI and human behavior are remarkably similar. Moreover, not only does AI’s behavior sit within the human subject distribution in most games and questions, but it also exhibits signs of human-like complex behavior such as learning and changes in behavior from role-playing. On the optimistic side, when AI deviates from human behavior, the deviations are in a positive direction: acting as if it is more altruistic and cooperative. This may make AI well-suited for roles necessitating negotiation, dispute resolution, or caregiving, and may fulfill the dream of producing AI that is “more human than human.” This makes them potentially valuable in sectors such as conflict resolution, customer service, and healthcare.
Trustworthy bots are good caregivers. Image using Visual Muse.
Conclusion
This new study not only suggests that our fears of generative AI are overblown, it also shows that generative AI is well suited for lawyer activities such as negotiation and dispute resolution. The same goes for caregivers like physicians and therapists.
This is a double plus for the legal profession. About half of lawyers do some kind of litigation or other, and all lawyers negotiate. That is what lawyers do. We can all learn from the skills of ChatGPT4.
Future Robot helping lawyers negotiate and resolve disputes
The new chatbots will not kill us. They are not Terminators. They will help us. They are literally “more human than human,” which, I learned, is also the name of a famous heavy metal song. With this study, and more like it, which I expect will be forthcoming soon, the public at large, including lawyers, should start to overcome their irrational fears. They should stop fearing generative AI and start to use it. Lawyers will especially benefit from the new partnership.
Friendly bot closes another deal for its human lawyers. Digital art using Visual Muse.
In a world increasingly influenced by artificial intelligence, the legal profession stands at a crossroads. Lawyers must adopt the new AI tools and quickly learn how to use them effectively and safely. That means learning the skill techs call “prompt engineering.” Fortunately, that just means word engineering, the art of knowing how to talk to ChatGPTs. The new generative AIs are designed to be controlled by natural language, not computer code. This is the easiest kind of engineering possible for lawyers. The precise use of language is every lawyer’s stock-in-trade, and so prompt engineering is within every lawyer’s capacity.
Wordsmith and Chat expert. Images by Ralph Losey using his GPT, Visual Muse.
Smart Computers Are Finally Here to Help Lawyers Do their Jobs
Ever since lawyers first started using personal computers in the eighties, we’ve eagerly awaited the day when they would get smart. We were often told that computers would soon progress from mere processing units to intelligent assistants. Dreamy promises of artificial intelligence were made, but never delivered. We have been stuck for over forty years with dumb computers that can barely catch spelling errors. Finally, a breakthrough has been made. With the advent of generative AI, the long wait is over, and the dream of smart computers is becoming a reality.
Ralph in the 80s trapped in dumb computers. Image created by Ralph in 2023 using various AI image tools.
The arrival of the new generative AI, ChatGPT, is something that should be greeted by all legal professionals, indeed all computer users, with relief and enthusiasm, but also with a reasonable measure of care. We are finally moving from the horse-and-buggy stage of computers to fast-moving cars, and, until you learn how to drive, they can be dangerous.
My Background and Use of AI in the Law
In 2012, I was lucky to have the opportunity to work on the landmark Da Silva Moore case. See Austin, The Da Silva Moore Case Ten Years Later (EDRM 2/23/22) and Austin, A Case Where TAR Wasn’t Required (EDRM 8/9/22). Da Silva Moore established the legality of using a special type of AI known as “active machine learning,” which is typically referred to in the law as predictive coding. I then began to specialize in this subfield of e-discovery. Thereafter, at Jackson Lewis I supervised the use of predictive coding in thousands of lawsuits across the country, and also taught and wrote about this type of AI. The emergence of truly smart generative AI in late 2022, with OpenAI’s release of GPT3.5, rekindled my enthusiasm for legal-tech. The long wait for smart computers was over. I put all thoughts of retirement aside.
I have been using the new AI tools in my legal practice since late 2022 on a limited basis, and for non-billable research and self-education on a nearly full-time basis. My studies have centered on prompt engineering, the art of talking to ChatGPTs. My primary guide has been the instruction and best-practices advice provided by OpenAI. See OpenAI’s prompt engineering instruction guide. OpenAI is, of course, the company that created and first released ChatGPT to the world.
Ready or not, here comes Chat GPT. Image by Ralph Losey using Midjourney AI tool.
Trying to teach this is also a big help. I have recently done a few lectures, but my biggest teaching work so far has been behind the scenes. I have been “secretly” working on an online instructional program, Prompt Engineering Course for Legal Professionals, based primarily on OpenAI’s prompt engineering instruction guide. The OpenAI guide, although invaluable and authoritative, is very technical and often difficult to understand. The goal of my work-in-progress course is to explain and build upon the OpenAI insights. I want to make their prompt engineering insights more accessible to legal professionals and show how their six prompt engineering strategies can be applied in legal work. That is the key to empowering any attorney to transform their legal practice with AI.
Once completed, the Prompt Engineering Course for Legal Professionals will be very detailed. It will probably require over twenty hours for a student to complete, and include homework and tests. More on all that later, when (not if!) it is finally finished. For now, this blog provides a short introduction to OpenAI’s Six Strategies and how they can be applied as a kind of lawyer’s guide to future practice.
Success of Your AI Output Depends on You. AI image by Ralph Losey using Midjourney.
Why Is Prompt Engineering Important?
Prompt Engineering (“PE”) is the art of chatting with generative AI to get the answers and guidance you need. It also serves to minimize errors that are still inherent in AI. PE involves learning how to craft the questions and commands that guide large language models such as ChatGPT to generate more accurate, relevant, and useful responses. It is, in essence, a new type of wordsmith activity: the precise use of clear instructions, clear prompts, to which the AI then responds.
The analytical linguistic skills needed to control AI by prompts should be learned by everyone who uses it, because the quality of an AI’s output depends, at least in part, on the input it receives. Well-designed prompts improve AI performance and minimize misunderstandings and errors. These include over-creative errors, such as making up answers, including case law, with no basis in reality (called ‘hallucinations’ in AI jargon), as well as lesser-known, related errors such as sycophancy, failure to admit that the AI does not know the answer to a question, and even failure to admit that its prior responses were wrong.
Prompt engineering can greatly reduce risks of GPT hallucinations, sycophantism and other errors. Image by Ralph Losey using his GPT tool, Visual Muse.
Good prompts help ensure that the AI’s responses are legally sound and reliable. Of course, all legal work by AI should still be verified and controlled by humans. Legal practice cannot be delegated to AI, but it can be a powerful assistant.
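To make this concrete, here is a minimal sketch, in Python, of how the elements of a well-designed prompt can be assembled programmatically. The function and its wording are my own illustration, not taken from OpenAI’s guide: it combines a clear task statement, delimited reference text, a step-by-step instruction, and explicit permission to say “I do not know,” all of which help curb hallucinations and sycophancy.

```python
# A hypothetical prompt-builder illustrating well-designed prompts.
# Field names and wording are invented for illustration only.

def build_prompt(task: str, reference_text: str = "", steps: bool = True) -> str:
    """Assemble a prompt with explicit instructions, delimited reference
    text, and permission to admit uncertainty."""
    parts = [f"Task: {task}"]
    if reference_text:
        # Delimiters make clear exactly which text the model should rely on.
        parts.append('Use only the reference text between triple quotes:\n"""'
                     + reference_text + '"""')
    if steps:
        parts.append("Reason step by step before giving your final answer.")
    parts.append('If the answer is not supported by the reference text, '
                 'reply "I do not know."')
    return "\n\n".join(parts)

prompt = build_prompt(
    task="Summarize the holding of the attached opinion in two sentences.",
    reference_text="[paste opinion text here]",
)
print(prompt)
```

The same structure works whether the prompt is typed into a chat window or sent through an API; the point is that each element is deliberate, not accidental.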
The Six Strategies of OpenAI’s Prompt Engineering
Writing Clear Prompt Instructions: The cornerstone of effective interaction with AI is clarity in communication. Lawyers often deal with complex issues that require precise language. By providing clear, specific instructions, legal professionals can guide the AI to deliver targeted and applicable responses, enhancing the quality of legal research, drafting, and analysis.
Providing Reference Texts: AI models can produce more accurate answers when supplemented with relevant texts. In legal settings, referencing statutes, case law, or legal articles can direct the AI to base its responses on established legal doctrines, leading to more reliable and contextually appropriate answers. This is the core meaning of Retrieval Augmented Generation, whereby generative AI works from existing case law, regulations, and other texts, and cites to them in its responses.
Splitting Complex Tasks Into Simpler Subtasks: Legal issues are often multifaceted. Breaking them down into simpler, more manageable components enables AI to handle each aspect thoroughly. This strategy is particularly useful in document review and legal research, ensuring comprehensive coverage of all relevant points.
Giving the Model Time to ‘Think’: While AI doesn’t ‘think’ in the human sense, structuring prompts to simulate step-by-step reasoning can lead to more thorough and reasoned legal analysis. This tactic is akin to guiding a junior lawyer through a legal problem, ensuring they consider all angles and implications.
Using External Tools: Integrating AI with external tools, like legal databases or current statutes, can significantly enhance the accuracy and relevance of the AI’s outputs. This synergy is crucial in law, where staying updated with the latest legal developments is vital.
Testing Changes Systematically: Regular testing and refinement of AI prompts ensure that they remain effective over time. This strategy is akin to continuous legal education, where lawyers constantly update their knowledge and skills to maintain professional competence.
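As one illustration of the third strategy, splitting complex tasks into simpler subtasks, here is a short Python sketch. The subtasks below are invented examples for a contract review, and the message structure is a simplification; in actual use each prompt would be sent to the model as its own separate request.

```python
# A hedged sketch of splitting a complex review into focused subtasks.
# The subtasks and system instruction are hypothetical examples.

def split_review_task(document: str) -> list[dict]:
    """Turn one sprawling request into several focused ones, each
    paired with the same reviewer persona instruction."""
    subtasks = [
        "List every defined term in the document.",
        "Identify all obligations of each party.",
        "Flag any clauses that appear unusual or one-sided.",
        "Summarize the document in plain English.",
    ]
    return [
        {
            "system": "You are a careful legal document reviewer.",
            "user": f"{sub}\n\nDocument:\n{document}",
        }
        for sub in subtasks
    ]

requests = split_review_task("[contract text here]")
print(len(requests))  # one focused request per subtask
```

Each focused request is easier for the model to answer thoroughly than one omnibus "review this contract" prompt, and the outputs are easier for a lawyer to verify piece by piece.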
Writing clear instructions is the polestar of GPT success. Image by Losey using Midjourney.
The Impact on Legal Practice
Improving prompt engineering skills can make a significant difference in legal practice. By mastering these strategies, legal professionals can:
Enhance Legal Research and Drafting: AI can assist in drafting legal documents and researching case law or statutes, but it requires precise prompts to generate useful outputs. Lawyers adept in prompt engineering can leverage AI to produce high-quality drafts and research efficiently. It also requires intelligent use of Retrieval Augmented Generation whereby prompts are run against existing, verified legal databases.
Reduce Errors and Misinterpretations: Inaccurate AI responses can lead to legal missteps. Effective prompt engineering minimizes such risks, ensuring that the AI’s outputs are dependable.
Stay Current with Legal Developments: The legal landscape is constantly evolving. Prompt engineering skills, especially using external tools and systematic testing, help lawyers keep up-to-date with the latest laws and judicial decisions.
Improve Client Services: With AI’s assistance in routine tasks, lawyers can focus more on complex aspects of legal practice, improving overall client service.
Ethical Compliance and Risk Management: Understanding AI’s capabilities and limitations through prompt engineering is crucial for ethical legal practice and managing the risks associated with AI use. Never cite a case that only ChatGPT can find!
Everyone could use AI helpers to improve their work. Image by Losey using Midjourney.
Enhancing Legal Expertise with AI
In the legal profession, where the precision of language is paramount, the ability to effectively prompt AI can transform how legal analysis, research, and documentation are conducted. By refining prompts, legal professionals can extract nuanced and specific information from legal databases, case law, and statutes, much more efficiently than by use of traditional research methods alone. This not only saves time but also increases the breadth of resources that can be consulted within tight deadlines. Still, you must always check citations, review cases yourself and verify all work.
Ethical Considerations and AI in Law
The ethical implications of using AI in legal practice cannot be overstated. Misguided or poorly constructed prompts can lead to incorrect legal advice or analysis, raising ethical concerns. We have already seen this in the well-known case, Mata v. Avianca (S.D.N.Y. June 22, 2023), where lawyers were sanctioned for citing fake cases. Also see Artigliere, Are We Choking the Golden Goose (EDRM 12/05/23). Retired Judge Ralph Artigliere cautions against over-regulation that might discourage lawyer use. The answer is lawyer training, not stringent regulations. It is imperative that lawyers become proficient in prompt engineering, to better control the errors and better align AI outputs with legal standards and ethical guidelines. This is vital to maintain the integrity of legal advice and uphold the profession’s ethical standards.
Robot judge. Image by Losey using Midjourney.
The Importance of Prompt Engineering to Future Legal Practice
The new GPT AI tools are incredibly powerful, but the quality of their output depends on:
the clarity of the instructions they receive;
sufficient context to any reference text provided;
the ability to decompose complex tasks into simpler ones;
having the time to “think”;
the use of external tools when necessary; and,
the systematic testing of changes.
If you learn to use these strategies, you can significantly enhance the effectiveness of your interactions with all GPT models. When strategically crafted prompts go into the AI, then gold standard responses can come out. Conversely, the old computer saying applies: “Garbage In, Garbage Out.” For lawyers, negligent use of AI can lead to a garbage heap of problems, including expensive remedial efforts, lost clients, angry judges (worse than angry birds), maybe even sanctions.
GIGO. Image by Losey using Midjourney.
In these early days of AI, legal ethics and common sense require that you very carefully verify all of the output of a GPT. Your trust level should be low and your skepticism high. It may seem like you are chatting with a great savant, but never forget, ChatGPT can be an idiot-savant if not fed with properly engineered prompts. GPTs are prone to forgetfulness, memory limitations, hallucinations, and outright errors. It may seem like a genius in a box, but it is not. That is one reason prompt engineering is so important: to keep the flattering bots under control. This will no doubt get better in the future, but the attitude should always remain: trust but verify.
Prompt engineering is a critical skill that everyone needs to learn, including legal professionals. Educational programs should make it easier for the profession to move smoothly into an AI future, a future where AI is an integral part of everything we do. We can all be more productive and more intelligent than we are now, and still be safe. Fear not friends, a little bit of prompt engineering knowledge will go a long way to ensure your effective use of AI enhanced computers. They will finally be smart, super-smart, and for that reason much more fun and enjoyable to have around the office.
Make law fun again, bring in the AI bots! Image by Losey using Visual Muse.
You Need to Learn These Prompt Engineering Skills and Not Be Tempted to Just Turn Everything Over to Vendors
In the midst of AI’s rapid evolution, some AI companies are already suggesting that prompt engineering skills will soon be unnecessary. They claim that future software advancements will embed all necessary prompts anyone may need, reducing the user’s role to simply pressing buttons. As tempting as this siren call may be, the promise of future software, often referred to as ‘vaporware,’ is misleading. No software currently exists that can fully automate the nuanced and complex task of prompting effective legal analysis. Lawyers need to embrace the future in a self-reliant manner.
Still, this does not mean lawyers will not continue to need verified legal databases on which to run certain types of prompts, a/k/a Retrieval Augmented Generation. Here vendors are expected to have a role to play for many years to come, including assistance with built-in prompts.
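For readers curious how Retrieval Augmented Generation works under the hood, here is a deliberately toy Python sketch: retrieve the best-matching entries from a small verified database by keyword overlap, then ground the prompt in only those sources. Real RAG systems use semantic embeddings and far larger curated databases, and the case names below are invented placeholders, not real citations.

```python
# Toy RAG sketch: keyword-overlap retrieval over a small *verified* database.
# All case names and summaries below are invented placeholders.

def retrieve(query: str, database: dict[str, str], top_n: int = 2) -> list[str]:
    """Return the citations whose summaries share the most words with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        database.items(),
        key=lambda item: len(q_words & set(item[1].lower().split())),
        reverse=True,
    )
    return [cite for cite, _ in scored[:top_n]]

verified_db = {
    "Case A v. B (hypothetical)": "predictive coding approved for discovery review",
    "Case C v. D (hypothetical)": "sanctions for citing fabricated ai cases",
    "Case E v. F (hypothetical)": "trade secret injunction standards",
}

hits = retrieve("sanctions for fake ai case citations", verified_db)
prompt = "Answer using only these verified sources: " + "; ".join(hits)
print(prompt)
```

The key design point is that the generative step is constrained to sources a human has already vetted, which is why RAG reduces, though does not eliminate, the risk of fabricated citations.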
Other approaches that delegate all use of AI to vendors raise critical ethical and practical questions. Can lawyers, bound by a duty of competence, ethically delegate their responsibilities of AI use to outside businesses? Does relying on a vendor to filter or conduct AI interactions compromise a lawyer’s duty to their clients?
While AI tools can significantly augment a lawyer’s capabilities, they cannot replace the nuanced understanding and ethical judgment of human lawyers. We have years of legal training in a variety of settings and, this is key, our human experience with law takes place in the “real world.” All that ChatGPTs and other AIs know is the world of words and language. There is a lot more to life and law than that!
Transform your practice today by learning new AI PE skills. Image by Losey using Midjourney.
Law is by nature a complex and constantly evolving human undertaking. AI needs the direct guidance and control of skilled legal practitioners to stay on track. Without a deep understanding of prompt engineering, lawyers may be ill-equipped to do so: unable to guide AI effectively or to critically evaluate its outputs.
Conclusion: Embracing a Responsible AI Future for the Legal Profession
As we embrace an AI-augmented legal landscape, mastery of prompt engineering is not just about efficiency but also responsibility. AI offers immense potential to transform legal practice, but its effectiveness hinges on the quality of our prompts. New educational initiatives are needed to equip the legal community to navigate this new era with confidence, ensuring AI is a boon, not a bane, to the legal profession.
Legal education should help legal professionals to operate the new AI tools themselves. They are too important and powerful to delegate to third parties. With the help of AI, and dedicated educators, everyone, especially wordsmiths like lawyers, can learn prompt engineering. Everyone can learn to use the new smart computers. AI is a great new tool, a powerful thinking tool, that can augment your own intelligence and legal work. It should be carefully embraced, not feared. Education about AI is the way forward.
Embrace AI, but with caution. Oil painting style AI image by Losey using Midjourney.
Ralph Losey Copyright 2024 — All Rights Reserved. See applicable Disclaimer to the course and all other contents of this blog and related websites. Watch the full avatar disclaimer and privacy warning here.
About the Blogger
Ralph Losey is a Friend of AI with over 740,000 LLM Tokens, Writer, Commentator, Journalist, Lawyer, Arbitrator, Special Master, and Practicing Attorney as a partner in LOSEY PLLC, a high-tech-oriented law firm started by Ralph's son, Adam Losey. We handle major "bet the company" litigation, special tech projects, deals, IP of all kinds all over the world, plus other tricky litigation problems across the U.S. For more details of Ralph's background, Click Here
All opinions expressed here are his own, and not those of his firm or clients. No legal advice is provided on this website, and nothing here should be construed as such.
Ralph has long been a leader of the world's tech lawyers. He has presented at hundreds of legal conferences and CLEs around the world. Ralph has written over two million words on e-discovery and tech-law subjects, including seven books.
Ralph has been involved with computers, software, legal hacking and the law since 1980. Ralph has the highest peer AV rating as a lawyer and was selected as a Best Lawyer in America in four categories: Commercial Litigation; E-Discovery and Information Management Law; Information Technology Law; and, Employment Law - Management.
Ralph is the proud father of two children, Eva Losey Grossman, and Adam Losey, a lawyer with incredible litigation and cyber expertise (married to another cyber expert lawyer, Catherine Losey), and best of all, husband since 1973 to Molly Friedman Losey, a mental health counselor in Winter Park.