Navigating the High Seas of AI: Ethical Dilemmas in the Age of Stochastic Parrots

April 3, 2024

Large Language Model generative AIs are well described metaphorically as “stochastic parrots.” In fact, the American Dialect Society selected stochastic parrot as its AI word of the year for 2023, just ahead of the runners-up “ChatGPT, hallucination, LLM and prompt engineer.” These genius stochastic parrots can be of significant value to all legal professionals, even those who don’t like pirates. You may want one on your shoulder soon, or at least in your computer and phone. But, as you embrace them, you should know that these parrots can bite. You should be aware of the bias and fairness problems inherent in these new technical systems.

The ethical issues were raised in my last blog and video, Stochastic Parrots: the hidden bias of large language model AI. In the video blog an avatar, which looks something like me with a parrot on his shoulder, quoted the famous article on LLM AI bias, On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? (FAccT ’21, 3/1/21), by Emily M. Bender, Timnit Gebru, Angelina McMillan-Major and Margaret Mitchell, and briefly discussed how the prejudices are baked into the training data. In this follow-up blog I dig a little deeper into the article and the controversies surrounding it.

Article Co-Author Timnit Gebru

First of all, it is interesting to note the internet rumor, based on a few tweets, concerning one of the lead authors of the Stochastic Parrots article, Timnit Gebru. She was a well-known leader of Google’s ethical AI team at the time she co-wrote it. She was allegedly forced to leave Google because its upper management didn’t like the paper. See: Karen Hao, We read the paper that forced Timnit Gebru out of Google (MIT Technology Review, 12/04/2020); Shirin Ghaffary, The controversy behind a star Google AI researcher’s departure (Vox, 12/09/20). According to Karen Hao’s article, more than 1,400 Google staff members and 1,900 other supporters signed a letter of protest describing the alleged firing of Timnit Gebru as an act of research censorship. The rumor is that Google tried to stop publication of the article, but the article was in fact published on March 1, 2021.

According to the MIT Technology Review article, Google objected to all four points of criticism of LLMs that were made in the Parrot article:

  1. Environmental and financial costs. Pertaining to the vast amounts of computing needed to create LLMs, and the resulting energy costs and carbon footprint.
  2. Massive data, inscrutable models. The training data mainly comes from the internet and so contains racist, sexist, and otherwise abusive language. Moreover, the sheer volume of data used makes the LLMs hard to audit and makes their embedded biases hard to eliminate.
  3. Research opportunity costs. Basically complaining that too much money was spent on LLMs, and not enough on other types of AI. Note this complaint was made before the unexpected LLM breakthroughs in 2022 and 2023.
  4. Illusions of meaning. In the words of Karen Hao, the Parrot article complained that the problem with LLM models is that they are “so good at mimicking real human language, it’s easy to use them to fool people.”

Moreover, as the Hao article in the MIT Technology Review points out, Google’s then head of AI, the well-known scientist Jeff Dean, claimed that the research behind the article “didn’t meet our bar” and “ignored too much relevant research.” Specifically, he said it did not mention more recent work on how to make large language models more energy efficient and how to mitigate problems of bias. Maybe they didn’t know?

Criticisms of the Stochastic Parrot Article

The main article I found criticizing Stochastic Parrots also has a weird name: “The Slodderwetenschap (Sloppy Science) of Stochastic Parrots – A Plea for Science to NOT take the Route Advocated by Gebru and Bender” (2021). The author, Michael Lissack, challenges the ethical “woke” stance of the original “Parrot Paper,” and suggests a reevaluation of its argumentation. It should be noted that Gebru has accused Lissack of stalking her and her colleagues. See: Claire Goforth, Men in tech are harassing Black female computer scientist after her Google ouster (Daily Dot, 2/5/21) (Michael Lissack has tweeted about Timnit Gebru thousands of times). By the way, “slodderwetenschap” is Dutch for sloppy science.

Here are the three criticisms that Lissack makes of Stochastic Parrots:

What is missing in the Parrot Paper are three critical elements: 1) acknowledgment that it is a position paper/advocacy piece rather than research, 2) explicit articulation of the critical presuppositions, and 3) explicit consideration of cost/benefit trade-offs rather than a mere recitation of potential “harms” as if benefits did not matter. To leave out these three elements is not good practice for either science or research.

Lissack, The Slodderwetenschap (Sloppy Science) of Stochastic Parrots, abstract.

Others have spoken in favor of Lissack’s criticisms of Stochastic Parrots, including, most notably, Pedro Domingos. Supra, Men in Tech (includes a collection of Domingos’ tweets).

It should also be noted that Lissack’s article includes several positive comments about the Stochastic Parrots work:

The very topic of the Parrot Paper is an ethics question: does the current focus on “language models” of an ever-increasing size in the AI/NLP community need a grounding against potential questions of harm, unintended consequences, and “is bigger really better?” The authors thereby raise important issues that the community itself might use as a basis for self-examination. To the extent that the authors of the Parrot Paper succeed in getting the community to pay more attention to these issues, they will be performing a public service. . . .

The Parrot Paper correctly identifies an “elephant in the room” for the MI/ML/AI/NLP community: the very basis by which these large language models are created and implemented can be seen as multilayer neural network-based black boxes – the input is observable, the programming algorithm readable, the output observable, but HOW the algorithm inside that black box produces the output is not articulable in terms humans can comprehend. [10] What we know is some form of “it works.” The Parrot Paper authors prompt readers to examine what is meant by “it works.” Again, a valuable public service is being performed by surfacing that question. . . .

Most importantly, in my view, the Parrot Paper authors remind readers that potential harm lies in both the careless use/abuse of these language models and in the manner by which the outputs of those models are presented to and perceived by the general public. They quote Prabhu and Birhane echoing Ruha Benjamin: “Feeding AI systems on the world’s beauty, ugliness, and cruelty, but expecting it to reflect only the beauty is a fantasy.” [PP lines 565-567, 11, 12] The danger they cite is quite real. When “users” are unaware of the limitations of the models and their outputs, it is all too easy to confuse seeming coherence and exactness for verisimilitude. Indeed, Dr. Gebru first came to public attention highlighting similar dangers with respect to facial recognition software (a danger which remains, unfortunately, with us [13, 14]).

The Slodderwetenschap (Sloppy Science) of Stochastic Parrots at pages 2-3.

Lissack’s main objection appears to be the argumentative nature of what the Parrot article presents as science, and the many subjective opinions underlying it. He argues that the paper itself is “ethically flawed.”

Talking Stochastic Parrots Have No Understanding

Artificial intelligences like ChatGPT-4 may sound like they know what they are talking about, but they don’t. There is no understanding at all in the human sense; it is all just probability calculations that produce coherent speech. No self-awareness, no sense of space and time, no feelings, no senses (yet) and no intuition. Just math.
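To make the “just math” point concrete, here is a minimal, purely illustrative Python sketch of next-word sampling. The candidate words and probabilities below are invented for the example; a real LLM computes a probability distribution over a huge vocabulary using billions of learned parameters, but the basic move, picking a statistically likely next token, is the same.

```python
import random

# Toy illustration only: the candidate words and their probabilities are
# made up. A real LLM computes these numbers with a neural network over
# its entire vocabulary, one token at a time.
next_token_probs = {
    "cracker": 0.62,   # "Polly wants a ..." -> most likely continuation
    "treat": 0.21,
    "nap": 0.09,
    "lawyer": 0.08,
}

def sample_next_token(probs: dict) -> str:
    """Pick the next token at random, weighted by its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

print("Polly wants a", sample_next_token(next_token_probs))
```

No matter how fluent the output sounds, nothing in that loop understands what a cracker is; it only knows which word is statistically likely to come next.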

It is important to make a clear distinction between human cognitive processes, which are deeply linked to, and arise out of, bodily experiences and the external world, and computational models that lack any real-world, experiential basis. As lawyers we must recognize the limits of mere machine tools. We cannot over-delegate to them just because they sound good, especially when acting as legal counselors, judges, and mediators. See e.g. Yann LeCun and Jacob Browning, AI And The Limits Of Language (Noema, 8/23/22) (“An artificial intelligence system trained on words and sentences alone will never approximate human understanding.”); Valmeekam, et al., On the Planning Abilities of Large Language Models (arXiv, 2/13/23) (LLMs show poor planning capabilities); Dissociating language and thought in large language models (arXiv, 3/23/24) (LLMs perform poorly at functional competence tasks).

Getting back to the metaphor, a parrot may not understand the words it speaks, but it at least has some self-awareness and consciousness. An AI has none. As one thoughtful Canadian writer put it:

Though the output of a chatbot may appear meaningful, that meaning exists solely in the mind of the human who reads or hears that output, and not in the artificial mind that stitched the words together. If the AI Industrial Complex deploys “counterfeit people” who pass as real people, we shouldn’t expect peace and love and understanding. When a chatbot tries to convince us that it really cares about our faulty new microwave or about the time we are waiting on hold for answers, we should not be fooled.

Bart Hawkins Kreps, Beware of WEIRD Stochastic Parrots (Resilience, 2/15/24).

For interesting background, see The New Yorker article of 11/15/2023 by Angie Wang, Is My Toddler a Stochastic Parrot? Also see a scientific research article on the lack of diversity in the internet data used for model training, Which Humans? by Mohammad Atari, et al. (arXiv, 9/23/23) (“Technical reports often compare LLMs’ outputs with ‘human’ performance on various tests. Here, we ask, ‘Which humans?’”).

I also suggest you look at the often-cited technical blog post by the great contemporary mathematician Stephen Wolfram, What Is ChatGPT Doing … and Why Does It Work? As Wolfram states in the conclusion, ChatGPT is “just saying things that ‘sound right’ based on what things ‘sounded like’ in its training material.” Yes, it sounds good, but nobody’s home, no real meaning. That is ultimately why the fears of AI replacing human employment are way overblown. It is also why LLM-based plagiarism is usually easy to recognize, especially by experts in the field under discussion. Chatbot writing gives itself away by its style-over-substance language, which is heavy on fluff and stereotyped phrasing, and by its overuse of certain “tell” words. More on this in my next blog on how to spot stochastic parrots, but a rough sketch of the idea follows below.
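As a preview of that next blog, here is a small, purely illustrative Python sketch of one simple idea: counting how often suspected “tell” words appear in a passage. The word list is a hypothetical example chosen only to illustrate the point, not a validated detector, and no word count is a substitute for an expert’s judgment of the substance.

```python
import re
from collections import Counter

# Hypothetical "tell" words chosen only to illustrate the idea; this is
# not a validated detection list, and real AI-text detection is harder.
TELL_WORDS = {"delve", "tapestry", "landscape", "furthermore", "moreover"}

def tell_word_counts(text: str) -> Counter:
    """Count how often each suspected 'tell' word appears in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w in TELL_WORDS)

sample = ("Furthermore, let us delve into the rich tapestry of the "
          "ever-evolving legal landscape. Moreover, it is important to note...")
print(tell_word_counts(sample))
```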

Personally, I’m already sick of the bland, low-meaning, fluffy news and analysis content now flooding the internet, including legal writing. It is almost as bad as the ChatGPT writing churned out for political propaganda and sales. It is not only biased and riddled with errors, it is mediocre and boring.

Conclusion

Everyone agrees that LLM AIs will, if left unchecked, reproduce the biases and inaccuracies contained in the original training data. This inevitably leads to the generation of false information and skewed responses to prompts, which in turn can lead to poor human decisions made in reliance on biased output. This can be disastrous in sensitive applications like law and medicine.

Everyone also agrees that this problem requires AI software manufacturers to design their models to curb these biases, and to monitor and test them to ensure the effectiveness and trustworthiness of LLMs.

The disagreement seems to be over the severity of the problem and the priority that should be given to its mitigation. There is also disagreement as to the degree of success achieved to date in correcting this problem, and whether it can be fixed at all.

My view is that these issues can be significantly reduced, but I doubt that LLMs will ever be perfect and entirely free of all bias, even though they may become better than the average human. See e.g. New Study Shows AIs are Genuinely Nicer than Most People – ‘More Human Than Human’.

Moreover, I believe that users of LLMs, especially lawyers, judges and other legal professionals, can be sensitized to these bias issues. They can learn to recognize previously unconscious bias in the data and in themselves. The sensitivity to the bias issues can then help AI users to recognize and overcome these challenges. They can realize when the responses given by an AI are wrong and must be corrected.

The language of a ChatGPT may correctly echo what most people in the past have said, but that does not, in itself, make it the right answer for today. As lawyers we need the true, correct and bias-free answers, the just and fair answers, not the most popular answers of the past. We have an ethical duty of competence to double-check the mindless speech of our stochastic parrots. We should question why Polly always wants a cracker.

Ralph Losey Copyright 2024 – All Rights Reserved