Ironically, LARGE Language Models (LLMs) Have Only SMALL Language Memories; Way Too Little for Case Analysis.
After just a 12,288 word input, about 49 pages of double-spaced text (250 words per page), ChatGPT-4, which is able to use 16,384 tokens, equal to about 12,288 words, forgets everything you told it before that. Total amnesia. Yup, it goes blank, forgets the question even. “What were we chatting about?” It just remembers the last 12,288 or so words of input, including its responses. ChatGPT-3.5, which can only use 4,096 tokens, is even worse. Its amnesia is triggered after just 3,072 words. My dog has a better memory, especially if I say a keyword like “squirrel” or “chicken.”
Too Small a Memory For Most Serious Legal Analysis
After many months of investigation and testing I must reluctantly conclude that ChatGPT has too small a memory to be of much use to judges, lawyers and complex case litigators, at least for purposes of assisting in full-case legal analysis. Serious legal case analysis, where a skilled attorney or judge could use the help of artificial intelligence, involves much, much larger word pattern sets. Think of a typical pleading set, or set of counter-motions, memorandums and exhibits. They typically involve hundreds of thousands of words and hundreds of pages of documents, at least.
I ran a test recently on a relatively small case in SDNY considering cross-motions for summary judgment. The motions were based on stipulated, uncontested facts. Most cases involve substantial issues of contested facts. The case had some complex issues, most federal cases do, but it was not an especially difficult case. It was an indemnity claim under an insurance policy. The analysis only required the judge to study the uncontested facts, apply the law to the facts and reach a judgment ruling on the motions. Hey, easy right? Not so fast.
I downloaded everything from the court file related to the cross-motions. After a lengthy, tedious process of review, cutting the fat and eliminating all documents not strictly necessary to decide the issues, I put everything left into a Word document. The properties showed that there were about 120,000 words and it was 550 pages long (using double spaced lines and 12 point Times New Roman font). With the 12,288 word, roughly 49 page memory limit, ChatGPT-4 was useless. I tried it for a while anyway with multiple small size inputs, just to be sure, and to see for myself how it would respond. It went blank over and over again. It would even apologize occasionally. Aside from ChatGPT-4’s politeness, it was the worst associate I have ever had!
Even with GPT-4, the AI forgets everything beyond about 49 pages of input. That’s simply too small a size limitation for any serious legal case work. It is a big improvement over 3.5 to be sure. The context length increased from 4,096 tokens in ChatGPT-3.5, which is about 3,072 words, to 16,384 tokens with GPT-4. Still, as the experiment with a stipulated facts case in SDNY shows, it is not helpful to me for case analysis.
Open AI is Trying to Fix the Amnesia Problem
A new version of ChatGPT-4, which is called ChatGPT-4-32K, has already been released for limited public testing. I have been on the waiting list since it started in March 2023. I filled out the Open AI questionnaire to plead my case for why I need it to test out legal analysis. I am still waiting and hope to get it soon. It will cost twice as much to use, but that is still a trivial cost. I want to try it out mainly because I would like to see how it works in other areas. I already know that its doubled size will still be far too small for legal case analysis. Well, maybe not too small for traffic tickets, but too small for the kind of disputes you would need a human lawyer to handle.
ChatGPT-4-32K has that name because the input size limit has been increased to a maximum token count of 32,768. That is 2^15 (2 to the power of 15). That equals about 24,576 words. Regular old GPT-4 is 2^14, which, as mentioned before, is 16,384 tokens, equal to about 12,288 words. ChatGPT-3.5 was 2^12, or 4,096 tokens and 3,072 words. By the way, here are the simple formulas provided by Open AI for these token to word-count estimates, based no doubt on statistical analysis.
- 1 token ~= 4 chars in English
- 1 token ~= ¾ words
- 100 tokens ~= 75 words
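The rules of thumb above can be turned into a small calculator. This is only a rough sketch: the 0.75 words-per-token figure is the estimate quoted above, the 250 words per double-spaced page is my own assumption, and the function names are made up for illustration. Real token counts vary with the text.

```python
# Estimate quoted above: 1 token ~= 3/4 of a word in English.
WORDS_PER_TOKEN = 0.75
# Assumption (not from Open AI): 250 words per double-spaced page.
WORDS_PER_PAGE = 250


def tokens_to_words(tokens: int) -> int:
    """Approximate word count that fits in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def words_to_pages(words: int) -> float:
    """Approximate double-spaced page count for a given word count."""
    return words / WORDS_PER_PAGE


# Context sizes discussed in this article, as powers of two.
for exponent in (12, 14, 15, 16):
    tokens = 2 ** exponent
    words = tokens_to_words(tokens)
    pages = words_to_pages(words)
    print(f"2^{exponent} = {tokens:,} tokens ~ {words:,} words ~ {pages:.0f} pages")
```

Treat the results as ballpark numbers only. Legal text full of citations and unusual terms may use more tokens per word than ordinary English.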
Thus, the forthcoming ChatGPT-4-32K will only double the maximum token count to 32,768, which is only about 24,576 words or 98 pages, double spaced. Most pleading sets with exhibits and motions with memorandums are still much longer than that, let alone a whole case. For instance, the cross-motions for summary judgment case based on stipulated facts that I studied was 120,000 words. That is nearly five times the expanded capacity of GPT-4-32K.
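To put the gap in numbers, here is a rough sketch of how many separate context windows a 120,000 word file would fill at each size. It assumes the same 0.75 words-per-token estimate, and it generously assumes the entire window is available for input, with nothing reserved for the questions or the AI's responses. The function name is hypothetical.

```python
import math

# Estimate quoted earlier: 1 token ~= 3/4 of a word.
WORDS_PER_TOKEN = 0.75
# Word count of the trimmed summary judgment file described above.
CASE_WORDS = 120_000


def windows_needed(case_words: int, window_tokens: int) -> int:
    """Number of full context windows needed to hold the whole file."""
    window_words = window_tokens * WORDS_PER_TOKEN
    return math.ceil(case_words / window_words)


for window_tokens in (4_096, 16_384, 32_768, 65_536):
    ratio = CASE_WORDS / (window_tokens * WORDS_PER_TOKEN)
    count = windows_needed(CASE_WORDS, window_tokens)
    print(f"{window_tokens:,} tokens: file is {ratio:.1f}x the window, "
          f"needs {count} windows")
```

Under these assumptions, the file does not fit in a single window at any of the sizes discussed in this article.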
From what I have read, there are currently serious computational problems with doubling the token count again, to 2 to the 16th power, or 65,536 tokens. If they could do it, about 49,152 words could be used for questions. With some summation tricks to condense and truncate input and cram in a higher word count, and possibly other more complex programming, like recursive summaries, that might make the next level of 65,536 tokens big enough for simple federal cases or arbitration. It would certainly make ChatGPT-5, which is what people think the next version of Open AI’s GPT will be called, into a serious contender in legal analysis. Not nearly as good as our top human associates, but at least not a complete loser either like current ChatGPT-4, who cannot even remember the assignment a tenth of the way through the project.
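For readers curious what a recursive summary might look like, here is a minimal sketch of one way to do it. It is not Open AI's method or anyone's actual product. The `summarize` argument is a hypothetical stand-in for whatever call would ask a model to condense a chunk of text, and `first_tenth` is a dummy summarizer I made up so the example runs without any AI at all.

```python
from typing import Callable, List


def split_into_chunks(words: List[str], chunk_words: int) -> List[List[str]]:
    """Cut a word list into consecutive pieces of at most chunk_words words."""
    return [words[i:i + chunk_words] for i in range(0, len(words), chunk_words)]


def recursive_summary(text: str, window_words: int,
                      summarize: Callable[[str], str]) -> str:
    """Summarize chunk by chunk, repeating until the result fits the window.

    `summarize` is a hypothetical callable standing in for a model call.
    """
    words = text.split()
    while len(words) > window_words:
        chunks = split_into_chunks(words, window_words)
        summaries = [summarize(" ".join(chunk)) for chunk in chunks]
        condensed = " ".join(summaries).split()
        if len(condensed) >= len(words):
            # Guard against a summarizer that never shrinks the text.
            raise ValueError("summarize() must shorten its input")
        words = condensed
    return " ".join(words)


def first_tenth(text: str) -> str:
    """Dummy summarizer: keeps the first 10% of the words (at least one)."""
    words = text.split()
    return " ".join(words[:max(1, len(words) // 10)])


# Usage: a fake 120,000 word file squeezed into a 12,288 word window.
fake_case_file = " ".join(f"w{i}" for i in range(120_000))
condensed = recursive_summary(fake_case_file, 12_288, first_tenth)
print(len(condensed.split()), "words after condensing")
```

The obvious catch is that each round of summarizing throws information away, so whether the details that decide a case would survive is an open question.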
Right now the GPT amnesia problem is a real deal killer for use of LLMs for anything but simple legal analysis. The jobs of litigators and judges are safe, for now. The memory problem is in addition to previously discussed limitations of time cut-off, confidentiality, hallucinations and mistakes. In my experience, these can be handled by normal quality control routines of skilled attorneys and other measures. Human associates have the same problems sometimes. We can easily spot the bull. But we cannot cut down the word count. Cases are just too damn complicated.
Even if detailed case analysis is out of the picture for now, lawyers and judges can still use LLMs like ChatGPT-4 in many other ways. AI can still help lawyers in their never-ending pursuit of justice. I like very much its abilities as a new writing tool and illustrator, and simple argument maker. (Note – I did not use AI to help write this particular blog, and when I do use AI, that fact is disclosed.) Plus, we all know GPT is a far better poet than most lawyers, and who knows, maybe that will become the next big thing in closing arguments. Ok, so, having said that, I will use ChatGPT-4 to summarize the essence of this article in a poem, and will direct the length and tone of the poem.
In the realm of words and tokens confined,
An AI named ChatGPT, trapped in its mind,
Twelve pages, its limit, the space to recall,
For cases and trials, inadequate, small.
From GPT-3.5, an improvement, indeed,
Yet far from enough for the legal work’s need,
Awaiting the new, ChatGPT-4-32K,
To test and to challenge in different ways.
In March, on the waiting list, I stood,
Pleading my case, as any lawyer would,
The cost twice as much, but a trifle to bear,
For larger capacity, legal analysis to share.
Still far from the mark, the AI’s progression,
Not enough for disputes needing human discretion,
To double the tokens, the dream of the day,
For simple federal cases, a contender, they say.
Computational hurdles, the sixteenth power,
To reach 65,536 tokens, the challenge to tower,
With tricks of summation, condensed and concise,
A serious player, ChatGPT-5, entice.
Amnesia, a deal killer, for now it remains,
Safe are the jobs of litigators and judges, no chains,
Yet with progress and breakthroughs, AI could blend,
The roles of the lawyer, the future we’ll mend.
For now, in the shadows, ChatGPT-4 does dwell,
In memory’s prison, confined to its cell,
The future uncertain, as the AI shall grow,
From amnesia to contender, the seeds we do sow.