Ironically, LARGE Language Models (LLMs), so far at least, have only SMALL language memories and input size, way too little for case analysis and other legal applications. OpenAI and other AI companies are well aware of the small input problem. This blog shares some of the solutions AI companies are coming up with to expand the input size window. These solutions do, however, pose new risks and problems of their own. Some of the tradeoffs involved with enlarging the input size of LLM GPTs will be discussed here too.
Defining the Input Size Problem
The e-Discovery Team has previously covered the problem of limited input size (a/k/a content size or context size) of LLMs, including ChatGPT-4. See ChatGPT Has Severe Memory Limitations: Judges, Arbitrators and Commercial Litigation Lawyers, Your Jobs Are Safe, For Now.
As noted in that blog:
After just a 12,288 word input, about 49 pages of double-spaced text (250 words per page), ChatGPT-4, which is able to use 16,384 tokens, equal to about 12,288 words, forgets everything you told it before that. Total amnesia. Yup, it goes blank, forgets the question even. “What were we chatting about?” It just remembers the last 12,288 or so words of input, including its responses. ChatGPT-3.5, which can only use 4,096 tokens, is even worse. Its amnesia is triggered after just 3,072 words. . . .
A new version of ChatGPT-4, which is called ChatGPT-4-32K, has already been released for limited public testing. I have been on the waiting list since it started in March 2023. . . . The forthcoming ChatGPT-4-32k will only double the maximum token count to 32,768, which is only about 24,576 words or 98 pages, double spaced. Most pleading sets with exhibits and motions with memorandums are still much longer than that, much less a whole case. For instance, the cross-motions for summary judgment case based on stipulated facts that I studied was 120,000 words. That will be nearly five times the expanded capacity of GPT-4-32K.
ChatGPT Has Severe Memory Limitations
Many developers are aware of the negative impact of this size limit and have been working on modifications to expand it. For instance, on June 13, 2023, OpenAI released a new version of ChatGPT-3.5 that has expanded the input size to that of its smarter and more talented younger brother, ChatGPT-4, namely 16K, or 16,384 tokens to be exact. The new bigger memory GPT-3.5 has a new name and a new price. Here is OpenAI’s announcement:
OpenAI Product Announcement
gpt-3.5-turbo-16k offers 4 times the context length of gpt-3.5-turbo at twice the price: $0.003 per 1K input tokens and $0.004 per 1K output tokens. 16k context means the model can now support ~20 pages of text in a single request. (RCL Editors Note – that’s 20 pages of single spaced and 40 of double.)
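To get a feel for what that pricing means for a document of a given length, here is a rough sketch in Python. It uses only the prices quoted in the announcement above and the OpenAI rule of thumb of about ¾ of a word per token. The function name and the sample word counts are made up for illustration, and actual token counts will vary with the text.

```python
# Prices quoted in the announcement above, in dollars per 1,000 tokens.
INPUT_PRICE_PER_1K = 0.003
OUTPUT_PRICE_PER_1K = 0.004
CONTEXT_LIMIT_TOKENS = 16_384  # the 16K window

WORDS_PER_TOKEN = 0.75  # rule of thumb for English; an approximation


def estimate_request_cost(input_words, output_words):
    """Return (total_tokens, cost_in_dollars) for one hypothetical request."""
    input_tokens = round(input_words / WORDS_PER_TOKEN)
    output_tokens = round(output_words / WORDS_PER_TOKEN)
    total_tokens = input_tokens + output_tokens
    # The question and the answer share the same window.
    if total_tokens > CONTEXT_LIMIT_TOKENS:
        raise ValueError("request would not fit in the 16K context window")
    cost = (input_tokens * INPUT_PRICE_PER_1K
            + output_tokens * OUTPUT_PRICE_PER_1K) / 1000
    return total_tokens, cost


tokens, cost = estimate_request_cost(input_words=9_000, output_words=750)
print(f"{tokens:,} tokens, about ${cost:.3f}")
```

In this made-up example a 9,000 word brief with a 750 word answer fits in the window and costs about four cents. A 20,000 word input would be rejected, because it would not fit at all.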
The Solutions Of Various AI Software Developers
Other developers have announced that they have devised ways to make the input much larger. For a good technical blog on this, see The Secret Sauce behind 100K context window in LLMs: all tricks in one place by Galina Alperovich, who is a Lead ML Engineer at Soveren. She reports that a 65K token size input is claimed by MosaicML, MPT-7B-StoryWriter-65k+, and a 100K token size is claimed by Anthropic, Introducing 100K Context Windows. Alperovich notes that Google does not reveal the exact context size in its PaLM 2 technical report, but it does say they “increase the context length of the model significantly.”
The announced expansions will be great news for all LLM based AIs, especially where legal applications are concerned. In our world of legal tech, large databases are often required for useful reference. This is true in most AI applications. As Galina Alperovich so eloquently puts it:
Having a large context length allows an already powerful LLM (that saw the whole internet) to look at your context and data and interact with you on a completely different level with a higher personalization. And all these without changing the model’s weights and doing your “training” on the fly, “in memory.” And overall, a large context window brings more accuracy, fluency, and creativity to the model. (emphasis in original)
Alperovich, The Secret Sauce
Galina Alperovich’s article, The Secret Sauce behind 100K context window in LLMs, describes in technical language the tricks that programmers like her use in these new models to squeeze more data into the input. I used ChatGPT and various plugins to help me write this, and as always, used Midjourney and Photoshop to help me create the images.
Summary of Alperovich’s Secret Sauce Article
The Secret Sauce article begins by explaining the concept of LLMs. In much simpler terms than those used by Alperovich, LLMs are AI models that can generate human-like text. The best known LLM available now is ChatGPT. LLMs can be used in a variety of applications, from drafting emails to writing code.
The article then delves into the main topic: how to speed up the use of LLMs and increase their context window to 100,000 tokens. The context window is the amount of text that the model can consider when generating a response. A larger context window allows the model to generate more coherent and relevant text. The nice round 100,000 figure, 100K, seems to be the Holy Grail sought by software developers. Frankly, since Secret Sauce is a very technical blog post, I was a little surprised to see the goal wasn’t 65,536 (2^16) or 131,072 (2^17). See, e.g., ChatGPT Has Severe Memory Limitations (“ChatGPT-4-32K was 2^15 (2 to the power of 15), which is 32,768,”). I guess 100,000 is a good, round, halfway point, but, like many tech-lawyers, I find the lack of precision a bit two confusing. (Sorry AI readers, this is a tech Dad joke with intentional misspelling.)
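To make the idea of a context window concrete, and to show where the “amnesia” described earlier comes from, here is a toy sketch in Python of a fixed-size window that keeps only the most recent tokens. It counts each whitespace-separated word as one token, which is an assumption made to keep the example short (real models use sub-word tokenizers), and the class name is hypothetical.

```python
from collections import deque


class ToyContextWindow:
    """Keeps only the most recent `max_tokens` tokens of a conversation."""

    def __init__(self, max_tokens):
        # A deque with maxlen silently discards the oldest items when full.
        self.tokens = deque(maxlen=max_tokens)

    def add(self, text):
        # Simplification: one whitespace-separated word counts as one token.
        self.tokens.extend(text.split())

    def visible_text(self):
        return " ".join(self.tokens)


window = ToyContextWindow(max_tokens=5)
window.add("The plaintiff filed the complaint in March.")
window.add("When was the complaint filed?")
print(window.visible_text())  # prints: When was the complaint filed?
```

With a window of only five tokens, the statement about March has already been pushed out by the time the question arrives, so the model would have nothing to answer it with. A larger window just moves the point at which that happens.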
Some Software Programmer “Tricks” Used to Expand LLM Input
Galina describes several “tricks” (her words) that were used to expand the input size. These include:
Summarization and Chained Prompts: Trying summarization techniques and sophisticated chained prompts, so that a long document is condensed in stages until it fits into the input. ChatGPT compared this to model distillation, which involves training a smaller model on the outputs of a larger one so that it needs less computational resources. (Reminds me of Russian dolls.)
Vector Databases: Maintaining vector databases to keep embeddings for custom documents and then “searching” across them by some similarity metric. Only the passages most similar to the question are passed to the model, thereby reducing the computational load. ChatGPT compared this to sparse transformers, a type of model architecture that can process longer sequences of text more efficiently by allowing the model to pay attention to only a subset of the input tokens.
Fine Tuning With Custom Data: This involves fine-tuning the LLM with custom data. As Galina notes, not all commercial LLMs allow that, and it is not an obvious task for open-source LLMs.
Custom LLMs: Developing custom smaller LLMs for particular data. This approach might be particularly useful for legal-tech applications such as discovery search and contract analysis and assembly.
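The vector database approach in the list above can be sketched in a few lines of Python. This is a toy illustration only. It uses simple word counts as stand-in “embeddings” and cosine similarity as the similarity metric, whereas real systems use embeddings produced by a trained model and a dedicated vector database. The sample documents and function names are invented for the example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (an assumption)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    # Counter returns 0 for words that are missing from b.
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents,
                    key=lambda doc: cosine_similarity(q, embed(doc)),
                    reverse=True)
    return ranked[:k]


documents = [
    "The lease requires rent to be paid on the first day of each month.",
    "The witness testified that the delivery truck arrived after midnight.",
    "The contract contains an arbitration clause governed by Florida law.",
]

# Only the best match, not the whole collection, goes into the prompt.
best = top_matches("Which law governs the arbitration clause?", documents)
print(best[0])
```

The point of the design is that the model never sees the whole collection. Only the passage that scores highest against the question has to fit inside the input window, which is also why the quality of the search step matters so much.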
Galina’s article concludes by emphasizing the importance of these optimizations for making the most of LLMs. They allow researchers and developers to use these powerful models more efficiently, opening up new possibilities for their application.
Although her article does not mention legal applications, my research and experience show that a much larger input size, such as 100,000,000 tokens, a thousand times the 100K window discussed above, would make LLMs much more useful for lawyers, judges and arbitrators. Note that, according to OpenAI, based on statistics for English text, 1 token ~= 4 chars in English; 1 token ~= ¾ words; 100 tokens ~= 75 words. Therefore, one hundred million tokens would open LLM GPT analysis to databases of approximately 75 million words. A million words on average equals about four thousand pages, double spaced, and would supposedly take approximately 3,333 minutes, or 56 hours, to read. (Capitalize My Title.) So at 75 million words, you have 300,000 pages, which would take about 4,200 hours to read. So, big enough for most legal application inputs.
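The arithmetic in the paragraph above can be checked with a short Python sketch. It assumes the figures used in this post: OpenAI’s rule of thumb of ¾ of a word per token, 250 words per double-spaced page (four thousand pages per million words), and a reading speed of 300 words per minute, which is the rate implied by 3,333 minutes per million words. All of these are rough averages, and the function name is just for illustration.

```python
WORDS_PER_TOKEN = 0.75      # OpenAI rule of thumb for English text
WORDS_PER_PAGE = 250        # double-spaced page, as assumed in this post
WORDS_PER_MINUTE = 300      # implied by 3,333 minutes per 1,000,000 words


def capacity(tokens):
    """Convert a token budget into approximate words, pages and reading hours."""
    words = tokens * WORDS_PER_TOKEN
    pages = words / WORDS_PER_PAGE
    hours = words / WORDS_PER_MINUTE / 60
    return words, pages, hours


for tokens in (16_384, 32_768, 100_000, 100_000_000):
    words, pages, hours = capacity(tokens)
    print(f"{tokens:>11,} tokens ~ {words:>12,.0f} words"
          f" ~ {pages:>9,.0f} pages ~ {hours:>8,.1f} hours")
```

For 100,000,000 tokens this gives 75 million words, 300,000 pages and roughly 4,167 hours of reading, which the text rounds to 4,200. The same conversion puts a 100,000 token window at about 75,000 words, or 300 double-spaced pages.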
Still, for context, and this is an unfair apples-to-oranges comparison, 75 million words is just a drop in the bucket compared to the full body of law needed for legal research. For instance, LexisNexis claims “Our global legal and news database contains 144 billion documents and records with 1.2 million new legal documents added daily.” But those very large sizes are not the input we are talking about here; that is the data upon which the LLM GPT models must train. It does suggest, however, that the legal search companies will need to build their own custom LLMs, not try to piggyback on the work of OpenAI or other LLM developers.
The Tradeoffs and Risks Involved in Allowing Larger Input Size
The Secret Sauce blog does not go into the tradeoffs involved in using shortcuts for AI model training. But I know there have to be some. There always are with shortcuts. I did some outside research, asked for help from various ChatGPT-4s of various flavors, and found some key tradeoffs likely made to attain these expanded input sizes:
- Efficiency vs. Accuracy: The use of shortcuts can significantly speed up the training process, but it may come at the cost of accuracy. The model might not fully understand the context of the information it’s processing, leading to potential inaccuracies in its outputs.
- Data Quality: The quality of the data used for training is crucial. If the data is not representative of the problem space or contains errors, the model’s performance can be negatively impacted. This is especially true when using shortcuts, as the model might not have the opportunity to learn from a diverse range of data.
- Model Complexity: Shortcuts can simplify the model, making it easier to train and deploy. However, this could limit the model’s ability to handle complex tasks or understand nuanced information.
- Maintenance: While shortcuts can speed up the initial training process, they might require more maintenance in the long run. For example, if a shortcut is based on a specific feature of the data, and that feature changes, the model might need to be retrained.
- Ethical Considerations: The use of shortcuts can also raise ethical considerations. For instance, if a model is trained to take shortcuts based on biased data, it could perpetuate those biases in its outputs.
It is good news that the input windows are quickly gaining size and this limitation will be mitigated soon, if not eliminated entirely. But beware, it is tricky to squeeze an elephant into a bottle. There are usually data integrity loss problems with data compression. We need to squeeze the elephant into the bottle without harming the poor creature.
It may take some time for software developers to get this right, especially where legal applications are concerned. ChatGPT’s analysis concluded with this good summary of the dangers of shortcuts:
While shortcuts can provide significant benefits in terms of efficiency and simplicity, they also present challenges that need to be carefully managed. It’s important to balance the need for speed and simplicity with the need for accuracy, robustness, and ethical considerations.
ChatGPT-4 5/24/23 version
This again emphasizes the need for careful evaluation before you purchase expensive software. It also shows, once again, that human care, quality controls and verification will be needed when you rely on AI in your legal practice. The miracles of generative AI automation will continue to be a hybrid process where human lawyers have key supervisory and quality control roles to play. AI is a tool, not a creature, and the responsibilities shall always remain on the lawyers to properly choose and use their tools. Blaming mistakes on AI error is not a good excuse.
Copyright Ralph Losey 2023 – ALL RIGHTS RESERVED – (May also be Published on EDRM.net and JDSupra.com with permission.)