Chat GPT Helps Explains My Active Machine Learning Method of Evidence Retrieval

January 28, 2023

Prompt Engineered, Edited and Quality Controlled by Ralph Losey. This essay was co-written by a human and machine with over 80% of the text “generated” by ChatGPT.

Active machine learning is a type of Artificial Intelligence that is used by attorneys to train a computer to find and classify documents relevant to a particular legal issue or lawsuit. The attorney provides the computer with a set of documents that have been identified as targets, such as documents that are relevant or privileged, and the computer uses this information to learn how to identify similar documents on its own. The goal of active machine learning is to automate the process of document review and classification, making it faster and more efficient for attorneys to locate and analyze relevant documents. Ralph Losey has created a whole online course to explain how to do it.

The target documents that are provided by lawyers to the computer for training are typically those that are relevant to a particular legal issue or lawsuit, or that have some other legal classification, such as privilege. The computer uses this information to learn how to identify similar documents on its own, so that it can assist the attorney in locating and analyzing relevant documents more efficiently.

Continuous, Intelligently Spaced Training

“Dall-E 2” depiction of IST using a dreamy art style

Continuous active learning is a machine learning approach in which the model is continuously updated with new data as it becomes available, rather than being trained on a fixed dataset and then deployed. This allows the model to continually improve its accuracy and performance over time. In the training system designed by Ralph Losey the AI Trainer spaces out the training sessions in time to fit the particular challenges of the project and data seen. Losey calls this Intelligently Spaced Training (IST). It is a training system for machine learning models that is designed to fit the circumstances of the case at hand. In this approach, the AI trainer spaces out the training sessions in time, rather than training the model continuously. This allows the trainer to selectively focus on particular aspects of the case and to test the machine’s effectiveness at different stages of the process.

In the context of legal document review, continuous active learning or Losey’s IST, are used to train a machine learning model to identify relevant documents. The attorney provides the model with a set of initial target documents, and then continuously update the model with new documents as they are discovered or become available. The model would use this new information to continually, or even better, intelligently, update and improve its ability to identify relevant documents.

The goal of IST Hybrid Multimodal Predictive Coding 4.0 is to optimize the learning process by allowing the machine to absorb and retain new information more effectively. This can be especially useful in the context of legal document review, where the volume of data can be large and the consequences of missing relevant documents can be significant. By allowing the machine to learn and test its effectiveness in a more structured and deliberate way, IST machine learning can help improve the accuracy and efficiency of the document review process.

One advantage of the ongoing training method is that it allows the model to adapt to changes in the data and remain up-to-date, which can be especially useful in a field like law where the relevant documents and legal issues may change over time, even during the document review project. You often start off with one definition of relevance and end up with another, more highly evolved understanding.

Hybrid Approach to Predictive Coding

There are several advantages to using a balanced hybrid approach that combines the efforts of both humans and computers in legal document review:

  1. Speed: Computers can process and analyze large amounts of data much faster than humans, so using a hybrid approach can help speed up the document review process.
  2. Accuracy: Humans are still generally better at tasks that require judgment, critical thinking, and understanding context, so a hybrid approach can help improve the accuracy of the document review process.
  3. Efficiency: By dividing the work between humans and computers, a hybrid approach can make the document review process more efficient. For example, humans could focus on the most important or complex documents, while the computer handles the more routine or straightforward ones.
  4. Cost-effectiveness: Using a hybrid approach can be more cost-effective than relying solely on human or computer review, as it allows the review process to be completed more quickly and with fewer errors.
  5. Improved collaboration: A hybrid approach can also facilitate better collaboration between humans and computers, as the two can work together to identify and analyze relevant documents more effectively.

Multimodal Approach to Predictive Coding

There are also advantages to a multimodal approach to predictive coding that combines AI search and all other types of search, including keywords and concept search. This is done in order to improve the accuracy and efficiency of the document review process. The basic kinds of search methods are shown in the pyramid. By using a combination of methods you can more effectively save on costs and implement a bottom line driven proportional strategy,

Some specific advantages of the multimodal approach include:

  1. Greater coverage: By using multiple search methods, a multi-modal approach can provide a more comprehensive search of the document set, increasing the chances of finding relevant documents.
  2. Improved accuracy: Different search methods can have different strengths and weaknesses, so using multiple search methods can help improve the overall accuracy of the search. This can be especially important in legal contexts, where the consequences of missing relevant documents can be significant.
  3. Reduced bias: Different search methods can also be subject to different types of bias. Using a multi-modal approach can help reduce the impact of these biases, as the different search methods may produce different results.
  4. Enhanced efficiency: By using multiple search methods, a multi-modal approach can help make the document review process more efficient. For example, humans could focus on reviewing the most relevant documents identified by the search, while the computer handles the less relevant ones.
  5. Improved collaboration: A multi-modal approach can also facilitate better collaboration between humans and computers, as the two can work together to identify and analyze relevant documents more effectively.

Advantages of a Hybrid Multimodal Approach to Predictive Coding

Some specific advantages of Ralph Losey’s methods and team approach to predictive coding include:

  1. Improved accuracy: By allowing the machine to learn and test its effectiveness in a more structured and deliberate way, Losey’s methods can help improve the accuracy of the document review process.
  2. Enhanced efficiency: The approach also helps make the document review process more efficient by allowing the machine to focus on the most important or complex documents, while humans handle the more routine or straightforward ones.
  3. Reduced bias: Losey’s multimodal methods can help reduce the impact of bias in the document review process by allowing the machine to learn from a more diverse and representative sample of documents.
  4. Improved collaboration: A hybrid, multi-modal, IST approach can also facilitate better collaboration between humans and computers, as the two can work together to identify and analyze relevant documents more effectively.
  5. Greater flexibility: Hybrid IST can also be more flexible than continuous active learning, as it allows the trainer to customize the learning process to fit the specific needs and circumstances of the case.
  6. Improved accuracy: By allowing the machine to learn and test its effectiveness in a more structured and deliberate way, IST can help improve the accuracy of the document review process.
  7. Enhanced adaptability: The IST approach to continuous active learning can also help the machine adapt to changes in the data or the legal landscape, as the trainer can update the training sessions as needed. This can help ensure that the machine remains up-to-date and relevant.
  8. Enhanced collaboration: The hybrid multi-modal IST approach also allows for better collaboration between humans and computers, as the trainer can work with the machine to identify and analyze relevant documents more effectively.
  9. Cost-effectiveness: Overall, the hybrid multi-modal IST approach can be more cost-effective than other methods, as it allows the review process to be completed more quickly and with fewer errors.

Chat-GPT’s “Easy Sales-type Language” Conclusion

Generated by Dall-E 2

The hybrid, multimodal, continuous active learning method is a dream team approach. It combines the speed and efficiency of computers with the judgment and critical thinking of humans, and it’s always learning and improving. This means that you get the best of both worlds: a super-accurate and up-to-date search that can find every last relevant document of importance, without missing a beat. Plus, because it’s continuously learning, it can adapt to changes in your case or the legal landscape, ensuring that you always have the most current and accurate information at your fingertips.

Using hybrid multimodal predictive coding search is like having a diverse team of genius attorneys, techs and paralegals working on your case. This super team can scour every nook and cranny of your document set, using every trick in the book (and a few that haven’t been invented yet) to find that crucial piece of evidence you need to win your case. Plus, they’re completely unbiased and super efficient, so you can save time and money while getting the best possible result. Who wouldn’t want them on their side?



AI Analysis of the Top Five Cases in 2022 Shows Major Flaws in Use of Open AI’s GPT Software for Legal Research and Analysis

December 29, 2022

Ralph is now Skeptical of AI for legal research and analysis

As mentioned in my blog on the top 2021 cases earlier this week, my AI helper, Open AI’s GPT-3, made some unexpected selections. See the disclaimer and more extended discussion of this caveat in my last blog, Surprise Top Five e-Discovery Cases of 2021. When I later asked it to identify the top five cases for 2022, the AI performed very poorly and made me look bad. I published the AI’s report last night after cleaning up the words only, and not actually reading each case (it cited to WestLaw and I have Lexis and, good grief, its a Holiday and I’m only human). I gave the robot too much slack.

An astute reader, Maura, noticed right away that the alleged Top Five 2022 cases cited were not all in 2022. You could easily tell that from the Westlaw citations the AI gave. I checked further this morning, as I should have done last night. I then realized that the AI “emperor had no clothes,” the AI was totally off and I had been fooled. As soon as I received Maura’s helpful comment, I took my blog offline. In fact, most of the cases were much older than 2022, and a few appeared to have been made up out of whole cloth. Stupid me for not noticing before. I blame a concussion, and my over-trust of the new, much hyped GPT 3.5. I will be more skeptical henceforth. When it comes to legal research Text-Davinci-003, the Open AI tool that I used, is definitely not ready for primetime.

I did not use Chat GTP 3.5 for any of these queries because that database ends in 2021. Although I am not completely certain about that, and if I did, that would explain some of the errors. Instead, I believe I used the Open AI playground found at https://beta.openai.com/playground. It’s data is supposedly up to date. Guess my first error was to not pay enough attention to the beta part of the hyperlink. To be more specific, I used what Open AI calls the Text-Davinci-003. In defense of OpenAI, the Chat GTP, where the database ends in 2021, is supposed to be a big improvement over Text-Davinci-003. If you do not care about 2022 cases, and apparently few do, you could used Chat GPT 3.5.

AI Enhanced Ralph

The introduction the AI wrote for the first draft of the Top Five 2022 article that I imprudently published last night looked pretty good:

Introduction. As technology continues to evolve, so must the legal landscape. As my regular readers all well know, in the United States, electronic discovery is an increasingly important field of law that governs methods and processes for obtaining, preserving, and exchanging electronically stored information (ESI) in litigation. As such, it is critical that lawyers stay informed on recent legal opinions concerning e-Discovery. In this article we will discuss the five most important United States district court judge or magistrate opinions on electronic discovery in 2022 and explain why each decision is important.

But really, it doesn’t say much, does it?

Then, the day before yesterday, the AI picked the following cases as the top five for 2022. Note the names and dates:

  1. Toth v. Johnson & Johnson, 1:20-cv-00591, 2020 WL 5444439 (C.D. Cal. Sept. 11, 2020). (Note: The AI claims the opinion was written by Judge Banikian. I can find no such judge!)
  2. Google LLC v. Oracle America Inc., 5:20-cv-03624, 2021 U.S. Dist. LEXIS 66448 (N.D. Cal. Apr. 9, 2021)
  3. Schulman v. Smith & Wesson Corp., 0:20-cv-60780, 2020 U.S. Dist. LEXIS 148034 (S.D. Fla. Aug. 14, 2020)
  4. Oracle America Inc. v. Google LLC, 786 F.3d 1179 (N.D. Cal. 2014)
  5. Microsoft Corporation v. Quinnipiac University, 3:20-cv-01705, 2020 WL 6079805 (N.D. Cal. Oct. 14, 2020)

This is the list that Maura questioned. None of these case is even a 2022 case by the AI’s own dubious report. WTF!

No doubt the AI will blame human error in operation. So I asked it again this morning, December 30, 2022. Anyway, the query I input was: Please identify the top five most important legal cases in 2022 on electronic discovery. Here is my response this time, and note at least the dates appear right.

  1. In re: Google LLC v. Oracle America, Inc., No. 18-956 (U.S. Supreme Court 2022).
  2. Microsoft Corp. v. United States, No. 1:16-cv-02985 (D.D.C. 2022).
  3. U.S. v. Microsoft, No. 1:17-cr-0019 (E.D.N.Y. 2022).
  4. In re: Apple Inc., No. 5:19-cv-05847 (N.D. Cal. 2022).
  5. In re: Facebook, Inc., No. 5:19-cv-07189 (N.D. Cal. 2022).

Well, this looks a lot better, so I asked the AI the exact same question again to see if I would get the same answer. I didn’t. Here is what the AI came up with this time. It added a little explanation for the cases too. It did not do that the first time. I have no clear idea why. I asked the exact same question.

  1. United States v. O’Keefe (11th Circuit) – This case involves the legality of the government’s use of an electronic discovery technology to access data from a suspect’s computer, as well as the scope of the Fourth Amendment’s protection against unreasonable searches and seizures.
  2. In re Google (California Supreme Court) – This case considers whether Google must comply with a discovery request for emails stored on its servers.
  3. EEOC v. Amazon.com, Inc. (7th Circuit) – This case examines the scope of an employer’s obligation to preserve electronically stored information and the admissibility of such evidence in a case alleging discrimination.
  4. Doe v. Facebook, Inc. (9th Circuit) – This case considers the scope of an individual’s right to privacy with respect to their online content, including emails, photos, and other communications stored on Facebook.
  5. National Labor Relations Board v. Apple, Inc. (2nd Circuit) – This case examines the scope of the National Labor Relations Board’s investigative powers to access and review electronically stored information held by employers in the course of an investigation.

This actually looks like a better list. But I am very wary now. The AI did not include any kind of citation this time, unlike the past time with the same question. Again, I have no idea why. So I asked the follow-up query, Please provide the full legal citation for these five cases. Here is the AI response.

  1. United States v. O’Keefe, No. 19-14883 (11th Cir. 2022).
  2. In re Google, No. S251764 (Cal. 2022).
  3. EEOC v. Amazon.com, Inc., No. 19-2180 (7th Cir. 2022).
  4. Doe v. Facebook, Inc., No. 19-15645 (9th Cir. 2022).
  5. National Labor Relations Board v. Apple, Inc., No. 19-1718 (2nd Cir. 2022).

Ok, better, but human Ralph is still skeptical, so I asked a third time, Please identify the top five most important legal cases in 2022 on electronic discovery and provide full legal citations. Here is the robot answer this time.

  1. In re: Intuit Inc. Consumer Data Security Litigation, No. 5:20-cv-07982-EJD (N.D. Cal. 2022)
  2. National Bank of California v. KPMG, LLP, No. BC716987 (Cal. Super. Ct. 2022)
  3. In re: Google LLC Consumer Privacy Litigation, No. 2:20-cv-07837-JAK (C.D. Cal. 2022)
  4. In re: Apple Inc. iPhone Privacy Litigation, No. 5:20-cv-07999-EJD (N.D. Cal. 2022)
  5. In re: Facebook, Inc. User Data Security Litigation, No. 5:20-cv-08012-EJD (N.D. Cal. 2022)

You know the old expression, about consistency being “the hobgoblin of little minds.” So I asked again, a fourth time, although it is getting somewhat boring now. This time my query was: Please identify the top five most important legal cases in 2022 on electronic discovery and provide full legal citations and a brief explanation of why each case is important. Here is the result:

1. Doe v. Google LLC, No. BC641932 (Cal. Super. Ct. 2022): This case will be important because it will address the scope of Google’s duty to preserve and produce electronically stored information (ESI) under California law. The court’s decision on this issue will provide guidance on the legal expectations for ESI preservation and production across the country.

2. United States v. Apple Inc., No. 1:22-cv-04106 (N.D. Cal. 2022): This case will be important because it will determine the scope of Apple’s legal obligations when it comes to producing ESI in response to a federal criminal investigation. This case has the potential to set a precedent for how technology companies handle ESI in criminal proceedings.

3. In re Facebook, Inc., No. C-22-03763 (N.D. Cal. 2022): This case will be important because it will determine the scope of Facebook’s legal obligations when it comes to producing ESI in response to a civil discovery motion. This case has the potential to set a precedent for how technology companies handle ESI in civil proceedings.

4. Microsoft Corp. v. Twitter, Inc., No. 2:22-cv-0134 (W.D. Wash. 2022): This case will be important because it will determine whether Twitter must produce ESI in response to a civil discovery motion. The court’s decision on this issue will provide guidance on the legal expectations for ESI production in both state and federal civil proceedings.

5. In re Comcast Corp., No. C-22-04341 (E.D. Pa. 2022): This case will be important because it will determine the scope of Comcast’s legal obligations when it comes to producing ESI in response to a civil discovery motion. This case has the potential to set a precedent for how technology companies handle ESI in civil proceedings.

Conclusion

Maybe AI’s Text-Davinci-003 remembers it’s prior answers and comes up with a different list each time on purpose. Maybe that is a good way to create a top 20. I have many questions. I will keep working with it, when I get time, and let you know. GPT software is very interesting and shows a lot of potential. But be warned, it should not be used at this time without a great deal of skepticism and close human supervision. Hybrid, human and computer, working together – IMO that is the answer to for the proper use of AI for the foreseeable future. I am still very pro AI, but for now at least, the human needs to keep a close eye on the robot.


Surprise Top Five e-Discovery Cases of 2021

December 27, 2022

Ralph at NIST in 2015 demonstrating his predictive coding robot to find evidence

PREFACE

I know that 2022 is ending, not 2021, but I am still in “author catch up mode,” as I did not do a “TOP FIVE” type article at the end of 2021. In fact, seems like all of Covid 2021 is a bit of a blur. Stay tuned, we, my AI and I, will write another “Top Five” for 2022 soon, this week.  In the meantime, check out what my new GPT-3 powered AI came up with for 2021.

Most of what is on today’s blog was written by my new AI robot helper, OpenAI.  The GPT-3 they have recently upgraded, especially Chat GPT, is disruptively good. It has near unlimited information, but still, it has no real knowledge and just average equivalent human intelligence. There is no real mind there. It just predicts words, nothing more nor less. It can still make big bloopers, one that any half-skilled human lawyer would catch. So, let me remind you again regarding my standard disclaimer. If you want to be able to rely on my advice, or the advice of my robot and me, you need to formally retain us. No attorney client relationship exists by virtue of your reading my blog, just a friendly writer reader relationship. Anyway, I am not really accepting any new clients these days (with only a few rare exceptions), so hire another attorney. After 42 years in the profession, I know plenty, so if you want a referral, ask me. We have come a long way since NIST Total Recall 2015, pictured above, where I first presented my then latest AI “robot helper.

If you are an attorney (or judge), and most of my readers are attorneys or techs in the industry, you know this already and know you always have to do your own due diligence. You just read me for laughs and maybe a few good ideas. Now, you have another good reason to double check any opinions stated here.  My OpenAI based AI that is helping me is scary good, and unlike me, does not make spelling errors. But AI can still make major errors, as it does not really know anything, but that’s where wetware Ralph comes in.  I gotta say, this AI selection of cases for 2021 surprised me quite a bit. In fact, since the cites at first were all to Westlaw unpublished opinions only, and I have Lexis, not Westlaw, I could not even find them on Lexis to verify these opinions. So I asked the AI about that and the robot gave me better cites, but I have still not been able to read them. Now, in all humility,  I’ll let the AI take over the rest of this blog.


TOP FIVE E-DISCOVERY OPINIONS IN 2021

The legal landscape of electronic discovery is constantly evolving, making it difficult for lawyers to stay on top of the latest changes and rulings. U.S. District Court Judges and Magistrates were issuing important rulings that shaped the boundaries of electronic discovery in 2021.

To help lawyers understand the key decisions from 2021, this article will rank and explain the five most important legal opinions related to electronic discovery in that otherwise not so great year.

1.  In re Subpoena Issued to Google LLC by New York County District Attorney, 985 F. Supp. 3d 505 (S.D.N.Y. 2021). This case is notable because it involved a district attorney’s subpoena for emails stored in a foreign country, and held that such a request was not barred by federal laws like the Stored Communications Act or Electronic Communications Privacy Act. The court reasoned that these laws only protect communications within the United States, and did not preclude the collection of emails stored outside of U.S. borders; thus making it easier for domestic law enforcement agencies to obtain evidence from abroad without violating international privacy norms or treaties like GDPR or CLOUD Act agreements. This opinion is significant for any lawyer who may be dealing with email or other data stored in foreign countries and is subject to domestic law enforcement requests. Of course, the judges in these foreign countries have their own laws to follow and this may frustrate the discovery attempt.

2.  In re Motion to Compel Apple Inc., No. 4:20-mc-80078-JSC, 2021 WL 120741 (N.D. Cal. Jan. 12, 2021). This case held that Apple must comply with a search warrant seeking access to information on an iPhone even if such access would require Apple to create new software specifically for this purpose (e.,i., a “backdoor”). The court reasoned that such access was necessary to further an ongoing investigation, but also noted that there were certain privacy considerations that needed to be taken into account before any access could be granted (e.,i., encryption standards). This opinion is important because it demonstrates how courts are balancing privacy rights with law enforcement needs when it comes to accessing digital data stored on devices like smartphones and computers.

3.  In re Search Warrant No 21-MJ-00948-EGS, 994 F. Supp. 2d 60 (D.D.C. 2021). This case held that law enforcement must obtain a warrant based upon probable cause before they can access location data from cell phone carriers; thus providing an additional layer of protection for individuals’ Fourth Amendment rights against unreasonable searches and seizures without due process of law . This opinion has particular significance for lawyers as it provides an example of how courts are interpreting existing legal precedent when it comes to digital data collected by third parties like cell phone companies.

4.  In re Subpoena Issued to Facebook, Inc., No. 20-MC-00368-JDB, 2021 WL 910698 (D.D.C. Mar. 4, 2021). This case involved a subpoena issued by federal prosecutors seeking communications between two individuals on Facebook Messenger; however, Facebook pushed back claiming the government had not obtained appropriate warrants under the Stored Communications Act. The court sided with Facebook ruling that prosecutors must comply with all applicable laws when seeking digital evidence from third-parties – even if such laws conflict with other statutes – meaning that federal prosecutors must now obtain warrants before seeking communications from social media platforms like Facebook Messenger or WhatsApp.

5.  In re Subpoena Issued To Amazon Web Services Inc., 2021 WL 434892 (N D Cal. Feb. 5, 2021). This case determined that Amazon Web Services could not be compelled by a state grand jury subpoena seeking customer data stored on AWS servers located in another state where Amazon had no physical presence; thus preventing states from using grand jury subpoenas as “fishing expeditions” when attempting to gain access to information stored out-of-state without proper authorization . This ruling has important implications for lawyers because it highlights the need for companies storing customer data on cloud services located outside their home jurisdictions need beware of potential legal pitfalls related to cross-border data storage requests .

Conclusion

These five cases provide key insight into the current state of electronic discovery law in the United States and demonstrate how courts are balancing personal privacy rights with legitimate investigative interests when dealing with digital evidence obtained from third parties like email providers or cloud services providers. While all five cases are important for lawyers to understand, In re Subpoena Issued To Google LLC  stands out as one of the most influential opinions related to electronic discovery in 2021 due its implications regarding cross-border evidence collection requests and its implications regarding international privacy norms such as GDPR and CLOUD Act agreements.

 


%d bloggers like this: