Predictive analytics has progressed to the point that Corporate Counsel could, given the right tools and knowledge, predict and prevent many of the lawsuits now hemorrhaging corporate America. Insurance companies could do the same thing: predict which claims will likely trigger litigation and take steps to avoid these costly disputes. It is all a matter of knowing how to obtain and use Smart Data to serve as an early warning system: Smart Data that will reveal emerging patterns of wrongful conduct before they ripen into litigation. I call this data-analytics-based program of litigation avoidance Presuit™.
Essentially I am talking about the use of predictive-coding-type AI technologies to take corporate compliance to the next level. When Presuit™ gets off the ground, litigation will never be the same. It will not stop all litigation, but, once predictive analytics catches on, there will certainly be far less litigation than there is now. My orientation as an attorney is about litigation, but this use of technology could also be understood as a type of compliance auditing: Continuous Controls Monitoring (CCM). Gartner defines CCM as “a set of technologies to reduce business losses through continuous monitoring and reducing the cost of audits through continuous auditing of the controls in financial and other transactional applications.” Presuit™ would extend these audits to unstructured data, such as emails, texts and other internal corporate communications.
Presuit™ is not a Minority Report science fiction dream. This is all within the reach of existing software and probability analysis based search methods. Even though it has not been done before, based on my experience with active machine learning, I am sure it can be done. In fact, I have a pretty good idea of how to do Presuit™, which is why I’ve trademarked it and begun services. See Presuit.com. And I am not the only one. I know several other attorneys who probably could too. We see the idea. So too do a few information scientists I’ve talked to about it. The scientists and vendors all agree it can work. They are probably getting ready to offer services like these too, I don’t know. What we all need to make it work is a visionary general counsel and enough time, money, and scientific-technical support to set it up and perfect workable systems.
Presuit™ is the next logical step in the application of artificial intelligence to Big Data and the Law. It is where technology is inevitably taking the Law. Yes, it will take significant funds to implement such a program, and it will take time. But the potential savings and other benefits of prediction and prevention of lawsuits are mind-boggling.
History is ripe for the making and the time is now. The big question is which corporation will come forward first to make it happen, who will be the first to embrace Smart Data to drastically reduce their litigation load. Then the related big questions for my friends and colleagues will be which lawyers, scientists and vendors will be tapped to help them to do it.
What is Smart Data?
Smart Data generally is information that has been enhanced by predictive analytics. (For what others outside of the legal profession have to say about Smart Data see this CDNet webcast, and also see here, here, here, here, and here.) Most commentators agree that the whole point of Big Data is to obtain Smart Data. In that sense Smart Data is the small, useable Signal in the Noise. There are all kinds of Smart Data, but for purposes of predicting litigation, Smart Data is data that has been probability ranked as relevant or irrelevant to different legal causes of action. It consists of documents and other ESI commonly associated with various types of illegal activity.
This kind of Legal Smart Data reveals patterns of emerging wrongful conduct. It is information that can be used to detect and prevent illegal activities. To be specific, a computer file, such as an email or attachment, becomes smart for these purposes when it has had an extra layer of metadata added to it that reliably ranks the probable relevance of that file to one or more legal issues. Thus, an email would be smart in the sense it knows it has a 90% probability of being evidence of a certain issue, such as age discrimination, or consumer fraud. Data is not legal smart if it has no probability ranking related to misconduct of some kind.
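The idea of a file that carries its own relevance ranking can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name `SmartDocument`, the field `issue_probability`, and the issue labels are hypothetical and do not come from any particular review platform.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class SmartDocument:
    """A file plus an extra metadata layer of per-issue relevance probabilities."""
    doc_id: str
    text: str
    # Hypothetical field: legal issue name -> probability of relevance (0.0 to 1.0).
    issue_probability: Dict[str, float] = field(default_factory=dict)

    def is_smart(self) -> bool:
        # A file with no probability ranking is not "legal smart".
        return bool(self.issue_probability)

    def top_issue(self) -> Optional[Tuple[str, float]]:
        # Return the issue with the highest probability, or None if unranked.
        if not self.issue_probability:
            return None
        return max(self.issue_probability.items(), key=lambda kv: kv[1])


email = SmartDocument(
    "msg-001",
    "He is too old for this role.",
    {"age_discrimination": 0.90, "consumer_fraud": 0.02},
)
plain = SmartDocument("msg-002", "Lunch at noon?")

print(email.is_smart(), email.top_issue())
print(plain.is_smart(), plain.top_issue())
```

In this sketch the email "knows" its own ranking, while the unranked lunch message is ordinary data.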
Predictive Coding Example From Current e-Discovery Practice
As an example, when I complete an active machine learning training of a set of corporate email, all of the email and attachments have an extra metadata field ranking the probable relevance, or irrelevance, to a set of one or more legal issues. (The final protections review and productions are made based on these probability metadata.) As per my custom, where appropriate there would also be probability rankings for high-relevance and privilege.
As the subject matter expert, my work in training the system transformed the email into Smart Data, data that knows its probability of relevance. The data is as smart, as good, as I was as the SME on the subject, or, as is often the case, as I was as the surrogate for another SME. In other words, the degree of intelligence of the data would depend in part on the degree of expertise of the SME. Was the SME a world authority on these issues, or just the best available at the time?
The data is also as smart, as good, as my work was in the training. In other words, the degree of intelligence of the data would depend in part on the quality of the human training. Was the trainer qualified, experienced? Did they spend 20 hours to train the machine or 200?
The data is also as smart, as good, as the software that improved upon my work. In other words, the degree of intelligence of the data would depend in part on the quality of the software. The machine (the algorithms) enhances the natural intelligence put into the data by the human SME, in this example, me, with the machine’s own artificial intelligence. Was it the best available software? Did it perform well on this data?
Degree of subject matter expertise, human skill in training, and computer skill in analysis – all three factors impact the quality of any predictive coding project. That in turn impacts the effectiveness of the resultant Smart Legal Data to train new data as it enters the system. So too would other elements, such as the diversity or distinctiveness of the prior or additional data.
Smart Legal Data that results from every predictive coding project could then be reused to train new data as it is later created and added to the client’s system. The Smart Legal Data from a prior case would have to be modified somewhat to eliminate insignificant distinctiveness, and culled so as to only include key probability ranges, but this will be fairly easy.
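Culling prior-case data down to key probability ranges could look something like the following. It is a minimal sketch, not a tested method: the function name `cull_training_seeds` and the 0.90 and 0.10 cutoffs are assumptions chosen for illustration, and the step of removing case-specific distinctiveness is not attempted here.

```python
def cull_training_seeds(ranked, issue, high=0.90, low=0.10):
    """Split prior-case documents into confident positive and negative examples.

    `ranked` maps a document id to its {issue: probability} metadata.
    The `high` and `low` cutoffs are illustrative assumptions.
    """
    positives, negatives = [], []
    for doc_id, probabilities in ranked.items():
        p = probabilities.get(issue)
        if p is None:
            continue  # never ranked for this issue, so it is not usable here
        if p >= high:
            positives.append(doc_id)
        elif p <= low:
            negatives.append(doc_id)
        # Mid-range documents are dropped as too ambiguous to train on.
    return positives, negatives


prior_case = {
    "msg-001": {"fraud": 0.97},
    "msg-002": {"fraud": 0.55},
    "msg-003": {"fraud": 0.03},
    "msg-004": {"age_discrimination": 0.92},
}
positives, negatives = cull_training_seeds(prior_case, "fraud")
print(positives, negatives)
```

Keeping only the two ends of the ranking gives the next round of training clear examples of both relevant and irrelevant documents.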
The modified Smart Legal Data could then be used as training data to rank new email and attachments as to probable relevance. At this point the SME would just play a secondary and occasional role for quality control purposes. The training could proceed automatically in an unsupervised, or at least semi-supervised, manner. (That part will be tricky to set up.) When patterns emerge suggesting a new cause of action may be developing, alerts are sounded. Legal counsel is advised. Presuit™ has averted yet another lawsuit.
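To make the rank-and-alert step concrete, here is a toy sketch that seeds a classifier with prior-case examples, ranks new messages, and raises an alert above a threshold. Everything in it is an assumption for illustration: `TinyNaiveBayes` is a bare-bones stand-in for real predictive coding software, the sample messages are invented, and the 0.90 alert threshold is arbitrary. The tricky unsupervised or semi-supervised retraining is deliberately left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class TinyNaiveBayes:
    """Two-class multinomial Naive Bayes with add-one smoothing (toy stand-in)."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, text, relevant):
        self.word_counts[relevant].update(tokenize(text))
        self.doc_counts[relevant] += 1

    def probability(self, text):
        """Return the estimated probability that `text` is relevant."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in (True, False):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(vocab)
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in tokenize(text):
                if word in vocab:  # words never seen in training are ignored
                    score += math.log((counts[word] + 1) / denom)
            log_score[label] = score
        return 1.0 / (1.0 + math.exp(log_score[False] - log_score[True]))


def monitor(model, new_messages, threshold=0.90):
    """Rank incoming messages and collect those at or above the alert threshold."""
    alerts = []
    for msg_id, text in new_messages:
        p = model.probability(text)
        if p >= threshold:
            alerts.append((msg_id, p))
    return alerts


# Seed the model with (invented) Smart Data from a prior fraud case.
model = TinyNaiveBayes()
for text in ["backdate the invoice before the audit", "hide the invoice from the auditors"]:
    model.train(text, relevant=True)
for text in ["lunch meeting moved to noon", "please review the quarterly schedule"]:
    model.train(text, relevant=False)

new_messages = [
    ("new-1", "hide the invoice before the audit"),
    ("new-2", "lunch schedule review"),
]
alerts = monitor(model, new_messages)
for msg_id, p in alerts:
    print(f"ALERT {msg_id}: probable relevance {p:.2f}")
```

A production system would use far richer features and much more training data, but the shape of the loop is the same: old Smart Data trains the model, new data is ranked, and high rankings go to counsel.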
The ultimate in litigation readiness is to eliminate suits before they happen, to know about them in the presuit stage. This is now possible in an ongoing virtuous feedback machine training loop that takes Smart Data from old suits to train data in corporate ESI systems.
Once an SME trains data in the course of a lawsuit, and makes it smart, the SME’s mind has, to a certain extent, imprinted itself on the data. In our current legal practice when an old suit is over we flush that intelligence away. The critical documents, the Smart Legal Data, are not saved for reuse.
That is a huge waste of SME intelligence, effort, and money. Instead, the key components of the Smart Data should be saved, and improved upon, nurtured, and grown. The AI enhanced Smart Data should be added back to the corporate ESI systems. The Smart Data should be used again, and again, to train the rest of the data. It should be used as an indicator to detect the same and similar legal issues. For example, once you have created Smart Data on fraud, keep it in the system to detect more fraud. Literally, it takes a thief’s data to catch a thief’s data. I am quite sure this will work, if done carefully and with input from the right scientists and technologists in the area. This will not be an easy undertaking, but that it is possible at all is incredible. It is a profound example of what MIT’s Brynjolfsson and McAfee write about in their new book, The Second Machine Age.
Litigation Avoidance By Early Detection of Relevant Data
You can call this an aspect of Information Governance if you want; some already do, and I predict more soon will. But I call Presuit™ an aspect of Litigation Readiness (the first step in the EDBP). It is the complete avoidance of litigation by identifying and correcting wrongful action before it festers into a lawsuit. Nip it in the bud. For example, it could be used to detect sexual harassment in the workplace as soon as it begins to manifest in emails or texts. Bring the employees to H.R. for counseling before a charge is filed. Think of the savings, not only monetary but mental and emotional. Presuit™ will change everything.
Employee litigation could be drastically reduced by using Smart Data analytics, but that is just the beginning. The same could be applied to almost all litigation that now plagues business and government. For a few examples, consider government False Claims Act cases, fraud, trade secret theft, patent infringement, conspiracy, and foreign bribery. Any wrongful activity carried out by an employee in an enterprise, including outright illegal activities, leaves traces in the company’s information system. Under a Presuit™ program, computer systems are enhanced with Smart Data to identify these patterns and alert management to investigate.
A level one investigation may show that the computer alert was a false alarm, in which case the computer still learns from the pattern detection error, and no harm is done to the employee. Conversely, if the pattern was correct and confirmed by further investigation, then corrective action can be voluntarily taken. The misconduct can be stopped before it gets out of hand. Employees can be counseled and retrained. Where appropriate they can be disciplined or fired, or in extreme cases, reported to police.
This is all far better than our current system where in-house counsel often does not know of a problem until a suit is filed, or a government subpoena is served. Believe me, problems are easier to solve, and far less expensive, when you do not have the help of plaintiff’s counsel.
National Security Agency Example
I am not going to go off on a political tangent here, but for purposes of quieting the naysayers out there who may think these proposals are farfetched, consider the example of the National Security Agency. The NSA essentially tries to take the Big Data of all world communications to find the Smart Data revealing patterns of terrorist activities. The keyword search the NSA does of metadata is just the rough culling step of search (step 6 in the EDBP). The NSA keyword search does not create Smart Data, but it does reduce the data to a more manageable volume for the NSA to deal with, and it does find high value targets in the smaller universe. The Smart Data is created in the next step, where the predictive analytics algorithms are run on the communications of interest (step 7, C.A.R.).
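The two-stage process described above, a rough keyword cull followed by predictive ranking of what survives, can be sketched as follows. This is an illustration only: it says nothing about how the NSA actually works, the function names are hypothetical, and `toy_scorer` with its hand-picked term weights is a placeholder for a real trained model.

```python
def keyword_cull(messages, keywords):
    """Stage 1: keep only messages that mention at least one keyword."""
    wanted = {k.lower() for k in keywords}
    return [(mid, text) for mid, text in messages if wanted & set(text.lower().split())]


def rank_by_probability(messages, scorer):
    """Stage 2: score the survivors and sort from most to least probably relevant."""
    scored = [(mid, scorer(text)) for mid, text in messages]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder scorer with invented weights; a real system would use a trained model.
TERM_WEIGHTS = {"backdate": 0.5, "hide": 0.4, "invoice": 0.3}


def toy_scorer(text):
    return min(1.0, sum(TERM_WEIGHTS.get(word, 0.0) for word in text.lower().split()))


messages = [
    ("m1", "please backdate the invoice"),
    ("m2", "invoice attached for your records"),
    ("m3", "lunch at noon"),
    ("m4", "hide the payment details"),
]

survivors = keyword_cull(messages, ["invoice", "payment"])
ranked = rank_by_probability(survivors, toy_scorer)
for mid, score in ranked:
    print(mid, score)
```

The cull only shrinks the pile; it is the ranking step that turns the remaining messages into Smart Data.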
When, in the case of the NSA, that Smart Data shows probable terrorist activities, then the more direct surveillance and corrective action steps are triggered; warrants are issued and human agents move into action. That is precrime in action to try to stop terrorism.
The same multimodal search process that the NSA uses is already available to corporate counsel for Presuit™. Smart Data systems can be set up to alert in-house counsel to all types of potentially litigation-triggering activities. The predictive analytics software may not be as sophisticated as the NSA’s, but it does work very well, especially with the much smaller volumes of data that most corporations have to deal with (compared to the NSA).
For corporate counsel, the Smart Data comes when data ranked by expert input and an ongoing process of machine learning shows probable undesired conduct by a corporate employee. It could be discrimination, or fraud, or any number of corporate torts or breaches of statutory or contractual duty.
When these signals are detected from the noise of corporate information, then more direct investigation by the legal department is triggered. Corrective and remedial actions are taken where appropriate. The misconduct is stopped. Damages are mitigated, if not avoided entirely. Most importantly, the expense and harassment of litigation has been avoided. Your computer doors have not been opened to fishing by plaintiff’s counsel.
Presuit™ is a great tool for corporate good citizenship. The predictive analytics from recycling Smart Data will empower legal counsel to identify and correct employee misconduct before new litigation ensues. When it is widely adopted, businesses will save hundreds of billions of dollars a year in wasted litigation costs. Weeding out misconduct early will also significantly improve employee morale. It will free a company to act in accord with its ideals, and its workers to operate at peak efficiency and productivity.
Still, machine learning for legal compliance is a new tool, and Smart Data has never before been used to identify potential litigation. What companies will be the first adopters of Presuit™? You would expect it to be the serial litigants, the ones with the greatest litigation expenses to save. Perhaps an insurer? But innovation rarely proceeds in a rational manner and often takes surprising routes. Time will tell.