Visualizing Data in a Predictive Coding Project – Part Three

This is part three of my presentation of an idea for visualization of data in a predictive coding project. Please read part one and part two first. This concluding blog in the visualization series also serves as a stand-alone lesson on the basics of math, sampling, probability, prevalence, recall and precision. It will summarize some of my current thoughts on quality control and quality assurance in large scale document reviews. Bottom line, there is far more to quality control than doing the math, but still, sampling and metric analysis are helpful. So too is creative visualization of the whole process.

Law, Science and Technology

This is the area in which scientists on e-discovery teams excel. I recommend that every law firm, corporate and vendor e-discovery team have at least one scientist to help them. Technologists alone are not sufficient. Discovery teams know this, and all have engineers working with lawyers, but very few yet have scientists working with engineers and lawyers. They are like two-legged stools.

Also, and this seems obvious, you need search-sophisticated lawyers on e-discovery teams too. I am starting to see this error more and more lately, especially in vendors. Engineers may think they know the law, that is very common, but they are wrong. The same delusional thinking sometimes even affects scientists. Both engineers and scientists tend to over-simplify the law and do not really understand legal discovery. They do not understand the larger context and overall processes and policies.

John Tredennick

For legal search to be done properly, it must not only include lawyers, the lawyers must lead. Ideally, a lawyer will be in charge, not in a domineering way (my way or the highway), but in a cooperative multi-disciplinary team sort of way. That is one of the strong points I see at Catalyst. Their team includes tons of engineers/technologists, like any vendor, but also scientists, and lawyers. Plus, and here is the key part, the CEO is an experienced search lawyer. That means not only a law degree, but years of legal experience as a practicing attorney doing discovery and trials. A fully multidisciplinary team with an experienced search lawyer as leader is, in my opinion, the ideal e-discovery team. Not only for vendors, but for corporate e-discovery teams, and, of course, law firms.

Many disagree with me on this, as many laymen and non-practicing attorneys resent my law-first orientation. Technologists are now often in charge, especially on vendor teams. In my experience these technologists do not properly respect the complexity of legal knowledge and process. They often bad mouth lawyers and law firms behind their backs. Their products and services suffer as a result. It is a recipe for disaster.

On many vendor teams, the lawyers are not part of the leadership. If they are on a team at all, it is at a low level and they are not respected. This is all wrong because the purpose of e-discovery teams is the search for evidence in a legal context, typically a lawsuit. There is only one leg of the stool that has ever studied evidence.

It takes all three disciplines for top quality legal search: scientists, technologists and lawyers. If you cannot afford a full-time scientist, then you should at least hire one as a consultant on the harder cases.

The scientists on a team may not like the kind of simplification I will present here on sampling, prevalence and recall. They typically want to go into far greater depth and provide multiple caveats on math and probability, which is fine, but it is important to start with a foundation of basics. This is what you will find here. The basics of math and probabilities, and applications of these principles from a lawyer’s point of view, not a scientist’s or engineer’s.

Still, the explanations here are informed by the input of several outstanding scientists. A special shout out and thanks goes to Gordon Cormack. He has been very generous with his time and patient with my incessant questions. Professor Cormack has been a preeminent voice in Information Science and search for decades now, well before he started teaming with Maura Grossman to study predictive coding. I appreciate his assistance, and, of course, any errors and oversimplifications are solely my own.

Now let’s move onto the math part you have been waiting for, and begin by revisiting the hypothetical we set out in parts one and two of this visualization series.

Calculating and Visualizing Prevalence

Recall we have exactly 1,000,000 documents remaining for predictive coding after culling. I previously explained that this particular project began with culling and multimodal judgmental sampling, and with a random sample of 1,534 documents. Please note this is not intended to refer to all projects. This is just an example to have data flows set up for visualization purposes. If you want to see my standard work flows see LegalSearchScience.com and Electronic Discovery Best Practices, EDBP.com, on the Predictive Coding page. You will see, for instance, that another activity is always recommended, especially near the beginning of a project, namely Relevancy Dialogues (step 1).

Assuming a 95% confidence level, a sample of 1,534 documents creates a confidence interval of plus or minus 2.5%. This means your sample is subject to a 2.5% error rate in both directions, high and low, for a total error range of 5%. This is 5% of the total one million document corpus (50,000 documents), not just 5% of the 1,534 sample (77 documents).
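For readers who want to check where the 2.5% comes from, here is a minimal Python sketch. It assumes the usual normal-approximation formula with the worst-case proportion of 50%, which is what most online sample size calculators use. The function name `margin_of_error` is my own label for illustration, not something taken from any review software.

```python
import math
from statistics import NormalDist

def margin_of_error(sample_size, confidence=0.95, p=0.5):
    """Half-width of the normal-approximation interval for a sample proportion.

    p=0.5 is the worst case (widest interval), an assumption made here
    because the true prevalence is unknown before the sample is reviewed.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * math.sqrt(p * (1 - p) / sample_size)

corpus_size = 1_000_000
moe = margin_of_error(1534)
print(f"margin of error: +/-{moe:.2%}")
print(f"total error range on the corpus: {2 * moe * corpus_size:,.0f} documents")
```

The result comes out at roughly plus or minus 2.5%, or about 50,000 documents of total range when projected onto the one million document corpus.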

In our hypothetical the SME, who had substantial help from a top contract reviewer, studied the 1,534 sampled documents. The SME found that 384 were relevant and 1,150 were irrelevant. By the way, when done properly this review of 1,534 documents should only cost between $1,000 and $2,000, with most of that going to the SME expense, not the contract reviewer expense.

The spot projection of prevalence here is 25%. This is simple division. Divide the 384 relevant by the total number of documents sampled: 384/1,534. You get 25%. That means that one out of four of the documents sampled was found to be relevant. Random sampling tells us that this same ratio should apply, at least roughly, to the larger population. You could at this point simply project the percentage from the sample onto the entire document population. You would thus conclude that approximately 250,000 documents will likely be relevant. But this kind of projection alone is nearly meaningless in lower prevalence situations, which are common in legal search. It is also of questionable value in this hypothetical where there is a relatively high prevalence of 25%.

When doing probability analysis based on sampling you must always include both the confidence level, here 95%, and the confidence interval, here plus or minus 2.5%. A 95% Confidence Level means that up to 5 times out of 100 the projection may be in error. More specifically, the Confidence Level means that if you were to repeat the sampling 100 times, the resulting Confidence Interval (here plus or minus 2.5%) would contain the true value (here 250,000 relevant documents) at least 95% of the time. Conversely, this means that it would miss the true value at most 5% of the time.

In our hypothetical the true value is 250,000 relevant documents. On one sample you might get a Confidence Interval of 225,000 – 275,000, as we did here. But with another sample you might get 215,000 – 265,000. On another you might get 240,000 – 290,000.  These all include the true value. Occasionally (but no more than 5 times in a hundred), you might get a Confidence Interval like 190,000 – 240,000, or 260,000 – 310,000, that excludes the true value. That is what a 95% Confidence Level means.
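The repeated-sampling idea can be tried out with a small simulation. This is only a sketch under simplifying assumptions: the true prevalence is set to exactly 25%, each sampled document is drawn independently (a reasonable stand-in for sampling 1,534 documents out of a million), and the function name `coverage`, the number of trials and the seed are all arbitrary choices of mine.

```python
import random

def coverage(true_prevalence=0.25, sample_size=1534, margin=0.025,
             trials=1000, seed=1):
    """Fraction of repeated samples whose interval contains the true prevalence."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Review one simulated random sample and count the relevant documents.
        relevant = sum(rng.random() < true_prevalence for _ in range(sample_size))
        estimate = relevant / sample_size
        # Does this sample's interval (estimate +/- margin) capture the truth?
        if estimate - margin <= true_prevalence <= estimate + margin:
            hits += 1
    return hits / trials

rate = coverage()
print(f"intervals that contain the true value: {rate:.1%}")
```

The share of intervals that capture the true value should come out at 95% or a little higher, which is the "at least 95% of the time" behavior described above.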

The confidence interval range is simply calculated here by adding 2.5% to the 25%, and subtracting 2.5% from the 25%. This creates a percentage range of 22.5% to 27.5%. When you project this confidence interval onto the entire document collection you get a range of relevant documents between 225,000 (22.5%*1,000,000) and 275,000 (27.5%*1,000,000).
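The same arithmetic as a short Python sketch. The function name `classical_projection` is just an illustrative label. Note that 384/1,534 is actually 25.03%, so the unrounded figures land a few hundred documents above the round numbers used in the text.

```python
def classical_projection(relevant, sample_size, corpus_size, margin=0.025):
    """Spot projection plus a simple plus-or-minus range, projected onto the corpus."""
    point = relevant / sample_size           # spot projection of prevalence
    low = (point - margin) * corpus_size     # low end of the projected range
    high = (point + margin) * corpus_size    # high end of the projected range
    return point, low, high

point, low_docs, high_docs = classical_projection(384, 1534, 1_000_000)
print(f"spot projection: {point:.1%}")
print(f"projected relevant documents: {low_docs:,.0f} to {high_docs:,.0f}")
```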

This simple calculation, called a Classical or Gaussian Estimation, works well in high prevalence situations, but in situations where the prevalence is low, say 3% or less, and even in this hypothetical where the prevalence is a relatively high 25%, the accuracy of the projected range can be improved by adjusting the 22.5% to 27.5% confidence interval range. The adjustment is performed by using what is called a Binomial calculation, instead of the Normal or Gaussian calculation. Ask a scientist for particulars on this, not me. I just know to use a standard Binomial Confidence Interval Calculator to determine the range in most legal search projects. For some immediate guidance, see the definitions of Binomial Estimation and Classical or Gaussian Estimation in The Grossman-Cormack Glossary of Technology Assisted Review.

With the Binomial Calculator you again enter the sample as a fraction, with the numerator being the relevant documents and the denominator the total number of documents sampled. Again, just like before, you divide 384 by 1,534. The basic answer is also the same 25%, the point or spot projection ratio, but the range with a Binomial Calculator is now slightly different. Instead of a simple plus or minus 2.5%, which produces 22.5% to 27.5%, the binomial calculation creates a tighter range of 22.9% to 27.3%. The range in this hypothetical is thus a little tighter than 5%. It is a range of 4.4% (from 22.9% to 27.3%). Therefore the projected range of relevant documents using the Binomial interval calculation is between 229,000 (22.9%*1,000,000) and 273,000 (27.3%*1,000,000) documents.
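Online binomial calculators do not all use the same method, and I do not know which one sits behind any particular calculator. As one hedged illustration, the Wilson score interval, a common binomial method, can be computed with nothing but the Python standard library. The function name `wilson_interval` is my own.

```python
import math
from statistics import NormalDist

def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

low, high = wilson_interval(384, 1534)
print(f"prevalence range: {low:.1%} to {high:.1%}")
print(f"projected relevant documents: {low * 1_000_000:,.0f} to {high * 1_000_000:,.0f}")
```

For 384 relevant out of 1,534 sampled, this method gives a range that rounds to 22.9% to 27.3%, in line with the figures above. Other binomial methods may differ slightly in the later decimal places.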

The simple random sample of 1,534 documents from the 1,000,000 document collection shows that 95 times out of 100 the correct number of relevant documents will be between 229,000 and 273,000.

This also means that no more than five times out of 100 will the calculated interval, here between 22.9% and 27.3%, fail to capture the true value, the true number of relevant documents in the collection. Sometimes the true value, the true number of relevant documents, may be less than 229,000 or greater than 273,000. This is shown in part by the graphic below, which is another visualization that I like to use to help me see what is happening in a predictive coding project. Here the true value lies somewhere between 229,000 and 273,000, or at least 95 times out of 100 it does. When 5 times out of 100 the true value lies outside the range, the divergence is usually small. Most of the time, when the confidence interval misses the true value, it is a near miss. Cases where the confidence interval is far below, or far above, the true value are exceedingly rare.

[Figure: corpus data and recall range]

The Binomial adjustment to the interval calculation is required for low prevalence populations. For instance, if the prevalence was only 2%, and the interval was again plus or minus 2.5%, the low end of the error range would be a negative number, -0.5% (2%-2.5%). The range would run from -0.5% to 4.5%. That projection means between zero and 45,000 relevant documents. (Obviously you cannot have negative relevant documents.) The projection of zero relevant documents is also known to be wrong because you could not have performed the calculation unless there were some relevant documents in the sample. So in this situation of low prevalence the Binomial calculation method is required to produce anything close to accurate projections.

For example, assuming again a 1,000,000 corpus, and a 95%+/-2.5% sample consisting of 1,534 documents, a 2% prevalence results from finding 31 relevant documents. Using the binomial calculator you get a range of 1.4% to 2.9%, instead of -0.5% to 4.5%. The binomial based interval range results in a projection of between 14,000 relevant documents (instead of the absurd zero relevant documents) and 29,000 relevant documents.
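To see the low prevalence problem in numbers, here is a sketch that compares the simple plus or minus 2.5% range with an "exact" binomial interval (the Clopper-Pearson method) for 31 relevant documents out of 1,534. Clopper-Pearson is my choice for illustration. The calculator behind the figures above may use a different binomial method, and the function names are hypothetical. The interval is found by bisection on the binomial distribution, using only the standard library.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)

def exact_interval(successes, n, confidence=0.95, iterations=60):
    """Clopper-Pearson interval, located by bisection on the binomial CDF."""
    alpha = 1 - confidence
    p_hat = successes / n
    lower, upper = 0.0, 1.0
    if successes > 0:
        lo, hi = 0.0, p_hat
        for _ in range(iterations):
            mid = (lo + hi) / 2
            # P(X >= successes) grows with p; find where it reaches alpha / 2.
            if 1 - binom_cdf(successes - 1, n, mid) < alpha / 2:
                lo = mid
            else:
                hi = mid
        lower = (lo + hi) / 2
    if successes < n:
        lo, hi = p_hat, 1.0
        for _ in range(iterations):
            mid = (lo + hi) / 2
            # P(X <= successes) shrinks as p grows; find where it falls to alpha / 2.
            if binom_cdf(successes, n, mid) > alpha / 2:
                lo = mid
            else:
                hi = mid
        upper = (lo + hi) / 2
    return lower, upper

prevalence = 31 / 1534
classical_low, classical_high = prevalence - 0.025, prevalence + 0.025
low, high = exact_interval(31, 1534)
print(f"simple range:   {classical_low:.1%} to {classical_high:.1%}")
print(f"binomial range: {low:.1%} to {high:.1%}")
```

The simple range dips below zero, which is impossible, while the binomial range stays positive and is much narrower.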

Even with the binomial calculation adjustment, the reliability of using probability projections to calculate prevalence is the subject of much controversy among information scientists and probability statisticians (most good information scientists doing search are also probability statisticians, but not vice versa). The reliability of such range projections is controversial in situations like this, where the sample size is low, here only 1,534 documents, and the likely percentage of relevant documents is also low, here only 2%. In this second scenario where only 31 relevant documents were found in the sample, there are too few relevant documents for sampling to be as reliable as it is in higher prevalence collections. I still think you should do it. It does provide good information. But you should not rely completely on these calculations, especially when it comes to the step of trying to calculate recall. You should use all of the quality control procedures you know, including the others listed previously.

Calculating Recall Using Prevalence

Recall is another percentage. It represents the proportion between the total number of relevant documents in a collection and the number of these relevant documents that have been found. So, if you happen to know that there are 10 relevant documents in a collection of 100 documents, and you correctly identify 9 relevant documents, then you have attained a 90% recall level. Referring to the hopefully familiar Search Quadrant shown right, this means that you would have one False Negative and nine True Positives. If you only found one out of the ten, you would have 10% recall (and would likely be fired for negligence). This would be nine False Negatives and one True Positive.

The calculation of Precision requires information on the total number of False Positives. In the first example where you found nine of the ten relevant, if you also found nine more that you thought were relevant, but were not, they were False Positives, then what would your precision be? You have found a total of 18 documents that you thought were relevant, and it turns out that only half of them, 9 documents, were actually relevant. That means you had a precision rate of 50%. Simple. Precision could also easily be visualized by various kinds of standard graphs. I suggest that this be added to all search and review software. It is important to see, but, IMO, when it comes to legal search, the focus should be on Recall, not Precision.
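The two ratios in these examples reduce to a couple of one-line functions. This sketch simply replays the numbers from the 100-document example. The function names are mine.

```python
def recall(true_positives, false_negatives):
    """Share of all relevant documents that were actually found."""
    return true_positives / (true_positives + false_negatives)

def precision(true_positives, false_positives):
    """Share of the documents coded relevant that really are relevant."""
    return true_positives / (true_positives + false_positives)

# Found 9 of the 10 relevant documents, plus 9 that only looked relevant.
good_recall = recall(true_positives=9, false_negatives=1)
half_precision = precision(true_positives=9, false_positives=9)
# Found only 1 of the 10 relevant documents, so 9 were missed.
poor_recall = recall(true_positives=1, false_negatives=9)

print(f"recall {good_recall:.0%}, precision {half_precision:.0%}")
print(f"recall when only one of ten is found: {poor_recall:.0%}")
```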

The problem with calculating Recall in legal search is that you never know the total number of relevant documents. That is the whole point of the search. If you knew, you would not have to search. But in fact no one ever knows. Moreover, in large document collections, there is no way to ever exactly know the total number of relevant documents. All you can ever do is calculate probable ranges. You might think that absolute knowledge could come from human review of all One Million documents in our hypothetical. But that would be wrong because humans make too many mistakes, especially with legal judgments as fluid as relevancy determinations. So too do computers, dependent as they are on training by all too fallible humans.

Bottom line, we can never know for sure how many relevant documents are in the 1,000,000 collection, and so we can never know with certainty what our Recall rate is. But we can make a very educated guess, one that is almost certainly correct when a range of Recall percentages is used, instead of just one particular number. We can narrow down the grey area. All experienced lawyers are familiar conceptually with this problem. The law is made in a process similar to this. It arises case by case out of large grey areas of uncertainty.

The reliability of our sample based Recall guess decreases as prevalence lowers. It is a problem inherent to all random sampling. It is not unique to legal evidence search. What is unique to legal search is the importance of Recall to begin with. In many other types of search Recall is not that important. Google is the prime example of this. You do not need to find all websites with relevant information, just the more useful, generally the most popular web pages. Law is moving away from Recall focus, but slowly. And it is more of a move right now from Recall of simple relevance to Recall of the highly relevant. In that sense legal search will in the long run become more like mainstream Googlesque search. But for now the law is still obsessed with finding all of the evidence in the perhaps mistaken belief that justice requires the whole truth. But I digress.

In our initial hypothetical of a 25% prevalence, the accuracy of the recall guess is actually very high, subject primarily to the 95% confidence level limitation. Even in the lower 2% hypothetical, the recall calculation has value. Indeed, it is the basis of much scientific research concerning things like rare diseases and rare species. Again, we enter a hotly debated area of science that is beyond my expertise (although not my interest).

Getting back to our example, we have a 95% confidence level that there are between 229,000 and 273,000 relevant documents in the 1,000,000 document collection. As described before in part one of this series, we assume that after only four rounds of machine training we have reached a point in the project where we are not seeing a significant increase in relevant documents from one round of machine training to the next. The change in document probability ranking has slowed and the visualization of the ranking distribution looks something like the upside down champagne glass shown right.

At this point a count shows that we have now found 250,000 relevant documents. This is critical information that I have not shared in the first two blogs, information that for the first time allows for a Recall calculation. I held back this information until now for simplicity purposes, plus it allowed me to add a fun math test. (Well, the winner of the test, John Tredennick, CEO of Catalyst, thought it was fun.) In reality you would keep a running count of relevant documents found, and you would have a series of Recall visualizations. Still, the critical Recall calculation takes place when you have decided to stop the review and test.

Assuming we have found 250,000 relevant documents, this means that we have attained anywhere between 91.6% and 100% recall. At least it means we can have a 95% confidence level that we have attained a result somewhere in that range. Put another way, we can have a 95% confidence level that we have attained a 91.6% or higher recall rate. We cannot have 100% confidence in that result. Only 95%. That means that one time out of twenty (the 5% left over from the 95% confidence level) there may be more than 273,000 relevant documents. That in turn means that one time in twenty we may have attained less than a 91.6% recall in this circumstance.

[Figure: standard bell curve with standard deviation bands]

The low side Recall calculation of 91.6% is derived by dividing the 250,000 found by the high end of the confidence interval, 273,000 documents. If the spot projection happens to be exactly right, which is rare, and in this hypo is now looking less and less likely (we have, after all, now found 250,000 relevant documents, or at least think we have), then the math would be 100% recall (250,000/250,000). That is extremely unlikely. Indeed, information scientists love to say that the only way to attain 100% recall is with 0% precision, that is, to select all documents. This statement is, among other things, hyperbole intended to make the point about the uncertainty inherent in sampling and confidence levels. The 95% Confidence Level uncertainty is shown by the long tail on either side of the standard bell curve pictured above.

You can never have more than 100% recall, of course, so we do not say we have attained anywhere between 91.6% and 109% recall. The low-end estimate of 229,000 relevant documents has, at this point in the project, been shown to be wrong by the discovery and verification of 250,000 relevant documents. I say shown, not proven, because of the previously mentioned liquidity of relevance and the inability of humans to make consistent final judgments when, as here, vast numbers of documents are involved.
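Here is the recall range arithmetic as a small Python sketch, including the cap at 100%. The function name `recall_range` is illustrative only.

```python
def recall_range(found, low_estimate, high_estimate):
    """Worst-case and best-case recall, given a range for the total relevant documents."""
    worst = min(1.0, found / high_estimate)  # many relevant documents exist, so recall is lower
    best = min(1.0, found / low_estimate)    # few relevant documents exist, capped at 100%
    return worst, best

worst, best = recall_range(found=250_000, low_estimate=229_000, high_estimate=273_000)
print(f"recall is between {worst:.1%} and {best:.0%}")
```

Without the cap, the best case would come out at about 109% (250,000/229,000), which is the impossible figure mentioned above.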

For a visualization of recall I like the image of a thermometer, like a fund-raising goal chart, but with a twist of two different measures. On the left side put the low-end measure, here the 22.9% end of the confidence interval with 229,000 documents, and on the right side, the high measure, the 27.3% end of the confidence interval with 273,000 documents. You can thus chart your progress from the two perspectives at once, the low probability error rate, and the high probability error rate. This is shown on the diagram to the right. It shows the metrics of our hypothetical where we have found and confirmed 250,000 relevant documents. That just happens to represent 100% recall on the low end of the probability error range, using the 22.9% end of the confidence interval. But as explained before, the 250,000 relevant documents found also represents only 91.6% recall on the high end, using the 27.3% end of the confidence interval. You will never really know which is accurate, except that it is safe to bet you have not in fact attained 100% recall.

Random Sample Quality Assurance Test

In any significant project, in addition to following the range of recall progress, I impose a quality assurance test at the end to look for False Negatives. Remember, this means relevant documents that have been miscoded as irrelevant. One way to do that is by running similarity searches and verification of syncing. That can catch situations involving documents that are known to be relevant. It is a way to be sure that all variations of those documents, including similar but different documents, are coded consistently. There may be reasons to call one variant relevant, and another irrelevant, but usually not. I like to put a special emphasis on this at the end, but it is only one of many quality tests and searches that a skilled searcher can and should run throughout any large review project. Visualizations could also be used to assist in this search.

But what about the False Negatives that are not near duplicates or close cousins? The similarity and consistency searches will not find them. Of course you have been looking for these documents throughout the project, and at this point you think that you have found as many relevant documents as you can. You may not think you have found all relevant documents, total recall, no experienced searcher ever really believes that, but you should feel like you have found all highly relevant documents. You should have a well reasoned opinion that you have found all of the relevant documents needed to do justice. That opinion will be informed by legal principles of reasonability and proportionality.

That opinion will also be informed by your experience in search through this document set. You will have seen for yourself that the probability rankings have divided the documents into two well-defined segments, relevant and irrelevant. You will have seen that no documents, or very few, remain in the uncertainty area, the 40-60% range. You will have personally verified the machine’s predictions many times, such that you will have high confidence that the machine is properly implementing the SME’s relevance concept. You will have seen for yourself that few new relevant documents are found from one round of training to the next. You will also usually have seen that the new documents found are really just more of the same, that they are essentially cumulative in nature. All of these observations, plus the governing legal principles, go into the decision to stop the training and review, and move on to final confidentiality protection review, and then production and privilege logging.

Still, in spite of all such quality control measures, I like to add one more, one based again on random sampling. Again, I am looking for False Negatives, specifically any that are of a new and different kind of relevant document not seen before, or a document that would be considered highly relevant, even if of a type seen before. Remember, I will not have stopped the review in most projects (proportionality constraints aside), unless I was confident that I had already found all of those types of documents; already found all types of strong relevant documents, and already found all highly relevant documents, even if they are cumulative. I want to find each and every instance of all hot (highly relevant) documents that exist in the entire collection. I will only stop (proportionality constraints aside) when I think the only relevant documents I have not recalled are of an unimportant, cumulative type; the merely relevant. The truth is, most documents found in e-discovery are of this type; they are merely relevant, and of little to no use to anybody except to find the strong relevant, new types of relevant evidence, or highly relevant evidence.

There are two types of random samples that I usually run for this final quality assurance test. I can sample the entire document set again, or I can limit my sample to the documents that will not be produced. In the hypothetical we have been working with, that would mean a sample of the 750,000 documents not identified as relevant. I do not do both samples, but rather one or another. But you could do both in a very large, relatively unconstrained budget project. That would provide more information. Typically in a low prevalence situation, where for instance there is only a 2% relevance shown from both the sample, and the ensuing search project, I would do my final quality assurance test with a sample of the entire document collection. Since I am looking for False Negatives, my goal is not frustrated by including the 2% of the collection already identified as relevant.

There are benefits from running a full sample again, as it allows direct comparisons with the first sample, and can even be combined with the first sample for some analysis. You can, for instance, run a full confusion matrix analysis as explained, for instance, in The Grossman-Cormack Glossary of Technology Assisted Review; also see Escape From Babel: The Grossman-Cormack Glossary.

CONFUSION MATRIX

                      Truly Non-Relevant        Truly Relevant
Coded Non-Relevant    True Negatives (“TN”)     False Negatives (“FN”)
Coded Relevant        False Positives (“FP”)    True Positives (“TP”)

Accuracy = 100% – Error = (TP + TN) / (TP + TN + FP + FN)
Error = 100% – Accuracy = (FP + FN) / (TP + TN + FP + FN)
Elusion = 100% – Negative Predictive Value = FN / (FN + TN)
Fallout = False Positive Rate = 100% – True Negative Rate = FP / (FP + TN)
Negative Predictive Value = 100% – Elusion = TN / (TN + FN)
Precision = Positive Predictive Value = TP / (TP + FP)
Prevalence = Yield = Richness = (TP + FN) / (TP + TN + FP + FN)
Recall = True Positive Rate = 100% – False Negative Rate = TP / (TP + FN)
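As a rough illustration of how little code this takes, here is a Python sketch of the glossary formulas above. The counts come from the earlier 100-document example (9 True Positives, 1 False Negative, 9 False Positives). The 81 True Negatives are my own inference from that example (100 - 9 - 1 - 9), and the function name is hypothetical.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Standard confusion matrix ratios, following the formulas listed above."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "error": (fp + fn) / total,
        "elusion": fn / (fn + tn),
        "fallout": fp / (fp + tn),
        "negative_predictive_value": tn / (tn + fn),
        "precision": tp / (tp + fp),
        "prevalence": (tp + fn) / total,
        "recall": tp / (tp + fn),
    }

metrics = confusion_metrics(tp=9, tn=81, fp=9, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```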

Special code and visualizations built into review software could make it far easier to run this kind of Confusion Matrix analysis. It is really far easier than it looks and should be automated in a user-friendly way. Software vendors should also offer basic instruction on this tool. Scientist members of an e-discovery team can help with this. Since the benefits of this kind of analysis outweigh the small loss of including the 2% already known to be relevant in the alternative low prevalence example, I typically go with a full random sample in low prevalence projects.

In our primary hypothetical we are not dealing with a low prevalence collection. It has a 25% rate. Here if I sampled the entire 1,000,000, I would in large part be wasting 25% of my sample. To me that detriment outweighs the benefits of bookend samples, but I know that some experts disagree. They love the classic confusion matrix analysis.

To complete this 25% prevalence visualization hypothetical, next assume that we take a simple random sample of the 750,000 documents only, which is sometimes called the null-set. This kind of sample is also sometimes called an Elusion test, as we are sampling the excluded documents to look for relevant documents that have so far eluded us. We again sample 1,534 documents, again allowing us a 95% confidence level and confidence interval of plus or minus 2.5%.

Next assume in this hypothetical that we find that 1,519 documents have been correctly coded as irrelevant. (Note, most of the correct coding would have come from machine prediction, not actual human review, but some would have been by actual prior human review.) These 1,519 documents are True Negatives. That is 99% accurate. But the SME review of the random sample did uncover 15 mistakes, 15 False Negatives. The SME decided that 15 documents out of the 1,534 sampled had been incorrectly coded as irrelevant. That is a 1% error rate. That is pretty good, but not dispositive. What really matters is the nature of the relevancy of the 15 False Negatives. Were these important documents, or just more of the same?

I always use what is called an accept on zero error protocol for the elusion test when it comes to highly relevant documents. If any are highly relevant, then the quality assurance test automatically fails. In that case you must go back and search for more documents like the one that eluded you and must train the system some more. I have only had that happen once, and it was easy to see from the document found why it happened. It was a black swan type document. It used odd language. It qualified as highly relevant under the rules we had developed, but just barely, and it was cumulative. Still, we tried to find more like it and ran another round of training. No more were found, but still we did a third sample of the null set just to be sure. The second time it passed.

In our hypothetical none of the 15 False Negative documents were highly relevant, not even close. None were of a new type of relevance. All were of a type seen before. Thus the test was passed.
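The pass or fail logic of this test is simple enough to sketch in a few lines of Python. This is only an illustration of the accept on zero error idea as applied to highly relevant documents. The function and parameter names are hypothetical, and in practice the judgment about new types of relevant documents is a human call that a count alone does not capture.

```python
def elusion_test(sample_size, false_negatives, highly_relevant_false_negatives):
    """Return the elusion rate in the sample and whether the accept-on-zero test passes."""
    elusion_rate = false_negatives / sample_size
    # Accept on zero error: a single highly relevant False Negative fails the test.
    passed = highly_relevant_false_negatives == 0
    return elusion_rate, passed

rate, passed = elusion_test(sample_size=1534, false_negatives=15,
                            highly_relevant_false_negatives=0)
print(f"elusion rate in the sample: {rate:.1%}")
print("quality assurance test passed" if passed else "test failed: train and search again")
```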

The project then continued with the final confidentiality review, production and logging phases. Visualizations should be included in the software for these final phases as well, and I have several ideas, but this article is already far too long.

As I indicated in part one of this blog series, I am just giving away a few of my ideas here. For more information you will need to contact me for billable consultations, routed through my law firm, of course, and subject to my time availability with priority given to existing clients. Right now I am fully booked, but I may have time for these kinds of interesting projects in a few months.

Conclusion

The growth in general electronic discovery legal work (see EDBP for full description) has been exploding this year; so too has multidisciplinary e-discovery team work. It will, I predict, continue to grow very fast from this point forward. But the adoption of predictive coding software and predictive coding review has, to date, been an exception to this high growth trend. In fact, the adoption of predictive coding has been relatively slow. It is still only infrequently used, if at all, by most law firms, even in big cases. I spoke with many attorneys at the recent Georgetown Institute event who specialize in this field. They are all seeing the same thing and, like me, are shaking their heads in frustration and dismay.

I predict this will change too over the next two to three years. The big hindrances to the adoption of predictive coding are law firms and their general lack of knowledge and skills in predictive coding. Most law firms, both big and small, know very little about the basic methods of predictive coding. They know even less about the best practices. The ignorance is widespread among attorneys my age, and they are the ones in law firm leadership positions. The hindrance to widespread adoption of predictive coding is not lack of judicial approval. There is now plenty of case law. The hindrance is lack of knowledge and skills.

Greedy Lawyers

There is also a greed component involved for some, shall we say, less than client-centric law firms. We have to talk about this elephant in the room. Clients already are. Some attorneys are quite satisfied with the status quo. They make a great deal of money from linear reviews, and so-called advanced keyword search driven reviews. The days of paid inefficiency are numbered. Technology will eventually win out, even over fat cat lawyers. It always does.

The answers I see to the resistance to predictive coding are threefold:

Continued Education. We have to continue the efforts to demystify AI and active machine learning. We need to move our instruction from theory to practice.

Improved Software. Some review software already has excellent machine training features. Some is just so-so, and some does not have this kind of document search and ranking capacity at all. My goal is to push the whole legal software industry to include active machine learning in almost all of their options. Another goal is for software vendors to improve their software, and make it easier to work with by adding much more in the way of creative visualizations. That has been the main point of this series and I hope to see a response soon from the industry. Help me to push the industry. Demand these features in your review software. Look beyond the smokescreens and choose the true leaders in the field.

Client Demand. Pressure on reluctant law firms from the companies that pay the bills will have a far stronger impact than anything else. I am talking about both corporate clients and insurers. They will, I predict, start pushing law firms into greater utilization of AI-enhanced document review. The corporate clients and insurers have the economic motivation for this change that most law firms lack. Corporate clients are also much more comfortable with the use of AI for Big Data search. That kind of pressure by clients on law firms will motivate e-discovery teams to learn the necessary skills. That will in turn motivate the software vendors to spend the money necessary to improve their software with better AI search and better visualizations.

All of the legal software on the market today, especially review software, could be improved by adding more visualizations and graphic display tools. Pictures really can be worth a thousand words. They can especially help to make advanced AI techniques more accessible and easier to understand. The data visualization ideas set forth in this series are just the tip of the iceberg of what can be done to improve existing software.

7 Responses to Visualizing Data in a Predictive Coding Project – Part Three

  1. […] this week, Ralph Losey published a blog post in which he also discusses the challenges in measuring recall. This is the last in a three-part […]

  2. […] an example, I go back to the hypothetical search project I described in Part Three of Visualizing Data in a Predictive Coding Project. This was a search of 1,000,000 documents […]

  3. BenRitz says:

    If predictive coding software were more visual, maybe it could be redesigned around gamification principles?

    http://www.wired.com/2014/12/how-to-build-habit-forming-products/?mbid=social_fb

  4. […] happened in a recent post by Ralph Losey at his e-Discovery Team blog. The topic of his post was visualizing data in a predictive coding project. He begins by discussing how an e-discovery team should be composed. It should include not just […]

  5. […] In addition, I suggest that you re-read my short description of an elusion test at the end of Part Three of Visualizing Data in a Predictive Coding Project. It provides a brief description of the other […]

  6. […] machines are not taking over. There is no substitute for informed and creative legal skill. Indeed, as noted by Ralph Losey, who is widely recognized as an authority on e-discovery and legal technology, it is a mistake to […]

  7. […] As a side note, the decision to stop may be somewhat informed by running estimates of possible recall range attained based on early prevalence assumptions from a sample of all documents at or near the beginning of the project. The prevalence based recall range estimate would not, however, be the sole driver of the decision to stop and test. The prevalence based recall estimates alone can be very unreliable as shown In Legal Search Exact Recall Can Never Be Known. That is one of the main reasons for developing the ei-Recall alternative. I explained the thinking behind the decision to stop in Visualizing Data in a Predictive Coding Project – Part Three. […]
