Ranking is a key feature of AI enhanced document review, perhaps the key feature. Ranking orders all documents in a collection according to their likely relevance, as you define it. You have to train the software on a few exemplar documents, but once the training kicks in, the software ranks all of the documents in the collection in accordance with your instructions.
If your vendor pushes so-called predictive coding-type software that cannot rank by relevancy, then get a new vendor and new software. If they say it ranks, and talk the right talk (which you should be able to tell by, for instance, reading the articles on the CAR page), then try it out. Verify for yourself. Then try out other vendors and their software, and see how their ranking systems work. At least do the demos and ask hard questions. You will find that not all software claiming to be predictive coding software is the same. It is not all equally good, especially in the key feature of ranking.
Ranking allows you to spend your valuable attorney review time wisely. It allows you to focus on the most relevant documents first, and to merely sample the lower ranked documents. Ranking is the fuel that drives all CARs (computer assisted reviews). Ranking also makes proportionality a doctrine that is palatable to both producing and receiving parties. It facilitates efficiency, and provides everyone the most bang for the buck, or to be precise, the most documents with the highest probative value for the buck.
This is a win/win scenario for both the producing and the receiving party. For example, the importance of a case may not warrant the expense of reviewing all 1,000,000 potentially relevant documents, but could justify the expense of reviewing the 50,000 documents most likely to have value to the case. The top-ranked 50,000 documents out of the 1,000,000 will probably give both parties all of the information they need to try the case.
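The proportional cutoff described above can be sketched in a few lines. This is only an illustration, with hypothetical document names and made-up rank scores; the cutoff value would in practice be negotiated or justified by sampling.

```python
# Hypothetical rank scores produced by a trained predictive coding model.
scores = {
    "doc_a": 0.97, "doc_b": 0.91, "doc_c": 0.42,
    "doc_d": 0.08, "doc_e": 0.03,
}
CUTOFF = 0.80  # hypothetical proportionality threshold

# Review everything ranked above the cutoff, highest scores first.
review_set = sorted((d for d, s in scores.items() if s >= CUTOFF),
                    key=scores.get, reverse=True)
# The lower-ranked remainder is only sampled, not reviewed in full.
sample_pool = [d for d, s in scores.items() if s < CUTOFF]
```

The review burden falls from five documents to two, while the sample pool remains available for quality-control checks.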
Ranking is what keeps bottom-line-driven proportional review fair and equitable. See Bottom Line Driven Proportional Review (2013 Updated Version). That is why it is the key feature you should focus on when making a software licensing decision. As with any complex technology product, some predictive coding software is better than others. Some relevance ranking systems work better than others.
Due primarily to empty suits in the C-Suite of most, but not all, e-discovery companies, there are now significant differences in vendor quality. See, e.g., the LTN article Vendor CEOs: Stop Being Empty Suits & Embrace the Hacker Way and “The Hacker Way” – What the e-Discovery Industry Can Learn From Facebook’s Management Ethic, and Part One and Part Two of The Solution to Empty-Suits in the Board Room: The “Hacker Way” of Management. Also see Monica Bay’s report on the 2013 keynote talk at ILTA by Scott Klososky, entitled Are You a Dead Leader Walking or Driving With Your High Beams?, and VQ’s article on the same event, ILTA 2013: Technology Darwinism. Bottom line: be careful when and where you shop. For more information on the differences see, for example, some of my articles on the subject:
- Reinventing the Wheel: My Discovery of Scientific Support for “Hybrid Multimodal” Search;
- The Many Types of Legal Search Software in the CAR Market Today;
- Three-Cylinder Multimodal Approach To Predictive Coding.
Also see the academic view of the subject of advanced analytics software in Doug Oard & William Webber’s text, Information Retrieval for E-Discovery, Foundations and Trends in Information Retrieval, Vol. 7, Nos. 2–3 (2013) 99–237. But beware of vendor white papers, of any color. Vendors have too much skin in the game to be objective. Looking at cute quadrants by outside companies, where it seems like everyone is a winner, can also be interesting. But, for now at least, these so-called rankings are of only limited value in making purchasing decisions.
Old School Keyword Based Relevancy Ranking
Ranking is the key feature, but you do not just press a button for the predictive coding software to rank relevancy for you. If your vendor claims to have new software like that, run. They are full of it. But you should know, or remember, that relevancy ranking in pre-predictive-coding software did work like that. It was easy. In fact, there was nothing to old-school keyword-based relevancy ranking, and that is what you got out of it: little or nothing.
Pre-analytics software would rank documents for relevancy according to the number of keyword hits in them. The documents that had the most keywords in them were ranked the highest. But did that make them the most relevant? Of course not! It just meant they had the most keywords.
Sometimes the documents with the most keywords would be very relevant, but more often they would not be. The documents with the highest ranking might not be relevant at all. They were usually just the longest documents, the ones with the most words, including, often by happenstance, the words the searcher considered key. How many times have you seen hundred-page spreadsheets ranked as the most “relevant,” spreadsheets with tens of thousands of names in them? The same applies to long reports, sometimes thousands of pages long. Those were the documents ranked the highest under old-fashioned keyword-count ranking.
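The flaw in keyword-count ranking is easy to demonstrate. Here is a minimal sketch, with hypothetical documents and keywords, of how the old approach scored a collection:

```python
def keyword_rank(documents, keywords):
    """Rank documents by raw keyword hit count -- the old-school method."""
    def hits(text):
        words = text.lower().split()
        return sum(words.count(k) for k in keywords)
    return sorted(documents.items(), key=lambda item: hits(item[1]),
                  reverse=True)

documents = {
    "hot_email": "Shred the ledger before the audit",
    # A huge spreadsheet that merely repeats the keyword many times.
    "giant_spreadsheet": "ledger " * 500 + "name amount date " * 10000,
}
ranking = keyword_rank(documents, ["ledger", "audit"])
# The 30,000-word spreadsheet outranks the genuinely hot email simply
# because it contains the keyword more often.
```

The longest, wordiest file wins, exactly the hundred-page-spreadsheet problem described above.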
Ranking Based on Multidimensional Analysis
Relevancy based solely on keyword counts is just a coincidence. It is also quite possible that a document with only one keyword, or no keywords at all, and thus ranked last, is actually the hottest document in the collection. There was no way you could rely on that kind of arbitrary pseudo-ranking to justify reduced review. But now with predictive coding, which uses a complex analytic system that looks at the entire document, including metadata, and at how every document relates to every other document, the relevancy ranking becomes real. It becomes testable. It can even be the basis for new types of ranking searches.
The use of complex multidimensional vectoring and probability math has made the ranking reliable. See Baron & Freeman, Guest Blog: Quick Peek at the Math Behind the Black Box of Predictive Coding. At least the ranking can be reliable if the software is trained properly by a bona fide SME (subject matter expert) who is also an expert searcher and power-user of the software. All three skills are necessary, and right now very few lawyers qualify. That is why I am so focused on training as many AI enhanced search lawyers as possible, both here and in the U.K. (If you are in London on September 10, 2013, you are welcome to attend my morning presentation on Predictive Coding – How it’s Bringing Innovation to Legal Practice at the historic Lincoln’s Inn law society center.)
The strength of AI enhanced software search is also its weakness: garbage in, garbage out (GIGO). It will rank documents in the collection the way your intelligence tells it to. If your intelligence is off, for instance if you do not know which documents are actually important to a case, or responsive to the request, then the artificial intelligence created by the software will also be bad. The software can correct some human errors, to be sure, but it still basically leverages and amplifies the coding given to it by the SME. If the SME makes mistakes in coding the seed and training documents, then the ranking will be off. Bottom line: if your SME is not a particularly good expert on the case issues, then your software results will not be good either. Human skills and experience are still paramount, even in this new computer age.
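GIGO can be shown with the same kind of toy ranking machinery. In this hedged sketch (all document names and seed codings are hypothetical), the identical scoring function, fed a mis-coded seed document, promotes exactly the wrong file:

```python
from collections import Counter

def score(doc, profile):
    """Word-overlap score of a document against a trained relevance profile."""
    words = Counter(doc.lower().split())
    return sum(words[t] * profile[t] for t in words)

documents = {
    "merger_memo": "confidential memo on the merger negotiations",
    "lunch_email": "lunch menu for the cafeteria this week",
}

# A well-coded seed set versus one where the SME marked the wrong
# document relevant -- the training, not the algorithm, differs.
good_profile = Counter("merger negotiations confidential".split())
bad_profile = Counter("lunch menu cafeteria".split())

best_good = max(documents, key=lambda d: score(documents[d], good_profile))
best_bad = max(documents, key=lambda d: score(documents[d], bad_profile))
# best_good is "merger_memo"; best_bad is "lunch_email" -- same software,
# same collection, opposite results. Garbage in, garbage out.
```

The algorithm is identical in both runs; only the human coding changed, which is the whole point.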
Software alone is never the solution. It is just a tool to enhance the real power: the intelligence of the attorney using it. He or she is the Subject Matter Expert on the issues in the case. Ideally the SME is also an experienced evidence searcher who has mastered the software being used to conduct the search. Either that, or the SME works closely and in direct contact with a surrogate searcher (yes, you heard it here first, and someday, in a future blog, I will explain it further) who has these skills. The surrogate searcher can implement the SME’s mental constructs of relevance.
Predictive coding software takes the attorney’s view of relevance, ideally that of the best SME possible for the case, and imposes it on all of the documents. That is how ranking works; that is how predictive coding works. With good software, good methods, and a skilled SME, it can become treasure in, treasure out. Otherwise, it’s GIGO. There is no easy button, so stay away from the snake oil salesmen who try to tell you anything different.
Good software is worthless without good users (the SMEs) and good methods. If you go cheap on any of these three elements, then the ranking may not work right. But if done correctly, AI enhanced ranking can save everyone money, both producing parties and, as shown previously, receiving parties too. It can analyze all documents according to the SME’s conception of relevance to determine whether they are Hot or Not. Then you can look at just the most likely hottest documents and, with good reason, ignore the rest.