“Save Everything” and Eventually You Will Not Be Able to Find Anything: The Sedona Conference Principles and Commentary on Defensible Disposition

August 13, 2018

If you are a data hoarder, an information pack-rat who saves everything, you will eventually drown in your own data and die. Maybe not literally, mind you, but figuratively. Maybe not you personally, but your enterprise, your group, your project, your network. Too much information can render you and your enterprise intellectually paralyzed, cut off, and seriously misinformed or uninformed. Saving it all is physically and logistically difficult, if not impossible. Even if you could, keeping it all would impede your search, making it hard to find the information you need, when you need it. I address these issues this week in my review of a new commentary, The Sedona Conference Principles and Commentary on Defensible Disposition (August 2018).

Information overload is better than physical death, I know, but it is still very bad in today’s Google world. You end up not being able to find the information you need, when you need it. That makes it hard to determine what really happened. It allows lies and liars to fester and grow. We are now seeing firsthand in the U.S. where this can lead. It is not good. It has put the whole world into a precarious situation. We need the truth to thrive as a culture; not smoke and mirrors, not conman games. A culture built on lies is a cancer. It is a deadly disease, especially for the Law, which depends on truth, on evidence, on real facts, to attain the goal of Justice.

Saving Too Much

Over-retention is the enemy of effective, efficient search. The more ESI there is to search, the more difficult the search. There can be exceptions to this rule, but for the most part it is true. That makes a “save everything” ESI policy an enemy of search. It interferes with the ability to find the information needed, which in my case is electronic evidence in legal proceedings, when it is needed. It is important that these information needs be filled quickly and completely.

Search is powerful. That is my field. The more data the better, is often true, but not always. It depends on the data and its effective life, meaning how long a particular type of data is of any use to anyone. Big data allows for detection of patterns that would otherwise not be seen. This analysis takes CPU power. The advances in this area have been fantastic. We have the processing power, as well as the cheap storage, but our search and retrieval software has not kept up with the data explosion in volume and complexity. Predictive coding software and other AI applications have come a long way, but are still sometimes confused by the volume, variety and complexity of useless data that plagues most company IT systems.

Retrieval of specific documents and metadata takes time and specialized human skills. The more worthless data in a collection, such as spam, the greater the number of false positives in a search, no matter how powerful the algorithms or skilled the searcher. Vast volumes of data make searches take longer to execute and make them less precise. The more noise in the data, the more difficult it is to hear the signal. That is a fundamental law of information.
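
To make that law concrete, here is a small back-of-the-envelope illustration of my own (not anything from the Sedona Commentary). Assume a hypothetical search that finds 90% of the truly relevant documents and wrongly flags just 1% of the irrelevant ones, and watch what happens to precision as the junk pile grows:

```python
# Illustrative arithmetic only: how junk data erodes precision even when
# the search engine's error rate stays fixed. The 90% recall and 1%
# false-positive rate are assumed, hypothetical numbers.

def precision(relevant, irrelevant, recall=0.90, fp_rate=0.01):
    true_positives = relevant * recall
    false_positives = irrelevant * fp_rate
    return true_positives / (true_positives + false_positives)

# Same 10,000 relevant documents, ever larger piles of spam and junk:
for junk in (100_000, 1_000_000, 10_000_000):
    print(f"{junk:>12,} irrelevant docs -> precision {precision(10_000, junk):.2f}")
# Prints 0.90, then 0.47, then 0.08: the signal is still there, but each
# hit is ever more likely to be noise, so review takes longer and costs more.
```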

With high data volumes you can often still find the signal, the relevant documents that you need in large chaotic data collections, but it takes time and special tools and skills. There are often too many false positives in searches of data collections containing too much spam-like, useless data. Although search is strong, search alone is inadequate to meet the needs of most organizations. They also need data destruction and retention policies that govern all information. That is one reason why the success of information governance depends on data disposition.

An organization should save as much as it needs, but not too much, and also not too little. It is a Goldilocks situation. If you do not save data, you can never find it. If you save too little, then what you later need might not be there to be found. But if you save too much, you may never be able to find what you need. The signal may be in the collection to be found, in plain view, but hidden in the vast numbers, the noise of spam and other irrelevancies.

Search v. Destroy

I have debated Information Governance leaders for years on the importance of search versus file destruction. I was pretty much the only advocate for search over disposition. I favored retention over destruction in most close cases, but I had a cost and proportionality overlay. I am reminded, for instance, of my debate with Jason Baron on the subject at the IQPC 10th Anniversary of Information Governance and eDiscovery, where he managed to quote Churchill at the end and won the debate hands-down. e-Disco News, Knowledge and Humor: What’s Happening Today and Likely to Happen Tomorrow (e-Discovery Team, June 7, 2015); Information Governance v Search: The Battle Lines Are Redrawn (e-Discovery Team, Feb. 8, 2015).

I did not consider it a fair debate because of Jason’s very successful pandering to the jury during his closing argument with a quote by Churchill from his speech, We Shall Fight on the Beaches. That’s the one about never surrendering in the fight against “the odious apparatus of Nazi rule” (sadly, this exhortation still has legs today in the US).

The debate was “unfair” primarily because this was an IG conference. Everybody in IG is pro-destruction and values disposition over search. I think most IG leaders go too far, that they are trigger-happy to kill data. I pointed out in my debates that once a file is deleted, it cannot be found, no matter how good your filing, no matter how good your search (forensic recovery issues aside).

I am pro-search and think that the importance of management of ESI by filing and disposition is somewhat overblown. I think search is king, not data deletion. Still, even in my most strident of debates and pro-search arguments, I never advocated for the retention of all data. I always assumed that some file disposition was required and accepted that as a given. I was not a save everything and search advocate. I advocated for both, search and destroy. I advocated for more retention than most, but have never argued to retain everything.

There is a common core of agreement that some ESI should be deleted, that all data should not be saved. The disagreement is on how much data to save. How does a person or company know what the “just right” data destruction policy is for that company? There is agreement among experts that there is no one-size-fits-all solution, so custom work is required. Different retention and destruction policies should apply depending on the company and the particularities of its data universe. Many IG specialists advise clients on the custom fit they need. It involves careful investigation of the company, its data and activities, including lawsuits and other investigations.

The Sedona Conference Principles and Commentary on Defensible Disposition

These IG specialists, and the companies they serve, now have an excellent new resource tool to analyze and custom-fit data destruction policies: The Sedona Conference Principles and Commentary on Defensible Disposition (August 2018 Public Comment Version) (Editors-in-Chief, Kevin F. Brady and Dean Kuckelman). I highly recommend this new and excellent work by The Sedona Conference. My commendations to the Drafting Team: Lauren A. Allen, Jesse Murray, Ross Gotler, Ken Prine, Logan J. Herlinger, David C. Shonka, Mark Kindy; the Drafting Team Leaders: Tara Emory and Becca Rausch; the Staff Editor: Susan McClaim; and the Editors-in-Chief, Kevin F. Brady and Dean Kuckelman. Please send them any comments you may have.

The Commentary begins in usual Sedona fashion by articulating basic principles, with comments tied to each principle. The cases and legal authorities cited in all Commentaries by The Sedona Conference are excellent. This commentary on data disposition is no exception. I commend it for your detailed study and reference. A free download is available here from The Sedona Conference.

The Principles are:

PRINCIPLE 1.    Absent a legal retention or preservation obligation, organizations may dispose of their information.

Comment 1.a.   An organization should, in the ordinary course of business, properly dispose of information that it does not need.

Comment 1.b.   When designing and implementing an information disposition program, organizations should consider the obligation to preserve information that is relevant to the claims and defenses and proportional to the needs of any pending or anticipated litigation.

Comment 1.c. When designing and implementing an information disposition program, organizations should consider the obligation to preserve information that is relevant to the subject matter of government inquiries or investigations that are pending or threatened against the organization.

Comment 1.d.   When designing and implementing an information disposition program, organizations should consider applicable statutory and regulatory obligations to retain information.

PRINCIPLE 2.    When designing and implementing an information disposition program, organizations should identify and manage the risks of over-retention.

Comment 2.a.   Information has a lifecycle, including a time when disposal is beneficial.

Comment 2.b. To determine the “right” time for disposal, risks and costs of retention and disposal should be evaluated.

PRINCIPLE 3.    Disposition should be based on Information Governance policies that reflect and harmonize with an organization’s information, technological capabilities, and objectives.

Comment 3.a.   To create effective information disposition policies, organizations should establish core components of an Information Governance program, which should reflect what information it has, when it can be disposed of, how it is stored, and who owns it.

Comment 3.b. An organization should understand its technological capabilities and define its information objectives in the context of those capabilities.

Document Disposition and Information Governance

The Sedona Conference Principles and Commentary on Defensible Disposition builds upon Sedona’s earlier work, the Sedona Conference Commentary on Information Governance (Oct. 2014). Principle 6 of the Commentary on Information Governance provides the following guidance to organizations:

The effective, timely, and consistent disposal of physical and electronic information that no longer needs to be retained should be a core component of any Information Governance program. The Sedona Conference, Commentary on Information Governance, 15 SEDONA CONF. J. 125, 146 (2014) (“Information Governance” is “an organization’s coordinated, interdisciplinary approach to satisfying information compliance requirements and managing information risks while optimizing information value.” Id. at 126).

The Comment to Principle 6 goes on to explain:

It is a sound strategic objective of a corporate organization to dispose of information no longer required for compliance, legal hold purposes, or in the ordinary course of business. If there is no legal retention obligation, information should be disposed as soon as the cost and risk of retaining the information is outweighed by the likely business value of retaining the information. . . . Typically, the business value decreases and the cost and risk increase as information ages. Id. at 147.

The Sedona Conference concluded in 2018 that this 2014 advice, and similar advice from other sources, has not been followed by most organizations. Instead, they continue to struggle to make “effective disposition decisions.” The group in Principles and Commentary on Defensible Disposition concluded in its Introduction that this struggle was caused by many factors, but identified three main problems:

[T]he incorrect belief that organizations will be forced to “defend” their disposition actions if they later become involved in litigation. Indeed, the phrase “defensible disposition” suggests that organizations have a duty to defend their information disposition actions. While it is true that organizations must make “reasonable and good faith efforts to retain information that is relevant to claims or defenses,” that duty to preserve information is not triggered until there is a “reasonably anticipated or pending litigation” or other legal demands for records. The Sedona Principles, Third Edition: Best Practices, Recommendations & Principles for Addressing Electronic Document Production, 19 SEDONA CONF. J. 1, 51, Principle 5, 93 (2018).

Another factor in the struggle toward effective disposition of information is the difficulty in appreciating how such disposition reduces costs and risks.

Lastly, many organizations struggle with how to design and implement effective disposition as part of their overall Information Governance program.

The Principles and Commentary on Defensible Disposition attempt to address these three factors and provide guidance to organizations, and the professionals who counsel organizations, on developing and implementing an effective disposition program.

Disposition Challenges

The Sedona Conference Principles and Commentary on Defensible Disposition (August 2018) concludes by identifying the main challenges to data deletion.

  1. Unstructured Information
  2. Mergers and Acquisitions
  3. Departed, Separated, or Former Employees
  4. Shared File Sites
  5. Personally Identifiable Information (“PII”)
  6. Law Firms, eDiscovery Vendors, and Adversaries
  7. In-House Legal Departments
  8. Hoarders (my personal favorite)
  9. Regulations
  10. Cultural Change and Training

There are more, I am sure, but this is a good top ten list to start. I only wish they had included more discussion of these top ten.

Conclusion

Search is still more important for me than destroy. I prefer Where’s Waldo over Kill Waldo! I have not changed my position on that. But neither has mainstream Information Governance. They still disagree with my emphasis on Search. But everyone agrees that we should do both: Search and Destroy. Even I do not want companies to save all of their data. Some data should be destroyed.

I agree with mainstream IG that saving everything forever is not a viable information governance policy, no matter how many resources you also put into ESI search and retrieval. I have never said that you should rely solely on search, just that you should give Search more importance and, when in doubt, save more documents rather than fewer. The Search and Destroy argument has always been a matter of degree and balance, not whether there should be any destruction at all. The difficult questions involve what should be saved and for how long, which are traditional information management problems.

Where to draw the line on destruction is the big question for everyone. The answer is always company specific, even project specific. It involves questions of varying retention times, file types, and custodian analysis. When it comes down to specific decisions, and close questions, I generally favor retention. What may appear to be useless today may prove to be relevant evidence tomorrow. I hate not being able to prove my case because all of the documents have already been deleted. Then it is just one person’s word against another. IG experts, who usually no longer litigate, or never litigated, do not like my complaints. They are eager to kill, to purge and destroy data. I am more inclined to save and search, but not save too much. It is a question of balance.

Data destruction – the killing of data – can, if done properly, make the search for relevant content much easier. Some disposition of obviously irrelevant, spam and otherwise useless information makes sense on every level. It helps all users of the IT system. It also helps with legal compliance. If your destruction of data is too aggressive, however, you may end up deleting information that you were required by law to keep. You could lose a lawsuit because of one mistake in a data disposition decision. Where do you draw the line between save and delete? What is the scope of a preservation duty? What file types should be retained? What retention times should apply? How much is too much? Not enough?

The questions go on and on and there is no one right answer. It all depends on the facts and circumstances of the organization and its data. The new Sedona Conference Principles and Commentary on Defensible Disposition is an important new guide to help IT lawyers and technologists to craft custom answers to these questions.

 


Ethical Guidelines for Artificial Intelligence Research

November 7, 2017

The most complete set of AI ethics developed to date, the twenty-three Asilomar Principles, was created by the Future of Life Institute in early 2017 at their Asilomar Conference. Ninety percent or more of the attendees at the conference had to agree upon a principle for it to be accepted. The first five of the agreed-upon principles pertain to AI research issues.

Although all twenty-three principles are important, the research issues are especially time sensitive. That is because AI research is already well underway by hundreds, if not thousands, of different groups. There is a current compelling need to have some general guidelines in place for this research. AI Ethics Work Should Begin Now. We still have a little time to develop guidelines for the advanced AI products and services expected in the near future, but as to research, the train has already left the station.

Asilomar Research Principles

Other groups are concerned with AI ethics and regulation, including research guidelines. See the Draft Principles page of AI-Ethics.com which lists principles from six different groups. The five draft principles developed by Asilomar are, however, a good place to start examining the regulation needed for research.

Research Issues

1) Research Goal: The goal of AI research should be to create not undirected intelligence, but beneficial intelligence.

2) Research Funding: Investments in AI should be accompanied by funding for research on ensuring its beneficial use, including thorny questions in computer science, economics, law, ethics, and social studies, such as:

  • How can we make future AI systems highly robust, so that they do what we want without malfunctioning or getting hacked?
  • How can we grow our prosperity through automation while maintaining people’s resources and purpose?
  • How can we update our legal systems to be more fair and efficient, to keep pace with AI, and to manage the risks associated with AI?
  • What set of values should AI be aligned with, and what legal and ethical status should it have?

3) Science-Policy Link: There should be constructive and healthy exchange between AI researchers and policy-makers.

4) Research Culture: A culture of cooperation, trust, and transparency should be fostered among researchers and developers of AI.

5) Race Avoidance: Teams developing AI systems should actively cooperate to avoid corner-cutting on safety standards.

Principle One: Research Goal

The proposed first principle is good, but the wording? Not so much. The goal of AI research should be to create not undirected intelligence, but beneficial intelligence. This is a double-negative English language mishmash that only an engineer could love. Here is one way this principle could be better articulated:

Research Goal: The goal of AI research should be the creation of beneficial intelligence, not  undirected intelligence.

Researchers should develop intelligence that is beneficial for all of mankind. The Institute of Electrical and Electronics Engineers’ (IEEE) first general principle is entitled “Human Benefit.” The Asilomar first principle is slightly different. It does not really say human benefit. Instead it refers to beneficial intelligence. I think the intent is to be more inclusive, to include all life on earth, all of earth. IEEE has that covered too in its background statement of purpose to “Prioritize the maximum benefit to humanity and the natural environment.”

Pure research, where raw intelligence is created just for the hell of it, with no intended helpful “direction” of any kind, should be avoided. “Because we can” is not a valid goal. Pure, raw intelligence, with neither good intent nor bad, is not the goal here. The research goal is beneficial intelligence. Asilomar is saying that undirected intelligence is unethical and should be avoided. Social values must be built into the intelligence. This is subtle, but important.

The restriction to beneficial intelligence is somewhat controversial, but the other side of this first principle is not. Namely, that research should not be conducted to create intelligence that is hostile to humans.  No one favors detrimental, evil intelligence. So, for example, the enslavement of humanity by Terminator AIs is not an acceptable research goal. I don’t care how bad you think our current political climate is.

To be slightly more realistic, if you have a secret research goal of taking over the world, such as  Max Tegmark imagines in The Tale of the Omega Team in his book, Life 3.0, and we find out, we will shut you down (or try to). Even if it is all peaceful and well-meaning, and no one gets hurt, as Max visualizes, plotting world domination by machines is not a positive value. If you get caught researching how to do that, some of the more creative prosecuting lawyers around will find a way to send you to jail. We have all seen the cheesy movies, and so have the juries, so do not tempt us.

Keep a positive, pro-humans, pro-Earth, pro-freedom goal for your research. I do not doubt that we will someday have AI smarter than our existing world leaders, perhaps sooner than many expect, but that does not justify a machine take-over. Wisdom comes slowly and is different than intelligence.

Still, what about autonomous weapons? Is research into advanced AI in this area beneficial? Are military defense capabilities beneficial? Pro-security? Is the slaughter of robots not better than the slaughter of humans? Could robots be more ethical at “soldiering” than humans? As attorney Matt Scherer, editor of the good blog LawAndAI.com and a Future of Life Institute member, has noted:

Autonomous weapons are going to inherently be capable of reacting on time scales that are shorter than humans’ time scales in which they can react. I can easily imagine it reaching the point very quickly where the only way that you can counteract an attack by an autonomous weapon is with another autonomous weapon. Eventually, having humans involved in the military conflict will be the equivalent of bringing bows and arrows to a battle in World War II.

At that point, you start to wonder where human decision makers can enter into the military decision making process. Right now there’s very clear, well-established laws in place about who is responsible for specific military decisions, under what circumstances a soldier is held accountable, under what circumstances their commander is held accountable, on what circumstances the nation is held accountable. That’s going to become much blurrier when the decisions are not being made by human soldiers, but rather by autonomous systems. It’s going to become even more complicated as machine learning technology is incorporated into these systems, where they learn from their observations and experiences in the field on the best way to react to different military situations.

Podcast: Law and Ethics of Artificial Intelligence (Future of Life, 3/31/17).

The question of beneficial or not can become very complicated, fast. Like it or not, military research into killer robots is already well underway, in both the public and private sector. Kalashnikov Will Make an A.I.-Powered Killer Robot: What could possibly go wrong? (Popular Mechanics, 7/19/17); Congress told to brace for ‘robotic soldiers’ (The Hill, 3/1/17); US military reveals it hopes to use artificial intelligence to create cybersoldiers and even help fly its F-35 fighter jet – but admits it is ALREADY playing catch up (Daily Mail, 12/15/15) (a little dated, and sensationalistic article perhaps, but easy read with several videos).

AI weapons are a fact, but they should still be regulated, in the same way that we have regulated nuclear weapons since WWII. Tom Simonite, AI Could Revolutionize War as Much as Nukes (Wired, 7/19/17); Autonomous Weapons: an Open Letter from AI & Robotics Researchers.

Principle Two: Research Funding

The second principle of Funding is more than an enforcement mechanism for the first, that you should only fund beneficial AI. It is also a recognition that ethical work requires funding too. This should be every lawyer’s favorite AI ethics principle. Investments in AI should be accompanied by funding for research on ensuring its beneficial use, including thorny questions in computer science, economics, law, ethics, and social studies. The principle then adds a list of five bullet-point examples.

How can we make future AI systems highly robust, so that they do what we want without malfunctioning or getting hacked? The goal of avoiding the creation of AI systems that can be hacked, easily or not, is a good one. If a hostile power can take over and misuse an AI for evil ends, then the built-in beneficence may be irrelevant. The example of a driverless car comes to mind, one that could be hacked and crashed as a perverse joy-ride, kidnapping or terrorist act.

The economic issues raised by the second example are very important: How can we grow our prosperity through automation while maintaining people’s resources and purpose? We do not want a system that only benefits the top one percent, or top ten percent, or whatever. It needs to benefit everyone, or at least try to. Also see Asilomar Principle Fifteen: Shared Prosperity: The economic prosperity created by AI should be shared broadly, to benefit all of humanity.

Yoshua Bengio, Professor of Computer Science at the University of Montreal, had this important comment to make on the Asilomar principles during an interview at the end of the conference:

I’m a very progressive person so I feel very strongly that dignity and justice mean wealth is redistributed. And I’m really concerned about AI worsening the effects and concentration of power and wealth that we’ve seen in the last 30 years. So this is pretty important for me.

I consider that one of the greatest dangers is that people either deal with AI in an irresponsible way or maliciously – I mean for their personal gain. And by having a more egalitarian society, throughout the world, I think we can reduce those dangers. In a society where there’s a lot of violence, a lot of inequality, the risk of misusing AI or having people use it irresponsibly in general is much greater. Making AI beneficial for all is very central to the safety question.

Most everyone at the Asilomar Conference agreed with that sentiment, but I do not yet see a strong consensus in AI businesses. Time will tell if profit motives and greed will at least be constrained by enlightened self-interest. Hopefully capitalist leaders will have the wisdom to share with all of society the great wealth that AI is likely to create.

How can we update our legal systems to be more fair and efficient, to keep pace with AI, and to manage the risks associated with AI? The legal example is also a good one, with the primary tension we see so far between fair and efficient. Policing only high-crime areas might well be efficient, at least for reducing some types of crime, but would it be fair? Do we want to embed racial profiling into our AI? Neighborhood slumlord profiling? Religious or ethnic profiling? No. Existing law prohibits that, and for good reason. Still, predictive policing is already a fact of life in many cities and we need to be sure it has proper legal, ethical regulation.

We have seen the tension between “speedy” and “inexpensive” on the one hand, and “just” on the other, in Rule One of the Federal Rules of Civil Procedure and e-discovery. When active machine learning was applied, a technical solution to these competing goals was attained. The predictive coding methods we developed allowed for both precision (“speedy” and “inexpensive”) and recall (“just”). Hopefully this success can be replicated in other areas of the law where machine learning is under proportional control by experienced human experts.

The final example given is much more troubling: What set of values should AI be aligned with, and what legal and ethical status should it have? Whose values? Who is to say what is right and wrong? This is easy in a dictatorship, or a uniform, monochrome culture (sea of white dudes), but it is very challenging in a diverse democracy. This may be the greatest research funding challenge of all.

Principle Three: Science-Policy Link

This principle is fairly straightforward, but will in practice require a great deal of time and effort to be done right. A constructive and healthy exchange between AI researchers and policy-makers is necessarily a two-way street. It first of all assumes that policy-makers, which in most countries includes government regulators, not just industry, have a valid place at the table. It assumes some form of government regulation. That is anathema to some in the business community who assume (falsely in our opinion) that all government is inherently bad and essentially has nothing to contribute. The countervailing view of overzealous government controllers who just want to jump in, uninformed, and legislate, is also discouraged by this principle. We are talking about a healthy exchange.

It does not take an AI to know this kind of give and take and information sharing will involve countless meetings. It will also require a positive, healthy attitude between the two groups. If it gets bogged down in an adversarial relationship, you can multiply the cost of compliance (and number of meetings) by two or three. If it goes to litigation, we lawyers will smile in our tears, but no one else will. So researchers, you are better off not going there. A constructive and healthy exchange is the way to go.

Principle Four: Research Culture

The need for a good culture applies in spades to the research community itself. The Fourth Principle states: A culture of cooperation, trust, and transparency should be fostered among researchers and developers of AI. This favors the open source code movement for AI, but runs counter to the trade-secret business models of many corporations. See, e.g.: OpenAI.com; DeepMind Open Source; Liam Tung, ‘One machine learning model to rule them all’: Google open-sources tools for simpler AI (ZDNet, 6/20/17).

This tension is likely to increase as multiple parties get close to a big breakthrough. The successful efforts for open source now, before superintelligence seems imminent, may help keep the research culture positive. Time will tell, but if not, there could be trouble all around and the promise of full employment for litigation attorneys.

Principle Five: Race Avoidance

The Fifth Principle is a tough one, but very important: Teams developing AI systems should actively cooperate to avoid corner-cutting on safety standards. Moving fast and breaking things may be the mantra of Silicon Valley, but the impact of bad AI could be catastrophic. Bold is one thing, but reckless is quite another. In this area of research there may not be leisure for constant improvements to make things right. HackerWay.org. Not only will there be legal consequences, mass liability, for any group that screws up, but the PR blow alone from a bad AI mistake could destroy most companies. Loss of trust may never be regained by a wary public, even if Congress and Trial Lawyers do not overreact. Sure, move fast, but not so fast that you become unsafe. Striking the right balance is going to require an acute technical and ethical sensitivity. Keep it safe.

Last Word

AI ethics is hard work, but well worth the effort. The risks and rewards are very high. The place to start this work is to talk about the fundamental principles and try to reach consensus. Everyone involved in this work is driven by a common understanding of the power of the technology, especially artificial intelligence. We all see the great changes on the horizon and share a common vision of a better tomorrow.

During an interview at the end of the Asilomar conference, Dan Weld, Professor of Computer Science, University of Washington, provided a good summary of this common vision:

In the near term I see greater prosperity and reduced mortality due to things like highway accidents and medical errors, where there’s a huge loss of life today.

In the longer term, I’m excited to create machines that can do the work that is dangerous or that people don’t find fulfilling. This should lower the costs of all services and let people be happier… by doing the things that humans do best – most of which involve social and interpersonal interaction. By automating rote work, people can focus on creative and community-oriented activities. Artificial Intelligence and robotics should provide enough prosperity for everyone to live comfortably – as long as we find a way to distribute the resulting wealth equitably.

Five Reasons You Should Read the ‘Practical Law’ Article by Maura Grossman and Gordon Cormack called “Continuous Active Learning for TAR”

April 11, 2016

There is a new article by Gordon Cormack and Maura Grossman that stands out as one of their best and most accessible. It is called Continuous Active Learning for TAR (Practical Law, April/May 2016). The purpose of this blog is to get you to read the full article by enticing you with some of the information and knowledge it contains. But before we go into the five reasons, we will examine the purpose of the article, which aligns with our own, and touch on the differences between their trademarked TAR CAL method and our CAR Hybrid Multimodal method. Both of our methods use continuous, active learning, the acronym for which, CAL, they now claim as a Trademark. Since they clearly did invent the acronym, we for one will stop using CAL as a generic term.

The Legal Profession’s Remarkable Slow Adoption of Predictive Coding

The article begins with the undeniable point of the remarkably slow adoption of TAR by the legal profession, in their words:

Adoption of TAR has been remarkably slow, considering the amount of attention these offerings have received since the publication of the first federal opinion approving TAR use (see Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182 (S.D.N.Y. 2012)).

I remember getting that landmark ruling in our Da Silva Moore case, a ruling that pissed off plaintiffs’ counsel, because, despite what you may have heard to the contrary, they were strenuously opposed to predictive coding. Like most other lawyers at the time who were advocating for advanced legal search technologies, I thought Da Silva would open the flood gates, that it would encourage attorneys to begin using the then new technology in droves. In fact, all it did was encourage the Bench, but not the Bar. Judge Peck’s more recent ruling on the topic contains a good summary of the law. Rio Tinto PLC v. Vale S.A., 306 F.R.D. 125 (S.D.N.Y. 2015). There was a flood of judicial rulings approving predictive coding all around the country, and lately, around the world. See, e.g., Pyrrho Investments v MWB Property, [2016] EWHC 256 (Ch) (2/26/16).

The rulings were followed in private arbitration too. For instance, I used the Da Silva Moore ruling a few weeks after it was published to obtain what was apparently the first ruling by an arbitrator in an AAA proceeding approving use of predictive coding. The opposition to our use of cost-saving technology in that arbitration case was again fierce, and again included personal attacks, but the arguments for use in arbitration are very compelling. Discovery in arbitration is, after all, supposed to be constrained and expedited.

After the Da Silva Moore opinion, Maura Grossman and I upped our speaking schedule (she far more than me), and so did several tech-minded judges, including Judge Peck (although never at the same events as me, until the cloud of false allegations created by a bitter plaintiff’s counsel in Da Silva Moore could be dispelled). At Legal Tech for the next few years, predictive coding was all anybody wanted to talk about. Then IG, Information Governance, took over as the popular tech-child of the day. In 2015 we had only a few predictive coding panels at Legal Tech, but they were well attended.

Grossman and Cormack speculate that the cause of the remarkably slow adoption is:

The complex vocabulary and rituals that have come to be associated with TAR, including statistical control sets, stabilization, F1 measure, overturns, and elusion, have dissuaded many practitioners from embracing TAR. However, none of these terms, or the processes with which they are associated, are essential to TAR.

We agree. The vendors killed what could have been their golden goose with all this control set nonsense, their engineers’ love of complexity, and their misunderstanding of legal search. I have ranted about this before. See Predictive Coding 3.0. I will not go into that again here, except to say the statistical control set nonsense that had large sampling requirements was particularly toxic. It was not only hard and expensive to do, it led to mistaken evaluations of the success or failure of projects because it ignored the reality of the evolving understanding of relevance, so-called concept drift. Another wrong turn involved the nonsense of using only random selection to find training documents, a practice that Grossman and I opposed vigorously. See Latest Grossman and Cormack Study Proves Folly of Using Random Search For Machine Training – Part One, Part Two, Part Three, and Part Four. Grossman and Cormack correctly criticize these old vendor driven approaches in Continuous Active Learning for TAR. They call them SAL and SPL protocols (a couple of acronyms that no one wants to trademark!).

Bottom line, the tide is changing. Over the last several years the few private attorneys who specialize in legal search, but are not employed by a vendor, have developed simpler methods. Maura and I are just the main ones writing and speaking about it, but there are many others who agree. Many have found that it is counter-productive to use control sets, random input, non-continuous training with its illogical focus on the seed set, and misleading recall point projections.

We do so in defiance of the vendor establishment and other self-proclaimed pundits in this area who benefited from such over-complexity. Maura and Gordon, of course, have their own software (Gordon’s creation), and so never needed any vendors to begin with. Not having a world renowned information scientist like Professor Cormack as my life partner, I had no choice but to rely on vendors for their software. (Not that I am complaining, mind you. I’m married to a mental health counselor, and it does not get any better than that!)

After a few years I ultimately settled on one vendor, Kroll Ontrack, but I continue to try hard to influence all vendors. It is a slow process. Even Kroll Ontrack’s software, which I call Mr. EDR, still has control set functions built in. Thanks to my persistence, it is easy to turn off these settings and do things my way, with no secret control sets and false recall calculations. Hopefully soon that will be the default setting. Their eyes have been opened. Hopefully all of the other major vendors will soon follow suit.

All of the Kroll Ontrack experts in predictive coding are now, literally, a part of my Team. They are now fully trained and believers in the simplified methods, methods very similar to those of Grossman and Cormack, albeit, as I will next explain, slightly more complicated. We proved how well these methods worked at TREC 2015 when the Kroll Ontrack experts and I did 30 review projects together in 45 days. See e-Discovery Team at TREC 2015 Total Recall Track, Final Report (116 pg. PDF), and  (web page with short summary). Also see – Mr. EDR with background information on the Team’s participation in the TREC 2015 Total Recall Track.

We Agree to Disagree with Grossman and Cormack on One Issue, Yet We Still Like Their Article

We are fans of Maura Grossman and Gordon Cormack’s work, but not sycophants. We are close, but not the same; colleagues, but not followers. For those reasons we think our recommendation for you to read this article means more than a typical endorsement. We can be critical of their writings, but, truth is, we liked their new article, although we continue to dislike the name TAR (not important, but we prefer CAR). Also, and this is of some importance, my whole team continues to disagree with what we consider the somewhat over-simplified approach they take to finding training documents, namely reliance on the highest-ranking documents alone.

Despite what some may think, the high-ranking approach does eventually find a full diversity of relevant documents. All good predictive coding software today pretty much uses some type of logistic regression based algorithm that is capable of building out probable relevance in that way. That is one of the things we learned by rubbing shoulders with text retrieval scientists from around the world at TREC when participating in the 2015 Total Recall Track that Grossman and Cormack helped administer. This regression type of classification system works well to avoid the danger of over-training on a particular relevancy type. Grossman and Cormack have proven that before to our satisfaction (so have our own experiments), and they again make a convincing case for this approach in this article.

Still, we disagree with their approach of only using high-ranking documents for training, but we do so on the grounds of efficiency and speed, not effectiveness. The e-Discovery Team continues to advocate a Hybrid Multimodal approach to active machine learning. We use what I like to call a four-cylinder type of CAR search engine, instead of one cylinder, like they do (a rough sketch of how the four feeds might be combined follows the list below):

  1. High-ranking documents;
  2. Mid-level, uncertain documents;
  3. A touch, a small touch, of random documents; and,
  4. Human ingenuity found documents, using all types of search techniques (multimodal) that seem appropriate to the search expert in charge, including keyword, linear, similarity (including chains and families), and concept (including passive machine learning, clustering type search).

Predictive Coding 3.0 – The method is here described as an eight-part work flow (Step 6 – Hybrid Active Training).
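
Here is the rough sketch promised above of how a single training batch might be assembled from those four feeds. This is my own illustration, not Kroll Ontrack’s code and not anyone’s actual product; the function name and the 70/15/5/10 proportions are made-up assumptions. The point is only to show the four cylinders firing together in one continuous training round:

```python
import random

def assemble_training_batch(ranked_docs, multimodal_hits, batch_size=200):
    """Illustrative four-cylinder (hybrid multimodal) training batch.

    ranked_docs: list of (doc_id, probability_of_relevance) pairs from the
        current classifier, sorted from most to least probable.
    multimodal_hits: doc_ids found by human-driven searches (keyword,
        linear, similarity, concept), the fourth cylinder.
    """
    batch = []
    # 1. High-ranking documents: the top of the probability ranking.
    batch += [d for d, _ in ranked_docs[: int(batch_size * 0.70)]]
    # 2. Mid-level, uncertain documents: probabilities hovering near 50%.
    uncertain = sorted(ranked_docs, key=lambda pair: abs(pair[1] - 0.5))
    batch += [d for d, _ in uncertain[: int(batch_size * 0.15)]]
    # 3. A small touch of random documents, to guard against blind spots.
    batch += random.sample([d for d, _ in ranked_docs], int(batch_size * 0.05))
    # 4. Human-ingenuity documents from the multimodal searches.
    batch += list(multimodal_hits)[: int(batch_size * 0.10)]
    # De-duplicate while preserving order; attorneys code this batch and
    # the classifier is then retrained (continuous active training).
    return list(dict.fromkeys(batch))
```

Attorneys code the returned batch, the classifier retrains on the new judgments, and the cycle repeats until little or nothing of interest is coming back.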

The latest Grossman and Cormack versions of CAL (their trademark) use only the highest-ranking documents for active training. Still, in spite of this difference, we liked their article and recommend you read it.

The truth is, we also emphasize the high-probability relevant documents for training. The difference between us is that we use the three other methods as well. On that point we agree to disagree. To be clear, we are not talking about continuous training or not; we agree on that. We are not talking about active training versus passive; we agree on that. We are not talking about using what they call SAL or SPL protocols (read their article for details); we agree with them that these protocols are ineffective relics invented by misguided vendors. We are only talking about a difference in methods to find documents to use to train the classifier. Even that is not a major disagreement, as we agree with Grossman and Cormack that high-ranking documents usually make the best trainers, just not in the first seed set. There are also points in a search, depending on the project, where the other methods can help you get to the relevant documents in a fast, efficient manner. The primary difference between us is that we do not limit ourselves to that one retrieval method like Grossman and Cormack do in their trademarked CAL methodology.

Cormack and Grossman emphasize simplicity, ease of use, and reliance on the software algorithms as another way to try to overcome the Bar’s continued resistance to TAR. The e-Discovery Team has the same goal, but we do not think it is necessary to go quite that far for simplicity’s sake. The other methods we use, the other three cylinders, are not that difficult and have many advantages. e-Discovery Team at TREC 2015 Total Recall Track, Final Report (116 pg. PDF and web page with short summary). Put another way, we like the ability of fully automatic driving from time to time, but we want to keep an attorney’s learned hand at or near the wheel at all times. See Why the ‘Google Car’ Has No Place in Legal Search.

Accessibility with Integrity: The First Reason We Recommend the Article

Here’s the first reason we like Grossman & Cormack’s article, Continuous Active Learning for TAR: you do not have to be one of Professor Cormack’s PhD students to understand it. Yes, it is accessible, not overly technical, and yet it still has scientific integrity, still has new and accurate information, and still has useful knowledge.

It is not easy to do both. I know because I try to make all of my technical writings that way, including the 57 articles I have written on TAR, which I prefer to call Predictive Coding, or CAR. I have not always succeeded in getting the right balance, to be sure. Some of my articles may be too technical, and perhaps some suffer from breezy information overload and knowledge deficiency. Hopefully none are plain wrong, but my views have changed over the years. So have my methods. If you compare my latest work-flow (below) with earlier ones, you will see some of the evolution, including the new emphasis over the past few years on continuous training.

[Image: revised predictive coding work flow diagram]

Grossman, Cormack, and I are all trying hard to get the word out to the Bar as to the benefits of using active machine learning in legal document review. (We all agree on that term, active machine learning, and all agree that passive machine learning is not an acceptable substitute.) It is not easy to write on this subject in an accurate, yet still accessible and interesting manner. There is a constant danger that making a subject more accessible and simple will lead to inaccuracies and misunderstandings. Maura and Gordon’s latest article meets this challenge.

Take, for example, the first description in the article of their continuous active training search method using highest-ranking documents:

At the outset, CAL resembles a web search engine, presenting first the documents that are most likely to be of interest, followed by those that are somewhat less likely to be of interest. Unlike a typical search engine, however, CAL repeatedly refines its understanding about which of the remaining documents are most likely to be of interest, based on the user’s feedback regarding the documents already presented. CAL continues to present documents, learning from user feedback, until none of the documents presented are of interest.

That is a good way to start an article. The comparison with a Google search having continued refinement based on user feedback is well thought out; simple, yet accurate. It represents a description honed by literally hundreds of presentations on the topic by Maura Grossman. No one has talked more on this topic than she has, and I for one intend to start using this analogy.
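
For readers who want to see the quoted loop in code form, here is a bare-bones, generic sketch of a continuous active learning review. It is my own illustration, built on the ordinary scikit-learn library; it is not Grossman and Cormack’s software, not their trademarked CAL protocol, and not any vendor’s product. The batch size, the fallback to random judgments, and the stopping rule (quit when a full batch comes back with nothing of interest) are all simplifying assumptions:

```python
import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def continuous_active_review(documents, seed_hits, review_fn, batch=20):
    """Generic continuous active learning loop (illustrative only).

    documents: list of document texts.
    seed_hits: indexes of documents matching an initial keyword search.
    review_fn: the attorney's relevance call, True or False per document.
    """
    vectors = TfidfVectorizer().fit_transform(documents)
    labels = {i: review_fn(documents[i]) for i in seed_hits[:batch]}
    while True:
        unreviewed = [i for i in range(len(documents)) if i not in labels]
        if not unreviewed:
            break
        if len(set(labels.values())) < 2:
            # Need at least one relevant and one not-relevant judgment to
            # train; judge a few random documents (simplifying assumption).
            for i in random.sample(unreviewed, min(batch, len(unreviewed))):
                labels[i] = review_fn(documents[i])
            continue
        model = LogisticRegression(max_iter=1000)
        judged = list(labels)
        model.fit(vectors[judged], [labels[i] for i in judged])
        # Present the highest-ranked unreviewed documents next.
        scores = model.predict_proba(vectors[unreviewed])[:, 1]
        ranked = [i for _, i in sorted(zip(scores, unreviewed), reverse=True)]
        decisions = {i: review_fn(documents[i]) for i in ranked[:batch]}
        labels.update(decisions)
        if not any(decisions.values()):   # a whole batch with nothing of interest
            break
    return [i for i, relevant in labels.items() if relevant]
```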

Rare Description of Algorithm Types – Our Second Reason to Recommend the Article

Another reason our Team liked Continuous Active Learning for TAR is the rare description of search algorithm types that it includes. Here we see the masterful touch of one of the world’s leading academics on text retrieval, Gordon Cormack. First, the article makes clear the distinction between effective analytic algorithms that truly rank documents using active machine learning, and a few other popular programs now out there that use passive learning techniques and call it advanced analytics.

The supervised machine-learning algorithms used for TAR should not be confused with unsupervised machine-learning algorithms used for clustering, near-duplicate detection, and latent semantic indexing, which receive no input from the user and do not rank or classify documents.

These other, older, unsupervised search methods are what I call concept search. It is not predictive coding. It is not advanced analytics, no matter what some vendors may tell you. It is yesterday’s technology – helpful, but far from state-of-the-art. We still use concept search as part of multimodal, just like any other search tool, but our primary reliance to properly rank documents is placed on active machine learning.

The Cormack-Grossman article goes further than pointing out this important distinction; it also explains the various types of bona fide active machine learning algorithms. Again, some are better than others. First, Professor Cormack explains the types that have been found to be effective by extensive research over the past ten years or so.

Supervised machine-learning algorithms that have been shown to be effective for TAR include:

–  Support vector machines. This algorithm uses geometry to represent each document as a point in space, and deduces a boundary that best separates relevant from not relevant documents.

– Logistic regression. This algorithm estimates the probability of a document’s relevance based on the content and other attributes of the document.

Conversely Cormack explains:

Popular, but generally less effective, supervised machine-learning algorithms include:

– Nearest neighbor. This algorithm classifies a new document by finding the most similar training document and assuming that the correct coding for the new document is the same as its nearest neighbor.

– Naïve Bayes (Bayesian classifier). This algorithm estimates the probability of a document’s relevance based on the relative frequency of the words or other features it contains.

Ask your vendor which algorithms its software includes. Prepare yourself for double-talk.
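
One way to see the difference for yourself, outside of any vendor’s black box, is to try the four algorithm families on the same labeled sample and compare how well each ranks the documents. The short sketch below does that with the free scikit-learn library. It is my own illustration, not the Grossman-Cormack software and not a test of any commercial TAR tool; the sample data, labels, and scoring choice are up to you:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def compare_classifiers(texts, labels):
    """Compare the four supervised learning families named in the article,
    scored by the average precision of the document rankings they produce.
    texts: list of document texts; labels: 1 for relevant, 0 for not.
    Illustrative only; real TAR tools tune and combine these far more
    carefully than this five-fold cross-validation does."""
    X = TfidfVectorizer(stop_words="english").fit_transform(texts)
    models = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "linear SVM": LinearSVC(),
        "naive Bayes": MultinomialNB(),
        "nearest neighbor": KNeighborsClassifier(n_neighbors=1),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, labels, cv=5,
                                 scoring="average_precision")
        print(f"{name:>20}: average precision {scores.mean():.2f}")
```

If the article is right, the first two will generally rank documents noticeably better than the last two.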

If you try out your vendor’s software and the Grossman-Cormack CAL method does not work for you, and even the e-Discovery Team’s slightly more diverse Hybrid Multimodal method does not work, then your software may be to blame. As Grossman and Cormack put it, where the phrase “TAR tool” means software:

[I]t will yield the best possible results only if the TAR tool incorporates a state-of-the-art learning algorithm.

That means software that uses a type of support vector machine and/or logistic regression.

Teaching by Example – Our Third Reason to Recommend the Article

The article uses a long example involving search of Jeb Bush email to show you how their CAL method works. This is an effective way to teach. We think they did a good job with this. Rather than spoil the read with quotes and further explanation, we urge you to check out the article to see for yourself. Yes, it is an oversimplification (after all, this is a short article), but it is a good one, and it is still accurate.

Quality Control Suggestions – Our Fourth Reason to Recommend the Article

Another reason we like the article is the quality control suggestions it includes. They essentially speak of using other search methods, which is exactly what we do in Hybrid Multimodal. Here are their words:

To increase counsel’s confidence in the quality of the review, they might:

Review an additional 100, 1,000, or even more documents.

Experiment with additional search terms, such as “Steve Jobs,” “iBook,” or “Mac,” and examine the most-likely relevant documents containing those terms.

Invite the requesting party to suggest other keywords for counsel to apply.

Review a sample of randomly selected documents to see if any other documents of interest are identified.

We like this because it shows that the differences are small between the e-Discovery Team’s Hybrid Multimodal method (hey, maybe I should claim Trademark rights to Hybrid Multimodal, but then again, no vendors are using my phrase to sell their products) using continuous active training, and the Grossman-Cormack trademarked CAL method. We also note that their section on Measures of Success essentially mirrors our own thoughts on metric analysis and ei-Recall. Introducing “ei-Recall” – A New Gold Standard for Recall Calculations in Legal Search, Part One, Part Two and Part Three.
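
For readers curious what a recall estimate of that general kind looks like in practice, here is a stripped-down, elusion-style calculation. It is only a simplified illustration of the basic idea; it is not the full ei-Recall method, which works with binomial confidence intervals to produce a recall range rather than a single point estimate. All of the numbers are hypothetical:

```python
def elusion_recall_estimate(found_relevant, null_set_size,
                            sample_size, relevant_in_sample):
    """Point estimate of recall from a random sample of the null set
    (the documents the review left behind as not relevant).
    Simplified illustration only; ei-Recall proper reports an interval."""
    elusion_rate = relevant_in_sample / sample_size
    estimated_missed = elusion_rate * null_set_size
    return found_relevant / (found_relevant + estimated_missed)

# Hypothetical project: 9,000 documents coded relevant and produced;
# 91,000 left behind; a random sample of 1,500 of those turns up 15
# relevant documents, so roughly 910 relevant documents were missed.
print(f"{elusion_recall_estimate(9_000, 91_000, 1_500, 15):.0%}")  # about 91%
```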

Article Comes With an Online “Do it Yourself” CAL Trial Kit – Our Fifth Reason to Recommend the Article

We are big believers in learning by doing. That is especially true in legal tasks that seem complicated in the abstract. I can write articles and give presentations that provide explanations of AI-Enhanced Review. You may get an intellectual understanding of predictive coding from these, but you still will not know how to do it. On the other hand, if we have a chance to show someone an entire project, have them shadow us, then they will really learn how it is done. It is like teaching a young lawyer how to try a case. For a price, we will be happy to do so (assuming conflicts clear).

Maura and Gordon seem to agree with us on that learn-by-doing point and have created an online tool that anyone can use to try out their method. It allows for a search of the Jeb Bush email, the same set of 290,099 emails that we used in ten of the thirty topics in 2015 TREC. In their words:

There is no better way to learn CAL than to use it. Counsel may use the online model CAL system to see how quickly and easily CAL can learn what is of interest to them in the Jeb Bush email dataset. As an alternative to throwing up their hands over seed sets, control sets, F1 measures, stabilization, and overturns, counsel should consider using their preferred TAR tool in CAL mode on their next matter.

You can try out their method with their online tool, or in a real project using your vendor’s tool. By the way, we did that as part of our TREC 2015 experiments, and the Kroll Ontrack software worked about the same as theirs, even when we used their one-cylinder, high ranking only, CAL (their trademark) method.

Here is where you can find their CAL testing tool: cormack.uwaterloo.ca/cal. Those of you who are still skeptical can see for yourself how it works. You can follow the example given in the article about searching for documents relevant to Apple products, to verify their description of how that works. For even more fun, you can dream up your own searches.

[Photo: President George W. Bush, by Eric Draper, White House]

Perhaps, if you try hard enough, you can find some example searches where their high-ranking-only method, which is built into the test software, does not work well. For example, try finding all emails that pertain to, or in any way mention, the then-President, George Bush. Try entering George Bush in the demo test and see for yourself what happens.

It becomes a search for George + Bush in the same document, and then goes from there based on your coding the highest ranked documents presented as either relevant or non-relevant. You will see that you quickly end up in a TAR pit. The word Bush is in every email (I think), so you are served up with every email where George is mentioned, and believe me, there are many Georges, even if there is only one President George Bush. Here is the screen shot of the first document presented after entering George Bush. I called it relevant.

[Screen shot: the first document presented after entering “George Bush”]

These kinds of problem searches do not discredit TAR, or even the Grossman-Cormack one-cylinder search method. If this happened to you in a real search project, you could always use our Hybrid Multimodal™ method for the seed set (1st training), or start over with a different keyword or keywords to begin the process. You could, for instance, search for President Bush, or President within five of George, or “George Bush.” There are many ways, some faster and more effective than others.

Even using the single method approach, if you decided to use the keywords “President + Bush”, then the search would go quicker than with “George + Bush.” Even just using the term “President” works better than George + Bush, but it still seems like a TAR pit, and not a speeding CAR. It will probably get you to the same destination, high recall, but the journey is slightly longer and, at first, more tedious. This high recall result was verified in TREC 2015 by our Team, and by a number of universities who participated in the fully automatic half of the Total Recall Track, including Gordon’s own team. This was all done without any manual review by the fully automatic participants because there was instant feedback of relevant or irrelevant based on a prejudged gold standard. See e-Discovery Team at TREC 2015 Total Recall Track, Final Report (116 pg. PDF), and (web page with short summary). With this instant feedback protocol, all of the teams attained high recall and good precision. Amazing but true.

You can criticize this TREC experiment protocol, which we did in our report, as unrealistic to legal practice because:

(1) there is no SME who works like that (and there never will be, until legal knowledge itself is learned by an AI); and,

(2) the searches presented as tasks were unrealistically over-simplistic. Id.

But you cannot fairly say that CAL (their trademark) does not work. The glass is most certainly not half empty. Moreover, the elixir in this glass is delicious and fun, especially when you use our Hybrid Multimodal™ method. See Why I Love Predictive Coding: Making document review fun with Mr. EDR and Predictive Coding 3.0.

Conclusion

Active machine learning (predictive coding) using support vector machine or logistic regression algorithms, and a method that employs continuous active training, using either one cylinder (their CAL) or four (our Hybrid Multimodal), really works, and is not that hard to use. Try it out and see for yourself. Also, read the Grossman Cormack article; it only takes about 30 minutes. Continuous Active Learning for TAR (Practical Law, April/May 2016). Feel free to leave any comments below. I dare say you can even ask questions of Grossman or Cormack here. They are avid readers and will likely respond quickly.


e-Disco News, Knowledge and Humor: What’s Happening Today and Likely to Happen Tomorrow

June 7, 2015

My monthly blogs seem to be getting too heavy, even for me, so this month I am going to try to change. I will resort to e-discovery gossip and cheap laughs. I’m hoping that even Spock himself would smile.

But first, a little introspective musing. In February this year, after nine years of writing a weekly blog, I switched to monthly. Since then my blogs have not only been long, complex and difficult, which I warned you would happen, but have also been a tad serious and intellectual. That was never my intent, but it just turned out that way. For instance, my first monthly blog in March started harmlessly enough with a fantasy about time travel and a hack of the NSA, then morphed into a detailed outline and slide show on how to do a predictive coding project. Heavy, some might even say boring, well, at least the second half. My next blog was my all-time deepest writing ever, where I explained my new intellectual paradigm, Information → Knowledge → Wisdom. I really do hope as many people as possible will read it. It is intended to be insightful, not necessarily entertaining, and certainly not light reading. It went beyond just e-discovery and law, and ventured into general social commentary.

In last month’s blog I shared a moment of ZEN, but the moment was filled with math and metrics, not bliss. That’s because in my bizarro world ZEN now means Zero Error Numerics and is designed for quality control in legal search and document review, not Enlightenment. The focus in that blog was on seventeen skills that must be learned to master the ZEN of document review, including concentration. As if it were not bad enough to share deep knowledge instead of fun facts, I even included links to wisdom words with quotes of Zen Masters, old and new. I also mentioned the new trend in corporate America, especially Silicon Valley, of meditation and mindfulness. That was a heavy blog indeed; even the name was way long: Introducing a New Website, a New Legal Service, and a New Way of Life / Work; Plus a Postscript on Software Visualization and Thanks to Kroll Ontrack.

The response from most of you, my dear readers, to last month’s blog reminded me of the sound of one hand clapping, or, as I will explain later, the pauses after Craig Ball’s jokes at his keynote in London last month. Still, last month’s blog did at least provoke an enthusiastic response from all Krollites. I have to concede, however, that this could be a result of my mention and sincere thanks to Kroll Ontrack in the Postscript to Data Visualization at the end of the blog, rather than any great fascination on Kroll’s part with ZEN. Even so, I may go with KO next year to teach predictive coding in Tokyo, and even visit Kyoto, so their interest in stages beyond mere information may well be sincere. See: Information → Knowledge → Wisdom: Progression of Society in the Age of Computers.

This month, with my goal to amuse and make even Spock smile, my blog will focus on information, name dropping and insider references. Some knowledge will be thrown in too, of course, because, after all, that is the whole point of information. Information is never an end in itself, or at least should not be. A dash of wisdom may also be thrown in, but, I promise, I will wrap it in humor and sneak it by with vague allusions. No more Zen Master quotes, not even Steve Jobs. Hopefully you will not even notice the wise guy comments, and may even suspect, falsely of course, that you are none the wiser for reading all this bull.

LTN Finalist for Innovative CIO of the Year

I will start this newsy blog off on a personal note about my surprise nomination for an honest-to-God award. No. It has nothing to do with ZEN or document review competitions (ahem – never did get an award for that). It has to do with innovation. Me and new ideas. Imagine that. Unlike the former government guru and award-laden Jason R. Baron, now IG champion of the World after his recent trashing of me in London, I have never won an award (I don’t count my third grade spelling bee) (imagine very small violins playing now). I still have not won an award, mind you, but I have at least now been nominated and qualified as one of three finalists in the Legaltech News Innovation Award 2015. For losers like me, just getting a third place mention is a big deal. Sad, huh? The award is supposed to recognize “outstanding achievement by legal professionals in their use of technology.”

[Legaltech News Innovation Award finalists chart]

Thank you, dear readers, for nominating and voting for me to receive this award. The award category I am in is a bit odd (for me at least), Chief Information Officer, but apparently that is the only one that someone like me could be crammed into. The three finalists in each Innovation category are determined by open voting by LTN magazine subscribers and through LTN’s website. So again, thanks to all of you who voted, especially my family and paid voters in Eastern Europe (they work cheap). The final winner among the finalists in each category is, according to LTN, chosen by “a panel of judges comprised of members of Legaltech News’ editorial staff.” Uh-oh.

Congratulations to all who made it as finalists, and good luck to one and all. There were many vendor categories too, aside from the law firm ones listed in the chart. I list all the vendor categories and finalists below. I have heard of most of them, and know a few very well. But to be honest, I had never heard of many of these vendors, which, no doubt, is what most law firm CIOs are now saying about me. This is an informative list, so I suggest you take time to read it. Again, congrats to all finalists.

Vendor Finalists/Winners

New Product of the Year
Avvo Inc., Advisor
Catalyst Repository Systems, Predict
Diligence Engine
Lex Machina
Best Marketing Services Providers
JD Supra
One400
Best Knowledge Management Software
Motivation Group’s Easy Data Maps
MDLegalApps’ Not Guilty App
Prosperoware
ZL Technologies
Best Mobile Device Tool or Service
Abacus Data Systems
AgileLaw
Logik Systems’ Logikcull
Best Trial Support Software
Indata Corp.’s TrialDirector
LexisNexis CaseMap
Thomson Reuters’ Case Notebook
Best Case/Matter Management System
Bridgeway Software
Mitratech Holdings’ TeamConnect 4
Lexicata
Best Records Management Software
Hewlett-Packard Co., HP Records Manager (formerly TRIM)
IBM Records Manager
ZL Technologies
Best Risk Management Software
Compliance Science Inc.
IBM OpenPages Operational Risk Management
Best Time and Billing Software
Abacus Data Systems
Tabs3 Software
Tikit North America
Best Collaboration Tool
Accellion kiteworks
Litera IDS
Mitratech’s Lawtrac Self-Service Portal
Opus2 Magnum
Best Document Automation/Management
Allegory
HotDocs Market
Leaflet Corp.
SmartRoom
Best E-Discovery Managed Service Provider
Clutch Group
FTI Consulting, FTI Technology
Iris Data Services’ Arc
UnitedLex
Best E-Discovery Processing
Exterro Inc.
iPro Tech
Nuix
UnitedLex
ZL Technologies
Best E-Discovery Review Platform
FTI Technology’s Ringtail software
iConect Development
kCura Corp.’s Relativity
Recommind Inc.’s Axcelerate 5
Best E-Discovery Legal Hold
Exterro Legal Hold
Legal Hold Pro
Best E-Discovery Hosting Provider
Iris Data Services
Logikcull
Nextpoint Inc.
Best E-Discovery OEM Technology Partner
Content Analyst
Nuix
Best Research Product
CaseMetrix
Docket Alarm Inc.
Handshake Corp.
Best Research Platform
Bloomberg Law
Fastcase
LexisNexis’ Lexis Advance
Thomson Reuters’ Westlaw Next
Best Practice Management Software
LexisNexis Legal & Professional’s Firm Manager
Thomson Reuters’ Firm Central

The other two finalists in “my” category, CIO (if hell freezes over and I win, you know I’ll add that title to my card), are Dan Nottke and Harry Shipley. Again, good luck to them and please excuse my pathetic attempts at humor. I Googled them both and will share what I know about them and then make a prediction as to how I will do in this event (hint – it’s not good).

Dan Nottke is currently the Chief Information Officer for Kirkland & Ellis LLP, a law firm that always seems to begin descriptions of itself by saying it is 100 years old. (In just a few more years I may be able to say that too.) Most of us know Kirkland, not as old, but as one of the largest, most powerful law firms, having over 1,800 lawyers in key cities around the world. The IT challenges of a firm like that must be huge. Dan is obviously a serious player in the law firm CIO world.

I have never met Dan, but a quick Google shows he is on LTN’s Law Firm Chief Information & Technology Officers Board. With the exception of Monica Bay, who has now left LTN and this CIO Board, I have never met, or even heard of, any of the LTN CIO board members. They all appear to be great people, we just do not travel in the same circles. They are, after all, real life law firm CIOs. They are engineers, not lawyers, but for lawyers. Googling Dan shows that he is usually described with the following defining accomplishment:

Since joining the Firm in 2008, Dan has led the transformation of the Information Technology department from a decentralized team to a fully centralized, Information Technology Infrastructure Library (ITIL)-based, high-performing organization.

Since his expertise is so different from mine (to be honest I had to look up ITIL on Wikipedia, since I had never heard of it), it is no surprise that we have never met. About the only thing we have in common is high performing law firms, although his firm is more than twice the size of mine. The same goes for the other finalist, Harry Shipley. He has yet another completely different skill set and list of accomplishments.

Harry Shipley is the Assistant Executive Director and CFO of the Iowa State Bar Association. According to his LinkedIn profile he is a graduate of Grand View College and the top skill listed is legal research, but otherwise he does not disclose much. Further research shows that he is an expert in document automation, an area he has been working in for over 15 years. I also see that Harry received an award from the Iowa State Bar Association in 2014, the Patriot Award, in recognition of his support of Iowa Bar employees serving in the National Guard and Reserve. He seems like a very nice guy, but I could not find out much more about him. Obviously he has many LTN fans or he would not have made the finals.

If I were the LTN editors and had to pick a winner from these three for the most Innovative CIO of the year, I would pick Dan Nottke (no offense, Harry). After all, Dan is the only real-life CIO, and taking a “decentralized team to a fully centralized Information Technology Infrastructure Library” seems pretty innovative to me. But what do I know, I’m just a lawyer, which appears to be the only advantage I have at this point over Dan and Harry. So, congratulations in advance to Mr. Nottke. In the very unlikely event that Harry or I win instead, Dan can at least console himself knowing that the most Innovative CIO award this year did not go to a competitor CIO; it went to a Bar guy or a hacker lawyer instead.

The Legaltech News announcement of the award finalists said: “The winners will be recognized at a special event on July 14 at the close of Legaltech West at the City Club of San Francisco.” Well, I never go to Legaltech West, just East. So, even if I did not already have another very important conflict, flying from Orlando to San Francisco is a tad too far to travel for a maybe award dinner. So, Dan, please do not be insulted if I do not show up to applaud your acceptance. I admit I did ask Legaltech about any possible advance notice, and they said no way, come to the dinner to find out just like everyone else (I respect that, but had to try). Apparently, this is the first time LTN has ever had an awards dinner for this, with all the super-secrecy stuff. (I understand they used to just make an announcement in LTN and mail you something.) But now they have a dinner and are looking for a good turn-out. I don’t blame them for that, having put on a few events myself during my nearly 100 years.

Anyway, LTN told me that I had to be there, at the awards dinner, in order to physically receive the award (not sure exactly what that means). So, even though I cannot come due to the long distance, and an expected very big and important conflict, namely my playing the new role of Grandfather at about that time, my firm, Jackson Lewis, does have a nice office in San Francisco. So, I am hoping to persuade one or two of my e-discovery attorneys in that office to show up at the dinner for me, to clap a million times where appropriate as awards are doled out, and, in case lightning strikes, to accept the award for me in absentia. In fact, I hope to make them Ralph face-masks to wear, just in case, so, if I win, they can make a convincing showing and quickly grab the hardware before any of the LTN editors figure out that I’m not really there, much less not a real CIO.

e-Discovery in Switzerland

We used to think of e-discovery as a unique U.S. legal obsession, but that is not true anymore. Our little preoccupation with following evidence down the rabbit hole of technology is now a worldwide phenomenon. This was very evident at a couple of events I attended last month in Zurich and London. I’ll start off with Zurich, which has got to be one of the most beautiful cities in the world. The city seemed like a kind of Disney World, super clean, nice and expensive, but without the annoying characters or tourists, and incredibly quiet. Zurich is all about pristine water, the Swiss Alps, and environmentally conscious, healthy, smart people.

I knew all that coming in, but what I did not know until I got there was how sharp and interested the Swiss Bar would be about e-discovery, especially with an advanced topic like predictive coding. I now know why half the world’s money is stashed in Switzerland. They are a very secure bunch, and all carry Swiss army knives and ride around on bikes. Their only vice in Zurich appears to be chocolate, which they eat constantly, and even drink. The only negative thing I can say about Zurich is that it shuts down at 9:00 and it is thereafter impossible to find a good restaurant.

I was invited to Zurich by Swiss Re’s e-discovery department to be on a panel that followed the premiere in Switzerland of Joe Looby’s documentary, The Decade of Discovery. Our primary host was Taylor Hoffman, SVP, Head of eDiscovery at Swiss Re. What a dream job Taylor has. He primarily works in New York, but spends a lot of time in Zurich. Jason Baron and his wife, Robin, and I had a seven-course lunch at the private dining facility at Swiss Re’s headquarters overlooking Lake Zurich. We were joined by other members of Swiss Re’s legal department, plus some e-discovery lawyers who came in from Germany and elsewhere to meet and greet. We discussed e-discovery between various wine pairings and ever-changing dishes.

The focus on e-discovery in the EU is all about government investigations, a fact later confirmed by my discussions in London. They also focus on privacy and cross-border issues, and seem to think we are barbarians when it comes to privacy. Since I do not really disagree with them on their privacy criticisms (See: Losey, Are We the Barbarians at the Gate? (e-Discovery Team, Sept. 1, 2008)), a position that seemed to surprise them even more than my being a blogger in a suit, I was able to dodge the daggers very politely thrown at Jason and me.

Instead, being the accomplished diplomat that I am (I even have my own email server, rather than blind copying the Chinese on everything), and being used to arguing with lawyers everywhere, just as a matter of professional courtesy, it did not take long (one glass) for me to bring up the whole pesky notion of truth and justice. Namely, how can you have justice when both sides in litigation are permitted to hide any documents that they want? They explained to me, an obviously naive and hopelessly idealistic American, that in civilized society, namely Europe, all you are required to disclose are the documents, the ESI, that happen to support your case. In civil litigation you only produce the documents that support your side of the story of what happened.

They have virtually no conception of a duty in private litigation to disclose to opposing parties the documents you have found that show your witnesses are “misremembering” the facts, i.e., lying. You can imagine how diplomatic I was, and how squirmy and quiet Jason soon became, but it did all end well. We agreed that no one should lie to a judge. Apparently judges everywhere get tired of all of the contradicting allegations and may force both sides to disclose the truth, the whole truth, and nothing but. That, however, is rare in non-criminal litigation. The primary focus of the kind of disclosure that we know, involving both good and bad documents, is in criminal cases, government investigations, and private, internal investigations.

I asked the non-Swiss Re attorneys attending the lunch how much of their time they spend doing e-discovery work, as opposed to other types of legal services. The answer was it depends, of course, but upon close cross-examination (yeah, I was popular), I learned that the percentage was from 10% to 25%. Remember, these are the outside counsel with special expertise in e-discovery. To me that made it all the more impressive to see how quickly the Zurich attorneys who attended The Decade of Discovery showing got it. They paid attention, and most importantly, they laughed in all of the right places and seemed to understand. Their questions were good too. They were an unusually astute group, considering that no one outside of Swiss Re and the sponsoring vendor, Consilio, actually does much of this work.

Consilio sponsored Joe Looby’s movie showing in Zurich, and then again in London. Consilio’s Managing Directors also presented at both panels following the show, Michael Becker (shown here) in Zurich and Drew Macaulay in London. My thanks to Consilio for its gracious sponsorship and well-run events. Also presenting at these events were Joe Looby, Jason Baron, and Taylor Hoffman.

The main draw was not the panel discussions, as interesting as I think they were, but rather the movie itself, Looby’s Decade of Discovery. Everyone in Zurich assumed I had seen the documentary many times, but in truth that was only the second time I had seen it. The first was the major showing in DC where everyone who is anyone attended, and most of us wondered how we ended up on the cutting room floor. Still, in DC it got a standing ovation, and it was very emotional, as the star of the movie, Richard Braman, had recently passed away. This movie is a fitting tribute to his work.

[Movie poster: The Decade of Discovery]

Notice how the movie poster says “Justice … is the right to learn the truth from your adversary.” Who knew that is not a popular sentiment in Europe and the UK? We need to learn about privacy from them and they need to learn from us about the importance of full disclosure.

The Decade of Discovery movie prominently features the award-winning Mr. Baron, as the journeyman to Sedona. It makes for a good story, and in the process explains predictive coding pretty well.

I made a movie with Jason myself many years ago, Did You Know: e-Discovery? Apparently our short little slide-show type video is now hard to find, so, even though it is not in the same league as Looby’s real movie, I reproduce it here again so all can easily find it. I can brag that all of our predictions have, so far, come true and the exponential increase in data continues. Feel free to share it by using the share button in the upper left. I would reproduce the Decade of Discovery movie instead, but it is not available online.

Unlike my little slide show video with Jason, Joe Looby’s Decade of Discovery is a real movie. Now that I have seen it twice, I appreciate it much more. I urge you to take time to see it if it ever comes to your town. Check out Joe’s Facebook page for his movie company 10th Mountain Films.

One of the surprise treats from my European trip was to learn what a great guy Joe Looby is. I did not really know Joe. What a pleasure to learn there is no b.s. in Joe, and no big ego either. I did not make any money, nor get any new clients from this trip, but I did make a new friend in Joe Looby. Skeptics may think I’m just kissing up in the hopes of getting a part in an e-discovery sequel, but that’s not true. Joe’s next documentary will concern how emergency decisions are made in the Oval Office, think Cuban missile crisis. I for one cannot wait to see it. Joe is a true scholar and artist and is evolving beyond his roots in law. Unlike Jason and me, he will surely go on to bigger and better movies. It would not surprise me to see him at the Oscars some day.

e-Disclosure in London

We showed the movie in London and had a panel, where, surprisingly, the lawyers in attendance did not seem as engaged as the Swiss. We even served popcorn at this event, so go figure. Maybe it was because it was raining (but isn’t it always in London), or maybe it was that whole truth-for-justice approach that we Yanks have. Anyway, Jason, Joe and I had a good time. By the way, they do not call it e-discovery in the UK, they call it e-disclosure. Also, and this amazes me, they do not take depositions over there, or at least it is very rare. They just serve prepared statements on each other. That, and produce the documents that they want you to see, and hide the rest. The Barristers must be very skilled at cross-examination to earn their wigs.

The day after the London movie showing, Jason and I gave a keynote at the IQPC event at the Waldorf Astoria in London. We were billed as the great debate on Information Governance. Jason was pro, of course, and I was sort of against, as per my old blog post, Information Governance v Search: The Battle Lines Are Redrawn. Our keynote was entitled: Let’s Have A Debate About Information Governance — Are We at the End or At the Beginning?

The event was the IQPC 10th Anniversary of Information Governance and eDiscovery. Everyone there was either already an IG specialist or hoped to be one. In other words, I was there to argue to the audience that they were all wrong, that IG was dead. Needless to say, my presentation did not go over that well, and Jason soundly won. Even though the deck was stacked against me going in, Jason pulled out all the stops to make sure he won decisively. I found out why he is banned by his family from playing Monopoly for being over-competitive, a story he tells whenever he talks about cooperation. His beautiful wife Robin, shown right with Jason in Zurich, confirmed that story for me later. And much more, but I am sworn to secrecy.

So anyway, just to be sure that he beat me at the great debate, Jason changed the rules at the last moment to impose some strange formal debate structure that I’d never heard of, involving stop-watch timing, which he controlled. Then, at the closing, he surprised me with a carefully scripted speech that he must have stayed up all night writing. He evoked Winston Churchill’s War Room, which was just a few blocks away, and then finished with a rousing quote from the end of Churchill’s most famous speech, We Shall Fight on the Beaches. The only thing missing was a Union Jack draped around his shoulders. The crowd went crazy with patriotic fervor and go-team IG enthusiasm. They will never surrender! It was the only time I saw London lawyers express any emotion. They were real quiet after I followed Churchill, I mean Baron, with my closing statement. Since Baron was Churchill, urging all good British citizens to fight on for Information Governance, it was not hard to figure out who they thought I was. I was lucky to be able to goose-step out of there alive.

Well, at least I made some friends with my attack on the London IG establishment, including Alison North, another presenter at the event who is an IG expert herself. She was very nice, protected me from the flying umbrellas that came my way, and politely said she agreed with me. It was more of a whisper really. We sat together for most of the event after that. I was glad to meet such an obviously sophisticated, anti-establishment thinker. We even tried to build a structure out of toothpicks together to hold a marshmallow up in the air as high as possible. That is apparently what lawyers in London do for team building at CLEs. We were at the table with Craig Ball, who was very keen on winning this event. We spent a good fifteen minutes arguing with Craig about the ethics of his interpretation of the contest rules. Even though I won that debate, I got called away, so as a team-builder it was another loss for me.

Craig Ball gave the keynote presentation to kick off the event first thing in the morning. That is a difficult time slot and I thought he did a good job. As you can see from the photo I took, they had crazy disco-type lighting. On stage it was hard for a speaker to see the audience over the bright lights. Craig made many attempts to humor and entertain the London IG professionals. I smiled and laughed a few times, but was alone. Most of Craig’s witty remarks did not even draw a smile, much less a laugh. Only when he made an off-color reference to Fifty Shades of Grey (who better than Craig to do that) did he get a laugh.

I learned a lesson from his start and did not even try for humor. Apparently it does not translate well into whatever language it is they speak over there. In fact, the only speaker who was able to get the audience riled up was the Baron of IG himself with his Churchill impression. You know that when Craig speaks there again he will surely quote Churchill at length.

Other presentations at the event included U.S. Magistrate Judge Elizabeth Laporte (shown right), whom I always enjoy hearing. She did very well with the British judges on her panel, pointing out that if you are in her court, you have to follow U.S. rules requiring mutual full disclosure, like it or not. The rules of UK and other foreign courts are not what govern. Also presenting and moderating at many of the panels was the reporter, blogger, and retired solicitor Chris Dale, who at the time I thought was a colleague and friend.

Also keynoting at the IQPC were Jeffrey Ritter, Professor of Law, Georgetown University; Jamie Brown, Global eDiscovery Counsel, UBS; Karen Watson, Digital Forensic Investigations, Betfair; Greg O’Connor, Global Head of Corporate, Policy and Regulation, Man Group; Anwar Mirza, Financial Systems Director, TNT Express; and, Jan-Johan Balkema, Global Master Data Manager, Akzo Nobel.

In addition to debating Baron on IG, I presented with a reformed black-hat hacker, Balazs Bucsay, who now works for Vodafone, and Judge Michael Hopmeier, Kingston-on-Thames Crown Court. We had a very short, 35-minute panel presentation on cybersecurity. Hacker Bucsay, who is one scary guy, gave a demonstration where a volunteer came on stage and had his password hacked. Impressive. Judge Hopmeier – who was a great guy by the way, tech savvy, frank and outspoken – told everyone how many cybersecurity crimes he sees, and shared a story of a brilliant teenage hacker charged with serious crimes, even though no money was taken. The kid did it for fun, much like Bucsay used to do. But often it is done by hardened criminals or terrorists. Judge Hopmeier well understands the problem. I hope he is invited to speak in the U.S. soon. We need to hear from him.

I emphasized Judge Hopmeier’s points on the enormity of the problem, and the billions of dollars now lost each year to cyber crime. The average cost of a data breach last year was $3.5 million. Then I closed with twelve pointers on what a lawyer can do about cyber crime to try to protect their legal practice and their clients’ data:

  1. Invest in your company or law firm’s Cybersecurity.
  2. Think like a Hacker and allocate resources accordingly.
  3. Most Law Firms should Outsource primary targets.
  4. Keep Virus Protection in place and updated.
  5. Harden your IT Systems and Websites; $$ and people.
  6. Intrusion Response Focus (Hackers will get in).
  7. Penetration Testing and Vulnerability Scans
  8. Train and Test Employees on Phishing and Social Engineering; Reward/Discipline to prove you are serious.
  9. Be Careful with Cloud Providers and their Agreements.
  10. Buy as much Insurance as possible (insurer guessing game).
  11. Change Laws to make Software Cos Accountable for Errors.
  12. Update Anti-Hacking Laws.

It was the only panel on cybersecurity at the IG CLE, which, as far as I am concerned, is a huge mistake. It was late in the day and not well attended. The IG crowd does not seem to grasp the importance of the problem. The Chinese Army applauds their apathy. Let me be very clear, using a recent event as an example: the hack of the U.S. government employee database and email. If you are one of the four million past and present federal government employees impacted, the Chinese military not only knows where you live, and has your social security number, user names and passwords, they also know pretty much everything about your personal and professional life. Experts Say China Is Hacking Federal Employees’ Info to Create a Database of Government Workers.

If you are a federal employee who has been a bad boy or girl, say you had an affair, or took a bribe, or maybe you are paying bribes to former high school students you molested years ago like Dennis Hastert, they probably know about that too. They read your emails, texts, and Facebook posts. If you have any kind of security clearance, they will have a couple of paid hackers monitoring your every move on the Net. If you were bad, or otherwise have something to hide, they will try to extort you. That is what spies do. The FBI is taking this seriously. The four million plus federal employees whose email was hacked should too.

Dinner at the Savoy

I do not usually mention CLE speaker dinners, but the one hosted by Recommind at the IQPC deserves an exception. It was held in a private dining room at Gordon Ramsay’s Savoy Grill, in The Savoy hotel. I stayed at the Savoy in Zurich and wish I had in London too. But do not waste your time eating at the other famous restaurant at the Savoy, Simpson’s-In-The-Strand. The atmosphere at Simpson’s was good, but not the food. Ramsay’s Savoy Grill, on the other hand, was so good that we went back there the next night. It was by far the best food we had in London, even though some of the waiters spoke with a fake French accent that sounded just like Steve Martin’s Inspector Clouseau. No. Hamburger was not on the menu.

What made the Recommind dinner special was the group of people they brought together as guests. This was primarily a group of young UK attorneys, the ones who specialize in e-disclosure. Many of them were not able to attend the IQPC event, but they did accept an invite from Recommind for dinner at the Savoy. Aside from the famous Chris Dale, there were only a couple of other speakers there. Most of the dinner guests were true London lawyers, with a couple of American lawyers thrown in, those who were lucky enough to be transferred to London. It was a sophisticated group of very smart creatives, all with lovely accents. I felt right at home with all of them and found we had much in common, including my London favorite, Sherlock Holmes.

This was not my first trip as a speaker to London. Last year I spoke about predictive coding at the famous Lincoln’s Inn, and also had a dinner with a small group of specialists and judges. That was sponsored by Kroll. I look forward to an opportunity to speak in London again. It is very important to both of our countries that we maintain a close relationship. Next time, however, I just want to speak about predictive coding and cybersecurity. I will leave IG to Jason. You know, old man, it is not really my cup of tea.

