Searching Through Gigabytes of Data? Predictive Coding Is the Answer

There is nothing more daunting than receiving a request for a backup drive with one or more gigabytes of data on it.  The good news is that the courts have recently allowed the use of a new tool that can save business owners time and money: predictive coding.

In Dynamo Holdings v. Commissioner of Internal Revenue, the court allowed the use of predictive coding to identify relevant and confidential information stored on a company backup drive.  This was one small step for court technology efficiency, but a giant step for the world of electronic discovery!

Whether the parties were allowed to use predictive coding became a central issue because the backup drive in question held approximately one gigabyte of electronic data.  Just to give you a frame of reference, this equates to approximately 200,000 to 400,000 individual documents.  The producing party estimated that it would cost about $450,000 just to review all the data before giving it to the opposing party.
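To put that estimate in perspective, here is a quick back-of-the-envelope calculation in Python.  It uses only the figures quoted above; the per-document cost is derived from them for illustration and is not a number taken from the opinion.

```python
# Figures quoted in the post. The per-document cost is derived here,
# it is not a number stated in the court's opinion.
estimated_review_cost = 450_000        # dollars, producing party's estimate
document_range = (200_000, 400_000)    # low and high document counts

def cost_per_document(total_cost, document_count):
    """Average manual review cost for a single document."""
    return total_cost / document_count

# Fewer documents means a higher average cost per document.
high = cost_per_document(estimated_review_cost, document_range[0])
low = cost_per_document(estimated_review_cost, document_range[1])

print(f"Between {low} and {high} dollars per document")
```

In other words, manual review works out to somewhere between one and two and a quarter dollars for every document on the drive.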

The producing party and the client paying for the discovery were daunted by the idea of spending that much time and money just reviewing documents.  At the same time, the alternative of handing over the data without reviewing it could be detrimental to their case.  In the end, the cost-effective, technological answer was predictive coding.

This opinion was highly influenced by an article written by Magistrate Judge Andrew Peck, who describes predictive coding.[1]  Predictive coding is a process that essentially predicts the relevance of documents and identifies which documents are not responsive.  Judge Peck’s article explains that the computer identifies properties of documents and uses those properties to code other documents.

As more sample documents are coded, the computer actually predicts the future coding.  In a way, predictive coding is a reviewer teaching the computer what types of documents are relevant and what is confidential.  Judge Peck states in his article that it usually takes only a few thousand documents to train the computer, which, compared to one gigabyte of data, is a drop in the bucket.
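For readers who want to see the idea in miniature, the following Python sketch shows a reviewer "teaching" a computer from a handful of coded samples.  It is a toy illustration only: the function names, the sample documents, and the simple word-weighting approach are assumptions made for this example, and it is not the method described in Judge Peck's article or used by any commercial review tool.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep purely alphabetic words."""
    return [w for w in text.lower().split() if w.isalpha()]

def train(coded_samples):
    """Learn a weight per word from (text, is_relevant) pairs coded by a reviewer.

    Assumes the samples include at least one document of each kind.
    """
    counts = {True: Counter(), False: Counter()}
    for text, is_relevant in coded_samples:
        counts[is_relevant].update(tokenize(text))
    vocab = set(counts[True]) | set(counts[False])
    totals = {label: sum(c.values()) for label, c in counts.items()}
    weights = {}
    for word in vocab:
        # Add-one smoothing so a word never seen in one class gets a small,
        # nonzero probability there instead of zero.
        p_rel = (counts[True][word] + 1) / (totals[True] + len(vocab))
        p_not = (counts[False][word] + 1) / (totals[False] + len(vocab))
        weights[word] = math.log(p_rel / p_not)
    return weights

def predict_relevant(weights, text):
    """Predict relevance: positive total weight means 'relevant'."""
    score = sum(weights.get(word, 0.0) for word in tokenize(text))
    return score > 0

# Hypothetical reviewer-coded samples (invented for illustration).
samples = [
    ("transfer of funds between the entities", True),
    ("loan agreement and repayment schedule", True),
    ("office holiday party invitation", False),
    ("lunch menu for friday", False),
]
weights = train(samples)
print(predict_relevant(weights, "repayment of the loan"))
print(predict_relevant(weights, "holiday lunch invitation"))
```

A real system would train on thousands of coded documents and use far richer document properties, but the loop is the same: the reviewer codes samples, and the computer generalizes from them.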

In other words, predictive coding is a tool that uses algorithms to search rather than manually reinventing the wheel every time a labor-intensive discovery request is made.  The algorithms use keywords, dates, custodians, and document types to filter through hundreds of thousands of documents in a drastically shortened period of time.
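As a rough illustration of filtering on those four properties, here is a short Python sketch.  The `Document` fields, the `matches` function, and the sample records are all hypothetical and exist only for this example; real review platforms expose their own search interfaces.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Document:
    """Hypothetical document record, invented for this example."""
    doc_id: str
    custodian: str
    doc_type: str
    created: date
    text: str

def matches(doc, keywords, custodians, doc_types, start, end):
    """True if the document satisfies every agreed-upon search criterion."""
    text = doc.text.lower()
    return (
        doc.custodian in custodians
        and doc.doc_type in doc_types
        and start <= doc.created <= end
        and any(keyword.lower() in text for keyword in keywords)
    )

documents = [
    Document("D1", "alice", "email", date(2010, 3, 1), "Loan transfer approved"),
    Document("D2", "bob", "email", date(2010, 3, 2), "Lunch on Friday?"),
    Document("D3", "alice", "memo", date(2012, 1, 5), "Loan repayment terms"),
]

hits = [
    doc.doc_id
    for doc in documents
    if matches(
        doc,
        keywords=["loan"],
        custodians={"alice", "bob"},
        doc_types={"email", "memo"},
        start=date(2010, 1, 1),
        end=date(2010, 12, 31),
    )
]
print(hits)  # ['D1']
```

Only D1 survives: D2 has no keyword hit, and D3 falls outside the date range.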

Now, some may be thinking, “how do you know that coding is producing the correct results?”  Senior reviewers take samples throughout the process in order to determine the accuracy of the results. Additionally, a log can be produced detailing the records that were withheld and the reasons for doing so.
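The sampling step can be sketched in Python as well.  This is a minimal example under stated assumptions: the function names are made up for illustration, and precision and recall are two common accuracy measures chosen here, not metrics specified in the opinion.

```python
import random

def draw_sample(doc_ids, sample_size, seed=0):
    """Pick a random subset of documents for a senior reviewer to check."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return rng.sample(doc_ids, min(sample_size, len(doc_ids)))

def precision_recall(pairs):
    """Compare the computer's coding with the reviewer's on sampled documents.

    pairs: (predicted_relevant, reviewer_says_relevant) for each document.
    """
    true_pos = sum(1 for predicted, actual in pairs if predicted and actual)
    false_pos = sum(1 for predicted, actual in pairs if predicted and not actual)
    false_neg = sum(1 for predicted, actual in pairs if not predicted and actual)
    # Precision: of the documents coded relevant, how many really were?
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    # Recall: of the truly relevant documents, how many did the computer find?
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall

sample_ids = draw_sample(list(range(1000)), 5)

# Hypothetical comparison results for the five sampled documents.
checked = [(True, True), (True, True), (True, False), (False, True), (False, False)]
precision, recall = precision_recall(checked)
print(len(sample_ids), precision, recall)
```

If the numbers come back too low, the reviewers code more samples and the computer is retrained before the next check.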

This process may not be as simple as implementing a “claw back” provision (i.e., a provision that lets a party recall a document that was not supposed to be produced); however, it presents an accurate and efficient way to move along a trial and discovery process while mitigating harm to the party producing the information.

Judge Buch weighed the interests of both parties: the receiving party wanted as many documents as could be produced, and the producing party wanted to protect the client from producing irrelevant or confidential documents.  The process under consideration was: (1) restore some or all of the data from the tapes; (2) qualify the restored data; (3) index and load the qualified restored data into a review environment; (5) use predictive coding to review the remaining data using search criteria that the parties agree upon; and (6) produce the relevant non-privileged information and a privilege log that sets forth the claimed privileged documents.

In the end, Judge Buch’s conclusion was very clear.  Predictive coding is an acceptable electronic search tool that can be used during the discovery process.


Victoria O’Connor Blazeski (formerly Victoria L. O’Connor) received her B.S. from Stevens Institute of Technology, and she will receive her J.D. from Seton Hall University School of Law in 2015.  Prior to law school, she worked as an account manager in the Corporate Tax Provision department of Thomson Reuters, Tax & Accounting.  Victoria is a former D3 college basketball player, and she has an interest in tax law and civil litigation.  After graduating, she will clerk for the Hon. Joseph M. Andresini, J.T.C. in the Tax Courts of New Jersey.

[1] For information about predictive coding, see Magistrate Judge Andrew Peck’s published article:  Search, Forward: Will Manual Document Review and Keyword Searches be Replaced by Computer-Assisted Coding?, L. Tech. News (Oct. 2011).
