Guinea Pigs Finally Liberated by Peck: Computer-Assisted Review Gets Thumbs-Up and Counsel Gets A Well-Reasoned How-To

Last week brought good news for young associates, weary of large-scale document review assignments, and partners too scared to utilize computer-assisted review instead of their overworked underlings: the Southern District of New York approved the use of computer-assisted review to identify documents relevant to discovery requests.  Counsel no longer have to worry about being guinea pigs for judicial acceptance of computer-assisted review.

Although attorneys have been using computer-assisted review for some time, we have been waiting for a judicial decision approving the technology – and hopefully setting forth some guidelines.  Moore v. Publicis gave Judge Andrew J. Peck, U.S.M.J., author of e-discovery articles and an e-discovery advocate, an opportunity to heed the call. 

The Back Story

In Moore v. Publicis, plaintiffs alleged that defendants, Publicis Groupe & MSL Group, engaged in systematic, company-wide gender discrimination against female employees like plaintiffs by limiting women to entry-level positions.  Counsel sought to cull discovery responsive to plaintiffs’ first round of requests from three million electronic documents using predictive coding technology.

You’re thinking, “Whoa, eDiscoverista, slow down.  Computer-assisted review?  Predictive coding technology?  What the heck is that?”  (You were thinking that, weren’t you…)  Instead of low-level associates combing through box after box of printed discovery documents, computer-assisted review uses one or a combination of computer functions to identify responsive documents.  Predictive coding technology (different vendors use different names) is a method of computer-assisted review.

Judge Peck’s Search, Forward walks us through a typical predictive coding protocol: First, senior counsel chooses “seed set” documents.  Specialized software identifies properties of those documents and uses them to select other similar documents from the discovery pool.  As the senior reviewer codes more sample documents, the computer predicts the reviewer’s coding (or asks for feedback).  When the computer’s predictions and the reviewer’s coding sufficiently coincide, the computer has learned enough to confidently review the remaining documents.  I like to think of it as Pandora for document review.  Spam filters work the same way.
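For the technically curious, the protocol above can be sketched in a few lines of Python.  Everything here is a toy stand-in — the word lists, the keyword-weight “model,” the sample sizes, and the 95% agreement cutoff are illustrative assumptions, not anything from the opinion or from any vendor’s actual software:

```python
import random

# Toy corpus: each "document" is a set of words; roughly a third are responsive.
random.seed(1)
RESPONSIVE_WORDS = ["promotion", "salary", "gender", "review"]
OTHER_WORDS = ["lunch", "invoice", "travel", "meeting"]

def make_doc(responsive):
    base = RESPONSIVE_WORDS if responsive else OTHER_WORDS
    return set(random.sample(base, 2)) | {random.choice(OTHER_WORDS)}

corpus = [make_doc(i % 3 == 0) for i in range(500)]

def reviewer_codes(doc):
    # Stand-in for senior counsel's judgment call on one document.
    return "promotion" in doc or "gender" in doc

def train(labeled):
    # "Model": words seen more often in responsive documents get positive weight.
    weights = {}
    for doc, label in labeled:
        for w in doc:
            weights[w] = weights.get(w, 0) + (1 if label else -1)
    return weights

def predict(weights, doc):
    return sum(weights.get(w, 0) for w in doc) > 0

# Step 1: senior counsel codes a seed set by hand.
labeled = [(doc, reviewer_codes(doc)) for doc in corpus[:20]]
pool = corpus[20:]

# Steps 2-3: iterative rounds -- the model predicts, the reviewer codes,
# and training continues until predictions and coding sufficiently coincide.
for round_no in range(7):
    weights = train(labeled)
    sample, pool = pool[:50], pool[50:]
    coded = [(doc, reviewer_codes(doc)) for doc in sample]
    agreement = sum(predict(weights, d) == lab for d, lab in coded) / len(coded)
    labeled.extend(coded)
    if agreement >= 0.95:
        break

# Step 4: the trained model classifies everything that remains.
produced = [doc for doc in pool if predict(train(labeled), doc)]
```

The point of the sketch is the shape of the loop, not the model: human coding feeds the machine, the machine’s predictions are checked against more human coding, and only when the two agree does the machine take over the rest of the pile.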


Moore v. Publicis

The Bar has been impatiently waiting for judicial approval of computer-assisted review because the technology is speedier, more cost-effective, and more accurate (maybe – but more on that later) than manual review.  On the other hand, it raises many questions of practicality and of procedural and evidentiary rules.  The Court used Moore v. Publicis to set forth a well-reasoned discussion of those issues.

Prior to conducting any discovery, the parties had to agree on the method to be used to cull responsive documents.  The first disagreement arose over defendants’ desire to limit predictive coding costs to $200,000 by reviewing and producing only the 40,000 most relevant documents.  The Court ruled that a cut-off can be determined only after results are obtained — if stopping at 40,000 would leave a large number of highly responsive documents unproduced, such a limitation would be impermissible.

Second, the parties had to agree on which custodians’ emails would be reviewed.  They agreed to discovery in stages, the first of which would include defendants’ executive team, HR staff, and managing directors.  The Court did not allow the parties to use predictive coding to compare emails from seven male employees (“comparators”) with plaintiffs’ emails, because plaintiffs could not explain how the comparators’ emails could be meaningfully searched, and because similar information could be obtained in depositions.  The Court also agreed to extend discovery, if needed, to permit the conclusion of the first stage of discovery before beginning a second stage.

Then, the parties considered which sources should be reviewed, eventually settling on the defendants’ email archives, HR information management system, and certain other sources.  However, if plaintiffs sought additional sources, they would have to comply with Rule 26(b)(2)(C) by (1) explaining why (plaintiffs, as former employees, probably knew where to look for relevant documents), and (2) limiting redundancy.

Defendants would provide all seed set documents for plaintiffs’ review and further issue tagging.  Defendants would then incorporate plaintiffs’ issue tags into the coding system.  The parties also collected and incorporated the 50 documents most responsive to keyword searches with Boolean connectors.  Defendants promised that all documents collected as a result of the seed set-related search would be classified as either responsive or non-responsive and produced (with the exception of documents subject to privilege).

To train the predictive coding software, defendants used iterative rounds in which counsel and the software reviewed and ranked documents on a scale of 100 to 0 – from most likely relevant to least likely relevant.  Each round ranked 500 documents from different concept clusters (developed by issue-tagging documents).  Plaintiffs insisted that, if the software was not sufficiently trained at the end of seven rounds, additional rounds would be undertaken until the computer’s coding was stabilized.
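The 100-to-0 ranking and the “stabilized” stopping test can be sketched as follows — the scoring formula and the five-point tolerance here are hypothetical choices for illustration, not the parties’ actual method:

```python
def score(weights, doc, scale=100):
    # Squash a raw keyword-weight sum into the protocol's 0-100 relevance scale,
    # with 50 as the neutral midpoint.
    raw = sum(weights.get(w, 0) for w in doc)
    return max(0, min(scale, 50 + raw * 10))

def stabilized(previous, current, tolerance=5):
    # Treat the coding as stabilized when no document's score moved
    # more than `tolerance` points between rounds.
    return all(abs(p - c) <= tolerance for p, c in zip(previous, current))

# Two toy documents scored under two successive rounds of (made-up) weights:
docs = [{"promotion", "salary"}, {"lunch", "travel"}]
round1 = [score({"promotion": 2, "salary": 1}, d) for d in docs]   # [80, 50]
round2 = [score({"promotion": 3, "salary": 1}, d) for d in docs]   # [90, 50]
```

Here `stabilized(round1, round2)` returns False — the first document’s score jumped ten points, so under plaintiffs’ insistence another training round would be warranted.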


The Rules

In response to plaintiffs’ objections to the above protocol, the Court found that Rule 26(g)(1)(A) has no implications for computer-assisted review, because it requires only that initial disclosures be complete, not discovery responses.  Because coding and seed sets are based on human and computer interaction, and because the predictive technology is so new, the Court indicated that counsel could never certify that discovery responses were complete.  Even if counsel were manually reviewing documents, it would be very difficult to certify, in good faith, that discovery was complete.

Rather, it is Rule 26(g)(1)(B) that applies to discovery.  Luckily, Rule 26(g)(1)(B) incorporates 26(b)(2)(C), the proportionality principle.  Proportionality was illustrated in this case when the Court refused to limit the number of documents to be produced based on cost alone.  But there are other ways proportionality can be achieved – it’s a subjective standard.

Additionally, Daubert is inapplicable because Fed.R.Evid. 702 concerns admissibility at trial, not the reliability of a discovery protocol.  Judge Peck noted that he might be interested in how counsel chose seed documents or determined whether predicted documents were responsive, but would probably not be interested in the computer science behind the vendor’s predictive technology.

Judge Peck cautioned that concerns about relevance are best discussed after predictive coding has been used and documents identified, as opposed to preemptively criticizing a party’s methodology.  Relevance is determined by the parties’ document demands, and where opposing parties are (1) provided with seed documents, (2) allowed to issue tag, and (3) given both relevant and irrelevant results, relevance objections are less persuasive to the court.  If a smoking gun is identified in the first stage of review, for example, the computer can be retrained to identify like documents.


Peck’s How-To

The objective of review in e-discovery is to identify as many relevant documents as possible, while reviewing as few non-relevant documents as possible.  Recall is the fraction of relevant documents identified during a review; precision is the fraction of identified documents that are relevant.  Thus, recall is a measure of completeness, while precision is a measure of accuracy or correctness.  The goal is for the review method to result in higher recall and higher precision than another review method, at a cost proportionate to the “value” of the case.  See Grossman and Cormack’s tech-assisted review article.
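In code, Grossman and Cormack’s two measures are a one-liner each (the document IDs below are made up purely for illustration):

```python
def recall(relevant, identified):
    # Completeness: fraction of the truly relevant documents the review found.
    return len(relevant & identified) / len(relevant)

def precision(relevant, identified):
    # Accuracy: fraction of the documents flagged that really are relevant.
    return len(relevant & identified) / len(identified)

relevant = {1, 2, 3, 4, 5}       # hypothetical ground truth
identified = {3, 4, 5, 6}        # what the review produced
print(recall(relevant, identified))     # 0.6  -- found 3 of 5 relevant docs
print(precision(relevant, identified))  # 0.75 -- 3 of 4 flagged docs were relevant
```

A review can cheat either number in isolation — produce everything and recall hits 100%, produce one sure thing and precision hits 100% — which is why the goal is higher recall and higher precision together.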

Statistics show that computerized searches are at least as accurate, if not more so, than manual review.  Plus, they require much less effort.  In fact, technology-assisted reviews require, on average, human review of only 1.9% of the documents, a fifty-fold savings over exhaustive manual review. See Grossman and Cormack’s tech-assisted review article.

But, beware of keyword searches, which Peck surmises are a lot like playing Go Fish – the requesting party guesses which keywords might produce evidence, without knowing anything about the other party’s “cards.” See Go Fish.  Not to mention, keyword searches tend to be over-inclusive, and produce many irrelevant documents. See Responsiveness in E-Discovery.

These are a few final morsels of wisdom revealed in the Moore v. Publicis opinion:

  1. The Federal Rules of Civil Procedure do not require perfection. See, e.g., Montreal Pension Plan.  Rule 1 requires the just, speedy, and inexpensive determination of lawsuits.  Rule 26(b)(2)(C) reinforces Rule 1.
  2. The Court strongly endorses The Sedona Conference Proclamation.
  3. Discovery in stages controls costs.  Start with the most relevant sources, without prejudice to the requesting party seeking more if the first stages don’t work out.
  4. Seek and utilize your client’s knowledge about the opposing party’s custodians and document sources.  Familiarize yourself with your own client’s custodians and business terminology.
  5. Bring your geek to court.  Your e-discovery vendor can help explain computer-assisted review protocols and answer questions.  BUT make sure she can break it down for the less tech-savvy.

E-discovery Lessons Learned?

You are now authorized to utilize computer-assisted review.  But don’t forget to design an appropriate process (which always includes quality control testing), based on a full consideration of the technology available.


Leah R. Glasofer, the eDiscoverista, received her B.A. in Environmental Policy from American University and J.D. from Seton Hall Law.  Leah has clerked for captive counsel of a major insurance carrier, and also for Assignment Judge Yolanda Ciccone in Somerset County.  She is now an associate at Graham Curtin, P.A. in Morristown, New Jersey.  Leah concentrates her practice in litigation, with an emphasis on professional liability defense, insurance and personal injury defense, and employment.   
