Tying Deduplication to your eDiscovery Bottom Line

While working on a project a while back, I couldn’t find any solid, current numbers on one area of eDiscovery. What I needed was the average share of data an eDiscovery team can expect to eliminate through deduplication (or ‘deduping’).

Instead of spending too much time combing through reports, I reached out to my ‘human search engine’ for eDiscovery matters, Nuix USG Solutions Consultant Joe Babineau. You’d be hard-pressed to find someone with more eDiscovery knowledge—although we’ve got some contenders within our own walls—and he'd written previously about global and custodial deduplication using Nuix software.

Typical Duplicate Rates

It turns out there’s a reason that the numbers are hard to pin down in any of the available research. “It varies widely, based on many factors,” Joe responded when I asked about average eDiscovery deduplication numbers. “I always tell customers, however, that they can expect somewhere between 30-50% duplication in their datasets.”

I expected a large number given how important deduplication is during the eDiscovery process, but 30-50% is notable. Interested, I pressed on and inquired about the culprits: What data is most likely to be duplicated across a matter?

“For eDiscovery, we always dedupe based on families,” Joe explained, “so it’s not loose files … think Office docs. Typically, it’s multiple copies of the same email in different people’s inboxes.”

And there’s even more to it, which turns out to be good news for your eDiscovery budget.

“Add in email threads where previous messages are copied in the replies. And also office email blasts like birthday and party notices. There are a lot more of those than people think.”

“And, of course, we all get the same spam! For some reason people keep those messages too, which we catch during deduplication.”
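Joe’s point about deduping on families, along with the global and custodial deduplication mentioned earlier, can be illustrated with a short sketch. This is a toy model under stated assumptions, not how Nuix implements it: the `Family` class, `family_digest`, `dedupe` and `duplicate_rate` are hypothetical names, and the sketch assumes each message has already been normalized so that copies of the same email in different inboxes are byte-identical (real tools typically hash selected fields rather than raw bytes).

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Family:
    """An email plus its attachments, treated as one unit (hypothetical model)."""
    custodian: str
    email_body: bytes
    attachments: list = field(default_factory=list)  # list of bytes objects


def family_digest(family):
    """Combine the parent email and every attachment into one family-level hash."""
    h = hashlib.sha256()
    for part in [family.email_body, *family.attachments]:
        # Hash each part on its own so that part boundaries affect the result.
        h.update(hashlib.sha256(part).digest())
    return h.hexdigest()


def dedupe(families, scope="global"):
    """Keep the first copy of each family.

    scope="global":    one copy across all custodians.
    scope="custodial": one copy per custodian.
    """
    if scope not in ("global", "custodial"):
        raise ValueError("scope must be 'global' or 'custodial'")
    seen = set()
    kept = []
    for fam in families:
        digest = family_digest(fam)
        key = digest if scope == "global" else (fam.custodian, digest)
        if key not in seen:
            seen.add(key)
            kept.append(fam)
    return kept


def duplicate_rate(total, kept):
    """Fraction of families removed by deduplication."""
    return (total - kept) / total if total else 0.0


# Made-up sample data: one birthday blast in three inboxes (twice in Alice's),
# plus two unique emails.
birthday = b"Cake in the kitchen at 3pm!"
sample = [
    Family("alice", birthday),
    Family("alice", birthday),
    Family("bob", birthday),
    Family("carol", birthday),
    Family("bob", b"Q3 contract draft", [b"contract-v1"]),
    Family("carol", b"Q3 contract draft", [b"contract-v2"]),
]

for scope in ("global", "custodial"):
    kept = dedupe(sample, scope)
    print(scope, len(kept), duplicate_rate(len(sample), len(kept)))
```

Because the hash covers the whole family, the two contract emails are both kept: their bodies match, but their attachments differ. Global scope collapses the birthday blast to a single copy, while custodial scope keeps one copy per inbox.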

Birthday Cake
You may want to go to the party, but does the birthday party email really need to go through eDiscovery review? Photo by: Jon Phillips

Effect on your eDiscovery Budget

Many organizations have some form of internal eDiscovery or early case assessment before turning data over to an eDiscovery review platform for attorney review. The cost associated with review is directly tied to the amount of information you’re sending out to the review vendor.

Using Nuix eDiscovery software, you have a number of ways to ‘slice and dice’ your data during and after processing, reducing the number of files you need to manually comb through. Reducing your case sizes considerably at the front end produces cost savings throughout the rest of the process, from export through case review. It also makes legal review much more efficient and gives you the added comfort of knowing you’re only turning over the information you need to, retaining control of as much of your data as possible.
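To make the link between duplicate rates and review cost concrete, here is a rough back-of-the-envelope sketch. The document count and per-document price below are placeholders chosen purely for illustration, not real vendor pricing, and `savings_from_dedup` is a hypothetical helper. The only figures taken from the article are the 30% and 50% ends of Joe’s range.

```python
def savings_from_dedup(documents, duplicate_rate, cost_per_document):
    """Return (documents left to review, review cost avoided).

    Assumes review is billed per document, which is a simplification.
    """
    if not 0.0 <= duplicate_rate <= 1.0:
        raise ValueError("duplicate_rate must be between 0 and 1")
    duplicates = round(documents * duplicate_rate)
    remaining = documents - duplicates
    return remaining, duplicates * cost_per_document


# Placeholder inputs: 100,000 documents at a made-up $1.00 per document.
for rate in (0.30, 0.50):
    remaining, saved = savings_from_dedup(100_000, rate, 1.00)
    print(f"{rate:.0%} duplicates: {remaining:,} left to review, ${saved:,.2f} avoided")
```

Swap in your own volumes and pricing; the point is simply that, under per-document billing, the cost avoided scales directly with the duplicate rate.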

As for the email users in your organization who like to hold on to spam? They may be fond of the emails from their distant cousin, a Nigerian prince in need of their assistance, but maybe it’s time to talk with them about email security best practices and have them delete those messages. I’d say that would be for the best, eDiscovery concerns or not.