Managing TIFFs in eDiscovery to control cost and time

Written by Joe Babineau


If you work in eDiscovery, you’ve worked with TIFFs at some point. Likely a lot. They are an essential component of the eDiscovery process, especially when dealing with combinations of electronic and scanned data. Have you ever felt, though, that TIFFing documents en masse is an archaic practice? Advances in eDiscovery technology and practices, along with the emergence of more complete metadata, might offer a more efficient alternative.



We very often continue practices simply because it’s the way we’re accustomed to doing things. Collections of TIFF files allow counsel to print and review them at their convenience. Let’s face it: you aren’t going to win that fight and convince every lawyer or litigation support staff member that they don’t really have to print every document in a case.

The need to redact information and shelter it from either opposing counsel or the public is a strong driver for TIFFs as well. Redaction is most certainly not going away anytime soon! Still, end users need, and are demanding, tools that don’t impose on them further: tools that allow multi-user access so that they can spread the load of redaction, or that even allow for auto redaction. They also need a TIFF engine that operates independently and doesn’t require them to monitor for error messages during processing.



TIFFing might be one of the key factors in the high cost of eDiscovery. TIFF files are notoriously large—they are at least the same size as their native source files. This means, at a minimum, they double the storage requirements in your production facility and your hosted environment. For the latter, you likely pay by the gigabyte. Is it really a good idea to double storage under those circumstances? Beyond hardware and storage costs is the labor. You have to pay an analyst to sit in front of a workstation while the TIFFs process, which is a 24/7 job—processing doesn’t just happen 9-5!
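To make the storage arithmetic concrete, here is a minimal sketch in Python. It assumes, as stated above, that TIFFs are at least the same size as their native files (a size ratio of 1.0). The 500 GB dataset and the price of 10 per gigabyte are purely hypothetical figures chosen for illustration, not real pricing.

```python
def total_storage_gb(native_gb, tiff_fraction, tiff_size_ratio=1.0):
    """Storage needed for the natives plus the TIFFs generated from them.

    tiff_fraction   -- share of the native data that gets TIFFed (0.0 to 1.0)
    tiff_size_ratio -- TIFF size relative to native size; 1.0 is the
                       "at least the same size" minimum (an assumption)
    """
    if not 0.0 <= tiff_fraction <= 1.0:
        raise ValueError("tiff_fraction must be between 0.0 and 1.0")
    tiff_gb = native_gb * tiff_fraction * tiff_size_ratio
    return native_gb + tiff_gb


def hosting_cost(total_gb, price_per_gb):
    """Hosting cost when the provider bills by the gigabyte."""
    return total_gb * price_per_gb


if __name__ == "__main__":
    native_gb = 500.0      # hypothetical dataset size
    price_per_gb = 10.0    # hypothetical price, not a real quote
    for fraction in (1.0, 0.25):
        total = total_storage_gb(native_gb, fraction)
        cost = hosting_cost(total, price_per_gb)
        print(f"TIFF {fraction:.0%} of data: {total:.0f} GB, cost {cost:.0f}")
```

Under these assumptions, TIFFing everything doubles the footprint from 500 GB to 1,000 GB, while TIFFing only a quarter of the data brings it to 625 GB.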



Some of this can be mitigated by employing better keyword searches and metadata to cull your dataset further before review or final use of the data. The more you can cull down, the less you have to TIFF. Less TIFFing directly results in lower expenses.

Metadata profiles for standard file formats, such as Microsoft Office files and PDFs, hold far more data than the standard 30-or-so fields that most eDiscovery packages provide. For example, Outlook files have many MAPI fields that contain usable data; PDFs have their own document properties and many PDF-only fields that contain useful information that could be relevant to your screening process. Expand the use of this other metadata as part of your culling process and you’ll unlock efficiency you might not have known existed.
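The culling idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the document structure and the field names (`custodian`, `sent_date`, `text`) are hypothetical and do not correspond to any particular eDiscovery product or metadata schema.

```python
from datetime import date

# Hypothetical sample records; real systems expose far more metadata fields.
SAMPLE_DOCS = [
    {"id": 1, "custodian": "alice", "sent_date": date(2015, 3, 2),
     "text": "Draft merger agreement attached."},
    {"id": 2, "custodian": "bob", "sent_date": date(2015, 3, 9),
     "text": "Lunch on Friday?"},
    {"id": 3, "custodian": "alice", "sent_date": date(2012, 1, 5),
     "text": "Old merger notes."},
]


def cull(documents, keywords, start, end, custodians=None):
    """Keep documents inside the date range, from the chosen custodians
    (if any are given), whose text contains at least one keyword."""
    lowered = [k.lower() for k in keywords]
    kept = []
    for doc in documents:
        if not (start <= doc["sent_date"] <= end):
            continue  # outside the relevant period
        if custodians is not None and doc["custodian"] not in custodians:
            continue  # not a custodian of interest
        text = doc["text"].lower()
        if any(k in text for k in lowered):
            kept.append(doc)
    return kept


if __name__ == "__main__":
    survivors = cull(SAMPLE_DOCS, ["merger"],
                     date(2015, 1, 1), date(2015, 12, 31))
    print([d["id"] for d in survivors])
```

Only the documents that survive a filter like this would need to move on to review and, where required, TIFFing.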



As I said before, TIFFs still have value and aren’t going on the endangered species list anytime soon. I think it’s important to understand where their value lies and use them intelligently, rather than immediately equating eDiscovery with the idea that “OK, let’s TIFF everything we’ve got.”

Redaction is the easiest way to prevent the wrong parties from seeing certain information. A TIFF is much harder to edit than an Office document or a PDF, meaning that it’s also harder for a third party to tamper with. Once the eDiscovery system has output the TIFF, it is very hard to undo the redaction or alter the file. This is, in part, why courts prefer TIFFs: they have confidence that the information they are seeing is unaltered and reliable.

As I’ve already mentioned, though, this also means that working with TIFFs is more expensive. Producing TIFFs remains the greatest court-related cost in eDiscovery, and cost is something we’re all under pressure to control better.

As an alternative, consider using file privileges to control access to certain documents, if possible, instead of redaction. While this practice might only work for some cases, it could help reduce the number of TIFFs you’re forced to produce.

Ultimately, the best thing you can do is to focus on efficiently and reliably culling case data before production. Take advantage of the metadata at your disposal, employ better keyword searches and even advanced analytics or visualizations during the early case assessment phase, and produce only what you need to produce. The result will be lower cost across the board and a manageable set of files to work with in the end.


Hear more from Joe on the Nuix Unscripted podcast 'Friday Night eDiscovery'.