
A Word from Our Experts—The Future of Data Lakes

In the two previous episodes of this series, I discussed how flexibly Nuix-indexed data lakes handle different data types, covered architectural concerns and various trends, and explored five customer case studies. So, what does the future hold for unstructured data lakes? How will they evolve? And will they continue to exist and provide value?

In today's episode, let’s explore these and several other pertinent questions. I’ve asked a few experts to weigh in with their thoughts and to share their experience and opinions.

Keep it Clean

Most of the people I asked believe that for a data lake to survive, it can't be another "dumping ground" like many of the archives and enterprise content management platforms we’ve seen. If you stockpile useless data in the lake, it becomes polluted: the more garbage you dump in, the less useful the lake is.

We believe the basic premise is that governance must begin on the front end, rather than through cleanup efforts after the data lake gets dirty. Generally speaking, as data streams into the lake, some data should also stream out. The data lake can't be "just another information lifecycle management or governance exercise," as one engineer put it.

"Use it or lose it" is a fitting aphorism.

Your data lake should be clean, devoid of detritus and data garbage. Photo by: Micah Sheldon

Asking the Experts

John Bargiel is a Senior Principal Solutions Consultant at Nuix who works extensively with Fortune-500 companies to understand the value of unstructured data repositories. I asked him where he sees data lakes evolving in the next 18-24 months.

“As organizations realize the benefits of decreasing response time, reducing redundant processes and spend, and access to ongoing business intelligence, I think we'll see data lake models become the norm,” he said. “There are a lot of use cases applicable to this model and it does not need to be an exhaustive repository of all an organization’s data. Targeted data lakes composed of data sources like frequently litigated custodians, FOIA responses, and results of internal investigations all provide opportunities to accelerate workflows and increase available intelligence.

“Basically, organizations become smarter and faster when they have a continuous window into the disposition and composition of high-value data.”

I followed up by asking about pitfalls organizations should look out for.

“Data lakes go hand-in-hand with forward thinking and good data governance policies,” he replied. “The more stakeholders or business units that want to leverage a shared data repository, the more you need to think about issues around risk and compliance. This doesn't mean it isn't worth doing; as we've seen, retaining and utilizing business intelligence from previous reviews, investigations, and compliance exercises adds tremendous value and increases efficiency. It does need buy-in across business units and conversation and planning around questions like ‘who has access to what data’ and ‘what are the various retention policies.’”

Alex Chatzistamatis, also a Principal Solutions Consultant with Nuix, added that the concept of the "always-on" data lake will serve many purposes for companies. The Nuix platform will be fed by "streams" of data coming from the data lake ... the Nuix vision of taking data from threat detection to collections, processing, and review requires the data lake itself to become the "fuel" for the platform's success. In turn, the lake needs to be filled by "feeder streams" of different types of data.

Both engineers also agreed that Elastic will be fundamental to the future of these data lakes. Having the ability to leverage massive scale, speed, and redundancy/fault tolerance will be critical to success. Chatzistamatis mentioned that incorporating cloud services, analytics like Tableau, and machine learning will enter the conversation soon as well.

Bargiel went on to envision using Nuix to connect to already structured repositories like Hadoop—which has its own vision for data lakes—as another probable outcome.

What Kind of Data Will Fill Tomorrow’s Lakes?

Bargiel and Chatzistamatis both see Nuix reaching out to endpoints for sensitive data; creating connection streams to Twitter, Facebook, and other social media; and intelligence from financial markets, HR databases, and anti-money laundering databases for more visibility across the whole of your data.

David Robinson, a senior project manager and Solutions Consultant, said, “I am excited to see how Nuix Endpoint’s newest features and evolution expand our capabilities for intelligent data lakes. Nuix Endpoint can now proactively scan and identify potential privacy, PII, etc. AND collect that data from where the users sit. It creates tributaries of inbound data with surgical precision.”

We will continue to expand our capabilities and consciousness at Nuix, and there’s more to come as we help customers create and index data lakes, now and into the future. Keep an eye on the blog, upcoming webinars, and events to stay updated on how we’re helping our customers answer their greatest challenges!