Big batch processing on Nuix in AWS
Written by Stephen Stewart
People have been asking about running the Nuix Engine in the cloud for nearly a decade. In fact, I created my original AWS account in 2011, almost exactly 10 years ago, to help answer that question. The cool thing is that, even then, Nuix ran just fine on an AWS EC2 instance. The reality is that Nuix will run great on any supported OS regardless of where it runs: cloud, virtual machine, ‘big iron,’ the server under the desk, or even the laptop at the coffee shop.
In all these years, the one constant with the Nuix Engine has been that the faster the gear, the faster the Engine will run. This was great in the early days, when organizations made big hardware investments. As organizations increasingly virtualized their data centers and started exploring the cloud, the costs of replicating on-premises architecture in the cloud became a barrier. Replicating the combination of big iron, lots of fast disks, and networked storage is possible, but for a long time, it didn’t deliver on the dream of utility computing: the simplicity of “turning out the lights” when you are done so the meter stops spinning.
FAST FORWARD TO 2020
In August 2020, David Sitsky, Nuix’s Engineering Founder and Chief Scientist, was working with a customer to package the Nuix Engine into a Docker container and deploy it into AWS Fargate. This would allow Nuix to spin up, run against ‘small batches’ of data, like emails or documents, as they arrived in S3, and then spin down again. He was able to get it running, and it ran great!
He then moved on to AWS Batch, which gave the full power of the Nuix Engine via Amazon Elastic Container Service (ECS) for running against massive data sets. This was amazing because it effectively means that from the AWS Batch UI/API you can spin up as many Nuix Engine instances as AWS will allow (and your credit card will bear) in an instant.
With these options at your disposal, you don’t have to worry about provisioning hardware or managing, patching, protecting, and upgrading servers. In addition, it’s the standard Nuix Engine running in a Docker container, and since it’s creating a regular Nuix case, the entire Nuix stack just works.
DEALING WITH PERSISTENCE
Even all this capability didn’t solve the case persistence issue. Basically, it didn’t make sense to create cases on EBS storage only to have to copy them all over the place and make sure that they were mounted to the correct EC2 instance.
Enter Amazon FSx for Lustre, AWS’s managed offering of Lustre, a high-performance distributed file system that grew up in the scientific computing world. Interestingly enough, that is where David got his start at Uni, building parallel processing systems to compute atmospheric models. The immediate value of Lustre was that it offered a high-performance shared file system that could be easily mounted to as many Linux machines running Nuix as you wanted. The kicker with FSx for Lustre is that it can be linked to an S3 bucket, meaning that you can store your evidence and even archive your Nuix cases out to S3!
This means you get:
- A high-performance shared file system
- Ease of use because all the systems can access it
- Durability because you can flush the Nuix cases out to S3 when you are done
- Cost-effectiveness because you don’t have to keep the Lustre file system continuously up and running.
WHAT ABOUT DEPLOYMENT?
With AWS Batch, ECS, S3, Lustre, and Nuix’s Cloud License Server (a key piece of the puzzle for smooth operation in cloud environments) we have all the pieces. That only really leaves the question of deployment. After all, getting all this stuff working together must be hard, and it was! But Sits took the time to create an AWS CloudFormation template that allows you to spin the whole thing up in your own AWS tenant in minutes.
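To make the deployment step a little more concrete, here is a minimal sketch of how a stack launch could be scripted. It only builds the argument list for the real `aws cloudformation create-stack` command; it does not call AWS. The stack name, the template URL, and the parameter name `LicenseServerUrl` are hypothetical placeholders, not values from the actual Nuix template, and `CAPABILITY_IAM` is included on the assumption that the template creates IAM roles.

```python
import shlex


def build_create_stack_command(stack_name, template_url, parameters):
    """Return the argv list for `aws cloudformation create-stack`.

    `parameters` maps template parameter names to values. The names used
    in the example below are hypothetical; check the real template for
    the parameters it actually declares.
    """
    command = [
        "aws", "cloudformation", "create-stack",
        "--stack-name", stack_name,
        "--template-url", template_url,
        # Assumption: the template creates IAM resources, so the
        # deployer must acknowledge that capability explicitly.
        "--capabilities", "CAPABILITY_IAM",
    ]
    if parameters:
        command.append("--parameters")
        # Sorted for deterministic output. Values containing commas
        # would need the JSON form instead of this shorthand.
        for key, value in sorted(parameters.items()):
            command.append(f"ParameterKey={key},ParameterValue={value}")
    return command


if __name__ == "__main__":
    argv = build_create_stack_command(
        stack_name="nuix-batch-demo",                       # hypothetical
        template_url="https://example.com/nuix-batch.yaml", # hypothetical
        parameters={"LicenseServerUrl": "https://cls.example.com"},
    )
    # Print a copy-and-paste-ready shell command.
    print(shlex.join(argv))
```

Printing the command rather than running it keeps the sketch safe to try; the same list could be handed to `subprocess.run` once the placeholders are replaced with real values.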
Once the environment is up and running you can submit jobs via the AWS Batch UI or the AWS CLI. They both work equally well, and both result in Nuix cases stored on the Lustre filesystem that can be accessed using Nuix Workstation or Nuix Investigate (also spun up as batch jobs in AWS).
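As an illustration of the CLI route, the sketch below assembles a single `aws batch submit-job` command. The flags (`--job-name`, `--job-queue`, `--job-definition`, `--container-overrides`) are standard AWS CLI options, but the queue name, job definition name, and environment variable names are assumptions made up for this example; the real deployment defines its own.

```python
import json
import shlex


def build_submit_job_command(job_name, job_queue, job_definition, environment):
    """Return the argv list for `aws batch submit-job`.

    `environment` maps environment variable names to values; they are
    passed to the container through the containerOverrides structure.
    """
    overrides = {
        "environment": [
            {"name": name, "value": value}
            for name, value in sorted(environment.items())
        ]
    }
    return [
        "aws", "batch", "submit-job",
        "--job-name", job_name,
        "--job-queue", job_queue,
        "--job-definition", job_definition,
        "--container-overrides", json.dumps(overrides),
    ]


if __name__ == "__main__":
    argv = build_submit_job_command(
        job_name="process-mailbox-001",
        job_queue="nuix-job-queue",          # hypothetical queue name
        job_definition="nuix-engine-job",    # hypothetical job definition
        environment={
            # Hypothetical variables a Nuix container might read.
            "EVIDENCE_PATH": "s3://example-bucket/mailbox-001.pst",
            "CASE_PATH": "/fsx/cases/mailbox-001",
        },
    )
    print(shlex.join(argv))
```

Keeping the overrides as JSON avoids quoting problems when a value contains characters that the CLI shorthand syntax treats specially.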
SCALING TO YOUR REQUIREMENTS
Over the years, I’ve worked with countless customers to size their environment and, without fail, the day comes when they have a job that is too big for their current hardware environment. They then find themselves in a position where they are shuffling resources, reprioritizing, and basically stressing.
Running Nuix jobs in your own AWS tenant has never been easier. In the video below, I run through the following points:
- Deploying the entire environment with an AWS CloudFormation template
- Creating jobs manually
- Running a bunch of jobs by copying and pasting commands from a spreadsheet
- Opening all the cases from an instance of Nuix Workstation running on a Linux VM
- Opening those same cases, stored on the same shared file system, via Nuix Investigate.
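The spreadsheet step can also be scripted rather than pasted by hand. The following is a rough sketch, not the method shown in the video: it reads rows from CSV text (for example, a spreadsheet exported as CSV) and produces one `aws batch submit-job` command per row. The column names, the queue and job definition names, and the `evidencePath` and `casePath` job parameters are all hypothetical; it assumes the job definition references them as `Ref::evidencePath` and `Ref::casePath`.

```python
import csv
import io
import shlex

# Hypothetical spreadsheet export: one row per job.
SAMPLE_CSV = """job_name,evidence_path,case_path
custodian-smith,s3://example-bucket/smith.pst,/fsx/cases/smith
custodian-jones,s3://example-bucket/jones.pst,/fsx/cases/jones
"""


def commands_from_csv(csv_text, job_queue, job_definition):
    """Return one shell-ready `aws batch submit-job` command per CSV row."""
    commands = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # AWS CLI shorthand for a string map: key=value,key=value.
        # Values containing commas would need the JSON form instead.
        parameters = (
            f"evidencePath={row['evidence_path']},"
            f"casePath={row['case_path']}"
        )
        argv = [
            "aws", "batch", "submit-job",
            "--job-name", row["job_name"],
            "--job-queue", job_queue,
            "--job-definition", job_definition,
            "--parameters", parameters,
        ]
        # shlex.join quotes any argument the shell would misread.
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    for line in commands_from_csv(
        SAMPLE_CSV,
        job_queue="nuix-job-queue",        # hypothetical
        job_definition="nuix-engine-job",  # hypothetical
    ):
        print(line)
```

Because each command is printed rather than executed, the output can be reviewed before anything is submitted and the meter starts running.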
Talk to your Nuix account manager or, if you’re considering using Nuix for the first time, visit our Contact Us page to request more information.