Get on the Bus! The Kafka Bus for Realtime Processing and Investigations
As the world worked to respond to COVID-19, almost every company in the world asked, “What can we do to help?” At Nuix, most of our use cases are geared towards protection, detection, and investigation, and while we will be there to help combat the inevitable fraud that most experts predict is coming, I struggled to get my head around how we could apply Nuix’s software to help now.
Then I saw a demo that Khaled Hegazy, one of our solution consultants in the Middle East, put together of Nuix being used to investigate the propagation of COVID-19. It did a great job of showing how the basic features of Nuix—data processing, search, timelining, mapping, etc.—could be applied to this problem.
However, even with a demo as inspiration, I couldn’t get my head around “the data.” The reality is that contact tracing is not a digital forensic investigation. There are no massive quantities of data that need to be processed. Instead, contact tracing is being done by armies of people armed with clipboards and telephones, calling people and asking where they’ve been and with whom they have been in contact.
Not daunted at all, I started to think about the problem further. It sounded something like this:
“Hmm, I wonder if we could build a web front end that collects data in real time from contact tracers and instantly uploads it into Nuix, where it’s ready to be searched, timelined, and plotted on a map?”
This simple idea, combined with a little creativity and some teamwork, has reshaped the boundaries of where Nuix’s software can be deployed. In under five hours, I was able to build a very simple web app that takes web input, converts it to JSON, and drops it onto a Kafka topic. Nuix Workstation then picks it up in real time, indexes it, and makes it available for search in Nuix.
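To make the "web input to JSON to Kafka topic" step concrete, here is a minimal sketch of what that hand-off could look like. It is not the code from the hackathon repo. The field names, the `contact-tracing` topic name, and the helper functions are all hypothetical, and the producer assumes the third-party kafka-python package and a broker on `localhost:9092`.

```python
import json
from datetime import datetime, timezone


def build_contact_record(form):
    """Turn submitted form fields into a JSON-serializable dict.

    The field names here are hypothetical, chosen only for illustration.
    """
    contacts = [c.strip() for c in form.get("contacts", "").split(",")]
    return {
        "reported_at": datetime.now(timezone.utc).isoformat(),
        "person": {"name": form["name"], "phone": form.get("phone", "")},
        "location": form.get("location", ""),
        "contacts": [c for c in contacts if c],  # drop empty entries
    }


def publish_record(producer, topic, record):
    """Serialize the record as UTF-8 JSON and send it to the given topic.

    `producer` is anything with a send(topic, value) method, such as a
    kafka-python KafkaProducer.
    """
    payload = json.dumps(record).encode("utf-8")
    producer.send(topic, payload)
    return payload


def make_kafka_producer(bootstrap_servers="localhost:9092"):
    """Create a real producer (assumes kafka-python and a running broker)."""
    # Imported lazily so the helpers above can be tried without Kafka.
    from kafka import KafkaProducer
    return KafkaProducer(bootstrap_servers=bootstrap_servers)
```

In a Flask route, you would pass the submitted form data to `build_contact_record`, then call `publish_record(producer, "contact-tracing", record)` and flush the producer. Keeping the producer as a parameter makes it easy to swap in a stand-in while testing the web side.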
At this point, I was confident that I was onto something cool, but I was quickly getting out of my depth. With that, I reached out to Dan Berry, Head of Innovation, for some help. He proposed we hold a “Hackathon” to build on the basic concept I had outlined. We wanted to keep it simple for the first one, so we opted for an internal-only event.
NOTE: Plans are in the works to run some more of these but with the entire Nuix ecosystem.
‘Hacking’ a Solution
We had about 15 people participate, and the results were amazing. In about five hours we were able to take a basic prototype and flesh out something that works pretty darn well :) The key is that most of the heavy lifting and infrastructure was ‘out of the box’ Nuix. The only custom stuff was the web front end, including:
- A custom user interface (big shout out to Cameron Stiller from Dan’s team down in Oz for taking on the UI)
- The Python Flask web server
- Some worker-side scripting from Master Solution Consultant John Bargiel.
If you want all the gory details, check out https://github.com/Nuix/nuix-contact-tracing. The GitHub repo contains the web app, the worker-side script, and some links on how to get Kafka set up.
The key prerequisites are:
- Nuix Workstation with real-time evidence type
- Elasticsearch 6.8 - Embedded Elastic cases will work fine
- Kafka installed and running
- Python Flask Web Server.
Check out this video to see it in action.
About that “Bus”
For those thinking about “getting on the bus,” it is crazy simple. If you drop a JSON document onto a Kafka topic, Nuix will extract it and convert the JSON into item properties. Nested properties are handled by prefixing each key in the key-value pair.
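To picture what "prefixing each key" means for nested JSON, here is a small sketch of the general flattening idea. This is an illustration only, not Nuix's actual implementation: the `flatten` helper is hypothetical, and the `.` separator is an assumption, so the property names Nuix really produces may look different.

```python
def flatten(obj, prefix="", sep="."):
    """Flatten nested dicts into one level, prefixing nested keys.

    Illustrative only: the separator and naming scheme are assumptions.
    """
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the parent key along as the prefix.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


record = {
    "person": {"name": "Alex", "address": {"city": "Sydney"}},
    "location": "Cafe",
}
properties = flatten(record)
# properties == {"person.name": "Alex",
#                "person.address.city": "Sydney",
#                "location": "Cafe"}
```

The upshot is that a nested document ends up as a flat set of named properties, which is the shape you need for searching and filtering on individual fields.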
If you really want to get crazy, you can encode an uploaded file as a blob in the JSON, then use a Nuix worker-side script (WSS) to decode it and add it as a child item.
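The blob trick is easier to see with a round trip. The sketch below assumes Base64 as the encoding and uses hypothetical field names (`attachment`, `filename`, `content_b64`). It only covers the encode and decode halves in plain Python. Attaching the decoded bytes as a child item is Nuix-specific worker-side scripting and is not shown here.

```python
import base64
import json


def embed_file(record, filename, data):
    """Return a JSON string with the file bytes embedded as Base64 text."""
    out = dict(record)  # shallow copy so the caller's dict is untouched
    out["attachment"] = {
        "filename": filename,
        "content_b64": base64.b64encode(data).decode("ascii"),
    }
    return json.dumps(out)


def extract_file(message):
    """Decode the embedded file from a JSON message back into bytes."""
    record = json.loads(message)
    attachment = record["attachment"]
    return attachment["filename"], base64.b64decode(attachment["content_b64"])


message = embed_file({"person": "Alex"}, "notes.txt", b"met at the cafe")
name, content = extract_file(message)
```

Base64 inflates the payload by roughly a third, so for large uploads it is worth checking the message size limits configured on your Kafka broker.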
If you check out https://github.com/Nuix/nuix-contact-tracing you’ll see that this isn’t terribly complicated stuff. The majority of the work is handled by the Nuix stack!!!
Take a look, and I hope that you will get on the bus and open up all sorts of new use cases with Nuix!
Take care and stay healthy!!!