Powerful data enrichment is just an API away
Written by Stephen Stewart
The Nuix Engine is superb at extracting text and metadata from all manner of data, enriching it with additional metadata, and then storing it in a searchable index. Our customers and partners are always looking to push the boundaries of what they can achieve using our software. We’re often asked about advanced linguistics with NLP, translation, transcription, geo-IP lookup, etc. The list goes on and on. The reality is the level of innovation in these areas is phenomenal, with entire companies devoted to expertise in one area. This is both an opportunity and a challenge. Basically, how do you choose “the one,” particularly when Nuix helps address so many different use cases?
A meeting last fall with a federal agency where I was asked “What is it exactly that you do?” got me thinking about how to pictorialize the Nuix data ecosystem. The most concise way I could describe it is that we “make data searchable.” This in turn yielded more questions, but the reality is that Nuix is world-class at taking the 10 dimensions of data and putting them into a format that’s useful.
For those of you who have been around Nuix for a while, you will recognize the “10 Dimensions of Data” arrow on the left, the Nuix Engine in the middle, and the various UI screenshots on the right in the diagram below. This is the traditional way we’ve talked about Nuix for a decade.
NOTE: The logos shown above are examples of companies that have APIs. This list does not represent Nuix’s partner list OR a specific endorsement. These logos are merely a sampling of the types of companies that have amazing APIs that can be used to enrich the data in Nuix.
One of the coolest aspects of the technology’s evolution is the sheer variety of ways in which we can enrich data with third-party APIs. This includes everything from translation and transcription to geo-IP lookup.
We are not world-class at translation, transcription, NLP, or any of these other specialized functions. However, our ability to store the enriched data in a searchable, viewable fashion is really cool. It basically means that you can take advantage of best-in-class enrichment without sacrificing the search, view, and tagging experiences in Nuix.
Another great part is that it’s easier than you may think with Nuix’s scripting API. I’ve been on a mission recently to improve my Python skills, and this was about as easy as it gets. The script below iterates through the properties and text of the selected items, sends them to a Docker container running LanguageCrunch, and then writes the extracted entities back as Nuix custom metadata.
Access this script at https://github.com/Nuix/nuix-cto-blog-posts/blob/master/simple.nlp.w.language.crunch.py
This script just pulls the Person entity but could just as easily be adjusted for sentiment analysis OR a custom classification. If you then want to search on it, you can query the custom metadata: custom-metadata:"custom_person_entities":*fred*. With this simple approach, you can really start to think about how you can get a load of extra value out of Nuix. As a bonus, if you’re using the Elastic backend (embedded or standalone), you have the full Nuix search syntax available.
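For readers who want the shape of that enrichment loop without opening the repository, here is a minimal Python sketch. It is not the linked script, and it rests on several assumptions: that LanguageCrunch is reachable at `http://localhost:8080/nlp/parse`, that it takes the text in a `sentence` query parameter, and that its JSON reply carries an `entities` list whose entries have `label` and `text` keys. The function names, and the `get_text` and `write_field` callbacks that stand in for the Nuix read and write-back calls, are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: LanguageCrunch is listening locally on this endpoint.
LANGUAGECRUNCH_URL = "http://localhost:8080/nlp/parse"


def build_parse_url(text, base_url=LANGUAGECRUNCH_URL):
    """Build the GET URL, passing the text in an assumed `sentence` parameter."""
    return base_url + "?" + urlencode({"sentence": text})


def person_entities(response):
    """Return unique PERSON entity strings, in order of first appearance.

    Assumes the response looks like {"entities": [{"label": ..., "text": ...}]}.
    """
    names = []
    for entity in response.get("entities", []):
        name = entity.get("text")
        if entity.get("label") == "PERSON" and name and name not in names:
            names.append(name)
    return names


def fetch_person_entities(text, opener=urlopen):
    """Send text to the service and return the PERSON entities it reports."""
    with opener(build_parse_url(text)) as reply:
        return person_entities(json.load(reply))


def enrich(items, get_text, write_field, lookup=fetch_person_entities):
    """Look up entities for each item and hand them to a write-back callback.

    `get_text` and `write_field` are hypothetical stand-ins for the Nuix calls
    that read an item's text and set its custom metadata.
    """
    for item in items:
        names = lookup(get_text(item))
        if names:
            write_field(item, "custom_person_entities", "; ".join(names))
```

Keeping the lookup and the write-back as plain callables means the parsing logic can be tried against a canned response before any container or case is involved. Swapping `person_entities` for a function that reads a sentiment score or a custom classification would follow the same pattern.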
The world has evolved, and some organizations are now looking at us beyond our traditional use cases. They are considering using Nuix as part of a search and analytics pipeline, where in some instances all they want to do is extract the text and metadata and “Export cleansed, enriched, usable information” out to an AWS S3 bucket / Azure Blob, where they can target it with whatever analytics platform they choose.
This is also easier than you may think. Below is a Ruby worker-side script that will blast items out as JSON. It requires the SuperUtilities.jar, which you can get from the Nuix Developer Portal.
Access this script at https://github.com/Nuix/nuix-cto-blog-posts/blob/master/WSS-SuperUtilities-JSON.rb
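The linked script is Ruby and relies on SuperUtilities; the sketch below is only a rough Python illustration of the underlying idea, which is to flatten an item's text and metadata into one JSON document that a downstream analytics platform can read. The record layout and the function names are assumptions made for this example, not the format the Ruby script produces.

```python
import json
from datetime import datetime


def item_record(guid, name, properties, text):
    """Bundle an item's identifying fields, metadata, and text into one dict.

    The key names here are illustrative, not a Nuix export schema.
    """
    return {"guid": guid, "name": name, "properties": properties, "text": text}


def to_json_line(record):
    """Serialize a record as a single line of JSON.

    default=str converts dates and other non-JSON values to strings, and
    sort_keys keeps the output stable between runs.
    """
    return json.dumps(record, default=str, sort_keys=True)


if __name__ == "__main__":
    record = item_record(
        guid="example-guid",
        name="memo.txt",
        properties={"Author": "Fred", "Date": datetime(2020, 4, 1)},
        text="Quarterly numbers attached.",
    )
    print(to_json_line(record))
```

One JSON document per item, one per line, is a common shape for loading into object storage, since most analytics tools can read it without a custom parser.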
Lastly, we’re doing some work that will allow you to post a Nuix GUID query to Nuix Investigate and have those items show up in the search grid. This is especially helpful if you farm out analytics as part of your workflow.
While many of us are currently confined and working remotely, there’s an opportunity to think about how you can take full advantage of the amazing collection of data enrichment opportunities that are just an API away.
If you want to hear more, check out our recent podcast episode where I talked more about the Nuix APIs, and don’t hesitate to check out the Nuix Community for additional ideas.