Getting started with Scripting on the Nuix Engine
Written by Cameron Stiller
The Nuix Engine can be applied in many ways to many problems. When I try to explain the Nuix Engine to people who are not from the legal or eDiscovery sectors, I often use the analogy of a universal unzip tool that will not only extract the binaries of child objects but also their text and metadata. It will also identify what those extracted items are, optionally storing them in a case for further indexing and searching.
Considering that the Nuix Engine supports many thousands of MIME types, including zips, it is most likely the most extensive data extraction tool in existence. The aim of this article is to introduce you to the main entry points for building a script that will allow you to tap into the power of the Nuix Engine.
All good dev posts include a link to documentation! You can access the documentation you need by visiting our download site (note, you’ll need a username and password to access this site) or by opening Nuix Workstation and going to Help → Help Topics.
I recommend you start your journey with the script console open (in Nuix Workstation, go to Scripts → Script Console) and the documentation nearby.
WHERE CAN I RUN SCRIPTS?
You can run scripts a few ways with the Nuix Engine. Specifically, they can be run on the console, on the welcome screen, inside a case, at the worker, or in the results view table. Before starting your script, think about what sort of automation you intend to create.
Let's start with the command line options. Open the terminal of your choice, navigate to the location where Nuix is installed, and run the following command (in this case, assuming Nuix is installed on Windows inside Program Files\Nuix\Nuix 9.0):
If you have trouble with this, you may need to add one more switch for workers. If you have multiple licenses, you’ll be prompted to select a license. The good news is, those switches the tool suggests can be copied so you won’t get prompted next time. Stash them away and we’ll revisit them soon:
When the screen shows “irb” you’re good to go:
This is the Interactive Ruby console. You can interact with the Nuix Engine directly here without any prewritten script. For example, we can run commands like this, which will output ‘Hello World’ with the license description.
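A minimal sketch of such a command, assuming the console exposes the `$utilities` global and that the license description is reachable as `licence.description` (JRuby shorthand for `getLicence().getDescription()`). Both are assumptions to check against the bundled API documentation. The nil guard is only there so the snippet also runs in a plain Ruby interpreter.

```ruby
# Build the greeting. `utilities` is expected to be the Nuix utilities object;
# the `licence.description` call chain is an assumption to verify in the API docs.
def hello_message(utilities)
  description = utilities ? utilities.licence.description : "no Nuix license available"
  "Hello World: #{description}"
end

# Inside the Nuix console $utilities is provided for you; elsewhere it is nil.
puts hello_message($utilities)
```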
While this approach has its uses, it’s not the most convenient way to write a script. I’ve fallen in love with using the interactive console for MIME type detection on a file or running a quick test on Nuix Engine capabilities.
Command Line Script
The inevitable question that follows is “How do I save my script?” Writing code every time is a real pain!
Remember those switches we stashed away earlier? Bring them out and let's play. Remove the “-interactive” switch and append the licensing switches. The very last parameter is a prewritten script (Ruby, Python, ECMA) followed by the inputs.
Any parameters following the script are passed to the script as its arguments rather than being treated as Nuix arguments. Make sure the script is the last Nuix parameter and is followed only by the inputs it needs. In the example above, "Cameron" will be passed along to the script.
At this point, you may ask “What’s the point?” You already have the interactive console, and all that has changed is that the code now lives in a file. That’s not entirely true! You now have a reusable script, and you can write code to capture the inputs. Combine that with the switches for all our licensing needs and the script can be run completely headless. Headless scripts are super useful for scheduled jobs or automation pieces triggered by another source. They will spin up, consume a license, do their automation and then close.
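As a rough illustration of capturing those inputs, here is a small sketch. It assumes the trailing parameters reach a Ruby script through the standard `ARGV` array; the `greeting` helper and the "stranger" fallback are made up for this example.

```ruby
# Return a greeting for the first script argument, or a fallback when none is given.
def greeting(args)
  name = args.first
  name = "stranger" if name.nil? || name.strip.empty?
  "Hello #{name.strip}"
end

# ARGV holds whatever followed the script path on the command line.
puts greeting(ARGV)
```

Keeping the argument handling in a small method makes it easy to exercise the script without spinning up the Engine.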
Some of the crazy ideas I have seen using this approach include:
- A small Windows menu to right-click a file and send the file to Nuix for processing automatically
- Conducting daily case audits in a directory
- Migrating cases and creating reports
NUIX WORKSTATION: SCRIPT CONSOLE/SCRIPT MENU
What if you want to hold a license and run multiple scripts? Say hello to Nuix Workstation and the scripts menu/script console, which are available even before opening a case. Clicking the scripts menu shows a list of installed scripts. Start in the script console; it feels much like a script editor and lets you write and execute scripts.
The script console is my favorite place to hang out when you get down to it. Drafting and testing scripts requires an iterative approach: write, test, break, write, test, break. Testing from the command line drives me nuts, because I have to wait for the Engine to spin up when I only want to try a quick bit of code. The script console is by far the script writer’s preferred place to be, with one major catch: there is no auto-saving of drafts (there is an in-memory cache, but it only lasts for the session). If you accidentally close that window or case, your script goes poof. Ah man! What a bummer!
On the upside, the script console lets you cancel a running script, so if a bit of code is misbehaving you can easily stop it.
Introducing The Script Directory
Anything in the script directory is here to stay. Scripts in the directory can also be run on demand under the scripts menu, which is a nice user experience. You can even style these scripts by wrapping them in a ‘.nuixscript’ wrapper (see Scripting → Advanced Help in the changelog). However, once started a script cannot be canceled.
Scripts in the script directory can also carry a small amount of header information to control whether they require selected items or an open case. When a requirement is not met, the script is automatically disabled in the menu so users aren’t tempted to click it.
Let's get serious now: users are not big fans of automation if they have no real ability to interact with it. Good news! You have access to everything that is in Java, so Swing, input boxes, JDialog and all that goodness is at your fingertips. If doing that is not up your alley you can jump on our GitHub to see some examples or pull one of our utility jars down (my favorite is the nx.jar).
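A sketch of what such a header can look like in a Ruby script. The three comment keys follow the form commonly seen in published Nuix scripts, but treat their exact spelling as an assumption and confirm it in the scripting help. The menu title and the `selection_report` helper are made up for illustration.

```ruby
# Menu Title: Count Selected Items
# Needs Case: true
# Needs Selected Items: true

# Report the size of the selection the script was launched with.
def selection_report(items)
  count = (items || []).size
  "#{count} item#{count == 1 ? '' : 's'} selected"
end

# $current_selected_items is supplied by Nuix Workstation; elsewhere it is nil.
puts selection_report($current_selected_items)
```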
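For a taste of what that interaction can look like, here is a small sketch that asks a question with a Swing input box. It assumes the script runs under JRuby with a display available; on any other Ruby, or on a headless machine, it falls back to the default. The helper names and the prompt are hypothetical.

```ruby
# Tidy up whatever the dialog returned; nil means the user cancelled.
def clean_answer(raw, default)
  answer = raw.to_s.strip
  answer.empty? ? default : answer
end

# Ask the user a question with a Swing input box when running on JRuby
# with a display available; otherwise fall back to the default.
def ask_user(prompt, default)
  raw = nil
  if RUBY_PLATFORM == "java"
    require "java"
    unless java.awt.GraphicsEnvironment.isHeadless
      raw = javax.swing.JOptionPane.showInputDialog(prompt)
    end
  end
  clean_answer(raw, default)
end

puts "Custodian: #{ask_user('Custodian name?', 'Unknown')}"
```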
EVEN MORE POSSIBILITIES
We’ve now talked about how to interact directly with the Nuix Engine from a script and from within Nuix Workstation. There are two more ways to run scripts that are unique in their design.
Scripted metadata can be created via the metadata profile. Have you ever wanted a column formatted in just a certain way or to provide a combination of fields based on a condition? Scripted metadata is generated at the time the results view is shown. It is evaluated separately for each row of the results view, so be careful when connecting to external resources (think thread safety and many IO requests), but otherwise, it can do some amazing things.
I wouldn’t recommend it because of the IO demands of the case, but you could even tag an item every time it is viewed or calculate the result on the first view and then cache it as custom metadata for next time. If the calculation takes a long time this may be worthwhile, but personally, I prefer to do a bulkAnnotater job!
The benefits of scripted metadata really come into play when you want to present a value to a user in a particular way based on a condition but only on demand. This can be instrumental in making an export profile look perfect or show only some details when relevant and not store them in the case. Lots of fields and data may bloat your case, so having data only on demand can shrink this down.
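To make the "particular way based on a condition" idea concrete, here is a sketch that shows a human-readable size only when there is one to show. It assumes the row's item is exposed as `$current_item`, that `file_size` (JRuby shorthand for `getFileSize()`) is available on it, and that the value of the last expression becomes the column value. Verify all three against the API documentation.

```ruby
# Format a size in bytes for display, but only when there is something to show.
def display_size(bytes)
  return "" if bytes.nil? || bytes <= 0
  return "#{bytes} B" if bytes < 1024
  return format("%.1f KB", bytes / 1024.0) if bytes < 1024 * 1024
  format("%.1f MB", bytes / (1024.0 * 1024.0))
end

# Inside a metadata profile $current_item is the row's item; elsewhere it is nil.
item = $current_item
display_size(item ? item.file_size : nil)
```

Nothing is written back to the case here; the value is worked out each time the row is displayed.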
There are some examples of scripted metadata available on GitHub -> Scripted Metadata Profiles.
This simple scripted metadata example will display the datestamp of when the item was last processed:
Worker Side Scripts
These scripts are potentially the most versatile as they operate on the item as it gets processed by the workers. This means that without any developer effort, we now have access to threads that can operate continuously across billions of items. With worker-side scripts you can focus on filtering, hydrating, morphing and reporting on the data being processed, in-flight, prior to the item being written to the Nuix index.
Before I go any further, you can get the worker-side script guide on our download site as well. As I mentioned earlier, you will need a username and password to access that site.
For example, what if I was provided with a flat list of CSV records? Utterly boring to process, right? Why bother? Well, if I was told they were call records and the client wanted to have them appear in communication searches and the timestamps of the communication could overlay the activities of the investigation, my opinion about the list would totally change.
With the worker side script, we can look at the properties brought in by the CSV, move each of the columns (morphing) into the appropriate communication field and use an external source to map each phone number to an alias (hydrating). Once completed, it’s likely that billions of records would be deemed as too many to review (who would have thought, right?).
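The morphing and hydrating steps can be sketched in plain Ruby. The column names (`caller`, `callee`, `start_time`), the alias table and the output field names are all hypothetical, and writing the result back onto the item through the worker-side API is deliberately left out; the worker-side script guide covers that part.

```ruby
# Hypothetical lookup from phone number to a known alias (the "hydrating" source).
ALIASES = {
  "+61400000001" => "Alice Example",
  "+61400000002" => "Bob Example"
}.freeze

# Move raw CSV columns into communication-style fields ("morphing") and
# replace numbers with aliases where one is known ("hydrating").
def morph_call_record(row, aliases = ALIASES)
  {
    from: aliases.fetch(row["caller"], row["caller"]),
    to: aliases.fetch(row["callee"], row["callee"]),
    sent_at: row["start_time"]
  }
end

record = {
  "caller" => "+61400000001",
  "callee" => "+61499999999",
  "start_time" => "2021-03-01T09:15:00Z"
}
p morph_call_record(record)
```

Numbers without a known alias are passed through unchanged, so nothing is lost when the lookup source is incomplete.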
Supplying a list of known ‘exempt’ internal numbers to another worker side script allows it to check for any of the exempt aliases calling external numbers not owned by a staff member, culling the records from billions to about 6,000 in one shot. This is much easier to review, made even easier by searching across dates and custodians with analytics.
For good measure we can also add some limited output for reporting, so when it comes to proving our methods we have a huge log file of skipped records, who they belonged to and why they were skipped.
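The culling and reporting described above might look roughly like this. The number lists and property names are hypothetical, and the callback name `nuix_worker_item_callback` together with the `source_item`, `properties`, `name` and `set_process_item` calls are assumptions to check against the worker-side script guide.

```ruby
require "set"

# Hypothetical reference lists supplied to the script.
EXEMPT_NUMBERS = Set["+61400000001"].freeze
STAFF_NUMBERS  = Set["+61400000001", "+61400000002"].freeze

# Keep only calls made by an exempt internal number to a number that no
# staff member owns. Returns [keep?, reason] so skipped records can be reported.
def review_decision(caller, callee, exempt = EXEMPT_NUMBERS, staff = STAFF_NUMBERS)
  return [false, "caller is not on the exempt list"] unless exempt.include?(caller)
  return [false, "callee belongs to a staff member"] if staff.include?(callee)
  [true, "exempt caller contacting an external number"]
end

# Worker-side entry point (assumed name and API, see the lead-in).
def nuix_worker_item_callback(worker_item)
  source = worker_item.source_item
  props = source.properties
  keep, reason = review_decision(props["caller"], props["callee"])
  return if keep
  # Record what was skipped and why, then stop the item from being processed.
  puts "SKIPPED #{source.name}: #{reason}"
  worker_item.set_process_item(false)
end
```

Keeping the decision in its own method means the rule can be tested against sample records long before any data goes near a worker.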
WHERE TO START YOUR JOURNEY
Documentation is bundled with the product and is also available online. Check out the Nuix package section and scroll to the section titled 'Scripting Context'. These are your main entry points:
|Object|Description
|currentCase|The case currently open in the application.
|currentSelectedItems|The items which the user has selected or, if viewing a single item, a list containing just that item. The ordering of the items in the list is not defined.
|utilities|Provides access to utilities.
|NUIX_VERSION|The current version of the Nuix application running the script.
|currentItem|This object is only available for scripted metadata identifier. It represents the item being evaluated by the scripted metadata identifier.
Any script you run inside the Nuix Engine will in some aspect interact with at least one of these script context objects. Be wary of blindly expecting a value to be present, however. currentCase will be null if a case is not open; currentSelectedItems will be null if a case is not open, or an empty list if a case is open but nothing is selected. Use the script bundle/header options mentioned earlier so that the script menu will disable the script if the context does not support what is required.
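A defensive check along those lines could be sketched as follows, assuming the Ruby globals `$current_case` and `$current_selected_items` are how the context objects reach the script (the `context_summary` helper is made up for this example):

```ruby
# Describe what the script has to work with, without assuming anything is present.
def context_summary(current_case, selected_items)
  return "No case is open" if current_case.nil?
  return "Case open, nothing selected" if selected_items.nil? || selected_items.empty?
  "Case open, #{selected_items.size} item(s) selected"
end

# Both globals are nil when the script runs without a case (or outside Nuix).
puts context_summary($current_case, $current_selected_items)
```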
What’s the point of a script with no case? The script could create the case as part of your workflow, or the script could prompt for another script to perform some action without opening it to the user (perhaps a report or an export).
CASE? WHO NEEDS ONE?
Users of Nuix Imager will know what I am talking about. Nuix Imager is a product installed alongside Nuix Workstation as part of the bundle. With Nuix Imager, a Nuix Engine instance is created with sourceItemFactory; this factory provides access to the same extraction component that is used by the Nuix workers and by the prefilter dialog you see in Nuix Workstation.
Of course, not using workers has two major drawbacks. You lose the ability to use a case index for searching and you lose the ability to extract in massive quantities across workers. If all you want is to present a list 'on demand' when the user expands or collapses the structure, you don’t need heavy lifting to do that. Another helpful trick with sourceItemFactory is peeking at the MIME type of an item before processing it, to make sure it matches its extension and is not corrupted before you waste time working on it.
I WANT TO BUILD AN APPLICATION
The Nuix Engine, available on our download site, comes with documentation and the ability to build your custom products on it. We'll be releasing some pretty cool baseline templates to support your innovations and creativity in the near future.
Scripts At A Distance
The Nuix Engine is easily applied to any logic that involves the extraction, hydration, morphing, reporting and indexing of data for searching. What happens if you want to do work with the Nuix Engine at a distance? Let's say your users are on low spec machines and you want to build one application hub they can all attach to.
You’re now in the world of Nuix RESTful. REST is an approach that allows applications or websites to do their work on a different machine from the one the user is working on. This is the classic scenario of a review platform, where you may have thousands of users interacting with a cluster of beefy Nuix boxes that host the case indexes.
It's Not Enough! I Need More!
More? Consider all we have talked about so far and put the concept of BPMN at the front of your mind. BPMN is a way of defining processes in a standard and universal way. With these you could interact with REST or the Java engine directly and manage a pipeline of scripts.
There are also the Nuix Investigate workflows, which can be used to call a script on our review platform to perform any Nuix Engine actions available in the Nuix Engine API.
There’s so much more that the Nuix Engine can do. We are already drafting the framework that allows custom MIME types and evidence sources. When that comes there will be an explosion of interest in how we integrate with other products. Start thinking about all the evidence sources you could potentially make a connector for and sell in partnership with Nuix.
THAT’S GOT TO BE IT
We are intensely proud of the Nuix Engine and what it can do. From automating an existing workflow to a fully-fledged pipeline management tool, the possibilities are truly endless.
I’ll leave you with a handy table as a reference to help you kickstart your development efforts.
|In front of Nuix Workstation
|I am still tinkering/testing
|Nuix Console -interactive
|Scripting -> Scripting Console (make sure you save it!)
|RESTful -> PUT userScripts
|Nuix Console scriptpath.rb
|Script Directory in nuixscript package
|Nuix Investigate Workflow
|Compiled Jar of a Nuix Engine project
|Java Application wrapping Engine project
|Web application wrapping REST OR BPMN tool