
Python on the Nuix Engine Part 2: Integrating Nuix into Your Python Application


In Part 1 of this series, we discussed a few ways you could use Python inside the Nuix Engine. Most commonly, customers use Worker Side Scripts to run relatively simple code during processing, but we focused on integrating a larger external Python application, either by calling it as an external command-line application or by calling into a microservice. In both cases, Nuix Workstation or the Nuix Engine is the driving application and you are integrating Python into your Nuix workflow. This time we will investigate two methods of integrating the Nuix Engine into your Python environment or application:

  1. Using the Engine Java API to call directly into the Nuix Engine
  2. Using the Nuix Engine RESTful Interface

These approaches let you integrate Nuix into existing applications and automation workflows with minimal changes, doing away with most of the manual steps needed to get data in or out of Nuix.

You can access the code repository for this blog post from our GitHub.

Using the Java API

It is relatively easy, and common, to run Python code inside a Java virtual machine: Nuix uses Jython to do that, and much of Part 1 of this series relied on this feature. But you can also do the opposite and run Java code from your Python code base. Several tools can do this; for this blog post I will use the pyjnius module, which you can find in the kivy/pyjnius repository on GitHub and install on Python 3 with pip (pip install pyjnius).

The example presented here is a sort of “hello world” for the Nuix Engine: starting the engine and getting a license. To understand how to use the Nuix Engine’s Java API, read the Javadoc, published online as Java API (Engine API 9.6.10) on nuix.com (if you have the Nuix Engine installed, the docs are also in the engine’s docs subfolder). You can find the code for this part of the post in the engine_in package of the GitHub repository, with usage instructions in the repository’s ReadMe.

Before we use pyjnius, we need to configure the Java environment it will use to access the Nuix Engine’s Java code. This needs to be a JRE or JDK compatible with the Nuix Engine, so I suggest using the JRE shipped with the Engine in the jre subdirectory:

import os

# Location of the Nuix Engine installation; adjust for your machine
nuix_engine_path = r'C:\Projects\nuix-engine'

def initialize_environment():
    """Point pyjnius at the Nuix Engine's bundled JRE and libraries.

    Must run before jnius is imported: pyjnius starts the JVM on import.
    """
    engine_bin = os.path.join(nuix_engine_path, 'bin')
    engine_lib = os.path.join(nuix_engine_path, 'lib', '*')
    engine_ssl = os.path.join(nuix_engine_path, 'lib', 'non-fips', '*')
    engine_jre = os.path.join(nuix_engine_path, 'jre')
    engine_jvm = os.path.join(engine_jre, 'bin', 'server')  # holds the JVM library

    # os.pathsep is ';' on Windows and ':' elsewhere
    classpath = os.pathsep.join(['.', engine_lib, engine_ssl])
    path_update = os.pathsep.join([engine_jre, engine_jvm, engine_bin])

    os.environ['JAVA_HOME'] = engine_jre
    os.environ['CLASSPATH'] = classpath
    os.environ['PATH'] = path_update + os.pathsep + os.environ.get('PATH', '')

initialize_environment()

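initialize_environment() also puts the JRE, the directory holding the JVM library and the engine’s bin directory on the PATH, so the libraries and other binaries the Nuix Engine needs are accessible to the running process. Call it before importing jnius: pyjnius starts the JVM when jnius is first imported and reads JAVA_HOME and CLASSPATH at that point. With the environment ready, we can create an instance of the Nuix Engine. Since the necessary classes are Java classes, we use the autoclass function from pyjnius to import them into Python: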

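from jnius import autoclass

NUIX_USER = 'Inspector Gadget'
USER_DATA_DIR = r'C:\Projects\RestData'

# Load the Java classes as Python proxies
GlobalContainerFactory = autoclass('nuix.engine.GlobalContainerFactory')
Collectors = autoclass('java.util.stream.Collectors')  # used by the licensing code

global_container = GlobalContainerFactory.newContainer()
try:
    # dict_to_immutablemap (not shown) converts a dict to an immutable java.util.Map
    configs = dict_to_immutablemap({'user': NUIX_USER, 'userDataDirs': USER_DATA_DIR})
    engine = global_container.newEngine(configs)
    # ... licensing and any other engine work goes here ...
finally:
    # Release the container when finished
    global_container.close()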

I haven’t shown the utility function dict_to_immutablemap(…), which converts Python dictionaries to an immutable implementation of java.util.Map. We now have an instance of an Engine, but it isn’t licensed yet. To get a license, add the following code inside the try block, before the finally block:

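# Keep references to the Python callback objects so they are not
# garbage-collected while Java still holds them
credentials_callback = PCredentialsCallback()
engine.whenAskedForCredentials(credentials_callback)

license_config = dict_to_immutablemap({'sources': [LICENSE_SOURCE_TYPE]})
worker_config = dict_to_immutablemap({'workerCount': WORKER_COUNT})

# Find licence sources, keep those at the desired location, and collect
# the java.util.stream.Stream into a java.util.List
source_filter = PLicenseSourcePredicate()
found_licenses = engine.getLicensor() \
    .findLicenceSourcesStream(license_config) \
    .filter(source_filter) \
    .collect(Collectors.toList())

acquired = False
for license_source in found_licenses:
    print(f'{license_source.getType()}: {license_source.getLocation()}')
    for available_license in license_source.findAvailableLicences():
        license_short_name = available_license.getShortName()
        if LICENSE_TYPE == license_short_name:
            available_license.acquire(worker_config)
            print(f'Acquired {license_short_name} from [{license_source.getType()}] '
                  f'{license_source.getLocation()}')
            acquired = True
            break  # stop searching this source's licences
    if acquired:
        break  # and stop searching the remaining sources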

This code relies on some constants that I’ve skipped for brevity, and the full version also performs some safety checks on the available worker counts. It also requires two callbacks that must implement Java interfaces. The code below shows how to implement the Java interfaces in Python:

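import os
from jnius import PythonJavaClass, java_method

class PCredentialsCallback(PythonJavaClass):
    """Python implementation of nuix.engine.CredentialsCallback."""
    __javainterfaces__ = ['nuix/engine/CredentialsCallback']

    # JNI signature: takes a CredentialsCallbackInfo, returns void
    @java_method('(Lnuix/engine/CredentialsCallbackInfo;)V')
    def execute(self, info):
        print('Credentials Callback Called')
        # Read the credentials from environment variables, not source code
        info.setUsername(os.environ['nuix_user'])
        info.setPassword(os.environ['nuix_password'])

class PLicenseSourcePredicate(PythonJavaClass):
    """Python implementation of java.util.function.Predicate for licence sources."""
    __javainterfaces__ = ['java/util/function/Predicate']

    # JNI signature: takes an Object, returns boolean
    @java_method('(Ljava/lang/Object;)Z')
    def test(self, licence_source):
        print('License Test Called')
        # LICENSE_SOURCE_LOCATION is one of the constants omitted for brevity
        return LICENSE_SOURCE_LOCATION == licence_source.getLocation()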

These are Python objects that implement Java interfaces. The first can be used as a nuix.engine.CredentialsCallback to provide the credentials to log into the cloud server, while the second is a java.util.function.Predicate to filter down to license sources that connect to the desired location. Again, I have omitted some variable definitions here for code brevity.
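The dict_to_immutablemap(…) helper was also left out. If you would rather write your own than copy the one in the repository, here is a minimal sketch. It is an assumption, not the repository’s implementation: it builds a java.util.HashMap (and a java.util.ArrayList for lists), wraps them with java.util.Collections.unmodifiableMap and unmodifiableList, converts nested values recursively, and leaves strings, numbers and booleans to pyjnius’s automatic conversion. The optional java parameter exists only so the conversion logic can be exercised with stand-in classes instead of a running JVM.

```python
from types import SimpleNamespace


def load_java_collections():
    """Load the Java collection classes through pyjnius.

    jnius is imported here, not at module level, so the JVM is not started
    before initialize_environment() has set JAVA_HOME and CLASSPATH.
    """
    from jnius import autoclass
    return SimpleNamespace(
        HashMap=autoclass('java.util.HashMap'),
        ArrayList=autoclass('java.util.ArrayList'),
        Collections=autoclass('java.util.Collections'),
    )


def to_java(value, java):
    """Recursively convert dicts and lists/tuples to unmodifiable Java collections.

    Scalars (str, int, bool, float) are returned unchanged and left to
    pyjnius's automatic conversion when they are passed to a Java method.
    """
    if isinstance(value, dict):
        java_map = java.HashMap()
        for key, item in value.items():
            java_map.put(key, to_java(item, java))
        return java.Collections.unmodifiableMap(java_map)
    if isinstance(value, (list, tuple)):
        java_list = java.ArrayList()
        for item in value:
            java_list.add(to_java(item, java))
        return java.Collections.unmodifiableList(java_list)
    return value


def dict_to_immutablemap(py_dict, java=None):
    """Hypothetical helper: convert a Python dict to an unmodifiable java.util.Map."""
    if java is None:
        java = load_java_collections()
    return to_java(py_dict, java)
```

Because nothing else holds a reference to the underlying HashMap, the unmodifiable view is effectively immutable. If a setting needs a specific Java type, box it explicitly rather than relying on the automatic conversion.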

Apart from the code I skipped for brevity, that’s all there is to it. With this example, you can create a new instance of the Nuix Engine, claim a license and be ready to use the engine in your Python application. See engine_in.grab_license.py in the GitHub repository linked above for the full code. The code was written in Python 3.9, in an environment you can reconstruct with Anaconda using the environment.yml file provided in the repository.
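If you’d like to unit-test the licence selection without starting a JVM, you can pull the search out of the nested loops into a function. The sketch below uses hypothetical find_licence and acquire_first_matching helpers (not part of the repository); returning from inside the loops ends the whole search at once. Because the functions only call getShortName(), findAvailableLicences(), getType(), getLocation() and acquire(), they work with the objects from the licensing code as well as with plain Python stand-ins. Like the original loop, this assumes pyjnius lets you iterate over the Java collections directly.

```python
def find_licence(licence_sources, short_name):
    """Return the first (source, licence) pair whose short name matches, or None.

    A return inside the nested loops stops every loop at once, which a
    bare `break` in the inner loop cannot do.
    """
    for licence_source in licence_sources:
        for available_licence in licence_source.findAvailableLicences():
            if available_licence.getShortName() == short_name:
                return licence_source, available_licence
    return None


def acquire_first_matching(licence_sources, short_name, worker_config):
    """Acquire the first licence matching short_name; return True on success."""
    match = find_licence(licence_sources, short_name)
    if match is None:
        return False
    licence_source, available_licence = match
    available_licence.acquire(worker_config)
    print(f'Acquired {short_name} from [{licence_source.getType()}] '
          f'{licence_source.getLocation()}')
    return True
```

In the licensing code, the nested loop would then become acquire_first_matching(found_licenses, LICENSE_TYPE, worker_config).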

Nuix RESTful Service

Our final approach for using Python with the Nuix Engine is to call the REST API provided by the Nuix RESTful Service. The RESTful Service is a wrapper around the Nuix Engine that keeps the engine running full time and lets applications connect, claim licenses and do work in the engine as needed. It lets multiple applications and computers share the same Nuix Engine instance and case files, and it lets the Nuix Engine run on servers, on clusters and in the cloud.

Accessing RESTful services from Python isn’t anything new – it’s standard practice. The main module we use is requests, which you can install in any Python environment with pip or using Anaconda. The environment.yml file in the code repository for this post includes all the necessary Python packages.

If you’re following along from Part 1 of this blog series, you might recognize this as the inverse of the Python microservice example: instead of using a REST interface from the Nuix Engine to call into an external Python service, we’re using Python to call the REST interface into the Nuix Engine running as a service.

The API defining the endpoints we’ll use for calling the Nuix RESTful Service is documented in the Nuix REST API Reference. The Nuix SDK site has many examples of how to use the interface.

For this example, we’ll use the code in the restful package in the repository. It’s a complete example that does a paged export of all items in a case: it runs a paged search, tags the items in each page and then exports the items in a particular tag or page. The example has several modules, which we’ll describe in varying levels of detail. To start, let’s look at restful.rest_base.py:

import requests

def post(url, headers, data):
    """POST data (a JSON string) and return (status code, body)."""
    print("POST: " + url)
    response = requests.post(url, headers=headers, data=data)
    print(response.status_code)
    try:
        response_body = response.json()
    except ValueError:
        # The body isn't JSON: return the raw Response instead
        response_body = response
    return response.status_code, response_body

def get(url, headers):
    """GET a URL and return (status code, body)."""
    print("GET: " + url)
    response = requests.get(url, headers=headers)
    print(response.status_code)
    try:
        response_body = response.json()
    except ValueError:
        # The body isn't JSON: return the raw Response instead
        response_body = response
    return response.status_code, response_body

restful.rest_base.py provides support for interacting with the RESTful API: it has functions for making POST, GET, PUT, PATCH, HEAD and DELETE requests to the service. The excerpt shows GET and POST, as they provide the basic outline for all the others. Each takes the full URL (including any query parameters), a dictionary of headers and, when the request has a body, that body as a JSON-encoded string. It makes the matching requests call and returns a tuple of the response status code and the response body, parsed from JSON (usually into a dictionary), or the raw Response object if the body isn’t JSON.
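The other verbs follow the same outline, so they can share one implementation instead of repeating the body of get() and post(). The sketch below is a hypothetical send() helper, not the repository’s code, built on requests.request, which takes the HTTP method as its first argument. It adds one thing the originals lack, a timeout (30 seconds here, an arbitrary choice), because requests otherwise waits indefinitely. The session parameter accepts the requests module, a requests.Session, or a stand-in for testing, which is also why requests is imported lazily.

```python
def send(method, url, headers, data=None, session=None, timeout=30):
    """Send an HTTP request and return (status code, body), like get() and post().

    body is the parsed JSON when the response has a JSON body, otherwise
    the raw Response object.
    """
    if session is None:
        import requests  # imported lazily so a stand-in session needs no install
        session = requests
    print(f"{method}: {url}")
    response = session.request(method, url, headers=headers, data=data, timeout=timeout)
    print(response.status_code)
    try:
        body = response.json()
    except ValueError:  # requests raises a ValueError subclass for non-JSON bodies
        body = response
    return response.status_code, body


# Thin wrappers with the same (url, headers[, data]) shape as get() and post()
def put(url, headers, data):
    return send("PUT", url, headers, data)


def patch(url, headers, data):
    return send("PATCH", url, headers, data)


def delete(url, headers, data=None):
    return send("DELETE", url, headers, data)


def head(url, headers):
    return send("HEAD", url, headers)
```

The delete() signature matches the delete(url, headers, None) call used for logging out in restful.nuix_utility.py.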

Another bit of housekeeping lives in the restful.nuix_api.py module, which makes it a little easier to build the request URLs for the endpoints. It contains a class that holds each endpoint used in the example as a path-template string, along with a method that fills in the template’s parameters with the values passed to it. For example:

import json

class NuixRestApi:
    # Read once at import time; config.json must be in the working directory
    with open("config.json") as config_file:
        config = json.load(config_file)['rest']
    service = "nuix-restful-service/svc"

    # Path template for the endpoint that counts the items in a case
    case_count_path = "cases/{case_id}/count"

    @staticmethod
    def case_count_url(case_id):
        """Return the full URL of the item-count endpoint for case_id."""
        case_count_path = NuixRestApi.case_count_path.format(case_id=case_id)
        return f"{NuixRestApi.config['host']}:{NuixRestApi.config['port']}/" \
               f"{NuixRestApi.service}/{case_count_path}"

This lets us generate the URL of the endpoint that counts the items in a case with count_endpoint = NuixRestApi.case_count_url(case_id). It keeps configuration such as the host, port and service path in one place, so configuration lookups and URL building aren’t scattered throughout the code.
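Every endpoint follows the same pattern as case_count_url, so one generic function can build all of them. The hypothetical endpoint_url helper below is not part of the repository. It takes the same config dictionary and service path as NuixRestApi and assumes, as the original f-string does, that config['host'] already includes the scheme (such as http://). It adds one thing the original doesn’t do: it percent-encodes each path parameter, so a value containing /, ? or spaces can’t change the shape of the path.

```python
from urllib.parse import quote


def endpoint_url(config, service, path_template, **params):
    """Build a full endpoint URL from a template such as "cases/{case_id}/count".

    Each path parameter is percent-encoded with safe="" so that characters
    like '/' are escaped instead of splitting the path.
    """
    encoded = {name: quote(str(value), safe="") for name, value in params.items()}
    path = path_template.format(**encoded)
    return f"{config['host']}:{config['port']}/{service}/{path}"
```

With it, case_count_url(case_id) reduces to endpoint_url(NuixRestApi.config, NuixRestApi.service, NuixRestApi.case_count_path, case_id=case_id).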

The restful.nuix_api.py module also has a class that holds the various Content-Types the service uses, which makes it a little easier to select which version of an endpoint to call.
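The item count from the case count endpoint is a natural input to the paged export described earlier: once you know how many items a case holds, you can walk through them a page at a time. The page arithmetic can be sketched with a hypothetical page_windows generator. It is not taken from the repository, and how the start offset and page size are actually passed to the search endpoint is defined in the Nuix REST API Reference, not here.

```python
def page_windows(total_items, page_size):
    """Yield (start, size) pairs covering total_items items in pages of page_size.

    The last page is shorter when total_items isn't a multiple of page_size.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, total_items, page_size):
        yield start, min(page_size, total_items - start)
```

For example, a case with 10 items and a page size of 4 gives the windows (0, 4), (4, 4) and (8, 2).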

The final utilities are in restful.nuix_utility.py, which provides functions for common tasks on the RESTful service, such as logging in and out, running a paged search and monitoring asynchronous functions. Here are the health check, login and logout functions:

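import os
import json
from nuix_api import NuixRestApi as nuix
from nuix_api import ContentTypes
from rest_base import get, put, delete

# Login settings come from the same 'rest' section of config.json that
# NuixRestApi loads (assumed to include the "license" entry used below)
config = nuix.config

def check_ready(headers):
    """Return True if the service's health endpoint answers 200 OK."""
    try:
        status_code, response_body = get(nuix.health_url(), headers)
        return status_code == 200
    except Exception:
        # A connection error means the service isn't reachable yet
        return False

def login(headers):
    """Log in and claim a license; on success, add the auth token to headers."""
    # Credentials come from environment variables rather than the config file
    usr = os.environ['nuix_user']
    pw = os.environ['nuix_password']

    data = json.dumps({
        "username": usr,
        "password": pw,
        "licenseShortName": config["license"]["type"],
        "workers": config["license"]["workers"]
    })

    # The login call uses the V1 content type
    headers['Content-Type'] = ContentTypes.V1
    headers['Accept'] = ContentTypes.V1

    try:
        status_code, response_body = put(nuix.login_url(), headers, data)
        if status_code == 201:
            # Subsequent requests authenticate with this header
            headers["nuix-auth-token"] = response_body["authToken"]
            return True
        else:
            print(f"Unexpected return status code when Logging In: "
                  f"{status_code} [{response_body}]")
            return False
    finally:
        # Reset headers to the default content type for later requests
        headers['Content-Type'] = ContentTypes.JSON
        headers['Accept'] = ContentTypes.JSON

def logout(headers):
    """Log the current user out; returns (success, response body)."""
    usr = os.environ['nuix_user']
    status_code, response_body = delete(nuix.logout_url(usr), headers, None)
    return status_code == 200, response_body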
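check_ready() makes a single attempt, so a script launched while the service is still starting gives up immediately. A small polling loop helps with that, and with any other status you need to wait on. wait_until is a hypothetical helper, not part of the repository, and its timeout and interval defaults are arbitrary; the clock and sleep parameters exist only so the loop can be tested without actually waiting.

```python
import time


def wait_until(condition, timeout=60.0, interval=2.0, clock=time.monotonic, sleep=time.sleep):
    """Call condition() until it returns True or timeout seconds have passed.

    Returns True as soon as condition() succeeds, or False once the
    deadline has passed without success.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

For example, wait_until(lambda: check_ready(headers), timeout=120) waits up to about two minutes for the health check to pass.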

The example provided in the restful package of the repository linked above is a complete example that will find a case, get its item counts, tag items in bulk and export them. For this blog let’s limit the scope to what we did with the previous example: getting a license. Given the groundwork we’ve already done, we can achieve that with this code:

import json
import sys
from nuix_api import ContentTypes
import nuix_utility as ute

# Settings for the full example (tagging, export, ...) live in config.json
with open("config.json") as config_file:
    config = json.load(config_file)['rest']

headers = {
    "Content-Type": ContentTypes.JSON,
    "Accept": ContentTypes.JSON
}

if not ute.check_ready(headers):
    print('Server is not ready')
    sys.exit(9)

if not ute.login(headers):
    print("Failed to Log in.")
    sys.exit(1)

try:
    # Congrats! You've logged in. Do your work here.
    pass
finally:
    # Always log out, even if the work above fails
    ute.logout(headers)

This code, like some of the other code in this post, uses a JSON config file to store settings. The config file is in the repository and contains things like the RESTful service’s host and port, the licensing configuration, and settings for the tagging and export parts of the application, which aren’t shown here. You can find the full example in the restful.paged_export.py module, and the ReadMe explains how to use it.
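The login, try/finally and logout sequence in that script is a natural fit for a context manager, which guarantees the logout runs however the block exits. The sketch below is a hypothetical nuix_session helper, not part of the repository. It takes the login and logout functions as parameters (for example ute.login and ute.logout from restful.nuix_utility.py), raises an exception instead of exiting when the login fails, and only logs out after a successful login.

```python
from contextlib import contextmanager


@contextmanager
def nuix_session(headers, login, logout):
    """Log in on entry and always log out on exit, even if the body raises.

    login(headers) must return True on success. logout(headers) is only
    called after a successful login, so a failed login never triggers it.
    """
    if not login(headers):
        raise RuntimeError("Failed to log in to the Nuix RESTful Service")
    try:
        yield headers
    finally:
        logout(headers)
```

Your work then goes inside a with nuix_session(headers, ute.login, ute.logout): block, and the logout happens automatically when the block ends, even if it raises.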

Summary

There are a variety of ways you can combine the Nuix Engine with Python applications. In this post we showed a couple of examples of integrating the Nuix Engine into your Python workflows. This way, you could use Python to fully automate your case work or use your external application to augment and enhance your existing cases.

Using the Java API from a Python application is not the smoothest experience. It isn’t very “Pythonic” (it is, of course, a Java API), but it’s a good way to embed the engine with minimal external configuration, and it keeps your application isolated and easy to distribute.

Using the REST API makes for an easy programming experience with simple interfaces. It can keep your application more readable for developers who work only in Python, and it can take advantage of powerful features such as cloud deployment and clustering, which makes it ideal for large-scale deployments. It does take a little extra work to set up, however.

Further Reading

For more information on the Nuix Engine, scripting, the Java and REST APIs, see these additional resources:

The GitHub repository with the code used in this blog:

Other blog posts:

Other examples in GitHub:

Downloadable documentation: