Basis Technology and Nuix Triage Multilingual Data at Blazing Speed
In the movies, investigations are clear-cut and fast. Look for a body with bullet wounds and expended shell casings nearby. Look for the gun; there’s no need to look for a knife (no stab wounds) or a hammer (no evidence of blunt force trauma). The reality of digital investigations is more like looking for a body buried somewhere in a 5,000-acre junkyard with a mountain of debris on every acre. Forget the ‘needle in the haystack’ (that’s too easy); you’re looking for a specific needle in a stack of needles.
Nuix specializes in tackling this kind of problem, expanding beyond investigations to include eDiscovery and data governance. It enables users to swiftly reduce the scope of a case from hundreds of systems to just the relevant ones. How? The Nuix engine is blazingly fast. It eats terabytes of data for lunch, thoroughly unpacking, processing and enriching the most complex data types, including unstructured and semi-structured text, mobile phone images, videos, files nested in PST or NSF files, social media data and forensic images. Other tools may silently fail on difficult files, but not Nuix.
Nuix then enriches data with normalization, concept grouping, deduplication and other programmatic analytics that let analysts start with broad questions (Where’s the body?) and move on to better, targeted ones (Where’s the gun? What type of round was used? Where else have similar rounds been found? Is there a pattern?). Nuix boasts a 90% reduction in turnaround time for various types of investigations, achieved by quickly reducing data to only what’s relevant and necessary to answer the questions being asked.
Rosette Meets the Multilingual Challenge
We sought a partner to meet the surge of data that was becoming increasingly multilingual. Without proper language support, relevant data could be missed or erroneously excluded from a case. For Nuix, multilingual text processing also had to be fast, thorough and accurate because:
- In eDiscovery, multilingual documents need to be searchable so that a paragraph-long English email footer doesn’t obscure the one-sentence Japanese email body that contains the critical evidence.
- In investigations, not all bad actors communicate in English. Investigators without multilingual capabilities need a tool that overcomes the language barrier.
- In data governance, data containing names and other personally identifiable information needs to be identified and securely stored, regardless of the language it is written in.
Nuix chose to partner with Basis Technology for its sophisticated, AI-powered text analytics platform, Rosette®. Operating at the same blazing speed as the Nuix Engine, Rosette identifies the language of unstructured text and then enriches it with language-specific processing in 30+ languages and their native scripts. Rosette is consistently accurate across European languages, Arabic, Chinese, Japanese, Korean, Persian, Russian, and Urdu, ensuring that Nuix searches are accurate and comprehensive.
For example, Chinese and Japanese are written without spaces between words, and Korean attaches particles and endings to words within space-separated units, so text in these languages must be segmented into words before it can be searched accurately. Languages with rich morphology, like Arabic, add affixes before, in the middle of and at the end of words, so the stems and roots of words must be identified to enable a comprehensive search. An exact-match search in Arabic for “book” (kitaab) will not match the plural “books” (kutub) unless you know that both words share the root k-t-b.
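To make the root-matching idea concrete, here is a toy sketch in Python. It is not how Rosette works internally: the ROOT_LEXICON table and the functions root_of, exact_search and root_search are hypothetical, and words appear in the same informal transliteration used above rather than in Arabic script. A real morphological analyzer derives roots from affix and vowel-pattern rules instead of a hand-written table.

```python
from typing import Dict, List

# Hypothetical toy lexicon mapping transliterated Arabic word forms to their
# triliteral root. A real analyzer would derive roots morphologically.
ROOT_LEXICON: Dict[str, str] = {
    "kitaab": "k-t-b",   # book
    "kutub": "k-t-b",    # books (broken plural)
    "maktaba": "k-t-b",  # library
    "qalam": "q-l-m",    # pen
}


def root_of(word: str) -> str:
    """Return the word's root, or the word itself if it is not in the lexicon."""
    return ROOT_LEXICON.get(word, word)


def exact_search(query: str, tokens: List[str]) -> List[str]:
    """Match only tokens spelled exactly like the query."""
    return [t for t in tokens if t == query]


def root_search(query: str, tokens: List[str]) -> List[str]:
    """Match every token that shares the query's root."""
    target = root_of(query)
    return [t for t in tokens if root_of(t) == target]


document = ["kutub", "qalam", "maktaba"]
print(exact_search("kitaab", document))  # []
print(root_search("kitaab", document))   # ['kutub', 'maktaba']
```

The root search also returns maktaba (“library”), which shares the root k-t-b. Root matching maximizes recall; having stems or lemmas available as well lets a search tool choose how broadly to match.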
Rosette-enriched text also enables Nuix to apply its own analytics.
In data governance or eDiscovery, you don’t want to expose personally identifiable information (PII) when you have to produce or share data. Being able to recognize PII in multiple languages quickly, accurately and at scale is essential.
Rosette also stood out to Nuix for its track record powering mission-critical systems for government intelligence, border security, financial compliance and eComms surveillance, as well as customer feedback analysis.
The Proof is in the Results
By integrating Rosette, Nuix strengthened its offerings in three key areas:
For eDiscovery, Rosette detects the different language regions within a single document so that each section is processed appropriately for its language and made searchable. One pass with Rosette produces a report on what proportion of a corpus of evidence is in which languages, before early case assessment even begins. Every full-text search can then be thorough and comprehensive, uncovering the most relevant information quickly.
In an investigation, the language used in communications can provide valuable clues. If Rosette reveals that a person who has only ever spoken his native tongue with his mother suddenly starts using it in a conversation with someone else, that anomaly could warrant further examination. This is particularly important in cases of human trafficking and crimes against children, where speed is essential to save lives.
Finally, in data governance, understanding where your company stores sensitive data, such as unencrypted credit card numbers, electronic protected health information (ePHI) or PII, is of critical importance. If a data breach occurs, you need to know quickly what the hackers found. Accurate search across languages is an indispensable tool.
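A language breakdown like this is, at its core, an aggregation over language-labeled regions. The sketch below assumes each document has already been split into (language_code, text) regions by some language identifier; the sample corpus and the language_proportions function are hypothetical, and character counts are used as a simple size measure.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A region is (language_code, text). In this sketch the regions are
# hard-coded; in practice a language identifier would produce them.
Region = Tuple[str, str]


def language_proportions(corpus: List[List[Region]]) -> Dict[str, float]:
    """Return each language's share of all characters in the corpus,
    ordered from most to least common."""
    counts: Counter = Counter()
    for document in corpus:
        for lang, text in document:
            counts[lang] += len(text)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: n / total for lang, n in counts.most_common()}


# Hypothetical sample: an email with a Japanese body ("The meeting is on
# Monday") and an English footer, plus an English/German document.
corpus = [
    [("ja", "会議は月曜日です"), ("en", "Confidential footer")],
    [("en", "Quarterly numbers"), ("de", "Vertraulich")],
]

for lang, share in language_proportions(corpus).items():
    print(f"{lang}: {share:.0%}")
# en: 65%
# de: 20%
# ja: 15%
```

Counting characters tends to understate Chinese and Japanese relative to alphabetic languages, since each character carries more content; token or byte counts are alternative measures.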
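One simple way to surface this kind of anomaly is to track, for each sender and language, the set of contacts that language has been used with. The sketch below is a hypothetical heuristic, not a Nuix or Rosette feature: it assumes each message already carries a detected language code, and it flags a message when a sender uses a language with a new contact after having used it with exactly one contact before.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set, Tuple

# (sender, recipient, language_code) per message, in time order.
Message = Tuple[str, str, str]


def flag_language_anomalies(messages: Iterable[Message]) -> List[Message]:
    """Flag a message when its sender uses a language with a new contact,
    having previously used that language with exactly one other contact."""
    contacts_by_lang: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)
    flagged: List[Message] = []
    for sender, recipient, lang in messages:
        seen = contacts_by_lang[(sender, lang)]
        if len(seen) == 1 and recipient not in seen:
            flagged.append((sender, recipient, lang))
        seen.add(recipient)
    return flagged


# Hypothetical message log: Polish was only ever used with "mother".
messages = [
    ("alex", "mother", "pl"),
    ("alex", "mother", "pl"),
    ("alex", "colleague", "en"),
    ("alex", "stranger", "pl"),
    ("alex", "colleague", "en"),
]
print(flag_language_anomalies(messages))  # [('alex', 'stranger', 'pl')]
```

A heuristic this simple would also fire on ordinary behavior, such as writing in a common language to a second colleague, so in practice it would be one signal among many for an analyst to review.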
An Ecosystem of Capability to Meet Future Needs
Nuix has already encountered cases on the scale of hundreds of terabytes. Data volumes are increasing at an unbelievable rate, especially once you add in social media and chat messages. Expecting any individual to go through all that data is unrealistic; there needs to be a programmatic way to cull it down.
The need to cope with astronomical data volumes is already appearing outside of traditional knowledge-based tasks. The COVID-19 pandemic has only accelerated the massive move to digital data.
“Basis Technology and Nuix are empowering legal technologists, intelligence analysts and law enforcement to cope with the information avalanche they face every day,” said Carl Hoffman, CEO of Basis Technology. “We support Nuix’s vision of building a capabilities ecosystem that combines solutions from multiple partners to meet these challenges.”
We need to be prepared for what is going to happen, and working with Basis Technology helps us do just that for our customers. We don’t yet know the shape of the data, but it definitely isn’t all going to be in English, which is why Rosette is such an essential piece. Meeting the future needs of our customers will enable and empower them to continue to do their jobs: uncovering waste, fraud and abuse, prosecuting the guilty and exonerating the innocent. This requires constant vigilance and a collaborative pushing of the envelope of what’s possible.
We aren’t marking time; we’re getting after it every day, with partnerships like the one we have with Basis Technology.