The Mueller Report - an amazing lens into a modern federal investigation

Preface: This is NOT about politics. This is all about the data discussed in Volume 1 of The Mueller Report.



Volume 1 of The Mueller Report offers an amazing look into the complexities of modern investigations. It really highlights the importance of being able to handle diverse collections of data about, and created by, humans, and then being able to understand the people, objects, locations, and events (POLE) involved.

Nuix software fits right into the middle of this landscape, helping organizations handle gross data, run hundreds of thousands of searches looking for hidden links, and visualize relationships across the POLE framework. So we ran the report through the Nuix Engine, and within a few minutes it had processed successfully.

For anyone who works with unstructured data for a living, the document would fall into the category of "Gross Data." The PDF was a container for 447 JPGs with zero searchable text. Nuix made short work of this, and I was able to quickly OCR the images. Thanks to automatic rotation detection, I very quickly had good, clean text.

From there, I extracted named entities (people and company names, email addresses, etc…) and pulled out a list of shingles (basically a quick way to look for repeating phrases).
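For readers who have not met shingles before, here is a minimal sketch of the idea in plain Python. It is not how Nuix builds its shingle lists; the function names, the three-word shingle size, and the sample pages are all assumptions made up for illustration.

```python
from collections import Counter


def shingles(text, size=3):
    """Return every run of `size` consecutive words (a word shingle) in text."""
    words = text.lower().split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def repeated_shingles(pages, size=3, min_pages=2):
    """Count how many pages contain each shingle; keep those on min_pages or more."""
    page_counts = Counter()
    for page in pages:
        # set() so a phrase repeated within one page counts once for that page
        page_counts.update(set(shingles(page, size)))
    return {s: n for s, n in page_counts.items() if n >= min_pages}


# Made-up sample pages, purely for illustration
sample_pages = [
    "the internet research agency used social media accounts",
    "employees of the internet research agency traveled abroad",
]
common = repeated_shingles(sample_pages)
```

Here `common` holds the phrases that appear on both sample pages, which is the "repeating phrases" effect described above.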



The Mueller Report contains a wealth of named entities. Here are just a few examples:


[Image: people]

Email Addresses

[Image: email addresses]

Company Names

[Image: company names]


[Image: shingles]

NOTE: "Number of Items" refers to the number of pages that contained the entity or shingle.
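To make that counting rule concrete, the sketch below counts the number of pages on which each email address appears, rather than the total number of mentions. The regular expression is deliberately simple and the sample pages are invented; neither reflects how Nuix performs entity extraction.

```python
import re
from collections import Counter

# Deliberately simple pattern (an assumption); real address matching is messier
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pages_per_entity(pages):
    """Return a Counter mapping each address to the number of pages containing it."""
    counts = Counter()
    for text in pages:
        # A set, so several mentions on one page still count as one page
        found = {match.lower() for match in EMAIL_PATTERN.findall(text)}
        counts.update(found)
    return counts


# Invented sample pages for illustration only
sample_pages = [
    "Contact alice@example.com or ALICE@example.com for details.",
    "A reply came from bob@example.org and alice@example.com.",
]
counts = pages_per_entity(sample_pages)
```

The first sample page mentions the same address twice, but it still contributes only one page to that address's count.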

Because I had all of this detail so quickly, I could start getting a sense of the document’s content almost immediately and understand what the data landscape looked like. How cool is that?


Next Step, Some Analysis

So, what’s next? I decided to do a little open-source analysis and compare the report to things like the publicly available data released by the ICIJ as part of the Panama Papers and the United States Treasury Specially Designated Nationals And Blocked Persons List (SDN).

This was also super easy since I had already converted the Panama Papers and SDN to a huge search and tag (more on that in another blog post!). I should note: Many of the search tokens are over-broad, but it’s still a really interesting exercise…

[Image: search and tag results]
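The general shape of a search and tag pass can be sketched in a few lines of Python. This is an assumption-laden illustration, not the Nuix feature itself: the watchlist names, the terms, and the sample pages are all hypothetical, and the real lists are far larger.

```python
import re


def build_tagger(watchlists):
    """Compile one case-insensitive, whole-word pattern per watchlist term."""
    compiled = []
    for tag, terms in watchlists.items():
        for term in terms:
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            compiled.append((tag, term, pattern))
    return compiled


def tag_pages(pages, compiled):
    """Return {page_number: set of tags whose terms appear on that page}."""
    tags = {}
    for number, text in enumerate(pages, start=1):
        hits = {tag for tag, _term, pattern in compiled if pattern.search(text)}
        if hits:
            tags[number] = hits
    return tags


# Hypothetical watchlists and pages, for illustration only
watchlists = {"list_a": ["Acme Holdings"], "list_b": ["Example Bank", "Star"]}
sample_pages = [
    "A payment was routed through ACME Holdings.",
    "Nothing relevant here.",
    "The startup banked with Example Bank.",
]
tagged = tag_pages(sample_pages, build_tagger(watchlists))
```

Whole-word matching keeps the short token "Star" from firing on "startup", but a short or common token will still hit unrelated text, which is the over-broad problem mentioned above.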

As a reminder, all of this was extracted from just the final report - not the actual source data!

Here is how the report describes the actual source data behind the investigation:

"During its investigation, the Office issued more than 2,800 subpoenas under the auspices of a grand jury sitting in the District of Columbia; executed nearly 500 search-and-seizure warrants; obtained more than 230 orders for communications records under 18 U.S.C. § 2703(d); obtained almost 50 orders authorizing use of pen registers; made 13 requests to foreign governments pursuant to Mutual Legal Assistance Treaties; and interviewed approximately 500 witnesses, including almost 80 before a grand jury."

Within just this short example of the source data, we have a lot to consider, including:

  • 2,800 subpoenas: With 87 references to Facebook and detailed documentation of the activity of certain profiles, can we assume that the Office was sifting through Facebook, Twitter, and Instagram data?
  • 500 search and seizure warrants: That is bound to generate at least a couple hundred hard drives and mobile devices.
  • 230 2703(d) orders and 50 "pen registers": Interesting in that these are laser-focused on who is talking to whom and how often they communicate.
  • 500 witnesses: That is a whole lot of testimony that needs to be checked against all that digital evidence.
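The "who is talking to whom, and how often" question from the list above reduces to counting communications per pair of parties. The sketch below shows that counting on made-up records; the record layout (a sender and a recipient per row) and the names are assumptions, not a description of the Office's data.

```python
from collections import Counter


def contact_frequency(records):
    """Count communications per unordered pair of parties."""
    pairs = Counter()
    for sender, recipient in records:
        # Sort the pair so A -> B and B -> A land in the same bucket
        pairs[tuple(sorted((sender, recipient)))] += 1
    return pairs


# Invented records: (sender, recipient)
records = [
    ("person_a", "person_b"),
    ("person_b", "person_a"),
    ("person_a", "person_c"),
    ("person_a", "person_b"),
]
frequency = contact_frequency(records)
busiest_pair, busiest_count = frequency.most_common(1)[0]
```

Even with no message content at all, a table like this shows which relationships deserve a closer look.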



In Volume 1, Section III, "Russian Hacking and Dumping Operations," the Mueller Report provides frightening detail about what it means to be targeted by a Nation State. The prevailing sentiment is that if you are targeted by a Nation State, it will eventually get in.

For those in the security industry, this is old news. What is interesting about the Mueller Report is that the details are included in a document that will be read by millions of ordinary people, not just security professionals.

"In total, the GRU stole hundreds of thousands of documents from the compromised email accounts and networks."

Opportunity For Security Awareness

This is a unique opportunity for the security industry to raise awareness among people who typically tune this stuff out, and to reinforce the idea that people are the most vulnerable part of an organization's cyber defense posture.

Volume 1, Section III, "Russian Hacking and Dumping Operations," is only about 5 pages long, but it is incredibly telling:

"GRU Officers also sent hundreds of spearphishing emails to the work and personal email accounts of Clinton Campaign employees and volunteers. Between March 10, 2016, and March 15, 2016, Unit 26165 appears to have sent approximately 90 spearphishing emails to email accounts at Starting on March 15, 2016, the GRU began targeting Google email accounts used by Clinton Campaign employees, along with a smaller number of email accounts."

One Mistake

The reality of spearphishing is that all it takes is for someone to make a simple mistake and download malicious code onto their machine. From there, it is game on. The spearphishing campaign began in mid-March, and by "no later than April 12, 2016, the GRU had gained access to the DCCC computer network using the credentials stolen from a DCCC employee who had been successfully ‘spear phished’ the week before. Over the ensuing weeks, the GRU traversed the network identifying different computers connected to the DCCC network. By stealing network access credentials along the way (including those of IT Administrators with unrestricted access to the system), the GRU compromised approximately 29 different computers on the DCCC network."

Once the network was compromised, the GRU installed customized malware that allowed them to "log keystrokes, take screenshots, and gather other data from infected computers."

"On April 25th, 2016, the GRU collected and compressed PDF and Microsoft documents from folders on the DCCC's shared file server that pertained to the 2016 election. The GRU appears to have compressed and exfiltrated over 70 gigabytes of data from this file server."

The reality is that in about 45 days from the start of the spearphishing campaign, 29+ machines had been compromised, the attackers had escalated privileges, and at least 70 gigabytes of sensitive data had been exfiltrated. This was just a drop in the bucket compared to all the emails.
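The timeline is easy to check from the dates quoted above. The short sketch below uses only those dates: the first spearphishing emails (March 10, 2016), the latest date by which the GRU had network access (April 12, 2016), and the file server collection (April 25, 2016). The variable names are mine.

```python
from datetime import date

# Dates taken from the passages quoted above
campaign_start = date(2016, 3, 10)   # first spearphishing emails
network_access = date(2016, 4, 12)   # "no later than" date for DCCC network access
file_collection = date(2016, 4, 25)  # documents collected from the shared file server

days_campaign_to_collection = (file_collection - campaign_start).days
days_access_to_collection = (file_collection - network_access).days

print(days_campaign_to_collection)  # 46
print(days_access_to_collection)    # 13
```

In other words, roughly six and a half weeks passed between the first phishing emails and the file collection, and less than two weeks between confirmed network access and the file collection.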

Once this data was exfiltrated, it was then released via various websites, and the rest is history.

If They Really Want To Get In...

The moral of this story is that thwarting Nation States bent on compromising your network is tough. It requires continuous employee training, constant vigilance, and top-notch cybersecurity professionals defending the walls and hunting threats. 

The combination of Nuix's endpoint technology and investigative platform is used by the pros every day to combat these types of threats. Find out more about how Nuix Endpoint technologies can work for you to defend against Nation States and other attackers, and look for the next installment in this series, when I’ll cover human-generated data and its place in modern investigations.



In both the hacking operation and the investigation, the most interesting data is that created by humans. At the end of the day, if you are trying to prove a point, you are ultimately trying to answer the same investigative questions: who, what, where, why, when, and how. All of these questions are about people’s behavior.

Sure, there’s a ton of interesting stuff found in machine data, but ultimately we live in a world filled with people. People who are doing things, saying things, and in this case communicating things electronically.

The hackers we’re talking about were looking for things that might have been said that could be used for leverage. In the case of the investigation, the Office was looking to corroborate that an event had taken place or that two or more people were communicating. 

As I was reading the Report, I found it interesting how frequently the footnotes referenced "Emails" and "Texts" as the source of evidence. I was curious exactly how many times. So, using my favorite Swiss Army knife for data, I whipped up a quick script and ran it in our software:

import re

hitCounter = 0

# currentSelectedItems is supplied by the host application when the script runs
for item in currentSelectedItems:
    # Count the matches once per item and reuse the result
    hits = len(re.findall('Email', item.textObject.toString()))
    if hits > 0:
        # One line per item: its GUID and its number of hits
        print(str(item.guid) + "|" + str(hits))
        hitCounter = hitCounter + hits

# Total across all selected items
print(hitCounter)

NOTE: For you coders out there, I’m sure it can be written more efficiently, but it got the job done.
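For anyone who wants to try the same idea outside our software, here is a minimal standalone sketch that counts several terms in one pass over plain page text. The function name and the sample pages are assumptions for illustration; like the script above, it counts case-sensitive substring matches, so "Emails" also counts as a hit for "Email".

```python
import re


def count_terms(pages, terms):
    """Return {term: total case-sensitive occurrences across all pages}."""
    patterns = {term: re.compile(re.escape(term)) for term in terms}
    totals = dict.fromkeys(terms, 0)
    for text in pages:
        for term, pattern in patterns.items():
            totals[term] += len(pattern.findall(text))
    return totals


# Invented footnote-style text, for illustration only
sample_pages = [
    "1 Email from X to Y. 2 Text message, Z. 3 Email, W.",
    "No citations on this page.",
]
totals = count_terms(sample_pages, ["Email", "Text"])
```

Compiling each pattern once and scanning each page a single time per term avoids the repeated regex calls, which matters more as the number of pages grows.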

Taking It To The 5 WHs

My quick script showed that "Email" is footnoted 350 times and "Text" is footnoted 113 times. Even with all of those footnotes, the Report calls out the threat posed by new types of encrypted communication, which make thorough investigations increasingly difficult:

"Further, the Office learned that some of the individuals we interviewed or whose conduct we investigated—including some associated with the Trump Campaign—deleted relevant communications or communicated during the relevant period using applications that feature encryption or that do not provide for long-term retention of data or communications records. In such cases, the Office was not able to corroborate witness statements through comparison to contemporaneous communications or fully question witnesses about statements that appeared inconsistent with other known facts."

At the end of the day, it all comes back to understanding who, what, where, why, when, and how. Nuix continues to make it faster and easier for investigators, be they corporate, regulatory, or law enforcement, to quickly understand who is talking to whom and the overall dynamics at play across social networks.
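One simple way to approximate "who matters most" in a communication network is to rank people by how many distinct contacts they have. The sketch below does exactly that on made-up messages. It is a basic degree count chosen for illustration, and it should not be read as the method Nuix uses; the names and data are hypothetical.

```python
from collections import defaultdict


def rank_by_contacts(messages):
    """Rank people by number of distinct contacts, most connected first."""
    contacts = defaultdict(set)
    for sender, recipient in messages:
        contacts[sender].add(recipient)
        contacts[recipient].add(sender)
    # Ties are broken alphabetically so the ordering is stable
    return sorted(contacts, key=lambda person: (-len(contacts[person]), person))


# Invented messages: (sender, recipient)
messages = [
    ("person_a", "person_b"),
    ("person_a", "person_c"),
    ("person_a", "person_d"),
    ("person_b", "person_c"),
]
ranking = rank_by_contacts(messages)
```

In this sample, `person_a` comes out on top because they are connected to everyone else in the network.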

[Image: investigative canvas]

Check out the latest release of Nuix Investigate (formerly Nuix Web Review & Analytics) to see how you can easily take all of your electronic communication types and quickly visualize who the most important people are in the network!