The Mueller Report Part 3 - Human-generated Data at the Heart of Investigations
Preface: This article is all about the data discussed in Part 1 of this blog series. No political statements are being made.
The Mueller Report is a great window into the relative value of data, both for adversaries and for investigators. In Part 1: The Mueller Report - An Amazing Lens Into a Modern Federal Investigation I covered all of the different types of data collected and analyzed for the report.
- 2800 subpoenas. With 87 references to Facebook and the detailed documentation about the activity of certain profiles, you can assume that the Office was sifting through Facebook, Twitter, and Instagram data.
- 500 search and seizure warrants. This is bound to generate at least a couple hundred hard drives and mobile devices.
- 230 2703(d) orders and 50 "pen registers". This is interesting because it is laser-focused on who is talking to whom and the frequency of their communications.
- 500 witnesses. That is a whole lot of testimony that needs to be checked against all the digital evidence.
In Part 2: What It Feels Like To Be Targeted by a Nation State, I covered the types of exfiltrated data:
- "In total the GRU stole hundreds of thousands of documents from the compromised email accounts and networks."
- "Compressed and exfiltrated over 70 gigabytes of data from this file server."
The Data that Matters
In both instances, the most interesting data is the data created by humans. At the end of the day, if you are trying to prove a point, you are ultimately trying to answer the same investigative questions: who, what, where, why, when, and how. All of these questions are about people’s behavior.
Sure, there’s a ton of interesting stuff found in machine data, but ultimately we live in a world filled with people: people who are doing things, saying things, and in this case communicating things electronically.
The hackers we’re talking about were looking for things that might have been said that could be used for leverage. In the case of the investigation, the Office was looking to corroborate that an event had taken place or that two or more people were communicating.
As I was reading the Report, I found it interesting how frequently the footnotes referenced "Emails" and "Texts" as the source of evidence. I was curious exactly how many times. So, using my favorite Swiss Army knife for data, I whipped up a quick script and ran it in our software:
import re
hitCounter = 0
for item in currentSelectedItems:
    # Count occurrences of "Email" in this item's extracted text
    hits = len(re.findall('Email', item.textObject.toString()))
    if hits > 0:
        print(str(item.guid) + "|" + str(hits))
        hitCounter = hitCounter + hits
print("Total|" + str(hitCounter))
NOTE: For you coders out there, I’m sure it can be written more efficiently, but it got the job done.
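For readers without access to the scripting console, here is a minimal standalone sketch of the same counting idea in plain Python. It assumes the footnote text is already available as ordinary strings; the function name count_terms and the sample footnotes are hypothetical and invented purely for illustration. Unlike the quick script above, which counts any substring match, this version uses word boundaries so that "Emails" counts but "Textbook" does not.

```python
import re
from collections import Counter


def count_terms(texts, terms):
    """Return the total occurrences of each term across all texts."""
    # One compiled pattern per term. The optional "s" and the word
    # boundaries match "Email" and "Emails" but not "Emailed" or "Textbook".
    patterns = {t: re.compile(r"\b" + re.escape(t) + r"s?\b") for t in terms}
    totals = Counter()
    for text in texts:
        for term, pattern in patterns.items():
            totals[term] += len(pattern.findall(text))
    return totals


# Hypothetical footnote strings, made up for this example only
sample_footnotes = [
    "Email, A to B (1/1/16); Text Message, B to C.",
    "Emails between A and B; see also Textbook entry.",
]

totals = count_terms(sample_footnotes, ["Email", "Text"])
print("Email|" + str(totals["Email"]))
print("Text|" + str(totals["Text"]))
```

Whether to count substrings or whole words is a judgment call: stricter matching avoids false hits but can miss variants, so the totals from the two approaches may differ.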
Taking it to the 5 WHs
According to my quick script, "Email" is footnoted 350 times and "Text" is footnoted 113 times. Even with all of those footnotes, the Report calls out the threat posed by newer types of encrypted communication, which make it harder to conduct a thorough investigation:
"Further, the Office learned that some of the individuals we interviewed or whose conduct we investigated—including some associated with the Trump Campaign—deleted relevant communications or communicated during the relevant period using applications that feature encryption or that do not provide for long-term retention of data or communications records. In such cases, the Office was not able to corroborate witness statements through comparison to contemporaneous communications or fully question witnesses about statements that appeared inconsistent with other known facts."
At the end of the day, it all comes back to understanding who, what, where, why, when, and how. Nuix continues to make it faster and easier for investigators, be they corporate, regulatory, or law enforcement, to quickly understand who is talking to whom and the overall dynamics at play across social networks.
Check out the latest release of Nuix Investigate (formerly Nuix Web Review & Analytics) to see how you can easily take all of your electronic communication types and quickly visualize who the most important people are in the network!