Removing PII from the EDRM Enron Data Set

The EDRM Enron data set is an industry-standard collection of email data that the legal profession has used for many years for electronic discovery training and testing. Since the data set was published, it has been an open secret that it contains many instances of private, health and financial information. Nuix volunteered to investigate the EDRM Enron data set and remove as much of this personal information as possible before republishing a cleansed version. The results of our investigation offer food for thought about the prevalence of private data in all corporate data sets and the serious business risks this represents.

Ady Cassidy & Matthew Westwood-Hill from Nuix discuss the methodology for removing personally identifiable information from the data set in the September bulletin of IRMS.

Read the full article to learn how we removed personally identifiable information from the Enron data set.