Secretary Clinton's Email (Source: Wikileaks)

Application Version 1.7 | Data Version 2.1 | May 19, 2019

This application lets you interactively filter 32,795 emails sent during Hillary Clinton's tenure as United States Secretary of State and displays features of the selected subset. The data are extracted from HTML representations of the official State Department release provided in the Wikileaks database.

Analysis

Peak Email

Email Gaps

Email Times

This service is not meant to be a stand-alone means of analyzing this controversial data set. It is most powerful when used alongside internet searches and either the Wikileaks database or the official State Department site. Those two sources provide indispensable context and precision that the primarily metadata-driven displays here cannot. The intended use of this application is instead to explore patterns present in the data in order to generate and test hypotheses.

It is hoped that this service will not simply give you a means of exploring this particular data set, but will also demonstrate how much can be discovered about an individual by applying visual analytic tools to seemingly uninformative metadata and using the results to motivate searches of public data. The dissemination of data in the modern world is a topic of heated discussion. Hopefully, firsthand experience of how data can be leveraged will prove informative and help you form your own opinion on the subject.

Peak Email

Focusing on the email volume plot, an obvious peak can be seen near the centre of the time series. Using the date selection slider, the day of highest email volume can be identified as August 21, 2011, during the opening days of the Battle of Tripoli in the Libyan Civil War. Inspecting the term frequency and tf-idf for this day reveals a host of terms related to this conflict. The United States and NATO were both heavily involved, so a peak here is unsurprising. In fact, many other local maxima correspond to events related to the Arab Spring and the countries affected by this revolutionary wave.
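The kind of calculation behind this observation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records below are invented toy data, the function names are hypothetical, each day is treated as one document, and the weighting used is the common tf × log(N/df) form, which may differ from the weighting this application actually uses.

```python
import math
from collections import Counter, defaultdict
from datetime import date

# Hypothetical toy records: (send date, subject text). Not real data.
emails = [
    (date(2011, 8, 20), "schedule call"),
    (date(2011, 8, 21), "tripoli rebels advance"),
    (date(2011, 8, 21), "tripoli nato update"),
    (date(2011, 8, 21), "call schedule tripoli"),
    (date(2011, 8, 22), "schedule update"),
]

def peak_day(records):
    """Return (day, count) for the day with the most emails."""
    counts = Counter(day for day, _ in records)
    return counts.most_common(1)[0]

def tfidf_by_day(records):
    """Score each term per day, treating every day as one document."""
    term_counts = defaultdict(Counter)
    for day, text in records:
        term_counts[day].update(text.lower().split())
    n_days = len(term_counts)
    # Document frequency: number of days on which a term appears.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    scores = {}
    for day, counts in term_counts.items():
        total = sum(counts.values())
        scores[day] = {
            term: (n / total) * math.log(n_days / doc_freq[term])
            for term, n in counts.items()
        }
    return scores

day, volume = peak_day(emails)
scores = tfidf_by_day(emails)
top_terms = sorted(scores[day], key=scores[day].get, reverse=True)[:3]
```

In this toy set a word such as "schedule" appears on every day and so scores zero, while a word confined to the peak day rises to the top. That is why tf-idf tends to surface event-specific terms that raw frequency alone would bury.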

Email Gaps

There are a number of conspicuous time periods in which no emails are recorded in this data set. The most obvious of these occurs in early November 2012. This period coincides with growing controversy over the 2012 attack on the US diplomatic compound in Benghazi, and also includes the 2012 US presidential election.

The time slider can be used to select a period surrounding this gap which includes the Benghazi attack, for example September 11 to November 23, 2012. In this selection, terms related to the attack, such as Benghazi and Ansar al-Sharia, can be seen. The network plot also reveals one of the contentious points of interest in Clinton's emails: the nature and frequency of her contact with Sidney Blumenthal during the Benghazi attack and shortly thereafter. We can also see contact with an account of unidentifiable domain with the label 'aclb.' Using internet searches and inspecting the emails, this account can be identified as that of Tony Blair, with the four-letter string likely standing for the initials of his full name, Anthony Charles Lynton Blair. Finally, many of the emails surrounding this gap contain some FOIA redaction, as is clearly visible in the bar plot of FOIA redaction codes.

Other gaps in the data can be found by narrowing the slider range, selecting the centre bar, and dragging this small window across the whole time range with the 'Show Emails' filter set to show only mail from Clinton. Doing this reveals a number of periods with no email. By selecting the foreign travel tickbox, some of these can be identified as corresponding to official state visits. Other gaps occur near less typical events, such as a gap in mid-June 2009, likely due to Clinton fracturing her elbow. Another gap in December 2012 coincides with the resignation of four State Department officials following the results of the Benghazi investigation.
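The same search can be done programmatically rather than by dragging a window. The sketch below is a minimal illustration, assuming the send times are available as Python datetime objects; the function name, the three-day threshold, and the sample timestamps are all hypothetical.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, min_gap=timedelta(days=3)):
    """Return (last_before, first_after) pairs separated by at least min_gap."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a >= min_gap]

# Hypothetical send times, invented for illustration only.
sent = [
    datetime(2009, 6, 15, 9, 0),
    datetime(2009, 6, 16, 14, 0),
    datetime(2009, 6, 22, 8, 0),
    datetime(2009, 6, 22, 17, 0),
]

gaps = find_gaps(sent)
```

Each returned pair brackets a silent period, which can then be checked against travel records or news coverage in the same way as the gaps discussed above.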

A number of other gaps of possible interest are not discussed here, and you are encouraged to investigate any period of interest you notice for yourself. However, you should always be mindful of the tendency for all of us to seek information which confirms preconceptions, and attempt as much as possible to be honest and unbiased in your investigations.

Email Times

Several patterns in the date and time displays are immediately obvious. One of the most readily apparent is the appearance of modes at 2 am and 3 am in the Wikileaks reported times. When switching to the PDF extracted times, however, these modes disappear. Investigating this pattern in the Wikileaks database with a random sample of emails reveals that the modes correspond to a default time applied when the PDF extracted dates cannot be read by the (presumably) automated extractor used by Wikileaks. Unfortunately, the motivation for this methodological choice is never mentioned, let alone explained, on the Wikileaks page.
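One way to check for such default values is to compare the hour-of-day histogram of all emails with that of the emails whose PDF time could not be read. The following is a minimal sketch on invented records; the record layout (a Wikileaks header time paired with a PDF time, or None when unreadable) is an assumption made for illustration and is not the structure of the actual data.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records: (Wikileaks header time, PDF-extracted time or None).
records = [
    (datetime(2010, 3, 1, 2, 0), None),
    (datetime(2010, 3, 2, 3, 0), None),
    (datetime(2010, 3, 2, 2, 0), None),
    (datetime(2010, 3, 3, 11, 12), datetime(2010, 3, 3, 18, 12)),
    (datetime(2010, 3, 4, 7, 40), datetime(2010, 3, 4, 14, 40)),
]

def hour_histogram(times):
    """Count how many timestamps fall in each hour of the day."""
    return Counter(t.hour for t in times)

all_hours = hour_histogram(w for w, _ in records)
unreadable_hours = hour_histogram(w for w, p in records if p is None)
```

If the counts at particular hours come almost entirely from the unreadable group, as they do for hours 2 and 3 in this toy set, those modes are more plausibly a placeholder than real activity.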

The second pattern of note is that of Clinton's strange sending times. The Wikileaks data seem to show that, regardless of which filter is applied, Clinton's team is most active in the middle of the night, with only a small communication break between 4 pm and 10 pm. This pattern changes entirely when the extracted dates are used, and the communication gap shifts to the far more natural 11 pm to 5 am. Once again, the presence of and justification for this shift are never addressed on the Wikileaks page. Further investigation of a large sample of emails showed that this 7 hour time shift was applied consistently between the times reported in the PDF and the times reported in the Wikileaks header. While the irony of this lack of transparency on a site which claims to champion that virtue is not lost on the authors, it serves as a pointed reminder that we cannot blindly trust any source, and should always investigate for ourselves.
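A consistent shift of this kind can be estimated by taking the most common difference between paired timestamps. The sketch below uses invented pairs and a hypothetical function name. The sign convention (PDF time minus Wikileaks time) is an assumption inferred from the two gap windows quoted above, since moving 4 pm to 10 pm forward by 7 hours gives 11 pm to 5 am.

```python
from collections import Counter
from datetime import datetime

def modal_offset_hours(pairs):
    """Return the most common (PDF - Wikileaks) offset in whole hours
    and the share of pairs that have that offset."""
    offsets = Counter(
        round((pdf - wl).total_seconds() / 3600) for wl, pdf in pairs
    )
    hours, count = offsets.most_common(1)[0]
    return hours, count / len(pairs)

# Hypothetical (Wikileaks header time, PDF time) pairs, for illustration only.
pairs = [
    (datetime(2010, 3, 3, 11, 12), datetime(2010, 3, 3, 18, 12)),
    (datetime(2010, 3, 3, 20, 30), datetime(2010, 3, 4, 3, 30)),
    (datetime(2010, 3, 5, 1, 5), datetime(2010, 3, 5, 8, 5)),
    (datetime(2010, 3, 6, 9, 0), datetime(2010, 3, 6, 9, 0)),
]

hours, share = modal_offset_hours(pairs)

# Apply the estimated shift to the edges of the Wikileaks gap (4 pm, 10 pm).
shifted_gap = [(h + hours) % 24 for h in (16, 22)]
```

Reporting the share alongside the offset makes it clear how consistently the shift holds, rather than relying on a handful of hand-checked emails.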

Plots: 20 Highest TF-IDF Terms; 20 Highest Frequency Terms