You can see how this MAP can be used in some of the supplements to this guide already released. See, for example, the analysis of the Donna Brazile emails and the Peter Kadzik emails (within the series linked above).
Again, the basic structure of the database is explained in the original Guide and further elaborated upon in the other supplements. Generally, the structure is based upon a sequentially ordered 8-digit file name. (Wikileaks also assigned an Email ID - which is not logically associated with the file name - that is used to locate the email in their online database.) These two numbers are the first two columns in the MAP.
You can quickly get a visual bearing on the contents thus far released - particularly the gaps that remain in the database to be released by Wikileaks later. These are noticeable by the WHITE BANDS.
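For readers who would rather compute the white bands than scan for them, the following is a minimal sketch of the idea. It assumes the file names have been read into a list of integers; the function name `find_gaps` and the sample numbers are hypothetical and are not taken from the MAP itself.

```python
def find_gaps(released_ids, first_id, last_id):
    """Return (start, end) ranges of file numbers missing from released_ids."""
    released = set(released_ids)
    gaps = []
    start = None
    for n in range(first_id, last_id + 1):
        if n not in released:
            if start is None:
                start = n              # a new gap opens here
        elif start is not None:
            gaps.append((start, n - 1))  # the gap closed on the previous number
            start = None
    if start is not None:
        gaps.append((start, last_id))  # a gap that runs to the end of the range
    return gaps


# Hypothetical file numbers, for illustration only
released = [1, 2, 3, 7, 8, 12]
for start, end in find_gaps(released, 1, 12):
    # Zero-pad to 8 digits to match the file name style described above
    print(f"{start:08d} - {end:08d} ({end - start + 1} not yet released)")
```

Each range printed corresponds to one white band: a run of file numbers that falls inside the sequence but has not yet appeared in any dump.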
You can also get a general picture of how the emails were released from the other colored bands. The first seven data dumps are combined and are represented by the ORANGE rows. (I may fine-tune the dump order information at a later date.) From there, the NON-ORANGE bands reflect each of the subsequent dumps. While I initially tried to keep these bands nearly "transparent" so that they would stand out, the number of data dumps has made that difficult, so I have reluctantly added some "darker" colors.
In this latest release, however, I have re-assigned the colors so that the user can get a quick view of the set by release date. Excluding WHITE (for unreleased gaps) and ORANGE (the first set), the data dumps in sequence go from variations of RED to YELLOW, GREEN and BLUE, and we will now begin to use the VIOLET end of the spectrum. This gives the user a quick glimpse of the data in order of release, showing both the random and the not-so-random aspects of the releases. You can see, generally, that the bulk appear to have been chosen by random sample. However, certain "sets" of emails can be seen to have been selected for release at certain times, and these show up as clusters of a single color. For reference purposes, the column labeled "Dump" provides the batch from which each email came.
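The single-color clusters can also be located from the "Dump" column rather than by eye. The sketch below is only an illustration: it assumes the rows are already sorted by file name and reduced to (file number, dump number) pairs, and the function name `dump_runs`, the threshold and the sample values are all hypothetical.

```python
from itertools import groupby

# Hypothetical (file number, dump number) pairs, already sorted by file number
rows = [(1, 9), (2, 14), (3, 14), (4, 14), (5, 14), (6, 11), (7, 20), (8, 20)]


def dump_runs(rows, min_length=3):
    """Return (dump, first_file, last_file, length) for each run of
    consecutive rows that share the same dump number."""
    runs = []
    for dump, group in groupby(rows, key=lambda r: r[1]):
        group = list(group)
        if len(group) >= min_length:
            runs.append((dump, group[0][0], group[-1][0], len(group)))
    return runs


print(dump_runs(rows))
```

A long run from one dump is what shows up in the MAP as a solid block of one color. What counts as "long" is a judgment call, which is why `min_length` is left as a parameter.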
There are nearly 60,000 emails in all - with just under 50,000 released thus far - so this is a VERY LARGE spreadsheet with a large file size.
It is quite useful, however, in getting a VIEW of the data. A similar project from MIT, although with different applications (such as viewing connections between email users), was recently released using this data set. [SEE STORY: What I Learned From Visualizing Hillary Clinton's Leaked Emails]
This MAP can provide other information - in particular, it enables the user to identify prominent gaps in the currently released emails. When the MAP is viewed by sender, these gaps can point to possible clusters of emails that might be of significance but have not yet been released.
I have left the spreadsheet sortable. You can sort by filename (the original sort - which can always be returned to) as well as by the FROM field (the combined USER NAME and EMAIL ADDRESS, from which the original filename sort is derived) to check whether some of the emails of the same user fall into a different alphabetical set.
You can also look to the right end of the spreadsheet and see that I have isolated the USER NAME and the EMAIL ADDRESS. This can show that multiple users may, at different times, be using the same email address, or that one user may use more than one email address. (Next to the field with the name or email is a number that shows the total number of emails from that user or that email address.) The column furthest to the right (after Email - Instances and User Name - Instances) is the From Instances column. This counts the number of times the COMBINED FROM address (USER NAME - EMAIL ADDRESS) is found in the database.
Comparing these three numbers can show when one should look for further emails from the user in another section. So, for example, "David Jones djones AT gmail.com" may be a FROM (sender) category with 10 emails within it. But when the email address and user name are isolated - and instances counted - it may turn out that "Jones, David djones AT gmail.com" also exists for the same person (with one in the set of D's and the other in the set of J's - separated by over 10,000 emails). Or it may turn out that even though the FROM occurs 10 times, the email address associated with it occurs 50 times. Sorting by email address will then bring that particular set together.
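The comparison of the three instance counts can be sketched in a few lines. This is only an illustration of the logic, not how the spreadsheet columns were actually built: the sample rows, the addresses and the helper name `needs_second_look` are all hypothetical.

```python
from collections import Counter

# Hypothetical (user name, email address) pairs, one per email
rows = [
    ("David Jones", "djones@example.com"),
    ("David Jones", "djones@example.com"),
    ("Jones, David", "djones@example.com"),
    ("D. Jones", "djones@example.com"),
    ("David Jones", "david.jones@example.org"),
    ("Ann Smith", "asmith@example.com"),
]

name_counts = Counter(name for name, _ in rows)     # User Name - Instances
email_counts = Counter(email for _, email in rows)  # Email - Instances
from_counts = Counter(rows)                         # From Instances (combined)


def needs_second_look(name, email):
    """True when the name or the address occurs more often than the
    combined FROM, meaning more emails sit under another FROM entry."""
    combined = from_counts[(name, email)]
    return name_counts[name] > combined or email_counts[email] > combined


for name, email in sorted(set(rows)):
    note = "look elsewhere too" if needs_second_look(name, email) else "all in one place"
    print(f"{name} | {email} | from={from_counts[(name, email)]} "
          f"name={name_counts[name]} email={email_counts[email]} | {note}")
```

Whenever the combined count is lower than either of the other two, the same person or the same address appears under a differently written FROM somewhere else in the MAP.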
Another use is to sort by date and time. This lets the user follow the threads of conversations by date and evaluate topics by the individuals discussing them, or see what was taking place during and around a particular event.
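For those working outside the spreadsheet, the same date sort and "around an event" view might look like the sketch below. The timestamp format, the sample rows and the helper names `parse` and `around` are assumptions made for illustration; the date column in the MAP may be formatted differently.

```python
from datetime import datetime, timedelta

# Hypothetical rows: (file name, sender, sent timestamp)
rows = [
    ("00000003", "Sender C", "2016-03-02 09:15"),
    ("00000001", "Sender A", "2016-03-01 18:40"),
    ("00000002", "Sender B", "2016-03-05 07:05"),
]


def parse(ts):
    # Assumed timestamp layout; adjust the format string to match the real column
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")


# Chronological order, regardless of file name order
by_date = sorted(rows, key=lambda r: parse(r[2]))


def around(rows, event, days):
    """Keep the rows sent within `days` days before or after `event`."""
    window = timedelta(days=days)
    return [r for r in rows if abs(parse(r[2]) - event) <= window]


event = datetime(2016, 3, 2)  # hypothetical event date
for file_name, sender, sent in around(by_date, event, 1):
    print(file_name, sender, sent)
```

Widening or narrowing `days` changes how much of the surrounding conversation is pulled in around the event.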
The following is a graphical representation of the MAP - this is how it will appear on the DYNAMIC spreadsheet. Again, further information about how this MAP was created and how to use it can be obtained from the original Guide and the other Supplements. There are two columns labeled LINK and GET - the first links to the Wikileaks webpage for that email and the second gets the original .eml file.
I have also included a link to a PDF version. Unfortunately I could not keep the links active in this format - but it does provide the quickest way to navigate the MAP. It is, by necessity, a PDF with a LARGE NUMBER of pages.
UPDATE
Podesta 30 dropped this evening and so I have updated the map files and provided a link to both the 29th and 30th versions.
[LINK TO SPREADSHEET (Right Click to Download - or it will Open the File): https://1drv.ms/x/s!Ahc0rCgpBELmjVyW9cvOYLiuYzkD ]
[LINK TO SPREADSHEET 30: https://1drv.ms/x/s!Ahc0rCgpBELmjV3tOxOuzdsqarNf]
[LINK TO PDF: https://www.documentcloud.org/documents/3214533-Podesta-29-Map-Complete.html]