The Russian Ad Explorer

Background

On May 10, 2018, Democrats on the United States House Intelligence Committee released 3500+ ads created by the Internet Research Agency between 2015 and 2017. The Internet Research Agency is believed to have created these ads to influence the outcome of the 2016 United States presidential election, and in general influence Americans' political views.

Many journalists and coders have already published visualizations and articles on this dataset — for a more in-depth analysis, you can read some of their work at these links: USA Today, NYTimes, Washington Post, Wired.

Data Extraction and Public Access

Democrats on the United States House Intelligence Committee released the original dataset at this link: https://democrats-intelligence.house.gov/facebook-ads/social-media-advertisements.htm.

This site hosts 2,500+ PDF files, each of which contains a picture of an ad and a series of descriptors associated with that ad, such as "Ad Creation Date" or "Ad Clicks". Pictures and text were automatically extracted from each PDF file using pdftotext, pdfimages and ImageMagick. Because these extraction methods are not foolproof, there may be typographical errors or omissions in the ad descriptors, and images that could not be extracted. I modified some ad descriptors for clarity (more on that in the "Additional Notes" section) and loaded the final results into a JSON database for use in this website.
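
As a rough illustration of the text side of that extraction, here is a minimal sketch in Python. It assumes that each descriptor appears in the pdftotext output as a line beginning with its label, which may not hold for every PDF. The function names and the exact label list are my own choices for this example, not the code used to build the site.

```python
import subprocess

# Labels taken from descriptors mentioned on this page; the real PDFs
# contain more fields than these.
LABELS = ["Ad Creation Date", "Ad End Date", "Ad Impressions", "Ad Clicks", "Ad Spend"]


def pdf_to_text(pdf_path):
    """Run pdftotext (from poppler-utils) and return the extracted text."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_descriptors(text):
    """Map each known label to the remainder of the line that starts with it."""
    found = {}
    for line in text.splitlines():
        stripped = line.strip()
        for label in LABELS:
            if stripped.startswith(label):
                value = stripped[len(label):].strip(" :")
                found[label] = value or "[Unavailable]"
                break
    # Labels that never appeared are marked unavailable as well.
    return {label: found.get(label, "[Unavailable]") for label in LABELS}
```

Calling parse_descriptors(pdf_to_text("some-ad.pdf")) would give one dictionary per ad, ready to be collected into a JSON file.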

You can download the original PDFs either in zip file format, or from this GitHub repository: russian-ad-pdfs. You can download the images extracted from these PDFs, text files ripped from the original PDF documents, and a JSON file cleaning up some of the data here: russian-ad-datasets.

Brief Insights

My primary goal in this explorer is to help others take a look at these ads, and to better understand how those seeking to sway Americans' political views thought best to do so. I haven't yet been able to do any real analysis of the data, but I have noticed a few broad trends.

Contrary to my own expectations, many of these ads are aimed at people of color, LGBTQ+ people, and other audiences presumed to lean left. They often redirected to seemingly progressive groups that nonetheless encouraged visitors to vote for Trump, or not to vote for Hillary. One of the most prominent examples is Williams and Kalvin, two Black bloggers who were paid by a Russian group to persuade viewers not to vote for Hillary (DailyBeast). IRA ads would use progressive issues, such as racist police violence against Black people, to draw viewers to Williams and Kalvin.

It is also worth noting that some ad campaigns were explicitly non-political, such as the "Memopolis" campaign that marketed solely on memes (CNET). Other content included ads that referenced popular movies or had pictures of women without a further political message, and ads that used offers for a music player called "facemusic" that would purportedly hijack users' Facebook Messenger (Wired).

Ads that appealed to conservative viewers featured ideas more violent and authoritarian than conservatives were typically credited with holding. One ad for "Stop A.I.", a violently anti-immigrant Facebook page, featured a racist caricature of a Mexican immigrant on a bug's body, with the subtitle "It's time to get rid of parasites". Other ads used Confederate imagery, or equated Hillary with Satan and recruited for an "army of Jesus".

There are probably many other discoveries left to be made in this dataset. Sorting by cost efficiency shows that many of the most effective ads marketed pictures of attractive and/or nude women. Searching "bernie" shows that the IRA was purchasing pro-Bernie ads both before and after the nomination of Hillary Clinton in the Democratic Primary. Select the "Native American" category and see a brief campaign aimed towards Native Americans in the weeks after President Trump gave approval for the Dakota Access Pipeline. Looking at the "Non-US" location category shows that many anti-immigration and anti-Muslim ads were also aimed at countries like England, Germany, France, and Canada.

I am sure there are many other ads to discover — send a note (russian.ad.explorer@gmail.com), or share an ad, if you find something interesting.

Contact

If you have any requests, or questions regarding this dataset and explorer, send a message to russian.ad.explorer@gmail.com. I would love to hear any feedback you may have — this is an entirely informal effort, and I'll try to incorporate any suggestions as I have time.

Notes on Interest Categories

The Russian Ad Explorer allows you to filter by "Interest Categories", a set of criteria that was not provided in the original dataset. I created Interest Categories by aggregating the descriptors "People Who Match", "And Must Also Match", "Behaviors", "Interest Expansion", and "Interests" into a smaller set of categories that referenced a similar set of ideas. Scanning the dataset from the original PDFs extracted around 1,000 interest tags, which I then sorted into a smaller set of 18 categories.
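
To make the aggregation concrete, here is a small sketch in Python. It assumes each ad is stored as a dictionary whose targeting fields hold lists of tags, and it uses a tiny excerpt of a tag-to-category mapping. Both the data layout and the names are hypothetical, chosen only for illustration.

```python
# Descriptor fields that were aggregated, as listed above.
TARGETING_FIELDS = [
    "People Who Match", "And Must Also Match", "Behaviors",
    "Interest Expansion", "Interests",
]

# Hypothetical excerpt of the hand-built mapping. Each tag maps to a list
# because a tag may belong to one or several categories.
TAG_TO_CATEGORIES = {
    "mother jones": ["Progressive"],
    "breitbart": ["Conservative"],
    "cherokee nation": ["Native American"],
    "national rifle association": ["Gun Rights"],
}


def interest_categories(ad):
    """Collect the categories of every mapped tag found in an ad's targeting fields."""
    categories = set()
    for field in TARGETING_FIELDS:
        for tag in ad.get(field, []):
            # Tags with no mapping are simply left out.
            categories.update(TAG_TO_CATEGORIES.get(tag.strip().lower(), []))
    return sorted(categories)
```

An ad whose "Interests" field lists "Mother Jones" and whose "Behaviors" field lists "Breitbart" would come back tagged with both "Conservative" and "Progressive".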

The decision to put an ad tag in one or multiple categories was entirely a judgment call, and you may disagree with some of the categorizations. Additionally, I may have been unsure how to categorize certain interests and target audiences, and left them out. In order to make transparent some of the choices made in these categorizations, I briefly define them below. You can see a spreadsheet detailing the specific interest categories extracted and tags assigned on this GitHub repository.

Progressive: Interest tags were chosen if they referred to media programs associated with progressive viewpoints, progressive media outlets, and slogans and ideas associated with progressive views. Some examples include: "Social justice", "mother jones", and "Hillary Clinton". Notably, I tried to exclude figures and ideas from traditionally left-leaning categories like "African American", "LGBTQ", or "Islam", in order to make the Progressive category more specific, as opposed to a catch-all for several categories. For example, although the interest "Mumia Abu-Jamal" rightfully is an "African-American" figure, a "Progressive" figure, and a figure associated with the "Incarcerated", I chose to categorize that interest as "African-American" as it seemed to reflect the intent of the IRA when designing that specific campaign. Despite these choices, the Progressive category in particular has heavy overlap with the African-American category anyway.

Conservative: Interest tags were chosen if they referred to popular conservative figures, organizations, or symbols. Some examples include "breitbart", "Rand Paul", and "Far-right politics". Like the "Progressive" category, traditionally conservative categories like "Anti-Immigration", "Southern / Confederate", and "Police" were not selected in this category to make it more specific.

African American: Interest tags were chosen if they referred to popular Black figures, Black media, events in Black history, or occasionally references to famous African leaders in postcolonial struggles. Some examples include: "Sister 2 Sister Magazine", "Black is Beautiful", "Martin Luther King", and "PanAfricanism". I excluded some tags that deal with social issues that relate to Black Americans, but did not explicitly mention them, such as "Say No to Racism". References to postcolonial African leaders and movements were so numerous as to maybe deserve their own category in future versions of this explorer.

Latinx: Interest tags were chosen if they related to ideas or phrases associated with Latin American countries, although the only Latin American country named specifically by IRA tags was Mexico. Tags include "Chicano", "Mexican Pride", "Latin hip hop", and "Lowrider Chicano Rap".

Native American: Interest tags were chosen if they related to ideas or phrases associated with Native Americans, although the only specific tribe referenced was the Cherokee Nation. Tags include "Cherokee Nation", "Native Peoples Magazine", and "American Indian Movement". This category seems to refer to a single two-week campaign in the wake of Donald Trump's decision to approve the Dakota Access Pipeline.

LGBTQ: Interest tags were chosen if they related to any aspect of queer culture. Tags included "Bisexuality", "LGBT history", and "Lesbian community".

Islam: Interest tags were chosen if they related to the religion of Islam, referenced ideas and phrases prominent in Islam, or famous figures in the Muslim world. Some examples include "Kaaba", "Muhammad al-Bagir", and "Muslims Are Not Terrorists".

Christianity: Interest tags were chosen if they related to any form of Christianity or any institution related to Christianity. Some examples include "Gospel", "Presbyterianism", and "Jesus Daily".

Army / Veterans: Interest tags were chosen if they related to veterans, or any position related to the military. Tags include "Vietnam Vets", "Support our Homeless Veterans", "Colonel", and "Industry: Military". Although veterans and active duty viewers are two different groups, these categories often appeared together.

Police: Interest tags were chosen if they were supportive of police in America. I decided to exclude interests that referenced progressive views on the police, such as "police brutality". Some examples included: "Support Law Enforcement", "National Police Wives Association", and "Chief of police".

Incarcerated: Interest tags were chosen if they referred to prisons in America. I decided to exclude interests that referenced progressive views on the police, such as "police brutality". Some examples included: "Incarceration in the United States", "Prison Voices", and "Prisoner".

The South / Confederate: Interest tags were chosen if they related to the Southern United States, the Confederacy, or the US Civil War. Some examples include "Robert E. Lee", "United Daughters of the Confederacy", "Dixie", and "Southern Pride".

Texas: Interest tags were chosen if they related somehow to Texas. Although many specific geographical areas were referenced, Texas was mentioned far more than any other area. Some examples include "Texas", "Hog Hunting Texas Style", and "People who like Heart of Texas".

Anti-Immigration: Interest tags were chosen if they explicitly expressed negative viewpoints about immigration, or related to existing campaigns targeted against immigrants and refugees in the United States. Some examples include "Illegal immigration", "People who like Secured Borders", and "The Invaders".

Gun Rights: Interest tags were chosen if they supported reduced restrictions on gun ownership in America. Some examples include "2nd Amendment", "National Rifle Association", "Guns & Patriots", and "Texas Gun Owner".

Patriotism: Interest tags were chosen if they referenced patriotism, American pride, or patriotic taglines. Some examples include "America the Beautiful", "Support our troops", and "Old Glory".

Memes and Products: Interest tags were chosen if they referenced specific brands, like "Spotify", bands, like "Justice(band)", or products like "Hoodies". They were also chosen if they referenced phrases associated with internet memes, such as "Funny Pictures. LOL", "9GAG", or "Imgur". While the category seems broad, in practice it mostly selects for the IRA's "Memopolis" campaign of usually apolitical memes, or advertisements for the IRA's fake "facemusic" Google Chrome Extension.

Self Defense: Interest tags were chosen if they related to the concept of self-defense. Some examples include "Self Defense Family", "Right of self-defense", and "Martial Arts". This category seems to code for an ad campaign aimed at Black people offering a self-defense class.

Additional Notes

Each PDF released by Congress came with a set of descriptors, but some ads had more descriptors than others. If an ad did not have a descriptor, or if it could not be transcribed during the PDF extraction process, its value is replaced with "[Unavailable]". Other edits and information on these descriptors are detailed below.

Targeting Location: Some ads were specified to target a certain city, state, or country. Over 100 locations were named, and I reduced them to seven location categories: "Midwest / West", "The South", "Pacific", "Northeast", "Southwest", "Atlantic", and "Non-US". As with the interest categories, there were some judgment calls. For example, I categorized North Carolina as "The South", whereas some might think it's closer to "Atlantic". There were also some cities with common names like "Glendale" that I did not assign to any category. A spreadsheet detailing location categories can be found on this GitHub repository.

Ad Age: Targeted age groups for ad campaigns were fairly random, but usually broad. I created two extra "Interest Categories" from these age groups: "Below 30 Only" and "Age 30+ Only". Ads were put into these categories if their age ranges were entirely below 30 (e.g. 18-25), or entirely above (e.g. 30-65+).
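
The rule can be sketched in Python as follows. This assumes age ranges are stored as strings such as "18 - 65+", and it treats a range as "entirely below 30" only when its upper bound is under 30. Both the string format and that boundary choice are assumptions made for the example.

```python
import re


def age_category(age_range):
    """Return 'Below 30 Only', 'Age 30+ Only', or None for a range like '18 - 65+'."""
    match = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)(\+?)\s*", age_range)
    if match is None:
        return None  # unparseable value, e.g. "[Unavailable]"
    low = int(match.group(1))
    high = int(match.group(2))
    open_ended = match.group(3) == "+"
    if low >= 30:
        return "Age 30+ Only"
    if high < 30 and not open_ended:
        return "Below 30 Only"
    return None  # the range straddles 30, so neither category applies
```

With this rule, "18-25" falls under "Below 30 Only", "30-65+" under "Age 30+ Only", and a broad range such as "18 - 65+" gets neither label.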

Ad Creation Date: You can sort ads by the date that they were created. However, if the 'Ad Creation Date' was not provided, or was not transcribed correctly from the PDF, I used the 'Ad End Date' instead for sorting purposes.
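
A minimal sketch of that fallback in Python, assuming dates are stored as "YYYY-MM-DD" strings (the actual stored format may differ) and that ads with no usable date should sort last, which is my own choice for this example:

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # assumed storage format, not confirmed by the dataset


def parse_date(value):
    """Return a datetime, or None if the value is missing or garbled."""
    try:
        return datetime.strptime(value, DATE_FORMAT)
    except (TypeError, ValueError):
        return None


def sort_date(ad):
    """Prefer 'Ad Creation Date'; fall back to 'Ad End Date'."""
    return parse_date(ad.get("Ad Creation Date")) or parse_date(ad.get("Ad End Date"))


def sort_by_date(ads):
    """Sort ads chronologically, placing ads with no usable date at the end."""
    return sorted(ads, key=lambda ad: (sort_date(ad) is None, sort_date(ad) or datetime.max))
```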

Ad Spend: Original ad spend was provided in Rubles (RUB). Conversion to USD was calculated based on the rate at "Ad Creation Date" or "Ad End Date" (whichever was available), although I do not know if ads were paid for on the same day they were created.
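
The conversion can be sketched like this in Python. The rates in the table are made-up placeholders, and the field name "Ad Spend RUB" and the date format are hypothetical. Real rates would have to come from a historical exchange-rate source.

```python
from decimal import Decimal, ROUND_HALF_UP

# Placeholder rates (rubles per one US dollar) keyed by date. These numbers
# are invented for the example and are not real historical rates.
RUB_PER_USD = {
    "2016-05-01": Decimal("65.00"),
    "2016-04-01": Decimal("67.50"),
}


def spend_in_usd(ad):
    """Convert an ad's ruble spend to USD using the first dated rate available."""
    for field in ("Ad Creation Date", "Ad End Date"):
        rate = RUB_PER_USD.get(ad.get(field))
        if rate is not None:
            usd = Decimal(ad["Ad Spend RUB"]) / rate
            return usd.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return None  # no date with a known rate
```

Using Decimal rather than float keeps the cents exact, which matters when the converted spend is later divided by small click counts.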

Ad Impressions: According to Facebook, impressions are a proxy for "how often your ads were on screen for your target audience." You can read more about impressions at Facebook.

Placements: Where an ad will be placed. It is hard to tell what Facebook means by "Third Party Apps"; Facebook seems to have renamed this feature "Audience Network". You can read more about that here: Facebook. There also seems to be more to "Video" placements than I can understand, as some of the ads in this category don't seem to be videos.

Cost Efficiency and Conversion Rate: Cost efficiency was determined by dividing the US dollar price of an ad by either its clicks or impressions. Click conversion rates were determined by dividing clicks by impressions, to see what share of the people who saw an ad actually clicked on it. In at least one ad, there are more clicks than impressions, leading one to believe that Facebook may double-count clicks by the same user.
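
These ratios can be sketched in a few lines of Python. The click rate here is taken as clicks divided by impressions, meaning the share of views that led to a click. That reading, along with the function and key names, is an assumption for the example. Ads with zero clicks or impressions get no value instead of a division error.

```python
def safe_divide(numerator, denominator):
    """Divide, returning None when the denominator is zero or missing."""
    return numerator / denominator if denominator else None


def ad_metrics(usd_spend, clicks, impressions):
    """Compute cost per click, cost per impression, and click rate for one ad."""
    return {
        "cost_per_click": safe_divide(usd_spend, clicks),
        "cost_per_impression": safe_divide(usd_spend, impressions),
        "click_rate": safe_divide(clicks, impressions),
    }
```

An ad with more clicks than impressions shows up as a click rate above 1, which makes such anomalies easy to spot when sorting.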

Language: Although there are options for both English (UK) and English (US), they always appear together in these ads.

Likes, Comments, and Shares: The data released by Facebook does not contain information about how many times a post was liked, commented on, or shared. "Clicks" does not seem to be a proxy for any of these actions, as many pages with low clicks have much higher likes, comments, and shares. I do not yet know if these likes/comments are synthetic or pre-populated, and will update this section if I receive any more information.