Perils and Challenges of Social Media and Election Manipulation Analysis: The 2018 US Midterms

One of the hallmarks of a free and fair society is the ability to conduct a peaceful and seamless transfer of power from one leader to another. Democratically, this is measured in a citizen population's trust in the electoral system of choosing a representative government. In view of the well-documented issues of the 2016 US Presidential election, we conducted an in-depth analysis of the 2018 US Midterm elections, looking specifically for voter fraud or suppression. The Midterm election occurs in the middle of a 4-year presidential term. For the 2018 midterms, 35 senators and all 435 seats in the House of Representatives were up for re-election; thus, every congressional district and practically every state had a federal election. To collect election-related tweets, we analyzed Twitter during the month prior to, and the two weeks following, the November 6, 2018 election day. In a targeted analysis to detect statistical anomalies or election interference, we identified several biases that can lead to wrong conclusions. Specifically, we looked for divergence between actual voting outcomes and instances of the #ivoted hashtag on election day. This analysis highlighted three states of concern: New York, California, and Texas. We repeated our analysis discarding malicious accounts, such as social bots. Upon further inspection, and against a backdrop of collected general election-related tweets, we identified some confounding factors, such as population bias or bot and political ideology inference, that can lead to false conclusions. We conclude with an in-depth discussion of the perils and challenges of using social media data to explore questions about election manipulation.


INTRODUCTION
Inherent biases in drawing conclusions from political polls stretch back to the famous "Dewey Defeats Truman" headline of the 1948 US Presidential election [43]. Confounding factors that led to false conclusions in 1948 included telephone surveys that did not use robust statistical methods and an under-sampling of Truman supporters. Likewise, in 2016, many political pundits underestimated the likelihood that Donald Trump would be elected President of the United States. The research community has shown a strong interest in studying social media to better understand how the 2016 events unfolded. Numerous studies concluded that social media can be a vehicle for political manipulation, citing factors such as fake news and disinformation [5,9,28,29,33,46,49,51,55], bots [7,8,41,50,53,58,59], polarization [3,6], etc.
Research also suggests that social media data come with significant biases that limit the ability to forecast offline events, e.g., the outcomes of political elections [22,23,24,25,26,38] or public health issues [2,36,57]. Despite these well-documented issues and challenges, social media are frequently relied upon and referred to as a trusted source of information to speculate about, or try to explain, offline events. One such example is the recent 2018 US Midterm elections, when widespread claims of voter fraud and voter suppression appeared in the news, often based on social media reports and accounts.
In this paper, we seek to understand whether it is possible to use Twitter as a sensor to estimate the expected number of votes cast in each state. We propose to use the tweets with the hashtag #ivoted on election day as a proxy for actual votes. At first, this seemed like a promising research direction, as tweet volumes and vote counts correlated well for 47 of the 50 states. We also considered whether this approach could be useful for detecting voting issues like fraud or suppression, for example by isolating statistical anomalies between estimated and observed volumes. To get a sense of the expected tweet volume, we carried out the same analysis on tweets matching general keywords related to the midterm election, from a month before election day through two weeks after the election. We also considered how bots may have influenced election manipulation narratives by measuring their activity in the social media discourse. Finally, we applied a political ideology inference technique and compared its output with an external source of poll data.
The conclusions from our analysis are complex, and this work is meant as a note of caution about the risks of using social media analysis to infer political election manipulation such as voter fraud and voter suppression.

Contributions of this work
After exploring multiple Twitter data sets and two external sources (vote counts and Gallup), our contributions are the following:
• We explore how social media analysis carries many risks, mainly involving population bias, data collection bias, lack of location-specific data, separation of bots (and organizations) from humans, information verification and fact-checking, and, lastly, assignment of political ideology.
• We found a significant difference between the results obtained with retweets removed and with retweets included. However, the effect was isolated to one particular state, Texas, indicating that the sensitivity to this choice may depend on location.
• There is a significant difference between people's reported political ideologies in a source like Gallup and those that can be inferred from social media. It is not possible to know whether this is due to limitations of political inference algorithms, confounders, population representation biases, or other factors.
• In the two states (NY & TX) where there was a statistically significant discrepancy between vote counts and instances of self-reported voting via #ivoted hashtags, we found only limited anecdotal evidence of tweets reporting voter fraud or suppression. The divergence can possibly be explained by confounding factors, locality and selection bias, or the social influence of particular candidates in those states (e.g., Alexandria Ocasio-Cortez in NY and Beto O'Rourke in TX).

BACKGROUND
The US Midterm elections were held on 6 November, 2018. They are referred to as midterm elections because they occur in the middle of a presidential term. Senators serve for 6 years; thus, every 2 years, nearly a third of the Senate is up for re-election. The Senate is divided into 3 classes, depending on the year in which senators were elected. Class I senators were elected in 2012 and were up for re-election in 2018. In 2018, 35 of the 100 senators of the 115th Congress were up for re-election. Of these 35 seats, 33 belonged to Senate Class I and two were special elections to fill vacated seats; 15 of the 35 were considered contentious races. Of the 33 Class I seats, 30 incumbents (23 Democrats (D), 5 Republicans (R), 2 Independents (I)) were up for re-election and 3 Republicans (R) were retiring. Details on the Senate seats up for re-election are in Table 1. Additionally, all 435 House of Representatives seats are up for re-election every 2 years. Excluded from our analysis are the non-voting delegates for DC and the US Territories.

RELATED WORK
Since the 2016 US Presidential election, the integrity of the US election system has been under intense scrutiny. The proposed Bot Disclosure and Accountability Act of 2018 set out clear guidelines for what social media companies would have to disclose. The article "The Rise of Social Bots" [18] raised awareness of the issue of social bots on social media platforms. In [7], Bessi & Ferrara focused on social bot detection within the online discussion related to the 2016 presidential election. Beyond characterizing the behavioral differences between humans and bots, that work did not analyze malicious intent in depth. In this paper, we address potential malicious activity in online political discussion along the lines of voter fraud, voter suppression, and political misinformation, and then report on the biases we found.

Voting Issues
Concerns related to voter fraud took center stage after the 2000 US Presidential election, in which it was argued that the candidate with the most votes lost and the Supreme Court decided the winner [39]. Since then, there has been extensive public debate and congressional testimony, and several new laws have been passed, such as the Help America Vote Act [34], which, perhaps surprisingly, was still needed after the National Voter Registration Act of 1993 (NVRA). The effects of the NVRA were researched by Highton and Wolfinger [32], who concluded that provisions in the NVRA would increase voter turnout by 4.7%-8.7% and that purging voter rolls of those who had not voted in the last two years would have a 2% effect. They also identified the two groups most vulnerable to non-voting: those under the age of 30 and those who moved within 2 years of an election [32]. Moreover, it has been argued that the current US voter registration system has a minimal impact on registration and that any updated laws add only marginal value [31]. Therefore, the main concern argued by both parties is voter suppression [56]; specifically, recent voter identification laws increase the chance of voter suppression [30]. In this work, we seek to find instances of voter suppression through online social media analysis. To our knowledge, this has not been done before.

Political Manipulation
Social media serve as convenient platforms for people to connect and to exchange ideas. However, social media networks like Twitter and Facebook can be used for malicious purposes [17]. Especially in the context of political discussion, there is a significant risk of mass manipulation of public opinion. In connection with the ongoing investigation of Russian meddling in the 2016 US Presidential election, Badawy et al. [4] studied political manipulation by analyzing the released Russian troll accounts on Twitter. After using label propagation to assign political ideology, they found that Conservatives retweeted Russian trolls over 30 times more than Liberals and produced 36 times more tweets. More recently, Stella et al. [52] highlighted how bots can play significant roles in targeting influential humans to manipulate online discussion, thus increasing in-fighting. Regarding the spread of fake news, various studies showed how political leaning [1], age [28], and education [49] can greatly affect it, alongside other mechanisms that leverage emotions [20,21] and cognitive limits [44,45]. Additionally, Dutt et al. [16] showed that foreign actors, rather than simply backing one candidate or the other, often manipulate social media to sow discord.

Bias
Besides manipulation, other problems may affect data originating from online social systems. Selection bias is one such example: in short, this bias yields a statistically non-representative sample of the true population. A main concern outlined by Ruths and Pfeffer [48], and to a lesser degree by Malik et al. [37], is that social media samples are not representative of the whole voting population, because users self-select to participate on the platform and in specific online discussions. Each social media platform has its own set of biases. Mislove et al. [40] looked specifically at the Twitter population from a location, gender, and ethnicity viewpoint. From a location perspective, they found counties in the Midwest to be underrepresented and highly dense urban counties to be overrepresented [40]. Biases in the representation of gender [47], ethnicity [11], and other sources of distortion [13] can also potentially affect the inference of political ideology.

DATA
In this study, we examine different data sources to investigate and explore the risk of using social media in the context of political election manipulation.
We used Twitter as a sensor to estimate the expected number of votes cast in each state. For this purpose, we carried out two data collections. In the first, we gathered the tweets with the hashtag #ivoted on election day. The second extended the collection to a longer period of time, using a variety of general keywords related to the midterm election. As a basis for comparison, we employ two external sources: the United States Election Project, which provides the number of voters in each state, and Gallup, which provides an estimate of political leaning at both the country and the state level. From these three data sources, we assembled five data sets (DS1-DS5), which are analyzed in turn in the following subsections.
DS1: #ivoted Dataset. The #ivoted Dataset (DS1) gathers the tweets with the hashtag #ivoted generated on the day of the election, November 6, 2018. Note that #ivoted was promoted by Twitter and Instagram (which typically affects hashtag spread [19,54]) to encourage citizens to participate in the midterm elections and increase voter turnout. We used the Python module Twython to collect tweets through the Twitter Streaming API during election day. The data collection time window ranged from 6 a.m. EST on November 6 (when the first polling station opened) to 1 a.m. HST on November 7 (2 hours after the last polling station closed). Overall, we collected 249,106 tweets. As a sanity check, we queried the OSoMe API provided by Indiana University [14]. OSoMe tracks the Twitter Decahose, a pseudo-random 10% sample of the stream, and can therefore provide an estimate of the total volume: OSoMe contains 29.7K tweets with the #ivoted hashtag, posted by 27.2K users. Since trending topics are typically slightly over-represented in the Twitter Decahose [14,42], extrapolating from this sample suggests an estimated upper bound of around 300K tweets for the total volume. In addition, on election day, Twitter reported that the hashtag #ivoted was trending with over 200K tweets (cf. Fig. 1). Having collected 249K such tweets, we conclude that our #ivoted dataset is nearly complete.
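The completeness check above is simple arithmetic. The minimal Python sketch below reproduces it, assuming the Decahose is exactly a 10% sample; because trending topics tend to be over-represented in it, the extrapolated total is read as an upper bound, which makes the resulting coverage ratio a conservative figure. The function name `estimate_coverage` is illustrative, not part of any tool used in the study.

```python
def estimate_coverage(collected: int, decahose_count: int, sample_fraction: float = 0.10) -> tuple:
    """Extrapolate a full-stream volume from a sampled count.

    Returns (estimated_total, coverage). Since trending topics tend to be
    over-represented in the sample, estimated_total is treated as an upper
    bound, so coverage = collected / estimated_total is a lower bound.
    """
    estimated_total = decahose_count / sample_fraction
    return estimated_total, collected / estimated_total


# Figures reported in the text: 249,106 collected tweets, 29.7K in the Decahose.
total, coverage = estimate_coverage(collected=249_106, decahose_count=29_700)
print(f"estimated upper bound: {total:,.0f} tweets; coverage >= {coverage:.1%}")
```

With the reported figures, the extrapolated upper bound is roughly 297K tweets, so the 249K collected tweets cover at least about 84% of it.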
DS2 & DS3: General Midterm Dataset. In the General Midterm Dataset, we collected tweets on a broader set of keywords and over two different time windows. The rationale behind these choices is to evaluate the sensitivity of our study against a different, but correlated, set of data: the main purpose is to detect whether the results diverge from those of the #ivoted Dataset analysis or, conversely, to verify that they remain consistent across settings.
Tweets were collected by using the following keywords as a filter: 2018midtermelections, 2018midterms, elections, midterm, and midtermelections. We distinguish two data sets according to their temporal extent. In DS2, we consider only tweets generated on election day, with exactly the same time window used for DS1. The third data set (DS3) provides a wide-angle view of the political discussion: it includes tweets from the month prior to (October 6, 2018) through two weeks after (November 19, 2018) election day. We kept the collection running after election day because several races remained unresolved. As a result, DS3 consists of 2.7 million tweets, whose IDs are publicly available for download at https://github.com/A-Deb/midterms.
DS4: Actual Voting Data. The first external data source used as a basis for comparison is made available by the United States Election Project, which reports on its website (http://www.electproject.org/2018g) the expected voter turnout per state, along with the (official or certified) information source and other statistics about voters. The data (DS4) we use in this work was retrieved on November 18, 2018, and reflects a voter turnout of 116,241,100 citizens, which is in line with other reported counts.
DS5: Party Affiliation Data. To assess political party affiliation across the country, we use the estimates of the Gallup Daily tracking survey, a system that continuously monitors Americans' attitudes and behaviors (methodology: https://www.gallup.com/174155/gallup-daily-tracking-methodology.aspx). The data set (DS5), collected on January 22, 2019, describes the political leaning of a sample of 180,106 citizens. In particular, it reports the percentage of the Democratic and Republican population in each state and across the entire country. At the national level, Gallup's evaluation shows a Democratic advantage of 7 percentage points: 45% of the population is assessed as Democratic-leaning, while 38% is estimated as Republican-leaning.

Data Pre-processing
Data pre-processing involved only the Twitter data sets and consisted of three main steps. First, we removed any duplicate tweet, which may … Overall, we have almost 3 million tweets distributed over the three Twitter data sets (DS1-DS3). In Table 2, we report some aggregate statistics. Note that the number of authors is lower than the number of users: the latter also includes accounts whose tweet was retweeted (or replied to) without the original tweet being captured in our collection, so these accounts do not appear as authors.

METHODOLOGY
State Identification
Using geo-tagged tweets to assign a state to each user has been shown to be ineffective, since only about 0.5% of tweets are geo-tagged [12]. Location is of utmost importance for our analysis, especially at the state and local level, yet less than 1% of the collected tweets are geo-tagged. Nevertheless, we aim to map as many users as possible to a US state in order to conduct a state-by-state comparison. For this purpose, we leverage tweet metadata, which may include the self-reported user profile location. The location entry is a user-generated string (up to 100 characters) and is pulled from the user profile metadata for every tweet. In this field, we first search for the two-letter capitalized state codes, followed by the full state names. Our analysis does not include Washington, D.C., so we ensure that anything initially labeled Washington does not include any variant of DC. Using this string-search method, we assigned a state to approximately 50% of the tweets and 30% of the users. Some users reported multiple states over their tweet history; in such cases, we used the most commonly reported state. A few users often switched their location from a state name to something else: for example, one user went from "New York, NY" to "Vote Blue!". For such users, we kept the valid state location.
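The string-search procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `STATES` table is a hypothetical subset (a full version would list all 50 states), and the function names are illustrative. It checks capitalized two-letter codes first, then full state names (longest first, so "West Virginia" is not read as "Virginia"), rejects "Washington" when a DC variant is present, and assigns each user their most frequently reported valid state.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Hypothetical subset for illustration; a complete table would list all 50 states.
# Washington, D.C. is deliberately absent because it is excluded from the analysis.
STATES = {
    "AL": "Alabama", "CA": "California", "FL": "Florida",
    "NY": "New York", "TX": "Texas", "VA": "Virginia",
    "WA": "Washington", "WV": "West Virginia",
}
CODE_RE = re.compile(r"\b([A-Z]{2})\b")  # capitalized two-letter tokens, e.g. "TX"
DC_RE = re.compile(r"\bD\.?C\b|district of columbia", re.IGNORECASE)
# Longer names first, so "West Virginia" is matched before "Virginia".
NAMES = sorted(STATES.items(), key=lambda kv: len(kv[1]), reverse=True)


def state_from_location(location: Optional[str]) -> Optional[str]:
    """Map a free-text profile location to a state code, or None if no state is found."""
    if not location:
        return None
    # Step 1: two-letter capitalized state codes.
    for code in CODE_RE.findall(location):
        if code in STATES:
            return code
    # Step 2: full state names (case-insensitive, whole words only).
    lowered = location.lower()
    for code, name in NAMES:
        if re.search(r"\b" + re.escape(name.lower()) + r"\b", lowered):
            # "Washington, DC" and similar variants refer to the District, not the state.
            if code == "WA" and DC_RE.search(location):
                return None
            return code
    return None


def assign_user_state(locations: Iterable[Optional[str]]) -> Optional[str]:
    """Most frequently reported valid state across a user's tweets.

    Non-state strings (e.g. "Vote Blue!") map to None and are skipped,
    so a user who temporarily changes their location keeps the valid state.
    """
    counts = Counter(s for s in map(state_from_location, locations) if s)
    return counts.most_common(1)[0][0] if counts else None
```

For example, `assign_user_state(["New York, NY", "Vote Blue!", "New York, NY"])` returns `"NY"`. Like any keyword heuristic, this sketch can be fooled by uppercase words that coincide with state codes (e.g., "OK" or "IN"); it makes no attempt to handle such cases.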

Bot Detection
Bot detection has received ample attention [18], and increasingly sophisticated techniques keep emerging [35]. In this study, we restrict our bot detection analysis to the widely popular Botometer, developed by Indiana University. The underpinnings of the system were first published in [15,53] and further revised in [59]. Botometer is based on an ensemble classifier [10] fed by over 1,000 features related to the Twitter account under analysis and extracted through the Twitter API. Botometer provides an indicator, namely the bot score, that is used to classify an account either as a bot or as a human: the lower the bot score, the higher the probability that the user is not an automated and/or controlled account. In this study, we use version v3 of Botometer, which brings several innovations detailed in [59]; for example, bot scores are now rescaled and no longer centered around 0.5.
Figure 2: Bot Score Distribution
In Figure 2, we depict the bot score distribution of the 1,131,540 distinct users in our datasets. The distribution exhibits a right skew: most of the probability mass lies in the range [0, 0.2], and some peaks can be noticed around 0.3. Prior studies used a 0.5 threshold to separate humans from bots. However, given the re-calibration introduced in the latest version of Botometer [59], along with the emergence of increasingly sophisticated bots, we lower the bot score threshold to 0.3 (i.e., a user is labeled as a bot if their bot score is above 0.3). This threshold corresponds to the same sensitivity as the 0.5 setting in prior versions of Botometer (cf. Fig 5 in [59]). In both DS1 and DS3, 21.1% of the users have been classified as bots, while in DS2 the percentage reaches 22.9%. Finally, 19.5% of the 295,352 users for whom a state was identified have been scored as bots.
Overall, Botometer did not return a score for 42,904 accounts, which corresponds to 3.8% of the users. To further examine this subset of users, we make use of the Twitter API. Interestingly, 99% of these accounts were suspended by Twitter, whereas the remaining 1% were protected (by privacy settings). For the users with an assigned location, only 1,033 accounts did not get a Botometer score. For those users, we assume that the accounts suspended (1,019) are bots and the private accounts (14) are humans.
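The labeling rule above (a score strictly above 0.3 means bot; for unscored accounts, suspended means bot and protected means human) can be sketched as follows. This sketch assumes the Botometer scores and the account status have already been retrieved; querying Botometer or the Twitter API is not shown, and the `Account` record with its `status` field is a hypothetical structure introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

BOT_THRESHOLD = 0.3  # Botometer v3 threshold used in this study


@dataclass
class Account:
    user_id: int
    bot_score: Optional[float]  # None when Botometer returned no score
    status: str = "active"      # hypothetical field: "active", "suspended" or "protected"


def is_bot(account: Account) -> bool:
    """Bot if the score is strictly above the threshold.

    Accounts without a score fall back to the rule in the text:
    suspended accounts are treated as bots, protected (private) ones as humans.
    """
    if account.bot_score is not None:
        return account.bot_score > BOT_THRESHOLD
    return account.status == "suspended"


def bot_share(accounts: Iterable[Account]) -> float:
    """Fraction of accounts labeled as bots (0.0 for an empty input)."""
    labels = [is_bot(a) for a in accounts]
    return sum(labels) / len(labels) if labels else 0.0
```

Note that a score of exactly 0.3 is labeled human, matching the "above 0.3" wording; in the study, the fallback for unscored accounts was applied to users with an assigned location.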

Statistical Vote Comparison
Once the states have been identified and the bots detected, we compared the distributions of our Twitter datasets (DS1, DS2, and DS3) with our control data in DS4 and DS5. To do this, we start by counting the number of tweets per state and dividing it by the total number of tweets across all states. We denote this fractional share of tweets as the State Tweet Rate (STR); for each state i with T(i) tweets:
$$\mathrm{STR}(i) = \frac{T(i)}{\sum_{j} T(j)}$$
Analogously, we denote each state's share of the V(i) votes reported in DS4 as the State Vote Rate (SVR):
$$\mathrm{SVR}(i) = \frac{V(i)}{\sum_{j} V(j)}$$
We then calculate the difference δ(i) for each state i. Note that any positive value indicates more tweets than votes, as a percentage, and vice versa:
$$\delta(i) = \mathrm{STR}(i) - \mathrm{SVR}(i)$$
Lastly, we convert the difference into standard deviations s(i) (stdevs) by subtracting the average difference δ̄ over all states and dividing by the standard deviation σ_δ of all differences:
$$s(i) = \frac{\delta(i) - \bar{\delta}}{\sigma_{\delta}}$$
We then inspect the results for any anomalous state i with |s(i)| ≥ 2: states beyond two standard deviations are worth further inspection.
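A minimal Python sketch of this comparison follows: it computes each state's tweet share and vote share, their difference δ(i), the standardized score s(i), and the states with |s(i)| ≥ 2. It assumes per-state tweet and vote counts are already available as dictionaries and uses the population standard deviation (`statistics.pstdev`); the text does not specify which estimator was used, so that choice is an assumption. The function names are illustrative.

```python
from statistics import mean, pstdev
from typing import Dict, List


def anomaly_scores(tweets: Dict[str, int], votes: Dict[str, int]) -> Dict[str, float]:
    """s(i) = (delta(i) - mean delta) / std delta, with delta(i) = tweet share - vote share."""
    states = sorted(set(tweets) & set(votes))
    tweet_total = sum(tweets[s] for s in states)
    vote_total = sum(votes[s] for s in states)
    delta = {s: tweets[s] / tweet_total - votes[s] / vote_total for s in states}
    diffs = list(delta.values())
    # Both shares sum to 1 over the same states, so the mean difference is ~0.
    mu, sigma = mean(diffs), pstdev(diffs)
    if sigma == 0:
        return {s: 0.0 for s in states}  # tweets track votes perfectly
    return {s: (d - mu) / sigma for s, d in delta.items()}


def flag_anomalies(scores: Dict[str, float], k: float = 2.0) -> List[str]:
    """States whose |s(i)| is at least k standard deviations."""
    return sorted(s for s, z in scores.items() if abs(z) >= k)
```

With N states and population standard deviation, no score can exceed √(N−1) in magnitude, so with 50 states the ±2 cut-off is attainable; with very few states it might not be.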

Political Ideology Inference
We classify users by their ideology based on the political leaning of the media outlets they share. We use lists of partisan media outlets compiled by third-party organizations, such as AllSides and Media Bias/Fact Check. We combine liberal and liberal-center media outlets into one list and conservative and conservative-center outlets into another. The combined list includes 641 liberal and 398 conservative outlets. To cross-reference these media URLs with the URLs in the Twitter dataset, however, we need the expanded form of most links in the dataset, since most of them are shortened. As this process is quite time-consuming, we take the top 5,000 URLs by popularity and retrieve the long version of those. These top 5,000 URLs account for more than 254K URLs, i.e., more than one third of all the URLs in the dataset. After cross-referencing the 5,000 expanded URLs with the media URLs, we observe that 32,115 tweets in the dataset contain a URL pointing to one of the liberal media outlets and 25,273 tweets contain a URL pointing to one of the conservative media outlets. We use a polarity rule to label Twitter users as liberal or conservative depending on the number of tweets they produce with links to liberal or conservative sources: if a user has more tweets with URLs to liberal sources, they are labeled as liberal, and vice versa. Although the overwhelming majority of users share URLs from only one side, we remove any user with an equal number of tweets linking to each side. Our final set of labeled users includes 38,920 users.
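The polarity rule can be sketched as below. The outlet sets are hypothetical placeholders (the study used 641 liberal and 398 conservative outlets compiled from AllSides and Media Bias/Fact Check), URLs are assumed to be already expanded, and each `(user, url)` pair is assumed to stand for one tweet carrying one partisan link, which is a simplification.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple
from urllib.parse import urlparse

# Hypothetical placeholder domains, not the actual outlet lists.
LIBERAL: Set[str] = {"left-news.example"}
CONSERVATIVE: Set[str] = {"right-news.example"}


def outlet_side(expanded_url: str) -> Optional[str]:
    """Return 'liberal', 'conservative', or None for an already-expanded URL."""
    host = urlparse(expanded_url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host in LIBERAL:
        return "liberal"
    if host in CONSERVATIVE:
        return "conservative"
    return None


def seed_labels(user_urls: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Polarity rule: label each user by the side they link to more often; drop ties."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for user, url in user_urls:
        side = outlet_side(url)
        if side:
            counts[user][side] += 1
    labels = {}
    for user, c in counts.items():
        if c["liberal"] > c["conservative"]:
            labels[user] = "liberal"
        elif c["conservative"] > c["liberal"]:
            labels[user] = "conservative"
        # Users with equal liberal and conservative counts are removed (no label).
    return labels
```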
To classify the remaining accounts as liberal or conservative, we use label propagation, similar to prior work [4]. For this purpose, we construct a retweet network whose nodes are Twitter users, with a link between two users if one retweeted a post of the other. To validate the results of the label propagation algorithm, we apply stratified 5-fold cross-validation to the set of 38,920 seeds: we train the algorithm on 4/5 of the seed list and evaluate it on the remaining 1/5. Both precision and recall are around 0.89. Since we combined liberal and liberal-center outlets into one list (and likewise for conservatives), this suggests that the algorithm correctly labels not only the far liberal or conservative users, which is a relatively easy task, but also those in the liberal/conservative center. Overall, we find that the liberal user population is almost three times larger than its conservative counterpart (73% vs. 27%).
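To make the propagation step concrete, here is a minimal sketch of one simple variant: seed labels stay fixed, retweet edges are treated as undirected, and every other node repeatedly adopts the majority label of its labeled neighbors (ties leave the node unchanged). This is an illustrative assumption; the exact label propagation variant used here and in [4] may differ, and the cross-validation loop is not shown.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def propagate_labels(retweets: Iterable[Tuple[str, str]],
                     seeds: Dict[str, str],
                     max_iter: int = 20) -> Dict[str, str]:
    """Spread seed ideology labels over a retweet network.

    retweets yields (retweeter, original_author) pairs; edges are treated as
    undirected. Seed labels stay fixed; every other node takes the majority
    label of its labeled neighbors, updated synchronously each round.
    """
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for retweeter, author in retweets:
        if retweeter != author:
            neighbors[retweeter].add(author)
            neighbors[author].add(retweeter)
    labels = dict(seeds)
    for _ in range(max_iter):
        updates = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                continue  # seed labels are never overwritten
            votes = Counter(labels[n] for n in nbrs if n in labels)
            ranked = votes.most_common(2)
            if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
                continue  # no labeled neighbors yet, or a tie: leave unchanged
            if labels.get(node) != ranked[0][0]:
                updates[node] = ranked[0][0]
        if not updates:
            break  # converged
        labels.update(updates)
    return labels
```

For evaluation, one would hide a fold of seeds, run `propagate_labels` with the remaining seeds, and compare the predicted labels of the hidden users with their true ones.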

RESULTS
#ivoted (DS1) Statistical Analysis
There were 249,106 tweets in the #ivoted data set, from which we could map a state to 78,162 unique authors. After removing the 15,856 bots (using a bot score threshold of 0.3), 62,306 authors of tweets and retweets remain. Applying the method described in the Statistical Vote Comparison section, we find that three states behave anomalously relative to the remaining 47. Figure 4a shows that New York lies 5.8 standard deviations above the mean difference between the #ivoted percentage and the actual voting percentage, while both California and Texas lie 2.2 standard deviations above the mean. Taken at face value, this would suggest that, if voter suppression occurred, it would most likely be in these three states, as they exhibit significantly more self-reported voting tweets than vote counts.
However, since our data set contains both tweets and retweets, we repeated the analysis without retweets to check the sensitivity of our findings. After removing retweets, 34,754 tweets remained (again excluding bots), and the results changed noticeably: not only did Texas drop from 2.2 to 0.4 stdevs, but New York increased from 5.8 to 6.3 stdevs. This highlights the sensitivity of this type of analysis to location-specific factors, such as the state, and to information-dynamics factors, such as retweet filtering. Further inspection showed that 62.2% of the tweet activity in Texas (in the #ivoted data set) consisted of retweets, showing how this class of tweets can change the results for some populations while leaving others largely unchanged. The average difference across states remains 0 in both settings (see Figure 4b), as it must, since both tweet and vote shares sum to one.
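The retweet sensitivity check amounts to recounting tweets per state with and without retweets. The sketch below shows this and computes each state's retweet share (the 62.2% figure for Texas is such a share). The `Tweet` record is a hypothetical structure, assuming each tweet already carries its author's state and bot label; the two sets of counts could then be fed into the same tweet-share versus vote-share comparison.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Optional


class Tweet(NamedTuple):
    state: Optional[str]   # state assigned to the author, or None
    is_retweet: bool
    author_is_bot: bool


def state_counts(tweets: Iterable[Tweet], include_retweets: bool = True) -> Counter:
    """Tweets per state from human authors with a known state."""
    return Counter(
        t.state for t in tweets
        if t.state and not t.author_is_bot and (include_retweets or not t.is_retweet)
    )


def retweet_share(tweets: Iterable[Tweet]) -> Dict[str, float]:
    """Fraction of each state's (human) activity that consists of retweets."""
    tweets = list(tweets)
    with_rt = state_counts(tweets, include_retweets=True)
    without_rt = state_counts(tweets, include_retweets=False)
    return {s: 1 - without_rt[s] / n for s, n in with_rt.items()}
```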

General Midterm (DS2&DS3) Statistical Analysis
We carried out the same analysis against the general keywords data set both on election day (DS2) and for a month before to two weeks after the election (DS3).
In DS2, we have 72,022 users, from which we filtered out 16,859 bots (using a bot threshold of 0.3). Of the remaining 55,163 authors, we were able to map a state for 26,081 users. Performing the same comparative analysis as before, we found the same three states standing out: CA (1.6 stdev), TX (2.8 stdev), and NY (5.6 stdev), as shown in Figure 4c. Expanding the analysis to DS3, we removed 206,831 users classified as bots from the set of 977,966 authors. This left us with 771,135 users, for 295,705 of whom we could identify a state. The statistical analysis revealed the same outliers in this data set as well: CA (2.8 stdev), TX (3.1 stdev), and NY (4.7 stdev), as can be seen in Figure 4d.

Bot Sensitivity
Next, we investigate whether discarding malicious accounts, such as social bots, from the set of users may have affected the findings above. Table 4 shows the number (and percentage) of bots and humans per state in DS3. The states are sorted in descending order of bot percentage, and the horizontal line separates the states whose bot percentage is above the average (20.3%) from those below it. Note in particular that all three outliers (in bold) have values below the average. However, bot prevalence varies greatly across states and should be analyzed … On the other hand, this topic opens the way to further discussion about the association of bots with a given state. One could argue that there is no point in assigning a state to an account identified as a bot. However, the fact that automated accounts declare a location in their profile can be viewed as a malicious strategy to blend into the social system; thus, it should be examined carefully. For these reasons, we repeated our analysis including social bots in the set of users. Results with and without bots are substantially unchanged: in the interest of space, we do not duplicate the maps shown in Figure 4, but the same anomalies appear when bots are retained. For the #ivoted dataset (DS1) as well, the percentage of bots in each of the three outlier states is below the average (21.0%): NY (16.0%), CA (19.4%), and TX (20.2%).
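A table like Table 4 can be derived from per-state bot and human counts, as in the sketch below. It sorts states by bot percentage and lists the states below the average. The text does not say how the 20.3% average was computed, so this sketch assumes an unweighted mean of the per-state percentages; a count-weighted average (total bots over total users) would be an equally plausible alternative.

```python
from statistics import mean
from typing import Dict, List, Tuple


def bot_prevalence(rows: Dict[str, Tuple[int, int]]) -> Tuple[List[Tuple[str, float]], float]:
    """rows maps state -> (n_bots, n_humans).

    Returns the states sorted by bot percentage (descending) and the average
    percentage used to split the table. Assumption: the average is the
    unweighted mean of the per-state percentages.
    """
    pct = {s: 100.0 * b / (b + h) for s, (b, h) in rows.items() if b + h}
    ordered = sorted(pct.items(), key=lambda kv: kv[1], reverse=True)
    return ordered, mean(pct.values())


def below_average(rows: Dict[str, Tuple[int, int]]) -> List[str]:
    """States whose bot percentage is strictly below the average."""
    ordered, avg = bot_prevalence(rows)
    return [s for s, p in ordered if p < avg]
```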

Political Ideology Analysis
Next, we examine which topics users talk about and how they address politically charged topics. Table 3 shows the top 10 hashtags used by humans and by bots, for both liberal and conservative ideologies. The hashtags have been colored to show the topics shared by bots and humans within each political wing. The overlap between bot and human hashtags is noticeable, which likely explains why removing bots from the analyzed accounts did not have any significant impact on our outcome. When interpreting this table, note that the liberal group is almost three times larger than the conservative one, as stated in the Political Ideology Inference section.
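The overlap highlighted in Table 3 can be computed by taking each group's top-10 hashtags and intersecting the lists, for example liberal humans against liberal bots. The sketch below extracts hashtags with a regular expression and case-folds them; this is a simplifying assumption, since tweet objects returned by the Twitter API also carry hashtag entities, which would be a more reliable source. Function names are illustrative.

```python
import re
from collections import Counter
from typing import Iterable, List, Set

HASHTAG_RE = re.compile(r"#(\w+)")


def top_hashtags(texts: Iterable[str], k: int = 10) -> List[str]:
    """Top-k hashtags (case-folded) across a group's tweets."""
    counts = Counter(tag.lower() for text in texts for tag in HASHTAG_RE.findall(text))
    return [tag for tag, _ in counts.most_common(k)]


def shared_topics(group_a: Iterable[str], group_b: Iterable[str], k: int = 10) -> Set[str]:
    """Hashtags that appear in both groups' top-k lists (e.g., humans vs. bots of one wing)."""
    return set(top_hashtags(group_a, k)) & set(top_hashtags(group_b, k))
```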
Additionally, we aggregated our political ideology labels by state and compared them with DS5, the Gallup poll survey. As mentioned before, the political ideology inference assigned 73% liberal labels and 27% conservative labels to the nation as a whole. By comparison, Gallup reports 45% Democratic and 38% Republican for the nation as a whole. At the state level, we compared our assessment of each state's political leaning with Gallup's. For example, according to Gallup, Alabama is 35% liberal and 50% conservative, giving the state a marked Republican advantage. On Twitter, however, we observed 42% liberal and 31% conservative user labels, which may suggest the opposite trend. Figure 3 shows the difference between the Gallup poll and our analysis. For Alabama, going from a Republican advantage of 15 percentage points (Gallup) to a Democratic advantage of 11 points (Twitter) implies a shift of 26 percentage points toward the liberal side. Overall, every state showed movement toward the left, ranging from a few percentage points to more than 60. This corroborates the suspicion that left-leaning users are over-represented in our data.
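The shift computation in the Alabama example can be written out as a small Python sketch. Like the comparison in the text, it treats Twitter's liberal/conservative labels as comparable to Gallup's Democratic/Republican shares. The shift is the Twitter advantage minus the Gallup advantage, in percentage points; the function names are illustrative.

```python
from typing import Tuple


def dem_advantage(shares: Tuple[float, float]) -> float:
    """Democratic (liberal) share minus Republican (conservative) share, in percentage points."""
    dem, rep = shares
    return dem - rep


def shift_to_left(gallup: Tuple[float, float], twitter: Tuple[float, float]) -> float:
    """Percentage-point shift toward the liberal side when moving from Gallup to Twitter."""
    return dem_advantage(twitter) - dem_advantage(gallup)


# Alabama example from the text: (11) - (-15)
print(shift_to_left(gallup=(35, 50), twitter=(42, 31)))  # prints 26
```

At the national level, the same computation gives (73 − 27) − (45 − 38) = 39 percentage points.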

Voting Issues
New York was the state that exhibited the strongest statistical anomaly. Thus, we conducted a manual inspection, reading all tweets originating from there. We found no red flags, but we isolated a few tweets of interest. The first one, shown in Figure 5, is from a user who was classified as human and who, based on inspection of the account, lives in New York. The user mentions some important issues: at 11:20 am on election day, they found out that they are … There is no information to suggest that this was resolved in any meaningful way or that the accusation is substantiated. A second example of a potential voting issue was found through manual inspection of the New York tweets. The tweet thread in Figure 6 is heavily redacted, but it shows an ongoing conversation through replies, with multiple people presenting multiple sides. The original tweet was posted on 5 November, 2018, and by the time we viewed it had received a significant number of retweets. In a reply to this original tweet, a user complains that they cannot get to the voting booth without a photo ID. User 3 then asks for the name and number of the community, and User 4 provides an election hotline number. This suggests that many people are willing to speculate on Twitter, but nothing indicates that they also went to the official Department of Justice website to file a complaint.
Other noteworthy tweets from our inspection include: (1) "First time voter in my family registered over a month ago on DMV website online not realizing it's not automated. . . she could not vote. Not right." (2) "More voter fraud in Ohio. Why is it that all the errors are always the Democrats?? Because the only way they can win is if they cheat!! This madness needs to stop." (3) We also saw users predicting early on that false claims of voter fraud would be made. One user tweeted "a little over 24 hours from now the Racist in Chief will start Tweeting about rigged elections, voter fraud and illegal aliens voting en mass...". (4) Shortly afterwards, many people started to retweet a user who claimed "Massive voter fraud in Texas Georgia Florida and others" and also asserted that MSM (mainstream media) were putting out fake polls. The Washington Post (@washingtonpost) tweeted "without evidence, Trump and Sessions warn of voter fraud", which was retweeted throughout election day. (5) One user tweeted about voting machine malfunctions, linking to a story from the Atlanta Journal-Constitution (https://t.com/riCGdbwQ6R) about machines being down; people left and were encouraged to come back. Voters were offered provisional paper ballots, but many said they did not trust paper ballots and wanted to vote on a machine.

DISCUSSION & RECOMMENDATIONS
Our results highlight the challenges of using social media in election manipulation analysis. A superficial interpretation of anomalies in online activity compared to real-world data can lead to misleading or false conclusions. In our case, we wanted to determine the feasibility of using social media as a sensor to detect election manipulation such as widespread voter suppression or voter fraud. While we did not find widespread or systematic manipulation, we learned a few lessons worth discussing:
• Data biases of online platforms can drastically affect the feasibility of a study. In our case, we were looking for a representative sample of actual voters who are not bots and whose political ideology and location could be known. Although troves of data were collected and analyzed, various biases we encountered could not be adjusted for.
• The second main issue is consistency in the analysis. Sensitivity to the choices made when cleaning the data, setting the parameters of inference algorithms, etc. yields a so-called garden of forking paths [27]: some results can vary significantly as a function of such choices (for example, location bias and the removal or retention of retweets played a role in determining whether Texas exhibited a statistical anomaly in terms of expected versus cast votes).
• Political ideologies reported by Gallup vary significantly from those that can be inferred on social media. We were unable to determine whether this is due to limitations of the employed political inference tool, population biases, or other factors. This is an open problem in social media analysis, and one that must be tackled before social media can robustly replace polling.
• The actual voting numbers reported by official sources correlated very closely with what we inferred from our Twitter analysis for 47 of 50 states. As such, the approach seemed promising for identifying voter suppression or fraud. However, the results show a more complex picture: no evidence of fraud or suppression beyond the anecdotal was found in the three anomalous states under scrutiny. Nevertheless, we suggest that, before and during elections, the Department of Justice maintain a social media presence to engage with people who have a potential voting issue.
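The garden-of-forking-paths point can be illustrated with a small sensitivity check that reruns the same anomaly test under every combination of cleaning choices and compares which states get flagged. The sketch below is hypothetical throughout. The tweet fields (`state`, `is_bot`, `is_retweet`), the anomaly score (a state's share of #ivoted tweets divided by its share of cast votes), the `tolerance` threshold, and the toy data are all illustrative assumptions, not the statistical test or data used in this study.

```python
from itertools import product


def tweet_vote_ratios(tweets, votes, keep_bots, keep_retweets):
    """Ratio of each state's share of (filtered) #ivoted tweets to its share of cast votes."""
    kept = [
        t for t in tweets
        if (keep_bots or not t["is_bot"]) and (keep_retweets or not t["is_retweet"])
    ]
    counts = {state: 0 for state in votes}
    for t in kept:
        if t["state"] in counts:  # ignore tweets whose state has no vote total
            counts[t["state"]] += 1
    total_tweets = sum(counts.values())
    if total_tweets == 0:
        raise ValueError("no tweets left after filtering")
    total_votes = sum(votes.values())
    return {
        state: (counts[state] / total_tweets) / (votes[state] / total_votes)
        for state in votes
    }


def forking_paths(tweets, votes, tolerance):
    """Flagged states (ratio deviating from 1 by more than tolerance) for each cleaning choice."""
    results = {}
    for keep_bots, keep_retweets in product([True, False], repeat=2):
        ratios = tweet_vote_ratios(tweets, votes, keep_bots, keep_retweets)
        results[(keep_bots, keep_retweets)] = {
            state for state, ratio in ratios.items() if abs(ratio - 1) > tolerance
        }
    return results


# Invented toy data: three states with equal turnout; only state C has extra retweets.
toy_votes = {"A": 100, "B": 100, "C": 100}
toy_tweets = [
    {"state": s, "is_bot": False, "is_retweet": False} for s in "ABC" for _ in range(10)
] + [{"state": "C", "is_bot": False, "is_retweet": True} for _ in range(10)]

for (keep_bots, keep_retweets), flagged in forking_paths(toy_tweets, toy_votes, tolerance=0.3).items():
    print(f"keep_bots={keep_bots}, keep_retweets={keep_retweets}: {sorted(flagged)}")
```

On this toy data, state C is flagged only when retweets are kept, while toggling bots has no effect. A single cleaning choice can thus create or erase an apparent anomaly, which is why reporting results across all such choices is safer than reporting one path.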

CONCLUSION AND FUTURE WORK
In this work, we analyzed social media during the 2018 US Midterm election. In addition to studying bots and the political ideology of users, we studied the correlation between people talking about voting and actual voter data. We then highlighted a few issues that could lead to inaccurate conclusions. In particular, unlike in prior studies, removing or retaining bots did not change the outcome of our analysis. However, removing retweets did make a significant difference for one state, Texas, suggesting a dependency, or bias, on location. All of the challenges we faced can be expanded upon in future work. We could assign a state to only 44.7% of DS1 and 30.2% of DS2/DS3; evaluating user timelines to better infer which state each user is from would enhance future location-based studies. Our political ideology inference started by labeling 38K users based on the links they posted, and then propagated those labels on the retweet network. We could potentially identify the users with high centrality and evaluate their timelines for party affiliation, approaching the inference problem from a different angle. We could also focus on separating not just human from bot accounts, but also human from corporate accounts. Some users classified as human could be operating as part of a collective body that, while not necessarily malicious, may introduce an inorganic bias.
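As a rough illustration of link-based seeding followed by propagation on the retweet network, the sketch below seeds users from the news domains they linked to. It then spreads scores by repeatedly averaging over retweet neighbors while keeping seed labels fixed. This is a generic label-propagation variant, not necessarily the inference tool used in this work. The domain-to-leaning map, the +1/-1 encoding, the iteration count, and all user names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def seed_labels(user_links, domain_leaning):
    """Seed +1 (liberal) / -1 (conservative) from the known domains each user linked to."""
    seeds = {}
    for user, links in user_links.items():
        leanings = [
            domain_leaning[domain]
            for domain in (urlparse(link).netloc for link in links)
            if domain in domain_leaning
        ]
        if sum(leanings) != 0:  # skip users with no known domains or a tie
            seeds[user] = 1 if sum(leanings) > 0 else -1
    return seeds


def propagate_labels(seeds, retweet_edges, iterations=10):
    """Spread seed scores over the undirected retweet graph by neighbor averaging."""
    neighbors = defaultdict(set)
    for retweeter, author in retweet_edges:
        if retweeter != author:
            neighbors[retweeter].add(author)
            neighbors[author].add(retweeter)
    scores = {user: 0.0 for user in neighbors}
    scores.update({user: float(label) for user, label in seeds.items()})
    for _ in range(iterations):
        updated = dict(scores)
        for user, nbrs in neighbors.items():
            if user not in seeds:  # seed labels stay fixed
                updated[user] = sum(scores[n] for n in nbrs) / len(nbrs)
        scores = updated
    return {
        user: "liberal" if s > 0 else "conservative" if s < 0 else "unknown"
        for user, s in scores.items()
    }


# Hypothetical domains and users, for illustration only.
domain_leaning = {"left.example": 1, "right.example": -1}
user_links = {
    "a": ["https://left.example/story"],
    "b": ["https://right.example/x", "https://right.example/y"],
    "c": ["https://other.example/"],
}
seeds = seed_labels(user_links, domain_leaning)
labels = propagate_labels(seeds, [("u1", "a"), ("u2", "u1"), ("u3", "b")])
print(seeds)
print(labels)
```

User "c" links only to an unknown domain and never appears in the retweet graph, so it receives no label. Coverage gaps like this are one way population bias can enter the inferred ideology distribution.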
Ultimately, one of the goals of this work was to explore the feasibility of using social media as a sensor to detect possible election manipulation at scale. Although our initial effort did not produce the expected results, we highlighted some useful lessons that may inform future efforts to use such data for social good.