Using tidytags with a conference hashtag

library(tidytags)

This vignette introduces many tidytags functions through the example of analyzing tweets associated with the 2019 annual convention of the Association for Educational Communications & Technology (AECT): #aect19, #aect2019, or #aect19inspired. The information in this vignette is scattered throughout the R documentation for the tidytags package; this vignette conveniently brings it all together in one place.

Twitter Archiving Google Sheets

A core functionality of tidytags is collecting tweets continuously with a Twitter Archiving Google Sheet (TAGS).

For help setting up your own TAGS tracker, see the “Getting started with tidytags” (vignette("setup", package = "tidytags")) vignette, Key Task #1.

read_tags()

To simply view a TAGS archive in R, you can use read_tags(). Here, we’ve openly shared a TAGS tracker that has been collecting tweets associated with the AECT 2019 convention since September 30, 2019. Notice that this TAGS tracker is collecting tweets containing three different hashtags: #aect19, #aect2019, or #aect19inspired. As of August 5, 2022, this tracker had collected 2,564 tweets. The tracker remains active today, although these hashtags seem to be largely inactive in recent years.

tidytags allows you to work in R with tweets collected by a TAGS tracker. This is done with the googlesheets4 package.

One requirement for using googlesheets4 is that your TAGS tracker has been “published to the web.” To do this, with the TAGS page open in a web browser, go to File >> Share >> Publish to the web. The Link field should be ‘Entire document’ and the Embed field should be ‘Web page.’ If everything looks right, then click the Publish button. Next, click the Share button in the top right corner of the Google Sheets window, select Get shareable link, and set the permissions to ‘Anyone with the link can view.’

The input needed for the tidytags::read_tags() function is either the entire URL from the top of the web browser when opened to a TAGS tracker, or a Google Sheet identifier (i.e., the alphanumeric string following https://docs.google.com/spreadsheets/d/ in the TAGS tracker’s URL). Be sure to put quotation marks around the URL or sheet identifier when entering it into the read_tags() function.

Again, if you’re having trouble setting up or publishing the TAGS tracker, see the “Getting started with tidytags” vignette (vignette("setup", package = "tidytags")), Key Task #1.

tags_url <- "18clYlQeJOc6W5QRuSlJ6_v3snqKJImFhU42bRkM_OX8"
example_df_all <- read_tags(tags_url)
dim(example_df_all)
#> [1] 2564   18
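
As described above, read_tags() accepts either the Google Sheet identifier (as used here) or the entire URL of the TAGS tracker. For instance, a call with the full URL, as copied from the browser’s address bar, should be equivalent:

example_df_all <- read_tags("https://docs.google.com/spreadsheets/d/18clYlQeJOc6W5QRuSlJ6_v3snqKJImFhU42bRkM_OX8/edit")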

Note that there are alternative ways to access TAGS files. One way is to simply download a CSV file from Google Sheets: navigate to File >> Download >> Comma-separated values (CSV), making sure to do so from the TAGS Archive page. Once this file is downloaded, you can read it into R like any other CSV.

example_df_all <- readr::read_csv("my-downloaded-tags-file.csv")

pull_tweet_data()

With a TAGS tracker archive imported into R, tidytags allows you to gather quite a bit more information related to the TAGS-collected tweets with the pull_tweet_data() function. This function builds off the rtweet package (via rtweet::lookup_tweets() and rtweet::users_data()) to query the Twitter API.

However, to access the Twitter API, whether through rtweet or tidytags, you will need to apply for developer access from Twitter, which you can do through Twitter’s developer website.

The rtweet documentation already contains a very thorough vignette, “Authentication with rtweet” (vignette("auth", package = "rtweet")), to guide you through the process of authenticating access to the Twitter API. We recommend the app-based authentication method that uses auth <- rtweet::rtweet_app(), described in the Apps section of the vignette.

For further help getting your own Twitter API keys, see the “Getting started with tidytags” vignette (vignette("setup", package = "tidytags")), specifically Pain Point #2.

Note that your dataset will often contain fewer rows after running pull_tweet_data(). This is because rtweet::lookup_tweets() is searching for tweet status IDs that are currently available. Any tweets that have been deleted or made “protected” (i.e., private) since TAGS first collected them are dropped from the dataset. Rather than view this as a limitation, we see this as an asset to help ensure our data better reflects the intentions of the accounts whose tweets we have collected (see Fiesler & Proferes, 2018).

Here, we demonstrate two different ways of using pull_tweet_data(). The first method is to query the Twitter API with the tweet ID numbers from the id_str column of the TAGS archive. However, a limitation of TAGS is that the numbers in this column are often corrupted, because Google Sheets treats them as very large numbers (instead of character strings) and rounds them by putting them into exponential form. The results of this first method are stored in the variable example_after_rtweet_A below. The second method pulls the tweet ID numbers from the tweet URLs. For example, the tweet with the URL https://twitter.com/tweet__example/status/1176592704647716864 has a tweet ID of 1176592704647716864. The results of this second method are stored in the variable example_after_rtweet_B below.

app <- rtweet::rtweet_app(bearer_token = Sys.getenv("TWITTER_BEARER_TOKEN"))
rtweet::auth_as(app)

example_after_rtweet_A <- pull_tweet_data(id_vector = example_df_all$id_str)
example_after_rtweet_B <- pull_tweet_data(url_vector = example_df_all$status_url)

When this vignette was run on August 9, 2022, the TAGS tracker had collected 18 variables associated with 2,564 tweets. The first method, searching with id_str, collected 66 variables associated with 2,195 tweets. The second method, using tidytags::get_char_tweet_ids(), also collected 66 variables associated with 2,195 tweets.
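
As a quick check of how many TAGS-collected tweets were dropped (per the note above about deleted and protected tweets), you can compare row counts:

nrow(example_df_all) - nrow(example_after_rtweet_A)
#> [1] 369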

Notice how many more variables are in the dataset after using pull_tweet_data(). Although both methods returned the same tweets here, we have found that in the process of storing and retrieving tweet IDs, the long string of numbers can sometimes be interpreted as an object of type double (i.e., numeric) and subsequently converted into scientific notation. This loses the specific identifying use of the string of numerical digits. Therefore, we strongly recommend the second method, obtaining tweet IDs from the tweet URL, which is why we have included get_char_tweet_ids() as an internal function in the tidytags package.

The built-in default of pull_tweet_data() is to accept the dataframe retrieved by read_tags() and implement the second method, retrieving metadata starting from tweet URLs. That is, pull_tweet_data(read_tags(tags_url)). Take a quick look at the result, viewed with the glimpse() function from the dplyr package:

example_after_rtweet <- pull_tweet_data(read_tags(tags_url))
dplyr::glimpse(example_after_rtweet)
#> Rows: 2,195
#> Columns: 66
#> $ created_at                    <dttm> 2020-04-19 15:22:23, 2020-03-01 15:00:41, 2020…
#> $ id                            <dbl> 1.251954e+18, 1.234207e+18, 1.229405e+18, 1.225…
#> $ id_str                        <chr> "1251954312772812801", "1234206946732830720", "…
#> $ full_text                     <chr> "RT @RoutledgeEd: Congrats to authors Joseph Re…
#> $ truncated                     <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…
#> $ display_text_range            <dbl> 140, 140, 140, 140, 268, 140, 140, 140, 140, 14…
#> $ entities                      <list> [[<data.frame[1 x 2]>], [<data.frame[1 x 2]>],…
#> $ source                        <chr> "<a href=\"https://mobile.twitter.com\" rel=\"n…
#> $ in_reply_to_status_id         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ in_reply_to_status_id_str     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ in_reply_to_user_id           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ in_reply_to_user_id_str       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ in_reply_to_screen_name       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ geo                           <list> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ coordinates                   <list> [<data.frame[1 x 3]>], [<data.frame[1 x 3]>], …
#> $ place                         <list> [<data.frame[1 x 3]>], [<data.frame[1 x 3]>], …
#> $ contributors                  <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ is_quote_status               <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…
#> $ retweet_count                 <int> 4, 28, 2, 2, 2, 9, 9, 9, 9, 9, 1, 9, 9, 9, 9, 9…
#> $ favorite_count                <int> 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0,…
#> $ favorited                     <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…
#> $ retweeted                     <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…
#> $ possibly_sensitive            <lgl> NA, NA, NA, NA, FALSE, NA, NA, NA, NA, NA, FALS…
#> $ lang                          <chr> "en", "en", "en", "en", "en", "en", "en", "en",…
#> $ retweeted_status              <list> [<data.frame[1 x 30]>], [<data.frame[1 x 30]>]…
#> $ quoted_status_id              <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1.21854…
#> $ quoted_status_id_str          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "121854…
#> $ quoted_status_permalink       <list> [<data.frame[1 x 3]>], [<data.frame[1 x 3]>], …
#> $ quoted_status                 <list> [<data.frame[1 x 26]>], [<data.frame[1 x 26]>]…
#> $ text                          <chr> "RT @RoutledgeEd: Congrats to authors Joseph Re…
#> $ favorited_by                  <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ scopes                        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ display_text_width            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ quote_count                   <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ timestamp_ms                  <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ reply_count                   <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ filter_level                  <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ metadata                      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ query                         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ withheld_scope                <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ withheld_copyright            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ withheld_in_countries         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ possibly_sensitive_appealable <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ user_id                       <dbl> 1.251952e+18, 3.294167e+09, 9.225363e+17, 8.048…
#> $ user_id_str                   <chr> "1251951804398669825", "3294167372", "922536306…
#> $ name                          <chr> "Harriet Watkins", "Augusta Avram", "AECT GSA",…
#> $ screen_name                   <chr> "Harriet96152202", "ELTAugusta", "gsa_aect", "A…
#> $ location                      <chr> "", "British Columbia, Canada", "", "", "Moscow…
#> $ description                   <chr> "Love educational technology, online learning a…
#> $ url                           <chr> "https://t.co/ztRaRj9BLo", "https://t.co/OpLrmt…
#> $ protected                     <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…
#> $ followers_count               <int> 7, 789, 452, 2050, 2253, 452, 861, 1553, 1719, …
#> $ friends_count                 <int> 17, 1325, 47, 15, 1362, 47, 482, 714, 728, 0, 1…
#> $ listed_count                  <int> 0, 0, 4, 57, 178, 4, 6, 0, 149, 98, 124, 12, 12…
#> $ user_created_at               <chr> "Sun Apr 19 19:12:33 +0000 2020", "Sun Jul 26 0…
#> $ favourites_count              <int> 9, 5096, 147, 772, 4252, 147, 254, 4681, 6669, …
#> $ verified                      <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…
#> $ statuses_count                <int> 9, 7313, 2351, 1388, 12144, 2351, 261, 4980, 11…
#> $ profile_image_url_https       <chr> "https://pbs.twimg.com/profile_images/125195195…
#> $ profile_banner_url            <chr> "https://pbs.twimg.com/profile_banners/12519518…
#> $ default_profile               <lgl> TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FA…
#> $ default_profile_image         <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…
#> $ user_withheld_in_countries    <list> [], [], [], [], [], [], [], [], [], [], [], []…
#> $ derived                       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ user_withheld_scope           <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ user_entities                 <list> [[<data.frame[1 x 5]>], [<data.frame[1 x 5]>]]…

At this point, the purpose of tidytags should be restated. TAGS tweet trackers are easily set up and maintained, and they do an excellent job passively collecting tweets over time. For instance, the example TAGS tracker we demo here has collected thousands of tweets related to the AECT 2019 annual convention since September 30, 2019. In contrast, running this query now using rtweet::search_tweets() is limited by Twitter’s API, meaning that an rtweet search can only go back in time 6-9 days, and is limited to returning at most 18,000 tweets per query. So, if you are interested in tweets about AECT 2019, today you could get almost no meaningful data using rtweet alone.

rtweet_today <-
  rtweet::search_tweets("#aect19 OR #aect2019 OR #aect19inspired")

You can see that an rtweet search for #AECT19 tweets run today returns 0 tweets.

In sum, although a TAGS tracker is great for easily collecting tweets over time (breadth), it lacks depth in terms of metadata related to the gathered tweets. Specifically, TAGS returns information on at most 18 variables; in contrast, rtweet returns information on 66 variables. Thus, tidytags brings together the breadth of TAGS with the depth of rtweet.

lookup_many_tweets()

The Twitter API only allows looking up 90,000 tweet IDs at a time, a rate limit that resets after 15 minutes. Hence, rtweet::lookup_tweets() will only return results for the first 90,000 tweet IDs in your dataset. The tidytags function lookup_many_tweets() will automatically divide your dataset into batches of 90,000 tweets, looking up one batch per 15 minutes until finished. Note that lookup_many_tweets() also works for datasets with fewer than 90,000 tweets.

Because our AECT 2019 example includes fewer than 90,000 tweets (and because lookup_many_tweets() involves waiting 15 minutes between batches), we do not run an example here. However, this function can be used in the same way as pull_tweet_data().
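
For reference, such a call might look like the following sketch (not run here; big_id_vector is a hypothetical character vector containing more than 90,000 tweet status IDs):

# Not run: looks up IDs in batches of 90,000, pausing 15 minutes between batches
big_tweet_df <- lookup_many_tweets(big_id_vector)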

get_upstream_tweets()

If your research questions conceptualize your tweet dataset as a conversation or affinity space, it may be useful to retrieve and add additional tweets. Specifically, TAGS collects tweets that contain one or more keywords or text strings. For example, the TAGS tracker we have been working with in this vignette collected tweets containing the keywords #aect19 OR #aect2019 OR #aect19inspired. This is a reasonable approach from a researcher’s point of view. However, participants who have been following or contributing to these hashtags would also see additional tweets in these “conversations,” because Twitter connects tweets that reply to other tweets into potentially lengthy reply threads. Tweets in a reply thread are all displayed to a user viewing tweets on Twitter’s platform, but because some tweets in a thread may not contain the hashtag of interest, not all tweets in a user’s experience of a conversation would be collected by TAGS. Additionally, tweets contained in a reply thread but composed before the TAGS tracker was initiated would also be left out of the dataset.

There is a solution to this problem. Because the Twitter API offers an in_reply_to_status_id_str column, it is possible to iteratively reconstruct reply threads in an upstream direction, that is, retrieving tweets composed earlier than the replies in the dataset. We include the get_upstream_tweets() function in tidytags to streamline this process. The function also prints output at each iteration to show how the process is progressing.

example_with_upstream <-
  get_upstream_tweets(example_after_rtweet)

The dataset contained 2195 tweets at the start. Running get_upstream_tweets() added 29 new tweets.

Unfortunately, due to limitations in what information is provided by the Twitter API, it is not practical to retrieve downstream replies, or those tweets in a reply thread that follow a tweet in the dataset but neglect to include the hashtag or keyword.

process_tweets()

After pull_tweet_data() is used to collect additional information from TAGS tweet IDs (in this case, the example_after_rtweet dataframe), the tidytags function process_tweets() can be used to calculate additional attributes and add these to the dataframe as new columns. Specifically, five new variables are added: mentions_count, hashtags_count, urls_count, tweet_type, and is_self_reply. This results in 71 variables associated with the collected tweets.

example_processed <- process_tweets(example_after_rtweet)

Notice that you now have 71 variables associated with 2,195 tweets.
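
To take a quick look at just the five new columns, you can use dplyr::select():

dplyr::select(example_processed,
              mentions_count, hashtags_count, urls_count,
              tweet_type, is_self_reply)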

At this point, depending on your research questions, you may wish to calculate some descriptive statistics associated with this tweet data. For instance, the mean and standard deviation of hashtags per tweet:

mean_hash <- round(mean(example_processed$hashtags_count), 2)
sd_hash <- round(sd(example_processed$hashtags_count), 2)

This shows that the mean number of hashtags per tweet is 1.48 (SD = 1.36).

Or, perhaps, the mean, median, and max number of mentions per tweet would be useful to know:

mean_mentions <- round(mean(example_processed$mentions_count), 2)
sd_mentions <- round(sd(example_processed$mentions_count), 2)
median_mentions <- median(example_processed$mentions_count)
max_mentions <- max(example_processed$mentions_count)

The mean number of mentions per tweet is 1.35 (SD = 1.31). The median is 1, and the maximum number of mentions in a tweet is 14.

get_url_domain()

The tidytags function get_url_domain() combines the expand_urls() function from the longurl package and the domain() function from the urltools package to easily return the domain names of any hyperlinks included in tweets.

As an example, get_url_domain() finds that the domain in the shortened URL bit.ly/2SfWO3K is aect.org:

get_url_domain("bit.ly/2SfWO3K")
#> [1] "aect.org"

It may also be of interest to examine which websites are linked to most often in your dataset. get_url_domain() can be combined with a function from base R like table() to calculate frequency counts for domains present in the dataset. This process is useful for getting a picture of where else on the internet tweeters are directing their readers’ attention.

Keep in mind, however, that this process takes a bit of time.

example_urls <- dplyr::filter(example_processed, urls_count > 0)
# Extract the expanded (i.e., unshortened) URLs from each tweet's entities
urls_list <- list()
for (i in seq_len(nrow(example_urls))) {
  urls_list[[i]] <- example_urls$entities[[i]]$urls$expanded_url
}
urls_vector <- unlist(urls_list)
# Resolve each URL to its domain, then tally and sort domains by frequency
example_domains <- get_url_domain(urls_vector)
domain_table <- tibble::as_tibble(table(example_domains))
domain_table_sorted <- dplyr::arrange(domain_table, desc(n))
head(domain_table_sorted, 20)
#> # A tibble: 20 × 2
#>    example_domains                 n
#>    <chr>                       <int>
#>  1 twitter.com                   127
#>  2 convention2.allacademic.com    35
#>  3 instagram.com                  24
#>  4 aect.org                       20
#>  5 docs.google.com                12
#>  6 youtube.com                    11
#>  7 drive.google.com                8
#>  8 caranorth.com                   7
#>  9 nodexlgraphgallery.org          7
#> 10 sites.google.com                7
#> 11 apps.apple.com                  6
#> 12 litnet.co.za                    6
#> 13 isfsu.blogspot.com              5
#> 14 tecfa.unige.ch                  5
#> 15 viral-notebook.com              5
#> 16 app.core-apps.com               4
#> 17 flipsnack.com                   4
#> 18 linkedin.com                    4
#> 19 play.google.com                 4
#> 20 springer.com                    4

Unsurprisingly, in this dataset, by far the most common domain (as of August 5, 2022) was “twitter.com”, meaning that AECT 2019 tweeters were linking most often to other Twitter content. Other common domains included “convention2.allacademic.com” (i.e., the host of the conference website, including the schedule and session information) as well as “instagram.com” and “youtube.com”, where tweeters likely shared conference-related content.

filter_by_tweet_type()

This function quickly subsets the data, returning just the tweets of the type indicated (e.g., filter_by_tweet_type(df, "reply") returns only reply tweets). The filter_by_tweet_type() function can also be used to look at how many tweets of each type are present in the dataset.

In the dataset of 2195 tweets, there are 62 replies, 1146 retweets, 100 quote tweets, and 457 tweets that are not replies, retweets, or quote tweets.
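
For example, the reply count reported above can be reproduced like so:

example_replies <- filter_by_tweet_type(example_after_rtweet, "reply")
nrow(example_replies)
#> [1] 62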

create_edgelist()

Another useful approach to social media research is social network analysis. Getting started with social network analysis is as simple as producing an edgelist, a two-column dataframe listing senders and receivers. An edgelist gives a complete accounting of who is interacting with whom. On Twitter, this is complicated somewhat by the number of ways a user is able to interact with someone else, namely, through replying, retweeting, quote tweeting, mentioning, and liking tweets. The tidytags function create_edgelist() uses filter_by_tweet_type() to create an edgelist that takes into account three of these types of interaction. create_edgelist() returns a dataframe with three columns: two for the sender and receiver Twitter handles, and a third listing the tweet type (i.e., the form of interaction). The default is for create_edgelist() to build an edgelist of all possible interactions, but focusing on one type of interaction is easily accomplished as well (e.g., create_edgelist(df, "reply") looks only at interactions through replies).

Run create_edgelist() after completing get_upstream_tweets() and process_tweets() for a complete picture of the interactions.

example_edgelist <-
  create_edgelist(process_tweets(example_with_upstream))
head(example_edgelist, 20)
#>    tweet_type         sender        receiver
#> 1       reply  nicolapallitt     eromerohall
#> 2       reply       jeroen69         vdennen
#> 3       reply    jmenglund03        bonni208
#> 4       reply          DKSch           DKSch
#> 5       reply          DKSch           DKSch
#> 6       reply      _valeriei     eromerohall
#> 7       reply    jmenglund03       SBarksway
#> 8       reply       DrTerriC       mete_akca
#> 9       reply       pazureka       SBarksway
#> 10      reply       pazureka       SBarksway
#> 11      reply    TAmankwatia     TAmankwatia
#> 12      reply PaulineMuljana       robmoore3
#> 13      reply          yinbk    DrVirtuality
#> 14      reply    arasbozkurt     arasbozkurt
#> 15      reply    arasbozkurt     arasbozkurt
#> 16      reply    arasbozkurt     eromerohall
#> 17      reply      correia65   AnnaRoseLeach
#> 18      reply    jmenglund03  PaulineMuljana
#> 19      reply PaulineMuljana AmyLomellini_ID
#> 20      reply  teacherrogers   teacherrogers

We can then easily visualize the edgelist as a sociogram using the tidygraph and ggraph packages.

First, create a graph object using tidygraph:

if (requireNamespace("tidygraph", quietly = TRUE)) {
  example_graph <-
    tidygraph::as_tbl_graph(example_edgelist)
  example_graph <-
    dplyr::mutate(example_graph,
                  popularity = tidygraph::centrality_degree(mode = 'in'))
}

Then plot using ggraph:

if (requireNamespace("ggraph", quietly = TRUE) &
    requireNamespace("ggplot2", quietly = TRUE)
) {
  ggraph::ggraph(example_graph, layout = 'kk') +
    ggraph::geom_edge_arc(alpha = .2,
                          width = .5,
                          strength = .5,
                          edge_colour = 'steelblue'
    ) +
    ggraph::geom_node_point(alpha = .4, ggplot2::aes(size = popularity)) +
    ggplot2::scale_size(range = c(1,10))
}

(Figure: the resulting network visualization, a sociogram with node size indicating in-degree popularity.)

Running create_edgelist() also provides a simple way to look again at how many tweets of each type are present in the dataset, using the count() function from dplyr.

dplyr::count(example_edgelist, tweet_type, sort = TRUE)
#>   tweet_type    n
#> 1    retweet 1146
#> 2      quote  108
#> 3      reply   71

Note that create_edgelist() does not yet accept type = "mention" or type = "like" parameters due to the complicated ways mentions are included in tweets as well as limitations of the information provided by the Twitter API.

add_users_data()

Finally, tidytags also has functionality to add user-level data to an edgelist through the function add_users_data(). These additional features are very useful when taking an inferential approach to social network analysis, such as building influence or selection models.

example_senders_receivers_data <- add_users_data(example_edgelist)
dplyr::glimpse(example_senders_receivers_data)
#> Rows: 1,325
#> Columns: 47
#> $ tweet_type                       <chr> "reply", "reply", "reply", "reply", "reply",…
#> $ sender                           <chr> "nicolapallitt", "jeroen69", "jmenglund03", …
#> $ receiver                         <chr> "eromerohall", "vdennen", "bonni208", "DKSch…
#> $ sender_id                        <dbl> 1.183683e+08, 6.848632e+06, 6.476904e+07, 3.…
#> $ sender_id_str                    <chr> "118368334", "6848632", "64769037", "3865781…
#> $ sender_name                      <chr> "Dr Nicola Pallitt", "Jeroen", "jennιғer eng…
#> $ sender_location                  <chr> "Grahamstown, South Africa", "Boise, ID", "S…
#> $ sender_description               <chr> "EdTech @Rhodes_Uni working in @CHERTL_RU & …
#> $ sender_url                       <chr> "https://t.co/p1veXbw0pP", "https://t.co/GEk…
#> $ sender_protected                 <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
#> $ sender_followers_count           <int> 1753, 378, 663, 290, 290, 6324, 663, 5297, 1…
#> $ sender_friends_count             <int> 1924, 379, 627, 86, 86, 6280, 627, 5825, 461…
#> $ sender_listed_count              <int> 127, 7, 72, 15, 15, 416, 72, 261, 44, 44, 33…
#> $ sender_created_at                <dttm> 2010-02-28 08:13:53, 2007-06-16 03:10:38, 2…
#> $ sender_favourites_count          <int> 2040, 2847, 2052, 228, 228, 15749, 2052, 946…
#> $ sender_verified                  <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
#> $ sender_statuses_count            <int> 7157, 1537, 2918, 643, 643, 57099, 2918, 269…
#> $ sender_profile_image_url_https   <chr> "https://pbs.twimg.com/profile_images/122079…
#> $ sender_profile_banner_url        <chr> "https://pbs.twimg.com/profile_banners/11836…
#> $ sender_default_profile           <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
#> $ sender_default_profile_image     <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
#> $ sender_withheld_in_countries     <list> [], [], [], [], [], [], [], [], [], [], [],…
#> $ sender_derived                   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ sender_withheld_scope            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ sender_entities                  <list> [[<data.frame[1 x 5]>], [<data.frame[1 x 5]…
#> $ receiver_id                      <dbl> 9.187479e+08, 1.423597e+08, 1.477788e+07, 3.…
#> $ receiver_id_str                  <chr> "918747876", "142359679", "14777884", "38657…
#> $ receiver_name                    <chr> "Dr. Enilda Romero-Hall (She/Her)", "Vanessa…
#> $ receiver_location                <chr> "🇵🇦🇨🇦🇺🇸", "Tallahassee, FL", "Orange County,…
#> $ receiver_description             <chr> "Associate Professor @UTKnoxville | Learning…
#> $ receiver_url                     <chr> "https://t.co/bIbSToCw8t", "https://t.co/2br…
#> $ receiver_protected               <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
#> $ receiver_followers_count         <int> 1773, 1151, 7415, 290, 290, 1773, 329, 764, …
#> $ receiver_friends_count           <int> 1345, 759, 3538, 86, 86, 1345, 480, 501, 480…
#> $ receiver_listed_count            <int> 87, 40, 294, 15, 15, 87, 5, 28, 5, 5, 33, 57…
#> $ receiver_created_at              <dttm> 2012-11-01 06:56:04, 2010-05-10 13:39:35, 2…
#> $ receiver_favourites_count        <int> 7326, 1097, 23383, 228, 228, 7326, 2160, 196…
#> $ receiver_verified                <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
#> $ receiver_statuses_count          <int> 422, 2102, 20642, 643, 643, 422, 428, 1942, …
#> $ receiver_profile_image_url_https <chr> "https://pbs.twimg.com/profile_images/141641…
#> $ receiver_profile_banner_url      <chr> "https://pbs.twimg.com/profile_banners/91874…
#> $ receiver_default_profile         <lgl> FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE…
#> $ receiver_default_profile_image   <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
#> $ receiver_withheld_in_countries   <list> [], [], [], [], [], [], [], [], [], [], [],…
#> $ receiver_derived                 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ receiver_withheld_scope          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ receiver_entities                <list> [[<data.frame[1 x 5]>], [<data.frame[1 x 5]…

Getting help

tidytags is still a work in progress, so we fully expect that there are still some bugs to work out and functions to document better. If you find an issue, have a question, or think of something that you really wish tidytags would do for you, don’t hesitate to email Bret or reach out on Twitter: @bretsw and @jrosenberg6432.

You can also submit an issue on GitHub.

You may also wish to try some general troubleshooting strategies: