knitr::opts_chunk$set(
  tidy = TRUE,
  tidy.opts = list(blank = FALSE, width.cutoff = 50),
  cache = 1
)
knitr::knit_hooks$set(
  source = function(x, options) {
    if (options$engine == 'R') {
      # format R code
      x = highr::hilight(x, format = 'html')
    } else if (options$engine == 'bash') {
      # format bash code
      x = paste0('<span class="hl std">$</span> ',
                 unlist(stringr::str_split(x, '\\n')),
                 '\n',
                 collapse = '')
    }
    x = paste(x, collapse = "\n")
    sprintf(
      "<div class=\"%s\"><pre class=\"%s %s\"><code class=\"%s %s\">%s</code></pre></div>\n",
      'sourceCode',
      'sourceCode',
      tolower(options$engine),
      'sourceCode',
      tolower(options$engine),
      x
    )
  }
)

Today

  1. Earlier: Scraping data from web sites

    • HTML and CSS selectors
    • rvest
  2. Now: Social media

    • JSON and OAuth
    • rtweet

Platform APIs

OAuth

How OAuth works, at a very high level...

  1. App: asks platform for a request token
  2. Platform: here is a request token and a secret
  3. App: redirects user to web page hosted by platform
  4. User: logs in and confirms level of access to be given to app
  5. Platform: redirects user back to app
  6. Platform: sends app an access token
  7. App: uses secret and access token to sign and authenticate
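
The steps above can be sketched in R with the httr package. This is a minimal sketch, not the method used in these slides: the app name and both credential strings are hypothetical placeholders, and the helper get_access_token() is an illustrative name. Calling the helper opens a browser so the user can log in and approve access, so it only works in an interactive session.

```r
library(httr)

# Hypothetical placeholder credentials; real values are issued by the platform
app <- oauth_app("my_app",
                 key = "MY_CONSUMER_KEY",
                 secret = "MY_CONSUMER_SECRET")

# URLs the platform exposes for the request, authorize, and access steps
endpoint <- oauth_endpoints("twitter")

# Runs the interactive flow (steps 1-6) and returns a token object
# that httr can use to sign later API requests (step 7)
get_access_token <- function(app, endpoint) {
  oauth1.0_token(endpoint, app)
}
```

Defining the flow as a function keeps the script safe to source; nothing contacts the platform until get_access_token(app, endpoint) is called.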

API calls


{  
   "id":"23462029838",
   "first_name":"Jason",
   "last_name":"Roos",
   "location":{  
      "id":"10952412900",
      "name":"Rotterdam, Netherlands"
   },
   "friends":{  
      "data":[  
         {  
            "name":"Nancy MacNancyface",
            "id":"5278757"
         },
         {  
            "name":"Davy McDavyface",
            "id":"72186234457"
         }
      ]
   }
}
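
A response like the one above can be turned into R objects with the jsonlite package. The following is a minimal sketch that assumes the JSON arrives as a character string; the variable names json and profile are illustrative, and the string is an abbreviated copy of the example response.

```r
library(jsonlite)

# Abbreviated copy of the example response, held as a string
json <- '{
  "id": "23462029838",
  "first_name": "Jason",
  "location": {"id": "10952412900", "name": "Rotterdam, Netherlands"},
  "friends": {"data": [
    {"name": "Nancy MacNancyface", "id": "5278757"},
    {"name": "Davy McDavyface", "id": "72186234457"}
  ]}
}'

profile <- fromJSON(json)

# Nested JSON objects become nested lists
profile$location$name

# An array of objects with the same keys becomes a data frame
profile$friends$data
```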

API wrappers

library(tidyverse)
library(rtweet)
library(lubridate)

# Search for the 10 most recent English-language tweets containing a hashtag
statuses <- search_tweets(q = '#hashtag',
                          n = 10, lang = 'en', type = 'recent')

Basic setup of rtweet calls

library(rtweet)
token <- create_token(
  app = "my_twitter_research_app",
  consumer_key = "XYznzPFOFZR2a39FwWKN1Jp41",
  consumer_secret = "CtkGEWmSevZqJuKl6HHrBxbCybxI1xGLqrD5ynPd9jG0SoHZbD",
  access_token = "9551451262-wK2EmA942kxZYIwa5LMKZoQA4Xc2uyIiEwu2YXL",
  access_secret = "9vpiSGKg1fIPQtxc5d5ESiFlZQpfbknEN1f1m2xe5byw7")
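
Keys and secrets typed into a script are easy to leak when the script is shared. One possible alternative, sketched here under assumptions, is to read them from environment variables. The four variable names and the helper read_credentials() are hypothetical, not part of rtweet.

```r
# Hypothetical variable names; set them in ~/.Renviron or the shell before starting R
required <- c("TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET",
              "TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_SECRET")

# Returns a named list of credentials, or stops if any variable is unset
read_credentials <- function(vars = required) {
  values <- Sys.getenv(vars, unset = NA, names = TRUE)
  missing <- vars[is.na(values)]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  as.list(values)
}
```

The returned list could then supply the arguments of create_token(), for example consumer_key = creds$TWITTER_CONSUMER_KEY after creds <- read_credentials().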

User information

# Look up a user's profile and show a few key fields
jmtroos <- lookup_users('jmtroos') %>% users_data()
jmtroos %>%
  select(screen_name, name, location, description,
         followers_count, friends_count, statuses_count)

Timelines
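
The word-count code in this section uses an object named djt whose definition did not survive in this text. The following is a minimal sketch of how such an object might be built, assuming it holds a user's recent tweets fetched with rtweet's get_timeline(). The helper fetch_timeline(), the default of 200 tweets, and the screen name in the usage note are assumptions, not taken from the original slides.

```r
# Hypothetical helper: fetch the n most recent tweets posted by one user.
# Requires the rtweet package and a valid token; nothing is downloaded
# until the function is called.
fetch_timeline <- function(user, n = 200) {
  rtweet::get_timeline(user, n = n)
}
```

With a token in place, a call such as djt <- fetch_timeline('realDonaldTrump') would return a data frame with one row per tweet, including the text column used in the word count.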


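# Count the words used in the tweets, most frequent first,
# after dropping stop words and URL fragments
extra_stop_words <- tibble(word = c('https', 'amp', 't.co'))
djt %>% select(text) %>%
  tidytext::unnest_tokens('word', 'text') %>%
  anti_join(bind_rows(tidytext::stop_words, extra_stop_words), by = 'word') %>%
  count(word, sort = TRUE)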

Exercise

library(tidyverse)
library(rtweet)
search_tweets( '[ your code goes here ]' )

Before the next session

install.packages('tm', dependencies = TRUE)
install.packages('topicmodels', dependencies = TRUE)
vignette('tm', package = 'tm')

(or https://cran.r-project.org/web/packages/tm/vignettes/tm.pdf if that doesn't work...)



jasonmtroos/rook documentation built on May 24, 2020, 3:16 p.m.