The User Activity API lets you query an individual user's movement through your website, by sending in the individual clientId or userId. It is accessed via the ga_clientid_activity() function.

User Activity is available in googleAnalyticsR >= 0.6.9000 and needs googleAuthR >= 0.7.0.9000, so install via:

remotes::install_github("MarkEdmondson1234/googleAuthR")
remotes::install_github("MarkEdmondson1234/googleAnalyticsR")

User Activity API example

You first need to have a clientId or userId to query. You can now get this from the reporting API via the dimension ga:clientId.

It's also available via the User Explorer report in the Web UI or via a BigQuery export, or you may be capturing the ID in a custom dimension.

Once you have an ID, specify the Google Analytics view that user was browsing and the date range of the activity you want to query:

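library(googleAnalyticsR)

# the view the users were browsing and the dates to query
viewId <- 123456
date_range <- c("yesterday","yesterday")

# fetch the clientIds seen in that period from the Reporting API
cids <- google_analytics(viewId, date_range = date_range, 
                         metrics = "sessions", dimensions = "clientId")

# fetch the activity of every clientId returned
users <- ga_clientid_activity(cids$clientId,
                              viewId = viewId, 
                              date_range = date_range)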
#> 2019-08-15 10:43:15> Fetching id: 19123.15505123
#> 2019-08-15 10:43:16> Fetching id: 12123.15657123
#> 2019-08-15 10:43:17> Fetching id: 11234.15657123
#> 2019-08-15 10:43:17> Fetching id: 21123.1565123
#> ..etc..

Multiple ids

You can send in multiple IDs of the same type in a vector:

two_clientIds <- c("1106980347.1461227730", "476443645.1541099566")
two_users <- ga_clientid_activity(two_clientIds,
                                  viewId = 81416156, 
                                  date_range = c("2019-01-01","2019-02-01"))

Return format

The API returns two types of data: session level and activity hit level. Access it via $sessions or $hits:

two_users$sessions
#    sessionId deviceCategory  platform dataSource sessionDate                    id
#1  1548361067        desktop Macintosh        web  2019-01-24 1106980347.1461227730
#2  1548261976        desktop Macintosh        web  2019-01-23 1106980347.1461227730
#3  1548251272        desktop Macintosh        web  2019-01-23 1106980347.1461227730
#4  1548017997        desktop Macintosh        web  2019-01-20 1106980347.1461227730
# ...

two_users$hits
# A tibble: 102 x 26
#   sessionId activityTime        source medium channelGrouping campaign keyword hostname
#   <chr>     <dttm>              <chr>  <chr>  <chr>           <chr>    <chr>   <chr>   
# 1 15483610… 2019-01-24 21:17:47 t.co   refer… Social          (not se… (not s… code.ma…
# 2 15482619… 2019-01-23 17:46:16 t.co   refer… Social          (not se… (not s… code.ma…
# 3 15482512… 2019-01-23 14:47:52 t.co   refer… Social          (not se… (not s… code.ma…
# ...

The hit-level data is rich. The available columns are shown below, although some will be empty for rows where they do not apply.

names(two_users$hits)
# [1] "sessionId"            "activityTime"         "source"               "medium"              
# [5] "channelGrouping"      "campaign"             "keyword"              "hostname"            
# [9] "landingPagePath"      "activityType"         "customDimension"      "pagePath"            
#[13] "pageTitle"            "screenName"           "mobileDeviceBranding" "mobileDeviceModel"   
#[17] "appName"              "ecommerce"            "goals"                "has_goal"            
#[21] "eventCategory"        "eventAction"          "eventLabel"           "eventValue"          
#[25] "eventCount"           "id" 

The data.frames returned include the ID you sent in as the $id column so you can distinguish between users.
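
As a minimal sketch of using the id column, the snippet below counts sessions per user. The sessions data.frame here is a made-up stand-in for two_users$sessions (the IDs and session values are hypothetical), so it runs without calling the API:

```r
library(dplyr)

# Toy stand-in for two_users$sessions; all values are invented for illustration
sessions <- data.frame(
  sessionId = c("s-001", "s-002", "s-003"),
  id        = c("user_a", "user_a", "user_b"),
  stringsAsFactors = FALSE
)

# one row per ID, with the number of sessions found for that ID
sessions_per_user <- count(sessions, id, name = "n_sessions")
sessions_per_user
```

The same call should work on the real $sessions element, assuming it keeps the id column described above.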

Nested columns

The output uses nested columns for some values so you may want to get familiar with the tidyr::unnest() function when working with the data.

The nested columns are hits$customDimension, hits$ecommerce and hits$goals.

The nesting is necessary because a single hit can carry several of these values, and expanding them in the response would make a very large data.frame to work with.
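
To illustrate why nesting keeps the response compact, here is a small sketch on toy data. The hits tibble and its goal fields are invented, and the real API structure may differ; it also uses tidyr's unnest_longer() and unnest_wider() (tidyr >= 1.0.0), which is one possible approach rather than the method this package prescribes:

```r
library(tibble)
library(dplyr)
library(tidyr)

# Toy data, NOT real API output: each hit holds a list of zero or more goals
hits <- tibble(
  sessionId = c("s1", "s1", "s2"),
  goals = list(
    list(),                                             # hit without goals
    list(list(goalIndex = "1", goalName = "Signup")),   # hit with one goal
    list(list(goalIndex = "1", goalName = "Signup"),    # hit with two goals
         list(goalIndex = "20", goalName = "Deep visit"))
  )
)

goals_long <- hits %>%
  unnest_longer(goals) %>%  # one row per goal; hits without goals are dropped
  unnest_wider(goals)       # spread each goal's fields into columns

goals_long
```

Three hits become three goal rows here, but a hit with many goals, custom dimensions and products would multiply into many rows once expanded, which is why the response leaves them nested.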

An example of how to unnest goals is shown below:

library(tidyr)
library(purrr)
library(dplyr)

# a_user is the list returned by ga_clientid_activity(), e.g. two_users above
a_user$hits %>% 
  filter(has_goal) %>% # filter to just hits with goals
  select(id, sessionId, activityTime, goals) %>% 
  unnest(goals) %>% # unnest the goals list column
  mutate(goalIndex = map_chr(goals, "goalIndex"), 
         goalName = map_chr(goals, "goalName"), 
         goalCompletionLocation = map_chr(goals, "goalCompletionLocation")) %>%
  select(-goals)
## A tibble: 4 x 6
#  id                  sessionId  activityTime        goalIndex goalName           goalCompletionLocation                
#  <chr>               <chr>      <dttm>              <chr>     <chr>              <chr>                                 
#1 1106980347.1461227… 1548016803 2019-01-20 21:40:53 20        Visited over 4 pa… /googleAnalyticsR/articles/setup.html 
#2 1106980347.1461227… 1546979541 2019-01-08 21:34:18 1         Time over a minut… /googleAnalyticsR/articles/ganalytics…
#3 1106980347.1461227… 1546802623 2019-01-06 20:26:59 1         Time over a minut… /googleAnalyticsR/articles/v4.html    
#4 1106980347.1461227… 1546467261 2019-01-02 23:15:50 1         Time over a minut… /googleAnalyticsR/   

To unnest custom dimensions, some example code is below:

library(tidyr) # pivot_wider() needs tidyr >= 1.0.0
library(purrr)
library(dplyr)

a_user$hits %>% 
  select(id, sessionId, activityTime, customDimension) %>% 
  unnest(customDimension) %>% 
  mutate(cd_index = map_chr(customDimension, "index"), 
         cd_value = map_chr(customDimension, ~ .$value %||% NA_character_)) %>%
  filter(!is.na(cd_value)) %>%
  select(-customDimension) %>%
  distinct() %>%
  pivot_wider(names_from = cd_index, values_from = cd_value, names_prefix = "customDim")

To unnest ecommerce and filter to only transactions, an example is shown below:

a_user$hits %>%
  filter(activityType == "ECOMMERCE") %>%
  select(id, sessionId, activityTime, ecommerce) %>%
  mutate(transaction = map(ecommerce, "transaction"),
         transactionRevenue = map_dbl(transaction, ~.[["transactionRevenue"]] %||% NA_real_),
         transactionId = map_chr(transaction, ~.[["transactionId"]] %||% NA_character_)) %>%
  filter(!is.na(transactionRevenue)) %>%
  select(-transaction, -ecommerce)
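
The %||% fallback in these examples guards against fields that are missing from a hit. A minimal sketch of the pattern on an invented list (the transactions object below is hypothetical, not API output):

```r
library(purrr)

# Toy list: the second entry has no revenue, the third entry is missing entirely
transactions <- list(
  list(transactionId = "T1", transactionRevenue = 12.5),
  list(transactionId = "T2"),
  NULL
)

# x %||% y returns y when x is NULL, so every element yields exactly one double
revenue <- map_dbl(transactions, ~ .x[["transactionRevenue"]] %||% NA_real_)
revenue
```

Using a typed missing value (NA_real_, NA_character_) keeps the result compatible with the map_dbl() or map_chr() call that wraps it.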

To get the traffic sources per session, you only need the first hit of each session, which you can compute via:

a_user$hits %>%
  filter(activityType == "PAGEVIEW") %>%
  select(id, sessionId, activityTime, 
         source, medium, 
         channelGrouping, campaign, 
         keyword, landingPagePath) %>%
  group_by(id, sessionId) %>%
  summarise_all(min)
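
As an alternative sketch, you can pick the earliest hit of each session explicitly with dplyr::slice_min() (dplyr >= 1.0.0). This swaps in a different technique from the summarise_all(min) call above, and the hits tibble below is invented toy data:

```r
library(tibble)
library(dplyr)

# Toy hits: two hits in session s1, one hit in session s2
hits <- tibble(
  id           = "u1",
  sessionId    = c("s1", "s1", "s2"),
  activityTime = as.POSIXct(c("2019-01-24 21:17:47",
                              "2019-01-24 21:20:02",
                              "2019-01-23 17:46:16"), tz = "UTC"),
  source       = c("t.co", "t.co", "google"),
  medium       = c("referral", "referral", "organic")
)

first_hits <- hits %>%
  group_by(id, sessionId) %>%
  slice_min(activityTime, n = 1, with_ties = FALSE) %>%  # earliest hit per session
  ungroup()

first_hits
```

This keeps each row intact, whereas taking the minimum of every column separately can mix values from different hits if a column varies within a session.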

Filtering the response

If you specify the activity_type parameter, you can filter down the response to only the events you include in a vector.

The permitted types are c("PAGEVIEW","SCREENVIEW","GOAL","ECOMMERCE","EVENT"); include one or more of these to specify which you would like to see.

only_goals <- ga_clientid_activity(two_clientIds,
                                   viewId = 81416156, 
                                   date_range = c("2019-01-01","2019-02-01"),
                                   activity_type = "GOAL")

Example calling users via a custom dimension

If you are capturing the Google Analytics cookie ID or a userId in a custom dimension, the workflow below shows how you can use the standard Reporting API to fetch those IDs and then download more detail on the users:

library(googleAnalyticsR)

al <- ga_account_list()

# get a viewID you know has implemented putting cookie ID in a custom dimension
viewId <- 84714057

view_row <- al[al$viewId == viewId,]

# get the custom dimensions
cus_dims <- ga_custom_vars_list(accountId = view_row$accountId,
                                webPropertyId = view_row$webPropertyId,
                                type = "customDimensions")

# In this example, user.id is in ga:dimension2

# date range of ids to query
dates <- c(Sys.Date() - 30, Sys.Date() - 1)

# download all IDs that had a pageview in the last 30 days
# change this query to a segment of users you are interested in
cids <- google_analytics(viewId, date_range = dates,
                         dimensions = "dimension2", metrics = "pageviews",
                         order = order_type("pageviews", "DESCENDING"),
                         max = 1000)

# download user activity for all the users
user_activities <- ga_clientid_activity(cids$dimension2,
                                        viewId = viewId,
                                        date_range = dates)

Sampled response

The API response may be sampled, and a message is shown if this happens. If it does, follow the advice in the API documentation, such as splitting up the call into smaller date ranges.

Also bear in mind that each API call counts against your Analytics Reporting v4 API quota, which by default is 50k per day, so you won't be able to fetch more user activity than that without increasing your API quota.
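
Splitting a call into smaller date ranges can be sketched with base R as below. The helper split_date_range() is hypothetical and not part of googleAnalyticsR; it only computes the chunks and makes no API calls:

```r
# Hypothetical helper: split [start, end] into consecutive chunks of by_days days
split_date_range <- function(start, end, by_days = 7) {
  start  <- as.Date(start)
  end    <- as.Date(end)
  starts <- seq(start, end, by = by_days)    # first day of each chunk
  ends   <- pmin(starts + by_days - 1, end)  # last day, capped at the overall end
  data.frame(start = starts, end = ends)
}

chunks <- split_date_range("2019-01-01", "2019-01-20", by_days = 7)
chunks
```

Each row could then be passed as the date_range of a separate ga_clientid_activity() call. Assuming each ID is fetched with at least one request per date range, as the log output earlier suggests, querying N IDs over K chunks costs at least N x K calls against the quota, so smaller chunks trade sampling for quota usage.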



MarkEdmondson1234/googleAnalyticsR_public documentation built on Dec. 10, 2023, 2:43 a.m.