Getting Started

This vignette goes through the basics of getting started with adobeanalyticsr.

Basic Setup

The basic setup is documented on the home page of adobeanalyticsr.com, which reviews the details for:

Understanding AW_COMPANY_ID and AW_REPORTSUITE_ID

Many adobeanalyticsr functions require a company ID (the Adobe account being accessed) and a report suite ID (the specific report suite to pull data or information from) to run. As such, these are arguments (company_id and rsid) within those functions that default to the values for the AW_COMPANY_ID and AW_REPORTSUITE_ID environment variables.

This approach has some important ramifications:

In this vignette, both AW_COMPANY_ID and AW_REPORTSUITE_ID environment variables have been set up (but not shown).

Load the Package and Authenticate

Once the package is loaded, authenticate with aw_token(). If you have not authenticated ever, or if you have not authenticated within the last 24 hours, then a browser should open requiring you to log in to your Adobe account, and it will then redirect you to a web page that displays a lengthy token that you can copy and paste back into an Enter your authorization code: prompt in the R console.

library(dplyr)            # Light cleanup of output
library(knitr)            # Cleaner data tables
library(adobeanalyticsr)
aw_token()

The token will look something like this (with different letters and numbers and characters):

eyJ4NXUiOiJpbXNfbmExLWtleS0xLmNlciIsImFsZyI6IlJTMjU2In0.eyJpZCI6IjE2MTI3MTEyMjM5MzNfNGE2ZmVlM2-tYzdkOS00M2QyLWI5ODUtYzAzMDk2NDA2MGFjX3VlMSIsImNsaWVudF9pZCI6IjA2MTU0ZDgzMjl-OTQxZjViMGUzM2EwZGM2M2U3NmQxIiwidXNlcl9pZCI6IkY2MUJBOTU0NUE3QjMyM0QwQTQ5NUQ2Q0BBZG9iZUlEI-wic3RhdGUiOiIiLCJKeXBlIjoiYXV0aG9yaXphdGlvbl9jb2RlIiwiYXMiOiJpbXMtbmExIiwiZmciO-JWRlFSUkhCTVhMTzU2SDRDRTZSTFJZZUE0TT09PT09PSIsInNpZCI6IjE2MTI3MTEyMjM5MzNfMWE2ZDc4ZTUtNT-zZi00OWFkLWFmZTktYTk0MTJkZmYwMzA0X3VlMSIsIm90byI3InRydWUiLCJleHBpcmVzX2luIjoiMT-wMDAwMCIsImNyZWF0ZWRfYXQiOiIxNjEyNzExMjIzOTMzIiwic2NvcGUiOiJvcGVuaWQsQWRvYmVJRCxyZWFkX-9yZ2FuaXphdGlvbnMsYWRkaXRpb25hbF9pbmZvLnByb2plY3RlZFByb2R1Y3RDb250ZXh0LGFkZGl0aW9uY-xfaW5mby5qb2JfZnVuY3Rpb24ifQ.EvoihD7IPXkew_yFQuzjZGfiU_Q8-vlUkmytwfB_Y46DKePINPn7URq8bit0dXoO-tUdMUKVNOHEBbqww6ydDGBPHSCbOH1GgNsILoN96tjHTtA-jDxN8jrAnQ0PuCssA-GePnqryxCxOH9WyZIl2fog00ib8iZ3ZFAJLDrvthWwWHUw-ivu-K-F3UAtU7A4ma_7pe07D1rW5MhTcZOSr0pri68bjFA86cJqH5DHyMdp4F2d7QgZcYPMdrvVfXTwWXv9s5r6huvDcqv6nny-WOKZbKmoP6zwMzn5xa343wrQ5oXFbRxYem-tC154_dc7ekrC8YUX0pY5up9a-OUy0w

Confirm that the authorization worked by running get_me(), which will return two things if the authorization was successful:

  1. A message: Your data is now available!
  2. A data frame showing the company ID (globalCompanyId) and company name (companyName) for all of the companies (accounts) to which you have access based on the login you used when you called aw_token().

Get Available Dimensions, Metrics, and Segments

Because the Adobe Analytics API works with the variable and segment IDs rather than the plain English names of dimensions, metrics, and segments, it is often useful, at least in the initial development of a project, to create data frames that contain all of the available values for these three types of variables.

Get Available Dimensions

dims_df <- aw_get_dimensions()
#> Auto-refreshing stale OAuth token.
#> Warning: Unable to refresh token: unauthorized_client
#> Request failed [401]. Retrying in 1 seconds...
#> Auto-refreshing stale OAuth token.
#> Warning: Unable to refresh token: unauthorized_client
#> Request failed [401]. Retrying in 1 seconds...
#> Auto-refreshing stale OAuth token.
#> Warning: Unable to refresh token: unauthorized_client
#> Error in aw_call_api(req_path = urlstructure, debug = debug, company_id = company_id): Unauthorized (HTTP 401).
head(dims_df, 10) %>% select(id, name, type, category) %>% kable()

|id |name |type |category | |:---------------------|:-----------------------------|:------------|:---------------| |averagepagetime |Time Spent on Page - Bucketed |ordered-enum |Metrics | |browser |Browser |string |Audience | |browserheight |Browser Height - Granular |int |Audience | |browserheightbucketed |Browser Height - Bucketed |ordered-enum |Audience | |browsertype |Browser Type |enum |Audience | |browserwidth |Browser Width - Granular |int |Audience | |browserwidthbucketed |Browser Width - Bucketed |ordered-enum |Audience | |campaign |Tracking Code |string |Traffic Sources | |campaign.1 |Campaign Source |string |Traffic Sources | |campaign.2 |Campaign Medium |string |Traffic Sources |

This data frame includes:

The data frame can then be searched and filtered for specific dimensions for use in subsequent data calls.

Get Available Metrics

metrics_df <- aw_get_metrics()
head(metrics_df, 10) %>% select(id, name, type, category) %>% kable()

|id |name |type |category | |:----------------------|:------------------------------------|:-------|:---------------| |averagepagedepth |Average Page Depth |int |Traffic | |averagetimespentonpage |Average Time Spent on Page (seconds) |int |Traffic | |averagetimespentonsite |Average Time Spent on Site (seconds) |int |Traffic | |averagevisitdepth |Average Visit Depth |int |Traffic | |bouncerate |Bounce Rate |percent |Traffic | |bounces |Bounces |int |Traffic | |campaigninstances |Campaign Click-throughs |int |Traffic Sources | |cartadditions |Cart Additions |int |Conversion | |cartremovals |Cart Removals |int |Conversion | |carts |Carts |int |Conversion |

This data frame includes:

aw_get_metrics() does not return calculated metrics, so those require a separate function call.

calc_metrics_df <- aw_get_calculatedmetrics()
head(calc_metrics_df, 10) %>% select(id, name, type) %>% kable()

|id |name |type | |:------------------------------------|:-------------------------------|:--------| |cm300003965_557fc577e4b07e827d177ad0 |Crash Rate (Mobile) |percent | |cm300003965_557fc577e4b07e827d177ad3 |Average Session Length (Mobile) |time | |cm300003965_557fc578e4b0094eea4f5201 |Single Access (Calculated) |decimal | |cm300003965_557fc578e4b0094eea4f5204 |Crash Rate (Mobile) |percent | |cm300003965_557fc578e4b0416196926008 |Average Session Length (Mobile) |time | |cm300003965_557fc656e4b0adadba08f337 |Avg. Order Value |currency | |cm300003965_557fc656e4b0094eea4f58b6 |Average Order Value |decimal | |cm300003965_557fc656e4b0094eea4f58b8 |Bounce Rate |decimal | |cm300003965_557fc790e4b0094eea4f6222 |Crash Rate (Mobile) |percent | |cm300003965_557fc790e4b0416196926fac |Average Session Length (Mobile) |time |

Calculated metric IDs start with cm and are assigned by Adobe when the calculated metric is created.

These two metrics data frames can be searched and filtered for specific metrics for use in subsequent data calls.

Get Available Segments

The default limit for the number of segments returned by the aw_get_segments() function is 10, so, depending on the number of segments you have, the limit value can be increased to return a complete list of segments.

segments_df <- aw_get_segments(limit = 1000)
names(segments_df)
#> [1] "name"        "description" "id"          "owner"       "migratedIds" "rsid"

Similar to the dimensions and metrics, aw_get_segments() returns a data frame of available segments that can be searched and filtered to identify the specific id values to use in subsequent data calls.

Freeform Table Data Pull

Just as in Analysis Workspace, the freeform table is the workhorse of adobeanalyticsr, and, as such, is the focus of the rest of this vignette.

Single Dimension, Single Metric

The following is a breakdown of visits by device category for the last 30 full days for the report suite that is specified in the AW_REPORTSUITE_ID environment variable.

# Specify a start date and end date. These can be specified as Date
# objects or as string objects in YYYY-MM-DD format.
start_date <- Sys.Date() - 30
end_date <- Sys.Date() - 1

df <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = "mobiledevicetype",
                        metrics = "visits")
#> A total of 3 rows have been pulled.

# Output the results as a formatted table
df %>% kable()

|mobiledevicetype | visits| |:----------------|------:| |Other | 5546| |Mobile Phone | 1509| |Tablet | 28|

Note that the top argument for aw_freeform_table() defaults to 5, so only the top 5 values will be shown unless that argument is passed a higher value.

Single Dimension, Multiple Metrics

Getting multiple metrics is simply a matter of passing a vector to the metrics argument rather than a single string:

df <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = "mobiledevicetype",
                        metrics = c("visits", "pageviews", "visitors"))
#> A total of 3 rows have been pulled.

# Output the results as a formatted table
df %>% kable()

|mobiledevicetype | visits| pageviews| visitors| |:----------------|------:|---------:|--------:| |Other | 5546| 10055| 3829| |Mobile Phone | 1509| 2414| 1216| |Tablet | 28| 95| 23|

The example above used standard metrics, but custom metrics (e.g., "event10") and calculated metrics (e.g., "cm300003965_557fc578e4b0094efa4f5204") can also be included in the vector of metrics as needed.

The default for the function is to return the API field names, but the "pretty names" can be returned instead by setting the prettynames argument to TRUE:

df <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = "mobiledevicetype",
                        metrics = c("visits", "pageviews", "visitors"),
                        prettynames = TRUE)
#> A total of 3 rows have been pulled.

# Output the results as a formatted table
df %>% kable()

|Mobile Device Type | Visits| Page Views| Unique Visitors| |:------------------|------:|----------:|---------------:| |Other | 5546| 10055| 3829| |Mobile Phone | 1509| 2414| 1216| |Tablet | 28| 95| 23|

While the pretty names are, indeed, "prettier," they can add downstream complexity to the code. And, since custom metrics and calculated metrics can have their names changed at any point, if subsequent code references columns using these pretty names, the code is subject to break in the future if the metric(s) it references gets renamed. So, it is generally considered a best practice to work with the API field names (prettynames = FALSE) and then only convert to more readable names at the point of the data being presented to the end user.

Trending Metrics Over Time

What is time? This could be a deeply philosophical question, but, instead, we'll treat it as a pragmatic one: time is a dimension. It's just a dimension that gets some special treatment in this package.

The first thing to note is that the id values for time values are all prepended with daterange.

dims_df %>% filter(grepl("^daterange.*", id)) %>% 
  select(id, name) %>% 
  kable()

|id |name | |:----------------|:-------| |daterangeday |Day | |daterangehour |Hour | |daterangeminute |Minute | |daterangemonth |Month | |daterangequarter |Quarter | |daterangeweek |Week | |daterangeyear |Year |

To get trended data for one or more metrics, simply use the appropriate date dimension as a dimension. The first "special" thing that happens here is that you don't need to worry about setting the top argument to include all of the values. The package will assume that you want to include all of the date values in the range and will do this automatically.

What will not (yet) be done automatically is for the package to return the results ordered by the date value (it will sort them by the first metric or whatever is specified in the metricSort argument), so we'll need to do that sorting with the results:

df <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = "daterangeday",
                        metrics = "visits") %>% 
  arrange(daterangeday)
#> A total of 30 rows have been pulled.

# Output the results as a formatted table
df %>% kable()

|daterangeday | visits| |:------------|------:| |2021-01-23 | 66| |2021-01-24 | 67| |2021-01-25 | 268| |2021-01-26 | 359| |2021-01-27 | 331| |2021-01-28 | 291| |2021-01-29 | 257| |2021-01-30 | 81| |2021-01-31 | 99| |2021-02-01 | 286| |2021-02-02 | 336| |2021-02-03 | 396| |2021-02-04 | 334| |2021-02-05 | 329| |2021-02-06 | 85| |2021-02-07 | 119| |2021-02-08 | 372| |2021-02-09 | 557| |2021-02-10 | 352| |2021-02-11 | 337| |2021-02-12 | 244| |2021-02-13 | 85| |2021-02-14 | 106| |2021-02-15 | 193| |2021-02-16 | 260| |2021-02-17 | 214| |2021-02-18 | 307| |2021-02-19 | 207| |2021-02-20 | 65| |2021-02-21 | 84|

To include a non-date dimension and specify a top value for that dimension while also returning all of the date values, use 0 in the position of the date dimension when specifying the top argument.

The following code breaks down each day by Mobile Device Type, but only includes the top 2 values for each breakdown:

df <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = c("daterangeday", "mobiledevicetype"),
                        metrics = "visits",
                        top = c(0, 2)) %>% 
  arrange(daterangeday)
#> Estimated runtime: 24.8sec./0.41min.
#> 1 of 31 possible data requests complete. Starting the next 30 requests.
#> Request failed [429]. Retrying in 6 seconds...
#> A total of 60 rows have been pulled.

# Output the first few rows as a formatted table
head(df, 10) %>% kable()

|daterangeday |mobiledevicetype | visits| |:------------|:----------------|------:| |2021-01-23 |Other | 38| |2021-01-23 |Mobile Phone | 28| |2021-01-24 |Other | 41| |2021-01-24 |Mobile Phone | 25| |2021-01-25 |Other | 236| |2021-01-25 |Mobile Phone | 31| |2021-01-26 |Other | 280| |2021-01-26 |Mobile Phone | 79| |2021-01-27 |Other | 266| |2021-01-27 |Mobile Phone | 62|

If the daterange... dimension is not the first dimension (and, for speed reasons, it may make sense to not have it there, as is discussed in the next section), the 0 can simply be re-located:

df <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = c("mobiledevicetype", "daterangeday"),
                        metrics = "visits",
                        top = c(2, 0)) %>% 
  arrange(mobiledevicetype, daterangeday)
#> Estimated runtime: 2.4sec./0.04min.
#> 1 of 3 possible data requests complete. Starting the next 2 requests.
#> A total of 60 rows have been pulled.

# Output the first few rows as a formatted table
head(df) %>% kable()

|mobiledevicetype |daterangeday | visits| |:----------------|:------------|------:| |Mobile Phone |2021-01-23 | 28| |Mobile Phone |2021-01-24 | 25| |Mobile Phone |2021-01-25 | 31| |Mobile Phone |2021-01-26 | 79| |Mobile Phone |2021-01-27 | 62| |Mobile Phone |2021-01-28 | 60|

Multiple Dimensions, Multiple Metrics

Working with multiple dimensions is much easier once you understand two fundamental aspects of the 2.0 API, which may seem contradictory at first:

The trick to reconciling these two statements is that any "single dimension" call can be filtered by an unlimited number of other dimensions.

Consider Multiple Dimensions in Analysis Workspace

A thought experiment to explain how this works is to imagine that you are in Analysis Workspace (or, actually go into Analysis Workspace and try this directly to make it a real experiment rather than an experiment of the mind):

  1. Create a blank freeform table.
  2. Using only your mouse (no keyboard keys allowed!) create a freeform table report that has two dimensions.

Each drag-and-drop with your mouse triggers a new API call. To build a freeform table that has Marketing Channel broken down by Mobile Device Type might look something like the following:

  1. [Click and drag] Marketing Channel onto the freeform table. [API Call #1]
  2. [Click and drag] Mobile Device Type onto the first marketing channel in the list: Direct (for instance). [API Call #2]
  3. [Click and drag] Mobile Device Type onto the second marketing channel in the list: Paid Search. [API Call #3]
  4. [Click and drag] Mobile Device Type onto the third marketing channel the list: Natural Search. [API Call #4]
  5. [Click and drag] Mobile Device Type onto the third marketing channel the list: Display. [API Call #5]

At this point, you're cursing the constraint of not being able to use the <Shift> key! But, each of those steps is actually an API call that Analysis Worskpace is making to the 2.0 API:

  1. API Call #1 is a simple, single-dimension call that returns a table listing all of the Marketing Channel values.
  2. API Call #2 is also a single-dimension API call, but the single dimension being queried is now Mobile Device type. But, that query has a constraint on it, in that it is filtered to only include results where Marketing Channel = Direct (the first marketing channel in the list; that's the value we dragged and dropped Mobile Device Type on).
  3. API Call #3 is similar to #2, except the results are filtered for Marketing Channel = Page Search).
  4. And so on...

These API calls all happen relatively quickly (wildly faster than API calls using the older v1.4 API), but they still all have to happen, and they happen one after another (serially rather than in parallel).

To push this experiment one step farther, think about what you would have to do if you wanted to drill down to a third dimension: Entry Page. You would have to repeatedly drag the Entry Page dimension onto each of:

For the first call above, the API call for the single dimension Entry Page filtered to only include results where Marketing Channel = Direct AND Mobile Device Type = Mobile Phone.

To build a freeform table in Analysis Workspace that has three dimensions fully broken down would require dozens of drag-and-drop actions! It's possible that that tedium is one of the reasons that you're looking to adobeanalyticsr in the first place. As well you should! aw_freeform_table() handles these multiple API calls for you!

Multiple Dimensions with aw_freeform_table()

On the one hand, all of the exposition above may seem like overkill, because all you have to do with aw_freeform_table() to pull multiple dimensions is to...pass a vector of dimensions to the dimensions argument!

Where things get a little trickier is when it comes to specifying how many rows to include for each dimension level, and, more importantly, for constructing the function calls so that they run as quickly as possible. To illustrate, we'll explore a scenario where we want to get Visits broken down by Mobile Device Type and Browser.

First, let's query each of the dimensions independently to see how many unique dimension values we're dealing with. This isn't something that is required, but we're going to do a little math to illustrate the differences that dimension order can make.

device_types <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = "mobiledevicetype",
                        metrics = "visits",
                        # The default of 5 is probably going to get all of them,
                        # but set a higher cutoff just in case.
                        top = 10)
#> A total of 3 rows have been pulled.

browsers <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = "browser",
                        metrics = "visits",
                        # We want to get all of the browsers, so set top as a high 
                        # value rather than the default of "5"
                        top = 1000)
#> A total of 113 rows have been pulled.

When making our actual call to get both dimensions at once, we have two options for the dimensions argument value:

Assuming we set top appropriately to include all possible values, we should get the same data, ultimately. But, the number of API calls required behind the scenes and, therefore, the time it will take for the function to run, will vary quite a bit between these two!

For dimensions = c("mobiledevicetype", "browser"), the API calls will be:

  1. One call to get the full list of mobiledevicetype values.
  2. One query for each of the 3 mobiledevicetype values to get each value broken down by browser.

This will result in a total of 4 API calls.

Let's try it out, including logging how long the function takes to run.

start_time <- Sys.time()

df <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = c("mobiledevicetype", "browser"),
                        metrics = "visits",
                        top = c(10, 1000))
#> Estimated runtime: 8.8sec./0.15min.
#> 1 of 11 possible data requests complete. Starting the next 3 requests.
#> A total of 125 rows have been pulled.

end_time <- Sys.time()

# Show the summary for the resulting df for comparison to the next approach.
summary(df)
#>  mobiledevicetype     browser              visits       
#>  Length:125         Length:125         Min.   :   1.00  
#>  Class :character   Class :character   1st Qu.:   1.00  
#>  Mode  :character   Mode  :character   Median :   3.00  
#>                                        Mean   :  56.66  
#>                                        3rd Qu.:  11.00  
#>                                        Max.   :3488.00

# Output how long it took to run the query
end_time - start_time
#> Time difference of 4.994079 secs

Now, instead, let's consider what happens if we swap the order of our dimensions and, instead, use dimensions = c("browser", "mobiledevicetype"). Now, the API calls will be:

  1. One call to get the full list of browser values.
  2. One query for each of the 113 browser values to get each value broken down by mobiledevicetype.

This will result in a total of 114 API calls!!!

Let's try it out:

start_time <- Sys.time()

df <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = c("browser", "mobiledevicetype"),
                        metrics = "visits",
                        top = c(1000, 10))
#> Estimated runtime: 800.8sec./13.35min.
#> 1 of 1001 possible data requests complete. Starting the next 113 requests.
#> A total of 125 rows have been pulled.

end_time <- Sys.time()

# Show the summary for the resulting df for comparison to the next approach.
summary(df)
#>    browser          mobiledevicetype       visits       
#>  Length:125         Length:125         Min.   :   1.00  
#>  Class :character   Class :character   1st Qu.:   1.00  
#>  Mode  :character   Mode  :character   Median :   3.00  
#>                                        Mean   :  56.66  
#>                                        3rd Qu.:  11.00  
#>                                        Max.   :3488.00

# Output how long it took to run the query
end_time - start_time
#> Time difference of 1.615242 mins

This is a BIG difference in run-time even though, aside from the column order being slightly different, the resulting data is identical.

If you followed the Analysis Workspace experiment in the previous section, then another way to think about this is that (without the <Shift> key), breaking down Mobile Device Type by Browser would require a lot less clicking and dropping than breaking down Browser by Mobile Device Type. In the latter, you would have to drag Mobile Device Type onto each of the numerous Browser values one at a time!

The good (great?) news is that you can string as many dimensions together as you would like and then let R just take it from there and get a resulting data frame with multiple dimensions! The order of your dimensions just can have a dramatic effect on how long you have to wait for the results to be returned.



Try the adobeanalyticsr package in your browser

Any scripts or data that you put into this service are public.

adobeanalyticsr documentation built on Nov. 9, 2023, 5:07 p.m.