merge_data: Merge rds files from different folders


View source: R/merge_data.R

Description

This function merges rds files of the same name from two different folders. It is intended for use with social media data collected through the update functions in this package. Data from the file in old_folder is kept; data from new_folder is added, but only rows whose id does not already exist are kept.
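The rule stated above can be illustrated with a minimal base R sketch. It is not the package's implementation; the sample data and the message column are hypothetical, and the keep_newest argument described below can change which version of a duplicate wins.

```r
# Hypothetical sample data; only the "id" column name comes from this page.
old <- data.frame(id = c("a", "b"), message = c("old a", "old b"),
                  stringsAsFactors = FALSE)
new <- data.frame(id = c("b", "c"), message = c("new b", "new c"),
                  stringsAsFactors = FALSE)

# Stack the old rows first, then drop any later row whose id was already seen.
combined <- rbind(old, new)
merged <- combined[!duplicated(combined$id), ]

# Row "b" comes from the old data; row "c" is added from the new data.
print(merged)
```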

Usage

merge_data(
  old_folder,
  new_folder,
  output_folder,
  id = "id",
  sort = c("created_time", "scrape_time", "id"),
  sort_direction = c("desc", "desc", "asc"),
  keep_newest = TRUE,
  ignore_scrape_time = FALSE
)
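A call might look like the following sketch. The temporary folders, the file name page.rds, and the sample columns are assumptions made for illustration; the package name is taken from this page's footer, and the call is only attempted if that package is installed.

```r
# Hypothetical folder layout inside a temporary directory.
base <- tempfile("merge_demo_")
old_folder <- file.path(base, "old")
new_folder <- file.path(base, "new")
output_folder <- file.path(base, "merged")
for (folder in c(old_folder, new_folder, output_folder)) {
  dir.create(folder, recursive = TRUE)
}

# Helper (hypothetical) that builds a small data.frame with the default columns.
make_posts <- function(ids, scraped) {
  data.frame(id = ids,
             created_time = as.POSIXct("2020-01-01", tz = "UTC"),
             scrape_time = as.POSIXct(scraped, tz = "UTC"),
             stringsAsFactors = FALSE)
}

# Files must share a name across both folders to be merged.
saveRDS(make_posts(c("a", "b"), "2020-02-01"), file.path(old_folder, "page.rds"))
saveRDS(make_posts(c("b", "c"), "2020-03-01"), file.path(new_folder, "page.rds"))

# Only run the merge if the package is available.
if (requireNamespace("socmedhelpeRs", quietly = TRUE)) {
  socmedhelpeRs::merge_data(
    old_folder = old_folder,
    new_folder = new_folder,
    output_folder = output_folder
  )
}
```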

Arguments

old_folder

Folder to look for old rds files.

new_folder

Folder to look for new rds files.

output_folder

Folder to save merged rds files to.

id

Name(s) of the column(s) holding the unique id(s); only rows with a distinct id are kept. Defaults to "id" for Facebook data.

sort

Merged data is sorted by these variable(s). Defaults to "created_time", then "scrape_time", and then "id" for Facebook data.

sort_direction

The directions in which the sort parameters are applied. Should be of length 1 (all parameters are sorted this way) or the same length as sort. Possible values are "desc" for descending and "asc" or "" for ascending. Defaults to c("desc", "desc", "asc"). Thus, by default, created_time is sorted in descending order, posts with the same created_time are sorted in descending order by scrape_time, and then in ascending order by message id.
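The default ordering can be reproduced with base R's order() as a sketch. This is an assumption about the resulting order, not the package's own code, and the sample posts are hypothetical.

```r
# Hypothetical posts using the default sort columns.
posts <- data.frame(
  id = c("2", "1", "3", "1"),
  created_time = as.POSIXct(
    c("2020-01-02", "2020-01-02", "2020-01-01", "2020-01-02"), tz = "UTC"),
  scrape_time = as.POSIXct(
    c("2020-02-01", "2020-02-01", "2020-02-01", "2020-03-01"), tz = "UTC"),
  stringsAsFactors = FALSE
)

sort_direction <- c("desc", "desc", "asc")

# With method = "radix", decreasing may hold one flag per sort key.
ord <- order(posts$created_time, posts$scrape_time, posts$id,
             decreasing = sort_direction == "desc", method = "radix")
sorted <- posts[ord, ]
print(sorted)
```

A sort_direction of length 1 could be expanded to one flag per key with rep_len() before the call.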

keep_newest

Logical, indicating which version of a duplicate text is kept. If TRUE (default), the newest text according to scrape time is kept, unless ignore_scrape_time is TRUE. Furthermore, texts from files in new_folder are preferred over those from old_folder. If FALSE, older data is preferred.

ignore_scrape_time

Logical, indicating whether the scrape time should be ignored when deciding which texts to keep. Defaults to FALSE. If TRUE, the age of a text depends only on where its file is stored (old or new folder).
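One possible reading of how keep_newest and ignore_scrape_time interact is sketched below in base R. The pick() function, the source_rank column, and the text column are hypothetical helpers for illustration only; the package may resolve duplicates differently.

```r
# Hypothetical duplicates: each id appears once in the old and once in the new data.
old <- data.frame(id = c("a", "b"), text = c("a v1", "b v2"),
                  scrape_time = as.POSIXct(c("2020-01-01", "2020-03-01"), tz = "UTC"),
                  stringsAsFactors = FALSE)
new <- data.frame(id = c("a", "b"), text = c("a v2", "b v1"),
                  scrape_time = as.POSIXct(c("2020-02-01", "2020-02-01"), tz = "UTC"),
                  stringsAsFactors = FALSE)

# Hypothetical helper column: 1 = from old_folder, 2 = from new_folder.
old$source_rank <- 1L
new$source_rank <- 2L
combined <- rbind(old, new)

# Rank rows by age (assumed interpretation), then keep the first row per id.
pick <- function(data, keep_newest = TRUE, ignore_scrape_time = FALSE) {
  if (ignore_scrape_time) {
    keys <- list(data$source_rank)
  } else {
    keys <- list(data$scrape_time, data$source_rank)
  }
  ord <- do.call(order, c(keys, list(decreasing = keep_newest)))
  ranked <- data[ord, ]
  ranked[!duplicated(ranked$id), ]
}

newest <- pick(combined)                              # scrape time decides
by_folder <- pick(combined, ignore_scrape_time = TRUE) # folder decides
oldest <- pick(combined, keep_newest = FALSE)          # older versions win
```

In this sketch, id "b" was scraped later in the old data, so the default keeps the old version of "b", while ignore_scrape_time = TRUE keeps the version from the new data.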

Value

A data.frame with the results.


jogrue/socmedhelpeRs documentation built on April 24, 2020, 2:09 p.m.