remove_facebook_duplicates: Clean Facebook data duplicates


View source: R/misc.R

Description

The scraped data can contain duplicates in which the same post is stored under two different message IDs. This function keeps only rows with a distinct combination of sender (from_id), message text (message), posting time (created_time), and message type (type). You can provide either a directory or a file.
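
The distinctness rule described above can be illustrated with base R. This is only a minimal sketch, not the package's actual implementation: the toy data frame `posts` and the helper variable `key_cols` are made up for illustration, and the column types in real scraped data may differ.

```r
# Toy data (hypothetical): rows 1 and 2 are the same post stored under
# two different message IDs.
posts <- data.frame(
  id           = c("1_a", "1_b", "2_a"),
  from_id      = c("100", "100", "200"),
  message      = c("Hello", "Hello", "Hi"),
  created_time = c("2020-01-01T10:00:00+0000",
                   "2020-01-01T10:00:00+0000",
                   "2020-01-02T09:00:00+0000"),
  type         = c("status", "status", "status"),
  stringsAsFactors = FALSE
)

# Columns that define a duplicate; the message ID is deliberately left out.
key_cols <- c("from_id", "message", "created_time", "type")

# duplicated() flags every row whose key combination already appeared
# further up, so the first occurrence of each combination is kept.
deduplicated <- posts[!duplicated(posts[, key_cols]), ]
```

Because the first occurrence wins, the row order at the time of deduplication decides which copy of a duplicated post survives.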

Usage

remove_facebook_duplicates(
  dir,
  file,
  sort = c("created_time", "scrape_time", "id"),
  sort_direction = c("desc", "desc", "asc")
)

Arguments

dir

A path to a directory containing Facebook data files.

file

A file path to one Facebook data file (as rds file).

sort

Variable(s) by which the data is sorted. Defaults to c("created_time", "scrape_time", "id"). The sort is applied before duplicates are removed, so by default the newer record (by scrape_time) of each set of duplicates is kept.

sort_direction

Directions in which the sort variables are applied. Should be of length 1 (all variables are sorted in this direction) or the same length as sort. Possible values are "desc" for descending and "asc" or "" for ascending. Defaults to c("desc", "desc", "asc"). Thus, by default, data is sorted by created_time in descending order; posts with the same created_time are sorted by scrape_time in descending order and then by message id in ascending order.
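
How the default sort interacts with duplicate removal can be sketched in base R. This is an illustration under assumptions rather than the function's real code: the helper `sort_posts()`, its argument names, and the toy data are hypothetical, and time stamps are assumed to be ISO-style character strings so that they sort correctly as text.

```r
# Hypothetical helper: sort a data frame by several columns, each with its
# own direction ("desc" = descending, "asc" or "" = ascending).
sort_posts <- function(data, sort_vars, directions) {
  # A single direction is recycled to all sort variables.
  directions <- rep_len(directions, length(sort_vars))
  decreasing <- directions == "desc"
  # method = "radix" lets order() take one decreasing flag per column.
  ord <- do.call(
    order,
    c(unname(as.list(data[sort_vars])),
      list(decreasing = decreasing, method = "radix"))
  )
  data[ord, , drop = FALSE]
}

# Toy data (hypothetical): "1_a" and "1_b" are the same post, scraped twice.
posts <- data.frame(
  id           = c("1_a", "1_b", "2_a"),
  from_id      = c("100", "100", "200"),
  message      = c("Hello", "Hello", "Hi"),
  created_time = c("2020-01-01T10:00:00+0000",
                   "2020-01-01T10:00:00+0000",
                   "2020-01-02T09:00:00+0000"),
  scrape_time  = c("2020-02-01", "2020-03-01", "2020-02-01"),
  type         = c("status", "status", "status"),
  stringsAsFactors = FALSE
)

# Sort with the documented defaults, then drop duplicates.
sorted <- sort_posts(
  posts,
  sort_vars  = c("created_time", "scrape_time", "id"),
  directions = c("desc", "desc", "asc")
)
key_cols <- c("from_id", "message", "created_time", "type")
result <- sorted[!duplicated(sorted[, key_cols]), ]
```

In this sketch the copy with the later scrape_time ("1_b") sorts ahead of the older copy, so it is the one that remains after duplicates are dropped.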


jogrue/RfacebookHelperFunctions documentation built on May 7, 2020, 2:03 p.m.