parse_jsonl: Parse .jsonl files into a tibble


Description

Takes a .jsonl file and converts it to a tibble. Tweet and user metadata are provided in 99 columns, which are selected for consistency and convenience.

Usage

parse_jsonl(jsonl_path, tweet_colnames = tweet_cols, tweet_lang = ".",
  filter_term_regex = ".", export_as_csv = FALSE,
  export_as_rds = FALSE)

Arguments

jsonl_path

Character. The path to the .jsonl file. It's a good idea to use the output of list.files() as an input.

tweet_colnames

Character vector. The column names you want in the output tibble. The object tweetWrangleR::tweet_cols provides 99 opinionated column names, which are used as the default. To customise the output, supply your own character vector of column names.

tweet_lang

Character. Accepts regex. Not case sensitive. Keeps only tweets in the selected languages. The default returns tweets in all languages. For English-language tweets only, use "en". For multiple languages, use the regex "or" operator (e.g. "en|es|ar"). Language codes may be given in ISO 639-1 alpha-2 ('en') or ISO 639-3 alpha-3 ('msa') format, or as an ISO 639-1 alpha-2 code combined with an ISO 3166-1 alpha-2 localization ('zh-tw').
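
As a rough illustration of how a case-insensitive regex behaves against language codes, the base R sketch below uses grepl(). The langs vector is made up for the example, and this is not necessarily how parse_jsonl applies the filter internally.

```r
# Hypothetical language codes, as they might appear in tweet metadata
langs <- c("en", "es", "ar", "zh-tw", "EN", "fr")

# Case-insensitive match against an "or" pattern
multi <- langs[grepl("en|es|ar", langs, ignore.case = TRUE)]

# Anchoring the pattern restricts the match to the exact code
exact <- langs[grepl("^en$", langs, ignore.case = TRUE)]

# A pattern of "." matches any non-empty string
all_langs <- langs[grepl(".", langs)]
```

An unanchored regex matches anywhere in a string, so if the pattern is applied without anchors (an assumption about the implementation), "en" would also match a three-letter code such as "ven". Anchoring with "^en$" avoids that.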

filter_term_regex

Character. Accepts regex. Not case sensitive. The default returns all tweets. Pass the keyword(s) to filter on as a quoted string. To provide multiple keywords, put them in a single quoted string and separate them with '|' (which means 'or').
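
The base R sketch below shows how a case-insensitive "or" pattern matches text. The texts vector is invented for the example, and the package may apply the pattern differently.

```r
# Hypothetical tweet texts
texts <- c("This is a test", "Nothing here", "THAT one", "A thistle")

# Plain "or" pattern: matches the keywords anywhere, including inside words
loose <- grepl("this|that", texts, ignore.case = TRUE)

# Word boundaries restrict the match to whole words
strict <- grepl("\\b(this|that)\\b", texts, ignore.case = TRUE, perl = TRUE)
```

Here loose also flags "A thistle" because "this" occurs inside it, while strict does not. Whether whole-word matching is needed depends on the keywords being filtered.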

export_as_csv

Logical. If **TRUE**, it will export parsed data as a .csv file within the same path as the .jsonl file. If **FALSE** (default), it will return a tibble without exporting.

export_as_rds

Logical. If **TRUE**, it will export parsed data as a .rds file within the same path as the .jsonl file. If **FALSE** (default), it will return a tibble without exporting.

Value

Always returns a tibble with the columns provided in the tweet_colnames argument. With the default, the tibble always has 99 columns. Values that are missing in the source .jsonl file are filled with NA.
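
To make the NA-filling behaviour concrete, here is a minimal base R sketch of one way to get a fixed set of columns from records that lack some of them. It uses a plain data frame rather than a tibble so that it runs without extra packages. The names parsed and wanted are hypothetical, and this is not taken from the package source.

```r
# Hypothetical parsed records in which the "lang" column is absent
parsed <- data.frame(id = c("1", "2"), text = c("a", "b"),
                     stringsAsFactors = FALSE)

# Hypothetical requested column names (a stand-in for tweet_colnames)
wanted <- c("id", "text", "lang")

# Add each missing column filled with NA, then order columns as requested
missing_cols <- setdiff(wanted, names(parsed))
parsed[missing_cols] <- NA
result <- parsed[wanted]
```

The result has exactly the requested columns in the requested order, with NA wherever the source had no value.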

Examples

# parse_jsonl("user/desktop/tweets.jsonl")

# parse_jsonl("user/desktop/tweets.jsonl", filter_term_regex = "this|that", tweet_lang = "en")

# Parallel example
# input_path <- "user/where/your/jsonl/files/are"
# files <- list.files(input_path, full.names = TRUE, recursive = TRUE)
# Parse each file on 3 cores; safely() keeps one failing file from stopping the run
# parallel::mclapply(files, purrr::safely(parse_jsonl), export_as_csv = TRUE, mc.cores = 3)
# Collect the exported .csv files (the dot is escaped so it matches literally)
# files_csv <- list.files(input_path, full.names = TRUE, recursive = TRUE, pattern = "\\.csv$")
# whole_data <- purrr::map_df(files_csv, .f = readr::read_csv,
#   col_types = "cccciclc?ddcccciilcccciiiccclciilccilciccclddcccccciiiccciiillccciccclddccciiccciiicccilclccicicccc")

sefabey/tweetWrangleR documentation built on May 4, 2019, 4:17 a.m.