Takes a .jsonl file and converts it to a tibble. Tweet and user metadata are provided in 99 columns, which are selected for consistency and convenience.
parse_jsonl(jsonl_path, tweet_colnames = tweet_cols, tweet_lang = ".",
  filter_term_regex = ".", export_as_csv = FALSE,
  export_as_rds = FALSE)
jsonl_path |
Character. The path to the .jsonl file. It's a good idea to use the output of list.files() as an input. |
tweet_colnames |
Character vector. The column names you want in the output tibble. The object tweetWrangleR::tweet_cols provides 99 opinionated column names, which are used as the default. You can customise the output by supplying your own vector of column names. |
tweet_lang |
Character. Accepts regex. Not case sensitive. Filters tweets by language. The default (".") returns tweets in all languages. For English-language tweets only, use "en". For multiple languages, use the regex "or" operator (e.g. "en|es|ar"). Language codes can be given in ISO 639-1 alpha-2 format ('en'), ISO 639-3 alpha-3 format ('msa'), or ISO 639-1 alpha-2 combined with an ISO 3166-1 alpha-2 localization ('zh-tw'). |
filter_term_regex |
Character. Accepts regex. Not case sensitive. The default (".") returns all tweets. Pass the keyword(s) to filter on inside quotation marks. To supply multiple keywords, put them in a single quoted string and separate them with '|' (which means 'or'). |
export_as_csv |
Logical. If **TRUE**, it will export parsed data as a .csv file within the same path as the .jsonl file. If **FALSE** (default), it will return a tibble without exporting. |
export_as_rds |
Logical. If **TRUE**, it will export parsed data as a .rds file within the same path as the .jsonl file. If **FALSE** (default), it will return a tibble without exporting. |
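The two regex arguments can be pictured as case-insensitive pattern matches against each tweet's language code and text. The sketch below illustrates that idea with base R's grepl() on a made-up data frame. The column names `lang` and `text`, the toy data, and the matching logic are assumptions for illustration only, not the actual internals of parse_jsonl().

```r
# Hypothetical toy data; the real output has many more columns.
tweets <- data.frame(
  lang = c("en", "ES", "ar", "zh-tw"),
  text = c("This is a test", "Eso es todo", "marhaba", "THAT one"),
  stringsAsFactors = FALSE
)

# Case-insensitive regex match, analogous to tweet_lang = "en|es"
lang_match <- grepl("en|es", tweets$lang, ignore.case = TRUE)

# Case-insensitive regex match, analogous to filter_term_regex = "this|that"
term_match <- grepl("this|that", tweets$text, ignore.case = TRUE)

# Keep only rows that satisfy both filters
filtered <- tweets[lang_match & term_match, ]
```

Because the patterns are unanchored, a pattern such as "en" matches any code that contains "en", for example "en-gb"; a pattern like "^en$" restricts the match to the exact code.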
Always returns a tibble with the columns provided in the tweet_colnames argument. With the default, the tibble has 99 columns. Values that are missing in the source .jsonl file are filled with NA.
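One way to picture the "requested columns, NA where missing" behaviour is the base R sketch below. The data frame `parsed`, the vector `wanted` (standing in for tweet_colnames), and the column names are hypothetical; this is an illustration of the idea under those assumptions, not the package's implementation.

```r
# Hypothetical parsed records; the 'source' field is absent from the data.
parsed <- data.frame(
  id = c("1", "2"),
  text = c("a", "b"),
  stringsAsFactors = FALSE
)

# Stands in for the tweet_colnames argument
wanted <- c("id", "text", "source")

# Add every requested column that is absent, filled with NA
missing_cols <- setdiff(wanted, names(parsed))
parsed[missing_cols] <- NA

# Return exactly the requested columns, in the requested order
result <- parsed[wanted]
```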
# parse_jsonl("user/desktop/tweets.jsonl")
# parse_jsonl("user/desktop/tweets.jsonl", filter_term_regex = "this|that", tweet_lang = "en")

# Parallel example (safely() and map_df() come from purrr, read_csv() from readr)
# input_path <- "user/where/your/jsonl/files/are"
# files <- list.files(input_path, full.names = TRUE, recursive = TRUE)
# parallel::mclapply(files, purrr::safely(parse_jsonl), export_as_csv = TRUE, mc.cores = 3)
# The dot is escaped so that only files ending in ".csv" are matched
# files_csv <- list.files(input_path, full.names = TRUE, recursive = TRUE, pattern = "\\.csv$")
# whole_data <- purrr::map_df(files_csv, .f = readr::read_csv,
#   col_types = "cccciclc?ddcccciilcccciiiccclciilccilciccclddcccccciiiccciiillccciccclddccciiccciiicccilclccicicccc")