clean_news: Clean retrieved news articles
In MikeJohnPage/newsrivr: Newsriver API R Client

clean_news wrangles the messy data fetched by get_news, returning a tidy tibble with sensible defaults.


clean_news(data, min_nchar = 300, as_date = TRUE, drop_vars = TRUE,
  to_lower = TRUE, distinct = TRUE, drop_na = FALSE,
  tif_corpus = FALSE)

`data`	Tbl, returned from `get_news`.
`min_nchar`	Integer, specifying the minimum number of characters an article must contain to be kept in the corpus.
`as_date`	Logical, indicating whether dates should be transformed to class "Date".
`drop_vars`	Logical, indicating whether all variables other than `title`, `text`, `discoverDate`, and `website.domainName` should be dropped. The Newsriver API (typically) returns 26 variables, many of which contain sparse metadata.
`to_lower`	Logical, indicating whether the `title` and `text` variables should be transformed to lowercase.
`distinct`	Logical, indicating whether to keep only articles with distinct `title` or `text` values.
`drop_na`	Logical, indicating whether to drop rows containing missing values.
`tif_corpus`	Logical, indicating whether the tibble should be returned as a valid TIF (Text Interchange Format) corpus.
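
To make the arguments concrete, here is a rough base R sketch of the kind of steps the defaults describe. It is not the package's implementation. The toy data frame `raw`, its values, the timestamp format, the `>=` comparison for `min_nchar`, and the reading of `distinct` as "drop a row whose `title` or `text` repeats an earlier row" are all assumptions made for illustration.

```r
# Hypothetical stand-in for the tibble returned by get_news.
raw <- data.frame(
  title = c("Storm Hits Coast", "Storm Hits Coast", "Short Note"),
  text = c(strrep("Heavy rain fell. ", 20),
           strrep("Heavy rain fell. ", 20),
           "Too brief."),
  discoverDate = c("2019-03-01T10:15:00.000+0000",   # assumed timestamp format
                   "2019-03-01T11:00:00.000+0000",
                   "2019-03-02T08:00:00.000+0000"),
  website.domainName = c("example.com", "example.org", "example.net"),
  score = c(1.2, 1.1, 0.4),                          # extra metadata column
  stringsAsFactors = FALSE
)

# drop_vars = TRUE: keep only the four core variables.
keep <- c("title", "text", "discoverDate", "website.domainName")
out <- raw[, keep]

# min_nchar = 300: keep articles with at least 300 characters (assumed >=).
out <- out[nchar(out$text) >= 300, ]

# as_date = TRUE: convert the leading YYYY-MM-DD part to class "Date".
out$discoverDate <- as.Date(substr(out$discoverDate, 1, 10))

# to_lower = TRUE: lowercase title and text.
out$title <- tolower(out$title)
out$text <- tolower(out$text)

# distinct = TRUE: drop rows whose title or text repeats an earlier row
# (one possible reading of the argument description).
out <- out[!duplicated(out$title) & !duplicated(out$text), ]

nrow(out)  # one article remains
```

In this toy input the third article is removed by the length filter and the second by the duplicate filter, so a single lowercased row with a `Date` column is left.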

## Not run: 
clean_news(data = my_tbl)

clean_news(my_tbl, min_nchar = 500, tif_corpus = TRUE)

## End(Not run)
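
For readers unfamiliar with `tif_corpus = TRUE`: the Text Interchange Format describes a corpus data frame as having a character `doc_id` column first (with unique ids) and a character `text` column second, with any further columns treated as document metadata. The sketch below checks those conditions in base R. The helper `is_tif_corpus_df`, the toy `articles` data, and the `doc1`, `doc2` id scheme are hypothetical; how clean_news itself builds `doc_id` is not stated here.

```r
# Hypothetical helper: TRUE if x looks like a TIF corpus data frame.
is_tif_corpus_df <- function(x) {
  is.data.frame(x) &&
    ncol(x) >= 2 &&
    identical(names(x)[1:2], c("doc_id", "text")) &&  # required column order
    is.character(x$doc_id) &&
    is.character(x$text) &&
    anyDuplicated(x$doc_id) == 0                      # ids must be unique
}

# Toy cleaned articles (illustrative values only).
articles <- data.frame(
  title = c("storm hits coast", "markets rally"),
  text = c("heavy rain fell overnight.", "shares rose on monday."),
  stringsAsFactors = FALSE
)

# Reshape into the TIF layout: doc_id first, text second, metadata after.
corpus <- data.frame(
  doc_id = paste0("doc", seq_len(nrow(articles))),    # assumed id scheme
  text = articles$text,
  title = articles$title,
  stringsAsFactors = FALSE
)

is_tif_corpus_df(corpus)    # TRUE
is_tif_corpus_df(articles)  # FALSE: no doc_id column
```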

MikeJohnPage/newsrivr documentation built on Jan. 4, 2021, 7:48 p.m.
