compute_tweet_features: Compute tweet features
In haukelicht/politicaltweets: Classify political tweets

Description

Internal helper for create_tweet_features.

Usage

compute_tweet_features(
  x,
  .as.data.table = TRUE,
  .mentions.regexp = "(?<=^|\\W)(@\\w{1,15})(?=\\s|$|\\W)",
  .hashtags.regexp = "(?<=\\.|^|\\s)(#\\w{1,139})(?=\\s|$|\\W)",
  .url.regexp = "\\b(([a-z][\\w-]+:(/{1,3}|[a-z0-9%])|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)([^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:\\'\\\".,<>?«»“”‘’]))"
)

Arguments

`x`	a `data.frame`, `data.table`, or `tibble` recording tweets. For the required columns (naming and typing conventions), refer to `?required.tweets.df.cols`. For an example, see `?tweets.df.prototype`.
`.as.data.table`	logical. Whether or not to return a `data.table`. Defaults to `TRUE`. If `FALSE`, the returned object is a `tibble`.
`.mentions.regexp`	unit-length character vector, specifying the regular expression pattern used to match, count, and remove mentions in the tweet text (applied to column `text`).
`.hashtags.regexp`	unit-length character vector, specifying the regular expression pattern used to match, count, and remove hashtags in the tweet text (applied to column `text`).
`.url.regexp`	unit-length character vector, specifying the regular expression pattern used to match, count, and remove URLs in the tweet text (applied to column `text`).
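
The following is a minimal sketch of what "match, count, and remove" could look like for the default mention and hashtag patterns, using only base R with `perl = TRUE` (both patterns rely on lookaround assertions). It is not the package's internal code: the example tweets, the helper `extract_matches()`, and the variable names are hypothetical, and the package may apply the patterns with a different regex engine.

```r
# Default patterns copied from the Usage section.
mentions_regexp <- "(?<=^|\\W)(@\\w{1,15})(?=\\s|$|\\W)"
hashtags_regexp <- "(?<=\\.|^|\\s)(#\\w{1,139})(?=\\s|$|\\W)"

# Hypothetical example tweets (not taken from the package).
text <- c("@alice thanks, see #rstats and @bob_2", "no entities here")

# Hypothetical helper: return all matches of `pattern` for each element of `x`.
extract_matches <- function(x, pattern) {
  regmatches(x, gregexpr(pattern, x, perl = TRUE))
}

# Match: one character vector of matches per tweet.
mentions <- extract_matches(text, mentions_regexp)
hashtags <- extract_matches(text, hashtags_regexp)

# Count: number of matches per tweet (0 when nothing matched).
n_mentions <- lengths(mentions)
n_hashtags <- lengths(hashtags)

# Remove: drop the matched entities, then tidy up leftover whitespace.
clean_text <- gsub(mentions_regexp, "", text, perl = TRUE)
clean_text <- gsub(hashtags_regexp, "", clean_text, perl = TRUE)
clean_text <- trimws(gsub("\\s+", " ", clean_text))
```

The same three steps would presumably apply to `.url.regexp`. Passing a custom pattern to any of the three arguments changes what is counted and what is stripped from `text`, so it is worth checking a replacement pattern on a few sample tweets in this way first.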

Value

A data.table if .as.data.table = TRUE (the default), otherwise a tibble. The returned object contains all columns of x plus the computed tweet features.

haukelicht/politicaltweets documentation built on July 3, 2023, 4:11 a.m.