compute_tweet_features: Compute tweet features

View source: R/create_tweet_features.R

compute_tweet_features R Documentation

Compute tweet features

Description

Internal helper for create_tweet_features.

Usage

compute_tweet_features(
  x,
  .as.data.table = TRUE,
  .mentions.regexp = "(?<=^|\\W)(@\\w{1,15})(?=\\s|$|\\W)",
  .hashtags.regexp = "(?<=\\.|^|\\s)(#\\w{1,139})(?=\\s|$|\\W)",
  .url.regexp = "\\b(([a-z][\\w-]+:(/{1,3}|[a-z0-9%])|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)([^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:\\'\\\".,<>?«»“”‘’]))"
)

Arguments

x

a data.frame, data.table, or tibble recording tweets. For the required columns (naming and typing conventions), refer to ?required.tweets.df.cols. For an example, see ?tweets.df.prototype.

.as.data.table

logical. Whether or not to return a data.table. Defaults to TRUE. If FALSE, the returned object is a tibble.

.mentions.regexp

unit-length character vector, specifying the regular expression pattern used to match, count, and remove mentions in the tweet text (applied to column text).

.hashtags.regexp

unit-length character vector, specifying the regular expression pattern used to match, count, and remove hashtags in the tweet text (applied to column text).

.url.regexp

unit-length character vector, specifying the regular expression pattern used to match, count, and remove URLs in the tweet text (applied to column text).
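
To illustrate what "match, count, and remove" can mean for the default mention and hashtag patterns, here is a minimal base R sketch. It is not the package's implementation: the example texts, the helper extract_all, and the result names are hypothetical. The use of perl = TRUE is an assumption, made because the patterns contain lookarounds; the package may use a different regex engine.

```r
# Default patterns copied from the Usage section.
mentions_regexp <- "(?<=^|\\W)(@\\w{1,15})(?=\\s|$|\\W)"
hashtags_regexp <- "(?<=\\.|^|\\s)(#\\w{1,139})(?=\\s|$|\\W)"

# Hypothetical example texts (not taken from the package).
text <- c(
  "Thanks @alice and @bob_99! #rstats",
  "No entities here."
)

# Hypothetical helper: return all matches of `pattern` for each element of `x`.
# Assumption: PCRE (perl = TRUE), since the patterns use lookbehind/lookahead.
extract_all <- function(pattern, x) {
  regmatches(x, gregexpr(pattern, x, perl = TRUE))
}

# Match: one character vector of matches per text.
mentions <- extract_all(mentions_regexp, text)
hashtags <- extract_all(hashtags_regexp, text)

# Count: number of matches per text.
n_mentions <- lengths(mentions)
n_hashtags <- lengths(hashtags)

# Remove: delete every mention from the text.
text_no_mentions <- gsub(mentions_regexp, "", text, perl = TRUE)
```

In this sketch the first text yields the mentions "@alice" and "@bob_99" and the hashtag "#rstats", while the second text yields no matches and is left unchanged by the removal step. Passing a different pattern through .mentions.regexp or .hashtags.regexp would change what is matched in the same way.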

Value

A data.table if .as.data.table = TRUE (default), otherwise a tibble. The returned object contains all columns of x plus the computed tweet features.


haukelicht/politicaltweets documentation built on July 3, 2023, 4:11 a.m.