preprocess_reviews: Preprocess review text for topic modeling
In Goodreader: Scrape and Analyze 'Goodreads' Book Data

preprocess_reviews

R Documentation

Preprocess review text for topic modeling

Description

This function preprocesses review text by removing punctuation, converting to lowercase, removing stopwords, and stemming. Non-English reviews can optionally be filtered out first (see `english_only`).
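
To make these steps concrete, here is a minimal base-R sketch of lowercasing, punctuation removal, stopword removal, and building a document-term count matrix. It is not the package's implementation: `sketch_preprocess` and `toy_stopwords` are hypothetical names, the stopword list is a tiny stand-in, and the language filter and stemming steps are left out because they need additional packages.

```r
# Illustrative base-R sketch; NOT the Goodreader implementation.
# `toy_stopwords` and `sketch_preprocess` are hypothetical names.
toy_stopwords <- c("the", "a", "an", "and", "is", "it", "this", "was", "of")

sketch_preprocess <- function(texts, stopwords = toy_stopwords) {
  texts <- tolower(texts)                      # convert to lowercase
  texts <- gsub("[[:punct:]]+", " ", texts)    # replace punctuation with spaces
  tokens <- strsplit(trimws(texts), "[[:space:]]+")
  # Drop empty strings and stopwords from each document
  tokens <- lapply(tokens, function(tok) tok[nzchar(tok) & !(tok %in% stopwords)])

  # Document-term count matrix: one row per document, one column per term
  vocab <- sort(unique(unlist(tokens)))
  dtm <- matrix(0L, nrow = length(tokens), ncol = length(vocab),
                dimnames = list(paste0("doc", seq_along(tokens)), vocab))
  for (i in seq_along(tokens)) {
    counts <- table(tokens[[i]])
    if (length(counts) > 0) {
      dtm[i, names(counts)] <- as.integer(counts)
    }
  }
  list(tokens = tokens, dtm = dtm)
}

result <- sketch_preprocess(c("This book was great, great fun!",
                              "The plot is slow and the ending is great."))
print(result$dtm)
```

Each row of the printed matrix is one review and each column is a term that survived cleaning, which is the general shape of input that topic models work from.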

Usage

preprocess_reviews(reviews, english_only = TRUE)

Arguments

`reviews`	A data frame containing the scraped reviews.
`english_only`	A logical value indicating whether to filter out non-English reviews. Default is `TRUE`.

Value

A list containing the following elements:

corpus: The preprocessed corpus object.
dtm: The document-term matrix.
filtered_reviews: The filtered reviews data frame.
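
As a rough illustration of working with the returned list, the sketch below builds a stand-in object with the three documented element names and accesses each one with `$`. The contents are made up: the real `corpus` and `dtm` objects will have whatever classes the package produces, and the `review_text` column name is an assumption used only for this example.

```r
# Stand-in object with the three documented element names. The contents are
# invented for illustration and do not reflect the real classes returned.
preprocessed_stub <- list(
  corpus = c("great book", "slow plot"),
  dtm = matrix(c(1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L), nrow = 2,
               dimnames = list(c("1", "2"),
                               c("book", "great", "plot", "slow"))),
  filtered_reviews = data.frame(review_text = c("Great book!", "Slow plot."))
)

# Elements are retrieved by name
names(preprocessed_stub)
dim(preprocessed_stub$dtm)                 # documents x terms
nrow(preprocessed_stub$filtered_reviews)   # number of reviews kept
```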

Examples


# Create a temporary file with sample book IDs
temp_file <- tempfile(fileext = ".txt")
writeLines(c("1420", "2767052", "10210"), temp_file)

# Scrape reviews
reviews <- scrape_reviews(temp_file, num_reviews = 5, use_parallel = FALSE)

# Preprocess the reviews
preprocessed <- preprocess_reviews(reviews, english_only = TRUE)

# Print the document-term matrix
print(preprocessed$dtm)

# Clean up: remove the temporary file
file.remove(temp_file)

Goodreader documentation built on Oct. 30, 2024, 9:11 a.m.