strsplit.data.frame: Obtain a tokenised data frame by splitting text alongside a...
In udpipe: Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

strsplit.data.frame

R Documentation

Obtain a tokenised data frame by splitting text alongside a regular expression

Description

Obtain a tokenised data frame by splitting the text in one column on a regular expression, giving one row per token. This is the inverse operation of paste.data.frame.
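
To make the inverse relationship concrete, the following is a minimal sketch of a round trip on a toy data frame. The data frame `docs` and its column names are made up for illustration, and the call to paste.data.frame assumes its `term`, `group` and `collapse` arguments. The round trip only restores the original text here because the text contains single spaces and the split is on a literal space.

```r
library(udpipe)

# Hypothetical toy data: one row per document
docs <- data.frame(doc_id = c("a", "b"),
                   text   = c("hello world", "good morning all"),
                   stringsAsFactors = FALSE)

# Split into one row per token, using a literal space as separator
tokens <- strsplit.data.frame(docs, term = "text", group = "doc_id",
                              split = " ", fixed = TRUE)

# Paste the tokens back together per document
back <- paste.data.frame(tokens, term = "text", group = "doc_id",
                         collapse = " ")
```

With the default `split`, punctuation and digits are discarded, so pasting the tokens back would generally not reproduce the original text.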

Usage

strsplit.data.frame(
  data,
  term,
  group,
  split = "[[:space:][:punct:][:digit:]]+",
  ...
)

Arguments

`data`	a data.frame or data.table
`term`	a character string giving the name of the column in `data` which you want to split into tokens
`group`	a character vector with one or more column names from `data` which identify the groups. The text in `term` is split into tokens within each group.
`split`	a regular expression indicating how to split the `term` column. Defaults to splitting on runs of whitespace, punctuation symbols or digits, which means these characters do not appear in the resulting tokens. This will be passed on to `strsplit`.
`...`	further arguments passed on to `strsplit`
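
Since `split` and `...` are handed to `strsplit`, their effect can be previewed on a single string with base R alone. This is a small sketch on a made-up sentence (`txt` is hypothetical); it compares the default regular expression with a literal space passed together with `fixed = TRUE`.

```r
# Hypothetical example sentence
txt <- "Great flat, 2 min from Grand-Place!"

# Default pattern: runs of whitespace, punctuation or digits are separators
default_split <- "[[:space:][:punct:][:digit:]]+"
tokens_default <- strsplit(txt, split = default_split)[[1]]
# "Great" "flat" "min" "from" "Grand" "Place"

# Literal space only: punctuation and digits stay attached to the tokens
tokens_space <- strsplit(txt, split = " ", fixed = TRUE)[[1]]
# "Great" "flat," "2" "min" "from" "Grand-Place!"
```

The default pattern is convenient for quick word counts but drops numbers entirely, so a narrower `split` is the better choice when digits or hyphenated words matter.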

Value

A tokenised data frame containing one row per token.
This data.frame has the columns given in `group` and `term`, where the text in column `term` has been split into tokens by the provided regular expression. The `group` values are repeated for every token that came from the same group.
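
The shape of the result can be sketched in base R. This is an illustration of the idea under simplifying assumptions (a single grouping column, a made-up `reviews` data frame), not the package's own implementation.

```r
# Hypothetical input: one row per review
reviews <- data.frame(id = c(1L, 2L),
                      feedback = c("Nice place", "Very clean room"),
                      stringsAsFactors = FALSE)

# Split each text into a character vector of tokens
pieces <- strsplit(reviews$feedback,
                   split = "[[:space:][:punct:][:digit:]]+")

# Repeat each id once per token and flatten the tokens into one column
tokenised <- data.frame(id = rep(reviews$id, times = lengths(pieces)),
                        feedback = unlist(pieces),
                        stringsAsFactors = FALSE)
```

Here `tokenised` has five rows: two for review 1 and three for review 2.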

Examples

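data(brussels_reviews, package = "udpipe")
## Split the feedback text per review, using the default regular expression
x <- strsplit.data.frame(brussels_reviews, term = "feedback", group = "id")
head(x)
## Split per combination of several grouping columns
x <- strsplit.data.frame(brussels_reviews,
                         term = c("feedback"),
                         group = c("listing_id", "language"))
head(x)
## Split on a literal space; fixed = TRUE is passed on to strsplit
x <- strsplit.data.frame(brussels_reviews, term = "feedback", group = "id",
                         split = " ", fixed = TRUE)
head(x)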

udpipe documentation built on Jan. 6, 2023, 5:06 p.m.
