sentSplit: Sentence Splitting


Description

sentSplit - Splits turns of talk into individual sentences (provided proper punctuation is used). This is usually done as part of reading in and cleaning the data.

sentCombine - Combines sentences that share the same grouping variable.

TOT - Converts the tot column from sentSplit to a turn of talk index (without the sub-sentence index). Generally for internal use.

Usage

  sentSplit(dataframe, text.var,
    endmarks = c("?", ".", "!", "|"),
    incomplete.sub = TRUE, rm.bracket = TRUE,
    stem.col = FALSE, text.place = "right", ...)

  sentCombine(text.var, grouping.var = NULL,
    as.list = FALSE)

  TOT(tot)

Arguments

dataframe

A dataframe that contains the person and text variable.

text.var

The text variable.

endmarks

A character vector of endmarks to split turns of talk into sentences.

incomplete.sub

logical. If TRUE, detects incomplete sentences and replaces them with "|".

rm.bracket

logical. If TRUE removes brackets from the text.

stem.col

logical. If TRUE stems the text as a new column.

text.place

A character string giving placement location of the text column. This must be one of the strings "original", "right" or "left".

...

Additional options passed to stem2df.

grouping.var

The grouping variables. Default NULL generates one output for all text. Also takes a single grouping variable or a list of 1 or more grouping variables.

tot

A tot column from a sentSplit output.

as.list

logical. If TRUE, returns the output as a list. If FALSE, the output is returned as a dataframe.
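
To illustrate what the endmarks argument controls, here is a minimal base R sketch of splitting one turn of talk after each endmark. The helper split_turn is hypothetical and is not part of qdap; sentSplit also handles incomplete sentences, brackets, and the tot index, and none of that is reproduced here.

```r
# Hypothetical helper, not part of qdap: split one turn of talk on endmarks.
split_turn <- function(turn, endmarks = c("?", ".", "!", "|")) {
  # Build a character class such as [?.!|]; inside [] these marks are literal.
  # Assumes single-character endmarks other than "]", "^", "\" or "-".
  mark_class <- paste0("[", paste(endmarks, collapse = ""), "]")
  # Split on whitespace that directly follows an endmark, keeping the mark.
  pattern <- paste0("(?<=", mark_class, ")\\s+")
  strsplit(turn, pattern, perl = TRUE)[[1]]
}

split_turn("I am fine. How are you? Great!")
# "I am fine."   "How are you?" "Great!"
```

The lookbehind keeps each endmark attached to its sentence, which is why the sketch splits on the whitespace after the mark and not on the mark itself.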

Value

sentSplit - returns a dataframe with turn of talk broken apart into sentences. Optionally a stemmed version of the text variable may be returned as well.

sentCombine - returns a list of vectors with the continuous sentences by grouping.var pasted together (or a dataframe when as.list = FALSE).

TOT - returns a numeric vector of the turns of talk without sentence sub indexing (e.g. 3.2 becomes 3).
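
The conversion described for TOT can be sketched in base R as follows. This is an illustration only: the tot values are made up, and the "turn.sentence" form is assumed from the 3.2 example above. It is not presented as the function's actual implementation.

```r
# Hypothetical tot values in an assumed "turn.sentence" form.
tot <- c("1.1", "1.2", "2.1", "3.1", "3.2")

# Drop everything from the first "." onward, then convert to numeric.
turn <- as.numeric(sub("\\..*$", "", tot))
turn
# 1 1 2 3 3
```

Working on the text form avoids treating the sentence index as a decimal fraction, so a value such as "3.10" still maps to turn 3.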

Author(s)

Dason Kurkiewicz and Tyler Rinker <tyler.rinker@gmail.com>.

See Also

bracketX, incomplete.replace, stem2df, TOT

Examples

#sentSplit EXAMPLE:
sentSplit(DATA, "state")
sentSplit(DATA, "state", stem.col = TRUE)
sentSplit(DATA, "state", text.place = "left")
sentSplit(DATA, "state", text.place = "original")
sentSplit(raj, "dialogue")[1:20, ]

#sentCombine EXAMPLE:
dat <- sentSplit(DATA, "state")
sentCombine(dat$state, dat$person)
truncdf(sentCombine(dat$state, dat$sex), 50)

#TOT EXAMPLE:
dat <- sentSplit(DATA, "state")
TOT(dat$tot)

trinker/qdap2 documentation built on May 31, 2019, 9:47 p.m.