crf_cbind_attributes: Enrich a data.frame by adding frequently used CRF attributes
In crfsuite: Conditional Random Fields for Labelling Sequential Data in Natural Language Processing

Description

The CRF attributes implemented in this function are the neighbouring values of a given field, for example the previous word, the next word, or the combination of the previous 2 words. This function cbinds these neighbouring attributes as columns to the provided data.frame.

By default it adds the following columns to the data.frame:

the term itself (term[t])
the next term (term[t+1])
the term after that (term[t+2])
the previous term (term[t-1])
the term before the previous term (term[t-2])
as well as all combinations of these terms (bigrams/trigrams/...), where at most ngram_max terms are combined.

See the examples.
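The neighbouring attributes listed above can be pictured with a small base R sketch. It is only an illustration of the idea, not the package's implementation: the helper `shift_by` is hypothetical, and padding with NA at the start and end of a sequence is an assumption made here.

```r
# Hypothetical helper: shift a vector by k positions.
# k < 0 looks back (previous terms), k > 0 looks ahead (next terms).
# Positions without a neighbour are filled with NA (an assumption of this sketch).
shift_by <- function(x, k) {
  n <- length(x)
  idx <- seq_len(n) + k
  idx[idx < 1 | idx > n] <- NA
  x[idx]
}

term <- c("the", "cat", "sat")
prev_term <- shift_by(term, -1)   # term[t-1]
next_term <- shift_by(term, 1)    # term[t+1]

# A bigram of the previous and the current term, joined with "-"
bigram <- ifelse(is.na(prev_term), NA, paste(prev_term, term, sep = "-"))

data.frame(term, prev_term, next_term, bigram)
```

Each added attribute is simply a shifted copy of the original column, and the n-gram attributes are several of those shifted copies pasted together.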

Usage

crf_cbind_attributes(
  data,
  terms,
  by,
  from = -2,
  to = 2,
  ngram_max = 3,
  sep = "-"
)

Arguments

`data`	a data.frame which will be coerced to a data.table (cbinding will be done by reference on the existing data.frame)
`terms`	a character vector of column names which are part of `data`. For each of these columns the function looks at the preceding and following rows and cbinds that information to `data`
`by`	a character vector of column names which are part of `data` indicating the fields which define the sequence. Preceding/following terms are looked up only within rows that share the same values of `by`. Typically this will be a document identifier or sentence identifier in an NLP context.
`from`	integer, by default set to -2, indicating to look up to 2 terms before the current term
`to`	integer, by default set to 2, indicating to look up to 2 terms after the current term
`ngram_max`	integer indicating the maximum number of terms to combine (2 means bigrams, 3 trigrams, ...)
`sep`	character indicating how to combine the previous/next/current terms. Defaults to '-'.
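To make the role of `by` concrete, here is a base R sketch of a lookup that stays inside each sequence. It is an illustration under assumptions rather than the package's code: the helper `lag1` and the column name `token_prev` are hypothetical, and NA is used here for a missing neighbour.

```r
# Two short "documents"; neighbours must never cross the doc_id boundary
x <- data.frame(doc_id = c(1, 1, 1, 2, 2),
                token  = c("a", "b", "c", "d", "e"),
                stringsAsFactors = FALSE)

# Hypothetical helper: previous element, NA for the first element
lag1 <- function(v) c(NA, head(v, -1))

# ave() applies lag1 separately within each doc_id group
x$token_prev <- ave(x$token, x$doc_id, FUN = lag1)
x
```

The first token of the second document gets NA instead of the last token of the first document, which is the behaviour the `by` argument describes.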

Examples

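library(crfsuite)

## Simulated data: 1000 rows spread over (at most) 10 documents, sorted by doc_id
x <- data.frame(doc_id = sort(sample.int(n = 10, size = 1000, replace = TRUE)))
## Random part-of-speech tag for every row
x$pos <- sample(c("Art", "N", "Prep", "V", "Adv", "Adj", "Conj", 
                  "Punc", "Num", "Pron", "Int", "Misc"), 
                  size = nrow(x), replace = TRUE)
## Add the previous and next tag within each document, plus their combinations
x <- crf_cbind_attributes(x, terms = "pos", by = "doc_id", 
                          from = -1, to = 1, ngram_max = 3)
head(x)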


## Example on some real data (ner_download_modeldata downloads the data, so this needs internet access)
x <- ner_download_modeldata("conll2002-nl")
x <- crf_cbind_attributes(x, terms = c("token", "pos"), 
                          by = c("doc_id", "sentence_id"),
                          ngram_max = 3, sep = "|")

crfsuite documentation built on Sept. 17, 2023, 1:06 a.m.
