predict.crf: Predict the label sequence based on the Conditional Random Field

View source: R/modelling.R

predict.crf    R Documentation

Predict the label sequence based on the Conditional Random Field

Description

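Predict the label sequence of new data using a Conditional Random Field model fitted with crf. Predictions are returned either per row of newdata (type = 'marginal') or per sequence group (type = 'sequence').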
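
Conceptually, a linear-chain CRF picks the label sequence that maximises the sum of per-token label scores and label-to-label transition scores. Viterbi decoding computes this sequence, and the Value section refers to the Viterbi-decoded label. The crfsuite package does this inside the CRFsuite library it wraps, so the base R code below is only an illustration. The function viterbi_decode and the score matrices are hypothetical; in a fitted model the scores come from learned feature weights rather than being written by hand.

```r
## Minimal Viterbi decoder for a linear-chain model (illustration only, not crfsuite code).
## emission:   n_pos x n_lab matrix of per-token label scores, labels as colnames
## transition: n_lab x n_lab matrix, transition[i, j] = score of label i followed by label j
viterbi_decode <- function(emission, transition) {
  n_pos <- nrow(emission)
  n_lab <- ncol(emission)
  score <- matrix(-Inf, n_pos, n_lab)         # best path score ending in each label
  back  <- matrix(NA_integer_, n_pos, n_lab)  # backpointer to the previous label
  score[1, ] <- emission[1, ]
  if (n_pos > 1) {
    for (t in 2:n_pos) {
      for (j in seq_len(n_lab)) {
        cand <- score[t - 1, ] + transition[, j]
        back[t, j]  <- which.max(cand)
        score[t, j] <- max(cand) + emission[t, j]
      }
    }
  }
  ## Trace the best path back from the highest-scoring final label
  path <- integer(n_pos)
  path[n_pos] <- which.max(score[n_pos, ])
  if (n_pos > 1) {
    for (t in (n_pos - 1):1) path[t] <- back[t + 1, path[t + 1]]
  }
  colnames(emission)[path]
}

## Hypothetical scores for a 2-token sequence
labels <- c("O", "B-LOC", "I-LOC")
emission <- matrix(c(2, 0,   0,
                     0, 0.5, 1), nrow = 2, byrow = TRUE,
                   dimnames = list(NULL, labels))
no_penalty <- matrix(0, 3, 3, dimnames = list(labels, labels))
penalty <- no_penalty
penalty["O", "I-LOC"] <- -10   # discourage I-LOC directly after O

viterbi_decode(emission, no_penalty)  # "O" "I-LOC"
viterbi_decode(emission, penalty)     # "O" "B-LOC"
```

Without a transition penalty, the second token gets I-LOC, its highest per-token score. Penalising the O to I-LOC transition makes B-LOC part of the better overall path. This is the kind of context effect a CRF captures.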

Usage

## S3 method for class 'crf'
predict(
  object,
  newdata,
  embeddings,
  group,
  type = c("marginal", "sequence"),
  trace = FALSE,
  ...
)

Arguments

object

an object of class crf, as returned by crf

newdata

a character matrix, or an object that can be coerced to a character matrix, containing the attributes of the label sequence y. It should be in the same format as the data used to train the model.

embeddings

a numeric matrix with the same number of rows as newdata, in the same row order, containing numeric features used for prediction

group

an integer or character vector of length nrow(newdata) indicating which sequence each row of newdata belongs to (e.g. a document or sentence identifier)

type

either 'marginal' or 'sequence'. 'marginal' returns predictions for each row of newdata; 'sequence' returns predictions for each sequence group. Defaults to 'marginal'.

trace

a logical indicating whether to show the trace of the labelling output. Defaults to FALSE.

...

not used
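
The row-alignment requirements on newdata, embeddings and group can be checked before calling predict(). The helper check_predict_inputs below is a hypothetical base R function, not part of crfsuite. It only tests the lengths and row counts described above. It cannot verify that rows are in the same order or that newdata uses the training format.

```r
## Hypothetical helper, not part of crfsuite: checks that newdata, group and
## (optionally) embeddings line up row by row before predict() is called.
check_predict_inputs <- function(newdata, group, embeddings = NULL) {
  newdata <- as.matrix(newdata)   # predict() expects something coercible to a character matrix
  n <- nrow(newdata)
  if (length(group) != n) {
    stop("group must have length nrow(newdata)")
  }
  if (!is.null(embeddings) && nrow(embeddings) != n) {
    stop("embeddings must have nrow(newdata) rows, in the same order")
  }
  invisible(TRUE)
}

## Toy token table: two documents, one row per token
tokens <- data.frame(doc_id = c("d1", "d1", "d1", "d2", "d2"),
                     upos   = c("DET", "NOUN", "VERB", "PRON", "VERB"),
                     stringsAsFactors = FALSE)
newdata <- tokens[, "upos", drop = FALSE]
emb <- matrix(0, nrow = nrow(tokens), ncol = 2)  # 2 placeholder numeric features per token

check_predict_inputs(newdata, group = tokens$doc_id, embeddings = emb)  # passes silently
```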

Value

If type is 'marginal': a data.frame with columns label and marginal, containing the Viterbi-decoded predicted label and its marginal probability.
If type is 'sequence': a data.frame with columns group and probability, containing the probability of each sequence group.
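
The two return types combine with the input data in different ways. The 'marginal' result has one row per row of newdata, so it can be column-bound in the original order. The 'sequence' result has one row per group, so it is joined on the group identifier. The code below uses made-up data frames with the documented column names in place of real predict() output. The tokens, labels and probabilities are purely illustrative, as is the 0.5 cut-off for flagging uncertain labels.

```r
## Made-up stand-ins for predict() output (illustrative values only)
tokens <- data.frame(doc_id = c("d1", "d1", "d2"),
                     token  = c("Amsterdam", "centrum", "rustig"),
                     stringsAsFactors = FALSE)
marginal_scores <- data.frame(label = c("B-LOC", "I-LOC", "O"),
                              marginal = c(0.91, 0.48, 0.97),
                              stringsAsFactors = FALSE)
sequence_scores <- data.frame(group = c("d1", "d2"),
                              probability = c(0.44, 0.97),
                              stringsAsFactors = FALSE)

## type = "marginal": one row per row of newdata, so bind in the same order
tagged <- cbind(tokens, marginal_scores)
## Flag tokens whose predicted label has a low marginal probability (hypothetical cut-off)
tagged$uncertain <- tagged$marginal < 0.5

## type = "sequence": one row per group, so join on the group identifier
tagged <- merge(tagged, sequence_scores, by.x = "doc_id", by.y = "group")
tagged
```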

See Also

crf

Examples



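library(crfsuite)
library(udpipe)
## Example data: Airbnb texts with labelled chunks
data(airbnb_chunks, package = "crfsuite")
## Download and load a Dutch udpipe model to tokenise, lemmatise and POS-tag the texts
udmodel <- udpipe_download_model("dutch-lassysmall")
udmodel <- udpipe_load_model(udmodel$file_model)
airbnb_tokens <- unique(airbnb_chunks[, c("doc_id", "text")])
airbnb_tokens <- udpipe_annotate(udmodel, 
                                 x = airbnb_tokens$text, 
                                 doc_id = airbnb_tokens$doc_id)
airbnb_tokens <- as.data.frame(airbnb_tokens)
## Join the udpipe annotations to the labelled chunks
x <- merge(airbnb_chunks, airbnb_tokens)
## Add upos and lemma attributes of the surrounding tokens within each doc_id
x <- crf_cbind_attributes(x, terms = c("upos", "lemma"), by = "doc_id")
## Fit a small CRF (only 5 L-BFGS iterations to keep the example fast)
model <- crf(y = x$chunk_entity, 
             x = x[, grep("upos|lemma", colnames(x))], 
             group = x$doc_id, 
             method = "lbfgs", options = list(max_iterations = 5)) 
## One row per token: predicted label and its marginal probability
scores <- predict(model, 
                  newdata = x[, grep("upos|lemma", colnames(x))], 
                  group = x$doc_id, type = "marginal")
head(scores)
## One row per doc_id: probability of the sequence
scores <- predict(model, 
                  newdata = x[, grep("upos|lemma", colnames(x))], 
                  group = x$doc_id, type = "sequence")
head(scores)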


## cleanup for CRAN
file.remove(model$file_model)
file.remove("modeldetails.txt")
file.remove(udmodel$file)



crfsuite documentation built on Sept. 17, 2023, 1:06 a.m.