predictifyR: Word Prediction Language Model Evaluation

## ---- chunk_corpus
#==============================================================================#
#                           chunkDocument                                      #
#==============================================================================#
#'  chunkDocument
#'
#' Splits a document, supplied as an unlisted vector of sentence or word
#' tokens, into consecutive chunks of chunkSize tokens each. If the document
#' length is not a multiple of chunkSize, the final chunk holds the remaining
#' tokens and is therefore shorter than the others.
#'
#' @param document - the document to be chunked, as a vector of tokens
#' @param chunkSize - the number of tokens (sentences or words) per chunk;
#'   must not exceed the length of the document
#' @return chunks - list of chunks, each of chunkSize tokens except possibly
#'   the last
#' @author John James
#' @export
chunkDocument <- function(document, chunkSize) {

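  # Validate inputs: chunkSize must be a single number between 1 and the
  # document length (a chunkSize of 0 would otherwise yield Inf chunks)
  docLength <- length(document)
  stopifnot(is.numeric(chunkSize), length(chunkSize) == 1,
            chunkSize >= 1, chunkSize <= docLength)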

  # Set num chunks
  numChunks <- floor(length(document) / chunkSize)

  # Break the document into full-size chunks
  chunks <- list()

  for (i in 1:numChunks) {
    start <- chunkSize * (i - 1) + 1
    end <- start + chunkSize - 1
    chunks[[i]] <- document[start:end]
  }

  # Last chunk: any tokens left over after the full-size chunks
  if (end < length(document)) {
    start <- end + 1
    chunks[[numChunks + 1]] <- document[start:length(document)]
  }
  return(chunks)
}
## ---- end
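
The chunking behaviour described above, including the shorter final chunk, can be illustrated with a small self-contained sketch. The helper `chunkSketch` and the `tokens` vector are hypothetical and introduced only for this example. The helper uses base R's `split()` instead of the loop in `chunkDocument`, so it is a stand-in that should produce the same chunks, not the package's implementation.

```r
# Hypothetical stand-in for chunkDocument(), written with split() so the
# example runs on its own; it is not the package's implementation.
chunkSketch <- function(document, chunkSize) {
  stopifnot(chunkSize >= 1, chunkSize <= length(document))
  # Assign each token a chunk index: 1, 1, 1, 2, 2, 2, 3, ...
  groups <- ceiling(seq_along(document) / chunkSize)
  # unname() drops the "1", "2", ... names that split() adds
  unname(split(document, groups))
}

# Seven word tokens split into chunks of three
tokens <- c("the", "quick", "brown", "fox", "jumps", "over", "dogs")
chunks <- chunkSketch(tokens, chunkSize = 3)

lengths(chunks)   # 3 3 1
chunks[[3]]       # "dogs"
```

With seven tokens and a chunk size of three, the first two chunks are full and the third holds the single leftover token, which is the case handled by the "last chunk" branch of `chunkDocument`.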

j2scode/predictifyR documentation built on May 14, 2019, 10:34 a.m.

R/U07.chunkDocument.R