textprocessingDSI: Clean an arbitrarily large corpus for topic modelling over many cores

# Generated by using Rcpp::compileAttributes() -> do not edit by hand
# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393

#' Rcpp Doccount
#'
#' Given a file, returns the number of words in each document in the file.
#' Assumes documents are delimited by newlines. If the file contains only
#' one document, set the fileflag argument to 1.
#' @param ipath A string specifying the path to the input file.
#' @param fileflag An int: 1 if the file holds a single document, 0 if each document is on its own line.
#' @return An int vector of the results.
#' @export
rcpp_doccount <- function(ipath, fileflag) {
    .Call(`_textprocessingDSI_rcpp_doccount`, ipath, fileflag)
}
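
The behaviour documented for `rcpp_doccount` can be illustrated in base R. This is a minimal sketch, not the package's C++ implementation: the name `doccount_sketch` is hypothetical, and it assumes a "word" is any maximal run of non-whitespace characters, which may differ from the tokenisation the package uses.

```r
# Hypothetical base-R reference for the documented behaviour.
# Assumption: words are separated by whitespace.
doccount_sketch <- function(ipath, fileflag = 0L) {
  lines <- readLines(ipath, warn = FALSE)
  if (fileflag == 1L) {
    # The whole file is one document: collapse all lines into one string.
    lines <- paste(lines, collapse = " ")
  }
  # Count the non-empty tokens in each document.
  vapply(strsplit(trimws(lines), "[[:space:]]+"),
         function(tokens) sum(nzchar(tokens)), integer(1))
}

tmp <- tempfile(fileext = ".txt")
writeLines(c("the cat sat", "on the mat today"), tmp)
doccount_sketch(tmp, 0L)  # one count per line: 3 4
doccount_sketch(tmp, 1L)  # one count for the whole file: 7
```

Unlike this sketch, which reads the whole file into memory, the package targets corpora too large for that.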

#' DOS to Unix line endings
#'
#' Removes the Windows/DOS carriage return (\\r) from the input file and
#' writes the result to the output file.
#' @param ifilename A string specifying the path to the input file.
#' @param ofilename A string specifying the path to the output file.
#' @export
dos2unix <- function(ifilename, ofilename) {
    invisible(.Call(`_textprocessingDSI_dos2unix`, ifilename, ofilename))
}
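
As an illustration of what `dos2unix` is documented to do, the following base-R sketch strips every carriage-return byte from a file. The name `dos2unix_sketch` is hypothetical, and reading the file into memory in one go is an assumption made for brevity, not a description of the package's C++ code.

```r
# Hypothetical base-R sketch: copy a file while dropping all \r bytes.
dos2unix_sketch <- function(ifilename, ofilename) {
  bytes <- readBin(ifilename, what = "raw", n = file.size(ifilename))
  # 0x0d is the carriage-return byte used in DOS/Windows line endings.
  writeBin(bytes[bytes != as.raw(0x0d)], ofilename)
  invisible(NULL)
}

dos_file <- tempfile(fileext = ".txt")
unix_file <- tempfile(fileext = ".txt")
writeBin(charToRaw("a\r\nb\r\n"), dos_file)
dos2unix_sketch(dos_file, unix_file)
rawToChar(readBin(unix_file, what = "raw", n = file.size(unix_file)))
```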

#' Rcpp Filter
#'
#' Given a vector of words and an input file, searches through the file
#' and removes all instances of those words. Overwrites the input file
#' with the modified content.
#' @param words A string vector containing the list of words to be removed
#'   from the file.
#' @param ifilename A string containing the path to the input file.
#' @return The name of the file that was modified.
#' @export
rcpp_filter <- function(words, ifilename) {
    .Call(`_textprocessingDSI_rcpp_filter`, words, ifilename)
}
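
The filtering step described for `rcpp_filter` can be sketched in base R as follows. The name `filter_sketch` is hypothetical, and the sketch assumes whole-word, case-sensitive matching on whitespace-separated tokens; the package may match words differently.

```r
# Hypothetical base-R sketch: drop listed words and overwrite the file.
# Assumption: exact, case-sensitive matches on whitespace-separated tokens.
filter_sketch <- function(words, ifilename) {
  lines <- readLines(ifilename, warn = FALSE)
  filtered <- vapply(strsplit(lines, "[[:space:]]+"), function(tokens) {
    keep <- nzchar(tokens) & !(tokens %in% words)
    paste(tokens[keep], collapse = " ")
  }, character(1))
  writeLines(filtered, ifilename)  # overwrites the input file in place
  ifilename
}

corpus <- tempfile(fileext = ".txt")
writeLines(c("the cat and the dog", "a bird"), corpus)
filter_sketch(c("the", "a"), corpus)
readLines(corpus)  # "cat and dog" "bird"
```

Because the input file is overwritten, work on a copy if the original text is still needed.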

#' Rcpp Join
#'
#' Given an input directory, merges all the files into one large file.
#' Expects each file to contain multiple documents, each delimited by a newline.
#' If that is not the case, set the newline argument to 1 to ensure each
#' document is delimited by newlines.
#' @param idir A string specifying the path to the input directory.
#' @param ofilename A string specifying the path to the output file.
#' @param newline An int: if set to 0, files are concatenated as they are;
#'   if set to 1, the files have their newlines replaced by spaces and,
#'   when they are merged together, a newline is added between them.
#' @return The number of files that were joined.
#' @export
rcpp_join <- function(idir, ofilename, newline) {
    .Call(`_textprocessingDSI_rcpp_join`, idir, ofilename, newline)
}
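
The two modes of the `newline` argument can be made concrete with a base-R sketch. The name `join_sketch` is hypothetical. The sketch assumes that `idir` contains only regular text files, that the output file lives outside `idir`, and that files are merged in sorted path order; none of these is confirmed for the package's implementation.

```r
# Hypothetical base-R sketch of merging a directory of files into one file.
# Assumptions: idir holds only regular files; files are joined in sorted order.
join_sketch <- function(idir, ofilename, newline = 0L) {
  files <- sort(list.files(idir, full.names = TRUE))
  out <- file(ofilename, open = "w")
  on.exit(close(out))
  for (f in files) {
    lines <- readLines(f, warn = FALSE)
    if (newline == 1L) {
      # One document per file: flatten it to a single line.
      lines <- paste(lines, collapse = " ")
    }
    writeLines(lines, out)
  }
  length(files)
}

in_dir <- tempfile("join_in_")
dir.create(in_dir)
writeLines(c("x", "y"), file.path(in_dir, "a.txt"))
writeLines("z", file.path(in_dir, "b.txt"))
merged <- tempfile(fileext = ".txt")
n_joined <- join_sketch(in_dir, merged, newline = 1L)
readLines(merged)  # "x y" "z"
```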

#' Rcpp Split
#'
#' Reads in a file and splits it into smaller segments, either by byte
#' size or by line count, depending on user input.
#' When finished, it deletes the original file.
#' @param fpath A string specifying the path to the input file.
#' @param odir A string specifying the path to the output directory.
#' @param splitter Either 'l' or 'c': 'l' for lines, 'c' for kilobytes.
#' @param count Number of lines or kilobytes per output file, depending on splitter.
#' @return The number of output files.
#' @export
rcpp_split <- function(fpath, odir, splitter, count) {
    .Call(`_textprocessingDSI_rcpp_split`, fpath, odir, splitter, count)
}
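
The line-count mode (`splitter = 'l'`) can be sketched in base R as below. The name `split_lines_sketch` and the `part_0001.txt` naming scheme are hypothetical; the package's output file names are not documented here. Unlike `rcpp_split`, this sketch leaves the original file in place and reads it fully into memory.

```r
# Hypothetical base-R sketch of splitting a file into chunks of `count` lines.
# Assumption: output files are named part_0001.txt, part_0002.txt, ...
# Note: unlike rcpp_split, this does NOT delete the original file.
split_lines_sketch <- function(fpath, odir, count) {
  lines <- readLines(fpath, warn = FALSE)
  dir.create(odir, showWarnings = FALSE, recursive = TRUE)
  if (length(lines) == 0L) return(0L)
  # Assign each line to a chunk: lines 1..count -> 1, the next count -> 2, ...
  chunk_id <- ceiling(seq_along(lines) / count)
  chunks <- split(lines, chunk_id)
  for (i in seq_along(chunks)) {
    writeLines(chunks[[i]], file.path(odir, sprintf("part_%04d.txt", i)))
  }
  length(chunks)
}

big_file <- tempfile(fileext = ".txt")
writeLines(paste("document", 1:5), big_file)
chunk_dir <- tempfile("chunks_")
n_chunks <- split_lines_sketch(big_file, chunk_dir, count = 2)
list.files(chunk_dir)
```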

#' Rcpp Summary
#'
#' Given a file, returns each unique word, the number of times that word appeared,
#' and the number of documents that word appeared in. Assumes documents are delimited
#' by newlines. If the file contains only one document, set the fileflag argument to 1.
#' @param ipath A string specifying the path to the input file.
#' @param fileflag An int: 1 if the file holds a single document, 0 if each document is on its own line.
#' @return A string vector of the results.
#' @export
rcpp_summary <- function(ipath, fileflag) {
    .Call(`_textprocessingDSI_rcpp_summary`, ipath, fileflag)
}
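
The two statistics reported by `rcpp_summary` (total occurrences and document frequency) can be computed in base R as a sanity check. The name `summary_sketch` is hypothetical, whitespace tokenisation is an assumption, and the sketch returns a data frame for readability, whereas the package is documented to return a string vector whose exact layout is not specified here.

```r
# Hypothetical base-R sketch: per-word term count and document count.
# Assumption: words are whitespace-separated; output format differs from
# the package, which returns a string vector.
summary_sketch <- function(ipath, fileflag = 0L) {
  lines <- readLines(ipath, warn = FALSE)
  if (fileflag == 1L) lines <- paste(lines, collapse = " ")
  docs <- lapply(strsplit(trimws(lines), "[[:space:]]+"),
                 function(tokens) tokens[nzchar(tokens)])
  term_freq <- table(unlist(docs))                # total occurrences
  doc_freq <- table(unlist(lapply(docs, unique))) # documents containing word
  words <- names(term_freq)
  data.frame(word = words,
             count = as.integer(term_freq[words]),
             docs = as.integer(doc_freq[words]),
             stringsAsFactors = FALSE)
}

summary_file <- tempfile(fileext = ".txt")
writeLines(c("a b a", "b c"), summary_file)
word_stats <- summary_sketch(summary_file, 0L)
word_stats
```

In this sample, `a` occurs twice but in only one document, while `b` occurs twice across two documents, which is the distinction the two counts are meant to capture.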

avkoehl/textprocessingDSI documentation built on June 5, 2019, 7:41 p.m.
