R/vidente.R

#' vidente: A package for parsing and preprocessing SEER data.
#'
#' The vidente package provides two categories of functions: functions that
#' parse SEER data and functions that preprocess it.
#' 
#' @section Parsing data:
#'  The \link{buildSEERParser} function builds parsing instructions from the
#'  .sas file in the downloaded folder or from the dictionary (.dic) file
#'  exported from the SEER*Stat software.
#'
#'  The \link{readSEER} function reads SEER data from ASCII text files
#'  downloaded from the SEER website or exported from the SEER*Stat software,
#'  following the instructions in the dictionary (.dic) or .sas file.
#'  
#'  The \link{listPrimarySites} function provides a list of keywords
#'  recognized as primary site names in the terminology adopted by SEER, so
#'  that you know which primary sites you can pass as the primary_site
#'  parameter of the readSEER function.
#'  
#' @section Preprocessing data:
#'  The \link{plotHistNA} function plots a histogram of the proportion of NA
#'  values for every feature in the dataframe.
#'  
#'  The \link{removeFullNAFeatures} function removes features whose values are
#'  all NA, or all NA together with an additional missing-value marker such as
#'  "Blank(s)", which appears in some datasets exported from the SEER*Stat
#'  software.
#'  
#'  The \link{findSingleValueFeatures} function finds features that have a
#'  single unique value across all rows of the dataframe. This helps you find
#'  features that can be removed because they only add overhead to the analysis.
#'
#'  The \link{getNormalizedEntropy} function calculates the normalized entropy
#'  by dividing the entropy by the information length (the number of unique
#'  possible values per feature). This ratio is also called metric entropy and
#'  is a measure of the randomness of the information.
#'
#' @docType package
#' @name vidente
NULL
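
The normalized entropy described above can be illustrated with a minimal base R sketch. This is not the package's implementation of getNormalizedEntropy: the helper name `normalized_entropy_sketch`, the choice of base-2 logarithms, the decision to ignore NA values, and the toy data frame are all assumptions made for illustration.

```r
# Hypothetical helper, NOT part of vidente: Shannon entropy (in bits) of one
# feature divided by its number of distinct non-NA values.
normalized_entropy_sketch <- function(x) {
  x <- x[!is.na(x)]                      # assumption: NA values are ignored
  if (length(x) == 0) return(NA_real_)   # undefined for an all-NA feature
  p <- as.numeric(table(x)) / length(x)  # relative frequency of each value
  p <- p[p > 0]                          # drop unused factor levels
  entropy <- -sum(p * log2(p))           # Shannon entropy, base 2 (assumption)
  entropy / length(p)                    # divide by number of distinct values
}

# Toy data frame, made up for illustration (not real SEER data)
toy <- data.frame(
  sex  = c("M", "F", "M", "F"),
  site = c("Lung", "Lung", "Lung", "Lung"),
  stringsAsFactors = FALSE
)

# One score per feature: sex is 0.5 (1 bit over 2 values), site is 0
scores <- vapply(toy, normalized_entropy_sketch, numeric(1))
```

Under these assumptions, a feature with a single value always scores 0, so it would also be flagged by a single-value check.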
mribeirodantas/vidente documentation built on May 15, 2019, 4:47 p.m.