View source: R/genDataPreprocess.R
genDataPreprocess | R Documentation |
This function prepares the data for use in a Haplin analysis.
genDataPreprocess(
  data.in = stop("You have to give the object to preprocess!"),
  map.file,
  design = "triad",
  file.out = "data_preprocessed",
  dir.out = ".",
  ncpu = 1,
  overwrite = NULL
)
data.in |
Input data, as loaded by genDataRead or genDataLoad. |
map.file |
Filename (with path if the file is not in current directory) of the .map file holding the SNP names, if available. |
design |
The design used in the study (character string); default: "triad". |
file.out |
The core name of the files that will contain the preprocessed data (character string); these files can be loaded later with the genDataLoad function; default: "data_preprocessed". |
dir.out |
The directory that will contain the saved data; defaults to current working directory. |
ncpu |
The number of CPU cores to use; using more cores speeds up the process significantly for large datasets. The default is 1 core; the maximum is one less than the total number of cores available on the current machine (even if the user requests more than that). |
overwrite |
Whether to overwrite the output files: if NULL (default), the user is prompted for an answer; if TRUE, any existing files are overwritten automatically; if FALSE, the function stops if the output files exist. |
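The cap on ncpu described above can be illustrated with a minimal sketch. The helper cap_ncpu is hypothetical and is not part of Haplin; it only mirrors the documented rule (at most one less than the number of available cores). The floor of 1 core on a single-core machine is an assumption, since the documentation does not cover that case.

```r
# Hypothetical helper (not a Haplin function): mirrors the documented cap on ncpu.
cap_ncpu <- function(ncpu, total.cores = parallel::detectCores()) {
  # detectCores() may return NA when the core count cannot be determined
  if (is.na(total.cores)) total.cores <- 1L
  # Documented maximum: one less than the total number of cores.
  # Assumption: never go below 1 core.
  max.cores <- max(1L, as.integer(total.cores) - 1L)
  min(max(1L, as.integer(ncpu)), max.cores)
}

# Requesting 8 cores on a machine assumed to have 4 gives the capped value
capped <- cap_ncpu(8, total.cores = 4)
capped
```

Passing total.cores explicitly keeps the sketch reproducible; in practice the default would query the machine.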
A list object with three elements:
cov.data - a data.frame with covariate data (if available in the input file);
gen.data - a list with chunks of the genetic data; the data is divided column-wise, using 10,000 columns per chunk; each element of this list is an ff matrix;
aux - a list with meta-data and important parameters:
variables - tabulated information of the covariate data;
variables.nas - the number of NA values in each column of the covariate data;
alleles - all the possible alleles in each marker;
alleles.nas - the number of NA values in each marker;
nrows.with.missing - the number of rows that contain any missing allele information;
which.rows.with.missing - vector of indices of rows with missing data (if any).
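To make the column-wise chunking of gen.data concrete, here is a minimal base-R sketch of how column indices can be grouped into chunks of 10,000. The function chunk_columns and the column count of 25,000 are hypothetical and chosen only for illustration; this is not Haplin's internal code, just one way to reproduce the documented layout.

```r
# Hypothetical helper (not a Haplin function): group column indices into
# consecutive chunks of at most chunk.size columns each.
chunk_columns <- function(n.cols, chunk.size = 10000L) {
  idx <- seq_len(n.cols)
  split(idx, ceiling(idx / chunk.size))
}

# With an assumed 25,000 genotype columns, three chunks result
chunks <- chunk_columns(25000L)
length(chunks)
lengths(chunks)
```

Under this scheme only the last chunk can be shorter than 10,000 columns.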
The .map file should contain at least two columns, with the SNP names in the second column. Any additional columns should be separated by whitespace; they are ignored. The file should contain a header.
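As an illustration of the layout described above, the following base-R sketch writes a small whitespace-separated file with a header and reads the SNP names back from the second column. The column names, SNP names and positions are made up for this example, and read.table is used here only to show the layout; it is an assumption, not necessarily how genDataPreprocess parses the file.

```r
# Write a toy .map file: header line, whitespace-separated columns,
# SNP names in the second column (all values are made up)
map.path <- tempfile(fileext = ".map")
writeLines(c(
  "chrom snp_name position",
  "1 rs0001 10583",
  "1 rs0002 10611",
  "1 rs0003 13302"
), map.path)

# Read it back and keep only the second column; other columns are ignored
map.tab <- read.table(map.path, header = TRUE, stringsAsFactors = FALSE)
snp.names <- map.tab[[2]]
snp.names
```

A file laid out this way could then be passed through the map.file argument.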
# The argument 'overwrite' is set to TRUE!
# First, read the data:
examples.dir <- system.file( "extdata", package = "Haplin" )
example.file <- file.path( examples.dir, "exmpl_data.ped" )
ped.data.read <- genDataRead( example.file, file.out = "exmpl_ped_data",
  dir.out = tempdir( check = TRUE ), format = "ped", overwrite = TRUE )
ped.data.read
# Take only part of the data (if needed)
ped.data.part <- genDataGetPart( ped.data.read, design = "triad", markers = 10:12,
  dir.out = tempdir( check = TRUE ), file.out = "exmpl_ped_data_part",
  overwrite = TRUE )
# Preprocess as "triad" data:
ped.data.preproc <- genDataPreprocess( ped.data.part, design = "triad",
  dir.out = tempdir( check = TRUE ), file.out = "exmpl_data_preproc",
  overwrite = TRUE )
ped.data.preproc