View source: R/genDataGetPart.R
genDataGetPart | R Documentation |
This function extracts (and saves for later use) part of the genetic data read in with genDataRead.
genDataGetPart(
  data.in = stop("No data given!", call. = FALSE),
  design = stop("Design type must be given!"),
  markers,
  indiv.ids,
  rows,
  cc,
  sex,
  file.out = "my_data_part",
  dir.out = ".",
  overwrite = NULL,
  ...
)
data.in |
The data object (in format as the output of genDataRead). |
design |
The design used in the study - choose from:
. Any of the following can be given to narrow down the dataset: |
markers |
Numeric vector with numbers indicating which markers to choose. |
indiv.ids |
Character vector giving IDs of individuals. CAUTION: in a standard PED file, individual IDs are not unique, so this will select all individuals with the given IDs. |
rows |
Numeric vector giving the positions - this will select only these rows. |
cc |
One or more values to choose based on case-control status ('cc' column). |
sex |
One or more values to choose based on the 'sex' column. |
file.out |
The base for the output filename (default: "my_data_part"). |
dir.out |
The path to the directory where the output files will be saved. |
overwrite |
Whether to overwrite the output files: if NULL (default), the user is prompted for an answer; if TRUE, any existing files are overwritten automatically; if FALSE, the function stops if the output files exist. |
... |
If any additional covariate data are available in |
The genetic data from GWAS studies can be quite large, and thus the analysis is time-consuming. If a user knows where they want to focus the analysis, they can use this function to extract part of the entire dataset and use only this part in subsequent Haplin analysis.
A list object with three elements:
cov.data - a data.frame with covariate data (if available in the input file)
gen.data - a list with chunks of the genetic data; the data is divided column-wise, using 10,000 columns per chunk; each element of this list is an ff matrix
aux - a list with meta-data and important parameters.
This now contains only the selected subset of data.
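The structure described above can be inspected with a few base R calls. The following is a minimal sketch, not part of Haplin: the helper name summarise_gen_data is hypothetical, and the mock object uses plain matrices in place of ff matrices, on the assumption that nrow() and ncol() behave the same way on both.

```r
# Hypothetical helper (not part of Haplin): summarise the documented
# three-element list by querying dimensions only.
summarise_gen_data <- function(data) {
  stopifnot(
    is.list(data),
    all(c("cov.data", "gen.data", "aux") %in% names(data))
  )
  chunks <- data$gen.data
  list(
    n.chunks = length(chunks),
    n.rows = nrow(chunks[[1]]),
    # Chunks are split column-wise, so the total is the sum over chunks
    n.cols = sum(vapply(chunks, ncol, integer(1))),
    has.cov.data = !is.null(data$cov.data)
  )
}

# Mock object for illustration; plain matrices stand in for ff matrices
mock <- list(
  cov.data = data.frame(smoking = c(0, 1, 2), sex = c(1, 2, 1)),
  gen.data = list(
    matrix("A", nrow = 3, ncol = 4),
    matrix("C", nrow = 3, ncol = 2)
  ),
  aux = list()
)
mock.summary <- summarise_gen_data(mock)
str(mock.summary)
```

With a real object returned by genDataGetPart, the same helper could be called in place of the mock, provided the assumption about ff matrices holds.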
No checks are performed when choosing a subset of the data. It is the user's responsibility to check whether the data subset contains the correct number of individuals (especially important when using the triad design) and/or markers!
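Since the function performs no such checks, a small manual check may be worth running on the result. The sketch below is one possible approach, not part of Haplin: check_subset_rows is a hypothetical helper, and it assumes that every chunk in gen.data, as well as cov.data when present, has one row per record of the input file. What a row represents (an individual or a whole family) depends on the input format, so the expected count has to be supplied by the user.

```r
# Hypothetical helper (not part of Haplin): verify that a subset has the
# number of rows the user expects, and that its parts agree with each other.
check_subset_rows <- function(data, expected.rows) {
  chunk.rows <- vapply(data$gen.data, nrow, integer(1))
  if (length(unique(chunk.rows)) != 1) {
    stop("Chunks of gen.data differ in their number of rows.")
  }
  # Assumption: cov.data, when present, has the same rows as gen.data
  if (!is.null(data$cov.data) && nrow(data$cov.data) != chunk.rows[1]) {
    stop("cov.data and gen.data differ in their number of rows.")
  }
  if (chunk.rows[1] != expected.rows) {
    stop(sprintf(
      "Subset has %d rows, expected %d.",
      chunk.rows[1], as.integer(expected.rows)
    ))
  }
  invisible(TRUE)
}

# Mock subset for illustration; plain matrices stand in for ff matrices
mock.subset <- list(
  cov.data = data.frame(sex = c(1, 1, 1)),
  gen.data = list(matrix("A", nrow = 3, ncol = 6)),
  aux = list()
)

ok <- check_subset_rows(mock.subset, expected.rows = 3)
bad <- tryCatch(
  check_subset_rows(mock.subset, expected.rows = 5),
  error = function(e) conditionMessage(e)
)
```

The helper stops with an error on a mismatch instead of returning FALSE, so a wrong subset cannot silently flow into a later analysis.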
# The argument 'overwrite' is set to TRUE!
# Read the data:
examples.dir <- system.file( "extdata", package = "Haplin" )
example.file <- file.path( examples.dir, "HAPLIN.trialdata2.txt" )
my.gen.data.read <- genDataRead( file.in = example.file, file.out = "trial_data",
  dir.out = tempdir( check = TRUE ), format = "haplin", allele.sep = "", n.vars = 2,
  cov.header = c( "smoking", "sex" ), overwrite = TRUE )
my.gen.data.read

# Extract part with only men:
men.subset <- genDataGetPart( my.gen.data.read, design = "triad", sex = 1,
  dir.out = tempdir( check = TRUE ), file.out = "gen_data_men_only", overwrite = TRUE )
men.subset

# Extract the part with only smoking women:
women.smoke.subset <- genDataGetPart( my.gen.data.read, design = "triad",
  dir.out = tempdir( check = TRUE ), sex = 0, smoking = c( 1,2 ), overwrite = TRUE )
women.smoke.subset