View source: R/genDataGetPart.R
genDataGetPart | R Documentation |
This function extracts (and saves for later use) part of the genetic data read in with genDataRead.
genDataGetPart(
data.in = stop("No data given!", call. = FALSE),
design = stop("Design type must be given!"),
markers,
indiv.ids,
rows,
cc,
sex,
file.out = "my_data_part",
dir.out = ".",
overwrite = NULL,
...
)
data.in |
The data object (in format as the output of genDataRead). |
design |
The design used in the study (the examples on this page use "triad").
Any of the following can be given to narrow down the dataset: |
markers |
Vector with numbers or names indicating which markers to choose. |
indiv.ids |
Character vector giving IDs of individuals. CAUTION: in a standard PED file, individual IDs are not unique, so this will select all individuals with the given IDs. |
rows |
Numeric vector giving the positions - this will select only these rows. |
cc |
One or more values to choose based on case-control status ('cc' column). |
sex |
One or more values to choose based on the 'sex' column. |
file.out |
The base for the output filename (default: "my_data_part"). |
dir.out |
The path to the directory where the output files will be saved. |
overwrite |
Whether to overwrite the output files: if NULL (default), will prompt the user to give answer; set to TRUE, will automatically overwrite any existing files; and set to FALSE, will stop if the output files exist. |
... |
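If any additional covariate data are available in the input data, they can be given as named arguments to select on their values (as with smoking in the examples on this page).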
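To illustrate the caution about non-unique individual IDs, here is a minimal base-R sketch on a toy table. The data frame ped and its column names are hypothetical and do not reflect Haplin's internal format. The sketch also assumes that several selection criteria are combined with a logical AND, which is what the "smoking women" example on this page suggests but which this page does not state explicitly.

```r
# Toy PED-like table (hypothetical data, not Haplin's internal format).
# Individual IDs repeat across families, as they can in a standard PED file.
ped <- data.frame(
  fam.id   = c("F1", "F1", "F1", "F2", "F2", "F2"),
  indiv.id = c("1", "2", "3", "1", "2", "3"),
  sex      = c(1, 0, 1, 1, 0, 0),
  cc       = c(0, 0, 1, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Selecting by individual ID alone matches one row in every family
# that contains this ID.
by.id <- ped[ped$indiv.id %in% "3", ]
nrow(by.id)

# Adding a second criterion narrows the selection (assumed logical AND).
by.id.and.sex <- ped[ped$indiv.id %in% "3" & ped$sex %in% 0, ]
nrow(by.id.and.sex)
```

If a single individual is wanted, selecting by row position (the rows argument) avoids the ambiguity.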
The genetic data from GWAS studies can be quite large, and thus the analysis is time-consuming. If a user knows where they want to focus the analysis, they can use this function to extract part of the entire dataset and use only this part in subsequent Haplin analysis.
A list object with three elements:
cov.data - a data.frame with covariate data (if available in the input file)
gen.data - a list with chunks of the genetic data; the data is divided column-wise, using 10,000 columns per chunk; each element of this list is an ff matrix
aux - a list with meta-data and important parameters.
This now contains only the selected subset of data.
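The column-wise chunking of gen.data can be pictured with a short base-R sketch. This is only an illustration of splitting column indices into groups of at most 10,000; it is not Haplin's implementation, and the column count n.cols is a made-up value.

```r
# Split column indices into chunks of at most 10,000 columns each.
chunk.size <- 10000
n.cols <- 25000  # hypothetical number of genotype columns

col.idx <- seq_len(n.cols)
chunks <- split(col.idx, ceiling(col.idx / chunk.size))

length(chunks)   # number of chunks
lengths(chunks)  # number of columns in each chunk
```

With these assumed numbers, the last chunk is shorter than the others because 25,000 is not a multiple of 10,000.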
No checks are performed when choosing a subset of the data - it is the user's responsibility to check that the subset contains the correct number of individuals (especially important when using the triad design) and/or markers!
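One way to do such a check by hand is sketched below in base R. It assumes a PED-like layout with one row per individual and a family ID column; the object subset.ped and its column names are hypothetical and are not produced by genDataGetPart. Whether an incomplete family is actually a problem depends on the analysis.

```r
# Hypothetical PED-like subset: one row per individual, with a family ID.
subset.ped <- data.frame(
  fam.id   = c("F1", "F1", "F1", "F2", "F2"),
  indiv.id = c("1", "2", "3", "1", "3"),
  stringsAsFactors = FALSE
)

# Count the selected individuals in each family.
members.per.family <- table(subset.ped$fam.id)

# Families with fewer than three selected members are not complete triads.
incomplete <- names(members.per.family)[as.vector(members.per.family) < 3]
incomplete
```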
# The argument 'overwrite' is set to TRUE!
# Read the data:
examples.dir <- system.file( "extdata", package = "Haplin" )
example.file <- file.path( examples.dir, "HAPLIN.trialdata2.txt" )
my.gen.data.read <- genDataRead( file.in = example.file, file.out = "trial_data",
dir.out = tempdir( check = TRUE ), format = "haplin", allele.sep = "", n.vars = 2,
cov.header = c( "smoking", "sex" ), overwrite = TRUE )
my.gen.data.read
# Extract part with only men:
men.subset <- genDataGetPart( my.gen.data.read, design = "triad", sex = 1,
dir.out = tempdir( check = TRUE ), file.out = "gen_data_men_only", overwrite = TRUE )
men.subset
# Extract the part with only smoking women:
women.smoke.subset <- genDataGetPart( my.gen.data.read, design = "triad",
dir.out = tempdir( check = TRUE ), sex = 0, smoking = c( 1,2 ), overwrite = TRUE )
women.smoke.subset