

haplinSlide {Haplin}    R Documentation

Run haplin analysis in a series of sliding windows over a sequence of markers/SNPs

Description

Produces a list with one element per "window" of markers. Each element is the result of fitting the log-linear haplin models to the data in that window: an object of class haplin, or its haptable summary when table.output = TRUE.

Usage

haplinSlide( data, markers = "ALL", winlength = 1, 
strata = NULL, table.output = TRUE, cpus = 1, para.env = NULL, slaveOutfile = "", 
printout = FALSE, verbose = FALSE, ...)

Arguments

data

An R object of class "haplin.ready", e.g. the output of genDataPreprocess or genDataLoad, containing the covariate and genetic data.

markers

Default is "ALL", which means haplinSlide uses all available markers in the data set. Alternatively, the relevant markers can be specified as a vector of numbers (e.g., markers = c(1, 3:10) uses the first 10 markers except marker 2) or of characters (e.g., markers = c("m1", "m3", "rs35971")). haplinSlide then runs haplin on a series of windows selected from the supplied markers, with the window length set by winlength. See Details.

winlength

The number of markers in each of the sliding, overlapping windows run along the selected markers. See Details.

strata

A single numeric value specifying which data column contains the stratification variable.

table.output

If TRUE, the haptable function is applied to each result after estimation, greatly reducing the size of the output. If FALSE, each element of the output list is a standard haplin object. To save memory, the default is TRUE.

cpus

haplinSlide can run its analyses in parallel. cpus should preferably be set to the number of available CPUs. Setting it lower leaves some capacity for other processes; setting it too high should not cause any serious problems.

para.env

The parallel environment to use: "parallel" (the default) or "Rmpi" (for use on clusters). This option is only used when cpus is larger than 1.

slaveOutfile

Character. To be used when cpus > 1. If slaveOutfile = "" (default), output from all running cores will be printed in the standard R session window. Alternatively, the output can be saved to a file by specifying the file path and name.

printout

Default is FALSE. If TRUE, provides a full summary of each haplin result during the run of haplinSlide.

verbose

Same as for haplin, but defaults to FALSE to reduce output size.

...

Remaining arguments to be used by haplin in each run.

Details

haplinSlide runs haplin on a series of overlapping windows of the chosen markers. Except for markers and winlength (and the haplinSlide-specific arguments controlling output and parallel processing), all arguments are used exactly as in haplin itself. For instance, if markers = c(1, 3, 4, 5, 7, 8) and winlength = 4, haplinSlide will run haplin first on markers c(1, 3, 4, 5), then on c(3, 4, 5, 7), and finally on c(4, 5, 7, 8). The results are returned in a list whose elements are named "1-3-4-5", "3-4-5-7" and "4-5-7-8". A single result can be extracted with, say, summary(res[["1-3-4-5"]]), where res is the saved result, or all results can be examined at once with lapply(res, summary) and lapply(res, plot).
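The window scheme described above can be sketched in base R. slide_windows is a hypothetical helper written only for illustration; it is not part of Haplin and does not call haplin. It assumes, as the example above describes, that each window is a consecutive run of winlength markers from the selected set, named by joining its marker labels with "-".

```r
# Hypothetical helper (not part of Haplin): list the marker windows
# visited for a given marker selection and window length.
slide_windows <- function(markers, winlength) {
  n_win <- length(markers) - winlength + 1
  if (n_win < 1) stop("winlength exceeds the number of selected markers")
  # Window i covers markers i, i + 1, ..., i + winlength - 1
  windows <- lapply(seq_len(n_win), function(i) markers[i:(i + winlength - 1)])
  # Name each window by its markers joined with "-", e.g. "1-3-4-5"
  names(windows) <- vapply(windows, paste, character(1), collapse = "-")
  windows
}

w <- slide_windows(c(1, 3, 4, 5, 7, 8), winlength = 4)
names(w)
# [1] "1-3-4-5" "3-4-5-7" "4-5-7-8"
```

With n selected markers there are n - winlength + 1 windows, so winlength = 1 gives one single-marker analysis per marker.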
When running haplinSlide on a large number of markers, the output can become prohibitively large. In that case table.output should be TRUE, and haplinSlide returns a list of summary "haptables", which can be stacked into a single data frame using toDataFrame. To avoid excessive memory use, the default is table.output = TRUE.
When multiple cores are available, set cpus to the number of cores to use, and haplinSlide will run in parallel on that many cores. Note that each core reports its progress separately, so feedback may not arrive in marker order, and some cores may start working on markers far out in the sequence.
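To see why some cores may begin far out in the sequence, consider a scheme in which the windows are split into contiguous blocks, one per core. The sketch below uses splitIndices from R's parallel package, which is how parLapply divides its input; whether haplinSlide divides its windows this way is an assumption made only for illustration, and the count of 10 windows is arbitrary.

```r
library(parallel)

# Assumption for illustration only: 10 windows are divided between 2 cores
# in contiguous blocks, the way parLapply() splits its input.
n_windows <- 10
chunks <- splitIndices(n_windows, 2)

# Index of the first window each core works on
first_window <- sapply(chunks, function(idx) idx[1])
first_window
# [1] 1 6
```

Under this scheme the second core's first messages concern window 6, halfway along the sequence, while the first core is still reporting on window 1.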

Value

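A list with one element per window. If table.output = FALSE, each element is an object of class haplin; with the default table.output = TRUE, each element is the haptable summary of the corresponding haplin result.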

Note

Further information is available on the Haplin web site listed under References.

Author(s)

Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
hakon.gjessing@uib.no

References

Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.

Web Site: https://haplin.bitbucket.io

See Also

haplin, summary.haplin, plot.haplin, haptable, toDataFrame

Examples


## Not run: 
# (Almost) all standard haplin runs can be done with haplinSlide.
# Below is an illustration. See the haplin help page for more
# examples.

# 1. Read the data:
my.haplin.data <- genDataRead( file.in = "HAPLIN.trialdata.txt", file.out =
  "trial_data1", dir.out = tempdir( check = TRUE ), format = "haplin", n.vars = 0 )

# 2. Run pre-processing:
haplin.data.prep <- genDataPreprocess( data.in = my.haplin.data,
  format = "haplin", design = "triad", file.out = "trial_data1_prep",
  dir.out = tempdir( check = TRUE ) )

# 3. Analyze:
# Analyzing the effect of fetal genes, including triads with missing data,
# using a multiplicative response model. When winlength = 1, separate
# markers are used. To make longer windows, winlength can be increased
# correspondingly:
result.1 <- haplinSlide( haplin.data.prep, use.missing = TRUE, response = "mult",
  reference = "ref.cat", winlength = 1, table.output = FALSE )
# Provide summary of separate results:
lapply(result.1, summary)
# Plot results:
par(ask = TRUE)
lapply(result.1, plot)
## End(Not run)


Haplin documentation built on Sept. 11, 2024, 7:13 p.m.