library("knitr")
r opts_chunk$set(fig.width=5, fig.height=5, tidy=FALSE)
If you do not have R, navigate to
http://www.r-project.org to download
and install it. New users should download and read the reference
manual.
To install the analysis package, within R, execute:
install.packages("DSPRqtl", repos = "http://wfitch.bio.uci.edu/R/", type = "source")
Note: you must use the statement repos =
"http://wfitch.bio.uci.edu/R/".
DSPRqtlDataA and/or
DSPRqtlDataB). If only one population was used, only one data
package (A or B) needs to be installed.DSPRqtl package will fetch each position file from the
internet. This method takes considerably longer to run.Within R (not recommended in RStudio), type:
install.packages("DSPRqtlDataA", repos = "http://wfitch.bio.uci.edu/R/", type = "source")
and/or
install.packages("DSPRqtlDataB", repos = "http://wfitch.bio.uci.edu/R/", type = "source")
for the pA and pB data packages, respectively.
Note: you must use the statement repos = "http://wfitch.bio.uci.edu/R/".
If installation within R fails, the packages can be downloaded manually and installed from within R from local source. Download the package source files from:
http://wfitch.bio.uci.edu/R/src/contrib/DSPRqtlDataA_2.0-1.tar.gzhttp://wfitch.bio.uci.edu/R/src/contrib/DSPRqtlDataB_2.0-1.tar.gzusing a web browser or command line tool (e.g.,
wget). After the files have
downloaded, execute the following commands in R (one, the other, or
both as necessary):
install.packages("DSPRqtlDataA_2.0-1.tar.gz", repos = NULL, type = "source") install.packages("DSPRqtlDataB_2.0-1.tar.gz", repos = NULL, type = "source")
Note: You may have to modify the path to the package file.
Once the data packages are installed, they can be loaded using:
library(DSPRqtlDataA)
and/or
library(DSPRqtlDataB)
Then, the set of additive HMM probabilities and raw HMM probabilities for any given position can be obtained with:
data(A_chromosome.arm_position.in.base.pairs)
e.g.,
data(A_X_12000000) # This gives a data.frame named A_X_12000000
A list of regularly spaced positions every 10 kb is available at
http://FlyRILs.org/Data or within the
DSPRqtl package (see above for install instructions).
library(DSPRqtl) data(positionlist_wgenetic) # This gives a data.frame named poslist
Before you begin, please also obtain the DSPRqtl manual (see
http://FlyRILs.org/Tools/Tutorial) and refer to it throughout this tutorial. Also load the DSPRqtl
package:
library(DSPRqtl)
"patRIL" (and in the case
of crossing designs, also one named "matRIL") containing the
numeric RIL IDs (e.g., 11001, 11002, ...) and a column with
phenotype measurements (with a name chosen by the user). Cross
designs must also contain a column specifying the sex of the
individuals measured so the X chromosome is handled correctly.DSPRqtl
package. Throughout this tutorial, the example dataset ADHdata
(included in the R package) will be used to illustrate each step.data(ADHdata) # a data.frame named ADHdata head(ADHdata)
To import your data into R, you must read the file in. R will accept
many types of files. For help reading files into R, see
http://cran.r-project.org/doc/manuals/R-data.html. The code to read in a
tab-delimited text file is shown below as an example.
phenotype.dat <- read.table("/path/to/your/file.txt", header = TRUE)
One can perform a genome scan using the DSPRscan() function for
experimental designs where the inbred RILs are measured directly,
crosses to a standard, or the pA-pB cross design. For users requiring
more flexibility, see the Advanced Analysis section
below. Handling interactive covariates and analyzing multiple
phenotypes at once is in development. The DSPRscan() function
performs a genome scan that regresses the 8 founder genotype
probabilities (or 16 in the case of the pA-pB cross) on the measured
phenotype at evenly spaced 10 kb positions across the genome:
DSPRscan(model, design, phenotype.dat, id.col, batch, sex)
model is the null model in formula notation. For a phenotype with
no covariates, it is of the form phenotype ~ 1. With a single
additive covariate, it is: phenotype ~ covariate.design is either "inbredA" or "inbredB" for inbred designs pA
and pB RILs and crosses to a standard. For the pA-pB cross, use
"ABcross".phenotype.dat is the data.frame read in the section Phenotype
Data containing the RIL IDs, phenotype, and any
covariates.id.col identifies the column to use as unique ids for the
measurements. For an inbred design, this column could be the patRIL
column. For a cross design, a unique id column should be included.batch is the number of positions tested at one time. Defaults to
1,000. If memory use is a problem, reduce this number.sex is the sex of the individuals measured and must be specified
for the ABcross. It is either "m" or "f". For inbred designs,
it can be left out.For a single phenotype with the data package installed, a genome scan should take ~15--20 min.
Using the ADH example data:
data(ADHdata) scan.results <- DSPRscan(adh ~ 1, design = "inbredA", phenotype.dat = ADHdata, id.col='patRIL')
The output of DSPRscan() is a list containing:
LODscores This is a data.frame with positions and LOD scores.model This is the model statement.design This is the design specified.phenotype This is the phenotype data.sex This is the sex of the individuals measured.There is example output from a finished genome scan of the ADH data in this package as well. Type:
data(ADHscan)
To extract the LOD scores data.frame:
ADH.lod.scores <- ADHscan$LODscores ADH.lod.scores[100:105, ]
Perform a permutation test using the function DSPRperm() to obtain
a significance threshold. For initial data exploration, the value 6.8
can be used for inbred designs and 10.1 for the ABcross. This
threshold seems to be fairly stable for multiple phenotypes that we
have tested, but we recommend each user perform a permutation test
for their specific data set.
DSPRperm(model, design, phenotype.dat, id.col, batch, niter, alpha, sex)
For model, design, phenotype.dat, id.col, batch, sex see above
description for DSPRscan(). The two additional arguments to
DSPRperm() are:
niter The number of permutations to perform. Default is 1,000.alpha The desired Type I error rate. The output of DSPRperm() is a list containing:
maxLODs A vector of maximum LOD scores for each permutation.alpha The specified alpha levelthreshold The significance threshold.After a permutation test is performed, the maxLODs can also be used
to determine the significance threshold at another alpha level. For
example,
perm.test <- DSPRperm(adh ~ 1, design = "inbredA", phenotype.dat = ADHdata)
For $\alpha = 0.01$:
quantile(perm.test$maxLODs, 1 - 0.01)
Get a summary of the significant peaks. Finding and summarizing
significant peaks can be done in a single step using the function
DSPRpeaks(). The individual functions to get the values are also
available (see the DSPRqtl manual). After peaks are identified, it
is important for the user to confirm these represent distinct peaks.
DSPRpeaks(qtldat, method, threshold, LODdrop, BCIprob)
qtldat Output from DSPRscan().threshold The threshold found with the permutation test. Defaults
to 6.8 or 10.1 for inbred and ABcross designs respectively.LODdrop The desired LODdrop for support intervals. Defaults to 2.BCIprob The desired nominal Bayes fraction for support intervals.
Defaults to 0.95.The output of DSPRpeaks() is a list of peaks. Each peak is a list
containing:
threshold The specified threshold.peak The peak position and LOD score.LODdrop The specified LODdrop.BCIprob The specified BCIprob.CI The confidence interval.founderNs The number of RILs with each founder genotype at the
peak.geno.means The founder means and standard errors at the peak.perct.var The percent variation explained by the QTL.entropy The proportion missing information.Using the ADHscan output:
load("peaks.rda")
peaks <- DSPRpeaks(ADHscan, threshold = 6.8, LODdrop = 2)
And the main QTL is found at position 26 in the list:
peaks[[26]]
The DSPRscan() results can be plotted using the DSPRplot()
function. Multiple scan results can be plotted on the same plot. Pass
the scan results as a list() to the DSPRscan() function.
DSPRplot(list(ADHscan), threshold=6.8)
The user may wish to perform local interval mapping to compare the
peak locations and confidence intervals. The LocalInt() function
will perform interval mapping for a range of positions given by the
user. FindCI() can then be used to re-estimate confidence
intervals. This function is only available for the "inbredA" or
"inbredB" designs.
LocalInt(peakChr, peakPos, range, phenotype.dat, pheno.name, design)
peakChr Chromosome arm at the peak.peakPos Position in base pairs at the peak.range The range of positions to examine. Default is 100 (on
either side, positions are 10 kb apart)phenotype.dat The phenotype data.frame. See the section
Phenotype Data.pheno.name The name of the column containing the phenotype.design "inbredA" or "inbredB"The output of LocalInt is the same as the LODscores from DSPRscan
but only for the specified set of positions.
Using the ADH sample data:
# The main QTL main.peak <- peaks[[26]] peakChr <- main.peak$peak$chr peakPos <- main.peak$peak$Ppos
peak.int <- LocalInt(peakChr, peakPos, phenotype.dat = ADHdata, pheno.name = "adh", design = "inbredA")
This package helps automate analysis for simple DSPR designs. Many
users (especially those with complex interactions or complex cross
designs) will require more flexibility. To aid those users, the
DSPRgenos() function is available to set up the genotype
information using a given phenotype data file. This function can
handle the following designs:
"inbredA" and "inbredB" Direct measurements on the RILs or
crosses to a standard"AAcross" and "BBcross" Crosses within the pA or pB RIL panels
(e.g., round robin design)"ABcross" Crosses between the pA and pB RIL panels. The function can handle multiple measurments, a mix of males and females, and a mix of cross directions in the case of the pA-pB cross (i.e. A males to B females and vice versa). The function outputs either a list or a 3 dimensional array and the user can then use the many built in analysis tools in R to fit a model across the genome.
DSPRgenos(design, phenotype.dat, id.col, output)
design "inbredA","inbredB","AAcross","BBcross", or
"ABcross".phenotype.dat The phenotype data.frame. See the section
Phenotype Data.id.col identifies the column to use as unique ids for the
measurements. For an inbred design, this column could be the patRIL
column. For a cross design, a unique id column should be included.output Format for the genotype output. Either "list" or
"array". Default is "list".The output of DSPRgenos() is a list containing:
genolist a list containing the matrix of additive genotype
probabilities at each position in the genome. The list is in the
same order as the list of positions described below. Column names
are the different DSPR haplotypes and row names are the unique ids
provided in id.col. Alternatively, if output='array', the output is
a 3 dimensional array [samples,haplotypes,positions]. Haplotypes
and samples are named while positions are in the order of the list
of positions described below.positions a data.frame containing regularly spaced positions
(every 10KB) in the genome where the genotype probabilities are
calculated. Columns are: chr = chromosome, Ppos = physical position
(zero offset), Gpos = genetic position, Gaxis = cummulative genetic
position.phenotype the phenotype data.frame ordered by the specified id
column and in the same order as the genotype information at each
position in the genome in the geno list described above.The user can easily use apply (in the case of an array), lapply (in the case of list), or a loop to do a genome scan of whatever type they wish. Future tutorials will provide sample code for specific senarios (e.g., the round robin design).
The DSPRgenos() function is somewhat time consuming to run (~30
min. - 1 hour). However genome scans using the output should be fast
(2-10 min for simple designs). The user will likely wish to save the
output especially if planning to run genome scans with multiple
different models. Because the output is a large object (~11,000
positions in the genome), it is much more efficient to use the save
function in R as shown below:
myGenos <- DSPRgenos(design, phenotype.dat, id.col, output) save(myGenos, file = "my file path")
To return to these results later, use load:
load(file="my file path") # The object will load into the workspace with the same name as # when save was used. In this example, this is myGenos.
If you have any questions or have trouble with this tutorial, please contact flyrils@gmail.com.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.