library("knitr")

r opts_chunk$set(fig.width=5, fig.height=5, tidy=FALSE)

Prerequisites

If you do not have R, navigate to http://www.r-project.org to download and install it. New users should download and read the reference manual.

Installing the DSPRqtl Analysis Package

To install the analysis package, within R, execute:

install.packages("DSPRqtl", 
                 repos = "http://wfitch.bio.uci.edu/R/", 
                 type = "source")

Note: you must use the statement repos = "http://wfitch.bio.uci.edu/R/".

Installing the DSPRqtlData Packages

  1. Install one or both data packages (DSPRqtlDataA and/or DSPRqtlDataB). If only one population was used, only one data package (A or B) needs to be installed.
  2. The data packages are ~3.2 GB each and will take several minutes (to hours) to download and install depending on your connection speed. These packages are too large to be stored on and installed from CRAN. If you do not install these packages, the functions in the DSPRqtl package will fetch each position file from the internet. This method takes considerably longer to run.

Installation within R

Within R (not recommended in RStudio), type:

install.packages("DSPRqtlDataA", 
                 repos = "http://wfitch.bio.uci.edu/R/", 
                 type = "source")

and/or

install.packages("DSPRqtlDataB", 
                 repos = "http://wfitch.bio.uci.edu/R/", 
                 type = "source")

for the pA and pB data packages, respectively.

Note: you must use the statement repos = "http://wfitch.bio.uci.edu/R/".

Installation from the Downloaded Package Source

If installation within R fails, the packages can be downloaded manually and installed from within R from local source. Download the package source files from:

using a web browser or command line tool (e.g., wget). After the files have downloaded, execute the following commands in R (one, the other, or both as necessary):

install.packages("DSPRqtlDataA_2.0-1.tar.gz", repos = NULL, type = "source")
install.packages("DSPRqtlDataB_2.0-1.tar.gz", repos = NULL, type = "source")

Note: You may have to modify the path to the package file.

Loading and Using the Data Packages

Once the data packages are installed, they can be loaded using:

library(DSPRqtlDataA)

and/or

library(DSPRqtlDataB)

Then, the set of additive HMM probabilities and raw HMM probabilities for any given position can be obtained with:

data(A_chromosome.arm_position.in.base.pairs)

e.g.,

data(A_X_12000000)
# This gives a data.frame named A_X_12000000

A list of regularly spaced positions every 10 kb is available at http://FlyRILs.org/Data or within the DSPRqtl package (see above for install instructions).

library(DSPRqtl)
data(positionlist_wgenetic)
# This gives a data.frame named poslist

Data Analysis Tutorial

Before you begin, please also obtain the DSPRqtl manual (see http://FlyRILs.org/Tools/Tutorial) and refer to it throughout this tutorial. Also load the DSPRqtl package:

library(DSPRqtl)

Phenotype Data {#PD}

data(ADHdata)
# a data.frame named ADHdata

head(ADHdata)

To import your data into R, you must read the file in. R will accept many types of files. For help reading files into R, see http://cran.r-project.org/doc/manuals/R-data.html. The code to read in a tab-delimited text file is shown below as an example.

phenotype.dat <- read.table("/path/to/your/file.txt", 
                            header = TRUE)

Genome Scan

One can perform a genome scan using the DSPRscan() function for experimental designs where the inbred RILs are measured directly, crosses to a standard, or the pA-pB cross design. For users requiring more flexibility, see the Advanced Analysis section below. Handling interactive covariates and analyzing multiple phenotypes at once is in development. The DSPRscan() function performs a genome scan that regresses the 8 founder genotype probabilities (or 16 in the case of the pA-pB cross) on the measured phenotype at evenly spaced 10 kb positions across the genome:

DSPRscan(model, design, phenotype.dat, id.col, batch, sex)

For a single phenotype with the data package installed, a genome scan should take ~15--20 min.

Using the ADH example data:

data(ADHdata)
scan.results <- DSPRscan(adh ~ 1, 
                         design = "inbredA", 
                         phenotype.dat = ADHdata,
                         id.col='patRIL')

The output of DSPRscan() is a list containing:

There is example output from a finished genome scan of the ADH data in this package as well. Type:

data(ADHscan)

To extract the LOD scores data.frame:

ADH.lod.scores <- ADHscan$LODscores

ADH.lod.scores[100:105, ]

Permutation Test

Perform a permutation test using the function DSPRperm() to obtain a significance threshold. For initial data exploration, the value 6.8 can be used for inbred designs and 10.1 for the ABcross. This threshold seems to be fairly stable for multiple phenotypes that we have tested, but we recommend each user perform a permutation test for their specific data set.

DSPRperm(model, design, phenotype.dat, id.col, batch, niter, alpha, sex)

For model, design, phenotype.dat, id.col, batch, sex see above description for DSPRscan(). The two additional arguments to DSPRperm() are:

The output of DSPRperm() is a list containing:

After a permutation test is performed, the maxLODs can also be used to determine the significance threshold at another alpha level. For example,

perm.test <- DSPRperm(adh ~ 1,
                      design = "inbredA",
                      phenotype.dat = ADHdata)

For $\alpha = 0.01$:

quantile(perm.test$maxLODs, 1 - 0.01)

Identify Significant QTL

Get a summary of the significant peaks. Finding and summarizing significant peaks can be done in a single step using the function DSPRpeaks(). The individual functions to get the values are also available (see the DSPRqtl manual). After peaks are identified, it is important for the user to confirm these represent distinct peaks.

DSPRpeaks(qtldat, method, threshold, LODdrop, BCIprob)

The output of DSPRpeaks() is a list of peaks. Each peak is a list containing:

Using the ADHscan output:

load("peaks.rda")
peaks <- DSPRpeaks(ADHscan, threshold = 6.8, LODdrop = 2)

And the main QTL is found at position 26 in the list:

peaks[[26]]

The DSPRscan() results can be plotted using the DSPRplot() function. Multiple scan results can be plotted on the same plot. Pass the scan results as a list() to the DSPRscan() function.

DSPRplot(list(ADHscan), threshold=6.8)

Local Interval Mapping

The user may wish to perform local interval mapping to compare the peak locations and confidence intervals. The LocalInt() function will perform interval mapping for a range of positions given by the user. FindCI() can then be used to re-estimate confidence intervals. This function is only available for the "inbredA" or "inbredB" designs.

LocalInt(peakChr, peakPos, range, phenotype.dat, pheno.name, design)

The output of LocalInt is the same as the LODscores from DSPRscan but only for the specified set of positions.

Using the ADH sample data:

# The main QTL
main.peak <- peaks[[26]]
peakChr <- main.peak$peak$chr
peakPos <- main.peak$peak$Ppos
peak.int <- LocalInt(peakChr,
                     peakPos,
                     phenotype.dat = ADHdata,
                     pheno.name = "adh",
                     design = "inbredA")

Advanced Analysis {#AA}

Generate Genotype Information

This package helps automate analysis for simple DSPR designs. Many users (especially those with complex interactions or complex cross designs) will require more flexibility. To aid those users, the DSPRgenos() function is available to set up the genotype information using a given phenotype data file. This function can handle the following designs:

The function can handle multiple measurments, a mix of males and females, and a mix of cross directions in the case of the pA-pB cross (i.e. A males to B females and vice versa). The function outputs either a list or a 3 dimensional array and the user can then use the many built in analysis tools in R to fit a model across the genome.

DSPRgenos(design, phenotype.dat, id.col, output)

The output of DSPRgenos() is a list containing:

The user can easily use apply (in the case of an array), lapply (in the case of list), or a loop to do a genome scan of whatever type they wish. Future tutorials will provide sample code for specific senarios (e.g., the round robin design).

The DSPRgenos() function is somewhat time consuming to run (~30 min. - 1 hour). However genome scans using the output should be fast (2-10 min for simple designs). The user will likely wish to save the output especially if planning to run genome scans with multiple different models. Because the output is a large object (~11,000 positions in the genome), it is much more efficient to use the save function in R as shown below:

myGenos <- DSPRgenos(design, phenotype.dat, id.col, output)
save(myGenos, file = "my file path")

To return to these results later, use load:

load(file="my file path")
# The object will load into the workspace with the same name as
# when save was used. In this example, this is myGenos.

Questions?

If you have any questions or have trouble with this tutorial, please contact flyrils@gmail.com.



egking/DSPRqtl documentation built on May 16, 2019, 12:14 a.m.