R/qtl2 user guide

R/qtl2 (aka qtl2) is a reimplementation of the QTL analysis software R/qtl, to better handle high-dimensional data and complex cross designs.

Installation

R/qtl2 is early in development and so is not yet available on CRAN.

You can install R/qtl2 from its GitHub repository. You first need to install the devtools package.

install.packages("devtools")

Then install R/qtl2 using devtools::install_github().

library(devtools)
install_github("kbroman/qtl2")

Data file format

The input data file formats for R/qtl cannot handle complex crosses, and so for R/qtl2, we have defined a new format for the data files. We'll describe it here briefly; for details, see the separate vignette on the input file format.

QTL mapping data consists of a set of tables of data: marker genotypes, phenotypes, marker maps, etc. In the new format, these different tables are in separate comma-delimited (CSV) files. In each file, the first column is a set of IDs for the rows, and the first row is a set of IDs for the columns. For example, the phenotype data file will have individual IDs in the first column and phenotype names in the first row.
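As a minimal sketch of this layout, the following base R code writes a tiny phenotype file and reads it back. The file name, individual IDs, phenotype names, and values are all invented for illustration; they do not come from any real data set.

```r
# Hypothetical phenotype table: individual IDs in the first column,
# phenotype names in the first row (all values invented for illustration)
pheno_file <- file.path(tempdir(), "example_pheno.csv")
writeLines(c("id,weight,length",
             "ind1,21.3,9.8",
             "ind2,19.7,10.1",
             "ind3,22.0,9.5"),
           pheno_file)

# row.names=1 treats the first column as the row IDs
pheno <- read.csv(pheno_file, row.names=1)

rownames(pheno)   # individual IDs
colnames(pheno)   # phenotype names
```

The other tables (genotypes, marker maps, and so on) follow the same convention of row IDs in the first column and column IDs in the first row.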

A few important changes in the tabular data:

In addition to the set of CSV files with the primary data, we need a separate “control” file with various control parameters (or metadata), including the names of all of the other data files and the genotype codes used in the genotype data file. The control file is in a specific YAML format. YAML is a human-readable text format for representing relatively complex data. (It's much like JSON, but much more readable.)
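To give a rough idea of what such a control file might look like, here is a sketch that writes one from R as plain text. The key names (crosstype, geno, pheno, gmap, genotypes, na.strings) reflect our understanding of the format and the file names are hypothetical; the vignette on the input file format is the authoritative reference.

```r
# Sketch of a control file for a hypothetical RIL data set.
# Key names are our reading of the format; file names are invented.
control_lines <- c(
  "crosstype: riself",
  "geno: example_geno.csv",
  "pheno: example_pheno.csv",
  "gmap: example_gmap.csv",
  "genotypes:",
  "  L: 1",
  "  C: 2",
  "na.strings:",
  "- '-'",
  "- NA"
)

control_file <- file.path(tempdir(), "example.yaml")
writeLines(control_lines, control_file)

# display the file contents
cat(readLines(control_file), sep="\n")
```

The nested genotypes entry is where the genotype codes are mapped, and the top-level entries name the other data files.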

A big advantage of this control file scheme is that it greatly simplifies the function for reading in the data. That function, read_cross2(), has a single argument: the name (with path) of the control file. So you can read in data like this:

library(qtl2)
grav2 <- read_cross2("~/my_data/grav2.yaml")

The large number of files is a bit cumbersome, so we've made it possible to use a [zip file](http://en.wikipedia.org/wiki/Zip_(file_format)) containing all of the data files, and to read that zip file directly. There's even a function for creating the zip file:

zip_datafiles("~/my_data/grav2.yaml")

This zip_datafiles() function will read the control file to identify all of the relevant data files and then zip them up into a file with the same name and location, but with the extension .zip rather than .yaml.
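The naming rule can be illustrated with a line of base R. This is only a sketch of the rule as described; it is not the package's own code.

```r
# Illustration only: derive the zip file name from the control file name
# by swapping the .yaml extension for .zip (same directory, same base name)
control_file <- "~/my_data/grav2.yaml"
zip_file <- sub("\\.yaml$", ".zip", control_file)
zip_file   # "~/my_data/grav2.zip"
```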

To read the data back in, we use the same read_cross2() function, providing the name (and path) of the zip file rather than the control file.

grav2 <- read_cross2("~/my_data/grav2.zip")

This can even be done with remote files.

grav2 <- read_cross2("http://kbroman.org/qtl2/assets/sampledata/grav2/grav2.zip")

Another advantage of the zip file is that it is compressed, and so smaller than the combined set of CSV files.

The control file may be confusing for some users. To assist in its construction, there's a function write_control_file() that takes the large set of control parameters as input and then writes the YAML control file in the appropriate format.
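As a hedged sketch, a call for the gravitropism data might look like the following. The argument names reflect our understanding of write_control_file() and may differ between package versions, and the CSV file names and genotype codes are assumptions; check the function's help page before relying on them. The call is guarded so that the code does nothing if qtl2 is not installed.

```r
# Sketch only: argument names and file names below are assumptions;
# see ?write_control_file for the definitive interface.
control_file <- tempfile(fileext=".yaml")

if (requireNamespace("qtl2", quietly=TRUE)) {
  qtl2::write_control_file(control_file,
                           crosstype="riself",
                           geno_file="grav2_geno.csv",
                           gmap_file="grav2_gmap.csv",
                           pheno_file="grav2_pheno.csv",
                           phenocovar_file="grav2_phenocovar.csv",
                           geno_codes=c(L=1L, C=2L),
                           alleles=c("L", "C"),
                           na.strings=c("-", "NA"))

  # show the resulting YAML
  cat(readLines(control_file), sep="\n")
}
```

Writing to a temporary file keeps the sketch from overwriting an existing control file; in practice you would write it alongside the CSV files it names.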

Sample data sets

The R/qtl2 web site includes sample data files in the new format. Zipped versions of these datasets are included with the package and can be loaded into R using the read_cross2() function.

In the package source, the sample zip files are located in qtl2/inst/extdata. In the installed version of the package, they are in qtl2/extdata, within whichever directory your R packages are installed. The R function system.file() can be used to construct the path to these files.
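For instance, a short sketch for locating that directory and listing the zipped sample files it contains, using only base R functions:

```r
# Directory holding the sample data in the installed package;
# system.file() returns "" if the package or directory isn't found
extdata_dir <- system.file("extdata", package="qtl2")

if (nzchar(extdata_dir)) {
  # list the zipped sample data sets
  print(list.files(extdata_dir, pattern="\\.zip$"))
}
```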

For example, one of the sample data sets concerns a gravitropism phenotype in a set of Arabidopsis recombinant inbred lines (RIL), from Moore et al. (2013) Genetics 195:1077-1086. The data are in qtl2/extdata/grav2.zip, which can be loaded as follows:

library(qtl2)
grav2 <- read_cross2( system.file("extdata", "grav2.zip", package="qtl2") )



simecek/qtl2 documentation built on May 29, 2019, 10:01 p.m.