In byandell/qtl2shiny: Shiny interface for 'qtl2' Fine Mapping

knitr::opts_chunk$set(echo = TRUE)

This gives a brief look at the compressed data used in qtl2shiny. See qtl2shinyData.Rmd for details about how the data are constructed. This document is ShinyCompress.Rmd.

The key compressed data files use SQL or FST compression. For details on FST, see

For more info on fst, which was developed for R, see

https://github.com/fstpackage
https://github.com/fstpackage/fst/issues/223 (other languages)
https://github.com/xiaodaigh/FstFileFormat.jl (Julia)

Assume here that ResearchDrive has been mounted as /Volumes/byandell. The data for this project are in /Volumes/byandell/byandell/qtl2shinyApp/, which contains

app.R: The shiny app fill to execute with shiny::runApp()
qtl2shinyData: folder with data
CCmouse: taxa-specific data
projects.csv: master file for data used in the shiny app

In the CCmouse folder are the following files of import right here:

cc_variants.sqlite: SQL of DNA variants (3G)
mouse_genes.sqlite: SQL of mouse genes (677M)
Recla/genoprob: folder of genotype probabilities (1.4G)
Recla/genoprob/aprobs_fstindex.rds: (69K)
Recla/genoprob/aprobs_11.fst: (6.3M)

This is the small problem (Recla). The larger problem (AttieDOv2) is much larger

AttieDOv2/genoprob: folder of genotype probabilities (11G)
AttieDOv2/genoprob/aprobs_fstindex.rds: (644K)
AttieDOv2/genoprob/aprobs_11.fst: (116M)

datapath <- params$datapath
dir(datapath)

project_name <- as.character(params$projectName) # "Recla" or "AttieDOv2"
chri <- as.character(params$chr)
start_val <- as.numeric(params$start)
end_val <- as.numeric(params$end)

ccpath <- file.path(datapath, "qtl2shinyData", "CCmouse")

Size in GB of two project genoprob folders:

size_files <- NULL
for(i in c("Recla","AttieDOv2")) {
  files<-list.files(file.path(ccpath, i, "genoprob"),full.names = T)
  vect_size <- sapply(files, file.size)
  size_files[i] <- sum(vect_size)
}
round(size_files / 1e9, 2)

Here we show several ways to access the genotype probabilities, which are stored in FST files. We begin with a simple call to R/fst, followed by a call using R/qtl2fst, then a call using R/qtl2pattern.

map <- readRDS(file.path(file.path(ccpath, project_name), "pmap.rds"))
# Select chromosome chri
map <- map[chri]
# Select markers between start_val and end_val
wh <- which(map[[chri]] >= start_val & map[[chri]] <= end_val)
(map[[chri]] <- map[[chri]][wh])

Bare Bones Read From Compressed Files

Read FST file with `R/fst`

The FST files are compressed 2D tables. Fof qtl2 data, the 3D genotype probabilities are stored with rows having genotypes and subjects, and the columns having the markers. Here is a raw R/fst read of the file to pull out the r length(wh) markers (columns), and the first genotype and first 8 subjects (rows):

fstfile <- file.path(ccpath, project_name ,"genoprob",
                     paste0("aprobs_", chri, ".fst"))
fst::read_fst(fstfile, columns = names(map[[chri]]), from = 1, to = 8)

Read SQLite file with `R/RSQLite`

dbfile <- file.path(ccpath, "cc_variants.sqlite")
db <- RSQLite::dbConnect(RSQLite::SQLite(), dbfile)
start <- round(start_val * 1e+06)
end <- round(end_val * 1e+06)
# Shorten distance to reduce time for demo
ave <- (start + end) / 2
start <- ave + (start - ave) / 10
end <- ave + (end - ave) / 10
chrselect <- paste0("chr == '", chri, "'")
startselect <- paste0("pos >= ", start)
endselect <- paste0("pos <= ", end)
query <- paste0("SELECT * FROM variants WHERE ", 
                chrselect, " AND ", startselect, " AND ", endselect)
variants <- RSQLite::dbGetQuery(db, query)
RSQLite::dbDisconnect(db)

str(variants)

Genotype Probabilities

The following approaches subsets by chromosome r chri and by markers between r start_val and r end_val positions for project names r project_name.

Read with `R/qtl2fst`

A R/qtl2fst object has a master object typically stored in an RDS file. This does not read the genotype data compressed vai FST, but includes the information about what is in those FST files. It is good practice to replace the path to where data are stored. For information, see https://kbroman.org/qtl2/assets/vignettes/qtl2fst.html.

probs <- readRDS(file.path(ccpath, project_name, "genoprob", "aprobs_fstindex.rds"))
(probs <- qtl2fst::replace_path(probs, file.path(ccpath, project_name, "genoprob", "aprobs")))

print(object.size(probs), units = "MB")

We now subset by chromosome r chri, and by a range of positions on that chromosome. This changes the master object probs but does not change the underlying FST files.

# Which markers in saved data agree with map.
wh <- match(names(map[[chri]]), dimnames(probs)$mar[[chri]])
(probs <- qtl2fst::subset_fst_genoprob(probs, chr = chri, mar = wh))

print(object.size(probs), units = "MB")

This is the size of the FST master object in R, while the data still resides at `r project_name/genoprob/aprobs_r chri.fst`.

paste(
  round(
    file.size(
      file.path(ccpath, project_name, "genoprob",
                paste0("aprobs_", chri, ".fst"))) / 1e6,
    2),
  "Mb")

Extract data from `qtl2fst` object

The probs object is a small metadata file with information about the genoprob data, such as dimensions and dimnames of list elements, but no data. We can extract data from the object by accessing elements of the 3D array on chromosome chri. This is generally what is done in the R/qtl2 package when using this class of genotype probabilities. The following shows the 3 markers, first genotype, and first 8 subjects:

probs[[chri]][1:8,1,]

It is possible to extract the calc_genoprob from the metadata object and compressed FST file(s). If you try to do this for the whole chromosome, or all chromosomes, it will take some time.

probsq <- qtl2fst::fst_extract(probs)
class(probsq)

print(object.size(probsq), units = "MB")

Here is a fake (unevaluated) extraction of all of chromosome rchri` to show the system time and object size.

probs <- qtl2fst::subset_fst_genoprob(probs, chr = chri)
system.time(probsq <- qtl2fst::fst_extract(probs))

##    user  system elapsed 
##   0.102   0.251  44.674

print(object.size(probsq), units = "MB")

## 6.3 Mb

Read data with `R/qtl2pattern`

The following use functions in package qtl2pattern. The basics is the function read_probs to read the genotype probabilities. It actually gets both the probs and the map.

pr <- qtl2pattern:::read_probs(chr=chri, start_val=start_val, end_val=end_val,
                               file.path(ccpath, project_name),
                               allele = TRUE, method = "fast", 
                               probdir = "genoprob")

names(pr)

Again, we need to replace the path to access the data properly

(pr$probs <- 
   qtl2fst::replace_path(
     pr$probs,
     file.path(ccpath, project_name, "genoprob", "aprobs")))

print(object.size(pr$probs), units = "MB")

Query functions with `R/qtl2pattern`

Query functions provide a way to embed details about the data in a local function, and just focues on identifying the chromosome and marker positions.

(query_probs <- qtl2pattern::create_probs_query_func(file.path(ccpath, project_name)))

pr <- query_probs(chri, start_val, end_val)

The query function sets up a call to read_probs, which is an internal function in qtl2pattern package described earlier.

DNA Variants

Internally, we use a query function that embeds the read_probs function call:

(query_variants <- 
  qtl2::create_variant_query_func(file.path(ccpath, "cc_variants.sqlite")))

ls(envir = environment(query_variants))

get("dbfile", envir = environment(query_variants))

# Shorten distance to reduce time for demo
ave <- (start_val + end_val) / 2
start <- ave + (start_val - ave) / 10
end <- ave + (end_val - ave) / 10

variants <- query_variants(chri, start, end)

str(variants)

byandell/qtl2shiny documentation built on June 11, 2025, 4:54 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

byandell/qtl2shiny
Shiny interface for 'qtl2' Fine Mapping

In byandell/qtl2shiny: Shiny interface for 'qtl2' Fine Mapping

Bare Bones Read From Compressed Files

Read FST file with `R/fst`

Read SQLite file with `R/RSQLite`

Genotype Probabilities

Read with `R/qtl2fst`

Extract data from `qtl2fst` object

Read data with `R/qtl2pattern`

Query functions with `R/qtl2pattern`

DNA Variants

R Package Documentation

Browse R Packages

We want your feedback!

byandell/qtl2shiny Shiny interface for 'qtl2' Fine Mapping

In byandell/qtl2shiny: Shiny interface for 'qtl2' Fine Mapping

Bare Bones Read From Compressed Files

Read FST file with R/fst

Read SQLite file with R/RSQLite

Genotype Probabilities

Read with R/qtl2fst

Extract data from qtl2fst object

Read data with R/qtl2pattern

Query functions with R/qtl2pattern

DNA Variants

R Package Documentation

Browse R Packages

We want your feedback!

byandell/qtl2shiny
Shiny interface for 'qtl2' Fine Mapping

Read FST file with `R/fst`

Read SQLite file with `R/RSQLite`

Read with `R/qtl2fst`

Extract data from `qtl2fst` object

Read data with `R/qtl2pattern`

Query functions with `R/qtl2pattern`