knitr::opts_chunk$set(echo = TRUE)

Goal: Check small molecules data for qtl2shiny and other uses.

See Box folder MS data sets for DO project and subfolders

See also Dan Gatti's AttieMetabolomics GitHub and Dan Gatti's FTP site.

Types of molecular measurements:

Here are the files I think are relevant:

size | type | name ---- | -------------- | ------------ 961K | mouse by trait | attie_liver_lipids_zscore_normalized.rds 225K | mouse by trait | attie_liver_metabolites_zscore_normalized.rds 1.2M | mouse by trait | attie_plasma_lipids_zscore_normalized.rds 1.6M | mouse by trait | attie_cecum_lipids_zscore_normalized.rds 176K | peaks | cecum_lipids_jax_norm_qtl_summary_thresh_6.csv 132K | peaks | liver_lipids_jax_norm_qtl_summary_thresh_6.csv 20K | peaks | liver_metabolites_jax_norm_qtl_summary_thresh_6.csv 193K | peaks | plasma_lipids_jax_norm_qtl_summary_thresh_6.csv 55K | description | cecum_lipid_pheno.descr.rds 31K | description | liver_lipid_pheno.descr.rds 2.2K | description | liver_metabolite_pheno.descr.rds 41K | description | plasma_lipids_pheno.descr.rds | untransformed | attie_liver_lipids_normalized.rds | genome scan | liver_lipids_jax_norm_all_qtl.rds

library(stringr)
library(dplyr)
library(tidyr)
datapath <- "~/Documents/Research/attie_alan/DO/data/DerivedData/"

The pheno.descr files are different from what Karl used as dictionary, so will have to construct that manually. Phenotypes follow covariates in the mouse by trait tables, and the Mouse.ID is DO-nnn, as apposed to nn and nnn in Karl's work. All fixable but will take careful examination.

liverz <- readRDS(file.path(datapath, "molecule", 
                            "attie_liver_lipids_zscore_normalized.rds"))
metabz <- readRDS(file.path(datapath, "molecule",
                            "attie_liver_metabolites_zscore_normalized.rds"))
plasmaz <- readRDS(file.path(datapath, "molecule",
                            "attie_plasma_lipids_zscore_normalized.rds"))
cecumz <- readRDS(file.path(datapath, "molecule",
                            "attie_cecum_lipids_zscore_normalized.rds"))

Check agreement on covariates

covar <- readRDS(file.path(datapath, "covar.rds"))
if(!("DOwave" %in% colnames(covar))) {
  covar <- cbind(covar, DOwave = 1 + covar[,"DOwave2"] + 2 * covar[,"DOwave3"] +
    3 * covar[,"DOwave4"] + 4 * covar[,"DOwave5"])
}

Check sex.

sexl <- liverz$sex
names(sexl) <- as.numeric(str_replace(liverz$Mouse.ID, "DO-", ""))
tmp <- qtl2scan::get_common_ids(covar, sexl)
table(covar[tmp, "sex"], sexl[tmp])

check wave and generation

wave <- liverz$wave
names(wave) <- as.numeric(str_replace(liverz$Mouse.ID, "DO-", ""))
tmp <- qtl2scan::get_common_ids(covar, wave)
table(covar[tmp, "DOwave"], wave[tmp])
gen <- liverz$generation
names(gen) <- as.numeric(str_replace(liverz$Mouse.ID, "DO-", ""))
tmp <- qtl2scan::get_common_ids(covar, gen)
table(covar[tmp, "DOwave"], gen[tmp])
table(covar[,"DOwave"])
table(liverz$generation)
tmp2 <- covar[tmp, "DOwave"] < 3 & gen[tmp] == 19
tmp2 <- names(tmp2[tmp2])
covar[tmp2,]
liverz %>%
  filter(as.numeric(str_replace(Mouse.ID, "DO-", "")) %in% tmp2) %>%
  select(1:11)

Pull out relevant phenotypes and covariates and analyses stuff.

We need the following to tranform into Karl type files:

Details to be added here. Need to modify covar, save as data frame, and also modify analyses_tbl. Need to coordinate change to data frame in particular with use of DOwave as covariate.

Need to extract covariates, compare that Mouse.ID, sex, wave, chrM, chrY agree across all sets. Then need to give Batch unique names and combine.

Need to extract phenos and put in form to be reused.

Need to get analyses info right.

Need to go to CSV files to get peaks.

analyses table

Covariates matrix and Analyses table

Strategy is to make covar a data frame (convert from current format as numeric matrix) and use model.matrix. We will add columns for Batch (note names above) and for chrM and chrY, which might be useful in the future. Also add coat_color. All other covariates are ignored here.

Here, covariates are indicated in liverd$covar_list as colon-separated names, but all are sex:generation:Batch. Batch is distinct (I think) for each type of molecule. generation is DO generation, which should agree with DOwave (1=17, 2=18, 3=19, 4=21, 5=2_). Turns out generation has 5 discrepancies. We will use DOwave

The covariate values are with phenotypes as first several columns and some extra columns for mitochondria and chr Y (chrM, chrY).

liverz %>%
  count(chrM, chrY) %>%
  spread(chrM, n)

Check if all equal across small molecules

Comparisons are made below between liver lipids and the other three analytes.

all.equal(liverz %>% select(Mouse.ID,coat_color,Batch,chrM,chrY),
          plasmaz %>% select(Mouse.ID,coat_color,Batch,chrM,chrY))
table(liverz$Batch,plasmaz$Batch)
all.equal(liverz %>% select(Mouse.ID,coat_color,Batch,chrM,chrY),
          metabz %>% select(Mouse.ID,coat_color,Batch,chrM,chrY))
matchM <- qtl2scan::get_common_ids(liverz$Mouse.ID, metabz$Mouse.ID)
matchM2 <- which(diff(match(metabz$Mouse.ID, matchM)) != 1)
matchM2 <- sort(unique(c(matchM2 - 1, matchM2, matchM2 + 1)))
tmp <- rbind(metabz$Mouse.ID[matchM2], liverz$Mouse.ID[matchM2])
dimnames(tmp) <- list(c("metab","liver"), matchM2)
tmp
matchM2 <- match(matchM, metabz$Mouse.ID)
all.equal(liverz %>% select(Mouse.ID,coat_color,Batch,chrM,chrY),
          metabz[matchM2,] %>% select(Mouse.ID,coat_color,Batch,chrM,chrY))
table(liverz$Batch,as.numeric(str_replace(metabz[matchM2,"Batch"], "Batch", "")))

Liver and metab differ in Batch for r sum(!(liverz$Batch == as.numeric(str_replace(metabz[matchM2,"Batch"], "Batch", "")))) mice. This is a different mouse: r liverz$Mouse.ID[which(!(liverz$Batch == as.numeric(str_replace(metabz[matchM2,"Batch"], "Batch", ""))))].

Cecum has 3 fewer mice than the other assays. Batch is substantially different for cecum.

all.equal(liverz %>% select(Mouse.ID,coat_color,Batch,chrM,chrY),
          cecumz %>% select(Mouse.ID,coat_color,Batch,chrM,chrY))
matchC <- qtl2scan::get_common_ids(liverz$Mouse.ID, cecumz$Mouse.ID)
matchC2 <- match(matchC, liverz$Mouse.ID)
all.equal(liverz[matchC2,] %>% select(Mouse.ID,coat_color,Batch,chrM,chrY),
          cecumz %>% select(Mouse.ID,coat_color,Batch,chrM,chrY))
table(liverz[matchC2,"Batch"],cecumz$Batch)

Liver and cecum differ in Batch for r sum(!(liverz[matchC2,"Batch"] == cecumz$Batch)) mice.

Conclusions:

Need to add generation and Batch, but there will be four Batch (at least): Batch_liver_lipid, Batch_plasma_lipids, Batch_cecum_lipids, Batch_liver_metabolites.

analyses_tbl info

Now need to create analyses_tbl object here. First have to get peaks

Phenotypes

Phenotypes are in files with names like attie_liver_lipids_zscore_normalized.rds. That is, they are Z-scores of Jax-normalized data. These files have phenotypes prepended by covariates and some extra columns for mitochondria and chr Y (chrM, chrY).

Approach here is to match Mouse.ID and pick up phenotypes. These are large files (1Gb).

More detailed look at liver metab

library(ggplot2)
metabn <- readRDS(file.path(datapath, "molecule",
                            "attie_liver_metabolites_zscore_uwisc_normalized.rds"))
dat <- data.frame(x = unlist(metabn[104,-(1:13)]), 
                  y = unlist(metabz[104,-(1:13)]))
ggplot(dat, aes(x,y)) + geom_point()
dat <- data.frame(x = unlist(metabn[161,-(1:13)]), 
                  y = unlist(metabz[106,-(1:13)]))
ggplot(dat, aes(x,y)) + geom_point()
dat <- data.frame(x = unlist(metabn[106,-(1:13)]), 
                  y = unlist(metabz[106,-(1:13)]))
ggplot(dat, aes(x,y)) + geom_point()
dat <- data.frame(x = unlist(metabn[161,-(1:13)]), 
                  y = unlist(metabz[161,-(1:13)]))
ggplot(dat, aes(x,y)) + geom_point()


byandell/DOread documentation built on May 13, 2019, 9:26 a.m.