In byandell/DOread: Read DO data

knitr::opts_chunk$set(echo = TRUE)

Goal: Check small molecules data for qtl2shiny and other uses.

See Box folder MS data sets for DO project and subfolders

RDS files for all QTLs (genome scans)
Gatti R scripts
Raw and UW normalized data files

Types of molecular measurements:

liver lipids
cecum lipids
plasma lipids
liver metabolites

Here are the files I think are relevant:

size | type | name ---- | -------------- | ------------ 961K | mouse by trait | attie_liver_lipids_zscore_normalized.rds 225K | mouse by trait | attie_liver_metabolites_zscore_normalized.rds 1.2M | mouse by trait | attie_plasma_lipids_zscore_normalized.rds 1.6M | mouse by trait | attie_cecum_lipids_zscore_normalized.rds 176K | peaks | cecum_lipids_jax_norm_qtl_summary_thresh_6.csv 132K | peaks | liver_lipids_jax_norm_qtl_summary_thresh_6.csv 20K | peaks | liver_metabolites_jax_norm_qtl_summary_thresh_6.csv 193K | peaks | plasma_lipids_jax_norm_qtl_summary_thresh_6.csv 55K | description | cecum_lipid_pheno.descr.rds 31K | description | liver_lipid_pheno.descr.rds 2.2K | description | liver_metabolite_pheno.descr.rds 41K | description | plasma_lipids_pheno.descr.rds | untransformed | attie_liver_lipids_normalized.rds | genome scan | liver_lipids_jax_norm_all_qtl.rds

library(stringr)
library(dplyr)
library(tidyr)

datapath <- "~/Documents/Research/attie_alan/DO/data/DerivedData/"

The pheno.descr files are different from what Karl used as dictionary, so will have to construct that manually. Phenotypes follow covariates in the mouse by trait tables, and the Mouse.ID is DO-nnn, as apposed to nn and nnn in Karl's work. All fixable but will take careful examination.

liverz <- readRDS(file.path(datapath, "molecule", 
                            "attie_liver_lipids_zscore_normalized.rds"))
metabz <- readRDS(file.path(datapath, "molecule",
                            "attie_liver_metabolites_zscore_normalized.rds"))
plasmaz <- readRDS(file.path(datapath, "molecule",
                            "attie_plasma_lipids_zscore_normalized.rds"))
cecumz <- readRDS(file.path(datapath, "molecule",
                            "attie_cecum_lipids_zscore_normalized.rds"))

Check agreement on covariates

covar <- readRDS(file.path(datapath, "covar.rds"))
if(!("DOwave" %in% colnames(covar))) {
  covar <- cbind(covar, DOwave = 1 + covar[,"DOwave2"] + 2 * covar[,"DOwave3"] +
    3 * covar[,"DOwave4"] + 4 * covar[,"DOwave5"])
}

Check sex.

sexl <- liverz$sex
names(sexl) <- as.numeric(str_replace(liverz$Mouse.ID, "DO-", ""))
tmp <- qtl2scan::get_common_ids(covar, sexl)
table(covar[tmp, "sex"], sexl[tmp])

check wave and generation

wave <- liverz$wave
names(wave) <- as.numeric(str_replace(liverz$Mouse.ID, "DO-", ""))
tmp <- qtl2scan::get_common_ids(covar, wave)
table(covar[tmp, "DOwave"], wave[tmp])

gen <- liverz$generation
names(gen) <- as.numeric(str_replace(liverz$Mouse.ID, "DO-", ""))
tmp <- qtl2scan::get_common_ids(covar, gen)
table(covar[tmp, "DOwave"], gen[tmp])

table(covar[,"DOwave"])
table(liverz$generation)

tmp2 <- covar[tmp, "DOwave"] < 3 & gen[tmp] == 19
tmp2 <- names(tmp2[tmp2])

covar[tmp2,]

liverz %>%
  filter(as.numeric(str_replace(Mouse.ID, "DO-", "")) %in% tmp2) %>%
  select(1:11)

Pull out relevant phenotypes and covariates and analyses stuff.

We need the following to tranform into Karl type files:

analyses_tbl: c("pheno","output","pheno_group","longname","pheno_type","model","transf","offset","winsorize", "sex","diet_days","Ins_per_islet","Glu_0min","Ins_0min","weight_sac","oGTT_weight", "DOwave2","DOwave3","DOwave4","DOwave5","Gcg_content")
peaks_tbl: c("pheno","longname","output","pheno_group","pheno_type","chr","lod","pos")
pheno_tbl: data frame with mouseID as rownames, pheno as colnames
covar_mx: matrix with mouseID as rownmaes, covar as colnames

Details to be added here. Need to modify covar, save as data frame, and also modify analyses_tbl. Need to coordinate change to data frame in particular with use of DOwave as covariate.

Need to extract covariates, compare that Mouse.ID, sex, wave, chrM, chrY agree across all sets. Then need to give Batch unique names and combine.

Need to extract phenos and put in form to be reused.

Need to get analyses info right.

Need to go to CSV files to get peaks.

analyses table

analyses_tbl: c("pheno","output","pheno_group","longname","pheno_type","model","transf","offset","winsorize", "sex","diet_days","Ins_per_islet","Glu_0min","Ins_0min","weight_sac","oGTT_weight", "DOwave2","DOwave3","DOwave4","DOwave5","Gcg_content")

Covariates matrix and Analyses table

Strategy is to make covar a data frame (convert from current format as numeric matrix) and use model.matrix. We will add columns for Batch (note names above) and for chrM and chrY, which might be useful in the future. Also add coat_color. All other covariates are ignored here.

Here, covariates are indicated in liverd$covar_list as colon-separated names, but all are sex:generation:Batch. Batch is distinct (I think) for each type of molecule. generation is DO generation, which should agree with DOwave (1=17, 2=18, 3=19, 4=21, 5=2_). Turns out generation has 5 discrepancies. We will use DOwave

The covariate values are with phenotypes as first several columns and some extra columns for mitochondria and chr Y (chrM, chrY).

liverz %>%
  count(chrM, chrY) %>%
  spread(chrM, n)

Check if all equal across small molecules

Comparisons are made below between liver lipids and the other three analytes.

all.equal(liverz %>% select(Mouse.ID,coat_color,Batch,chrM,chrY),
          plasmaz %>% select(Mouse.ID,coat_color,Batch,chrM,chrY))

table(liverz$Batch,plasmaz$Batch)

all.equal(liverz %>% select(Mouse.ID,coat_color,Batch,chrM,chrY),
          metabz %>% select(Mouse.ID,coat_color,Batch,chrM,chrY))

matchM <- qtl2scan::get_common_ids(liverz$Mouse.ID, metabz$Mouse.ID)
matchM2 <- which(diff(match(metabz$Mouse.ID, matchM)) != 1)
matchM2 <- sort(unique(c(matchM2 - 1, matchM2, matchM2 + 1)))
tmp <- rbind(metabz$Mouse.ID[matchM2], liverz$Mouse.ID[matchM2])
dimnames(tmp) <- list(c("metab","liver"), matchM2)
tmp

matchM2 <- match(matchM, metabz$Mouse.ID)
all.equal(liverz %>% select(Mouse.ID,coat_color,Batch,chrM,chrY),
          metabz[matchM2,] %>% select(Mouse.ID,coat_color,Batch,chrM,chrY))

table(liverz$Batch,as.numeric(str_replace(metabz[matchM2,"Batch"], "Batch", "")))

Liver and metab differ in Batch for r sum(!(liverz$Batch == as.numeric(str_replace(metabz[matchM2,"Batch"], "Batch", "")))) mice. This is a different mouse: r liverz$Mouse.ID[which(!(liverz$Batch == as.numeric(str_replace(metabz[matchM2,"Batch"], "Batch", ""))))].

Cecum has 3 fewer mice than the other assays. Batch is substantially different for cecum.

all.equal(liverz %>% select(Mouse.ID,coat_color,Batch,chrM,chrY),
          cecumz %>% select(Mouse.ID,coat_color,Batch,chrM,chrY))

matchC <- qtl2scan::get_common_ids(liverz$Mouse.ID, cecumz$Mouse.ID)
matchC2 <- match(matchC, liverz$Mouse.ID)
all.equal(liverz[matchC2,] %>% select(Mouse.ID,coat_color,Batch,chrM,chrY),
          cecumz %>% select(Mouse.ID,coat_color,Batch,chrM,chrY))

table(liverz[matchC2,"Batch"],cecumz$Batch)

Liver and cecum differ in Batch for r sum(!(liverz[matchC2,"Batch"] == cecumz$Batch)) mice.

Conclusions:

Liver and plasma have different Batch for r sum(!(liverz$Batch == plasmaz$Batch)) mice.
Metab has Mouse.ID DO-191 in the wrong place. Batch is coded as Batchnn rather than nn, and differs for mouse r liverz$Mouse.ID[which(!(liverz$Batch == as.numeric(str_replace(metabz[matchM2,"Batch"], "Batch", ""))))].
Cecum has 3 fewer mice than the other assays. Cecum Batch differs for r sum(!(liverz[matchC2,"Batch"] == cecumz$Batch)) mice.

Need to add generation and Batch, but there will be four Batch (at least): Batch_liver_lipid, Batch_plasma_lipids, Batch_cecum_lipids, Batch_liver_metabolites.

analyses_tbl info

Now need to create analyses_tbl object here. First have to get peaks

Phenotypes

Phenotypes are in files with names like attie_liver_lipids_zscore_normalized.rds. That is, they are Z-scores of Jax-normalized data. These files have phenotypes prepended by covariates and some extra columns for mitochondria and chr Y (chrM, chrY).

Approach here is to match Mouse.ID and pick up phenotypes. These are large files (1Gb).

More detailed look at liver metab

library(ggplot2)
metabn <- readRDS(file.path(datapath, "molecule",
                            "attie_liver_metabolites_zscore_uwisc_normalized.rds"))

dat <- data.frame(x = unlist(metabn[104,-(1:13)]), 
                  y = unlist(metabz[104,-(1:13)]))

ggplot(dat, aes(x,y)) + geom_point()

dat <- data.frame(x = unlist(metabn[161,-(1:13)]), 
                  y = unlist(metabz[106,-(1:13)]))

ggplot(dat, aes(x,y)) + geom_point()

dat <- data.frame(x = unlist(metabn[106,-(1:13)]), 
                  y = unlist(metabz[106,-(1:13)]))

ggplot(dat, aes(x,y)) + geom_point()

dat <- data.frame(x = unlist(metabn[161,-(1:13)]), 
                  y = unlist(metabz[161,-(1:13)]))

ggplot(dat, aes(x,y)) + geom_point()

byandell/DOread documentation built on May 13, 2019, 9:26 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

byandell/DOread
Read DO data

In byandell/DOread: Read DO data

Check agreement on covariates

Pull out relevant phenotypes and covariates and analyses stuff.

analyses table

Covariates matrix and Analyses table

Check if all equal across small molecules

analyses_tbl info

Phenotypes

More detailed look at liver metab

R Package Documentation

Browse R Packages

We want your feedback!

byandell/DOread Read DO data

In byandell/DOread: Read DO data

Check agreement on covariates

Pull out relevant phenotypes and covariates and analyses stuff.

analyses table

Covariates matrix and Analyses table

Check if all equal across small molecules

analyses_tbl info

Phenotypes

More detailed look at liver metab

R Package Documentation

Browse R Packages

We want your feedback!

byandell/DOread
Read DO data