This is a R Markdown Notebook for utilzing functions of the genotypeQC R library. When you execute code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Cmd+Shift+Enter.
Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Cmd+Option+I.
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Cmd+Shift+K to preview the HTML file).
This library required R version >= 3.3.2.
#install.packages('devtools') library(devtools) install_github('xiaolicbs/samplyzer') library(samplyzer)
setwd('/Users/xiaoli/samplyzer/vignettes/data') bamQcMetr = read.csv('bamQcMetr.tsv', sep = '\t') vcfQcMetr = read.csv('vcfQcMetr.tsv', sep = '\t') annotations = read.csv('sampleAnnotations.tsv', sep = '\t') samplepc = read.csv('samplePCs.tsv', sep = '\t') refpc = read.csv('refPCs.tsv', sep ='\t') stratify = c('ANCESTRY', 'SeqTech')
sds = sampleDataset(bamQcMetr = bamQcMetr, vcfQcMetr = vcfQcMetr, annotations = annotations, primaryID = 'SampleID')
A Sample Data Set object contains several attributes, including a data frame, which contains all the data for samples and several other attributes that contains metadata, such as qcMetrics, bamQcMetr, vcfQcMetr and PrimaryID. To check attributes of your current SDS:
attributes(sds)
Function setAttr
can be used to modify attributes of SampleDataset. For example, to add genotype PCs to the SampleDataset:
sds = setAttr(sds, attributes = 'PC', data = samplepc, primaryID = 'SampleID')
Function 'getAttr' was designed to access attributes of a SampleDataset, for example, you can access sample annotations via:
getAttr(sds, 'PC', showID = T)
To know how many attributes are there, use:
attributes(sds) print(sds)
sds = inferAncestry(sds, trainSet = refpc[,c('PC1', 'PC2', 'PC3')], knownAncestry = refpc$group ) getAttr(sds, 'inferredAncestry', showID = T)
sds = calZscore(sds, strat = c('inferredAncestry', 'SeqTech'), qcMetrics = sds$vcfQcMetr)
cutoffs = data.frame( qcMetrics = c('Percent_lt_30bp_frags', 'Contamination_Estimation', 'Percent_Chimeric_Reads'), value = c(0.01, 0.02, 0.05), greater = c(T, T, T), stringsAsFactors = F ) sds = flagSamples(sds, cutoffs = cutoffs, zscore = 4) getAttr(sds, 'flaggedReason', showID = T) # show samples flagged
A S3 class function save
is used to save SampleDataset to different formats.
prefix = 'examples' save(sds, RDS = paste(prefix, 'RDS', sep = '.'), tsv = paste(prefix, 'tsv', sep = '.'), xls = paste(prefix, 'xls', sep = '.'))
plt = sampleQcPlot(sds, annotation = 'inferredAncestry', qcMetrics = 'nHets', geom = 'scatter', main = 'nHet by inferredAncestry', outliers = 'Sample-001', show = T) nhet = sampleQcPlot(sds, annotation = 'inferredAncestry', qcMetrics = 'nHets', geom = 'scatter', main = 'nHet by inferredAncestry', outliers = 'Sample-040', show = T)
plots = PCplots(sds)
grobList = sampleQcPlot(sds, qcMetrics = sds$bamQcMetr, annotation = 'LCSETMax', geom = 'scatter', ncols = 3, outliers = 'Sample-001') ggplot2::ggsave('QcPlot.pdf', grobList, width = 12, height = 6)
A shiny app was designed to interactively explore QC metrics
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.