knitr::opts_chunk$set(collapse = TRUE, comment = "#>", warning = FALSE, message = FALSE, fig.align = 'center', results = 'asis', fig.show = 'hold', fig.width = 7, fig.height = 5)
breastCancerNKI
package. We'll covertranscripTools
.library(transcripTools, quietly = TRUE) library(Biobase, quietly = TRUE) data(nki)
assayData
, where the expression matrix is found, phenoData
where the phenotypic data is found, and featureData
where the feature data is found. We can see we have 337 samples and 24481 features to investigate. For the sake of this vignette we'll separate the nki expressionSet into individual pieces, though this is kind of against the fundamental principles of why an expressionSet is useful (which you can read up on here).xpr <- exprs(nki) pheno <- pData(nki) feats <- fData(nki)
n
most variable reduce the size/complexity of our subsequent investigation. To achieve this we use the mostVar()
function, which will take our expression matrix (xpr
) and the number of features we want to extract, and return us a subsetted matrix containing the n
features we specified.xpr_500 <- mostVar(xpr, 500) dim(xpr_500)
mds_arg <- mdsArrange(xpr_500, isAlreadyScaled = TRUE) #the nki data is scaled #around 0, not all data is like this mds_mrg <- merge(mds_arg, pheno, by.x = "ids", by.y = "samplename") head(mds_mrg)
mds_arg <- mdsArrange(xpr_500, isAlreadyScaled = TRUE) #the nki data is scaled #around 0, not all data is like this mds_mrg <- merge(mds_arg, pheno, by.x = "ids", by.y = "samplename") knitr::kable(head(mds_mrg))
ggplot
and specify our colours etc. using mds_mrg
's column headings. Unsurprisingly, when we colour by er
(oestrogen receptor status) we see that the 500 most variable genes capture and separate our samples by this factor.library(ggplot2, quietly = TRUE) ggplot(mds_mrg, aes(x = x, y = y, colour = as.factor(er))) + geom_point() + labs(x = "PC1", y = "PC2", colour = "ER-status", title = "NKI MDS")
mdsArrange
, though heatmapArrange
has a couple of extra arguments that allow for hierarchical clustering and scaling. Again, here's a little info on heatmaps and hierarchical clusteringhm_arg <- heatmapArrange(xpr_500, cluster_row = TRUE, cluster_col = TRUE, scale = TRUE, by_row = TRUE)
heatmapArrange
handily reminds us of the colour function to hand to ggplot to get a green-red (bad! not good for colour blindness!) heatmap.hm_mrg <- merge(hm_arg, pheno, by.x = "col_var", by.y = "samplename") head(hm_mrg)
hm_mrg <- merge(hm_arg, pheno, by.x = "col_var", by.y = "samplename") knitr::kable(head(hm_mrg))
p_hmap <- ggplot(hm_mrg, aes(x = col_var, y = row_var, fill = value)) + geom_tile() + scale_fill_gradient2(high = "#d73027", mid = "black", low = "#1a9850") + labs(title = "NKI heatmap") + theme_void()
plot_grid()
.p_bar <- ggplot(hm_mrg, aes(x = col_var, y = "ER", fill = as.factor(er))) + geom_tile() + theme_void() # cowplot annoying overrides the default theme so we combine # calling it with resetting the default theme library(cowplot); theme_set(theme_grey()) plot_grid(p_hmap, p_bar, ncol = 1, rel_heights = c(15, 1), align = 'v')
idReplace()
that will convert and replace the ids in the expression matrix, or idConvert()
that will simply convert a vector of ids. Here we'll use idConvert()
, the the arguments and usage are highly similar between the two functions.mv_10 <- row.names(xpr_500)[1:10] idConvert(mv_10, format_in = "refseq_mrna", format_out = "hgnc_symbol")
mv_10 <- row.names(xpr_500)[1:10] knitr::kable(idConvert(mv_10, format_in = "refseq_mrna", format_out = "hgnc_symbol"))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.