This simple tutorial will guide you through a typical analysis workflow with NMFEM pipeline. At the end of this tutorial the reader should be able to do unsupervised clustering and important gene calling using non-negative matrix factorization, and important gene module calling using the spin-glass based algorithm FEM. Throughout the tutorial we will be using the mouse embryonic lung epithelial cell dataset [@treutlein2014reconstructing].
Let's begin by loading the NMFEM
package:
library(NMFEM)
and loading the dataset attached in the package:
data(fpkm)
the unsupervised clustering and important genes calling is nicely packed in a single function. If
you are running this on a multi-core computer, feel free to set a higher n_threads_
to take
advantage of parallelism. Depending on your hardware this might take while so grab a cup of coffee
while waiting for the results. :)
nmf_results <- nmf_subpopulation(fpkm, n_threads_ = 30)
The nmf_results
contains various dataframes and ggplot objects for different aspect of the analysis.
Let's take a look at a few important ones. We begin by looking at the clustering result:
predict(nmf_results$nmf_result)
Then we can ask "what are the genes that provided the evidence for the separation of these two clusters?"
nmf_results$gene_info
One way to understand the connection between the called clusters and the called important genes is that these two clusters express two distinct expression patterns. That is, some genes are expressed highly only in one cluster while some other genes are expressed highly only in the other cluster.
Let's see what these two expression patterns are:
nmf_results$coef_line_plot
We can also see the D-score distribution of these two patterns. This plot also gives you an idea the relative abundance of genes for each pattern.
nmf_results$d_score_frequency_plot
There are also other interesting plots that allow various insights into the clusters, important genes, and expression patterns. We encourage the reader to explore by looking into its struture (it's just a list):
str(nmf_results, max.level=1)
By the way, all the ggplot objects in the list have dataframe counter-parts for easy access of the underlying
data. For example, coef_line_dat
has the data for coef_line_plot
:
nmf_results$coef_line_dat
Again, everything you need is nicely packed in the function spinglass_procedure
:
module_results <- spinglass_procedure(fpkm, phe, leading_genes, mppi, 'mouse', n_threads_ = 30)
We can view the module graphs by:
module_results$graph_plot
A summary containing important information about these modules can be found using:
module_results$final_tb
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.