nmf_subpopulation: Detect subpopulations using NMF


Description

Detect subpopulations using NMF

Usage

nmf_subpopulation(expr_matrix_, n_subpop_ = 2, log_transformation_ = T,
  verbose_level_ = 1, n_threads_ = 1, nrun_ = 30, method_ = "brunet",
  top_genes_facet_title_font_size_ = 24, .options_ = sprintf("p%dv%d",
  n_threads_, verbose_level_ - 1), seed_ = 12345)

Arguments

expr_matrix_

The expression matrix; each row is a gene and each column is a sample. The row names should be gene symbols and the column names should be sample ids. Use the correct capitalization for gene symbols: for mouse genes, only the first letter is capitalized (e.g. Tp53); for human genes, all letters are capitalized (e.g. TP53). The values are log-transformed by default; this can be changed using the log_transformation_ option.

n_subpop_

The number of expected subpopulations. Default: 2.

log_transformation_

Whether to log-transform the expression matrix. A pseudocount of 1 is added to each value before the log transformation to avoid infinite values. That is, expr_matrix_ <- log(expr_matrix_+1). Default: TRUE.
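As a minimal sketch of the transformation described for log_transformation_, the snippet below applies it to a small made-up matrix. The gene names, sample ids, and values are illustrative assumptions, not data from the package.

```r
# Toy expression matrix (made-up values): rows are genes, columns are samples.
expr_matrix_ <- matrix(c(0, 9, 99, 4, 0, 1), nrow = 3,
                       dimnames = list(c("TP53", "MYC", "GAPDH"), c("s1", "s2")))

# Add a pseudocount of 1, then take the natural logarithm.
log_expr <- log(expr_matrix_ + 1)

# A zero count maps to 0 rather than -Inf.
log_expr["TP53", "s1"]

# base::log1p computes the same quantity.
all.equal(log_expr, log1p(expr_matrix_))
```

If the matrix passed in is already on a log scale, setting log_transformation_ = FALSE would presumably avoid transforming it twice.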

verbose_level_

How much information is printed. 0 = quiet, 1 = normal, 2 = with debug info, 3 = with extra debug info. Default: 1.

n_threads_, nrun_, method_, .options_, seed_

Parameters for nmf(...). Default values are supplied.
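The default for .options_ in the Usage section is built with sprintf from n_threads_ and verbose_level_. The sketch below only reproduces that string construction. The reading that NMF::nmf treats the p part as a parallel-core count and the v part as a verbosity level is an assumption here, so check the NMF::nmf documentation before relying on it.

```r
# Defaults taken from the Usage section.
n_threads_ <- 1
verbose_level_ <- 1

# Same expression as the default value of .options_.
.options_ <- sprintf("p%dv%d", n_threads_, verbose_level_ - 1)
.options_  # "p1v0"

# With 4 threads and verbose_level_ = 2 the string becomes "p4v1".
four_threads <- sprintf("p%dv%d", 4, 2 - 1)
four_threads
```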

method_

Iterative method for NMF. See the documentation of NMF::nmf.

seed_

Random seed.

Value

A list with the following elements:

nmf_result

The NMFfitX1 object returned by the nmf call.

gene_info

A data_frame detailing the bases, weighted bases, and D-score for each gene. The data frame is sorted by D-score.

d_score_frequency_plot

Kernel density plot of the D-scores, separated by the coef components they represent.

ordered_sample_ids

The pseudo-order of samples calculated by sorting the differences of coef values for each sample.
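One way to picture this pseudo-order for the two-subpopulation case is sketched below with a hypothetical coefficient matrix. The matrix H, its values, and the choice of component 1 minus component 2 in ascending order are assumptions for illustration; the package's actual computation may differ.

```r
# Hypothetical 2 x 5 coefficient matrix: rows are NMF components, columns are samples.
H <- matrix(c(0.9, 0.1,
              0.2, 0.8,
              0.6, 0.4,
              0.1, 0.9,
              0.7, 0.3),
            nrow = 2,
            dimnames = list(c("comp1", "comp2"), paste0("s", 1:5)))

# Difference of the two coef values for each sample.
coef_diff <- H["comp1", ] - H["comp2", ]

# Sorting the differences gives a one-dimensional ordering of the samples.
ordered_sample_ids <- names(sort(coef_diff))
ordered_sample_ids  # "s4" "s2" "s3" "s5" "s1"
```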

coef_line_dat

Plotting data for coef_line_plot.

coef_line_plot

A line plot showing the trend formed by the coef values, using the order in ordered_sample_ids.

path_dat

Plotting data for pca_path_plot and mds_path_plot.

pca_path_plot

A PCA plot of the expression matrix in which the samples are connected in the order given by ordered_sample_ids. The thickness of each line indicates the size of the jump in the differences of the coef values.

mds_path_plot

Same as pca_path_plot, but for an MDS plot. The distance is "1 - Pearson Correlation".
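The "1 - Pearson Correlation" distance can be illustrated with base R as follows. This is a sketch on random data; the use of stats::cmdscale (classical MDS) is an assumption, since the page does not say which MDS routine the package calls.

```r
set.seed(1)

# Random toy matrix: 10 genes (rows) by 6 samples (columns).
expr <- matrix(rexp(60), nrow = 10,
               dimnames = list(paste0("g", 1:10), paste0("s", 1:6)))

# cor() correlates columns, so this is a sample-by-sample distance.
d <- as.dist(1 - cor(expr, method = "pearson"))

# Classical MDS into two dimensions; one row of coordinates per sample.
mds_coords <- cmdscale(d, k = 2)
dim(mds_coords)
```

Under this distance, perfectly correlated samples are at distance 0 and perfectly anti-correlated samples are at distance 2.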

top_genes_dat

Plotting data for top_genes_free_y_plot and top_genes_fixed.

top_genes_free_y_plot

Faceted bar-plot showing the expression levels of the top 100 genes, across all samples. Samples are sorted using the order in ordered_sample_ids.

top_genes_fixed

Same as top_genes_free_y_plot, except that the y-axes are fixed across all facets.

Examples


lanagarmire/NMFEM documentation built on May 20, 2019, 7:34 p.m.