smfishHmrf.hmrfem.multi.it.min: Perform HMRF on multivariate normal distributions. Accepts...

View source: R/smfishHmrf.hmrfem.R

Description

This function performs HMRF [Zhu2018smfishHmrf] for multivariate normal distributions. It takes the minimum required inputs, with the data supplied as file names. Three files are required:

  1. a file containing expression matrix

  2. a file containing cell neighborhood matrix

  3. a file containing node (or cell) color. This is used for updating cells during HMRF iterations.

HMRF needs users to specify the initializations of parameters (mu and sigma). It is recommended to use the kmeans centroids as initializations (specified by kk parameter). Note: kmeans should be run prior to this function.
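To make the idea of k-means-based initialization concrete, here is a minimal sketch using base R's kmeans on made-up data. It is not the package's own procedure: the toy matrix, the layout of mu and sigma, and the use of per-cluster covariance for sigma are all assumptions for illustration. In practice, use smfishHmrf.generate.centroid.it or smfishHmrf.generate.centroid to produce kk.

```r
set.seed(100)
# Hypothetical data: 60 cells (rows) x 2 genes (columns), three shifted groups.
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2),
           matrix(rnorm(40, mean = 10), ncol = 2))
par_k <- 3
km <- kmeans(x, centers = par_k, nstart = 100)

# mu: one centroid per cluster, stored here as genes x clusters (assumed layout).
mu <- t(km$centers)

# sigma: one covariance matrix per cluster, genes x genes x clusters (assumed layout).
sigma <- array(0, dim = c(ncol(x), ncol(x), par_k))
for (k in seq_len(par_k)) {
  sigma[, , k] <- cov(x[km$cluster == k, , drop = FALSE])
}
```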

Usage

smfishHmrf.hmrfem.multi.it.min(
  mem_file,
  nei_file,
  block_file,
  kk,
  par_k,
  name = "test",
  output_dir = ".",
  tolerance = 1e-05,
  beta = 0,
  beta_increment = 1,
  beta_num_iter = 10
)

Arguments

mem_file

expression file. The expression file should be a space-separated file. The rows are genes. The columns are cells. There is no header row. The first column is a gene index (ranges from 1 to the number of genes). Note the first column is not gene name. See section Data preprocessing for which form of expression works best.
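As a rough illustration of the layout described above, the following sketch writes a small made-up matrix in that format (genes in rows, cells in columns, gene index in the first column, no header). The matrix values and the file name toy.expression.txt are hypothetical.

```r
# Toy expression matrix: 3 genes (rows) x 4 cells (columns).
expr <- matrix(c( 0.5, -1.2,  0.3,  1.1,
                 -0.7,  0.9,  0.0, -0.4,
                  1.5,  0.2, -0.8,  0.6),
               nrow = 3, byrow = TRUE)

# Prepend the gene index (1..number of genes) as the first column.
out <- cbind(seq_len(nrow(expr)), expr)

# Space-separated, no header row, no row names.
expr_path <- file.path(tempdir(), "toy.expression.txt")
write.table(out, file = expr_path, sep = " ",
            row.names = FALSE, col.names = FALSE, quote = FALSE)

# Read it back to confirm the layout: 3 rows, 1 index column + 4 cell columns.
check <- as.matrix(read.table(expr_path, header = FALSE, sep = " "))
dim(check)
```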

nei_file

file containing the cell neighborhood matrix. This should be a space-separated file. The rows are cells. The columns are neighbors. There is no header row. The first column is the cell index (1 to number of cells). Each row lists the indices of that cell's neighbors. The dimension of the cell neighborhood matrix is (num_cell, max_num_neighbors). If a cell has fewer than max_num_neighbors neighbors, the remaining entries of that row are padded with -1. The R package Giotto http://spatialgiotto.com [Dries701680smfishHmrf] contains a number of functions for generating the cell neighborhood network.
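The padding rule can be sketched as follows, starting from a hypothetical undirected edge list over five cells. The edge list and the file name toy.adjacency.txt are made up; a real neighborhood network would come from a tool such as Giotto.

```r
# Hypothetical undirected edge list over 5 cells (each row is one neighbor pair).
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4), c(4, 5))
num_cell <- 5

# Collect the neighbors of every cell (both directions of each edge).
nbrs <- lapply(seq_len(num_cell), function(i) {
  sort(unique(c(edges[edges[, 1] == i, 2], edges[edges[, 2] == i, 1])))
})

# Pad each row with -1 up to the maximum neighbor count.
max_nb <- max(lengths(nbrs))
nei <- do.call(rbind, lapply(nbrs, function(v) c(v, rep(-1, max_nb - length(v)))))

# First column is the cell index, followed by the padded neighbor indices.
nei_out <- cbind(seq_len(num_cell), nei)
nei_path <- file.path(tempdir(), "toy.adjacency.txt")
write.table(nei_out, file = nei_path, sep = " ",
            row.names = FALSE, col.names = FALSE, quote = FALSE)
```

Here cell 3 has the most neighbors (1, 2 and 4), so every other row is padded to three entries; cell 5, with a single neighbor, becomes 4 -1 -1.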

block_file

file containing cell colors (which determine the cell update order). The order in which the state probabilities of each cell are updated can affect the result. Cells (or nodes) and their immediate neighbors are not updated at the same time. This is akin to the vertex coloring problem. This file contains the color of each cell such that no two neighboring cells have the same color. The file has 2 space-separated columns. Column 1 is the cell ID, and column 2 is the cell color (an integer starting at 1). The python utility get_vertex_color.py https://bitbucket.org/qzhudfci/smfishhmrf-py/src/master/get_vertex_color.py (requires the smfishHmrf-py package https://pypi.org/project/smfishHmrf/) can generate this file.
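For intuition only, a simple greedy coloring can produce a valid assignment on a small graph. This sketch is not the algorithm used by get_vertex_color.py; the neighbor lists, the file name toy.blocks.txt, and the absence of a header row (assumed by analogy with the other input files) are all assumptions.

```r
# Neighbor lists for 5 hypothetical cells (cell i's neighbors are nbrs[[i]]).
nbrs <- list(c(2, 3), c(1, 3), c(1, 2, 4), c(3, 5), c(4))

# Greedy coloring: give each cell the smallest color (starting at 1)
# not already used by one of its neighbors.
color <- integer(length(nbrs))          # 0 means "not yet colored"
for (i in seq_along(nbrs)) {
  used <- color[nbrs[[i]]]
  color[i] <- min(setdiff(seq_len(length(nbrs)), used))
}

# Two columns: cell ID and cell color, space-separated.
block_path <- file.path(tempdir(), "toy.blocks.txt")
write.table(cbind(seq_along(color), color), file = block_path, sep = " ",
            row.names = FALSE, col.names = FALSE, quote = FALSE)

# Check that no two neighboring cells share a color.
ok <- all(sapply(seq_along(nbrs), function(i) all(color[nbrs[[i]]] != color[i])))
```

On this toy graph the colors come out as 1, 2, 3, 1, 2, and ok is TRUE.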

kk

kmeans results (object returned by kmeans). Kmeans (using either smfishHmrf.generate.centroid.it or smfishHmrf.generate.centroid) should be run before this function.

par_k

number of clusters

name

name for this run (e.g. "test")

output_dir

output directory

tolerance

tolerance

beta, beta_increment, beta_num_iter

3 values specifying the range of betas to try: the initial beta, the beta increment, and the number of betas. Beta is the smoothness parameter. Example: beta=0, beta_increment=2, beta_num_iter=6 means to try betas: 0, 2, 4, 6, 8, 10. See section Betas for more information.
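The documented example can be reproduced with a short calculation. How the function enumerates betas internally is not shown here; this sketch only mirrors the arithmetic implied by the example above.

```r
beta <- 0
beta_increment <- 2
beta_num_iter <- 6

# Start at beta and take beta_num_iter values spaced beta_increment apart.
betas <- beta + beta_increment * (seq_len(beta_num_iter) - 1)
betas
# betas: 0 2 4 6 8 10
```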

Data preprocessing

The function assumes that the expression values follow a multivariate Gaussian distribution. We generally recommend using log2-transformed counts further normalized by z-scores in both dimensions of the expression matrix (x and y, i.e. across genes and across cells). Double z-scoring in this way helps to remove the bias inherent in z-scoring just one dimension (where the results might be biased towards cell counts).
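A minimal sketch of double z-scoring in base R follows. The counts are simulated, the pseudocount of 1 is an assumption to avoid log2(0), and the order of the two passes (genes first, then cells) is also an assumption, since the text does not specify it.

```r
set.seed(1)
# Hypothetical raw counts: 6 genes (rows) x 8 cells (columns).
counts <- matrix(rpois(48, lambda = 20), nrow = 6)

# log2 transform; the pseudocount of 1 is an assumption.
logc <- log2(counts + 1)

# z-score each gene across cells (rows), then each cell across genes (columns).
# scale() works on columns, so rows are handled by transposing.
z_gene <- t(scale(t(logc)))
z_both <- scale(z_gene)

# After the second pass every cell (column) has mean 0 and sd 1.
col_means <- colMeans(z_both)
col_sds <- apply(z_both, 2, sd)
```

Note that the second pass slightly disturbs the per-gene standardization from the first pass, so only the dimension scaled last is exactly standardized.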

Betas

Beta is the smoothness parameter in HMRF. The higher the beta, the more the HMRF borrows information from the neighbors. This function runs HMRF across a range of betas. Some guidelines for choosing a beta:

Within the range of betas, we recommend selecting the best beta by the Bayesian information criterion. This requires first randomizing the spatial positions of the cells to generate the null distribution of log-likelihood scores for the same range of betas. Then find the beta at which the difference between the observed and the null log-likelihood is maximized.
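The final selection step can be sketched as below. The helper select_beta is hypothetical and not part of smfishHmrf, and the log-likelihood values are placeholders for illustration only, not real results. How to obtain the observed and null log-likelihoods from the function's output is not shown here.

```r
# Hypothetical helper: pick the beta with the largest observed-minus-null gap.
# `observed` and `null` are log-likelihoods, one per beta, in the same order.
select_beta <- function(betas, observed, null) {
  stopifnot(length(betas) == length(observed),
            length(betas) == length(null))
  betas[which.max(observed - null)]
}

# Placeholder numbers for illustration only (not real results).
betas    <- c(0, 2, 4, 6, 8, 10)
observed <- c(-1000, -950, -900, -880, -878, -874)
null     <- c(-1000, -980, -960, -950, -945, -943)

best_beta <- select_beta(betas, observed, null)
best_beta
# returns 6
```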

Variations

References

Zhu2018smfishHmrf

Eng2019smfishHmrf

Dries701680smfishHmrf

Examples

mem_file = system.file("extdata", "ftest.expression.txt", package="smfishHmrf")
nei_file = system.file("extdata", "ftest.adjacency.txt", package="smfishHmrf")
block_file = system.file("extdata", "ftest.blocks.txt", package="smfishHmrf")
par_k = 9
name = "test"
output_dir = tempdir()
    
## Not run: 
# run kmeans to generate the initial centroids
kk = smfishHmrf.generate.centroid.it(mem_file, par_k, par_seed=100,
    nstart=100, name=name, output_dir=output_dir)

## End(Not run)

# alternatively, if you have already run kmeans, load the saved results directly
kmeans_results = system.file("extdata", package="smfishHmrf")
kk = smfishHmrf.generate.centroid.use.exist(name=name, input_dir=kmeans_results, par_k)

# run HMRF for a single beta (beta=28)
smfishHmrf.hmrfem.multi.it.min(mem_file, nei_file, block_file, kk, par_k,
    name=name, output_dir=output_dir, tolerance=1e-5,
    beta=28, beta_increment=2, beta_num_iter=1)

## Not run: 
# alternatively, to test a larger set of betas (0, 2, ..., 38)
smfishHmrf.hmrfem.multi.it.min(mem_file, nei_file, block_file, kk, par_k,
    name=name, output_dir=output_dir, tolerance=1e-5,
    beta=0, beta_increment=2, beta_num_iter=20)

## End(Not run)
 

smfishHmrf documentation built on Jan. 13, 2021, 12:54 p.m.