smfishHmrf.hmrfem.multi.it.min: Perform HMRF on multivariate normal distributions. Accepts...
In smfishHmrf: Hidden Markov Random Field for Spatial Transcriptomic Data

Description Usage Arguments Data preprocessing Betas Variations References Examples

This function performs HMRF \insertCiteZhu2018smfishHmrf for multi variate normal distributions. It takes minimum required inputs (inputs being file names). There are a couple of files required:

a file containing expression matrix
a file containing cell neighborhood matrix
a file containing node (or cell) color. This is used for updating cells during HMRF iterations.

HMRF needs users to specify the initializations of parameters (mu and sigma). It is recommended to use the kmeans centroids as initializations (specified by kk parameter). Note: kmeans should be run prior to this function.

smfishHmrf.hmrfem.multi.it.min(
  mem_file,
  nei_file,
  block_file,
  kk,
  par_k,
  name = "test",
  output_dir = ".",
  tolerance = 1e-05,
  beta = 0,
  beta_increment = 1,
  beta_num_iter = 10
)

`mem_file`	expression file. The expression file should be a space-separated file. The rows are genes. The columns are cells. There is no header row. The first column is a gene index (ranges from 1 to the number of genes). Note the first column is not gene name. See section Data preprocessing for which form of expression works best.
`nei_file`	file containing cell neighborhood matrix. This should be a space-separated file. The rows are cells. The columns are neighbors. There is no header row. The first column is the cell index (1 to number of cells). Each row lists the indices of neighbor cells. The dimension of the cell neighborhood matrix is (num_cell, max_num_neighbors). If a cell does not have enough neighbors, the remaining entries of that row is padded with -1. The R package Giotto http://spatialgiotto.com \insertCiteDries701680smfishHmrf contains a number of functions for generating the cell neighborhood network.
`block_file`	file containing cell colors (which determines cell update order). The order of updating the state probabilities of each cell can matter the result. Cells (or nodes) and their immediate neighbors are not updated at the same time. This is akin to the vertex coloring problem. This file contains the color of each cell such that no two neighbor cells have the same color. The file is 2-column, space-separated. Column 1 is cell ID, and column 2 is the cell color (integer starting at 1). The python utility get_vertex_color.py https://bitbucket.org/qzhudfci/smfishhmrf-py/src/master/get_vertex_color.py (requires smfishHmrf-py package https://pypi.org/project/smfishHmrf/) can generate this file.
`kk`	kmeans results (object returned by kmeans). Kmeans (one of functions smfishHmrf.generate.centroid.it or smfishHmrf.generate.centroid) should be run before this function.
`par_k`	number of clusters
`name`	name for this run (eg test)
`output_dir`	output directory
`tolerance`	tolerance
`beta, beta_increment, beta_num_iter`	3 values specifying the range of betas to try: the initial beta, the beta increment, and the number of betas. Beta is the smoothness parameter. Example: `beta`=0, `beta_increment`=2, `beta_num_iter`=6 means to try betas: 0, 2, 4, 6, 8, 10. See section Betas for more information.

It assumes that the expression values follow a multivariate gaussian distribution. We generally recommend using log2 transformed counts further normalized by z-scores (in both x- and y- dimensions). Double z-scoring this way helps to remove the inherent bias of zscoring just one dimension (as the results might present a bias towards cell counts).

Beta is the smoothness parameter in HMRF. The higher the beta, the more the HMRF borrows information from the neighbors. This function runs HMRF across a range of betas. To decide which beta range, here are some guideline:

if the number of genes is from 10 to 50, the recommended range is 0 to 10 at beta increment of 0.5.
if the number of genes is below 50, the recommended range is 0 to 15 at beta increment of 1.
if the number of genes is between 50 to 100, the range is 0 to 50 at beta increment of 2.
if the number of genes is between 100 and 500, the range is 0 to 100 at beta increment of 5.

Within the range of betas, we recommend selecting the best beta by the Bayes information criterion. This requires first performing randomization of spatial positions to generate the null distribution of log-likelihood scores for randomly distributed cells for the same range of betas. Then find the beta where the difference between the observed and the null log-likelihood is maximized.

smfishHmrf.hmrfem.multi.it.min (this function): supports multiple betas; supports file names as inputs. Recommended.
smfishHmrf.hmrfem.multi.it: supports multiple betas; supports R data structures as inputs.
smfishHmrf.hmrfem.multi: supports a single beta; supports R data structures as inputs.

\insertRef

Zhu2018smfishHmrf

\insertRef

Eng2019smfishHmrf

\insertRef

Dries701680smfishHmrf

mem_file = system.file("extdata", "ftest.expression.txt", package="smfishHmrf")
nei_file = system.file("extdata", "ftest.adjacency.txt", package="smfishHmrf")
block_file = system.file("extdata", "ftest.blocks.txt", package="smfishHmrf")
par_k = 9
name = "test"
output_dir = tempdir()
    
## Not run: 
kk = smfishHmrf.generate.centroid.it(mem_file, par_k, par_seed=100, 
nstart=100, name=name, output_dir=output_dir)

## End(Not run)

# alternatively, if you already have run kmeans before, you can load it directly
kmeans_results = system.file("extdata", package="smfishHmrf")
kk = smfishHmrf.generate.centroid.use.exist(name=name, input_dir=kmeans_results, par_k)

smfishHmrf.hmrfem.multi.it.min(mem_file, nei_file, block_file, kk, par_k, 
name=name, output_dir=output_dir, tolerance=1e-5, 
beta=28, beta_increment=2, beta_num_iter=1)
    
## Not run: 
# alternatively, to test a larger set of beta's
smfishHmrf.hmrfem.multi.it.min(mem_file, nei_file, block_file, kk, par_k,
name=name, output_dir=output_dir, tolerance=1e-5, 
beta=0, beta_increment=2, beta_num_iter=20)

## End(Not run)