predict_opt: Search for the optimal parameters for SINTER


View source: R/optimize_functions.R

Description

This function searches for the optimal parameters used in SINTER. The goal is to find the parameters that maximize the average of the p-values returned by the neighbor_test function.
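The search strategy itself is not documented on this page. The sketch below assumes a simple exhaustive grid search to illustrate the stated goal (pick the parameter combination with the largest average p-value). The function toy_pvalues is a hypothetical stand-in for neighbor_test and returns made-up p-values, and the candidate values are illustrative only.

```r
# Hypothetical stand-in for the p-values that neighbor_test() would return
# for one candidate parameter combination (illustration only, not SINTER code).
toy_pvalues <- function(num_predictor, cluster_scale) {
  peak <- exp(-((num_predictor - 25)^2 / 50 + (cluster_scale - 20)^2 / 200))
  pmin(1, peak * c(0.9, 1.0, 1.1))
}

# Enumerate every combination of candidate values (the search space)
grid <- expand.grid(num_predictor = c(20, 25, 30),
                    cluster_scale = c(10, 20, 50))

# Objective: the average p-value for each candidate combination
grid$obj <- mapply(function(np, cs) mean(toy_pvalues(np, cs)),
                   grid$num_predictor, grid$cluster_scale)

# Keep the combination with the largest average p-value
best <- grid[which.max(grid$obj), ]
print(best)
```

A grid search is the simplest correct way to maximize over a small discrete space; predict_opt may combine it with a continuous optimizer for sigma.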

Usage

predict_opt(atac_data, expr_data, DNase_train, RNA_train,
  num_predictor = c(25, 25, 30), cluster_scale = c(10, 20, 50),
  k_range = c(20:29), sigma_range = c(0.01, 1), k_in = 20,
  sigma_in = 0.1, dim = 3, dist_scale_in = 10, subsample = FALSE,
  MNN_opt = TRUE, fast = FALSE, MNN_ref = "scATAC", tol_er = 0.001,
  ncore = 10, seed = 12345)

Arguments

atac_data

scATAC-seq data for matching.

expr_data

scRNA-seq data for matching.

DNase_train

ENCODE cluster features from DNase-seq data for building the regression model.

RNA_train

Gene expression from ENCODE RNA-seq data for building the regression model.

num_predictor

Search space for the number of predictors used in the regression model.

cluster_scale

Search space for the scale used to determine the number of gene clusters.

k_range

Search space for k, the number of mutual nearest neighbors in MNN, used when MNN_opt = TRUE.
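In the MNN approach, a pair of cells (one from each data set) is a mutual nearest neighbor pair when each cell is among the other's k nearest neighbors, so k controls how many pairs are found. The sketch below illustrates that definition with Euclidean distances in base R; find_mnn_pairs is a hypothetical helper, not a SINTER function, and the toy coordinates are made up.

```r
# Hypothetical helper: find mutual nearest neighbor (MNN) pairs.
# ref, query: matrices with cells in rows and features (e.g. PCs) in columns.
find_mnn_pairs <- function(ref, query, k) {
  nq <- nrow(query)
  nr <- nrow(ref)
  # Cross-distance matrix: rows = query cells, columns = reference cells
  d <- as.matrix(dist(rbind(query, ref)))[seq_len(nq), nq + seq_len(nr), drop = FALSE]
  # k nearest reference cells of each query cell, and vice versa
  q2r <- lapply(seq_len(nq), function(i) order(d[i, ])[seq_len(min(k, nr))])
  r2q <- lapply(seq_len(nr), function(j) order(d[, j])[seq_len(min(k, nq))])
  # (i, j) is mutual if each cell is among the other's k nearest neighbors
  pairs <- do.call(rbind, lapply(seq_len(nq), function(i) {
    js <- q2r[[i]][vapply(q2r[[i]], function(j) i %in% r2q[[j]], logical(1))]
    if (length(js) == 0) NULL else cbind(query = i, ref = js)
  }))
  if (is.null(pairs)) {
    pairs <- matrix(integer(0), ncol = 2, dimnames = list(NULL, c("query", "ref")))
  }
  pairs
}

# Toy example: two well-separated groups, one cell per data set in each
ref   <- matrix(c(0, 0, 10, 10), ncol = 2, byrow = TRUE)
query <- matrix(c(0.1, 0, 10, 10.2), ncol = 2, byrow = TRUE)
pairs <- find_mnn_pairs(ref, query, k = 1)
print(pairs)
```

Larger k yields more (and noisier) pairs, which is why k is worth tuning over k_range.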

sigma_range

Search space for sigma, the bandwidth of the Gaussian smoothing kernel used to compute the correction vector, used when MNN_opt = TRUE.
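To show what the bandwidth does, the sketch below averages per-pair MNN correction vectors (reference minus query) with Gaussian weights, so each cell is corrected mostly by nearby pairs. The kernel form exp(-d^2 / (2 * sigma^2)) and the helper smooth_corrections are assumptions for illustration; the exact parameterization used inside SINTER's MNN step may differ.

```r
# Hypothetical helper: smooth per-pair MNN correction vectors with a Gaussian
# kernel so that every query cell receives its own correction vector.
smooth_corrections <- function(query, mnn_query, mnn_vectors, sigma) {
  # query:       cells x features matrix to be corrected
  # mnn_query:   query-side coordinates of each MNN pair (pairs x features)
  # mnn_vectors: correction vector (reference minus query) of each pair
  do.call(rbind, lapply(seq_len(nrow(query)), function(i) {
    x <- query[i, ]
    d2 <- colSums((t(mnn_query) - x)^2)      # squared distance to each paired cell
    w <- exp(-d2 / (2 * sigma^2))             # Gaussian kernel weights (assumed form)
    if (sum(w) == 0) w[which.min(d2)] <- 1    # guard against underflow for tiny sigma
    colSums(w * mnn_vectors) / sum(w)         # kernel-weighted average correction
  }))
}

# Toy example: two MNN pairs pulling in opposite directions
mnn_query   <- rbind(c(0, 0), c(10, 0))
mnn_vectors <- rbind(c(1, 0), c(-1, 0))
query       <- rbind(c(0, 0), c(5, 0), c(10, 0))

corr <- smooth_corrections(query, mnn_query, mnn_vectors, sigma = 1)
corrected <- query + corr
print(corr)
```

A small sigma makes each cell follow its nearest pair; a large sigma pushes all cells toward one global average shift.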

k_in

Fixed value of k, the number of mutual nearest neighbors in MNN, used when MNN_opt = FALSE.

sigma_in

Fixed value of sigma, the bandwidth of the Gaussian smoothing kernel used to compute the correction vector, used when MNN_opt = FALSE.

dim

Number of dimensions used for matching the single cells (for example, the number of principal components).
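The page gives principal components as an example of the dimensions used for matching. A minimal sketch of reducing a cell-by-feature matrix to the first dim components with stats::prcomp follows; how SINTER builds the shared low-dimensional space for the two data types is not described here, so the toy matrix and the plain PCA are assumptions.

```r
set.seed(1)
# Toy data: 200 cells x 50 features (e.g. normalized expression values)
x <- matrix(rnorm(200 * 50), nrow = 200)

n_dims <- 3  # plays the role of the dim argument

# Principal component analysis; keep only the first n_dims PC scores
pca <- prcomp(x, center = TRUE, scale. = FALSE)
coords <- pca$x[, seq_len(n_dims), drop = FALSE]
dim(coords)
```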

dist_scale_in

Scale used to define the radius of the region for testing.

subsample

A percentage value indicating that the parameter search should be performed on a subset of cells rather than on all cells. Set subsample = FALSE to use all cells.
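Subsampling with a fixed seed makes the parameter search faster and reproducible. The sketch below assumes subsample is given as a fraction in (0, 1] (the page says "percentage" without stating the scale); subsample_cells is a hypothetical helper, not part of SINTER.

```r
# Hypothetical helper: choose cell indices for the parameter search.
# Assumes subsample is a fraction in (0, 1]; FALSE means "use all cells".
subsample_cells <- function(n_cells, subsample = FALSE, seed = 12345) {
  if (isFALSE(subsample)) return(seq_len(n_cells))
  set.seed(seed)  # same seed -> same subset on every run
  sort(sample.int(n_cells, size = max(1, round(n_cells * subsample))))
}

idx_all <- subsample_cells(1000)
idx_sub <- subsample_cells(1000, subsample = 0.1, seed = 12345)
length(idx_sub)
```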

MNN_opt

A flag that determines whether the parameter search should also be performed for the MNN parameters (k and sigma).

fast

A flag indicating whether to use a fast version of neighbor_test.

MNN_ref

A character string specifying which data type is used as the reference in MNN: either "scATAC" or "scRNA".

tol_er

The desired accuracy used in the optimize function.
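Because sigma_range is a two-element interval and tol_er is described as the accuracy of optimize, a one-dimensional search over sigma with stats::optimize is a plausible reading. The sketch below assumes that; avg_pvalue is a hypothetical toy objective (peaking at sigma = 0.1), not the real average p-value.

```r
# Hypothetical toy objective standing in for the average p-value as a function of sigma
avg_pvalue <- function(sigma) exp(-(log(sigma) - log(0.1))^2)

sigma_range <- c(0.01, 1)
tol_er <- 0.001

# Maximize over the interval; tol sets the desired accuracy of the located sigma
fit <- optimize(avg_pvalue, interval = sigma_range, maximum = TRUE, tol = tol_er)
fit$maximum    # sigma that maximizes the toy objective
fit$objective  # value of the toy objective at that sigma
```

optimize assumes a unimodal objective on the interval; with a multimodal average p-value it may return a local maximum.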

ncore

Number of CPU cores used for parallel processing. Use ncore = 1 to run the function without parallel processing.
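One way ncore could be used is to score independent candidates (for example each k in k_range) in parallel. The sketch below uses parallel::mclapply, which forks processes and is therefore unavailable on Windows for mc.cores > 1, and falls back to sequential evaluation when ncore = 1. score_k is a hypothetical toy objective; SINTER's actual parallel backend is not stated here.

```r
library(parallel)

# Hypothetical toy objective standing in for the average p-value of one candidate k
score_k <- function(k) 1 / (1 + abs(k - 24))

k_range <- 20:29
ncore <- 2

# Parallel on Unix-alikes when ncore > 1; plain sequential evaluation otherwise
scores <- if (ncore > 1 && .Platform$OS.type != "windows") {
  unlist(mclapply(k_range, score_k, mc.cores = ncore))
} else {
  vapply(k_range, score_k, numeric(1))
}
k_range[which.max(scores)]
```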

seed

The random seed used for subsampling when subsample is not FALSE.

Value

num_predictor_opt

The optimal value for the number of predictors.

cluster_scale_opt

The optimal value for cluster scale.

k_opt

The optimal value for k.

sigma_opt

The optimal value for sigma.

max_obj

The average p-value obtained with the optimal parameters.

Examples

## Not run: 
result_opt <- predict_opt(atac_data, expr_data, DNase_train, RNA_train,
  num_predictor = c(25, 25, 30), cluster_scale = c(10, 20, 50),
  k_range = c(20:29), sigma_range = c(0.01, 1), k_in = 20,
  sigma_in = 0.1, dim = 3, dist_scale_in = 10, subsample = FALSE,
  MNN_opt = TRUE, MNN_ref = "scATAC", tol_er = 0.001, ncore = 10,
  seed = 12345)

## End(Not run)

WeiqiangZhou/SINTER documentation built on Sept. 11, 2019, 8:03 a.m.