predict_opt: Search for the optimal parameters for SINTER
In WeiqiangZhou/SINTER: Single-cell genomic data integration


This function searches for the optimal parameters used in SINTER. The goal is to find the parameters that maximize the average of the p-values returned by the neighbor_test function.

predict_opt(atac_data, expr_data, DNase_train, RNA_train,
  num_predictor = c(25, 25, 30), cluster_scale = c(10, 20, 50),
  k_range = c(20:29), sigma_range = c(0.01, 1), k_in = 20,
  sigma_in = 0.1, dim = 3, dist_scale_in = 10, subsample = FALSE,
  MNN_opt = TRUE, fast = FALSE, MNN_ref = "scATAC", tol_er = 0.001,
  ncore = 10, seed = 12345)

`atac_data`	scATAC-seq data for matching.
`expr_data`	scRNA-seq data for matching.
`DNase_train`	ENCODE cluster features from DNase-seq data for building the regression model.
`RNA_train`	Gene expression from ENCODE RNA-seq data for building the regression model.
`num_predictor`	Search space for the number of predictors used in the regression model.
`cluster_scale`	Search space for the scale that determines the number of gene clusters.
`k_range`	Search space for k, the number of mutual nearest neighbors in MNN. Used if MNN_opt==TRUE.
`sigma_range`	Search space for sigma, the bandwidth of the Gaussian smoothing kernel used to compute the correction vector. Used if MNN_opt==TRUE.
`k_in`	Fixed value of k, the number of mutual nearest neighbors in MNN. Used if MNN_opt!=TRUE.
`sigma_in`	Fixed value of sigma, the bandwidth of the Gaussian smoothing kernel used to compute the correction vector. Used if MNN_opt!=TRUE.
`dim`	Number of dimensions used for matching the single cells, for example the number of principal components.
`dist_scale_in`	Scale used to define the radius of the region for testing.
`subsample`	A percentage value specifying the subset of cells on which the parameter search is run instead of all cells. Set subsample=FALSE to use all cells.
`MNN_opt`	A flag to determine whether the parameter search should be performed for MNN.
`fast`	A flag indicating whether to use the fast version of neighbor_test.
`MNN_ref`	A flag to determine which data type is used as reference in MNN. Select from "scATAC" and "scRNA".
`tol_er`	The desired accuracy used in the optimize function.
`ncore`	Number of CPU cores used for parallel processing. Use ncore = 1 to run the function without parallel processing.
`seed`	The seed used for subsampling if subsample!=FALSE.
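
The interplay of `k_range`, `sigma_range` and `tol_er` can be illustrated with a minimal sketch. It assumes, without confirmation from the package source, that the search loops over the candidate values of k and, for each one, maximizes the objective over sigma with base R's `optimize`, using `sigma_range` as the interval and `tol_er` as the tolerance. The function `toy_objective` is hypothetical and only stands in for the average neighbor_test p-value, which in practice depends on the data.

```r
# Hypothetical stand-in for the average neighbor_test p-value.
# The real objective is computed from the data by SINTER; this toy
# surface peaks at k = 24 and sigma = 0.3 purely for illustration.
toy_objective <- function(k, sigma) {
  -((sigma - 0.3)^2) - ((k - 24) / 100)^2
}

k_range <- 20:29            # candidate values of k
sigma_range <- c(0.01, 1)   # interval searched for sigma
tol_er <- 0.001             # accuracy requested from optimize()

# For each candidate k, maximize the objective over sigma.
search <- lapply(k_range, function(k) {
  fit <- optimize(function(s) toy_objective(k, s),
                  interval = sigma_range, maximum = TRUE, tol = tol_er)
  data.frame(k = k, sigma = fit$maximum, objective = fit$objective)
})
search <- do.call(rbind, search)

# Keep the (k, sigma) pair with the largest objective value.
best <- search[which.max(search$objective), ]
print(best)
```

A smaller `tol_er` makes each one-dimensional search over sigma more precise at the cost of more objective evaluations, which matters when every evaluation is a full matching run.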

`num_predictor_opt`	The optimal value for number of predictors.
`cluster_scale_opt`	The optimal value for cluster scale.
`k_opt`	The optimal value for k.
`sigma_opt`	The optimal value for sigma.
`max_obj`	The average p-value based on the optimal parameters.
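
As a rough sketch of how the returned values might be read, the snippet below assumes the result is a named list with the five elements listed above. The list `result_opt` is built by hand with placeholder numbers so that the snippet runs on its own; the numbers are not output from SINTER, and `summarize_opt` is a hypothetical helper.

```r
# Placeholder standing in for the return value of predict_opt().
# All numbers are invented for illustration only.
result_opt <- list(
  num_predictor_opt = 25,
  cluster_scale_opt = 20,
  k_opt = 24,
  sigma_opt = 0.3,
  max_obj = 0.5
)

# Hypothetical helper: format the selected parameters as one line.
summarize_opt <- function(res) {
  sprintf("num_predictor=%s, cluster_scale=%s, k=%s, sigma=%s (average p-value %s)",
          res$num_predictor_opt, res$cluster_scale_opt,
          res$k_opt, res$sigma_opt, res$max_obj)
}

cat(summarize_opt(result_opt), "\n")
```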

## Not run: 
result_opt <- predict_opt(atac_data, expr_data, DNase_train, RNA_train,
  num_predictor = c(25, 25, 30), cluster_scale = c(10, 20, 50),
  k_range = c(20:29), sigma_range = c(0.01, 1), k_in = 20,
  sigma_in = 0.1, dim = 3, dist_scale_in = 10, subsample = FALSE,
  MNN_opt = TRUE, MNN_ref = "scATAC", tol_er = 0.001,
  ncore = 10, seed = 12345)

## End(Not run)