gpatterns.filter_loci: Filter loci with low variance across samples

View source: R/cluster.R

gpatterns.filter_loci    R Documentation

Description

Clusters the loci using k-means with a relatively high k, calculates the sd of each cluster center across samples, and then removes loci that belong to clusters with low variance.
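
The idea can be illustrated with a minimal sketch in base R. This is not the package's implementation: stats::kmeans stands in for the clustering routine, and the simulated matrix, the number of clusters and the threshold value (sd_thresh) are assumptions made up for the example.

```r
# Illustrative sketch only: base R kmeans() stands in for the package's clustering.
set.seed(1)

n_samples <- 6
# 300 loci with nearly constant methylation across samples
flat <- matrix(rnorm(300 * n_samples, mean = 0.8, sd = 0.02), ncol = n_samples)
# 300 loci whose methylation differs strongly between two groups of samples
variable <- cbind(
  matrix(rnorm(300 * 3, mean = 0.2, sd = 0.02), ncol = 3),
  matrix(rnorm(300 * 3, mean = 0.8, sd = 0.02), ncol = 3)
)
mat <- rbind(flat, variable)  # rows are loci, columns are samples

# Cluster the loci, then compute the sd of each cluster center across samples
km <- kmeans(mat, centers = 4, nstart = 10)
center_sd <- apply(km$centers, 1, sd)

# Keep only loci that belong to clusters whose center sd passes the threshold
sd_thresh <- 0.1  # hypothetical value chosen for this toy data
keep <- center_sd[km$cluster] >= sd_thresh
filtered <- mat[keep, , drop = FALSE]
```

In this toy data the nearly constant loci fall into clusters whose centers barely vary across samples, so they are the ones dropped.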

Usage

gpatterns.filter_loci(
  avgs,
  k = NULL,
  center_sd_thresh = NULL,
  min_loci_frac = 0.3,
  seed = NULL,
  tidy = TRUE,
  avg_col = "avg",
  plot_cluster_sd = FALSE,
  ret_clust_sd = FALSE
)

Arguments

avgs

'tidy' output of gpatterns.get_avg_meth. To filter using the regularized (imputed) values (output of gpatterns.impute), set avg_col to 'avg_reg'.

k

number of clusters to divide the loci into. if NULL, k is chosen as the number of loci divided by 150, in order to have ~100 loci per cluster.

center_sd_thresh

minimal sd of the cluster centers. if NULL, center_sd_thresh is set to the sd of the cluster with the largest difference to the next cluster sd that still leaves at least min_loci_frac of the loci.

min_loci_frac

minimal fraction of loci to return in case center_sd_thresh is NULL.

seed

seed to use in TGL_kmeans_tidy

tidy

return tidy output

avg_col

column of average methylation in avgs.

plot_cluster_sd

plot the standard deviation of clusters.

ret_clust_sd

return a list with the filtered loci (under 'avgs'), and the cluster sd plot (under 'clust_sd_p')
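
The automatic choice of center_sd_thresh (used when it is NULL) is described only briefly, so the sketch below shows one possible reading of it, not the package's code. The per-cluster table clust, its values, and the tie-breaking behaviour of which.max are assumptions for illustration: clusters are sorted by center sd, and among the thresholds that still retain at least min_loci_frac of the loci, the one that follows the largest jump in sd is taken.

```r
# One possible reading of the automatic threshold rule; not the package's code.
# Hypothetical per-cluster summary: sd of each center and number of loci in it.
clust <- data.frame(
  center_sd = c(0.02, 0.03, 0.05, 0.20, 0.25, 0.40),
  n_loci    = c(100, 100, 100, 100, 50, 50)
)
min_loci_frac <- 0.3

clust <- clust[order(clust$center_sd), ]

# Fraction of loci retained if the threshold is set at each cluster's sd
# (that cluster and every cluster with a larger sd are kept)
frac_kept <- rev(cumsum(rev(clust$n_loci))) / sum(clust$n_loci)

# Gap between each cluster's sd and the previous (smaller) one
gap <- c(0, diff(clust$center_sd))

# Among thresholds that retain enough loci, take the one after the largest gap
ok <- frac_kept >= min_loci_frac
center_sd_thresh <- clust$center_sd[ok][which.max(gap[ok])]
```

With these made-up numbers the largest admissible jump is between 0.05 and 0.20, so the threshold lands at 0.20 and 40% of the loci are retained.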

Value

see ret_clust_sd for the list form. if tidy is TRUE: the avgs data frame without the filtered loci. if tidy is FALSE: an intervals set with additional fields holding the regularized average methylation for each sample.


tanaylab/gpatterns documentation built on May 15, 2023, 6:23 p.m.