select_variables: Select variables based on mean, var, and dispersion

View source: R/dispersion.R

Description

This function calculates the mean, variance, dispersion, and z-score of normalized dispersion for each variable (row). It then selects the variables that meet all of the criteria given by filter.disp, filter.zdisp, filter.var, and filter.mean. It is highly recommended to investigate the distribution of these statistics before setting thresholds.
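
To make the per-row statistics concrete, here is a minimal base R sketch. It is not the package's dispersion() implementation: the helper name row_stats_sketch, the variance-to-mean definition of dispersion, and the within-bin z-score normalization are all assumptions made for illustration.

```r
# Hypothetical helper; NOT the package's dispersion() implementation.
row_stats_sketch <- function(dat, n_bins = 10) {
  dat <- as.matrix(dat)
  mu <- rowMeans(dat, na.rm = TRUE)         # mean of each variable (row)
  v <- apply(dat, 1, var, na.rm = TRUE)     # variance of each variable
  disp <- v / mu                            # ASSUMED: variance-to-mean ratio
  # ASSUMED normalization: group rows into bins of similar mean, then
  # z-score the dispersion within each bin.
  bins <- cut(rank(mu, ties.method = "first"), breaks = n_bins, labels = FALSE)
  zdisp <- ave(disp, bins, FUN = function(x) (x - mean(x)) / sd(x))
  data.frame(mean = mu, var = v, disp = disp, zdisp = zdisp)
}

# Toy data: 50 variables (rows) over 6 time points (columns).
set.seed(1)
toy <- matrix(rexp(50 * 6) + 1, nrow = 50)
toy_stats <- row_stats_sketch(toy)
head(toy_stats)
```

Plotting histograms of toy_stats$disp and toy_stats$zdisp is one way to inspect the distributions before choosing filter ranges.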

Usage

select_variables(
  dat,
  meta = NULL,
  filter.zdisp = NULL,
  filter.disp = NULL,
  filter.var = NULL,
  filter.mean = NULL,
  n_bins = 10,
  verbose = FALSE,
  seed = NULL,
  ...
)

select_rows(
  dat,
  meta = NULL,
  filter.zdisp = NULL,
  filter.disp = NULL,
  filter.var = NULL,
  filter.mean = NULL,
  n_bins = 10,
  verbose = FALSE,
  seed = NULL,
  ...
)

Arguments

dat

a time-series data matrix with m biomarkers as rows, over n time points (columns).

meta

a data frame consisting of meta information about the m variables. If given, this data frame will be subset to the selected variables accordingly.

filter.zdisp

a range of z-scores of dispersion to select the variables. Variables with z-scores of dispersion that are outside of this range will be excluded.

filter.disp

a range of dispersion values to select the variables. Variables with dispersions that are outside of this range will be excluded.

filter.var

a range of variances to select the variables. Variables with variances that are outside of this range will be excluded.

filter.mean

a range of means to select the variables. Variables with means that are outside of this range will be excluded.

verbose

a logical specifying whether to print the computational progress. By default, FALSE.

seed

a seed for the random number generator.

...

optional arguments.

data.only

a logical specifying whether to return only the new data matrix with selected variables. By default, TRUE.
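
The way the filter.* ranges combine can be sketched in base R as follows. The helper in_range is hypothetical, and treating the range bounds as inclusive is an assumption; the documentation above states only that values outside the range are excluded and that a variable must meet all criteria.

```r
# Hypothetical helper illustrating range filters; not the package's code.
in_range <- function(x, range) {
  if (is.null(range)) return(rep(TRUE, length(x)))  # NULL means "no filter"
  !is.na(x) & x >= range[1] & x <= range[2]         # ASSUMED: inclusive bounds
}

# Made-up statistics for four variables.
stats_tbl <- data.frame(
  mean  = c(1.0, 5.0, 9.0, 3.0),
  var   = c(0.5, 2.0, 8.0, 0.1),
  zdisp = c(-0.3, 2.5, 1.1, -2.4)
)

# A variable is kept only if it passes every supplied filter.
keep <- in_range(stats_tbl$zdisp, c(-2, 2)) &
  in_range(stats_tbl$mean, c(2, 10)) &
  in_range(stats_tbl$var, NULL)
which(keep)
```

Because the criteria are combined with a logical AND, adding a filter can only shrink the selection, never enlarge it.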

Value

If data.only=TRUE, select_variables returns a new data matrix containing only the selected variables.

If data.only=FALSE, select_variables returns a list consisting of a new data matrix and a data frame of calculated statistics for variables (see dispersion).
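
The two return shapes can be illustrated with a small base R sketch. The function subset_sketch and the list element names dat and stats are hypothetical, and restricting the statistics to the selected rows is an assumption; inspect the actual return value with str() to see the names the package uses.

```r
# Hypothetical sketch of the two return shapes; element names are assumptions.
subset_sketch <- function(dat, stats, keep, data.only = TRUE) {
  new_dat <- dat[keep, , drop = FALSE]      # keep only the selected rows
  if (data.only) return(new_dat)            # matrix only
  # ASSUMED: statistics are restricted to the selected rows as well.
  list(dat = new_dat, stats = stats[keep, , drop = FALSE])
}

toy <- matrix(1:12, nrow = 4, dimnames = list(paste0("v", 1:4), NULL))
toy_stats <- data.frame(mean = rowMeans(toy), row.names = rownames(toy))
keep <- c(TRUE, FALSE, TRUE, FALSE)

m <- subset_sketch(toy, toy_stats, keep)                       # a matrix
res <- subset_sketch(toy, toy_stats, keep, data.only = FALSE)  # a list
str(res)
```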

Author(s)

Neo Christopher Chung nchchung@gmail.com

Examples

## Not run: 
data(cys_optm_missing)
meta <- cys_optm_missing[,1:3]
optm <- log(cys_optm_missing[,4:9])
days <- as.numeric(colnames(optm))

disp_optm <- dispersion(optm)
disp_optm <- cbind(meta, disp_optm)
# make a histogram of dispersion statistics
hist(disp_optm$disp, 100)
# make a histogram of z-score of normalized dispersion
hist(disp_optm$zdisp, 100)

select_variables(optm, filter.zdisp = c(-2,2))
# library(readr)
# write_excel_csv(disp_optm, file="~/coptm_dispersion.csv")

## End(Not run)

UCLA-BD2K/CV.Signature.TCP documentation built on May 15, 2020, 11:27 p.m.