screen_variant_mi: Mutual Information based feature screening of variants from a...
In c7rishi/hidgenclassifier: Functions for Bayesian hierarchical hidden genome classifier

screen_variant_mi

R Documentation

Mutual Information based feature screening of variants from a mutation annotation file

Description

Mutual Information based feature screening of variants from a mutation annotation file

Usage

screen_variant_mi(
  maf,
  variant_col = "variant",
  cancer_col = "cancer",
  sample_id_col = "sample",
  equal_cancer_prob_mi = TRUE,
  return_prob_mi = TRUE,
  mi_rank_thresh = 250,
  normalize_mi = FALSE,
  do_freq_screen = FALSE,
  thresh_freq_screen = 1/length(unique(maf[[sample_id_col]])),
  ...
)

variant_screen_mi(
  maf,
  variant_col = "variant",
  cancer_col = "cancer",
  sample_id_col = "sample",
  equal_cancer_prob_mi = TRUE,
  return_prob_mi = TRUE,
  mi_rank_thresh = 250,
  normalize_mi = FALSE,
  do_freq_screen = FALSE,
  thresh_freq_screen = 1/length(unique(maf[[sample_id_col]])),
  ...
)

Arguments

`maf`	mutation annotation file: a data frame-like object with at least three columns containing variant labels, sample IDs, and the cancer sites associated with the sample IDs. NOTE: the rows of `maf` are assumed to be unique.
`variant_col`	name of the column in `maf` containing variant labels.
`cancer_col`	name of the column in `maf` that corresponds to cancer sites for the tumor samples.
`sample_id_col`	name of the column in `maf` containing tumor sample IDs.
`equal_cancer_prob_mi`	logical. Should the marginal probabilities of cancer sites be assumed equal (i.e., uniform) while computing mutual information? If `FALSE`, the relative frequencies of cancer sites in `maf` are used. CAUTION: the (sample) relative frequencies of cancer sites in `maf` may not be good approximations of the true site probabilities.
`return_prob_mi`	logical. Should the computed mutual information and the cancer site specific probabilities for these screened variants be returned? Defaults to TRUE.
`mi_rank_thresh`	rank threshold for screening variants. The top variants, i.e., those with rank(MI_values) <= mi_rank_thresh, are returned. Defaults to 250.
`normalize_mi`	logical. Should mutual information be normalized by the product of the square roots of the marginal Shannon entropies? Defaults to FALSE.
`do_freq_screen`	logical. Should an overall (relative) frequency-based screening be performed prior to MI based screening? This may reduce the computation load substantially for whole genome data where potentially tens of millions of variants are observed only once. Defaults to FALSE.
`thresh_freq_screen`	threshold for the overall pan-cancer relative frequency, used if a frequency-based screening is performed before MI-based screening. Defaults to 1/n_sample, where n_sample is the pan-cancer total number of tumors (distinct values of `maf[[sample_id_col]]`). Ignored if `do_freq_screen = FALSE`.
`...`	Unused.
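The frequency pre-screen enabled by `do_freq_screen` can be pictured with the minimal R sketch below. It is not the package's internal code: the helper name `freq_screen_variants` and the toy data are hypothetical, relative frequency is assumed to mean "distinct samples carrying the variant divided by the number of distinct samples", and a strict `>` comparison is assumed. The strict comparison is inferred from the default threshold of 1/n_sample: with `>=`, every observed variant would pass, whereas with `>` variants seen in only one tumor are dropped, matching the stated goal of discarding singletons.

```r
# Hypothetical helper, not part of hidgenclassifier: pan-cancer relative
# frequency screen. Assumption: relative frequency = (number of distinct
# tumor samples carrying the variant) / (number of distinct samples), and a
# variant is kept only if this is strictly greater than `thresh`.
freq_screen_variants <- function(maf, variant_col = "variant",
                                 sample_id_col = "sample",
                                 thresh = 1 / length(unique(maf[[sample_id_col]]))) {
  n_sample <- length(unique(maf[[sample_id_col]]))
  # distinct (variant, sample) pairs, so repeated rows are not double-counted
  pairs <- unique(data.frame(variant = maf[[variant_col]],
                             sample  = maf[[sample_id_col]]))
  counts <- table(pairs$variant)
  rel_freq <- as.numeric(counts) / n_sample
  names(counts)[rel_freq > thresh]
}

# hypothetical toy MAF: 4 tumors, 3 variants
toy_maf <- data.frame(
  sample  = c("s1", "s1", "s2", "s3", "s4"),
  cancer  = c("LUNG", "LUNG", "LUNG", "BREAST", "LUNG"),
  variant = c("KRAS_G12C", "TP53_R273H", "KRAS_G12C", "PIK3CA_H1047R", "TP53_R273H")
)
freq_screen_variants(toy_maf)
# returns c("KRAS_G12C", "TP53_R273H"); PIK3CA_H1047R occurs in one tumor only
```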

Details

The function first estimates, via relative frequencies, the cancer-site-specific probabilities of encountering EACH variant in `maf`. Then, using these estimated probabilities and the marginal probabilities of cancer sites, it computes for each variant j in `maf` the (possibly normalized) mutual information between (a) the occurrence of variant j in a randomly chosen tumor and (b) the cancer site of that tumor. These MIs are then ranked, and the labels of variants with MI rank <= `mi_rank_thresh` are returned.
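Written out in standard notation, the quantities described above take the form below. The symbols are introduced here for illustration and are not taken from the package source; the convention 0 log 0 = 0 applies. Y is the cancer site (K levels) and X_j indicates occurrence of variant j. With `equal_cancer_prob_mi = TRUE`, π_k = 1/K; otherwise π_k is the relative frequency of site k in `maf`. Per the argument description, `normalize_mi = TRUE` corresponds to NMI_j. The log base only rescales MI_j (so it does not change the ranking) and cancels in NMI_j.

```latex
\begin{aligned}
\pi_k &= P(Y = k), \qquad p_{jk} = P(X_j = 1 \mid Y = k), \qquad
  \bar{p}_j = P(X_j = 1) = \sum_{k=1}^{K} \pi_k \, p_{jk}, \\
\mathrm{MI}_j &= \sum_{k=1}^{K} \pi_k \left[ p_{jk} \log \frac{p_{jk}}{\bar{p}_j}
  + (1 - p_{jk}) \log \frac{1 - p_{jk}}{1 - \bar{p}_j} \right], \\
H(X_j) &= -\bar{p}_j \log \bar{p}_j - (1 - \bar{p}_j) \log (1 - \bar{p}_j),
  \qquad H(Y) = -\sum_{k=1}^{K} \pi_k \log \pi_k, \\
\mathrm{NMI}_j &= \frac{\mathrm{MI}_j}{\sqrt{H(X_j)}\,\sqrt{H(Y)}}.
\end{aligned}
```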

Value

A character vector listing the screened variant labels with MI rank <= `mi_rank_thresh`, sorted so that the first label has the highest MI. Optionally, if `return_prob_mi = TRUE`, a data table named `prob_mi`, listing the cancer-site-specific probabilities of ALL variants and the associated MIs, is also returned.
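The screening described in Details, together with the sorted return value, can be illustrated with the minimal R sketch below. It is not the package's implementation. The function names `mi_binary_site` and `screen_mi_sketch` and the toy data are hypothetical, and several choices are assumptions: probabilities are relative frequencies over distinct samples, natural logs are used, there is no normalization, and ties are broken by sort order.

```r
# Hypothetical sketch of MI-based screening; not hidgenclassifier's code.
# Assumptions: p(variant | site k) = share of distinct site-k samples
# carrying the variant; natural logs; 0 * log(0) = 0; ties broken by order.

# MI between a binary occurrence indicator and the cancer site,
# given p_k = P(variant | site k) and pi_k = P(site k).
mi_binary_site <- function(p_k, pi_k) {
  p_bar <- sum(pi_k * p_k)  # marginal P(variant occurs)
  xlog <- function(x, y) ifelse(x > 0, x * log(x / y), 0)
  sum(pi_k * (xlog(p_k, p_bar) + xlog(1 - p_k, 1 - p_bar)))
}

screen_mi_sketch <- function(maf, variant_col = "variant",
                             cancer_col = "cancer", sample_id_col = "sample",
                             equal_cancer_prob = TRUE, rank_thresh = 250) {
  # one row per tumor sample, to count samples per cancer site
  samples <- unique(data.frame(sample = maf[[sample_id_col]],
                               cancer = maf[[cancer_col]]))
  sites <- sort(unique(samples$cancer))
  n_k <- as.numeric(table(factor(samples$cancer, levels = sites)))
  pi_k <- if (equal_cancer_prob) rep(1 / length(sites), length(sites)) else n_k / sum(n_k)

  # counts[v, k]: number of distinct site-k samples carrying variant v
  pairs <- unique(data.frame(variant = maf[[variant_col]],
                             cancer  = maf[[cancer_col]],
                             sample  = maf[[sample_id_col]]))
  counts <- table(pairs$variant, factor(pairs$cancer, levels = sites))
  p_vk <- sweep(unclass(counts), 2, n_k, "/")

  mi <- apply(p_vk, 1, mi_binary_site, pi_k = pi_k)
  # labels sorted by decreasing MI, truncated at the rank threshold
  head(names(sort(mi, decreasing = TRUE)), rank_thresh)
}

# hypothetical toy MAF: 4 tumors (3 lung, 1 breast), 3 variants
toy_maf <- data.frame(
  sample  = c("s1", "s1", "s2", "s3", "s4"),
  cancer  = c("LUNG", "LUNG", "LUNG", "BREAST", "LUNG"),
  variant = c("KRAS_G12C", "TP53_R273H", "KRAS_G12C", "PIK3CA_H1047R", "TP53_R273H")
)
screen_mi_sketch(toy_maf, rank_thresh = 1)
# returns "PIK3CA_H1047R"
```

With equal site probabilities, the breast-only variant determines the site exactly in this toy data. Its MI therefore equals log 2, the largest possible value for a binary indicator and two equally likely sites, so it ranks first.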

Examples

library(hidgenclassifier)
data("impact")
top_v <- screen_variant_mi(
  maf = impact,
  variant_col = "Variant",
  cancer_col = "CANCER_SITE",
  sample_id_col = "patient_id",
  mi_rank_thresh = 200,
  return_prob_mi = FALSE
)
top_v

c7rishi/hidgenclassifier documentation built on June 14, 2024, 11:10 a.m.
