gene_frequency_fisher: Compute Fisher's exact test on gene frequencies.

View source: R/analysis-functions.R

gene_frequency_fisher    R Documentation

Compute Fisher's exact test on gene frequencies.

Description

[Experimental] Given two data frames of CIS calculations, each obtained via CIS_grubbs(), computes Fisher's exact test on gene frequencies. Results can be plotted via fisher_scatterplot().
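For readers unfamiliar with the test itself, the following is a minimal sketch of Fisher's exact test for a single gene using base R's stats::fisher.test(). The counts and the layout of the 2x2 table are assumptions made for illustration; this page does not describe how the package builds its contingency table internally.

```r
# Hypothetical counts for one gene in two groups (not real data)
is_gene_x <- 12   # IS targeting the gene in group x
total_x <- 200    # total IS in group x
is_gene_y <- 3    # IS targeting the gene in group y
total_y <- 220    # total IS in group y

# 2x2 contingency table: rows = in gene / not in gene, columns = groups
# (matrix() fills column by column)
tab <- matrix(
    c(
        is_gene_x, total_x - is_gene_x,
        is_gene_y, total_y - is_gene_y
    ),
    nrow = 2,
    dimnames = list(c("in_gene", "not_in_gene"), c("x", "y"))
)

# Two-sided Fisher's exact test on the table
res <- stats::fisher.test(tab)

# Compare the p-value against a significance threshold
significance_threshold <- 0.05
is_significant <- res$p.value < significance_threshold
```

In this sketch, is_significant plays the role that the significance_threshold argument suggests: a flag telling whether the p-value falls below the chosen cutoff.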

Usage

gene_frequency_fisher(
  cis_x,
  cis_y,
  min_is_per_gene = 3,
  gene_set_method = c("intersection", "union"),
  onco_db_file = "proto_oncogenes",
  tumor_suppressors_db_file = "tumor_suppressors",
  species = "human",
  known_onco = known_clinical_oncogenes(),
  suspicious_genes = clinical_relevant_suspicious_genes(),
  significance_threshold = 0.05,
  remove_unbalanced_0 = TRUE
)

Arguments

cis_x

A data frame obtained via CIS_grubbs()

cis_y

A data frame obtained via CIS_grubbs()

min_is_per_gene

Used for pre-filtering purposes. Genes with fewer distinct integration sites than this number are filtered out prior to calculations. Single numeric or integer.

gene_set_method

One of "intersection" or "union". When merging the two data frames, "intersection" performs an inner join, while "union" performs a full join.

onco_db_file

UniProt file for proto-oncogenes (see details). If different from the default, it should be supplied as a path to a file.

tumor_suppressors_db_file

UniProt file for tumor-suppressor genes (see details). If different from the default, it should be supplied as a path to a file.

species

One of "human", "mouse" or "all".

known_onco

Data frame with known oncogenes. See details.

suspicious_genes

Data frame with clinically relevant suspicious genes. See details.

significance_threshold

Significance threshold for the Fisher's test p-value

remove_unbalanced_0

If TRUE, removes from the final output those genes for which one group has no IS and the number of IS in the other (non-missing) group is lower than the mean number of IS for that group.
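The effect of min_is_per_gene, gene_set_method and remove_unbalanced_0 can be sketched with base R on toy data. This is only an illustration under stated assumptions: the column names (GeneName, n_IS), the way the group means are computed (here over all merged genes, with missing counts treated as 0) and the order of the steps are hypothetical and may differ from what the package does internally.

```r
# Toy stand-ins for two CIS tables; column names are assumptions
cis_x <- data.frame(
    GeneName = c("A", "B", "C", "D"),
    n_IS = c(4, 1, 10, 2),
    stringsAsFactors = FALSE
)
cis_y <- data.frame(
    GeneName = c("A", "C", "E", "F"),
    n_IS = c(5, 1, 8, 2),
    stringsAsFactors = FALSE
)

# 1. Pre-filter: drop genes with fewer than min_is_per_gene IS
min_is_per_gene <- 2
keep_x <- cis_x[cis_x$n_IS >= min_is_per_gene, , drop = FALSE]
keep_y <- cis_y[cis_y$n_IS >= min_is_per_gene, , drop = FALSE]

# 2. Merge: "intersection" ~ inner join, "union" ~ full join
inner <- merge(keep_x, keep_y, by = "GeneName", suffixes = c("_x", "_y"))
full <- merge(keep_x, keep_y,
    by = "GeneName", all = TRUE,
    suffixes = c("_x", "_y")
)

# 3. Unbalanced zeros: treat genes missing from one group as 0 IS,
#    then drop genes where one group has 0 IS and the other group's
#    count is below that group's mean (mean definition is an assumption)
full$n_IS_x[is.na(full$n_IS_x)] <- 0
full$n_IS_y[is.na(full$n_IS_y)] <- 0
mean_x <- mean(full$n_IS_x)
mean_y <- mean(full$n_IS_y)
unbalanced <- (full$n_IS_x == 0 & full$n_IS_y < mean_y) |
    (full$n_IS_y == 0 & full$n_IS_x < mean_x)
balanced <- full[!unbalanced, , drop = FALSE]
```

With the full join, genes present in only one table survive the merge, which is why the unbalanced-zero filter matters mostly when gene_set_method = "union".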

Details

Oncogene and tumor suppressor genes files

These files are included in the package for user convenience and are simply UniProt files with gene annotations for human and mouse. For more details on how these files were generated, see the help pages ?tumor_suppressors and ?proto_oncogenes.

Known oncogenes

The default values are included in this package and can be accessed by calling:

known_clinical_oncogenes()

If the user wants to change this parameter, the input data frame must preserve the column structure of the default. The same goes for the suspicious_genes parameter (the DOIReference column is optional):

clinical_relevant_suspicious_genes()
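Before passing a custom data frame, it can help to check that it preserves the expected column structure. The helper below, has_required_columns(), is hypothetical and not part of ISAnalytics. The toy reference table uses placeholder column names (GeneName, Flag), which are assumptions; only DOIReference is taken from the text above.

```r
# Hypothetical helper: TRUE if `custom` has every column of `reference`,
# ignoring the columns listed in `optional`
has_required_columns <- function(custom, reference, optional = character()) {
    required <- setdiff(names(reference), optional)
    all(required %in% names(custom))
}

# Toy reference table standing in for the package default
# (placeholder column names, except DOIReference)
reference <- data.frame(
    GeneName = "G1",
    Flag = TRUE,
    DOIReference = "placeholder",
    stringsAsFactors = FALSE
)

# A custom table that keeps the structure but omits the optional column
custom_ok <- data.frame(
    GeneName = c("G2", "G3"),
    Flag = c(TRUE, FALSE),
    stringsAsFactors = FALSE
)

# A custom table with a renamed column, which breaks the structure
custom_bad <- data.frame(
    Gene = "G2",
    Flag = TRUE,
    stringsAsFactors = FALSE
)

ok <- has_required_columns(custom_ok, reference, optional = "DOIReference")
bad <- has_required_columns(custom_bad, reference, optional = "DOIReference")
```

With the package loaded, the same check could be run against the real defaults, for example by passing clinical_relevant_suspicious_genes() as the reference.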

Value

A data frame

Required tags

The function will explicitly check for the presence of these tags:

  • gene_symbol

See Also

Other Analysis functions: CIS_grubbs(), HSC_population_size_estimate(), compute_abundance(), cumulative_is(), is_sharing(), iss_source(), sample_statistics(), top_integrations(), top_targeted_genes()

Examples

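# Load the example data sets shipped with the package
data("integration_matrices", package = "ISAnalytics")
data("association_file", package = "ISAnalytics")
# Aggregate the quantification columns using the default key
aggreg <- aggregate_values_by_key(
    x = integration_matrices,
    association_file = association_file,
    value_cols = c("seqCount", "fragmentEstimate")
)
# Compute CIS separately for each subject
cis <- CIS_grubbs(aggreg, by = "SubjectID")
# Compare gene frequencies between subjects PT001 and PT002
fisher <- gene_frequency_fisher(cis$cis$PT001, cis$cis$PT002,
    min_is_per_gene = 2
)
fisher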

calabrialab/ISAnalytics documentation built on Dec. 10, 2024, 10:50 p.m.