top_targeted_genes: Top n targeted genes based on number of IS.

View source: R/analysis-functions.R

top_targeted_genes    R Documentation

Description

[Experimental] Produces a summary of the number of integration events (IS) per gene, sorts the table in decreasing order of that count and keeps the first n rows, either over the whole data frame or within each group.

Usage

top_targeted_genes(
  x,
  n = 20,
  key = c("SubjectID", "CellMarker", "Tissue", "TimePoint"),
  consider_chr = TRUE,
  consider_gene_strand = TRUE,
  as_df = TRUE
)

Arguments

x

An integration matrix. It must be annotated (see "Required tags").

n

Number of top rows to keep (per group, if key is not NULL)

key

Character vector of column names that identify the groups, if the slice has to be performed separately for each group. If NULL, the whole input data frame is considered as a single group.

consider_chr

Logical, should the chromosome be taken into account? See details.

consider_gene_strand

Logical, should the gene strand be taken into account? See details.

as_df

If the computation is performed by group, TRUE returns all groups merged in a single data frame with a column containing the group id, while FALSE returns a named list.
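
The interplay of n, key and as_df can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the toy table, the helper top_n_genes() and the output column n_IS are hypothetical, and the column names gene_symbol and locus are simply borrowed from the tag names listed under "Required tags".

```r
# Hypothetical toy table: one row per integration event
toy <- data.frame(
    SubjectID = c("S1", "S1", "S1", "S2", "S2", "S2"),
    gene_symbol = c("A", "A", "B", "B", "B", "C"),
    locus = c(10, 20, 30, 40, 50, 60),
    stringsAsFactors = FALSE
)

# Hypothetical helper: count distinct loci per gene, sort the counts in
# decreasing order and keep the first n rows
top_n_genes <- function(df, n) {
    counts <- aggregate(
        locus ~ gene_symbol,
        data = df,
        FUN = function(v) length(unique(v))
    )
    names(counts)[names(counts) == "locus"] <- "n_IS"
    counts <- counts[order(counts$n_IS, decreasing = TRUE), , drop = FALSE]
    head(counts, n)
}

# Analogue of as_df = FALSE: a named list with one table per group
per_group <- lapply(split(toy, toy$SubjectID), top_n_genes, n = 1)

# Analogue of as_df = TRUE: all groups merged, with a group id column
merged <- do.call(rbind, Map(
    function(id, d) data.frame(SubjectID = id, d, stringsAsFactors = FALSE),
    names(per_group), per_group
))
rownames(merged) <- NULL
```

Here per_group is a list named after the values of SubjectID, while merged stacks the same rows in one data frame and keeps the group id as an ordinary column.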

Details

Gene grouping

When summarising IS by gene, several grouping options are available. The argument consider_chr accounts for the fact that some genes (same gene symbol) may span more than one chromosome: if set to TRUE, IS counts are kept separate for genes that span 2 or more chromosomes, so that such a gene occupies one row per chromosome in the output table. If set to FALSE, the counts for the gene are reported in a single row.

NOTE: the function counts DISTINCT integration events, which logically corresponds to a union of sets. As a consequence, counts computed per group or with different argument values are not necessarily additive: for example, if counts are computed considering the chromosome and one gene symbol has 2 different counts, the sum of those 2 will likely differ from the count obtained without considering the chromosome.

The same reasoning applies to the argument consider_gene_strand, which takes the strand of the gene into account.
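
The non-additivity described in the note can be reproduced with a small base R sketch. It assumes, purely for illustration, that a distinct event is identified by its locus value alone. This is an assumption and not a statement about how the package defines distinctness. The toy data and the helper n_distinct_base() are hypothetical, and the column names are borrowed from the tag names.

```r
# Hypothetical toy data: one gene symbol annotated on two chromosomes,
# with locus value 100 occurring on both
toy <- data.frame(
    chromosome = c("1", "1", "2", "2"),
    locus = c(100, 200, 100, 300),
    gene_symbol = c("GENE_A", "GENE_A", "GENE_A", "GENE_A"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: number of distinct values in a vector
n_distinct_base <- function(v) length(unique(v))

# Distinct loci per gene and chromosome (analogue of consider_chr = TRUE)
by_chr <- aggregate(locus ~ gene_symbol + chromosome,
    data = toy, FUN = n_distinct_base
)

# Distinct loci per gene only (analogue of consider_chr = FALSE)
by_gene <- aggregate(locus ~ gene_symbol,
    data = toy, FUN = n_distinct_base
)

sum_by_chr <- sum(by_chr$locus) # 2 + 2 = 4
total_by_gene <- by_gene$locus  # union {100, 200, 300} has size 3
```

The two per-chromosome counts add up to 4, while the single-row count is 3, because the value 100 is counted once in the union but once per chromosome in the split table.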

Value

A data frame, or a named list of data frames if the computation is performed by group and as_df = FALSE

Required tags

The function will explicitly check for the presence of these tags:

  • chromosome

  • locus

  • gene_symbol

  • gene_strand

Note that the tags "gene_strand" and "chromosome" are explicitly required only if consider_chr = TRUE and/or consider_gene_strand = TRUE.

See Also

Other Analysis functions: CIS_grubbs(), HSC_population_size_estimate(), compute_abundance(), cumulative_is(), gene_frequency_fisher(), is_sharing(), iss_source(), sample_statistics(), top_integrations()

Examples

data("integration_matrices", package = "ISAnalytics")
top_targ <- top_targeted_genes(
    integration_matrices,
    key = NULL
)
top_targ

calabrialab/ISAnalytics documentation built on Dec. 10, 2024, 10:50 p.m.