extract_matrix: Extract a matrix from a CellTypeDataset
In NathanSkene/EWCE: Expression Weighted Celltype Enrichment

extract_matrix

R Documentation

Extract a matrix from a CellTypeDataset

Description

Extracts a particular matrix (e.g., mean_exp, specificity) from a CellTypeDataset object.

Usage

extract_matrix(
  ctd,
  dataset,
  level = 1,
  input_species = NULL,
  output_species = "human",
  metric = "specificity",
  non121_strategy = "drop_both_species",
  method = "homologene",
  numberOfBins = 40,
  remove_unlabeled_clusters = FALSE,
  force_new_quantiles = FALSE,
  as_sparse = TRUE,
  as_DelayedArray = FALSE,
  rename_columns = TRUE,
  make_columns_unique = FALSE,
  verbose = TRUE,
  ...
)

Arguments

`ctd`	Input CellTypeData.
`dataset`	Name of the CellTypeData.
`level`	CTD level to extract from.
`input_species`	Which species the gene names in `exp` come from. See list_species for all available species.
`output_species`	Which species' gene names to convert `exp` to. See list_species for all available species.
`metric`	Name of the matrix to extract.
`non121_strategy`	How to handle genes that don't have 1:1 mappings between `input_species`:`output_species`. Options include:
- `"drop_both_species" or "dbs" or 1` : Drop genes that have duplicate mappings in either the `input_species` or `output_species` (DEFAULT).
- `"drop_input_species" or "dis" or 2` : Only drop genes that have duplicate mappings in the `input_species`.
- `"drop_output_species" or "dos" or 3` : Only drop genes that have duplicate mappings in the `output_species`.
- `"keep_both_species" or "kbs" or 4` : Keep all genes regardless of whether they have duplicate mappings in either species.
- `"keep_popular" or "kp" or 5` : Return only the most "popular" interspecies ortholog mappings. This procedure tends to yield a greater number of returned genes but at the cost of many of them not being true biological 1:1 orthologs.
- `"sum","mean","median","min" or "max"` : When `gene_df` is a matrix and `gene_output="rownames"`, these options will aggregate many-to-one gene mappings (`input_species`-to-`output_species`) after dropping any duplicate genes in the `output_species`.
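The first four strategies above can be illustrated on a toy mapping table in base R. This is a minimal sketch under stated assumptions, not the package's implementation: the gene names and the `input_gene`/`ortholog_gene` table are made up, and "duplicate mapping" is taken to mean that a gene appears more than once on that side of the table.

```r
# Toy ortholog map (hypothetical gene names), one row per input:output pairing.
gene_map <- data.frame(
  input_gene    = c("a1", "a2", "a3", "a3",  "a4",  "a5"),
  ortholog_gene = c("A1", "A2", "A3", "A3B", "A45", "A45"),
  stringsAsFactors = FALSE
)

# Flag genes that occur more than once on each side of the mapping.
dup_input <- gene_map$input_gene %in%
  gene_map$input_gene[duplicated(gene_map$input_gene)]
dup_output <- gene_map$ortholog_gene %in%
  gene_map$ortholog_gene[duplicated(gene_map$ortholog_gene)]

# Assumed reading of each strategy, applied as a row filter.
strategies <- list(
  drop_both_species   = gene_map[!dup_input & !dup_output, ],
  drop_input_species  = gene_map[!dup_input, ],
  drop_output_species = gene_map[!dup_output, ],
  keep_both_species   = gene_map
)

# Number of mappings retained under each strategy.
kept <- sapply(strategies, nrow)
print(kept)
```

Stricter strategies retain fewer mappings, which is the trade-off to weigh when choosing between them.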
`method`	R package to use for gene mapping:
- `"gprofiler"` : Slower but more species and genes.
- `"homologene"` : Faster but fewer species and genes.
- `"babelgene"` : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on several different data sources.
`numberOfBins`	Number of non-zero quantile bins.
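One way to picture "non-zero quantile bins" is the base R sketch below. It assumes that zero values are kept in a separate bin 0 and that only positive values are cut into `numberOfBins` quantile bins; the input vector is simulated, and the package's own binning code may differ in detail.

```r
set.seed(1)
# Hypothetical values for one cell type: 20 zeros plus 80 positive values.
vec <- c(rep(0, 20), runif(80))
numberOfBins <- 40

# Zeros stay in bin 0; positive values go into quantile bins 1..numberOfBins.
bins <- rep(0, length(vec))
pos <- vec > 0
breaks <- unique(stats::quantile(vec[pos],
                                 probs = seq(0, 1, by = 1 / numberOfBins)))
bins[pos] <- as.numeric(cut(vec[pos], breaks = breaks, include.lowest = TRUE))

# Smallest and largest bin index in use.
print(range(bins))
```

The `unique()` call guards against tied quantile values, in which case fewer than `numberOfBins` bins are produced.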
`remove_unlabeled_clusters`	Remove any samples that have numeric column names.
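As a rough illustration of what removing numerically named columns means, the base R sketch below drops every column whose name consists only of digits. The matrix and the regular expression are assumptions made for this example; the package may detect numeric names differently.

```r
# Toy expression matrix: two labelled cell types and two unlabelled clusters.
m <- matrix(1:12, nrow = 3,
            dimnames = list(c("g1", "g2", "g3"),
                            c("astrocytes", "12", "microglia", "7")))

# Treat a column as unlabelled when its name is purely numeric.
numeric_names <- grepl("^[0-9]+$", colnames(m))
m_labeled <- m[, !numeric_names, drop = FALSE]

print(colnames(m_labeled))
```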
`force_new_quantiles`	By default, quantile computation is skipped if they have already been computed. Set `=TRUE` to override this and generate new quantiles.
`as_sparse`	Convert to sparse matrix.
`as_DelayedArray`	Convert to `DelayedArray`.
`rename_columns`	Remove `replace_chars` from column names.
`make_columns_unique`	Rename each column with the prefix `dataset.species.celltype`.
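The naming pattern `dataset.species.celltype` can be sketched in base R as follows. The dataset name, species, and cell type labels here are hypothetical, and the exact string the function produces is an assumption based only on the pattern in the description.

```r
# Hypothetical identifiers for illustration.
dataset <- "ctd_example"
species <- "mouse"
celltypes <- c("astrocytes", "microglia")

# Build column names following the dataset.species.celltype pattern.
unique_names <- paste(dataset, species, celltypes, sep = ".")
print(unique_names)
```

Names of this form stay distinct when matrices from several datasets or species are combined column-wise.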
`verbose`	Print messages. Set `verbose=2` if you want to print all messages from internal functions as well.
`...`	Arguments passed on to `orthogene::convert_orthologs`:
- `gene_df` : Data object containing the genes (see `gene_input` for options on how the genes can be stored within the object). Can be one of the following formats:
  - `matrix` : A sparse or dense matrix.
  - `data.frame` : A `data.frame`, `data.table`, or `tibble`.
  - `list` : A `list` or character `vector`.

  Genes, transcripts, proteins, SNPs, or genomic ranges can be provided in any format (HGNC, Ensembl, RefSeq, UniProt, etc.) and will be automatically converted to gene symbols unless specified otherwise with the `...` arguments. Note: If you set `method="homologene"`, you must either supply genes in gene symbol format (e.g. "Sox2") OR set `standardise_genes=TRUE`.
- `gene_input` : Which aspect of `gene_df` to get gene names from:
  - `"rownames"` : From row names of data.frame/matrix.
  - `"colnames"` : From column names of data.frame/matrix.
  - `<column name>` : From a column in `gene_df`, e.g. `"gene_names"`.
- `gene_output` : How to return genes. Options include:
  - `"rownames"` : As row names of `gene_df`.
  - `"colnames"` : As column names of `gene_df`.
  - `"columns"` : As new columns "input_gene", "ortholog_gene" (and "input_gene_standard" if `standardise_genes=TRUE`) in `gene_df`.
  - `"dict"` : As a dictionary (named list) where the names are input_gene and the values are ortholog_gene.
  - `"dict_rev"` : As a reversed dictionary (named list) where the names are ortholog_gene and the values are input_gene.
- `standardise_genes` : If `TRUE` AND `gene_output="columns"`, a new column "input_gene_standard" will be added to `gene_df` containing standardised HGNC symbols identified by gorth.
- `drop_nonorths` : Drop genes that don't have an ortholog in the `output_species`.
- `agg_fun` : Aggregation function passed to aggregate_mapped_genes. Set to `NULL` to skip aggregation step (default).
- `mthreshold` : Maximum number of ortholog names per gene to show. Passed to gorth. Only used when `method="gprofiler"` (DEFAULT : `Inf`).
- `sort_rows` : Sort `gene_df` rows alphanumerically.
- `gene_map` : A data.frame that maps the current gene names to new gene names. This function's behaviour will adapt to different situations as follows:
  - `gene_map=<data.frame>` : When a data.frame containing the gene key:value columns (specified by `input_col` and `output_col`, respectively) is provided, this will be used to perform aggregation/expansion.
  - `gene_map=NULL` and `input_species!=output_species` : A `gene_map` is automatically generated by map_orthologs to perform inter-species gene aggregation/expansion.
  - `gene_map=NULL` and `input_species==output_species` : A `gene_map` is automatically generated by map_genes to perform within-species gene symbol standardization and aggregation/expansion.
- `input_col` : Column name within `gene_map` with gene names matching the row names of `X`.
- `output_col` : Column name within `gene_map` with gene names that you wish to map the row names of `X` onto.
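To make the idea of aggregating many-to-one gene mappings concrete, the sketch below collapses the rows of a small matrix `X` onto new gene names using a `gene_map` data.frame and base R's `rowsum()`. It only illustrates the concept: the matrix, the gene names, and the column names `input_gene` and `ortholog_gene` (standing in for `input_col` and `output_col`) are hypothetical, and this is not how orthogene implements the step internally.

```r
# Toy expression matrix with four input genes and two cell types.
X <- matrix(c(1, 2, 3, 4,
              5, 6, 7, 8),
            nrow = 4,
            dimnames = list(c("a1", "a2", "a3", "a4"), c("ct1", "ct2")))

# Hypothetical gene_map: a1 and a2 both map onto the same output gene A1.
gene_map <- data.frame(
  input_gene    = c("a1", "a2", "a3", "a4"),
  ortholog_gene = c("A1", "A1", "A3", "A4"),
  stringsAsFactors = FALSE
)

# Look up the output gene for every row of X, then sum rows that share one.
groups <- gene_map$ortholog_gene[match(rownames(X), gene_map$input_gene)]
X_sum <- rowsum(X, group = groups)

# A mean aggregation divides each summed row by its number of input genes.
X_mean <- X_sum / as.vector(table(groups)[rownames(X_sum)])

print(X_sum)
print(X_mean)
```

Summing preserves total expression across merged genes, whereas the mean keeps merged rows on the same scale as unmerged ones.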