standardise_ctd: Convert a CellTypeDataset into standardized format
In NathanSkene/EWCE: Expression Weighted Celltype Enrichment

standardise_ctd

R Documentation

Convert a CellTypeDataset into standardized format

Description

This function will take a CTD, drop all genes without 1:1 orthologs with the output_species ("human" by default), convert the remaining genes to gene symbols, assign names to each level, and convert all matrices to sparse matrices and/or DelayedArray.

Usage

standardise_ctd(
  ctd,
  dataset,
  input_species = NULL,
  output_species = "human",
  sctSpecies_origin = input_species,
  non121_strategy = "drop_both_species",
  method = "homologene",
  force_new_quantiles = TRUE,
  force_standardise = FALSE,
  remove_unlabeled_clusters = FALSE,
  numberOfBins = 40,
  keep_annot = TRUE,
  keep_plots = TRUE,
  as_sparse = TRUE,
  as_DelayedArray = FALSE,
  rename_columns = TRUE,
  make_columns_unique = FALSE,
  verbose = TRUE,
  ...
)

Arguments

`ctd`	Input CellTypeData.
`dataset`	CellTypeData name.
`input_species`	Which species the gene names in `exp` come from. See list_species for all available species.
`output_species`	Which species' gene names to convert `exp` to. See list_species for all available species.
`sctSpecies_origin`	Species that the `sct_data` originally came from, regardless of its current gene format (e.g. it was previously converted from mouse to human gene orthologs). This is used for computing an appropriate background.
`non121_strategy`	How to handle genes that don't have 1:1 mappings between `input_species`:`output_species`. Options include:
  - `"drop_both_species" or "dbs" or 1` : Drop genes that have duplicate mappings in either the `input_species` or `output_species` (DEFAULT).
  - `"drop_input_species" or "dis" or 2` : Only drop genes that have duplicate mappings in the `input_species`.
  - `"drop_output_species" or "dos" or 3` : Only drop genes that have duplicate mappings in the `output_species`.
  - `"keep_both_species" or "kbs" or 4` : Keep all genes regardless of whether they have duplicate mappings in either species.
  - `"keep_popular" or "kp" or 5` : Return only the most "popular" interspecies ortholog mappings. This procedure tends to yield a greater number of returned genes but at the cost of many of them not being true biological 1:1 orthologs.
  - `"sum","mean","median","min" or "max"` : When `gene_df` is a matrix and `gene_output="rownames"`, these options will aggregate many-to-one gene mappings (`input_species`-to-`output_species`) after dropping any duplicate genes in the `output_species`.
`method`	R package to use for gene mapping:
  - `"gprofiler"` : Slower but more species and genes.
  - `"homologene"` : Faster but fewer species and genes.
  - `"babelgene"` : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on several different data sources.
`force_new_quantiles`	Whether to recompute quantiles that have already been computed. When `FALSE`, quantile computation is skipped if quantiles already exist; when `TRUE` (the default shown in Usage), new quantiles are generated.
`force_standardise`	If `ctd` has already been standardised, whether to rerun standardisation anyway (Default: `FALSE`).
`remove_unlabeled_clusters`	Remove any samples that have numeric column names.
`numberOfBins`	Number of non-zero quantile bins.
`keep_annot`	Keep the column annotation data if provided.
`keep_plots`	Keep the dendrograms if provided.
`as_sparse`	Convert to sparse matrix.
`as_DelayedArray`	Convert to `DelayedArray`.
`rename_columns`	Remove `replace_chars` from column names.
`make_columns_unique`	Rename each column with the prefix `dataset.species.celltype`.
`verbose`	Print messages. Set `verbose=2` if you want to print all messages from internal functions as well.
`...`	Arguments passed on to `orthogene::convert_orthologs`:
  `gene_df` : Data object containing the genes (see `gene_input` for options on how the genes can be stored within the object). Can be one of the following formats:
    - `matrix` : A sparse or dense matrix.
    - `data.frame` : A `data.frame`, `data.table`, or `tibble`.
    - `list` : A `list` or character `vector`.
    Genes, transcripts, proteins, SNPs, or genomic ranges can be provided in any format (HGNC, Ensembl, RefSeq, UniProt, etc.) and will be automatically converted to gene symbols unless specified otherwise with the `...` arguments. Note: If you set `method="homologene"`, you must either supply genes in gene symbol format (e.g. "Sox2") OR set `standardise_genes=TRUE`.
  `gene_input` : Which aspect of `gene_df` to get gene names from:
    - `"rownames"` : From row names of data.frame/matrix.
    - `"colnames"` : From column names of data.frame/matrix.
    - `<column name>` : From a column in `gene_df`, e.g. `"gene_names"`.
  `gene_output` : How to return genes. Options include:
    - `"rownames"` : As row names of `gene_df`.
    - `"colnames"` : As column names of `gene_df`.
    - `"columns"` : As new columns "input_gene", "ortholog_gene" (and "input_gene_standard" if `standardise_genes=TRUE`) in `gene_df`.
    - `"dict"` : As a dictionary (named list) where the names are input_gene and the values are ortholog_gene.
    - `"dict_rev"` : As a reversed dictionary (named list) where the names are ortholog_gene and the values are input_gene.
  `standardise_genes` : If `TRUE` AND `gene_output="columns"`, a new column "input_gene_standard" will be added to `gene_df` containing standardised HGNC symbols identified by gorth.
  `drop_nonorths` : Drop genes that don't have an ortholog in the `output_species`.
  `agg_fun` : Aggregation function passed to aggregate_mapped_genes. Set to `NULL` to skip aggregation step (default).
  `mthreshold` : Maximum number of ortholog names per gene to show. Passed to gorth. Only used when `method="gprofiler"` (DEFAULT : `Inf`).
  `sort_rows` : Sort `gene_df` rows alphanumerically.
  `gene_map` : A data.frame that maps the current gene names to new gene names. This function's behaviour will adapt to different situations as follows:
    - `gene_map=<data.frame>` : When a data.frame containing the gene key:value columns (specified by `input_col` and `output_col`, respectively) is provided, this will be used to perform aggregation/expansion.
    - `gene_map=NULL` and `input_species!=output_species` : A `gene_map` is automatically generated by map_orthologs to perform inter-species gene aggregation/expansion.
    - `gene_map=NULL` and `input_species==output_species` : A `gene_map` is automatically generated by map_genes to perform within-species gene symbol standardization and aggregation/expansion.
  `input_col` : Column name within `gene_map` with gene names matching the row names of `X`.
  `output_col` : Column name within `gene_map` with gene names that you wish to map the row names of `X` onto.
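
To make the first four `non121_strategy` options more concrete, the following is a minimal base-R sketch of what "dropping duplicate mappings" means on a toy ortholog table. It is an illustration only, not the code used by EWCE or orthogene: the gene names, the `gene_map` data frame, and the `is_dup` helper are all hypothetical.

```r
# Toy ortholog map (hypothetical gene names, for illustration only).
# GeneB maps to two output genes (one-to-many);
# GeneC and GeneD map to the same output gene (many-to-one).
gene_map <- data.frame(
  input_gene  = c("GeneA", "GeneB",  "GeneB",  "GeneC",  "GeneD"),
  output_gene = c("GENEA", "GENEB1", "GENEB2", "GENECD", "GENECD"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for every value that occurs more than once in x.
is_dup <- function(x) duplicated(x) | duplicated(x, fromLast = TRUE)

dup_in  <- is_dup(gene_map$input_gene)   # duplicated in the input species
dup_out <- is_dup(gene_map$output_gene)  # duplicated in the output species

# Rows that would remain under each strategy in this simplified model.
filtered <- list(
  drop_both_species   = gene_map[!(dup_in | dup_out), ],
  drop_input_species  = gene_map[!dup_in, ],
  drop_output_species = gene_map[!dup_out, ],
  keep_both_species   = gene_map
)

# Number of mappings kept per strategy.
n_kept <- vapply(filtered, nrow, integer(1))
print(n_kept)
```

In this toy table only `GeneA` is a 1:1 mapping, so the `drop_both_species` entry keeps a single row, while the other entries retain progressively more non-1:1 mappings.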

Value

Standardised CellTypeDataset.

Examples

ctd <- ewceData::ctd()
ctd_std <- EWCE::standardise_ctd(
    ctd = ctd,
    input_species = "mouse",
    dataset = "Zeisel2016"
)

NathanSkene/EWCE documentation built on Feb. 17, 2025, 7:52 a.m.
