standardise_ctd | R Documentation |
Convert a CellTypeDataset into standardized format
Description
This function will take a CTD, drop all genes without 1:1 orthologs with the output_species ("human" by default), convert the remaining genes to gene symbols, assign names to each level, and convert all matrices to sparse matrices and/or DelayedArray.
Usage
standardise_ctd(
ctd,
dataset,
input_species = NULL,
output_species = "human",
sctSpecies_origin = input_species,
non121_strategy = "drop_both_species",
method = "homologene",
force_new_quantiles = TRUE,
force_standardise = FALSE,
remove_unlabeled_clusters = FALSE,
numberOfBins = 40,
keep_annot = TRUE,
keep_plots = TRUE,
as_sparse = TRUE,
as_DelayedArray = FALSE,
rename_columns = TRUE,
make_columns_unique = FALSE,
verbose = TRUE,
...
)
Arguments
ctd |
Input CellTypeData.
|
dataset |
CellTypeData name.
|
input_species |
Which species the gene names in exp come from.
See list_species for all available species.
|
output_species |
Which species' gene names to convert exp to.
See list_species for all available species.
|
sctSpecies_origin |
Species that the sct_data
originally came from, regardless of its current gene format
(e.g. it was previously converted from mouse to human gene orthologs).
This is used for computing an appropriate background.
|
non121_strategy |
How to handle genes that don't have
1:1 mappings between input_species:output_species.
Options include:
"drop_both_species" or "dbs" or 1 :
Drop genes that have duplicate
mappings in either the input_species or output_species
(DEFAULT).
"drop_input_species" or "dis" or 2 :
Only drop genes that have duplicate
mappings in the input_species .
"drop_output_species" or "dos" or 3 :
Only drop genes that have duplicate
mappings in the output_species .
"keep_both_species" or "kbs" or 4 :
Keep all genes regardless of whether
they have duplicate mappings in either species.
"keep_popular" or "kp" or 5 :
Return only the most "popular" interspecies ortholog mappings.
This procedure tends to yield a greater number of returned genes
but at the cost of many of them not being true biological 1:1 orthologs.
"sum","mean","median","min" or "max" :
When gene_df is a matrix and gene_output="rownames" ,
these options will aggregate many-to-one gene mappings
(input_species -to-output_species )
after dropping any duplicate genes in the output_species .
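The first four strategies above can be illustrated with a minimal base R sketch. This is not the orthogene implementation: the gene names are made up, and the column names input_gene and ortholog_gene are assumed only because they match the "columns" output described for gene_output. It shows which rows of a toy ortholog map each strategy would keep under one plausible reading of "duplicate mappings".

```r
# Toy ortholog map (hypothetical genes, not real mappings).
gene_map <- data.frame(
  input_gene    = c("A1", "B1", "B1", "C1", "C2", "D1"),
  ortholog_gene = c("A",  "B",  "Bx", "C",  "C",  "D"),
  stringsAsFactors = FALSE
)

# TRUE for every value that occurs more than once in x.
is_dup <- function(x) x %in% x[duplicated(x)]

dup_in  <- is_dup(gene_map$input_gene)     # one input gene maps to many orthologs
dup_out <- is_dup(gene_map$ortholog_gene)  # many input genes map to one ortholog

# Rows retained under each strategy (assumed semantics).
kept <- list(
  drop_both_species   = gene_map[!dup_in & !dup_out, ],
  drop_input_species  = gene_map[!dup_in, ],
  drop_output_species = gene_map[!dup_out, ],
  keep_both_species   = gene_map
)

# Number of mappings retained per strategy.
n_kept <- sapply(kept, nrow)
print(n_kept)
```

In this toy map only A1 and D1 are unambiguous in both directions, so "drop_both_species" keeps the fewest rows and "keep_both_species" keeps all of them.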
|
method |
R package to use for gene mapping:
"gprofiler" : Slower but more species and genes.
"homologene" : Faster but fewer species and genes.
"babelgene" : Faster but fewer species and genes.
Also gives consensus scores for each gene mapping based on
several different data sources.
|
force_new_quantiles |
By default, quantile computation is
skipped if quantiles have already been computed.
Set to TRUE to override this and generate new quantiles.
|
force_standardise |
If ctd has already been standardised, whether
to rerun standardisation anyway (Default: FALSE ).
|
remove_unlabeled_clusters |
Remove any samples that have
numeric column names.
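As an illustration only, the filter described here could look like the following base R sketch. The regular expression used to detect "numeric" names is an assumption, and the actual check inside EWCE may differ.

```r
# Toy expression matrix: two columns have purely numeric names.
mat <- matrix(
  1:8, nrow = 2,
  dimnames = list(c("Gene1", "Gene2"), c("Astrocyte", "12", "Neuron", "7"))
)

# Assumed rule: a column is "unlabeled" if its name consists only of digits.
is_numeric_name <- grepl("^[0-9]+$", colnames(mat))

# Keep labeled columns; drop = FALSE preserves the matrix shape.
mat_labeled <- mat[, !is_numeric_name, drop = FALSE]
print(colnames(mat_labeled))
```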
|
numberOfBins |
Number of non-zero quantile bins.
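A rough sketch of what "non-zero quantile bins" can mean, in base R: zero values stay in bin 0 and the positive values are split into numberOfBins quantile bins. The helper bin_nonzero is hypothetical and written for this illustration; it is not the function EWCE uses internally.

```r
# Hypothetical helper: bin positive values into quantile bins, zeros stay 0.
bin_nonzero <- function(x, numberOfBins = 40) {
  bins <- rep(0L, length(x))
  pos <- x > 0
  # Quantile breakpoints computed on the positive values only.
  breaks <- unique(stats::quantile(
    x[pos], probs = seq(0, 1, length.out = numberOfBins + 1)
  ))
  if (length(breaks) > 1) {
    bins[pos] <- as.integer(cut(x[pos], breaks = breaks, include.lowest = TRUE))
  }
  bins
}

set.seed(1)
specificity <- c(rep(0, 20), runif(200))  # toy values, not real specificity scores
bins <- bin_nonzero(specificity, numberOfBins = 40)

# Zeros are in bin 0; positive values span bins 1..numberOfBins.
print(range(bins[specificity > 0]))
```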
|
keep_annot |
Keep the column annotation data if provided.
|
keep_plots |
Keep the dendrograms if provided.
|
as_sparse |
Convert to sparse matrix.
|
as_DelayedArray |
Convert to DelayedArray .
|
rename_columns |
Remove replace_chars from column names.
|
make_columns_unique |
Rename each column with the prefix
dataset.species.celltype .
|
verbose |
Print messages.
Set verbose=2 if you want to print all messages
from internal functions as well.
|
... |
Arguments passed on to orthogene::convert_orthologs
gene_df Data object containing the genes
(see gene_input for options on how
the genes can be stored within the object).
Can be one of the following formats:
matrix : A sparse or dense matrix.
data.frame : A data.frame,
data.table, or tibble.
list : A list or character vector.
Genes, transcripts, proteins, SNPs, or genomic ranges
can be provided in any format
(HGNC, Ensembl, RefSeq, UniProt, etc.) and will be
automatically converted to gene symbols unless
specified otherwise with the ... arguments.
Note: If you set method="homologene" , you
must either supply genes in gene symbol format (e.g. "Sox2")
OR set standardise_genes=TRUE .
gene_input Which aspect of gene_df to
get gene names from:
"rownames" : From row names of data.frame/matrix.
"colnames" : From column names of data.frame/matrix.
<column name> : From a column in gene_df ,
e.g. "gene_names" .
gene_output How to return genes.
Options include:
"rownames" : As row names of gene_df .
"colnames" : As column names of gene_df .
"columns" : As new columns "input_gene", "ortholog_gene"
(and "input_gene_standard" if standardise_genes=TRUE )
in gene_df .
"dict" : As a dictionary (named list) where the names
are input_gene and the values are ortholog_gene.
"dict_rev" : As a reversed dictionary (named list)
where the names are ortholog_gene and the values are input_gene.
standardise_genes If TRUE AND
gene_output="columns" , a new column "input_gene_standard"
will be added to gene_df containing standardised HGNC symbols
identified by gorth.
drop_nonorths Drop genes that don't have an ortholog
in the output_species .
agg_fun Aggregation function passed to
aggregate_mapped_genes.
Set to NULL to skip aggregation step (default).
mthreshold Maximum number of ortholog names per gene to show.
Passed to gorth.
Only used when method="gprofiler" (DEFAULT : Inf ).
sort_rows Sort gene_df rows alphanumerically.
gene_map A data.frame that maps the current gene names
to new gene names.
This function's behaviour will adapt to different situations as follows:
gene_map=<data.frame> : When a data.frame containing the
gene key:value columns
(specified by input_col and output_col , respectively)
is provided, this will be used to perform aggregation/expansion.
gene_map=NULL and input_species!=output_species :
A gene_map is automatically generated by
map_orthologs to perform inter-species
gene aggregation/expansion.
gene_map=NULL and input_species==output_species :
A gene_map is automatically generated by
map_genes to perform within-species
gene symbol standardization and aggregation/expansion.
input_col Column name within gene_map with gene names matching
the row names of X .
output_col Column name within gene_map with gene names
that you wish to map the row names of X onto.
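To make the gene_map idea concrete, here is a small base R sketch of many-to-one aggregation by summing rows that map to the same output gene. It uses rowsum() rather than orthogene's own aggregation code, and the matrix, gene names, and the column names input_gene and ortholog_gene (standing in for input_col and output_col) are illustrative assumptions.

```r
# Toy expression matrix X with genes as rows.
X <- matrix(
  c(1, 2, 3, 4, 5, 6), nrow = 3,
  dimnames = list(c("g1", "g2", "g3"), c("cellA", "cellB"))
)

# Hypothetical gene_map: g1 and g2 both map onto G1.
gene_map <- data.frame(
  input_gene    = c("g1", "g2", "g3"),
  ortholog_gene = c("G1", "G1", "G3"),
  stringsAsFactors = FALSE
)

# Look up the new name for each row of X.
new_names <- gene_map$ortholog_gene[match(rownames(X), gene_map$input_gene)]

# Sum the rows that share a new name (comparable in spirit to the "sum" option).
X_agg <- rowsum(X, group = new_names)
print(X_agg)
```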
|
Value
Standardised CellTypeDataset.
Examples
ctd <- ewceData::ctd()
ctd_std <- EWCE::standardise_ctd(
ctd = ctd,
input_species = "mouse",
dataset = "Zeisel2016"
)