convert_se_gene_ids: convert_se_gene_ids
In celaref: Single-cell RNAseq cell cluster labelling by reference

View source: R/loading_helper_functions.r

Change the gene IDs in the supplied dataset_se object to another ID already present in the gene info (as seen with rowData()).

convert_se_gene_ids(dataset_se, new_id, eval_col, find_max = TRUE)

`dataset_se`	SummarizedExperiment object containing count data. Also requires 'ID' and 'group' to be set within the cell information (see `colData()`).
`new_id`	A column within the feature information (view `rowData(dataset_se)`) of the dataset_se, which will become the new ID column. This column does not need to be unique: duplicates are resolved using eval_col. Any NAs will be dropped.
`eval_col`	Which column to use to break ties between duplicate new_id values. Must be a column within the feature information (view `rowData(dataset_se)`) of the dataset_se. Total reads per gene feature is a good choice.
`find_max`	If FALSE, the feature with the minimum eval_col value is chosen instead of the maximum. Default = TRUE

A modified dataset_se whose ID is now new_id, and unique. It will have fewer genes if the old ID to new ID mapping was not 1:1. Where several old IDs share a new_id, the gene with the maximum (or minimum) eval_col value is kept. On ties in eval_col, the alphabetically first should be picked, but this behaviour could change.
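The selection rule described above can be illustrated on a plain data frame. This is a minimal base R sketch, not celaref's actual implementation: the function `pick_unique_ids()` and the toy `features` table (with its `symbol` column) are hypothetical stand-ins for the package internals and for `rowData(dataset_se)`. In this sketch, ties in eval_col fall back to original row order, which may differ from what the package does.

```r
# Hypothetical feature table standing in for rowData(dataset_se).
features <- data.frame(
  ID          = c("ENSG01", "ENSG02", "ENSG03", "ENSG04"),
  symbol      = c("GeneA",  "GeneA",  "GeneB",  NA),
  total_count = c(10, 50, 7, 99),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of celaref): keep one row per new_id,
# chosen by the largest (or smallest) value in a numeric eval_col.
pick_unique_ids <- function(features, new_id, eval_col, find_max = TRUE) {
  # Drop features that have no value for the new ID.
  kept <- features[!is.na(features[[new_id]]), , drop = FALSE]

  # Sort by new ID, then by eval_col so the preferred row comes first.
  score <- if (find_max) -kept[[eval_col]] else kept[[eval_col]]
  kept  <- kept[order(kept[[new_id]], score), , drop = FALSE]

  # Keep only the first (preferred) row for each new ID.
  kept <- kept[!duplicated(kept[[new_id]]), , drop = FALSE]
  rownames(kept) <- NULL
  kept
}

by_max <- pick_unique_ids(features, "symbol", "total_count")
by_min <- pick_unique_ids(features, "symbol", "total_count", find_max = FALSE)
```

Here `by_max` keeps ENSG02 for GeneA (the higher total_count), `by_min` keeps ENSG01, and ENSG04 is dropped in both because its new ID is NA.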

SummarizedExperiment For general documentation on SummarizedExperiment objects.

load_se_from_files For reading data from flat files (not 10X cellRanger output)

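# Load celaref (provides demo_ref_se) and SummarizedExperiment
# (provides rowData() and assay()).
library(celaref)
library(SummarizedExperiment)

# The demo dataset doesn't have other names, so make some up
# (for demonstration only; don't do this with real data)
dataset_se <- demo_ref_se
rowData(dataset_se)$dummyname <- toupper(rowData(dataset_se)$ID)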

# If not already present, define a column to evaluate, 
# typically total reads/gene.
rowData(dataset_se)$total_count <- rowSums(assay(dataset_se))

dataset_se <- convert_se_gene_ids(dataset_se, new_id='dummyname', eval_col='total_count')