load_dataset_10Xdata: load_dataset_10Xdata
In celaref: Single-cell RNAseq cell cluster labelling by reference


View source: R/loading_helper_functions.r

Convenience function to create a SummarizedExperiment object (dataset_se) from the output of a 10X cellRanger pipeline run.


load_dataset_10Xdata(dataset_path, dataset_genome, clustering_set,
  gene_id_cols_10X = c("ensembl_ID", "GeneSymbol"),
  id_to_use = gene_id_cols_10X[1])

`dataset_path`	Path to the directory of 10X data, as generated by the cellRanger pipeline (versions 2.1.0 and 2.0.1). The directory should have the subdirectories analysis, filtered_gene_bc_matrices and raw_gene_bc_matrices (only the first two are read).
`dataset_genome`	The genome that the reads were aligned against, e.g. GRCh38. Check for this as a directory name under the filtered_gene_bc_matrices subdirectory if unsure.
`clustering_set`	The 10X cellRanger pipeline produces several different cluster definitions per dataset. Specify which one to use, e.g. kmeans_10_clusters. Find them as directory names under analysis/clustering/.
`gene_id_cols_10X`	Vector of the names of the columns in the gene description file (e.g. filtered_gene_bc_matrices/GRCh38/genes.tsv). By default, the first element becomes the ID (see id_to_use). Default = c("ensembl_ID","GeneSymbol")
`id_to_use`	Column from gene_id_cols_10X that defines the gene identifier to use as 'ID' in the returned SummarizedExperiment object. Many-to-one relationships between the assumed-unique first element of gene_id_cols_10X and id_to_use will be handled gracefully by `convert_se_gene_ids`. Defaults to the first element of gene_id_cols_10X.
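
The values for `dataset_genome` and `clustering_set` are directory names, so they can be discovered by listing the relevant subdirectories. The following is a minimal sketch in base R only. It builds a throwaway mock directory tree (the `mock_10X` folder and the two clustering set names are made up for illustration) rather than reading real cellRanger output, and it is not part of celaref itself.

```r
# Build a throwaway directory tree that mimics the layout described above.
# "mock_10X" and the clustering set names are illustrative assumptions only.
dataset_path <- file.path(tempdir(), "mock_10X")
dir.create(file.path(dataset_path, "filtered_gene_bc_matrices", "GRCh38"),
           recursive = TRUE, showWarnings = FALSE)
for (set_name in c("kmeans_4_clusters", "kmeans_7_clusters")) {
  dir.create(file.path(dataset_path, "analysis", "clustering", set_name),
             recursive = TRUE, showWarnings = FALSE)
}

# Candidate values for dataset_genome: directories under filtered_gene_bc_matrices
genomes <- list.dirs(file.path(dataset_path, "filtered_gene_bc_matrices"),
                     full.names = FALSE, recursive = FALSE)

# Candidate values for clustering_set: directories under analysis/clustering
clustering_sets <- list.dirs(file.path(dataset_path, "analysis", "clustering"),
                             full.names = FALSE, recursive = FALSE)

print(genomes)
print(clustering_sets)
```

For a real dataset, point `dataset_path` at the cellRanger output directory and skip the `dir.create` calls.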

This function makes a SummarizedExperiment object in a form that should work for celaref functions. Specifically, that means it will have an 'ID' field for genes (view with rowData(dataset_se)), and both 'cell_sample' and 'group' fields for cells (view with colData(dataset_se)). See parameters for detail. Additionally, the counts will be an integer matrix (not a sparse matrix), and the group field (but not cell_sample or ID) will be a factor.
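
The fields described above can be inspected directly on the returned object. This is a small sketch that assumes the celaref and SummarizedExperiment packages are installed, and it reuses the simulated dataset shipped with celaref. It assumes the counts are stored as the first assay, which is why `assay()` is called without an assay name.

```r
library(celaref)
library(SummarizedExperiment)

# Load the small simulated 10X dataset bundled with celaref
example_10X_dir <- system.file("extdata", "sim_cr_dataset", package = "celaref")
dataset_se <- load_dataset_10Xdata(example_10X_dir, dataset_genome = "GRCh38",
    clustering_set = "kmeans_4_clusters", gene_id_cols_10X = c("gene"))

# Gene-level information: should include an 'ID' column
head(rowData(dataset_se))

# Cell-level information: should include 'cell_sample' and 'group' columns
head(colData(dataset_se))

# 'group' is documented to be a factor
is.factor(colData(dataset_se)$group)

# Counts: assumed here to be the first assay; documented as a dense integer matrix
counts_matrix <- assay(dataset_se)
is.matrix(counts_matrix)
is.integer(counts_matrix)
```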

The clustering information is read from whichever clustering set is specified; usually there will be several to choose from.

This function is designed to work with the output of version 2.0.1 of the cellRanger pipeline; it may not work with other versions (and will not work with 1.x).

A SummarizedExperiment object containing the count data, cell info and gene info.

SummarizedExperiment For general documentation on SummarizedExperiment objects.

convert_se_gene_ids describes the method for converting IDs.

Other Data loading functions: contrast_each_group_to_the_rest_for_norm_ma_with_limma, load_se_from_tables

example_10X_dir <- system.file("extdata", "sim_cr_dataset", package = "celaref")
dataset_se <- load_dataset_10Xdata(example_10X_dir, dataset_genome="GRCh38", 
    clustering_set="kmeans_4_clusters", gene_id_cols_10X=c("gene")) 

## Not run: 
dataset_se <- load_dataset_10Xdata('~/path/to/data/10X_pbmc4k', 
    dataset_genome="GRCh38", 
    clustering_set="kmeans_7_clusters") 

## End(Not run)