load_dataset_10Xdata

View source: R/loading_helper_functions.r

Description

Convenience function to create a SummarizedExperiment object (dataset_se) from the output of a 10X cellRanger pipeline run.

Usage

load_dataset_10Xdata(dataset_path, dataset_genome, clustering_set,
  gene_id_cols_10X = c("ensembl_ID", "GeneSymbol"),
  id_to_use = gene_id_cols_10X[1])

Arguments

dataset_path

Path to the directory of 10X data, as generated by the cellRanger pipeline (versions 2.1.0 and 2.0.1). The directory should have the subdirectories analysis, filtered_gene_bc_matrices and raw_gene_bc_matrices (only the first two are read).

dataset_genome

The genome that the reads were aligned against, e.g. GRCh38. Check for this as a directory name under the filtered_gene_bc_matrices subdirectory if unsure.

clustering_set

The 10X cellRanger pipeline produces several different cluster definitions per dataset. Specify which one to use, e.g. kmeans_10_clusters. The available sets appear as directory names under analysis/clustering/.

gene_id_cols_10X

Vector of the names of the columns in the gene description file (filtered_gene_bc_matrices/GRCh38/genes.csv). By default, the first element becomes the ID (see id_to_use). Default: c("ensembl_ID", "GeneSymbol").

id_to_use

Column from gene_id_cols_10X that defines the gene identifier to use as 'ID' in the returned SummarizedExperiment object. Many-to-one relationships between the first element of gene_id_cols_10X (assumed unique) and id_to_use are handled by convert_se_gene_ids. Defaults to the first element of gene_id_cols_10X.
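
The values for dataset_genome and clustering_set can be looked up from the directory layout described in the arguments above. The following is a minimal sketch in base R, not part of celaref: list_10X_options is a hypothetical helper, and the mock directory tree only stands in for real cellRanger output.

```r
# Hypothetical helper (not a celaref function): lists candidate values for
# dataset_genome and clustering_set found under a cellRanger output directory.
list_10X_options <- function(dataset_path) {
  genome_dir     <- file.path(dataset_path, "filtered_gene_bc_matrices")
  clustering_dir <- file.path(dataset_path, "analysis", "clustering")
  list(
    genomes         = basename(list.dirs(genome_dir, recursive = FALSE)),
    clustering_sets = basename(list.dirs(clustering_dir, recursive = FALSE))
  )
}

# Mock directory tree for illustration only; it contains no real data.
mock_path <- file.path(tempdir(), "mock_10X")
dir.create(file.path(mock_path, "filtered_gene_bc_matrices", "GRCh38"),
           recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(mock_path, "analysis", "clustering", "kmeans_10_clusters"),
           recursive = TRUE, showWarnings = FALSE)

options_found <- list_10X_options(mock_path)
print(options_found)
```

With a real dataset, passing the same path that will be given as dataset_path should show which genome and clustering set names are available to choose from.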

Details

This function makes a SummarizedExperiment object in a form that should work for celaref functions. Specifically, it will have an 'ID' field for genes (view with rowData(dataset_se)), and both 'cell_sample' and 'group' fields for cells (view with colData(dataset_se)). See the parameters for details. Additionally, the counts will be an integer matrix (not a sparse matrix), and the group field (but not cell_sample or ID) will be a factor.
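
The structure described above can be checked on the small simulated dataset shipped with celaref. This is a sketch that assumes celaref and SummarizedExperiment are installed. The checks restate the description in this section; they are not an independent specification of the object.

```r
library(celaref)
library(SummarizedExperiment)

# Simulated cellRanger-style output bundled with the package.
example_10X_dir <- system.file("extdata", "sim_cr_dataset", package = "celaref")
dataset_se <- load_dataset_10Xdata(example_10X_dir, dataset_genome = "GRCh38",
    clustering_set = "kmeans_4_clusters", gene_id_cols_10X = c("gene"))

# Gene-level annotation: expected to include an 'ID' column.
head(rowData(dataset_se))

# Cell-level annotation: expected to include 'cell_sample' and 'group'.
head(colData(dataset_se))

# Checks corresponding to the description in this section.
stopifnot("ID" %in% colnames(rowData(dataset_se)))
stopifnot(all(c("cell_sample", "group") %in% colnames(colData(dataset_se))))
stopifnot(is.factor(colData(dataset_se)$group))
stopifnot(is.matrix(assay(dataset_se)))  # dense matrix, not a sparse Matrix object
```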

The clustering information is read from whichever clustering set is specified; there will usually be several to choose from.

This function is designed to work with the output of version 2.0.1 of the cellRanger pipeline and may not work with other versions (it will not work for 1.x).

Value

A SummarizedExperiment object containing the count data, cell info and gene info.

See Also

SummarizedExperiment for general documentation on SummarizedExperiment objects.

convert_se_gene_ids describes the method for converting IDs.

Other Data loading functions: contrast_each_group_to_the_rest_for_norm_ma_with_limma, load_se_from_tables

Examples

example_10X_dir <- system.file("extdata", "sim_cr_dataset", package = "celaref")
dataset_se <- load_dataset_10Xdata(example_10X_dir, dataset_genome="GRCh38", 
    clustering_set="kmeans_4_clusters", gene_id_cols_10X=c("gene")) 

## Not run: 
dataset_se <- load_dataset_10Xdata('~/path/to/data/10X_pbmc4k', 
    dataset_genome="GRCh38", 
    clustering_set="kmeans_7_clusters") 

## End(Not run) 

celaref documentation built on Nov. 8, 2020, 5:03 p.m.