
bcbioSingleCell

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

R package for bcbio single-cell RNA-seq analysis.

Installation

Bioconductor method

## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("devtools")
biocLite("remotes")
biocLite("GenomeInfoDbData")
biocLite(
    "hbc/bcbioSingleCell",
    dependencies = c("Depends", "Imports", "Suggests")
)

Load bcbio run

library(bcbioSingleCell)
bcb <- bcbioSingleCell(
    uploadDir = "bcbio_indrop/final",
    interestingGroups = c("genotype", "treatment"),
    sampleMetadataFile = "sample_metadata.csv",
    organism = "Homo sapiens",
    ensemblRelease = 90L
)
# Back up all data inside bcbioSingleCell object
flat <- flatFiles(bcb)
saveData(bcb, flat, dir = "data")

The bcbioSingleCell() call returns a bcbioSingleCell object, which extends the Bioconductor SingleCellExperiment container class.

For parameter descriptions and additional documentation, consult help("bcbioSingleCell", "bcbioSingleCell").

Sample metadata examples

FASTQ files with samples multiplexed by index barcode

This is our current recommended method for analyzing an inDrops dataset. The sample index barcodes are multiplexed per FASTQ set. For Illumina sequencing data, the raw binary base call (BCL) data must be converted into FASTQs (split into R1-R4 files) using bcl2fastq.

The inDrops library version is automatically detected by bcbio, but ensure that the sample index sequences provided match the library version when attempting to create a bcbioSingleCell object. A current list of inDrops v3 index barcodes is available from seqcloud.

Consult the bcbio documentation for more information on how to configure an inDrops run prior to loading into R with the bcbioSingleCell() function.

| description | index | sequence | sampleName |
|-------------|-------|----------|------------|
| indrops1    | 17    | GGAGGTAA | sample1    |
| indrops1    | 18    | CATAACTG | sample2    |
| indrops2    | 12    | GCGTAAGA | sample3    |
| indrops2    | 13    | CTATTAAG | sample4    |
| indrops2    | 14    | AAGGCTAT | sample5    |
| indrops2    | 15    | GAGCCTTA | sample6    |
| indrops2    | 16    | TTATGCGA | sample7    |
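As a rough sketch, the table above could be assembled and written to CSV from base R before loading the run. The `checkMetadata()` helper is hypothetical and not part of bcbioSingleCell; the checks it performs (DNA-only index sequences of equal length, no repeated description/sequence pair, unique sample names) are assumptions about what a sensible metadata file looks like, not documented package requirements.

```r
# Sample metadata for a multiplexed inDrops run, mirroring the table above.
meta <- data.frame(
    description = c(
        "indrops1", "indrops1",
        "indrops2", "indrops2", "indrops2", "indrops2", "indrops2"
    ),
    index = c(17L, 18L, 12L, 13L, 14L, 15L, 16L),
    sequence = c(
        "GGAGGTAA", "CATAACTG",
        "GCGTAAGA", "CTATTAAG", "AAGGCTAT", "GAGCCTTA", "TTATGCGA"
    ),
    sampleName = paste0("sample", 1:7),
    stringsAsFactors = FALSE
)

# Hypothetical sanity checks (an assumption, not a bcbioSingleCell function).
checkMetadata <- function(meta) {
    stopifnot(
        # Index sequences contain only A/C/G/T.
        all(grepl("^[ACGT]+$", meta$sequence)),
        # All index sequences have the same length.
        length(unique(nchar(meta$sequence))) == 1L,
        # No index sequence is reused within a FASTQ set.
        !anyDuplicated(meta[, c("description", "sequence")]),
        # Sample names are unique.
        !anyDuplicated(meta$sampleName)
    )
    invisible(TRUE)
}
checkMetadata(meta)

# Written to a temporary directory here; in practice this would be the
# path passed as `sampleMetadataFile`.
file <- file.path(tempdir(), "sample_metadata.csv")
write.csv(meta, file = file, row.names = FALSE, quote = FALSE)
```

Running the checks before calling bcbioSingleCell() catches simple typos in the index sequences early, which is cheaper than discovering them after the object fails to build.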

FASTQ files demultiplexed per sample

This is our current method for handling 10X Genomics Cell Ranger output (using readCellRanger()) and Illumina SureCell sample data.

| description | genotype |
|-------------|----------|
| sample1     | wildtype |
| sample2     | knockout |
| sample3     | wildtype |
| sample4     | knockout |

Troubleshooting

Maximal number of DLLs reached

Error: package or namespace load failed for 'bcbioSingleCell' in dyn.load(file, DLLpath = DLLpath, ...):
  maximal number of DLLs reached...

Depending on your operating system, you may encounter this error about hitting the DLL limit in R. This issue is becoming more common as RNA-seq analysis packages grow increasingly complex. Luckily, we can configure R to increase the DLL limit. Append this line to your ~/.Renviron file:

R_MAX_NUM_DLLS=150
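If you would rather make this change from R than edit the file by hand, something like the following could work. The `setMaxDLLs()` helper is hypothetical (it is not part of bcbioSingleCell or base R); it replaces any existing `R_MAX_NUM_DLLS` entry instead of appending a duplicate. The example runs against a temporary file so that your real `~/.Renviron` is left untouched.

```r
# Hypothetical helper: set R_MAX_NUM_DLLS in an Renviron file.
setMaxDLLs <- function(n = 150L, file = "~/.Renviron") {
    file <- path.expand(file)
    lines <- if (file.exists(file)) {
        readLines(file, warn = FALSE)
    } else {
        character()
    }
    # Drop any previous R_MAX_NUM_DLLS entry, then append the new one.
    lines <- lines[!grepl("^[[:space:]]*R_MAX_NUM_DLLS[[:space:]]*=", lines)]
    lines <- c(lines, paste0("R_MAX_NUM_DLLS=", n))
    writeLines(lines, file)
    invisible(lines)
}

# Demonstrate on a temporary file rather than the real ~/.Renviron.
tmp <- tempfile(fileext = ".Renviron")
writeLines(c("FOO=bar", "R_MAX_NUM_DLLS=100"), tmp)
setMaxDLLs(150L, file = tmp)
readLines(tmp)
```

The variable is read when R starts, so the new limit only applies after restarting the R session.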

For more information on this issue, consult help("dyn.load") in the R documentation. The number of loaded DLLs in an R session can be obtained with getLoadedDLLs().
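A quick way to see how close a session is to the limit, using only base R. The variable names `nLoaded` and `limit` are just for illustration.

```r
# Number of DLLs currently loaded in this session.
nLoaded <- length(getLoadedDLLs())

# R_MAX_NUM_DLLS as seen by this session; NA if the variable is not set.
limit <- Sys.getenv("R_MAX_NUM_DLLS", unset = NA)

message("Loaded DLLs: ", nLoaded)
message(
    "R_MAX_NUM_DLLS: ",
    if (is.na(limit)) "not set (R default applies)" else limit
)
```

Running this before and after library(bcbioSingleCell) shows how many DLLs the package and its dependencies add.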

References

The papers and software cited in our workflows are available as a shared library on Paperpile.



WeiSong-bio/roryk-bcbioSinglecell documentation built on July 6, 2019, 12:03 a.m.