R/data.R

#' Single-cell RNA-seq allele counts for the Marinov et al dataset
#'
#' A dataset containing alternative and reference allele read counts per cell
#' and heterozygous variant, derived from a single-cell RNA-seq dataset of the
#' lymphoblastoid cell-line of HapMap individual NA12878.
#' 
#' Allele counts were generated by alignment of the RNA-seq data to the
#' haplo-genomes of NA12878 and subsequently running samtools mpileup
#' using variants called as heterozygous within the DNA-seq data of the
#' individual. Variants were filtered on being within RefSeq genes, in the dbSNP
#' database and successfully phased (via transmission from the parental genome
#' data). Variants were further filtered using the allele count data to
#' exclude those that monoallelically express the same allele across cells (see
#' \code{\link{filter_homovars}}) and to retain those with imbalanced allelic
#' expression in at least 3 cells (see \code{\link{filter_var_gt}}). Features
#' were filtered on having at least two such variants (see
#' \code{\link{filter_feat_nminvar}}).
#' For further details on the generation of the allele count data see the
#' Supplemental Data in Edsgard et al, scphaser: Haplotype Inference Using
#' Single-Cell RNA-Seq Data, Bioinformatics, 2016.
#' 
#' @format An acset with four elements: featdata, refcount, altcount and
#' phenodata. See \code{\link{new_acset}} for a description of these elements.
#' The acset contains data for 2809 variants and 28 single cells.
#' 
#' @source \itemize{
#' \item RNA-seq fastq files: \url{ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP018/SRP018838/}
#' \item RNA-seq meta-info: \url{ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE44nnn/GSE44618}
#' \item Haplo-genomes: \url{http://sv.gersteinlab.org/NA12878_diploid}
#' \item DNA-seq variants: \url{http://sv.gersteinlab.org/NA12878_diploid/NA12878_diploid_2012_dec16/CEUTrio.HiSeq.WGS.b37.bestPractices.phased.hg19.vcf.gz}
#' }
#' @usage data(marinov)
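#' @examples
#' # A minimal sketch for inspecting the data, assuming the acset behaves as a
#' # plain list and that refcount is a matrix of counts.
#' data(marinov)
#' names(marinov)
#' dim(marinov$refcount)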
"marinov"

#' Single-cell RNA-seq allele counts for a mouse-hybrid dataset
#'
#' A dataset containing alternative and reference allele read counts per cell
#' and heterozygous variant, derived from a single-cell RNA-seq dataset of 
#' fibroblast and liver cells from crossed CAST/EiJ x C57BL/6J mouse strains.
#' The dataset has been subset to 300 genes to restrict its size.
#' 
#' Allele counts were generated by alignment of the RNA-seq data to each of the
#' genomes of the two mouse strains and subsequently running samtools mpileup 
#' using variants that were homozygous within each strain and differed
#' between the strain-genomes. Variants were filtered on being within RefSeq
#' genes. Variants were further filtered using the allele count data to
#' exclude those that monoallelically express the same allele across cells (see
#' \code{\link{filter_homovars}}) and to retain those with imbalanced allelic
#' expression in at least 3 cells (see \code{\link{filter_var_gt}}). Features
#' were filtered on having at least two such variants (see
#' \code{\link{filter_feat_nminvar}}).
#' For additional filters and further details on the generation of the allele
#' count data see the Supplemental Data in Edsgard et al, scphaser: Haplotype
#' Inference Using Single-Cell RNA-Seq Data, Bioinformatics, 2016.
#' 
#' @format An acset with four elements: featdata, refcount, altcount and
#' phenodata. See \code{\link{new_acset}} for a description of these elements.
#' The acset contains data for 3313 variants and 336 single cells.
#' 
#' @source RNA-seq data can be found at \url{http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE75659}
#' 
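#' @examples
#' # A minimal sketch for inspecting the data. It assumes the dataset loads via
#' # data(mousehybrid), by analogy with data(marinov), that the acset behaves
#' # as a plain list, and that altcount is a matrix of counts.
#' data(mousehybrid)
#' names(mousehybrid)
#' dim(mousehybrid$altcount)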
"mousehybrid"

scphaser documentation built on May 29, 2017, 3:49 p.m.