marinov: Single-cell RNA-seq allele counts for the Marinov et al

A dataset containing alternative and reference allele read counts per cell and heterozygous variant, derived from a single-cell RNA-seq dataset of the lymphoblastoid cell line of HapMap individual NA12878.




An acset with four elements: featdata, refcount, altcount and phenodata. See new_acset for a description of these elements. The acset contains data for 2809 variants and 28 single cells.
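
To make the layout concrete, the four elements can be pictured as two count matrices plus two annotation tables. The following is only a minimal sketch in Python, not the package's own class: the name AcsetSketch, the helper zero_acset, the record field names, and the variants-by-cells orientation of the matrices are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AcsetSketch:
    """Illustrative stand-in for an acset; not the package's own class."""
    featdata: list   # one record per variant (field names here are assumed)
    refcount: list   # rows = variants, columns = cells (orientation assumed)
    altcount: list   # same shape as refcount
    phenodata: list  # one record per cell

    def shape(self):
        """Return (number of variants, number of cells)."""
        n_vars = len(self.refcount)
        n_cells = len(self.refcount[0]) if n_vars else 0
        return n_vars, n_cells

    def is_consistent(self):
        """Check that all four elements agree on variant and cell counts."""
        n_vars, n_cells = self.shape()
        rows = self.refcount + self.altcount
        return (
            len(self.altcount) == n_vars
            and all(len(row) == n_cells for row in rows)
            and len(self.featdata) == n_vars
            and len(self.phenodata) == n_cells
        )


def zero_acset(n_vars, n_cells):
    """Build a placeholder of the given size filled with zero counts."""
    return AcsetSketch(
        featdata=[{"var": f"var{i}", "feat": None} for i in range(n_vars)],
        refcount=[[0] * n_cells for _ in range(n_vars)],
        altcount=[[0] * n_cells for _ in range(n_vars)],
        phenodata=[{"cell": f"cell{j}"} for j in range(n_cells)],
    )


sketch = zero_acset(2809, 28)  # dimensions quoted in the text
print(sketch.shape(), sketch.is_consistent())
```

The consistency check reflects the one property the description implies: both count matrices and both annotation tables must agree on the number of variants and cells.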


Allele counts were generated by aligning the RNA-seq data to the haplo-genomes of NA12878 and then running samtools mpileup at variants called as heterozygous in the DNA-seq data of the individual. Variants were retained only if they lay within RefSeq genes, were present in the dbSNP database, and were successfully phased (via transmission from the parental genome data). Variants were further filtered using the allele count data to exclude those that monoallelically express the same allele across cells (see filter_homovars) and to require imbalanced allelic expression in at least 3 cells (see filter_var_gt). Features were filtered on having at least two such variants (see filter_feat_nminvar). For further details on the generation of the allele count data see the Supplemental Data in Edsgard et al., scphaser: Haplotype Inference Using Single-Cell RNA-Seq Data, Bioinformatics, 2016.
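
The three count-based filters can be illustrated with a simplified sketch. It does not reproduce the package functions named above. The per-cell imbalance rule and its thresholds (MIN_READS, FOLD_CHANGE) are assumptions, and the "same allele across cells" test is reduced to a plain check that both alleles are favoured in at least one cell each. Only the limits of 3 cells and 2 variants per feature come from the text. The function and field names are hypothetical.

```python
from collections import Counter

MIN_READS = 3      # assumed: reads required for the favoured allele
FOLD_CHANGE = 3.0  # assumed: favoured/other ratio required for imbalance
MIN_CELLS = 3      # from the text: imbalanced in at least 3 cells
MIN_VARS = 2       # from the text: at least two variants per feature


def call_cell(ref, alt):
    """Return 'ref' or 'alt' if one allele dominates in a cell, else None."""
    if ref >= MIN_READS and ref >= FOLD_CHANGE * alt:
        return "ref"
    if alt >= MIN_READS and alt >= FOLD_CHANGE * ref:
        return "alt"
    return None


def filter_variants(variants):
    """variants: list of dicts with keys 'var', 'feat', 'ref', 'alt',
    where 'ref' and 'alt' are per-cell read counts of equal length."""
    kept = []
    for v in variants:
        calls = [call_cell(r, a) for r, a in zip(v["ref"], v["alt"])]
        calls = [c for c in calls if c is not None]
        # Require imbalanced expression in at least MIN_CELLS cells.
        if len(calls) < MIN_CELLS:
            continue
        # Simplified stand-in: drop variants whose imbalanced cells
        # all favour the same allele.
        if len(set(calls)) < 2:
            continue
        kept.append(v)
    # Keep only features that retain at least MIN_VARS variants.
    per_feat = Counter(v["feat"] for v in kept)
    return [v for v in kept if per_feat[v["feat"]] >= MIN_VARS]


example = [
    {"var": "v1", "feat": "geneA", "ref": [10, 0, 9, 0], "alt": [0, 8, 1, 12]},
    {"var": "v2", "feat": "geneA", "ref": [7, 6, 0, 0], "alt": [1, 0, 9, 5]},
    {"var": "v3", "feat": "geneB", "ref": [10, 9, 8, 7], "alt": [0, 0, 0, 0]},
    {"var": "v4", "feat": "geneB", "ref": [5, 0, 6, 4], "alt": [0, 7, 5, 4]},
    {"var": "v5", "feat": "geneC", "ref": [9, 0, 8, 0], "alt": [0, 9, 0, 8]},
]
print([v["var"] for v in filter_variants(example)])
```

In this toy input, v3 is dropped because every imbalanced cell favours the reference allele, v4 because only two cells are imbalanced, and v5 because its feature is left with a single variant.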


