mousehybrid: Single-cell RNA-seq allele counts for a mouse-hybrid dataset

A dataset of reference and alternative allele read counts per cell and heterozygous variant. It is derived from a single-cell RNA-seq dataset of fibroblast and liver cells from crossed CAST/EiJ x C57BL/6J mouse strains, and has been subset to 300 genes to limit its size.

An acset with four elements (featdata, refcount, altcount and phenodata); see new_acset for a description of these elements. The acset contains data for 3313 variants and 336 single cells.

Allele counts were generated by aligning the RNA-seq data to each of the two mouse strain genomes and then running samtools mpileup at variants that are homozygous within each strain but differ between the two strain genomes. Variants were restricted to those within RefSeq genes. Using the allele count data, variants were further filtered to exclude those that monoallelically express the same allele across cells (see filter_homovars) and to retain those with imbalanced allelic expression in at least 3 cells (see filter_var_gt). Features were then required to have at least two such variants (see filter_feat_nminvar). For additional filters and further details on how the allele count data were generated, see the Supplemental Data in Edsgard et al., scphaser: Haplotype Inference Using Single-Cell RNA-Seq Data, Bioinformatics, 2016.
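The three count-based filtering steps described above can be illustrated with a small, self-contained Python sketch on toy counts. It is not scphaser's implementation (the package itself is written in R), and the real filter_homovars and filter_var_gt functions may use different criteria. The helper names (call_gt, is_homovar, n_imbalanced, filter_acset) are hypothetical, as are the thresholds MIN_ACOUNT and FC; MIN_CELLS = 3 and MIN_VARS = 2 follow the values stated above. In this sketch a cell is called monoallelic at a variant when one allele has at least FC times the reads of the other.

```python
from collections import Counter

# Hypothetical thresholds for this sketch (not scphaser defaults).
MIN_ACOUNT = 3  # minimum ref+alt reads for a cell to receive a call
FC = 3          # fold change required to call one allele dominant
# Values taken from the dataset description.
MIN_CELLS = 3   # cells with imbalanced expression required per variant
MIN_VARS = 2    # surviving variants required per feature (gene)


def call_gt(ref, alt, min_acount=MIN_ACOUNT, fc=FC):
    """Classify one cell at one variant: 'ref', 'alt', 'bi' or None (too few reads)."""
    if ref + alt < min_acount:
        return None
    if ref >= fc * alt:
        return "ref"
    if alt >= fc * ref:
        return "alt"
    return "bi"


def is_homovar(calls):
    """True if every called cell monoallelically expresses the same allele."""
    called = [c for c in calls if c is not None]
    return bool(called) and (
        all(c == "ref" for c in called) or all(c == "alt" for c in called)
    )


def n_imbalanced(calls):
    """Number of cells with a monoallelic (imbalanced) call."""
    return sum(c in ("ref", "alt") for c in calls)


def filter_acset(featdata, refcount, altcount):
    """featdata: list of (feature, variant); counts: dict variant -> per-cell reads."""
    kept = []
    for feat, var in featdata:
        calls = [call_gt(r, a) for r, a in zip(refcount[var], altcount[var])]
        if is_homovar(calls):
            continue  # drop variants expressing the same allele in every cell
        if n_imbalanced(calls) < MIN_CELLS:
            continue  # require imbalanced expression in >= MIN_CELLS cells
        kept.append((feat, var))
    # Keep only features with at least MIN_VARS surviving variants.
    nvars = Counter(feat for feat, _ in kept)
    return [(f, v) for f, v in kept if nvars[f] >= MIN_VARS]


# Toy data: 2 genes, 5 variants, 4 cells (invented for illustration).
featdata = [("geneA", "v1"), ("geneA", "v2"), ("geneA", "v3"),
            ("geneB", "v4"), ("geneB", "v5")]
refcount = {"v1": [10, 0, 8, 0], "v2": [0, 12, 0, 6], "v3": [10, 8, 9, 7],
            "v4": [10, 0, 5, 5], "v5": [10, 0, 9, 0]}
altcount = {"v1": [0, 9, 0, 7], "v2": [10, 0, 9, 0], "v3": [0, 0, 0, 0],
            "v4": [0, 10, 5, 5], "v5": [0, 8, 0, 7]}

print(filter_acset(featdata, refcount, altcount))
# [('geneA', 'v1'), ('geneA', 'v2')]
```

In the toy data, v3 is removed because every cell expresses only the reference allele, v4 because only two cells show imbalanced expression, and v5 because it is the only surviving variant of geneB.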

RNA-seq data can be found at

