This data set gives sample read counts in 1000 genomic ranges for 16
exome sequencing samples from the PUR population of the 1000 Genomes
Project, along with the GC-content in the ranges. For instructions on
how to prepare read count and covariate data, please see the example
code in the man pages for
The genomic ranges are generated from a small portion of the CCDS regions of
chromosome 1 (hg19). The CCDS regions are subdivided evenly into
ranges of around 100 bp using the
with default settings. Only ranges with positive counts across samples
are retained. The CCDS regions were downloaded as a BED file from the
UCSC Genome Browser
(http://genome.ucsc.edu/cgi-bin/hgGateway). The mapping files
for the exome sequencing data and descriptions of the experiments are
available at the 1000 Genomes Project website
(http://www.1000genomes.org/data). The directories used are
listed in the file
1000Genomes_files.txt in the
The column names are the sample names from the 1000 Genomes Project. The library format is paired-end reads, and the sample counts include both reads of each pair, each counted in its respective genomic range.
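The subdivision and filtering steps described above can be illustrated with a minimal base R sketch. It is not the package's own code: the helper `subdivide_region`, the example coordinates, the sample names, and the counts are all hypothetical, and the filter shown (total count over all samples greater than zero) is only one possible reading of "positive counts across samples".

```r
# Hypothetical helper, not the package's own function: split one region
# [start, end] (1-based, inclusive) into pieces of roughly `target` bp.
subdivide_region <- function(start, end, target = 100) {
  width <- end - start + 1
  n <- max(1, round(width / target))   # number of pieces, at least one
  breaks <- round(seq(start, end + 1, length.out = n + 1))
  data.frame(start = breaks[-(n + 1)], end = breaks[-1] - 1)
}

# A made-up 300 bp region, subdivided into ranges of about 100 bp.
ranges <- subdivide_region(1001, 1300)

# Made-up read counts: rows are ranges, columns are samples.
counts <- matrix(c(12, 0, 7,
                    9, 0, 4),
                 nrow = nrow(ranges),
                 dimnames = list(NULL, c("sampleA", "sampleB")))

# Assumed filter: keep ranges whose total count over samples is positive.
keep <- rowSums(counts) > 0
kept <- ranges[keep, ]
```

Here `kept` holds the first and third ranges; the second is dropped because no sample has a read in it.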
A RangedData object.
1000 Genomes Project and Consensus Coding Sequence Project
1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061-1073 (2010). http://dx.doi.org/10.1038/nature09534.
1000 Genomes Project. Release of phase 1 exome alignments. http://www.1000genomes.org/announcements/release-phase-1-exome-alignments-2011-07-19
Pruitt, K. D. et al. The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Research 19, 1316-1323 (2009). http://dx.doi.org/10.1101/gr.080531.108.