View source: R/extraction_1KG.R
Get data for a genomic region from a remote VCF file.
chrom | a chromosome name (1-22, X) without the "chr" prefix
start | a positive integer indicating the start of the genomic region
end | a positive integer indicating the end of the genomic region
pop | the name of a 1000 Genomes population or super-population (AMR, AFR, EAS, EUR, ...)
Returns a list with three dataframes for individuals, SNPs, and genotypes.
Currently, this is hard-coded to access 1000 Genomes phase3 data hosted by Brian Browning (author of BEAGLE):
http://bochet.gcc.biostat.washington.edu/beagle/1000_Genomes_phase3_v5a/
This implementation discards multi-allelic markers that have a "," in the ALT column.
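The multi-allelic filter described above can be illustrated with a minimal sketch. It assumes the SNP table is a data frame with an ALT column, as listed under the return value; the toy data and the variable names (snps, is_multiallelic, biallelic) are made up for illustration and are not taken from the package source.

```r
# Toy SNP table using the VCF column names from this documentation.
# The rows are invented for illustration only.
snps <- data.frame(
  CHROM = c("1", "1", "1"),
  POS   = c(1000L, 2000L, 3000L),
  ID    = c("rs1", "rs2", "rs3"),
  REF   = c("A", "G", "T"),
  ALT   = c("C", "A,T", "G"),
  stringsAsFactors = FALSE
)

# A comma in ALT means the marker has more than one alternate allele.
is_multiallelic <- grepl(",", snps$ALT, fixed = TRUE)

# Keep only the bi-allelic markers.
biallelic <- snps[!is_multiallelic, ]
```

Any genotype rows for the discarded markers would need to be dropped with the same logical index so that the SNP and genotype tables stay aligned.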
The pop argument can be any of: ACB, ASW, BEB, CDX, CEU, CHB, CHS, CLM, ESN, FIN, GBR, GIH, GWD, IBS, ITU, JPT, KHV, LWK, MSL, MXL, PEL, PJL, PUR, STU, TSI, YRI. It can also be any super-population: AFR, AMR, EAS, EUR, SAS.
Find more details here: http://www.1000genomes.org/faq/which-populations-are-part-your-study
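As a rough sketch of how a population code could be matched against the individuals table, the following assumes that table has the Population and SuperPopulation columns listed under the return value. The helper select_individuals and the toy data are hypothetical and may not reflect how the package performs this step.

```r
# Population and super-population codes as listed in this documentation.
populations <- c(
  "ACB", "ASW", "BEB", "CDX", "CEU", "CHB", "CHS", "CLM", "ESN",
  "FIN", "GBR", "GIH", "GWD", "IBS", "ITU", "JPT", "KHV", "LWK",
  "MSL", "MXL", "PEL", "PJL", "PUR", "STU", "TSI", "YRI"
)
super_populations <- c("AFR", "AMR", "EAS", "EUR", "SAS")

# Toy individuals table; the IDs are invented for illustration.
individuals <- data.frame(
  Individual.ID   = c("s1", "s2", "s3"),
  Population      = c("CEU", "YRI", "GBR"),
  SuperPopulation = c("EUR", "AFR", "EUR"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: subset individuals by population or super-population.
select_individuals <- function(individuals, pop) {
  if (pop %in% super_populations) {
    individuals[individuals$SuperPopulation == pop, ]
  } else if (pop %in% populations) {
    individuals[individuals$Population == pop, ]
  } else {
    stop("Unknown population code: ", pop)
  }
}

europeans <- select_individuals(individuals, "EUR")
yoruba    <- select_individuals(individuals, "YRI")
```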
A list with three dataframes:
A dataframe with information about individuals: Family.ID, Individual.ID, Paternal.ID, Maternal.ID, Gender, Population, Relationship, Siblings, Second.Order, Third.Order, Other.Comments, SuperPopulation
First 8 columns of the VCF file: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
Columns 10 onward of the VCF file. All genotypes are converted to 0s and 1s representing REF and ALT alleles. This dataframe has two columns for each individual.
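The two-columns-per-individual layout can be sketched as follows, assuming phased genotype strings such as "0|1" in the VCF sample columns. The function split_haplotypes, the sample names, and the "_1"/"_2" column suffixes are assumptions made for illustration; the package may name its columns differently.

```r
# Toy genotype columns as they might appear from column 10 onward of a
# phased VCF. Sample names are hypothetical.
vcf_geno <- data.frame(
  sampleA = c("0|1", "1|1"),
  sampleB = c("0|0", "1|0"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each "a|b" genotype into two integer columns,
# where 0 is the REF allele and 1 is the ALT allele.
split_haplotypes <- function(geno) {
  cols <- lapply(names(geno), function(sample) {
    alleles <- strsplit(geno[[sample]], "|", fixed = TRUE)
    first   <- as.integer(vapply(alleles, `[`, character(1), 1))
    second  <- as.integer(vapply(alleles, `[`, character(1), 2))
    out <- data.frame(first, second)
    names(out) <- paste0(sample, c("_1", "_2"))
    out
  })
  do.call(cbind, cols)
}

haplotypes <- split_haplotypes(vcf_geno)
```

With two samples and two markers, haplotypes has four columns (two per sample) and one row per marker.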