ADMIXTOOLS 2 can read genotype data in three formats.
PACKEDANCESTRYMAP and EIGENSTRAT are described here
In all three formats a dataset consists of three files: one file for the genotype matrix, one file for the SNP metadata, and one file for the sample metadata.
The following shows what a data set for three SNPs and five samples grouped into two populations (French.DG, Onge.DG) could look like:
PLINK .fam
files, have six columns, but only the first two (family/population ID and sample ID) are relevant in ADMIXTOOLS 2
French.DG S_French-1.DG 0 0 M -9 French.DG S_French-2.DG 0 0 F -9 French.DG B_French-3.DG 0 0 M -9 Onge.DG BR_Onge-2.DG 0 0 F -9 Onge.DG BR_Onge-1.DG 0 0 F -9
PLINK .bim
file:
1 rs3094315 0.02013 752566 G A 1 rs12124819 0.020242 776546 A G 1 rs28765502 0.022137 832918 T C
EIGENSTRAT and PACKEDANCESTRYMAP use the same format for the metadata. The only difference is that the genotype data is stored in a text file in EIGENSTRAT, but in a binary file in PACKEDANCESTRYMAP.
The .ind
file has 3 out of the 6 columns in the .fam
file, in different order:
S_French-1.DG M French.DG S_French-2.DG F French.DG B_French-3.DG M French.DG BR_Onge-2.DG F Onge.DG BR_Onge-1.DG F Onge.DG
.snp
files are the same as .bim
files, except that the first two columns are swapped:
rs3094315 1 0.02013 752566 G A rs12124819 1 0.020242 776546 A G rs28765502 1 0.022137 832918 T C
EIGENSTRAT .geno
files look like this:
29101 10211 22901
One SNP per row, one sample per column, and missing data is denoted by 9
.
While there are other R packages for reading and writing binary PLINK files (plink2R, genio), no such packages exist to date for PACKEDANCESTRYMAP format files, so some of the functions shown here may be useful outside of the ADMIXTOOLS framework.
Note that the functions below impose the restriction than individual IDs may not be duplicated across populations or families.
library(magrittr) library(tidyverse) library(admixtools)
geno = read_plink("plink/prefix") geno = read_eigenstrat("eigenstrat/prefix") geno = read_packedancestrymap("packedancestrymap/prefix") # read SNPs 1000 to 2000 geno = read_packedancestrymap("packedancestrymap/prefix", first = 1000, last = 2000) # read only ind1 and ind2 geno = read_packedancestrymap("packedancestrymap/prefix", inds = c('ind1', 'ind2'))
Populations can be extracted from PLINK files (and written to another set of PLINK files) like this:
extract_samples("input/prefix", "output/prefix", pops = c("pop1", "pop2"))
extract_samples
can also be used to extract samples, similar to plink --keep
.
ADMIXTOOLS 2 can't merge data sets. Data in PLINK format can be merged using plink --bmerge
.
afs = plink_to_afs("plink/prefix") afs = eigenstrat_to_afs("eigenstrat/prefix") afs = packedancestrymap_to_afs("packedancestrymap/prefix")
This function converts genotype files from PACKEDANCESTRYMAP or EIGENSTRAT to PLINK format. If inds
or pops
is specified, only those samples will be extracted.
ADMIXTOOLS 2 currently doesn't provide the option to write binary PACKEDANCESTRYMAP files, so the conversion only goes one way.
eigenstrat_to_plink("eigenstrat_input_prefix", "plink_output_prefix") packedancestrymap_to_plink("packedancestrymap_input_prefix", "plink_output_prefix")
There are several ways in which admixture graphs can be represented. Some of them are useful because they can be stored as human readable text files, and others are useful because they make it easier for R to process graphs. ADMIXTOOLS 2 has several function for importing, exporting and converting between formats.
Before fitting an admixture graph, the graph only has nodes and edges. After fitting a graph, each edge has a weight associated with it, and maybe some other information. Here I mostly show examples of graphs that are not fitted yet.
The simplest way to represent a graph in a text file is by writing each edge in the graph in a separate line:
R O R n1 n1 n2 n1 n3 n2 A n2 admix n3 B n3 n4 n4 C n4 admix admix D
We can save this in a text file graph.txt
and read it into R like this:
graph_el = read_table2('~/Downloads/graph.tsv', col_names = F) plotly_graph(graph_el)
graph_el = tribble(~from, ~to, 'R', 'O', 'R', 'n1', 'n1', 'n2', 'n1', 'n3', 'n2', 'A', 'n2', 'admix', 'n3', 'B', 'n3', 'n4', 'n4', 'C', 'n4', 'admix', 'admix', 'D') plotly_graph(graph_el)
qpgraph()
uses this format to represent fitted graphs. In addition to the first two columns (from
and to
), there will be additional columns to indicate the type of edge, its estimated weight, and possibly more information.
qpgraph(example_f2_blocks, example_igraph)$edges %>% head
Many ADMIXTOOLS 2 functions use the igraph
package, which has its own graph format. The most important ADMIXTOOLS 2 functions work with graphs in either the edge list or the igraph
format, but some only work with graphs in the igraph
format. It's easy to convert between the two formats:
class(graph_el) graph = edges_to_igraph(graph_el) class(graph) graph_el2 = as.data.frame(igraph::as_edgelist(graph)) class(graph_el2)
The text representation of the graph above in the original ADMIXTOOLS software would look like this:
vertex R vertex O vertex n1 vertex n2 vertex n3 vertex A vertex admix vertex B vertex n4 vertex C vertex D label O O label A A label B B label C C label D D edge R_O R O edge R_n1 R n1 edge n1_n2 n1 n2 edge n1_n3 n1 n3 edge n2_A n2 A edge n3_B n3 B edge n3_n4 n3 n4 edge n4_C n4 C edge admix_D admix D admix admix n2 n4
If this is stored in a file called graph.txt
, it can be imported into R like this:
graph_el = parse_qpgraph_graphfile('graph.txt')
DOT format is often used for plotting admixture graphs. You can export a fitted admixture graph like this:
fit = qpgraph(example_f2_blocks, example_igraph)$edges write_dot(fit, 'graph.dot')
ADMIXTOOLS 2 currently doesn't have a function for reading DOT files.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.