In uqrmaie1/admixtools: Inferring demographic history from genetic data

Genotype data formats

ADMIXTOOLS 2 can read genotype data in three formats.

Binary PLINK format (PACKEDPED), described in the PLINK documentation
Binary PACKEDANCESTRYMAP format
Text based EIGENSTRAT format

PACKEDANCESTRYMAP and EIGENSTRAT are described in the EIGENSOFT documentation.

In all three formats a dataset consists of three files: one file for the genotype matrix, one file for the SNP metadata, and one file for the sample metadata.

The following shows what a data set for three SNPs and five samples grouped into two populations (French.DG, Onge.DG) could look like:

PLINK

PLINK .fam files have six columns, but only the first two (family/population ID and sample ID) are relevant in ADMIXTOOLS 2:

French.DG   S_French-1.DG   0   0   M   -9
French.DG   S_French-2.DG   0   0   F   -9
French.DG   B_French-3.DG   0   0   M   -9
Onge.DG     BR_Onge-2.DG    0   0   F   -9
Onge.DG     BR_Onge-1.DG    0   0   F   -9

PLINK .bim file:

1   rs3094315    0.02013    752566   G   A
1   rs12124819   0.020242   776546   A   G
1   rs28765502   0.022137   832918   T   C

EIGENSTRAT/PACKEDANCESTRYMAP

EIGENSTRAT and PACKEDANCESTRYMAP use the same format for the metadata. The only difference is that the genotype data is stored in a text file in EIGENSTRAT, but in a binary file in PACKEDANCESTRYMAP.

The .ind file has three of the six columns in the .fam file, in a different order (sample ID, sex, population):

S_French-1.DG M French.DG
S_French-2.DG F French.DG
B_French-3.DG M French.DG
BR_Onge-2.DG F Onge.DG
BR_Onge-1.DG F Onge.DG
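The relationship between the two sample files can be sketched in base R. This is only an illustration of the column reordering, not a function from ADMIXTOOLS 2; the column names (pop, sample, father, mother, sex, pheno) are labels I chose for this example and are not part of either file format.

```r
# .fam content from the PLINK example above
fam_text = "
French.DG S_French-1.DG 0 0 M -9
French.DG S_French-2.DG 0 0 F -9
French.DG B_French-3.DG 0 0 M -9
Onge.DG BR_Onge-2.DG 0 0 F -9
Onge.DG BR_Onge-1.DG 0 0 F -9
"

# read all six columns as text (the column names are illustrative only)
fam = read.table(text = fam_text, colClasses = "character",
                 col.names = c("pop", "sample", "father", "mother", "sex", "pheno"))

# an .ind file keeps sample ID, sex and population, in that order
ind = fam[, c("sample", "sex", "pop")]

# print in .ind layout (no header, no quotes)
write.table(ind, quote = FALSE, row.names = FALSE, col.names = FALSE)
```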

.snp files are the same as .bim files, except that the first two columns are swapped:

rs3094315    1   0.02013    752566   G   A
rs12124819   1   0.020242   776546   A   G
rs28765502   1   0.022137   832918   T   C

EIGENSTRAT .geno files look like this:

29101
10211
22901

Each row is one SNP and each column is one sample, in the same order as in the .snp and .ind files. Missing data is denoted by 9.
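To make the layout concrete, here is a small base R sketch that parses the three example rows into a SNP-by-sample matrix. It does not use any ADMIXTOOLS 2 function. It assumes that the values 0, 1 and 2 count copies of the first allele listed in the .snp file, which is my reading of the EIGENSTRAT convention, and it uses that assumption to compute a simple per-population allele frequency.

```r
# the three .geno rows from the example above
geno_lines = c("29101", "10211", "22901")

# split each row into single characters: one column per sample
geno = do.call(rbind, lapply(strsplit(geno_lines, ""), as.integer))
rownames(geno) = c("rs3094315", "rs12124819", "rs28765502")

# 9 encodes missing data
geno[geno == 9] = NA

# population of each column, in the order of the .ind file
pops = c("French.DG", "French.DG", "French.DG", "Onge.DG", "Onge.DG")

# assumption: values count copies of one allele, so frequency = mean count / 2
afs = sapply(unique(pops), function(p) {
  rowMeans(geno[, pops == p, drop = FALSE], na.rm = TRUE) / 2
})
afs
```

The result has one row per SNP and one column per population. Missing genotypes are ignored when averaging.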

Reading genotype files

While there are other R packages for reading and writing binary PLINK files (plink2R, genio), no such packages exist to date for PACKEDANCESTRYMAP format files, so some of the functions shown here may be useful outside of the ADMIXTOOLS framework.

Note that the functions below impose the restriction that individual IDs may not be duplicated across populations or families.

library(magrittr)
library(tidyverse)
library(admixtools)

geno = read_plink("plink/prefix")
geno = read_eigenstrat("eigenstrat/prefix")
geno = read_packedancestrymap("packedancestrymap/prefix")

# read SNPs 1000 to 2000
geno = read_packedancestrymap("packedancestrymap/prefix", first = 1000, last = 2000)
# read only ind1 and ind2
geno = read_packedancestrymap("packedancestrymap/prefix", inds = c('ind1', 'ind2'))

Extracting populations

Populations can be extracted from PLINK files (and written to another set of PLINK files) like this:

extract_samples("input/prefix", "output/prefix", pops = c("pop1", "pop2"))

extract_samples can also be used to extract individual samples, similar to plink --keep.

Combining data sets

ADMIXTOOLS 2 can't merge data sets. Data in PLINK format can be merged using plink --bmerge.

Computing allele frequencies

afs = plink_to_afs("plink/prefix")
afs = eigenstrat_to_afs("eigenstrat/prefix")
afs = packedancestrymap_to_afs("packedancestrymap/prefix")

Converting PACKEDANCESTRYMAP to PLINK

These functions convert genotype files from EIGENSTRAT or PACKEDANCESTRYMAP to PLINK format. If inds or pops is specified, only those samples will be extracted.

ADMIXTOOLS 2 currently doesn't provide the option to write binary PACKEDANCESTRYMAP files, so the conversion only goes one way.

eigenstrat_to_plink("eigenstrat_input_prefix", "plink_output_prefix")
packedancestrymap_to_plink("packedancestrymap_input_prefix", "plink_output_prefix")

Admixture graph formats

There are several ways in which admixture graphs can be represented. Some of them are useful because they can be stored as human-readable text files, and others are useful because they make it easier for R to process graphs. ADMIXTOOLS 2 has several functions for importing, exporting and converting between formats.

Before fitting an admixture graph, the graph only has nodes and edges. After fitting a graph, each edge has a weight associated with it, and maybe some other information. Here I mostly show examples of graphs that are not fitted yet.

Edge list format

The simplest way to represent a graph in a text file is to write each edge of the graph on a separate line, with the parent node first and the child node second:

R      O
R      n1
n1     n2
n1     n3
n2     A
n2     admix
n3     B
n3     n4
n4     C
n4     admix
admix  D

We can save this in a text file graph.tsv and read it into R like this:

# the file has no header line, so the columns are named automatically
graph_el = read_table2('graph.tsv', col_names = FALSE)
plotly_graph(graph_el)

# alternatively, define the same edge list directly in R
graph_el = tribble(~from, ~to,
'R', 'O',
'R', 'n1',
'n1', 'n2',
'n1', 'n3',
'n2', 'A',
'n2', 'admix',
'n3', 'B',
'n3', 'n4',
'n4', 'C',
'n4', 'admix',
'admix', 'D')
plotly_graph(graph_el)
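An edge list like this already contains everything needed to identify the root, the leaves and the admixture nodes. The following base R sketch shows one way to derive them. It is meant as a sanity check on a hand-written edge list and is not how ADMIXTOOLS 2 does this internally; the variable names are my own.

```r
# the example edge list as a plain data frame
edges = data.frame(
  from = c("R", "R", "n1", "n1", "n2", "n2", "n3", "n3", "n4", "n4", "admix"),
  to   = c("O", "n1", "n2", "n3", "A", "admix", "B", "n4", "C", "admix", "D"),
  stringsAsFactors = FALSE
)

# the root never appears as a child
root = setdiff(edges$from, edges$to)

# leaves never appear as a parent
leaves = setdiff(edges$to, edges$from)

# admixture nodes have two incoming edges
indegree = table(edges$to)
admix_nodes = names(indegree)[indegree == 2]

root
leaves
admix_nodes
```

A valid admixture graph should have exactly one root, and every node other than the root should have one or two incoming edges.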

qpgraph() uses this format to represent fitted graphs. In addition to the first two columns (from and to), there will be additional columns to indicate the type of edge, its estimated weight, and possibly more information.

qpgraph(example_f2_blocks, example_igraph)$edges %>% head

igraph format

Many ADMIXTOOLS 2 functions use the igraph package, which has its own graph format. The most important ADMIXTOOLS 2 functions work with graphs in either the edge list or the igraph format, but some only work with graphs in the igraph format. It's easy to convert between the two formats:

class(graph_el)
graph = edges_to_igraph(graph_el)
class(graph)
graph_el2 = as.data.frame(igraph::as_edgelist(graph))
class(graph_el2)

Original ADMIXTOOLS format

The text representation of the graph above in the original ADMIXTOOLS software would look like this:

vertex              R
vertex              O
vertex              n1
vertex              n2
vertex              n3
vertex              A
vertex              admix
vertex              B
vertex              n4
vertex              C
vertex              D
label               O                   O
label               A                   A
label               B                   B
label               C                   C
label               D                   D
edge                R_O                 R                   O
edge                R_n1                R                   n1
edge                n1_n2               n1                  n2
edge                n1_n3               n1                  n3
edge                n2_A                n2                  A
edge                n3_B                n3                  B
edge                n3_n4               n3                  n4
edge                n4_C                n4                  C
edge                admix_D             admix               D
admix               admix               n2                  n4

If this is stored in a file called graph.txt, it can be imported into R like this:

graph_el = parse_qpgraph_graphfile('graph.txt')
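The correspondence between the edge list and this format can be made explicit with a short base R sketch that goes in the opposite direction and produces the vertex, label, edge and admix lines from an edge list. This is an illustration of how the two representations relate, not a function from ADMIXTOOLS 2. It separates fields with single spaces rather than padded columns, and it assumes that every admixture node has exactly two parents.

```r
# the example edge list as a plain data frame
edges = data.frame(
  from = c("R", "R", "n1", "n1", "n2", "n2", "n3", "n3", "n4", "n4", "admix"),
  to   = c("O", "n1", "n2", "n3", "A", "admix", "B", "n4", "C", "admix", "D"),
  stringsAsFactors = FALSE
)

# all nodes in order of first appearance, leaves, and admixture nodes
nodes = unique(as.vector(rbind(edges$from, edges$to)))
leaves = setdiff(edges$to, edges$from)
indegree = table(edges$to)
admix_nodes = names(indegree)[indegree == 2]

# edges into admixture nodes are written as "admix" lines, all others as "edge" lines
normal = edges[!edges$to %in% admix_nodes, ]

vertex_lines = paste("vertex", nodes)
label_lines = paste("label", leaves, leaves)
edge_lines = paste("edge", paste(normal$from, normal$to, sep = "_"),
                   normal$from, normal$to)
admix_lines = vapply(admix_nodes, function(a) {
  paste("admix", a, paste(edges$from[edges$to == a], collapse = " "))
}, character(1), USE.NAMES = FALSE)

graph_lines = c(vertex_lines, label_lines, edge_lines, admix_lines)
writeLines(graph_lines)
```

For the example graph this produces the same 26 lines as shown above, apart from the whitespace.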

DOT format

DOT format is often used for plotting admixture graphs. You can export a fitted admixture graph like this:

fit = qpgraph(example_f2_blocks, example_igraph)$edges
write_dot(fit, 'graph.dot')

ADMIXTOOLS 2 currently doesn't have a function for reading DOT files.
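For very simple DOT files, an edge list can be recovered with a few lines of base R. The sketch below assumes one `a -> b` statement per line and node names without spaces. It is not a general DOT parser, and I have not checked it against the exact output of write_dot; the DOT lines used here are hand-written for illustration.

```r
# a small hand-written DOT graph (illustrative, not output of write_dot)
dot_lines = c(
  'digraph G {',
  '  R -> O;',
  '  R -> n1 [label="0.5"];',
  '  "n1" -> "A";',
  '}'
)

# keep only lines that describe an edge
edge_lines = grep("->", dot_lines, value = TRUE)

# capture the node names on both sides of the arrow, with or without quotes
pattern = '^\\s*"?([^"\\s]+)"?\\s*->\\s*"?([^"\\s;\\[]+)"?'
matches = regmatches(edge_lines, regexec(pattern, edge_lines, perl = TRUE))

edges = data.frame(
  from = vapply(matches, function(m) m[2], character(1)),
  to   = vapply(matches, function(m) m[3], character(1)),
  stringsAsFactors = FALSE
)
edges
```

Edge attributes such as labels are ignored here. They would need additional parsing if the weights are to be recovered as well.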

uqrmaie1/admixtools documentation built on March 30, 2025, 12:49 a.m.
