README.md
In mailund/admixture_graph: Admixture Graph Manipulation and Fitting

Admixture Graph Manipulation and Fitting

The package provides functionality to analyse and test admixture graphs against the f statistics described in the paper Ancient Admixture in Human History, Patterson et al., Genetics, Vol. 192, 1065--1093, 2012.

The f statistics --- f2, f3, and f4 --- extract information about correlations between gene frequencies in different populations (or single diploid genome samples), which can be informative about patterns of gene flow between these populations in form of admixture events. If a graph is constructed as a hypothesis for the relationship between the populations, equations for the expected values of the f statistics can be extracted, as functions of edge lenghs --- representing genetic drift --- and admixture proportions.

This package provides functions for extracting these equations and for fitting them against computed f statistics. It does not currently provide functions for computing the f statistics --- for that we refer to the ADMIXTOOLS software package.

Below is a quick example of how the package can be used. The example uses data from polar bears and brown bears with a black bear as outgroup and is taken from Genomic evidence of geographically widespread effect of gene flow from polar bears into brown bears.

The BLK sample is the black bear, the PB sample is a polar bear, and the rest are brown bears.

I have taken the f statistics from Table 1 in the paper:

data(bears)
bears
#>      W  X      Y      Z      D Z.value
#> 1  BLK PB Sweden   Adm1 0.1258    12.8
#> 2  BLK PB  Kenai   Adm1 0.0685     5.9
#> 3  BLK PB Denali   Adm1 0.0160     1.3
#> 4  BLK PB Sweden   Adm2 0.1231    12.2
#> 5  BLK PB  Kenai   Adm2 0.0669     6.1
#> 6  BLK PB Denali   Adm2 0.0139     1.1
#> 7  BLK PB Sweden    Bar 0.1613    14.7
#> 8  BLK PB  Kenai    Bar 0.1091     8.9
#> 9  BLK PB Denali    Bar 0.0573     4.3
#> 10 BLK PB Sweden   Chi1 0.1786    17.7
#> 11 BLK PB  Kenai   Chi1 0.1278    11.3
#> 12 BLK PB Denali   Chi1 0.0777     6.4
#> 13 BLK PB Sweden   Chi2 0.1819    18.3
#> 14 BLK PB  Kenai   Chi2 0.1323    12.1
#> 15 BLK PB Denali   Chi2 0.0819     6.7
#> 16 BLK PB Sweden Denali 0.1267    14.3
#> 17 BLK PB  Kenai Denali 0.0571     5.6
#> 18 BLK PB Sweden  Kenai 0.0719     9.6

The D column is the f4(W,X;Y,Z) statistic and the Z column is the Z-values obtained from a blocked jacknife (see Patterson et al. for details).

From the statistics we can see that the ABC bears (Adm, Bar and Chi) are closer related to the polar bears compared to the other brown bears. The paper explains this with gene flow from polar bears into the ABC bears and going further out from there, but we can also explain this by several waves of admixture from ancestral polar bears into brown bears:

leaves <- c("BLK", "PB",
            "Bar", "Chi1", "Chi2", "Adm1", "Adm2",
            "Denali", "Kenai", "Sweden") 
inner_nodes <- c("R", "PBBB",
                 "Adm", "Chi", "BC", "ABC",
                 "x", "y", "z",
                 "pb_a1", "pb_a2", "pb_a3", "pb_a4",
                 "bc_a1", "abc_a2", "x_a3", "y_a4")

edges <- parent_edges(c(edge("BLK", "R"),
                        edge("PB", "pb_a1"),
                        edge("pb_a1", "pb_a2"),
                        edge("pb_a2", "pb_a3"),
                        edge("pb_a3", "pb_a4"),
                        edge("pb_a4", "PBBB"),

                        edge("Chi1", "Chi"),
                        edge("Chi2", "Chi"),
                        edge("Chi", "BC"),
                        edge("Bar", "BC"),
                        edge("BC", "bc_a1"),

                        edge("Adm1", "Adm"),
                        edge("Adm2", "Adm"),

                        admixture_edge("bc_a1", "pb_a1", "ABC", "a"),
                        edge("Adm", "ABC"),

                        edge("ABC", "abc_a2"),
                        admixture_edge("abc_a2", "pb_a2", "x", "b"),

                        edge("Denali", "x"),
                        edge("x", "x_a3"),
                        admixture_edge("x_a3", "pb_a3", "y", "c"),

                        edge("Kenai", "y"),
                        edge("y", "y_a4"),                        
                        admixture_edge("y_a4", "pb_a4", "z", "d"),

                        edge("Sweden", "z"),

                        edge("z", "PBBB"),
                        edge("PBBB", "R")))

bears_graph <- agraph(leaves, inner_nodes, edges)
plot(bears_graph, show_admixture_labels = TRUE)
#> fminbnd:  Exiting: Maximum number of function evaluations has been exceeded
#>          - increase MaxFunEvals option.
#>          Current function value: 3027.37262644651

The graph makes predictions on how the f4 statistics should look. The graph parameters can be fit to observed statistics using the fit_graph function:

fit <- fit_graph(bears, bears_graph)
fit
#> 
#> Call: inner_fit_graph(data, graph, point, Z.value, concentration, optimisation_options, 
#>     parameters, iteration_multiplier, qr_tol)
#> 
#> None of the admixture proportions are properly fitted!
#> Not all of the admixture proportions are properly fitted!
#> See summary.agraph_fit for a more detailed analysis.
#> 
#> Minimal error: 12.98523

You can get detailsabout the fit by calling the summary.agraph_fit function:

summary(fit)
#> 
#> Call: inner_fit_graph(data, graph, point, Z.value, concentration, optimisation_options, 
#>     parameters, iteration_multiplier, qr_tol)
#> 
#> None of the proportions {a, b, c, d} affect the quality of the fit!
#> 
#> Optimal admixture proportions:
#>         a         b         c         d 
#> 0.3666992 0.4977105 0.9565926 0.7986799 
#> 
#> Optimal edge lengths:
#>        edge_R_BLK       edge_R_PBBB       edge_PBBB_z   edge_PBBB_pb_a4 
#>        0.00000000        0.00000000        0.00000000        0.07852837 
#>     edge_Adm_Adm1     edge_Adm_Adm2     edge_Chi_Chi1     edge_Chi_Chi2 
#>        0.00000000        0.00000000        0.00000000        0.00000000 
#>       edge_BC_Bar       edge_BC_Chi      edge_ABC_Adm    edge_ABC_bc_a1 
#>        0.00000000        0.00000000        0.00000000        0.00000000 
#>     edge_x_Denali     edge_x_abc_a2      edge_y_Kenai       edge_y_x_a3 
#>        0.00000000        0.00000000        0.00000000        0.00000000 
#>     edge_z_Sweden       edge_z_y_a4     edge_pb_a1_PB  edge_pb_a1_bc_a1 
#>        0.00000000        0.00000000        0.00000000        0.00000000 
#>  edge_pb_a2_pb_a1 edge_pb_a2_abc_a2  edge_pb_a3_pb_a2   edge_pb_a3_x_a3 
#>        0.13643125        0.00000000        0.02156832        0.00000000 
#>  edge_pb_a4_pb_a3   edge_pb_a4_y_a4     edge_bc_a1_BC   edge_abc_a2_ABC 
#>        0.04010857        0.00000000        0.00000000        0.00000000 
#>       edge_x_a3_x       edge_y_a4_y 
#>        0.00000000        0.00000000 
#> 
#> Solution to a homogeneous system of edge lengths with the optimal admixture proportions:
#> Adding any such solution to the optimal one will not affect the error.
#> 
#> Free edge lengths:
#> edge_R_BLK
#> edge_R_PBBB
#> edge_PBBB_z
#> edge_Adm_Adm1
#> edge_Adm_Adm2
#> edge_Chi_Chi1
#> edge_Chi_Chi2
#> edge_BC_Bar
#> edge_BC_Chi
#> edge_ABC_Adm
#> edge_ABC_bc_a1
#> edge_x_Denali
#> edge_x_abc_a2
#> edge_y_Kenai
#> edge_y_x_a3
#> edge_z_Sweden
#> edge_z_y_a4
#> edge_pb_a1_PB
#> edge_pb_a1_bc_a1
#> edge_pb_a2_abc_a2
#> edge_pb_a3_x_a3
#> edge_pb_a4_y_a4
#> edge_bc_a1_BC
#> edge_abc_a2_ABC
#> edge_x_a3_x
#> edge_y_a4_y
#> 
#> Bounded edge lengths:
#> edge_PBBB_pb_a4 = 0
#> edge_pb_a2_pb_a1 = 0
#> edge_pb_a3_pb_a2 = 0
#> edge_pb_a4_pb_a3 = 0
#> 
#> Minimal error:
#> 12.98523

You can make a plot of the fit against the data by calling the plot.agraph_fit function: