fit_graph: Fit the graph parameters to a data set.

View source: R/fitting.R

Description

Given a table of observed f statistics and a graph, uses the Nelder-Mead algorithm to find the graph parameters (edge lengths and admixture proportions) that minimize the value of cost_function, i.e. maximize the likelihood of the graph with those parameters given the observed data. Like fast_fit, but outputs a more detailed analysis of the results.
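To make the idea of minimizing a cost over graph parameters concrete, here is a minimal base R sketch. It does not use the package's code: predict_f, toy_cost and the numbers in them are hypothetical, and the cost is assumed to be the squared norm of the residuals weighted by a concentration matrix. Only optim, from base R, does the Nelder-Mead search.

```r
# Toy stand-in for the fitting problem; NOT the package's implementation.
# predict_f() is a hypothetical model mapping two admixture proportions
# (a, b) to three predicted statistics.
predict_f <- function(p) {
  a <- p[1]
  b <- p[2]
  c(a * 0.20 + (1 - a) * 0.05,
    b * 0.10 + (1 - b) * 0.40,
    a * b * 0.30)
}

observed <- predict_f(c(0.3, 0.7))  # synthetic "data" with a known answer
concentration <- diag(3)            # identity weighting, for simplicity

# Assumed form of the cost: squared norm of the weighted residuals.
toy_cost <- function(p) {
  residual <- concentration %*% (predict_f(p) - observed)
  sum(residual^2)
}

# Nelder-Mead search started from the middle of the unit square.
toy_fit <- optim(c(0.5, 0.5), toy_cost, method = "Nelder-Mead")
toy_fit$par    # should be close to c(0.3, 0.7)
toy_fit$value  # should be close to 0
```

The sketch ignores edge lengths and bounds on the proportions, both of which fit_graph handles.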

Usage

fit_graph(data, graph, point = list(rep(1e-05,
  length(extract_graph_parameters(graph)$admix_prop)), rep(1 - 1e-05,
  length(extract_graph_parameters(graph)$admix_prop))), Z.value = TRUE,
  concentration = calculate_concentration(data, Z.value),
  optimisation_options = NULL, parameters = extract_graph_parameters(graph),
  iteration_multiplier = 3, qr_tol = 1e-08)

Arguments

data

The data table. It must contain columns W, X, Y, Z for sample names and D for the observed f_4(W, X; Y, Z). It may contain an optional column Z.value for the Z scores (the f statistics divided by their standard deviations).

graph

The admixture graph (an agraph object).

point

Restricts the admixture proportions, for example to fix some of them. A list of two vectors: the lower and the upper bounds. By default the bounds are just a little bit more than zero and a little bit less than one; this is because the infimum of the cost function is sometimes at a point of discontinuity, and zero and one tend to be problematic values in this respect.
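As a sketch of what such a list could look like, the base R snippet below builds default-style bounds and a custom pair of bounds for a hypothetical graph with two admixture proportions. Two things are assumptions rather than documented behaviour: that the entries follow the order of extract_graph_parameters(graph)$admix_prop, and that giving a proportion equal lower and upper bounds is the way to fix it.

```r
# Hypothetical graph with two admixture proportions, in the order (a, b);
# the ordering is an assumption, not taken from the package.
n_props <- 2
eps <- 1e-05

# Default-style bounds: every proportion free within [eps, 1 - eps].
default_point <- list(rep(eps, n_props), rep(1 - eps, n_props))

# Fix the first proportion at 0.5 by giving it equal lower and upper
# bounds (assumed to be how fixing works), and leave the second one free.
custom_point <- list(c(0.5, eps), c(0.5, 1 - eps))

# Hypothetical call, assuming bears and bears_graph exist:
# fit <- fit_graph(bears, bears_graph, point = custom_point)
```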

Z.value

Whether we calculate the default concentration from Z scores (the default option TRUE) or just use the identity matrix.

concentration

The Cholesky decomposition of the inverted covariance matrix. Default matrix determined by the parameter Z.value.
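The following base R sketch shows what "the Cholesky decomposition of the inverted covariance matrix" amounts to when only Z scores are available. It is an illustration, not the code of calculate_concentration: the numbers are made up, and the statistics are assumed to be independent, so the covariance matrix is diagonal.

```r
# Made-up observed statistics and their Z scores (illustration only).
D <- c(0.02, -0.01, 0.03)
Z <- c(4, -2, 1.5)

# Z = D / sd, so each standard deviation can be recovered as D / Z.
std_dev <- D / Z

# Assumption: independent statistics, hence a diagonal covariance matrix.
covariance <- diag(std_dev^2)

# Upper-triangular Cholesky factor of the inverse covariance, so that
# t(concentration) %*% concentration equals solve(covariance).
concentration <- chol(solve(covariance))
concentration
```

Under the independence assumption the result is simply a diagonal matrix with entries 1 / sd, so multiplying residuals by it rescales each statistic to units of its own standard deviation.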

optimisation_options

Options to the Nelder-Mead algorithm.

parameters

The graph parameters, by default extract_graph_parameters(graph). Supply a modified version in case one wants to tweak something in the graph.

iteration_multiplier

Given to mynonneg.

qr_tol

Given to examine_edge_optimisation_matrix.

Value

A list of class agraph_fit containing detailed information about the fit:
data is the input data,
graph is the input graph,
matrix is the output of build_edge_optimisation_matrix, containing the full matrix, the column_reduced matrix without zero columns, and the graph parameters,
complaint encodes which subsets of admixture proportions are truly fitted,
best_fit is the optimal admixture proportions (might not be unique if they are not truly fitted),
best_edge_fit is an example of optimal edge lengths,
homogeneous is the reduced row echelon form of the matrix describing when a vector of edge lengths has no effect on the predicted statistics F,
free_edges is one way to choose a subset of edge lengths in such a vector as free variables,
bounded_edges is how we calculate the remaining edge lengths from the free ones,
best_error is the minimum value of the cost_function,
approximation is the predicted statistics F with the optimal graph parameters,
parameters is just a shortcut for the graph parameters.
See summary.agraph_fit for the interpretation of some of these results.

See Also

cost_function

agraph

calculate_concentration

optimset

fast_fit

Examples

# For example, let's fit the following admixture graph to example data on bears:

data(bears)
print(bears)

leaves <- c("BLK", "PB", "Bar", "Chi1", "Chi2", "Adm1", "Adm2", "Denali", "Kenai", "Sweden") 
inner_nodes <- c("R", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "M", "N")
edges <- parent_edges(c(edge("BLK", "R"),
                        edge("PB", "v"),
                        edge("Bar", "x"),
                        edge("Chi1", "y"),
                        edge("Chi2", "y"),
                        edge("Adm1", "z"),
                        edge("Adm2", "z"),
                        edge("Denali", "t"),
                        edge("Kenai", "s"),
                        edge("Sweden", "r"),
                        edge("q", "R"),
                        edge("r", "q"),
                        edge("s", "r"),
                        edge("t", "s"),
                        edge("u", "q"),
                        edge("v", "u"),
                        edge("w", "M"),
                        edge("x", "N"),
                        edge("y", "x"),
                        edge("z", "w"),
                        admixture_edge("M", "u", "t"),
                        admixture_edge("N", "v", "w")))
admixtures <- admixture_proportions(c(admix_props("M", "u", "t", "a"),
                                      admix_props("N", "v", "w", "b")))
bears_graph <- agraph(leaves, inner_nodes, edges, admixtures)
plot(bears_graph, show_admixture_labels = TRUE)

fit <- fit_graph(bears, bears_graph)
summary(fit)

# It turned out that the values of the admixture proportions had no effect on the cost function.
# This is not too surprising, because the huge graph contains a lot of edge variables compared to
# the tiny amount of data we used! Note, however, that the mere existence of an admixture event
# with a non-trivial (not zero or one) admixture proportion might still decrease the cost function.

mailund/admixture_graph documentation built on May 21, 2019, 11:06 a.m.