fit_graph: Fit the graph parameters to a data set.
In mailund/admixture_graph: Admixture Graph Manipulation and Fitting


Description

Given a table of observed f statistics and a graph, uses the Nelder-Mead algorithm to find the graph parameters (edge lengths and admixture proportions) that minimize the value of cost_function, i.e. that maximize the likelihood of the graph parameters given the observed data. Like fast_fit, but outputs a more detailed analysis of the results.
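The fitting strategy described above can be illustrated with a minimal base-R sketch. This is a toy and not the package's implementation. The two-edge model and the function names `build_matrix`, `profile_cost`, `to_box` and `fit_toy` are all hypothetical.

The sketch works as follows:

- For fixed admixture proportions, the edge lengths are profiled out with an ordinary least-squares solve. The package appears to use a nonnegative solver instead, since `mynonneg` is mentioned under the arguments.
- `stats::optim` with `method = "Nelder-Mead"` stands in for the package's Nelder-Mead routine.
- Base R's Nelder-Mead does not support bounds, so the bounds from `point` are enforced through a logistic reparameterisation. This is a swapped-in technique.

```r
# Hypothetical toy model: predicted statistics are linear in two edge lengths,
# with coefficients that depend on two admixture proportions alpha = c(a, b).
build_matrix <- function(alpha) {
  a <- alpha[1]
  b <- alpha[2]
  rbind(c(a,     0),
        c(1 - a, 0),
        c(0,     b),
        c(1,     1 - b))
}

# Weighted least-squares cost for fixed admixture proportions; the edge
# lengths are profiled out by an (unconstrained) least-squares solve.
profile_cost <- function(alpha, D, concentration) {
  A <- concentration %*% build_matrix(alpha)
  y <- concentration %*% D
  edges <- qr.solve(A, y)  # least-squares solution of A %*% edges ~ y
  sum((y - A %*% edges)^2)
}

# Map unconstrained reals into the box [lower, upper], so that every point
# Nelder-Mead tries respects the bounds.
to_box <- function(theta, lower, upper) lower + (upper - lower) * plogis(theta)

fit_toy <- function(D, concentration = diag(length(D)),
                    lower = rep(1e-05, 2), upper = rep(1 - 1e-05, 2)) {
  objective <- function(theta) {
    profile_cost(to_box(theta, lower, upper), D, concentration)
  }
  opt <- optim(c(0, 0), objective, method = "Nelder-Mead",
               control = list(maxit = 2000, reltol = 1e-10))
  list(alpha = to_box(opt$par, lower, upper), cost = opt$value)
}

# Noise-free toy observations generated from alpha = (0.3, 0.6), edges = (2, 1).
D <- as.vector(build_matrix(c(0.3, 0.6)) %*% c(2, 1))
toy <- fit_toy(D)
print(toy$alpha)  # close to the generating values 0.3 and 0.6
```

Profiling out the edge lengths leaves the derivative-free search with only the admixture proportions. Here that is two parameters, which keeps Nelder-Mead in a dimension where it behaves well.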

Usage

fit_graph(data, graph, point = list(rep(1e-05,
  length(extract_graph_parameters(graph)$admix_prop)), rep(1 - 1e-05,
  length(extract_graph_parameters(graph)$admix_prop))), Z.value = TRUE,
  concentration = calculate_concentration(data, Z.value),
  optimisation_options = NULL, parameters = extract_graph_parameters(graph),
  iteration_multiplier = 3, qr_tol = 1e-08)

Arguments

`data`	The data table; must contain columns `W`, `X`, `Y`, `Z` for sample names and `D` for the observed f_4(W, X; Y, Z). May contain an optional column `Z.value` for the Z scores (the f statistics divided by their standard deviations).
`graph`	The admixture graph (an `agraph` object).
`point`	Bounds on the admixture proportions, for when the user wants to restrict them (for example, to fix some of them). A list of two vectors: the lower and the upper bounds. By default the bounds are slightly above zero and slightly below one (1e-05 and 1 - 1e-05). This is because the infimum of the cost function is sometimes attained at a point of discontinuity, and zero and one are natural candidates for such problematic points.
`Z.value`	Whether to calculate the default concentration from Z scores (the default, `TRUE`) or just use the identity matrix.
`concentration`	The Cholesky decomposition of the inverse of the covariance matrix. The default is determined by the parameter `Z.value`.
`optimisation_options`	Options passed to the Nelder-Mead algorithm (see `optimset`).
`parameters`	The graph parameters, by default `extract_graph_parameters(graph)`; can be supplied in case one wants to tweak something in the graph.
`iteration_multiplier`	Passed to `mynonneg`.
`qr_tol`	Passed to `examine_edge_optimisation_matrix`.
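The `data`, `Z.value` and `concentration` arguments can be made concrete with a small base-R sketch. The sample names and numbers in the table are made up. `z_concentration` is a hypothetical helper, not the package's calculate_concentration. It assumes the statistics are independent, so the covariance matrix is diagonal with standard deviations D / Z. The package may well handle this differently.

```r
# Toy data table with the columns fit_graph expects (all values made up).
toy_data <- data.frame(W = c("A", "A", "B"), X = c("B", "C", "C"),
                       Y = c("D", "D", "D"), Z = c("E", "E", "E"),
                       D = c(0.02, -0.01, 0.005),
                       Z.value = c(4, -2, 1),
                       stringsAsFactors = FALSE)

# Hypothetical helper: assumes independent statistics (diagonal covariance).
z_concentration <- function(data, Z.value = TRUE) {
  n <- nrow(data)
  if (!Z.value) return(diag(n))           # identity matrix when Z scores are not used
  sds <- data$D / data$Z.value            # Z = D / sd, hence sd = D / Z
  precision <- diag(1 / sds^2, nrow = n)  # inverse of the diagonal covariance matrix
  chol(precision)                         # upper triangular R with t(R) %*% R == precision
}

R <- z_concentration(toy_data)
residual <- c(0.001, 0.002, -0.001)
# Weighting residuals by R turns the plain sum of squares into the
# precision-weighted form t(residual) %*% precision %*% residual.
weighted_cost <- sum((R %*% residual)^2)
```

Note that `D / Z` breaks down when a Z score is zero. A real implementation would need to guard against that.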

Value

A list of class agraph_fit containing detailed information about the fit:
data is the input data,
graph is the input graph,
matrix is the output of build_edge_optimisation_matrix, containing the full matrix, the column-reduced matrix without zero columns, and the graph parameters,
complaint codes which subsets of admixture proportions are truly fitted,
best_fit contains the optimal admixture proportions (these might not be unique if they are not truly fitted),
best_edge_fit is one example of optimal edge lengths,
homogeneous is the reduced row echelon form of the matrix describing when a vector of edge lengths has no effect on the predicted statistics F,
free_edges is one way to choose a subset of the edge lengths in such a vector as free variables,
bounded_edges describes how the remaining edge lengths are calculated from the free ones,
best_error is the minimum value of cost_function,
approximation contains the predicted statistics F under the optimal graph parameters,
parameters is just a shortcut to the graph parameters.
See summary.agraph_fit for the interpretation of some of these results.
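The homogeneous, free_edges and bounded_edges entries all concern edge-length vectors x with M x = 0, where M maps edge lengths to predicted statistics. Adding such an x to a fitted set of edge lengths leaves the predictions, and hence the cost, unchanged.

Below is a minimal base-R sketch of that idea. `null_space` is a hypothetical helper, and the matrix is made up. The helper returns an orthonormal basis computed with a QR decomposition, not the reduced row echelon form that the package reports.

```r
# Hypothetical helper: orthonormal basis of the null space of M, i.e. of all
# edge-length changes x with M %*% x == 0 (no effect on the predictions).
null_space <- function(M, tol = 1e-08) {
  q <- qr(t(M), tol = tol)  # the column space of t(M) is the row space of M
  n <- ncol(M)
  if (q$rank == n) return(matrix(0, nrow = n, ncol = 0))
  # The trailing columns of the complete Q span the orthogonal complement of
  # the row space, which is exactly the null space of M.
  qr.Q(q, complete = TRUE)[, (q$rank + 1):n, drop = FALSE]
}

# Toy matrix (made up): edges 1 and 2 always enter the statistics together, so
# lengthening one and shortening the other by the same amount is invisible.
M <- rbind(c(1, 1, 0),
           c(2, 2, 1))
N <- null_space(M)
print(N)  # one column, proportional to c(1, -1, 0)
```

Any column of `N` is a direction along which the edge lengths cannot be identified from the data. This is why best_edge_fit is described as only one example of optimal edge lengths.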

See Also

cost_function

agraph

calculate_concentration

optimset

fast_fit

Examples

# Let's fit the following admixture graph to example data on bears:

library(admixturegraph)
data(bears)
print(bears)

leaves <- c("BLK", "PB", "Bar", "Chi1", "Chi2", "Adm1", "Adm2", "Denali", "Kenai", "Sweden") 
inner_nodes <- c("R", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "M", "N")
edges <- parent_edges(c(edge("BLK", "R"),
                        edge("PB", "v"),
                        edge("Bar", "x"),
                        edge("Chi1", "y"),
                        edge("Chi2", "y"),
                        edge("Adm1", "z"),
                        edge("Adm2", "z"),
                        edge("Denali", "t"),
                        edge("Kenai", "s"),
                        edge("Sweden", "r"),
                        edge("q", "R"),
                        edge("r", "q"),
                        edge("s", "r"),
                        edge("t", "s"),
                        edge("u", "q"),
                        edge("v", "u"),
                        edge("w", "M"),
                        edge("x", "N"),
                        edge("y", "x"),
                        edge("z", "w"),
                        admixture_edge("M", "u", "t"),
                        admixture_edge("N", "v", "w")))
admixtures <- admixture_proportions(c(admix_props("M", "u", "t", "a"),
                                      admix_props("N", "v", "w", "b")))
bears_graph <- agraph(leaves, inner_nodes, edges, admixtures)
plot(bears_graph, show_admixture_labels = TRUE)

fit <- fit_graph(bears, bears_graph)
summary(fit)

# It turns out that the values of the admixture proportions have no effect on the cost function.
# This is not too surprising, because the large graph has many edge variables compared to the
# small amount of data used! Note, however, that the mere existence of an admixture event with a
# non-trivial (neither zero nor one) admixture proportion might still decrease the cost function.