# fit_graph: Fit the graph parameters to a data set. In mailund/admixture_graph: Admixture Graph Manipulation and Fitting

## Description

Given a table of observed f statistics and a graph, uses the Nelder-Mead algorithm to find the graph parameters (edge lengths and admixture proportions) that minimize the value of `cost_function`, i.e. maximize the likelihood of the graph with those parameters given the observed data. Like `fast_fit`, but outputs a more detailed analysis of the results.
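The optimisation idea can be sketched in base R. This is a minimal illustration and not the package's implementation: `toy_prediction`, `toy_cost` and the observed values are hypothetical, the concentration matrix is assumed to be the identity, and base R's `optim` stands in for whichever Nelder-Mead routine `fit_graph` uses internally.

```r
# Hypothetical toy model: predicted statistics as a function of two
# admixture proportions a and b (not derived from any real graph).
toy_prediction <- function(p) {
  a <- p[1]
  b <- p[2]
  c(a, b, a * b, a + b)
}

# Pretend observations, generated from a = 0.3 and b = 0.6.
observed <- toy_prediction(c(0.3, 0.6))

# Assumption: identity concentration matrix (all statistics weighted equally).
concentration <- diag(length(observed))

# Cost: squared norm of the weighted difference between prediction and data.
toy_cost <- function(p) {
  residual <- concentration %*% (toy_prediction(p) - observed)
  sum(residual^2)
}

# Nelder-Mead search from an arbitrary starting point.
toy_fit <- optim(c(0.5, 0.5), toy_cost, method = "Nelder-Mead")
toy_fit$par    # estimated proportions, close to the generating values
toy_fit$value  # minimum cost found
```

In the real function the edge lengths are fitted as well, so the cost depends on more than the admixture proportions; the sketch only shows the shape of the minimisation.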

## Usage

```
fit_graph(data, graph,
  point = list(rep(1e-05, length(extract_graph_parameters(graph)$admix_prop)),
               rep(1 - 1e-05, length(extract_graph_parameters(graph)$admix_prop))),
  Z.value = TRUE,
  concentration = calculate_concentration(data, Z.value),
  optimisation_options = NULL,
  parameters = extract_graph_parameters(graph),
  iteration_multiplier = 3,
  qr_tol = 1e-08)
```

## Arguments

`data` The data table. Must contain columns `W`, `X`, `Y`, `Z` for sample names and `D` for the observed f_4(W, X; Y, Z). May contain an optional column `Z.value` for the Z scores (the f statistics divided by the standard deviations).

`graph` The admixture graph (an `agraph` object).

`point` Bounds for the admixture proportions, for when the user wants to restrict them, for example to fix some of them. A list of two vectors: the lower and the upper bounds. By default the bounds are just a little more than zero and a little less than one; this is because the infimum of the cost function is sometimes at a point of discontinuity, and zero and one tend to be problematic values in this respect.

`Z.value` Whether to calculate the default concentration from Z scores (the default option `TRUE`) or just use the identity matrix.

`concentration` The Cholesky decomposition of the inverted covariance matrix. The default matrix is determined by the parameter `Z.value`.

`optimisation_options` Options to the Nelder-Mead algorithm.

`parameters` The graph parameters, in case one wants to tweak something in the graph.

`iteration_multiplier` Given to `mynonneg`.

`qr_tol` Given to `examine_edge_optimisation_matrix`.
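As a rough sketch of how the `point` and `concentration` arguments might be prepared by hand, consider the base R snippet below. The proportion names, f statistics and Z scores are hypothetical. The snippet also assumes that setting a lower bound equal to its upper bound pins a proportion, and that the statistics are independent so the covariance matrix is diagonal; neither assumption is confirmed by this page.

```r
# Hypothetical names of the admixture proportions, standing in for
# extract_graph_parameters(graph)$admix_prop.
admix_prop <- c("a", "b")

# The default bounds shown in the Usage section.
lower <- rep(1e-05, length(admix_prop))
upper <- rep(1 - 1e-05, length(admix_prop))

# Assumption: equal lower and upper bounds fix a proportion. Fix "a" at 0.5.
lower[admix_prop == "a"] <- 0.5
upper[admix_prop == "a"] <- 0.5
point <- list(lower, upper)

# Hypothetical f statistics and their Z scores.
D <- c(0.02, -0.01, 0.05)
Z <- c(4, -2, 10)

# Z = D / standard deviation, so the standard deviations are D / Z.
std_dev <- D / Z

# Assumption: independent statistics, hence a diagonal covariance matrix.
covariance <- diag(std_dev^2)

# Cholesky decomposition of the inverted covariance matrix.
concentration <- chol(solve(covariance))
```

Objects built this way would then be passed along the lines of `fit_graph(data, graph, point = point, concentration = concentration)`.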

## Value

A list of class `agraph_fit` containing a lot of information about the fit:
`data` is the input data,
`graph` is the input graph,
`matrix` is the output of `build_edge_optimisation_matrix`, containing the `full` matrix, the `column_reduced` matrix without zero columns, and graph `parameters`,
`complaint` encodes which subsets of admixture proportions are truly fitted,
`best_fit` is the optimal admixture proportions (might not be unique if they are not truly fitted),
`best_edge_fit` is an example of optimal edge lengths,
`homogeneous` is the reduced row echelon form of the matrix describing when a vector of edge lengths has no effect on the predicted statistics F,
`free_edges` is one way to choose a subset of edge lengths in such a vector as free variables,
`bounded_edges` is how we calculate the remaining edge lengths from the free ones,
`best_error` is the minimum value of the `cost_function`,
`approximation` is the predicted statistics F with the optimal graph parameters,
`parameters` is just a shortcut for the graph parameters.
See `summary.agraph_fit` for the interpretation of some of these results.
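The idea behind `homogeneous`, `free_edges` and `bounded_edges` is that some changes to the edge lengths leave the predicted statistics untouched. The base R sketch below illustrates this with a hypothetical matrix `edge_matrix` that maps three edge lengths to two predicted statistics. It finds the directions with no effect using a singular value decomposition, which is a swapped-in technique; the package itself reports a reduced row echelon form.

```r
# Hypothetical linear map from three edge lengths to two predicted statistics.
# The first two edges only ever appear as a sum.
edge_matrix <- rbind(c(1, 1, 0),
                     c(0, 0, 1))

# Full set of right singular vectors; those beyond the rank span the null space.
decomposition <- svd(edge_matrix, nv = ncol(edge_matrix))
matrix_rank <- sum(decomposition$d > 1e-8)
null_basis <- decomposition$v[, seq_len(ncol(edge_matrix)) > matrix_rank,
                              drop = FALSE]

# Moving along a null-space direction changes the edge lengths
# but not the predicted statistics.
edge_lengths <- c(0.2, 0.3, 0.1)
shifted <- edge_lengths + 0.1 * null_basis[, 1]

as.vector(edge_matrix %*% edge_lengths)
as.vector(edge_matrix %*% shifted)
```

Here one edge length can be chosen freely and the other two are then determined, which is the kind of relationship that `free_edges` and `bounded_edges` describe for the fitted graph.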

## See Also

`cost_function`

`agraph`

`calculate_concentration`

`optimset`

`fast_fit`

## Examples

```
# For example, let's fit the following admixture graph to example data on bears:

data(bears)
print(bears)

leaves <- c("BLK", "PB", "Bar", "Chi1", "Chi2", "Adm1", "Adm2", "Denali", "Kenai", "Sweden")
inner_nodes <- c("R", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "M", "N")
edges <- parent_edges(c(edge("BLK", "R"),
                        edge("PB", "v"),
                        edge("Bar", "x"),
                        edge("Chi1", "y"),
                        edge("Chi2", "y"),
                        edge("Adm1", "z"),
                        edge("Adm2", "z"),
                        edge("Denali", "t"),
                        edge("Kenai", "s"),
                        edge("Sweden", "r"),
                        edge("q", "R"),
                        edge("r", "q"),
                        edge("s", "r"),
                        edge("t", "s"),
                        edge("u", "q"),
                        edge("v", "u"),
                        edge("w", "M"),
                        edge("x", "N"),
                        edge("y", "x"),
                        edge("z", "w"),
                        admixture_edge("M", "u", "t"),
                        admixture_edge("N", "v", "w")))
admixtures <- admixture_proportions(c(admix_props("M", "u", "t", "a"),
                                      admix_props("N", "v", "w", "b")))
bears_graph <- agraph(leaves, inner_nodes, edges, admixtures)
plot(bears_graph, show_admixture_labels = TRUE)

fit <- fit_graph(bears, bears_graph)
summary(fit)

# It turned out the values of the admixture proportions had no effect on the cost function.
# This is not too surprising because the huge graph contains a lot of edge variables compared
# to the tiny amount of data we used! Note however that the mere existence of an admixture
# event with a non-trivial (not zero or one) admixture proportion might still decrease the
# cost function.
```

mailund/admixture_graph documentation built on April 3, 2018, 9:28 p.m.