# estimate_prob_group_pairing_and_linked: Estimates joint...

In bumblebee: Quantify Disease Transmission Within and Between Population Groups

## Description

This function computes the joint probability that a pair of pathogen sequences is from a specific population group pairing and linked.

## Usage

```r
estimate_prob_group_pairing_and_linked(
  df_counts_and_p_hat,
  individuals_population_in,
  ...
)

## Default S3 method:
estimate_prob_group_pairing_and_linked(
  df_counts_and_p_hat,
  individuals_population_in,
  verbose_output = FALSE,
  ...
)
```

## Arguments

• `df_counts_and_p_hat`: A data.frame returned by the function `estimate_p_hat()`

• `individuals_population_in`: A numeric vector of the estimated number of individuals per population group

• `...`: Further arguments.

• `verbose_output`: A boolean value to display intermediate output. (Default is `FALSE`)

## Details

For a population group pairing (u,v), the joint probability that a pair is from groups (u,v) and is linked is computed as

(N_uv / N_choose_2) * p_hat_uv

where:

• N_uv = N_u * N_v: maximum number of distinct possible (u,v) pairs in the population

• p_hat_uv: probability of linkage between two individuals randomly sampled from groups u and v

• N_choose_2 = (N * (N - 1))/2, i.e. N choose 2: number of all distinct possible pairs in the population.
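The arithmetic in the formula above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the population sizes and `p_hat_uv` are made-up values, the pairing is assumed to be between two different groups (u != v), and N is assumed to be the total number of individuals across all population groups.

```r
# Hypothetical inputs for illustration only (not taken from the package data)
N_u <- 1000        # assumed number of individuals in population group u
N_v <- 1500        # assumed number of individuals in population group v
N   <- N_u + N_v   # assumption: the population consists of these two groups only

p_hat_uv <- 0.002  # assumed probability of linkage for the (u,v) pairing

# Maximum number of distinct possible (u,v) pairs in the population (u != v)
N_uv <- N_u * N_v

# All distinct possible pairs in the population: N choose 2
N_choose_2 <- choose(N, 2)   # equals (N * (N - 1)) / 2

# Joint probability that a pair is from groups (u,v) and is linked
prob_group_pairing_and_linked <- (N_uv / N_choose_2) * p_hat_uv
prob_group_pairing_and_linked
```

Because N_uv / N_choose_2 is the fraction of all possible pairs that fall in the (u,v) pairing, the result is always at most `p_hat_uv`.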

See the bumblebee website for more details: https://magosil86.github.io/bumblebee/

## Value

Returns a data.frame containing:

• H1_group, Name of population group 1

• H2_group, Name of population group 2

• number_hosts_sampled_group_1, Number of individuals sampled from population group 1

• number_hosts_sampled_group_2, Number of individuals sampled from population group 2

• number_hosts_population_group_1, Estimated number of individuals in population group 1

• number_hosts_population_group_2, Estimated number of individuals in population group 2

• max_possible_pairs_in_sample, Number of distinct possible transmission pairs between individuals sampled from population groups 1 and 2

• max_possible_pairs_in_population, Number of distinct possible transmission pairs between individuals in population groups 1 and 2

• num_linked_pairs_observed, Number of observed directed transmission pairs between samples from population groups 1 and 2

• p_hat, Probability that pathogen sequences from two individuals randomly sampled from their respective population groups are linked

• prob_group_pairing_and_linked, Probability that a pair of pathogen sequences is from a specific population group pairing and is linked

## Methods (by class)

• `default`: Estimates joint probability of linkage

## References

1. Magosi LE, et al., Deep-sequence phylogenetics to quantify patterns of HIV transmission in the context of a universal testing and treatment trial – BCPP/ Ya Tsie trial. To submit for publication, 2021.

2. Carnegie, N.B., et al., Linkage of viral sequences among HIV-infected village residents in Botswana: estimation of linkage rates in the presence of missing data. PLoS Computational Biology, 2014. 10(1): p. e1003430.

## See Also

See `estimate_p_hat` to prepare input data to estimate `prob_group_pairing_and_linked`.

## Examples

```r
library(bumblebee)
library(dplyr)

# Estimate joint probability that a pair is from a specific group pairing and linked

# We shall use the data of HIV transmissions within and between intervention and control
# communities in the BCPP/Ya Tsie HIV prevention trial. To learn more about the data,
# see ?counts_hiv_transmission_pairs and ?sampling_frequency

# Load and view data
#
# The input data comprises counts of observed directed HIV transmission pairs
# within and between intervention and control communities in the BCPP/Ya Tsie
# trial, sampling information and the probability of linkage between individuals
# sampled from intervention and control communities (i.e. p_hat)
#
# See ?estimate_p_hat for details on estimating p_hat

results_estimate_p_hat <- estimated_hiv_transmission_flows[, c(1:10)]

results_estimate_p_hat

# Estimate prob_group_pairing_and_linked

results_prob_group_pairing_and_linked <- estimate_prob_group_pairing_and_linked(
    df_counts_and_p_hat = results_estimate_p_hat,
    individuals_population_in = sampling_frequency$number_population)

# View results

results_prob_group_pairing_and_linked
```