optimize_combinations: Find a set of barcode combinations with least heterogeneity...

optimize_combinations R Documentation

Find a set of barcode combinations with least heterogeneity in barcode usage

Description

This function uses Shannon entropy to identify a set of compatible barcode combinations with the least heterogeneity in barcode usage.

Usage

optimize_combinations(combination_m, nb_lane, index_number,
thrs_size_comb, max_iteration, method)

Arguments

combination_m

A matrix of compatible barcode combinations.

nb_lane

The number of lanes to be used for sequencing (i.e. the number of libraries divided by the multiplex level).

index_number

The total number of distinct DNA barcodes in the dataset.

thrs_size_comb

The maximum size of the set of compatible combinations to be used for the greedy optimization.

max_iteration

The maximum number of iterations during the optimizing step.

method

The greedy search method to use: 'greedy_exchange' or 'greedy_descent'.

Details

N/k compatible combinations are selected by maximizing the Shannon entropy of barcode usage, where N is the number of libraries and k the multiplex level, so that N/k is the number of lanes. It can be shown that the maximum entropy attainable when selecting N barcodes among n distinct barcodes, with possible repetitions, is:

S_{max}=-(n-r)\frac{\lfloor N/n\rfloor}{N} \log(\frac{\lfloor N/n\rfloor}{N})-r\frac{\lceil N/n\rceil}{N} \log(\frac{\lceil N/n\rceil}{N})

where r denotes the remainder of the division of N by n, while \lfloor N/n\rfloor and \lceil N/n\rceil denote the floor and ceiling of N/n, respectively.
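As a sanity check on the formula above, the following minimal sketch evaluates S_{max} in base R. The helper name max_entropy is hypothetical and not part of DNABarcodeCompatibility. It assumes the natural logarithm and the usual convention that a barcode used zero times contributes nothing to the entropy (relevant when N < n).

```r
# Hypothetical helper, not part of DNABarcodeCompatibility.
# N: number of barcodes selected (with possible repetitions)
# n: number of distinct barcodes available
max_entropy <- function(N, n) {
  stopifnot(length(N) == 1, length(n) == 1, N >= 1, n >= 1)
  r  <- N %% n                      # remainder of the division of N by n
  lo <- N %/% n                     # floor(N / n)
  hi <- if (r > 0) lo + 1 else lo   # ceiling(N / n)
  # Contribution of one barcode used `count` times; 0 * log(0) is taken as 0
  # (assumed convention).
  term <- function(count) {
    if (count == 0) 0 else -(count / N) * log(count / N)
  }
  # (n - r) barcodes are used floor(N/n) times, r barcodes ceiling(N/n) times.
  (n - r) * term(lo) + r * term(hi)
}

max_entropy(36, 48)  # every selected barcode used once: equals log(36)
max_entropy(96, 48)  # every barcode used exactly twice: equals log(48)
```

The entropy is largest when barcode usage is as even as possible, which is why only the floor and ceiling of N/n appear in the expression.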

Case 1: number of lanes < number of compatible DNA-barcode combinations

This function searches for the compatible DNA-barcode combinations of highest entropy. In brief, it uses a randomized greedy descent algorithm to find an optimized selection. The resulting selection may not be globally optimal, but it is close to optimal and uses DNA barcodes far less redundantly than a randomly chosen set of compatible combinations.
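The search described above can be illustrated with a toy exchange loop in base R. This is only a sketch under stated assumptions, not the package's implementation: the names usage_entropy and greedy_exchange_sketch are hypothetical, the rows of the input matrix are taken to be combinations, and a random swap is accepted whenever it does not lower the entropy.

```r
# Shannon entropy of barcode usage in a set of combinations
# (rows = combinations, cells = barcode identifiers).
usage_entropy <- function(m) {
  p <- table(as.vector(m)) / length(m)   # relative usage of each barcode
  -sum(p * log(p))
}

# Hypothetical randomized exchange search, NOT the package's algorithm.
greedy_exchange_sketch <- function(m, nb_lane, max_iteration = 50) {
  stopifnot(nb_lane >= 1, nb_lane < nrow(m))   # Case 1 only
  selected <- sample(nrow(m), nb_lane)         # random starting selection
  best <- usage_entropy(m[selected, , drop = FALSE])
  for (i in seq_len(max_iteration)) {
    pool <- setdiff(seq_len(nrow(m)), selected)       # rows not yet selected
    candidate <- selected
    out_pos <- sample.int(nb_lane, 1)                 # position to swap out
    in_row  <- pool[sample.int(length(pool), 1)]      # row to swap in
    candidate[out_pos] <- in_row
    s <- usage_entropy(m[candidate, , drop = FALSE])
    if (s >= best) {                                  # keep non-worsening swaps
      selected <- candidate
      best <- s
    }
  }
  m[selected, , drop = FALSE]
}

# Toy input: all 3-barcode combinations of six made-up barcodes "a".."f".
set.seed(1)
toy <- t(combn(letters[1:6], 3))
optimized <- greedy_exchange_sketch(toy, nb_lane = 4, max_iteration = 200)
usage_entropy(optimized)   # cannot exceed log(6), the value for even usage
```

Because swaps are drawn at random and only non-worsening ones are kept, the result depends on the seed and the iteration budget, which is consistent with the remark that the selection may not be globally optimal.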

Case 2: number of lanes >= number of compatible DNA-barcode combinations

In such a case, there are not enough compatible DNA-barcode combinations and redundancy is inevitable.

Value

A matrix containing an optimized set of combinations of compatible barcodes.

See Also

get_all_combinations, get_random_combinations, experiment_design

Examples

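# Draw a matrix of compatible combinations from the bundled Illumina index set
m <- get_random_combinations(DNABarcodeCompatibility::IlluminaIndexes, 3, 4)
# Optimize for nb_lane = 12 lanes and index_number = 48 distinct barcodes;
# the remaining arguments are omitted here
optimize_combinations(m, 12, 48)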



comoto-pasteur-fr/DNABarcodeCompatibility documentation built on Sept. 17, 2024, 3:28 p.m.