knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(maxRgain)
Polyclonal selection is a breeding strategy developed for ancient grapevine varieties, but of more widespread interest. It seeks to select a group of different genotypes that collectively meet predefined desirable criteria for quantitative traits. Polyclonal selection, as opposed to the selection of a single genotype, helps to preserve genetic diversity and provides greater environmental stability, while still securing high genetic gains.
This approach relies on the predictors of the genetic effects obtained from the fitting of a mixed model. Rather than focusing on individual genotypes, polyclonal selection considers the group as the unit of selection, with the goal of maximizing the overall predicted genetic gain across multiple traits of interest. Identifying such a superior group is an optimization problem, in which the objective is to select a subset of genotypes that jointly maximize the expected genetic value, subject to constraints which satisfy agronomic and enological interests.
Most existing methods for genetic selection based on several traits, such as selection indices, are designed to maximize the genetic gain of selection of a single genotype. When the objective is to select a group of genotypes, rather than a single genotype, the conventional approach has been to simply choose the top-ranking genotypes. However, this strategy does not ensure that the selected group, as a whole, satisfies predefined breeding objectives when multiple traits are considered simultaneously.
This package implements a novel Integer Programming-based method developed by Surgy et al. (2025), designed specifically for group-based selection under multiple trait criteria. While the methodology is demonstrated using data from grapevine breeding trials, it is broadly applicable across different species and breeding programs, especially in scenarios where group performance is more relevant than individual merit alone.
For a complete description of the methodology, please refer to:
Surgy, S., Cadima, J. & Gonçalves, E. Integer programming as a powerful tool for polyclonal selection in ancient grapevine varieties. Theor Appl Genet 138, 122 (2025). https://doi.org/10.1007/s00122-025-04885-0
Each function in the package includes an example that reproduces scenarios described in the original publication, thereby facilitating both practical understanding and a direct application of the method. This document provides a more detailed explanation of the outputs generated by the various functions and offers guidance on how to interpret them effectively.
The package includes a real dataset - Gouveio - to ensure the reproducibility of the examples and to allow users to explore additional scenarios. This dataset contains Empirical Best Linear Unbiased Predictors (EBLUPs) of genotypic effects, obtained from the fitting of a linear mixed model to phenotypic data from a selection field trial involving 150 genotypes of the grapevine variety Gouveio, established according to a resolvable row-column design.
head(Gouveio)
The traits evaluated were:
The selection criteria are defined by the breeder. In this example, they are: to increase yield, potential alcohol, and total acidity, while decreasing pH and berry weight.
Below is a table describing the common arguments used across the functions in this package:
| Argument | Description | Used in functions |
|--------------|------------------------------|-------------|
| traits
| A vector with the names of the traits to be optimized, i.e., those included in the objective function| All |
| ref
| Name of the reference column (e.g., genotype ID). Defaults to the first column. | All |
| clmin
| Integer, indicating the minimum group size (default is 1). | All |
| clmax
| Integer, indicating the maximum group size. If omitted, equal to clmin
| All |
| dmg
| Can be of one of two types. In the standard form, it will be a dataframe with three columns defining constraints: trait names, constraint relations (">="
, "<="
), and right-hand side values. Alternatively, it can be given the text value dmg = "base"
which sets the right-hand side of the constraints of all traits to zero. |polyclonal()
|
| constraints
| Named numeric vector with traits to which constraints apply. If omitted, all except ref
are used. |rmaxa()
|
| meanvec
| Named numeric vector of the means for each trait. If omitted, data are assumed to be normalized (divided) by the mean. |All |
| criteria
| Named numeric vector with the criterion for each trait: 1 to increase, -1 to decrease. If omitted, all traits are assumed to be increasing. |All |
| data
| A dataframe with the predictors of genotypic effects for the several traits. Rows are genotypes; columns are traits. | All |
If the gain for a group of a certain size is not displayed, it means that it was not possible to satisfy all the constraints for that group size. It is possible that constraints are not feasible for some group sizes but are feasible for others.
The genotypes selected in a group of a given size may differ substantially from those selected in a group of another size. Changing any condition — including the group size — results in a new optimisation problem.
The dataframe used for the dmg
argument can have any column names, but the order must be respected: first the trait, then the relation (e.g., ">="), and finally the right-hand side value, that is, the value for the minimum desired gain in percentage of the overall mean of the trait. It should be noted that positive values are always understood as improvements, even for a criterion value of -1. Thus, >= 3 with criterion value -1 is understood as a decrease of at least 3% Consequently, minimum desired gains should be defined consistently, regardless of the selection criterion used.
If a loss in a trait is admissible, the maximum admissible loss must be
indicated as a negative number on the right-hand side of the dmg
argument. On the other hand, if the goal is to establish an upper bound on the increase of the gain of a trait, the relation in the dmg
argument must be specified as "<=".
If no group of any requested size satisfies the specified conditions, the message 'No possible solution!' is returned.
polyclonal()
This is the main function of this package.
polyclonal(traits, ref = NULL, clmin = 2, clmax, dmg = NULL, meanvec = NULL, criteria = NULL, data)
In the example for the polyclonal()
function, the objective is to maximize the genetic gain of groups with sizes ranging from 7 to 20 genotypes, while enforcing desired minimum gains for the five traits (expressed as percentages of the overall mean of each trait in the field trial): 20% for yield (yd), 3% for potential alcohol (pa), 3% for total acidity (ta), 1% for pH, and 2% for berry weight (bw).
# Named numeric vector with the trait means mymeanvec <- c(yd = 3.517, pa = 12.760, ta = 4.495, ph = 3.927, bw = 1.653) # Vector with the traits to be optimized mytraits <- c("yd", "pa", "ta", "ph", "bw") # Named numeric vector with the selection criteria mycriteria <- c(yd = 1, pa = 1, ta = 1, ph = -1, bw = -1) # Name of the reference column with the genotype labels myref = "Clone"
If meanvec
is omitted, it is assumed that all trait values have already been standardized by dividing by their respective overall means. If criteria
is omitted, the default assumption is that an increase is desired for all traits. If ref
is omitted, the first column in the dataset is assumed to contain the genotype labels.
In the polyclonal()
function the constraints are indicated by argument dmg
(desired minimum gains). The standard form of this argument is a data frame with three columns: the first specifying the traits to be constrained; the second with the constraint relation; the second with the relation of the constraint (">=" or "<="); and the third with the right-hand side values of the constraints. Column names are arbitrary, but their order must be preserved.
# dataframe with the desired minimum gains mydmg <- data.frame( lhs = c("yd", "pa", "ta", "ph", "bw"), rel = c(">=", ">=", ">=", ">=", ">="), rhs = c(20, 3, 3, 1, 2) )
Note that the positive values of 1 and 2 for pH (ph) and berry weight (bw), which have selection criteria -1, mean a decrease of at least 1% in pH and a decrease of at least 3% in berry weight.
mydmg
Now, the polyclonal
function is called to perform the selection based on the specified constraints:
# Using polyclonal() function with_dmg <- polyclonal( traits = mytraits, clmin = 7, clmax = 20, dmg = mydmg, meanvec = mymeanvec, criteria = mycriteria, data = Gouveio )
Groups ranging from 7 to 20 genotypes were requested, as these cover the most appropriate group sizes for practical applications of the polyclonal methodology for grapevine selection. However, other group sizes can also be specified. In fact, it is possible to request a single group size by assigning the same value to both clmin
and clmax
or by omitting clmax
. Analysing the output reveals how the imposed constraints influence the selection results.
# Results
with_dmg
The output displays the genetic gain achieved for each trait across the different group sizes. All values are expressed as percentages of the trait mean. For example, in the 10-genotype group, the gains obtained were approximately 21.4%, 3.0%, 3.3%, 1.1% and 2.3% for yield (yd), potential alcohol (pa), total acidity (ta), pH (ph), and berry weight (bw), respectively. Positive values indicate gains in accordance with the selection criteria.
In this case, although group sizes between 7 and 20 genotypes were requested, the object gain
only provides results for groups of 7, 8, and 10 genotypes. This indicates that the desired minimum gains could not be achieved for other group sizes within the specified range.
Examining the lists of genotypes within each group reveals that genotype groups of different sizes are not necessarily nested. For example, genotype GV060 is not included in the group of 7 genotypes, appears in the group of 8 genotypes, and is then excluded from the group of 10 genotypes.
The summary()
method can be used with polyclonal()
object to retrieve a summary of the initial conditions and results.
# Summary results summary(with_dmg)
The summary()
indicates how many groups were achieved for those conditions and the related dimensions. In the table, the first column gives the trait names, followed by the means, the criteria and the desired minimum gain (DMG). The Last two columns, "MaxGain" and "MaxGroup" show the maximum gain achieved for each trait and the group size where that gain occurs, respectively.
The so-called base situation, as described in the original publication of the method (Surgy et al., 2025), consists in maximizing the predicted genetic gain of the target traits while ensuring that no trait decreases. To implement this, the right-hand side of all constraints is set to zero.
To simplify the computation of this scenario, the argument dmg
can be set to "base"
. In that case, all traits present in the dataset are included as constraints, each required to be greater than or equal to zero, except for the reference trait specified in ref
. If ref
is omitted, the first column of the dataset is assumed to correspond to the reference trait.
It is important to note that when using the "base" mode, the user must provide values for both meanvec
and criteria
for all traits in the dataset. If the user wishes to exclude any traits from the constraint set, the full specification of dmg
must be provided instead.
The following example illustrates the application of the base situation. Groups of 7 to 12 clones are selected under these conditions. The same values for traits
, meanvec
, and criteria
are used throughout.
# Named numeric vector with the trait means mymeanvec <- c(yd = 3.517, pa = 12.760, ta = 4.495, ph = 3.927, bw = 1.653) # Vector with the traits to be optimized mytraits <- c("yd", "pa","ta", "ph", "bw") # Named numeric vector with the selection criteria mycriteria <- c(yd = 1, pa = 1, ta = 1, ph = -1, bw = -1) base_sit <- polyclonal( traits = mytraits, clmin = 7, clmax = 12, dmg = "base", meanvec = mymeanvec, criteria = mycriteria, data = Gouveio )
Unlike what happens with a selection index, where increasing the group size typically involves the sequential inclusion of genotypes according to a fixed ranking, the method implemented here allows each group size to be independently optimized. As a result, groups of different sizes are not necessarily nested.
base_sit
For example, consider the groups of 7 and 8 genotypes. In the 8-genotype group, not just one but two genotypes differ from the groups of 7: genotypes GV089 and GV137. On the other hand, genotype GV132, which is included in the group of 7 genotypes, is not present in the group of 8. This illustrates how the inclusion of constraints can lead to changes in group composition, rather than simply adding the next best-ranked individual.
In the base situation, summary()
show all DMG constraints as 0.
summary(base_sit)
rmaxp()
rmaxp(traits, ref = NULL, clmin = 2 , clmax, meanvec = NULL, criteria = NULL, data)
The package includes a function rmaxp()
, which computes the maximum possible gain for each trait. The maximum possible gain corresponds to the value obtained by selecting the top genotypes for each trait individually, without applying any constraints. The function allows the user to request the gain for one or multiple traits simultaneously. The output displays one column per trait; however, the values in different columns are not linked to the same solution and should be interpreted independently.
In the following example, it is asked the maximum possible gain for yd
and pa
for group sizes from 9 to 20 genotypes.
# Named numeric vector with the trait means mymeanvec <- c(yd = 3.517, pa = 12.760) # Vector with the traits to be optimized mytraits <- c("yd", "pa")
The vector specifying the selection criteria can be omitted, as the default value of 1 is used for both traits. With this function, no constraints are applied.
# Using rmaxp() max_pos_gain <- rmaxp( traits = mytraits, clmin = 9, clmax = 20, meanvec = mymeanvec, data = Gouveio ) # Results max_pos_gain
As in previous outputs, one column per trait is shown. However, in this case, the gain reported in each column refers to an independent solution. For example, the group of 20 genotypes achieving a yield gain of 39.5% is not composed of the same genotypes as the group achieving a potential alcohol gain of 5.9%. This can be verified by examining the output stored in the selected object.
Since no constraints are applied in this scenario, the output represents a ranking of genotypes for each trait. Increasing the group size simply adds the next genotype in the ranking. Thus, each column corresponds to the top genotypes for a given trait. The final column, Entry, indicates the smallest group size at which each genotype is first included. It means that, for example, for yield, the group size of 9 is composed by genotypes from GV084 to GV145. Genotype GV137 appears for the first time in yield when the group size reaches 10 genotypes. When the group size increases to 11, all the previous 10 genotypes remain in the group and genotype GV079 is added, and so on.
No customized summary()
is provided, given that the conditions do not vary and no constraints are applied.
rmaxa()
rmaxa(traits, ref = NULL, clmin = 2, clmax, constraints = NULL, meanvec = NULL, criteria = NULL, data)
The package includes a function rmaxa()
, which computes the maximum admissible gains. The maximum admissible gain corresponds to the value obtained by maximizing one trait while avoiding losses in all others, enforced through constraints—predefined in this case as >= 0. As with rmaxp()
, the user may request gains for one or multiple traits simultaneously. The output displays one column per trait; however, the values in different columns correspond to independent solutions and should be interpreted separately. This distinguishes the output of the gain object from this function from that of the polyclonal()
function with dmg = "base"
, where the sum of the values related to all traits are optimized.
# Named numeric vector with the trait means mymeanvec <- c(yd = 3.517, pa = 12.760, ta = 4.495, ph = 3.927, bw = 1.653) # Vector with the traits to be optimized mytraits <- c("yd", "pa") # Named numeric vector with the selection criteria mycriteria <- c(yd = 1, pa = 1, ta = 1, ph = -1, bw = -1)
In this function, if the traits to which the constraints apply are not specified, all the traits are assumed to be greater or equal than zero. If this vector is omitted, all columns in the dataset are considered as constrained traits, except the one indicated in ref
or the first column if ref
is omitted. The criteria
as well as the meanvec
arguments must indicate the criteria for all the traits involved in both maximization and the constraints.
This example requests groups with 12 to 20 genotypes.
# Using rmaxa() max_adm_gain <- rmaxa( traits = mytraits, clmin = 12, clmax = 20, meanvec = mymeanvec, data = Gouveio ) # Results max_adm_gain
As in the rmaxp()
gain output, one column per trait is shown; however, the gain reported in each column corresponds to an independent solution. For example, the group of 20 genotypes achieving a gain of 26.2% for yd is not composed of the same genotypes as the group achieving a gain of 5.7% for pa. Furthermore, the gains obtained with rmaxa()
for any trait are always less than or equal to those obtained with rmaxp()
, since rmaxa()
incorporates constraints during optimization.
Regarding the selected
object, due to the constraints applied in this function, the genotypes composing one group differ from those in other groups. Consequently, there is one selected object per trait, named as selected_<trait>
.
This package offers a ready-to-use implementation of the integer programming method for maximizing the predicted genetic gains in polyclonal selection developed by Surgy et al. (2025). It allowing breeders to efficiently and consistently perform group-based multi-trait selection, which supports a balanced genetic gain across multiple traits in breeding programs. A dataset and examples are provided to assist in understanding and reproducing results.
Surgy, S., Cadima, J. & Gonçalves, E., 2025. Integer programming as a powerful tool for polyclonal selection in ancient grapevine varieties. Theor Appl Genet 138, 122. https://doi.org/10.1007/s00122-025-04885-0
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.