seagull_data: A simulated data set to get started quickly
In seagull: Lasso, Group Lasso, and Sparse-Group Lasso for Mixed Models

This data set contains a genotype matrix, phenotype vectors for three traits (stored in a matrix with three columns), and a vector of groups. The data resemble a sample from a dairy cattle population. The sample includes 1000 individuals and 466 genotype markers. For the simulation, a mixed model without fixed effects was used.
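To make the phrase "a mixed model without fixed effects" more concrete, here is a minimal sketch of how a phenotype with a chosen heritability could be simulated from 0/1/2-coded genotypes. It is not the authors' procedure: the toy dimensions, the independent markers, the allele frequency of 0.5, and all variable names (`Z`, `u`, `g`, `e`, `h2`) are assumptions made only for illustration.

```r
set.seed(1)                       # reproducible toy example

n  <- 200                         # individuals (toy size, not the 1000 in the data set)
p  <- 50                          # markers (toy size, not the 466 in the data set)
h2 <- 0.3                         # target heritability (assumed for this sketch)

# Genotypes coded 0/1/2, drawn independently with allele frequency 0.5.
# This ignores family structure and linkage, unlike a real population simulation.
Z <- matrix(rbinom(n * p, size = 2, prob = 0.5), nrow = n, ncol = p)

u <- rnorm(p)                     # random marker effects
g <- as.vector(Z %*% u)           # genetic values; the model has no fixed effects

# Rescale the residuals so that var(g) / (var(g) + var(e)) equals h2.
e <- rnorm(n)
e <- e * sqrt(var(g) * (1 - h2) / h2 / var(e))

y <- g + e                        # simulated phenotype
realized_h2 <- var(g) / (var(g) + var(e))
print(realized_h2)
```

Rescaling the residuals, rather than drawing them with a fixed variance, is a design choice that makes the ratio of sample variances hit the target exactly in this toy setting.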

A data set which is based on 1000 individuals and 466 explanatory variables.

genotypes: a genotype matrix that contains information from single nucleotide polymorphisms (SNPs). Dimensions are 1000 rows by 466 columns. The data were simulated using the software AlphaSim, for which an R package is also available. Each row corresponds to a single individual. The 1000 simulated individuals form 10 half-sib families of 100 half sibs each; the half sibs share a common sire. The 466 SNPs are distributed equally over 2 chromosomes, i.e., the first 233 SNPs are located on chromosome 1 and the remaining 233 SNPs on chromosome 2. The two complementary homozygous genotypes are coded as 0 and 2, respectively, and the heterozygous genotype is coded as 1.
groups: a vector of integers that assigns each variable (genotype marker) to a particular group. The clustering was performed with the R package BALD, which uses linkage disequilibrium as a measure of proximity. There are 98 groups in total. Group sizes vary from 1 to 23, with a median group size of 3. For more details about the distribution of the group sizes, see the example on this page: groups.
phenotypes: a matrix consisting of 1000 rows and 3 columns. Each row corresponds to a different individual. Each column corresponds to a different trait. The traits were simulated to be uncorrelated with one another. The traits in the first, second, and third columns have heritabilities of 0.1, 0.3, and 0.5, respectively.
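The group-size figures quoted above (number of groups, smallest, median, and largest group) can be recomputed from any group vector. The sketch below uses a hypothetical helper, `group_size_summary`, which is not part of seagull, and demonstrates it on a small toy vector. The commented lines at the end assume that `data(seagull_data)` creates an object named `groups`; that matches the component names listed here but has not been verified against the package.

```r
# Summarise how many variables fall into each group.
# `group_size_summary` is a hypothetical helper, not a seagull function.
group_size_summary <- function(groups) {
  sizes <- as.integer(table(groups))   # number of variables per group label
  c(n_groups = length(sizes),
    min      = min(sizes),
    median   = median(sizes),
    max      = max(sizes))
}

# Toy group vector: three groups of sizes 1, 2, and 4.
toy_groups  <- c(1L, 2L, 2L, 3L, 3L, 3L, 3L)
toy_summary <- group_size_summary(toy_groups)
print(toy_summary)

# With the package installed, the same helper could be applied to the
# shipped vector (assumes the data set provides an object called `groups`):
# data(seagull_data, package = "seagull")
# group_size_summary(groups)
# hist(table(groups), main = "Group sizes", xlab = "Variables per group")
```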