There are six versions of the datasets in the folder.
dataDictionary.pdf has information for all of the covariates; we use this in an exercise to understand the datasets.

"full" in a file name means the full patient cohort (~500K patients):
fullPatientData.csv - the entire patient cohort
fullDataTrainSet.csv - a subset of the patients for training a model
fullDataTestSet.csv - a subset of the patients for testing a model

"geno" in a file name means a smaller cohort (~50K patients) who have additional SNP covariates:
genoData.csv - the entire genotyped cohort
genoDataTrainSet.csv - a subset of the genotyped cohort for training a model
genoDataTestSet.csv - a subset of the genotyped cohort for testing a model
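As a rough illustration of the naming scheme described above, the following Python sketch maps a cohort ("full" or "geno") and a split to one of the six CSV files and streams its rows. The helper names `dataset_path` and `read_rows` are hypothetical, and the code assumes each CSV starts with a header row, which the description above does not confirm.

```python
import csv
from pathlib import Path

# File names as listed in the dataset description; keys are (cohort, split).
DATASET_FILES = {
    ("full", "all"): "fullPatientData.csv",
    ("full", "train"): "fullDataTrainSet.csv",
    ("full", "test"): "fullDataTestSet.csv",
    ("geno", "all"): "genoData.csv",
    ("geno", "train"): "genoDataTrainSet.csv",
    ("geno", "test"): "genoDataTestSet.csv",
}


def dataset_path(folder, cohort, split="all"):
    """Return the path of one of the six CSV files inside `folder`."""
    try:
        name = DATASET_FILES[(cohort, split)]
    except KeyError:
        raise ValueError(f"unknown cohort/split: {cohort!r}/{split!r}") from None
    return Path(folder) / name


def read_rows(folder, cohort, split="all"):
    """Yield each row as a dict keyed by column name.

    Assumption: the first line of the file is a header row.
    Rows are streamed one at a time, since the full cohort is large.
    """
    with open(dataset_path(folder, cohort, split), newline="") as handle:
        yield from csv.DictReader(handle)
```

For example, `read_rows("data", "geno", "train")` would iterate over genoDataTrainSet.csv, where "data" is a placeholder for wherever the folder actually lives.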