Now that you've installed and loaded the mvGWAS
package (instructions here), let's run through a typical workflow.
To create an mvGWAS
object, you'll need two things on your filesystem:
library(mvGWAS) my_first_gwas <- mvGWAS$new(phenotype_file = 'my_phenotype_file.csv', genotype_directory = 'genotype_dir')
The first thing to note here is that we didn't define mvGWAS
.
That's because it's defined in the package mvGWAS
and exported.
It is a "reference object generator".
When it's used as above, it generates a reference object.
We've saved the reference object it generated as my_first_gwas
.
This object contains all the methods for conducting an mvGWAS.
So, going forward, my_first_gwas
will be where all the action is.
If you don't want to scan all the VCF files in genotype_directory
, you can also pass in genotype_file_pattern
.
If this argument is set, it restricts the VCF files that will be scanned to only those that match the pattern.
For example, if you only want to scan chromosome 17 and you know that only the VCF files that relate to chromosome 17 will match the pattern 'chr17':
my_second_gwas <- mvGWAS$new(phenotype_file = 'my_phenotype_file.csv', genotype_directory = 'genotype_dir', genotype_file_pattern = 'chr17')
We'll imagine that the handful of VCF files in genotype_dir
represent the genome.
Since there's not much data, we can conduct the whole scan on the local machine.
We'll use the conduct_scan
function, which takes two arguments: mean_formula
and var_formula
.
They use the model formula system that is typical in R.
mean_formula
is a two-sided formula.
Its right side is just the phenotype to be analyzed.
Its left side is covariates and keywords, separated by +
.
There are three keywords available:
var_formula
is a one-sided formula.
It has no left side.
Its right side follows the same rules as mean_formula
.
my_gwas$conduct_scan(mean_formula = my_trait ~ PC1 + PC2 + sex + age + DS, var_formula = ~ PC1 + PC2 + sex + GT)
By default, conduct_scan()
runs on the local machine, using as many cores as are available (based on parallel::detectCores()
).
If you have a multi-core computer but don't want to use them all in conduct_scan()
, you can pass in the maximum number of cores to use as num_cores
.
For example:
my_gwas$conduct_scan(mean_formula = my_trait ~ PC1 + PC2 + sex + age + DS, var_formula = ~ PC1 + PC2 + sex + GT, num_cores = 2)
If conduct_scan()
returns TRUE
, it worked without error and the results are stored in my_gwas$results
.
There are currently two plotting functions, qq_plot
and manhattan_plot
.
They are used as follows:
my_gwas$qq_plot() my_gwas$manhattan_plot()
Because there are up to three hypotheses being tested, there are up to three null models. Only the null model for the joint test
TODO...
# todo
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.