ds.PRS: Get Polygenic Risk Score
In isglobal-brge/dsOmicsClient: DataSHIELD client site Omics association functions.

ds.PRS

R Documentation

Get Polygenic Risk Score

Description

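Calculate a polygenic risk score (PRS) on the server(s) from VCF resources, using either a PGS Catalog score (`pgs_id`) or a user-provided table of SNPs of risk and their weights (`prs_table`).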

Usage

ds.PRS(
  resources,
  pgs_id = NULL,
  prs_table = NULL,
  table = NULL,
  table_id_column = NULL,
  table_prs_name = NULL,
  snp_threshold = 90,
  snp_assoc = FALSE,
  datasources = NULL
)

Arguments

`resources`	`list` of all the VCF resources with biallelic genotype information. It is advised to have one VCF resource per chromosome; a single large VCF file with all the information is slower to use.
`pgs_id`	`character` (default `NULL`) ID of the PGS Catalog score used to calculate the polygenic risk score. Polygenic Score IDs and names are listed at https://www.pgscatalog.org/browse/scores/
`prs_table`	`data.frame` (default `NULL`) User-provided table of SNPs of risk and their weights, used instead of `pgs_id`. See details for the required column names.
`table`	`character` (default `NULL`) If not `NULL`, the name of the table on the server(s) into which the PRS results will be merged (typically a phenotypes table).
`table_id_column`	`character` (default `NULL`) Only used when the `table` argument is supplied: the name of the column of `table` that contains the individual IDs used to perform the merge.
`table_prs_name`	`character` (default `NULL`) If not `NULL`, the name used to build the names of the columns added to `table`. See details for further information.
`snp_threshold`	`numeric` (default `90`) Minimum percentage of the found SNPs of risk with data that an individual must have to be kept. See details for further information.
`datasources`	`list` of `DSConnection-class` objects (default `NULL`) obtained after login.

Details

This function resolves a list of resources, subsetting them by the SNPs of risk; this does not guarantee that all the SNPs of risk will be found in the data. Among the SNPs of risk that are found, if an individual has data for less than 'snp_threshold' percent of them, that individual is dropped (a SNP with no data is marked in the VCF as ./.). If an individual passes this filter but still has SNPs with no data, those SNPs are counted in the polygenic risk score as non-risk alleles. To take this information into account, the number of SNPs with data for each individual is returned as 'n_snps'.
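The filtering and scoring rules above can be sketched locally in base R. This is a hypothetical illustration, not the server-side code of ds.PRS: the helper `sketch_prs`, its input layout (an individuals-by-SNPs matrix of effect-allele counts with `NA` for ./. calls) and the example values are all assumptions.

```r
# Hypothetical sketch of the snp_threshold filter and PRS computation.
# genotypes: individuals x SNPs matrix of effect-allele counts (0, 1, 2),
#            NA where the VCF has no call (./.); row names are individual IDs
# weights:   effect_weight of each SNP, in the same order as the columns
sketch_prs <- function(genotypes, weights, snp_threshold = 90) {
  stopifnot(ncol(genotypes) == length(weights))
  # Number of SNPs of risk with data for each individual
  n_snps <- rowSums(!is.na(genotypes))
  # Drop individuals below snp_threshold percent of SNPs with data
  keep <- 100 * n_snps / ncol(genotypes) >= snp_threshold
  g <- genotypes[keep, , drop = FALSE]
  # Remaining missing calls are counted as non-risk alleles (0 copies)
  g[is.na(g)] <- 0
  data.frame(
    id = rownames(g),
    prs = as.vector(g %*% weights),  # weighted score
    prs_nw = rowSums(g),             # score with all weights equal to 1
    n_snps = n_snps[keep],
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up example: id2 has data for 1 of 4 SNPs (25 %) and is dropped
geno <- matrix(c(2,  1,  0,  NA,
                 1,  NA, NA, NA,
                 0,  2,  1,  1),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("id1", "id2", "id3"), paste0("snp", 1:4)))
w <- c(0.2, -0.1, 0.05, 0.3)
res <- sketch_prs(geno, w, snp_threshold = 70)
res
```

Here id1 keeps a missing call that contributes 0 to both scores, which is why its 'n_snps' (3) is reported alongside the scores.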

When using a user-provided prs_table instead of a PGS Catalog ID to calculate the PRS, note that the provided data.frame must have a very strict structure regarding column names (order is not relevant). Please follow one of these two schemas:
- Schema 1 (provide SNP positions):
+ "chr_name", "chr_position", "effect_allele", "reference_allele", "effect_weight"

- Schema 2 (provide SNP IDs):
+ "rsID", "effect_allele", "reference_allele", "effect_weight"
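A hypothetical base R check of the two schemas, which can be run on a prs_table before using it. The helper `prs_table_schema` is not part of dsOmicsClient, it assumes that no extra columns are allowed, and the SNP values are placeholders rather than real risk SNPs.

```r
# Hypothetical helper: report which schema a prs_table follows.
# Column order is ignored; the set of names must match exactly.
prs_table_schema <- function(prs_table) {
  schema1 <- c("chr_name", "chr_position", "effect_allele",
               "reference_allele", "effect_weight")
  schema2 <- c("rsID", "effect_allele", "reference_allele", "effect_weight")
  cols <- colnames(prs_table)
  if (setequal(cols, schema1)) return("positions")
  if (setequal(cols, schema2)) return("rsID")
  stop("prs_table must follow Schema 1 or Schema 2")
}

# Schema 1, columns deliberately out of order (placeholder values)
by_position <- data.frame(
  effect_weight = c(0.12, -0.05),
  chr_name = c("1", "2"),
  chr_position = c(1000000L, 2000000L),
  effect_allele = c("A", "G"),
  reference_allele = c("G", "T")
)
# Schema 2 (placeholder values)
by_rsid <- data.frame(
  rsID = c("rs123", "rs456"),
  effect_allele = c("A", "G"),
  reference_allele = c("G", "T"),
  effect_weight = c(0.12, -0.05)
)
prs_table_schema(by_position)
prs_table_schema(by_rsid)
```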

It is important to note that the "effect_weight" column corresponds to the beta value of the SNP (log(OR)).
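If a source reports odds ratios instead of betas, the weights can be derived with the natural logarithm, which is what R's `log()` computes. A short sketch with illustrative ORs (not taken from any real score):

```r
# Odds ratios per effect allele (illustrative values only)
odds_ratios <- c(1.25, 0.80, 1.00)
# effect_weight is the beta of each SNP: the natural log of its OR
effect_weight <- log(odds_ratios)
round(effect_weight, 4)
```

An OR below 1 gives a negative weight (a protective allele), and an OR of exactly 1 gives a weight of 0.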

As a rule of thumb, use Schema 1 (provide SNP positions) whenever possible, as the implementation that subsets the VCF files by position is considerably faster.

Since the PRS results are sensitive information, they are not returned to the client; however, they can be merged into a table on the server(s). The main use of this is to add the PRS results to a phenotypes table and assess relationships between PRS scores and the phenotypes. The merge is performed via the individual IDs, taken from the column given by the argument table_id_column; the table itself is specified with the argument table. When merging the results into a table, the default column names are:
- When using pgs_id:
+ prs_pgs_id
+ prs_nw_pgs_id
- When using prs_table:
+ prs_prs_custom_results
+ prs_nw_prs_custom_results
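The server-side merge can be pictured with base R `merge()` on local stand-ins. This is a sketch under assumptions: the table and column names (`phenotypes`, `sample_id`, `case`) and all values are hypothetical, and whether ds.PRS keeps individuals without a PRS (kept here as `NA` via `all.x = TRUE`) is not stated in this documentation.

```r
# Local stand-in for a phenotypes table on a server (made-up values)
phenotypes <- data.frame(
  sample_id = c("id1", "id2", "id3"),
  case = c(1, 0, 1)
)
# Local stand-in for PRS results computed with a prs_table (default names)
prs_results <- data.frame(
  sample_id = c("id1", "id3"),  # id2 assumed dropped by snp_threshold
  prs_prs_custom_results = c(0.30, 0.15),
  prs_nw_prs_custom_results = c(3, 4)
)
# Merge by the ID column, the role played by table_id_column
merged <- merge(phenotypes, prs_results, by = "sample_id", all.x = TRUE)
merged
```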

If another designation is desired, use the table_prs_name argument (default NULL). Note that this parameter only changes the tail of the names: the two columns added always begin with prs_ and prs_nw_. These columns correspond to the calculated PRS and to the PRS without weights (that is, the PRS with all weights equal to 1).
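A small sketch of the two added columns, assuming the naming rule described above. The helper `prs_column_names`, the tail "t2d" and the genotype and weight values are hypothetical. It also shows that the unweighted score equals the weighted one with every weight set to 1.

```r
# Hypothetical helper: only the tail changes, prefixes are fixed
prs_column_names <- function(tail) paste0(c("prs_", "prs_nw_"), tail)

# Effect-allele counts (2 individuals x 3 SNPs) and weights, made up
g <- matrix(c(2, 1, 0,
              0, 2, 1), nrow = 2, byrow = TRUE)
w <- c(0.2, -0.1, 0.3)
scores <- data.frame(
  drop(g %*% w),                  # weighted PRS
  drop(g %*% rep(1, length(w)))   # PRS with all weights equal to 1
)
names(scores) <- prs_column_names("t2d")
scores
```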