
When a gene or a genetic region is significantly associated with a
disease/trait, the rvsel procedure can distinguish causal (risk or
protective) rare variants from noncausal rare variants located within
the same gene or the same genetic region.

The most outcome-related rare variants are selected within a gene or a
genetic region, considering all possible combinations of rare variants.
First, the genetic data of each individual are combined into a single
numeric score, giving a one-dimensional numeric vector across individuals,
using one of the following methods: a weighted linear combination of a
subset of rare variants, an adaptive weighted linear combination of a
subset of rare variants, or a combined multivariate and collapsing method.

Next, one of the selection procedures (exhaustive search, forward
selection, backward selection, or a forward-based search for both risk and
protective variants) is conducted to identify causal rare variants within
a gene or a genetic region.


`x` |
The number of genetic mutations with |

`y` |
A phenotype outcome is coded as 1 for cases and 0 for controls if the phenotype is case-control binary data. Otherwise, it is considered as a quantitative outcome. |

`cx` |
Covariates such as gender and age. It should be a |

`weight` |
User defined weights for |

`family` |
A type of phenotype data. " |

`method` |
A way to combine genetic data. " |

`selection` |
A type of selection procedure. " |

`ad.alpha` |
A significance level of a marginal association test
to detect potential protective variants when " |

`lambda` |
A tuning parameter value used for a stopping rule, when
" |

The method "`sum`" employs a weighted linear combination of the
subset of *p* rare variants to combine the rare variants. The weighted
linear combination of the *i*th individual is defined as

*z_i=∑_{j=1}^p ξ_j w_j x_{ij},*

where *ξ_j=1* if the *j*th variant is included in a model,
otherwise *ξ_j=0*, and *w_j* is a user-defined weight of the
*j*th variant. The method "`asum`" replaces *x_{ij}*
by *x_{ij}^**, where *x_{ij}^*=1-x_{ij}* if the *j*th variant
is a potentially protective variant; otherwise, *x_{ij}^*=x_{ij}*.
If the p-value of a marginal association test between the *j*th
variant and a phenotype outcome is less than "`ad.alpha`" and they
have a negative relationship, the *j*th variant is considered
potentially protective. The method "`cmc`" combines the *p* rare
variants as

*z_i=I\left(∑_{j=1}^p ξ_j x_{ij} > 0\right),*

where *I(\cdot)* is an indicator function.
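The three combination rules above can be sketched in a few lines of base R. This is only an illustration of the formulas as written here, not the package's internal code: the toy genotype matrix, the weights `w`, the inclusion indicators `xi`, and the `protective` flags are all made-up values, and genotypes are assumed to be coded 0/1 so that *1-x_{ij}* stays in the same range.

```r
# Toy genotype matrix: 6 individuals (rows) by 3 rare variants (columns).
# All values are hypothetical and chosen only to make the arithmetic visible.
x <- matrix(c(0, 1, 0,
              1, 0, 0,
              0, 0, 1,
              0, 0, 0,
              1, 1, 0,
              0, 0, 0), ncol = 3, byrow = TRUE)
w  <- c(1, 2, 1)   # user-defined weights w_j (hypothetical)
xi <- c(1, 1, 0)   # inclusion indicators xi_j: variants 1 and 2 are in the model

# "sum": z_i = sum_j xi_j * w_j * x_ij
z_sum <- as.vector(x %*% (xi * w))

# "asum": use x* = 1 - x for variants flagged as potentially protective.
# The flags are assumed here; the text derives them from a marginal test.
protective <- c(FALSE, TRUE, FALSE)
x_star <- x
x_star[, protective] <- 1 - x[, protective]
z_asum <- as.vector(x_star %*% (xi * w))

# "cmc": z_i = I(sum_j xi_j * x_ij > 0), i.e. carrier of any included variant
z_cmc <- as.integer(as.vector(x %*% xi) > 0)
```

Each of `z_sum`, `z_asum`, and `z_cmc` is a vector with one entry per individual, which is the one-dimensional summary that the selection procedures then relate to the phenotype.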

The selection procedure "`exhaustive`" generates the *2^p-1* non-empty
subsets of the power set of *p* rare variants; the empty set is excluded
since the selection procedure assumes that at least one variant is causal.
The combination of rare variants among the *2^p-1* subsets
that maximizes the association with a phenotype outcome is selected as
the final model. When *p* is relatively large, the computational time of
"`exhaustive`" grows exponentially, so either "`forward`"
or "`backward`" selection is preferable for a relatively large *p*.
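As a rough illustration of the trade-off described above, the following base R sketch compares an exhaustive search over all *2^p-1* non-empty subsets with a greedy forward search on simulated data. It makes several assumptions that are not taken from the package: the score is the absolute sample correlation between the combined genotype and the outcome, all weights are 1, and the forward search stops as soon as no candidate improves the score. The helper `score_of` is hypothetical.

```r
set.seed(1)
n <- 300; p <- 5
x <- matrix(rbinom(n * p, 1, 0.1), n, p)              # toy 0/1 genotypes
y <- as.vector(x[, 1] + x[, 2] + rnorm(n, sd = 0.1))  # variants 1 and 2 are causal

# Assumed score: absolute sample correlation between combined genotype and y
score_of <- function(xi) {
  z <- as.vector(x %*% xi)
  if (sd(z) == 0) return(0)   # a constant z has no defined correlation
  abs(cor(z, y))
}

# Exhaustive: the binary digits of k = 1, ..., 2^p - 1 encode every non-empty subset
subsets <- t(sapply(seq_len(2^p - 1), function(k) as.integer(intToBits(k))[1:p]))
scores  <- apply(subsets, 1, score_of)
best_exhaustive <- which(subsets[which.max(scores), ] == 1)

# Forward: greedily add the single variant that most improves the score
selected <- integer(0)
best_score <- 0
repeat {
  candidates <- setdiff(seq_len(p), selected)
  if (length(candidates) == 0) break
  cand_scores <- sapply(candidates, function(j) {
    xi <- numeric(p)
    xi[c(selected, j)] <- 1
    score_of(xi)
  })
  if (max(cand_scores) <= best_score) break   # assumed stopping rule
  best_score <- max(cand_scores)
  selected <- c(selected, candidates[which.max(cand_scores)])
}
best_forward <- sort(selected)
```

With *p = 5* the exhaustive search scores 31 subsets, while the forward search scores at most 5 + 4 + 3 + 2 + 1 = 15 candidate models. For *p = 50*, as in the large-gene example, the exhaustive count is *2^50-1*, roughly 1.1e15, which is why a stepwise procedure is the practical choice there.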
`Fsel` is a selection procedure, distinct from the others, that is based
on a weighted linear combination. It defines a weighted linear combination
of the subset of *p* rare variants as

*z_i=∑_{j=1}^p ξ_j w_j x_{ij}^*,*

where
*ξ_j=1*, *-1* or *0* if the *j*th variant is a
risk, protective or noncausal variant, respectively. Also,
*x_{ij}^*=-(1-x_{ij})* if the *j*th variant is protective; otherwise,
*x_{ij}^*=x_{ij}*. `Fsel` can be performed only with a forward
selection procedure.
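To make the three-valued coding concrete, here is a small base R sketch of the `Fsel` combination formula alone; it does not reproduce the selection step or its stopping rule. The genotype matrix, the unit weights, and the labels in `xi` are hypothetical, and genotypes are again assumed to be coded 0/1.

```r
# Toy genotype matrix: 6 individuals (rows) by 3 rare variants (columns)
x <- matrix(c(0, 1, 0,
              1, 0, 0,
              0, 0, 1,
              0, 0, 0,
              1, 1, 0,
              0, 0, 0), ncol = 3, byrow = TRUE)
w  <- c(1, 1, 1)    # unit weights (hypothetical)
xi <- c(1, -1, 0)   # variant 1 risk, variant 2 protective, variant 3 noncausal

# x* = -(1 - x) for protective variants, x* = x otherwise
x_star <- x
x_star[, xi == -1] <- -(1 - x[, xi == -1])

# z_i = sum_j xi_j * w_j * x*_ij
z <- as.vector(x_star %*% (xi * w))
```

Because *ξ_j=-1* multiplies *-(1-x_{ij})*, a protective variant contributes *w_j(1-x_{ij})*: individuals who do not carry it receive the higher score, so risk and protective variants both push *z_i* in the same direction.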

`model` |
Types of " |

`selection` |
The selection result of |

`score` |
The largest sample correlation between the combined genotypes and a phenotypic outcome (which can be replaced by a regression residual if a covariate exists). |

`sequence` |
When " |

Hokeun Sun <[email protected]>

S. Kim, K. Lee, and H. Sun (2015)
*Statistical Selection Strategy for Risk and Protective Rare Variants
Associated with Complex Traits*, Journal of Computational Biology 22(11),
1034–1043

H. Sun and S. Wang (2014)
*A Power Set Based Statistical Selection Procedure to Locate
Susceptible Rare Variants Associated with Complex Traits with Sequencing
Data*, Bioinformatics 30(16), 2317–2323

```
library(rvsel)
# Generate simulation data
n <- 2000
p <- 10
MAF <- runif(p,0.001,0.01)
geno.prob <- rbind((1-MAF)^2,2*(1-MAF)*MAF,MAF^2)
x <- apply(geno.prob,2,function(x) sample(0:2,n,prob=x,replace=TRUE))
cx <- cbind(rnorm(n),sample(0:1,n,replace=TRUE))
beta <- c(rep(1,4),rep(0,6))
y <- cx %*% c(0.5,0.5)+ x %*% beta+rnorm(n)
# method = 'asum' and selection = 'exhaustive'
g <- rvsel(x,y,cx=cx)
# selection = 'Fsel'
g <- rvsel(x,y,cx=cx,selection="Fsel")
# Both risk and protective variants are present
n <- 2000
p <- 10
MAF <- runif(p,0.001,0.01)
geno.prob <- rbind((1-MAF)^2,2*(1-MAF)*MAF,MAF^2)
x <- apply(geno.prob,2,function(x) sample(0:2,n,prob=x,replace=TRUE))
cx <- cbind(rnorm(n),sample(0:1,n,replace=TRUE))
beta <- c(rep(1,2),rep(-1,2), rep(0,6))
y <- cx %*% c(0.5,0.5)+ x %*% beta+rnorm(n)
# method = 'cmc' and selection = 'exhaustive'
g <- rvsel(x,y,cx=cx,method="cmc")
# selection = 'Fsel'
g <- rvsel(x,y,cx=cx,selection="Fsel")
# A big gene simulation
n <- 2000
p <- 50
MAF <- runif(p,0.001,0.01)
geno.prob <- rbind((1-MAF)^2,2*(1-MAF)*MAF,MAF^2)
x <- apply(geno.prob,2,function(x) sample(0:2,n,prob=x,replace=TRUE))
cx <- cbind(rnorm(n),sample(0:1,n,replace=TRUE))
beta <- c(rep(1,8),rep(0,42))
y <- cx %*% c(0.5,0.5)+ x %*% beta+rnorm(n)
# method = 'asum' and selection = 'forward'
## Not run: g <- rvsel(x,y,cx=cx,selection="forward")
# selection = 'Fsel'
## Not run: g <- rvsel(x,y,cx=cx,selection="Fsel", lambda=0.01)
```

rvsel documentation built on April 14, 2017, 7:15 p.m.
