# mst.mle: Maximum likelihood estimation for a (multivariate) skew-t... In doppelgangR: Identify likely duplicate samples from genomic or meta-data

## Description

Fits a skew-t (ST) or multivariate skew-t (MST) distribution to data, or fits a linear regression model with (multivariate) skew-t errors, using maximum likelihood estimation. These functions were copied from the CRAN package `sn` v0.4.18, because they were later deprecated in that package.

## Usage

```r
mst.mle(X, y, freq, start, fixed.df=NA, trace=FALSE,
        algorithm = c("nlminb","Nelder-Mead", "BFGS", "CG", "SANN"), control=list())
st.mle(X, y, freq, start, fixed.df=NA, trace=FALSE,
       algorithm = c("nlminb","Nelder-Mead", "BFGS", "CG", "SANN"), control=list())
```

## Arguments

- `y`: a matrix (for `mst.mle`) or a vector (for `st.mle`). If `y` is a matrix, rows refer to observations and columns to components of the multivariate distribution.
- `X`: a matrix of covariate values. If missing, a one-column matrix of 1's is created; otherwise it must have the same number of rows as `y`, and it must include a column of 1's.
- `freq`: a vector of weights. If missing, a vector of 1's is created; otherwise it must have length equal to the number of rows of `y`.
- `start`: for `mst.mle`, a list containing the components `beta`, `Omega`, `alpha`, `df` of the type described under Value; for `st.mle`, a vector with the analogous components, except that the scale parameter is the square root of `Omega`. In both cases, the `dp` component of the list returned by a previous call has the required format and can be reused as a new `start`. If `start` is missing, the function selects initial values.
- `fixed.df`: a scalar giving the degrees of freedom (df) if these must be held fixed, or `NA` (the default) if `df` is a parameter to be estimated.
- `trace`: logical value controlling the printing of the algorithm's convergence. If `trace=TRUE`, details are printed. Default is `FALSE`.
- `algorithm`: a character string selecting the numerical optimization procedure used to maximize the log-likelihood function. If it equals `"nlminb"`, `nlminb` is called; in all other cases `optim` is called, with `method` set to the given string. Default is `"nlminb"`.
- `control`: passed to the chosen optimizer, either `nlminb` or `optim`; see the documentation of that function for its usage.
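A minimal sketch of how `X`, `y` and `start` fit together in a regression with skew-t errors. It assumes that base R's `model.matrix()` is an acceptable way to build `X`, since it adds the required column of 1's automatically. The simulated data and variable names are illustrative only, and the fitting calls run only if `st.mle` is on the search path (for example after `library(doppelgangR)`).

```r
set.seed(1)
n <- 200
x <- runif(n)
# Illustrative data: linear trend plus skewed, heavy-tailed noise
y <- 1 + 2 * x + rt(n, df = 5, ncp = 2)

# model.matrix() returns an intercept column of 1's followed by x,
# which satisfies the requirement that X include a column of 1's
X <- model.matrix(~ x)
stopifnot(all(X[, 1] == 1), nrow(X) == length(y))

if (exists("st.mle", mode = "function")) {
  fit <- st.mle(X = X, y = y)
  # For st.mle, dp is a vector of length ncol(X) + 3: c(beta, omega, alpha, df)
  print(fit$dp)
  # The dp of a previous fit can be passed back as a starting value
  refit <- st.mle(X = X, y = y, start = fit$dp)
}
```

Using `model.matrix()` rather than `cbind(1, x)` keeps the column names (`"(Intercept)"`, `"x"`), which makes the `beta` part of `dp` easier to read.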

## Details

If `mst.mle` is given a vector `y`, it converts it to a one-column matrix and fits a univariate skew-t distribution. `st.mle` relies on the same mechanism: it is simply an interface to `mst.mle`.

The argument `freq` is intended for grouped data: set the values of `y` to the central values of the cells and `freq` to the cell counts. Because each observation is then represented by its cell midpoint, the resulting estimate only approximates the exact maximum likelihood estimate. If `freq` is not set, exact maximum likelihood estimation is performed.
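A sketch of preparing grouped data for `freq`, assuming the cells come from base R's `hist()` (any binning would do). The simulated sample is illustrative, and the fitting calls, which compare the approximate grouped-data estimate with the exact one, run only if `st.mle` is available.

```r
set.seed(2)
raw <- rt(500, df = 5, ncp = 1)   # illustrative ungrouped sample

# Group the data into cells: y = cell midpoints, freq = cell counts
h <- hist(raw, breaks = 20, plot = FALSE)
y_grouped <- h$mids
freq <- h$counts

# Every observation falls in exactly one cell
stopifnot(sum(freq) == length(raw))

if (exists("st.mle", mode = "function")) {
  fit_grouped <- st.mle(y = y_grouped, freq = freq)  # approximate MLE
  fit_exact   <- st.mle(y = raw)                     # exact MLE
  print(rbind(grouped = fit_grouped$dp, exact = fit_exact$dp))
}
```

Empty cells get a weight of 0 and so contribute nothing to the weighted log-likelihood; they are kept here for simplicity.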

The maximum likelihood estimates are searched numerically in a suitable reparameterization of the original parameters, using the selected optimizer (`nlminb` or `optim`), which is supplied with the derivatives of the log-likelihood function. When the optimizer is `optim`, the gradient may or may not be used, depending on the selected method: for example, `"BFGS"` and `"CG"` use it, whereas `"Nelder-Mead"` does not. On exit from the optimizer, the parameters are transformed back to the original ones. For a detailed description of the reparameterization, see Section 5.1 and Appendix B of Azzalini & Capitanio (2003).
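For `st.mle`, the `$algorithm$par` values in the example output are consistent with a working parameter of (location, log(scale), shape/scale, log(df)). This reading is inferred from the printed numbers, not from the source code, and it does not cover the multivariate case, where `Omega` is also reparameterized. The sketch maps the printed working values back to the direct parameters; `to_dp` is a hypothetical helper, not part of the package.

```r
# Working parameters as printed in $algorithm$par of the example output
work_par <- c(59.8958166, 4.0309662, 0.4230647, 2.9629863)

# Assumed inverse transformation (inferred from the output, not the source):
# location = p[1], scale = exp(p[2]), shape = p[3] * scale, df = exp(p[4])
to_dp <- function(p) {
  scale <- exp(p[2])
  c(location = p[1], scale = scale,
    shape = p[3] * scale, df = exp(p[4]))
}

dp_from_par <- to_dp(work_par)
print(dp_from_par)
# Compare with $dp as printed: 59.89582 56.31529 23.82501 19.35569
```

Working on the log scale keeps the scale and df positive without constrained optimization, which is one plausible motivation for such a transformation.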

## Value

A list containing the following components:

- `call`: a string containing the calling statement.
- `dp`: for `mst.mle`, a list containing the direct parameters `beta`, `Omega`, `alpha`. Here `beta` is a matrix of regression coefficients with `dim(beta)=c(ncol(X),ncol(y))`, `Omega` is a positive-definite scale matrix of order `ncol(y)`, and `alpha` is a vector of shape parameters of length `ncol(y)`. For `st.mle`, `dp` is a vector of length `ncol(X)+3` containing `c(beta, omega, alpha, df)`, where `omega` is the square root of `Omega`.
- `se`: a list containing the components `beta`, `alpha`, `info`. Here `beta` and `alpha` are the standard errors of the corresponding point estimates; `info` is the observed information matrix for the working parameter described in Details.
- `logL`: the value of the log-likelihood at the maximum, as shown in the example output.
- `algorithm`: the list returned by the chosen optimizer, either `nlminb` or `optim`, plus an item `name` giving the selected algorithm; see the documentation of `nlminb` or `optim` for an explanation of the other components.
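A sketch of unpacking an `st.mle`-style `dp` vector according to the layout `c(beta, omega, alpha, df)` described above. The helper `split_st_dp` is hypothetical, and the input values are the ones printed in the example output (an intercept-only model, so `ncol(X) = 1`). The last check only confirms that, in that output, `$algorithm$objective` equals `-2 * logL`.

```r
# Hypothetical helper: split a dp vector of length p + 3,
# laid out as c(beta, omega, alpha, df), into named parts
split_st_dp <- function(dp, p) {
  stopifnot(length(dp) == p + 3)
  list(beta  = dp[seq_len(p)],
       omega = dp[[p + 1]],
       Omega = dp[[p + 1]]^2,   # omega is the square root of Omega
       alpha = dp[[p + 2]],
       df    = dp[[p + 3]])
}

# dp as printed in the example output (p = ncol(X) = 1)
dp_printed <- c(location = 59.89582, scale = 56.31529,
                shape = 23.82501, df = 19.35569)
parts <- split_st_dp(dp_printed, p = 1)
str(parts)

# Example output: logL = -483.9759 and $algorithm$objective = 967.9518
stopifnot(isTRUE(all.equal(-2 * -483.9759, 967.9518)))
```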

## Background

The family of multivariate skew-t distributions extends the multivariate Student's t family by introducing a `shape` parameter (denoted `alpha` in `start` and `dp`) that regulates skewness; when `shape=0`, the skew-t distribution reduces to the usual t distribution. When `df=Inf`, it reduces to the multivariate skew-normal distribution; see `dmsn`. See the reference below for additional information.
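In the univariate case, the skew-t density of Azzalini & Capitanio (2003) with location `xi`, scale `omega`, shape `alpha` and `df` degrees of freedom is f(x) = (2/omega) t(z; df) T(alpha z sqrt((df + 1)/(df + z^2)); df + 1), with z = (x - xi)/omega, where t and T are the Student t density and distribution function. The base-R sketch below uses a hypothetical function name, `dst_sketch`, and is not the `dst` function cited on this page; it checks the two reductions stated above numerically.

```r
# Hypothetical helper: univariate skew-t density
dst_sketch <- function(x, xi = 0, omega = 1, alpha = 0, df = Inf) {
  z <- (x - xi) / omega
  if (is.infinite(df)) {
    # df = Inf: skew-normal density
    return(2 / omega * dnorm(z) * pnorm(alpha * z))
  }
  w <- alpha * z * sqrt((df + 1) / (df + z^2))
  2 / omega * dt(z, df) * pt(w, df + 1)
}

x <- seq(-4, 4, by = 0.5)

# shape = 0: reduces to a location-scale Student t density
all.equal(dst_sketch(x, xi = 1, omega = 2, alpha = 0, df = 5),
          dt((x - 1) / 2, 5) / 2)

# Very large df: close to the skew-normal density
max(abs(dst_sketch(x, alpha = 3, df = 1e6) - dst_sketch(x, alpha = 3, df = Inf)))

# Sanity check: the density integrates to 1 in a strongly skewed case
integrate(dst_sketch, -Inf, Inf, alpha = 5, df = 4)$value
```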

## References

Azzalini, A. and Capitanio, A. (2003). Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t distribution. Full version of the paper published in abridged form in J. Roy. Statist. Soc. B 65, 367–389; available at http://azzalini.stat.unipd.it/SN/se-ext.ps

## See Also

`dst`

## Examples

```r
dat <- rt(100, df=5, ncp=100)
fit <- st.mle(y=dat)
fit
```

### Example output

```
$call
st.mle(y = dat)

$dp
location    scale    shape       df
59.89582 56.31529 23.82501 19.35569

$se
scale       dat
2.522929  7.092954 25.077751 32.205560

$logL
[1] -483.9759

$algorithm
$algorithm$par
location                 shape
59.8958166  4.0309662  0.4230647  2.9629863

$algorithm$objective
[1] 967.9518

$algorithm$convergence
[1] 0

$algorithm$iterations
[1] 22

$algorithm$evaluations
function gradient
      28       23

$algorithm$message
[1] "relative convergence (4)"

$algorithm$value
[1] 967.9518

$algorithm$name
[1] "nlminb"
```

doppelgangR documentation built on Nov. 17, 2017, 1:46 p.m.