# SmoothWin: Implementation of the soft windowing for linear models In SmoothWin: Soft Windowing on Linear Regression

## Description

Implementation of the (symmetric) soft windowing on a range of methods/models by imposing weights on the model.

- The function accepts a model fit, such as 'lm', 'lme', 'glm' etc., as the input and fits a window to it.

- The parameters "k" and "l" control the shape and bandwidth of the windowing function respectively.

- There are several other parameters to cope with the different scenarios/models/window shapes.

- The default settings of the function is adapted to International Mouse Phenotyping Consortium (IMPC) statistical pipeline

## Usage

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49``` ```SmoothWin(object , data , t , m , l = function(ignore.me.in.default) { r = SmoothWin:::lseq( from = 1 , to = max(abs(t[m] - min(t, na.rm = TRUE)) , abs(t[m] - max(t, na.rm = TRUE)), 1) , length.out = min(500, max(1, diff(range( t,na.rm = TRUE )))) ) r = unique(round(r)) return(r) } , k = SmoothWin:::lseq(from = .5 , to = 10 , length.out = 50) , min.obs = function(ignore.me.in.default) { lutm = length(unique(t[m])) r = ifelse(lutm > 1, 35, max(pi * sqrt(length(t)), 35)) r = max(r * lutm, length(m), na.rm = TRUE) r = min(r , length(t), na.rm = TRUE) return(r) } , direction = c(1, 1) , weightFUN = function(x) { x } , residFun = function(x) { resid(x) } , predictFun = function(x) { predict(x) } , weightORthreshold = 'weight' , cdf = plogis , check = 2 , sensitivity = c(1, 1, 1, 0) , pvalThreshold = c(0, 0, 0, 0) , threshold = sqrt(.Machine\$double.eps) * 10 , zeroCompensation = 0 , messages = TRUE , seed = NULL , simple.output = FALSE , debug = FALSE , ...) ```

## Arguments

 `object` The fitted model. The object must support 'update(weights =)'. See examples `data` data.frame. Input data that is used to fit the initial model `t` Vector of (numeric) time values. `m` Vector of integers (peaks). Mode indices on the time component. For example 10, 11, 12. Note that it is different from t, t, t `l` Vector of numeric values for the bandwidth parameter, l. The default uses the maximum distance of the modes (t[m]) from the time boundaries, max(max(t)-t[m],t[m]-min(t)) split on 500 points on the logarithmic scale. `k` Vector of numeric values for the shape parameter, k. The default uses 50 splits of the values from 0.5 to 10 on the logarithmic scale. `min.obs` Single value. The minimum observations/sum weight scores (SWS) that must be in the total window(s). The default uses the following steps. 1. If there are more than one modes (peaks) in the data, then: 35*(the number of the unique modes) 2. If there is a single mode in the data, then: max(pi * sqrt(length(t)), 35) * min.obs must not be less than the total number of observations in the mode time(s). For example, it can not be less than the number of mutant animals in the IMPC application. ** to function properly, min.obs should be less than the total number of observations *** min.obs is applied on the total number of observations on all windows NOT each single window **** if weightORthreshold='weight' then min.obs will be evaluated against SWS `direction` Vector of two non-negative values. A non-negative vector of the form c(Left,right), for example c(1,1) [default] or c(0.5,0.5) or c(0,1). The first element specifies the speed of expansion of the window(s) from the left and the second value for the right expansion. Setting to c(0,1) and c(1,0) lead to right and left expansions of the window(s) respectively. Default c(1,1) that is the window(s) expand symmetrically from both sides. `weightFUN` Weight function. By default, a vector of weights called "ModelWeight" is passed to this function. See the examples. `residFun` Residual computation function. The default is 'resid()'. However, the the user can define its own function. Note that the input of the function is the model object. The default is residFun = function(object){resid(object)} `predictFun` Similar to residFun but instead defines the 'predict()' function. The default is predictFun = function(object){predict(object)} `weightORthreshold` select between 'weight' (default) or 'threshold'. If set to 'weight' then the sum of weights (Sum Weight Score (SWS)) would be used as the total number of (active) observations in the window, otherwise, total number of weights (count of weights) that are greater than a threshold (see 'threshold' below) (count weights>=threshold) would be used for the total number of samples in the window (see 'threshold'). `cdf` A cdf function preferably symmetric. The cdf function is used for the (window) weight generating function (WGF). The cdf function must have two parameters precisely a location such as mean and a scale. Standard cdf functions such as 'pnorm', 'pcauchy' and 'plogis' (default) can be used. For an example of custom made function we define uniform function as below: punif0 = function(x,mean=.5,sd=sqrt(1/12)){ a = mean - sqrt(3) *sd; b = mean + sqrt(3) *sd; r = punif(q = x,min = a,max = b); return(r) } `check` Single integer in {0,1,2}: - check=1, the function selects the times (t) with more than one observations. Further, the function only selects the values with weights greater than the 'threshold' (see threshold below). Mostly useful in fitting linear mixed model - check=2 (default), the function only selects the values with weights greater than the 'threshold' parameter - check=0, disables all checks `sensitivity` (Default window selection criteria) Vector of four values (m, v, m*v, normality_test). For example (default) c(1,1,1,0) specifies the same weights for mean, variance, mean*variance interaction and zero weight for the test of normality (shapiro.test) in determining the optimal (final) window. We should stress that the window size is calculated by detecting the changes amongst the consecutive means (two sample t.test) and variances (two sample var.test) (as well as the normality of the first set) from each of predictFun() and residFun() combined together. For example, (m, v, m*v, normality_test) is calculated for predictFun() and the same for residFun(), then two means are combined under the 'sensitivity'; and the same for variance, interactions and the normality. `pvalThreshold` Vector of four values. It would be used as the significant level for the mean, variation and normality tests (for more details see 'sensitivity' above). If all zero (default) or (all) negative (<= 0) then the internal adaptive method (sensitivity - see above) would be used. `threshold` Single positive value. The minimum value for weights before removing the corresponded samples, given check=1 or check=2 and also in 'weightORthreshold'. Default sqrt(.Machine\$double.eps)*10 ~ 10^-7 `zeroCompensation` Single non-negative value. Setting to any non-negative value would replace all (weights =< zeroCompensation) with 'zeroCompensation'. Useful for algorithms that have difficulties with zero. Default 0. `messages` Logical value. Set to TRUE (default) to see the errors and warnings `seed` seed. Default NULL `simple.output` Logical flag. Setting to TRUE leads to not exporting the list of models for l and k. Useful for preventing memory overflow. Default FALSE `debug` Logical flag. Setting to TRUE will show some plots for the parameter selection step. Useful for debuging. Default FALSE `...` Other parameters that can be passed to the weightFUN()

## Value

 `final.k, final.l ` Final values for k and l `model.l, model.k, finalModel` List of models for l, k and the final model. `others` The input parameters such as x, y, t and so on

## Author(s)

`expWeight`
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189``` ```################################################# #################### Example in the manuscript ################################################# set.seed(1234) par(mfrow = c(3, 1)) ############################# # Simulating data ############################# n = 60 t = 1:n sd = 1 m = n / 2 x = t y = c(0 * x[t <= n / 3] , x[t < 2 * n / 3 & t > n / 3] * 1 , 0 * x[t >= 2 * n / 3]) + rnorm(n, 0, sd) # True weights w = weights = expWeight( t = t , k = 5 , l = n/6 , m = m , plot = 0 ) ############################# # Fitting and ploting data and models ############################# l = lm(y ~ x, weights = w) plot( x , y , ylim = c(min(y), max(y) * 1.5) , col = t %in% seq(n / 3 + 1, 2 * n / 3 - 1) + 1, cex = 1.5 , pch = 16 , xlab = 'Time' , main = 'Simulated data' ) abline(v = x[c(n / 3 + 1, 2 * n / 3 - 1)], lty = 2 , lwd = 4 , col = 'gray') abline(l, col = 2 , lty = 2, lwd = 4) abline(lm(y ~ x) , col = 3 , lty = 3 , lwd = 4) plot( t, w , type = 'b' , main = 'True weights', ylab = 'Weight' , xlab = 'Time' ) ############################# # Fitting the Windowing model ############################# r = SmoothWin( object = l , data = data.frame(y = y, x = x), t = t , m = m , min.obs = 4 , debug = FALSE ) ############################# # Plot fitted (windowed) model ############################# plot(r, main = 'Estimated weights from WGF') ################################################# #################### Other examples ################################################# # All examples import the Orthodont dataset from the nlme package library(nlme) # Sort the data on the time component (age) Orthodont = Orthodont[order(Orthodont\$age), ] ############################# # Modes ############################# mode = which(Orthodont\$age %in% c(12)) ############################# # Time component ############################# time = Orthodont\$age f = formula(distance ~ Sex) ################################################# #################### Examples ################### ################################################# ### Example 1. Linear model ############################# # Method 1 (recommanded) ############################# lm = do.call('lm', list(formula = f, data = Orthodont)) rm(f) ############################# # Method 2 (can cause error if you pass the formula to the lm function) # lm = lm(distance ~ Sex, data = Orthodont) ############################# lm.result = SmoothWin( object = lm, data = Orthodont, t = time, m = mode, check = 0, weightFUN = function(x) { x }, debug = TRUE ) plot( lm.result, col = Orthodont\$Sex, pch = as.integer(Orthodont\$Sex), main = 'Simple liner model' ) ############################# #### Example 2. Linear Model Using Generalized Least Squares # Method 1 (recommanded) ############################# f = formula(distance ~ Sex) gls = do.call('gls', list(model = f, data = Orthodont)) rm(f) ############################# # Method 2 (can cause error if you pass the formula to the gls function) # gls = gls(distance ~ Sex, data = Orthodont) ############################# gls.result = SmoothWin( object = gls, data = Orthodont, t = time, m = mode, check = 2, weightFUN = function(ignore.me) { varFixed(~ 1 / ModelWeight) #nlme package uses the inverse weights }, debug = TRUE ) plot( gls.result, col = Orthodont\$Sex, pch = as.integer(Orthodont\$Sex), main = 'Linear model using GLS' ) ################################################# #### Example 3. Linear mixed model ################################################# # Method 1 (recommanded) ############################# fixed = formula(distance ~ Sex) random = formula(~ 1 | Subject) lme = do.call('lme', list( fixed = fixed, random = random, data = Orthodont )) rm(fixed, random) ############################# # Method 2 (can cause error if you pass the formula to the lme function) # lme = lme(fixed = distance ~ Sex, random=~1|Subject , data = Orthodont) ############################# lme.result = SmoothWin( object = lme, data = Orthodont, t = time, m = mode, # Remove zero weights as well as single observation dates check = 1, weightFUN = function(ignore.me) { varFixed(~ 1 / ModelWeight) }, debug = TRUE ) plot( lme.result, col = Orthodont\$Sex, pch = as.integer(Orthodont\$Sex), main = 'Linear mixed model' ) ```