Notes and to-do for s6methods branch of Rdistance:

Package Notes:

  1. Expansion Terms: Figure out what's going on with the Hermite and simple expansion terms. I think a scaling factor is missing. I think ALL expansion terms should have intercept 1; hence, maybe you should scale all of them by their first element. Better yet, or in addition, figure out the b-spline expansions and use them.
  2. Note: You use Simpson's 1/3 composite rule in version >4.0.0. Study precision differences between it and Simpson's alternative extended composite formula (https://en.wikipedia.org/wiki/Simpson's_rule): $$ \frac{dx}{48}\left[17f(x_0) + 59f(x_1) + 43f(x_2) + 49f(x_3) + 48\sum_{i=4}^{n-4} f(x_i) + 49f(x_{n-3}) + 43f(x_{n-2}) + 59f(x_{n-1}) + 17f(x_n)\right] $$
  3. Consider completely separate function flow for Gamma and smooth likelihoods.
  4. Circles in plots: At present, covariates associated with distances are not used to plot circles. This means that if distance i is from observer 1, it is plotted on all drawn distance functions, including those for observer 2 or 3. Consider fixing this so that only distances from observations associated with the covariates being plotted are drawn. That is, figure out which observations are associated with the requested covariate classes in newdata. If you do this, you could also plot bars associated with the covariate classes. This is complicated because you will need to tidyr::unite covariate values in newdata and in the model.matrix, then figure out which rows of the model.matrix are associated with which rows of newdata. Factors are easy. For numeric covariates, you can say a distance is associated with a numeric value of X if it is within 1 standard deviation of X.
  5. Testing Versus Distance: Check parameter values and log likelihoods of every likelihood against what Distance gives.
  6. Scaling of logistic: Check scaling of logistic likelihood when covariates are present.
  7. Two param likelihoods: At some stage, after Rdistance is out, consider including two equations for parameters of likelihoods with two parameters. E.g., one formula for 'beta' of Huber likelihood and another for 'gamma', e.g., beta ~ observer and gamma ~ observer. I think this will require a whole new estimation function.
  8. Gamma likelihood: Add the Gamma likelihood back into the package. You took out Gamma at version 4.0.0.
  9. Logistic, Uniform, Triangle, and Huber Likelihoods: Implement logistic and uniform likelihoods. You took them out of version 4.0.0. I think you need to use the EM algorithm. See next note. You placed the old versions in CodeParkingLot/OtherLikelihoods.

  10. Uniform likelihood: Fix this. I am pretty sure you need to use the EM algorithm. See this Stack Exchange post. I have the following: $$ z = 0.5I(w_l < x < \theta) $$ and $$ \theta = \sum(z) / n + w_l $$ This is close to correct, but not quite. I think the $z$ need to change each iteration. See papers Mukai sent you.

  11. Stepwise variable selection: Write add1, drop1, and extractAIC methods so that the step function works with distance function objects.
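
The precision comparison in note 2 above could start from a sketch like the following. It is base R only and is not Rdistance's internal integration code; the function names `simpson13` and `simpsonAlt`, the half-normal test integrand, and the choice of 50 intervals are all assumptions made for illustration.

```r
# Composite Simpson's 1/3 rule on n equal intervals (n must be even).
simpson13 <- function(f, a, b, n) {
  stopifnot(n >= 2, n %% 2 == 0)
  x <- seq(a, b, length.out = n + 1)
  dx <- (b - a) / n
  w <- c(1, rep(c(4, 2), length.out = n - 1), 1)  # weights 1, 4, 2, ..., 4, 1
  dx / 3 * sum(w * f(x))
}

# Alternative extended Simpson's rule from note 2:
# weights (17, 59, 43, 49, 48, ..., 48, 49, 43, 59, 17) / 48.
# This sketch requires n >= 8 so the two end blocks do not overlap.
simpsonAlt <- function(f, a, b, n) {
  stopifnot(n >= 8)
  x <- seq(a, b, length.out = n + 1)
  dx <- (b - a) / n
  ends <- c(17, 59, 43, 49)
  w <- c(ends, rep(48, n + 1 - 8), rev(ends))
  dx / 48 * sum(w * f(x))
}

# Hypothetical test integrand: unscaled half-normal with sigma = 50 on [0, 150].
sigma <- 50
f <- function(x) exp(-x^2 / (2 * sigma^2))
exact <- sigma * sqrt(2 * pi) * (pnorm(150 / sigma) - 0.5)

err13 <- abs(simpson13(f, 0, 150, 50) - exact)
errAlt <- abs(simpsonAlt(f, 0, 150, 50) - exact)
print(c(simpson13 = err13, simpsonAlt = errAlt))
```

Repeating the comparison over a grid of `n` values and over the other likelihood shapes would show whether the alternative formula is worth adopting; unlike the 1/3 rule, it does not require an even number of intervals.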

Testing:

You have most of these testing files drafted in Rdistance/CodeParkingLot/tests/version3.1.1.

  1. Make sure Points are tested
  2. Test methods for all the options (those set in onLoad)
  3. Test methods for type = density in predict(dfunc)
  4. Test units for equality. E.g., set w.hi = 100 m, then set w.hi to the same distance expressed in ft, and check that the answers are the same.
  5. Tests that check against Distance::ds and other routines. This should become a vignette or paper.
  6. Test NA distances in data set. These should not appear in the distance function but the observation (groupsize) should count toward abundance.
  7. Test NA transect lengths. These should contribute distance observations, but not count toward abundance.
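
The units-equality idea in test 4 can be sketched without Rdistance at all. The block below is an assumption-laden stand-in: `fitHalfnorm` is a hypothetical helper that fits a truncated half-normal by maximum likelihood in base R, not an Rdistance function, and the simulated distances are made up. The real test would call the package's estimation function with w.hi in each unit system instead.

```r
# Hypothetical stand-in for a distance-function fit: MLE of sigma for a
# half-normal truncated at w.hi. Base R only; not Rdistance code.
fitHalfnorm <- function(x, w.hi) {
  x <- x[x <= w.hi]
  nll <- function(sigma) {
    # log of the integral of exp(-u^2 / (2 sigma^2)) over [0, w.hi]
    logArea <- log(sigma) + 0.5 * log(2 * pi) + log(pnorm(w.hi / sigma) - 0.5)
    sum(x^2) / (2 * sigma^2) + length(x) * logArea
  }
  optimize(nll, interval = c(w.hi / 100, w.hi * 10), tol = 1e-10)$minimum
}

ftPerM <- 1 / 0.3048                     # feet per meter
set.seed(42)
distM <- abs(rnorm(300, sd = 40))        # simulated distances, in meters

sigmaM <- fitHalfnorm(distM, w.hi = 100)                       # fit in meters
sigmaFt <- fitHalfnorm(distM * ftPerM, w.hi = 100 * ftPerM)    # same data, in feet

# After converting back to meters, the two estimates should agree.
unitsAgree <- isTRUE(all.equal(sigmaM, sigmaFt / ftPerM, tolerance = 1e-4))
print(unitsAgree)
```

The same pattern (fit twice, convert, compare within a tolerance) applies to any quantity that carries units, such as effective strip width or density.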

Missing values:

Missing values in an observation contribute to either distance function estimation ('dfunc') or abundance estimation ('abundance') as follows:

tabl <- "
| Distance        | Group Size  | Transect len  | Contributes to:  | Comment                   |
|:---------------:|:-----------:|:-------------:|:----------------:|:-------------------------:|
|  present        |   present   |   present     | dfunc; abundance | usual observation         |
|  MISSING        |   present   |   present     | abundance        |                           |
|  present        |   MISSING   |   present     | dfunc; abundance | for abundance, length only |
|  present        |   present   |   MISSING     | dfunc            |                           |
|  MISSING        |   MISSING   |   present     | abundance        |   so-called zero transect |
|  present        |   MISSING   |   MISSING     | dfunc            |                           |
|  MISSING        |   present   |   MISSING     | nothing          | *ignored*                 |
|  MISSING        |   MISSING   |   MISSING     | nothing          | *does not happen*         |
"

cat(tabl) # output the table in a format good for HTML/PDF/docx conversion

Group size values never contribute to distance function estimation. Distance functions are estimated from distances only.
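
Read row by row, the table above appears to reduce to two rules: an observation feeds 'dfunc' when its distance is present, and feeds 'abundance' when its transect length is present. The sketch below encodes that reading and reproduces all eight rows; `contributesTo` is a hypothetical helper written for this note, not an Rdistance function, and the example values are arbitrary.

```r
# Classify observations by which estimation step they contribute to,
# following the missing-value table. Group size does not change the class.
contributesTo <- function(distance, transectLength) {
  hasDist <- !is.na(distance)
  hasLen <- !is.na(transectLength)
  ifelse(hasDist & hasLen, "dfunc; abundance",
    ifelse(hasDist, "dfunc",
      ifelse(hasLen, "abundance", "nothing")))
}

# All eight present/MISSING combinations, using arbitrary example values.
combos <- expand.grid(
  distance = c(25, NA),
  groupsize = c(2, NA),
  transectLength = c(500, NA)
)
combos$contributes <- contributesTo(combos$distance, combos$transectLength)
print(combos)
```

Group size is carried along in `combos` only to show that it never changes the classification; a missing group size with a present transect length still contributes the length to abundance.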



tmcd82070/Rdistance documentation built on April 13, 2025, 1:38 p.m.