wtd.colMeans | R Documentation |
Returns weighted mean of each column of a data.frame or matrix, based on specified weights, one weight per row.
Relies on weighted.mean() and, unlike wtd.colMeans2(), it also uses data.table::data.table()
wtd.colMeans(x, wts, by = NULL, na.rm = TRUE, dims = 1)
x | Data.frame or matrix, required. |
wts | Optional numeric vector of weights, of length equal to the number of rows. Defaults to 1, which is unweighted. |
by | Optional, default is none. A single column name (as character) or a character vector of column names specifying what to group by, producing the weighted mean within each group. See help for |
na.rm | Logical value, optional, TRUE by default. Defines whether NA values should be removed before the result is found. Otherwise the result will be NA when any NA is in a vector. |
dims | Integer, default is 1. Not used. Which dimensions are regarded as 'rows' or 'columns' to sum over. For row, the sum or mean is over dimensions dims+1, ...; for col it is over dimensions 1:dims. |
** Factor and character fields are not yet handled well.
For a given column of data values:
If just some values are NA (but no wts are NA), and na.rm = TRUE as in the default, returns a weighted mean of all non-NA values.
If just some values are NA (but no wts are NA), and na.rm = FALSE, returns NA.
If all values are NA (but no wts are NA), returns NaN.
If any weights are NA, it behaves like stats::weighted.mean, so it returns NA, unless each value corresponding to an NA weight is also NA and thus removed.
Note that Hmisc::wtd.mean is not exactly the same as stats::weighted.mean, since their na.rm defaults differ:
Hmisc::wtd.mean(x, weights=NULL, normwt="ignored", na.rm = TRUE)
weighted.mean(x, w, ..., na.rm = FALSE)
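The NA rules above can be checked directly against stats::weighted.mean(), which this function is documented to rely on. The snippet below is a minimal sketch that calls only stats::weighted.mean() on a single vector; it does not call wtd.colMeans() itself, and the variable names are chosen just for illustration.

```r
# Illustration only: NA handling in stats::weighted.mean() for one column of values.
vals <- c(NA, 2, 3, 4)
ones <- c(1, 1, 1, 1)

# Some values NA, no weights NA, na.rm = TRUE: weighted mean of the non-NA values
some_na_removed <- weighted.mean(vals, ones, na.rm = TRUE)

# Some values NA, no weights NA, na.rm = FALSE: NA
some_na_kept <- weighted.mean(vals, ones, na.rm = FALSE)

# All values NA, na.rm = TRUE: nothing is left, so the result is 0/0, i.e. NaN
all_na <- weighted.mean(c(NA_real_, NA_real_), c(1, 1), na.rm = TRUE)

# A weight is NA but its value is not: NA, even with na.rm = TRUE
na_weight <- weighted.mean(c(1, 2, 3, 4), c(NA, 1, 1, 1), na.rm = TRUE)

# The NA weight lines up with an NA value, so both are dropped together
na_weight_na_value <- weighted.mean(vals, c(NA, 1, 1, 1), na.rm = TRUE)

print(c(some_na_removed, some_na_kept, all_na, na_weight, na_weight_na_value))
```

Note that na.rm in stats::weighted.mean() removes entries based on NA values only, never on NA weights, which is why an NA weight paired with a non-NA value still yields NA.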
If by is not specified, returns a vector of numbers of length equal to the number of columns in x.
If by is specified, returns the weighted mean for each column in each subset defined via by.
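To make the two return shapes concrete, here is a rough base-R sketch of the same idea. The helper wtd_colmeans_sketch() is hypothetical and is not the package's implementation (which, per the description, also uses data.table); it assumes by is a single column name and that all remaining columns are numeric.

```r
# Hypothetical helper for illustration; not part of analyze.stuff.
wtd_colmeans_sketch <- function(x, wts = rep(1, nrow(x)), by = NULL, na.rm = TRUE) {
  x <- as.data.frame(x)
  # Every column except the grouping column is averaged
  value_cols <- setdiff(names(x), by)

  # Weighted mean of each value column, restricted to the given row indices
  means_for <- function(rows) {
    vapply(x[rows, value_cols, drop = FALSE],
           function(col) weighted.mean(col, wts[rows], na.rm = na.rm),
           numeric(1))
  }

  if (is.null(by)) {
    # No grouping: one number per column
    return(means_for(seq_len(nrow(x))))
  }

  # Grouping: one row of column means per group
  groups <- split(seq_len(nrow(x)), x[[by]])
  do.call(rbind, lapply(groups, means_for))
}

df <- data.frame(a = c(1, 2, 3, 4),
                 b = c(NA, 2, 3, 4),
                 g = c("R1", "R1", "R2", "R2"))

overall  <- wtd_colmeans_sketch(df[, c("a", "b")], wts = c(1, 1, 2, 2))
by_group <- wtd_colmeans_sketch(df, wts = c(1, 3, 1, 1), by = "g")
print(overall)
print(by_group)
```

Without by, the sketch returns a named numeric vector with one entry per column; with by, it returns a matrix with one row per group. The actual function may return a different container (for example a data.table), so treat the shapes here as an approximation.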
# library(analyze.stuff)
wtd.colMeans(data.frame(a = 1:4, b = c(NA, 2, 3, 4)))
wtd.colMeans(data.frame(a = 1:4, b = c(NA, 2, 3, 4)), wts = c(1,1,1,1))
wtd.colMeans(data.frame(a = 1:4, b = c(NA, 2, 3, 4)), wts = c(NA,1,1,1))
wtd.colMeans(data.frame(a = 1:4, b = c(NA, 2, 3, 4)), wts = c(1,NA,1,1))
wtd.colMeans(data.frame(a = 1:4, b = c(NA, 2, NA, 4)), wts = c(1,1,1,1))
wtd.colMeans(data.frame(a = 1:4, b = c(NA, NA, NA, NA)), wts = c(1,1,1,1))
# tests of wtd.colMeans
suppressWarnings({
wtd.colMeans(data.frame(a = 1:4, someNA = c(NA, 2, 3, 4)))
wtd.colMeans(data.frame(a = 1:4,
someNA = c(NA, 2, 3, 4)), wts = c(1,1,1,1))
wtd.colMeans(data.frame(a = 1:4,
someNA = c(NA, 2, NA, 4)), wts = c(1,1,1,1))
wtd.colMeans(data.frame(a = 1:4,
someNA = c(NA, NA, NA, NA)), wts = c(1,1,1,1))
wtd.colMeans(data.frame(a = 1:4,
someNA = c(NA, 2, 3, 4)), wts = c(NA,1,1,1))
wtd.colMeans(data.frame(a = 1:4,
someNA = c(NA, 2, 3, 4)), wts = c(1,NA,1,1))
wtd.colMeans(data.frame(a = 1:4,
someNA = c(NA, 2, 3, 4)), wts = c(1,NA,NA,NA))
wtd.colMeans(data.frame(a = 1:4,
someNA = c(NA, 2, 3, 4)), wts = c(NA,NA,NA,NA))
wtd.colMeans(data.frame(a = 1:4,
someNA = c(NA, NA, NA, NA)), wts = c(NA,NA,NA,NA))
wtd.colMeans(data.frame(a = 1:4,
someNA = c(NA, 2, 3, 4)), na.rm = FALSE)
wtd.colMeans(data.frame(a = 1:4,
someNA = c(NA, 2, 3, 4)), wts = c(1,1,1,1), na.rm = FALSE)
wtd.colMeans(data.frame(a = 1:4,
someNA = c(NA, 2, NA, 4)), wts = c(1,1,1,1), na.rm = FALSE)
wtd.colMeans(data.frame(a = 1:4,
someNA = c(NA, NA, NA, NA)), wts = c(1,1,1,1), na.rm = FALSE)
wtd.colMeans(data.frame(a = 1:4,
someNA = c(NA, 2, 3, 4)), wts = c(NA,1,1,1), na.rm = FALSE)
wtd.colMeans(data.frame(a = 1:4,
someNA = c(NA, 2, 3, 4)), wts = c(1,NA,1,1), na.rm = FALSE)
wtd.colMeans(data.frame(a = 1:4,
someNA = c(NA, 2, 3, 4)), wts = c(1,NA,NA,NA), na.rm = FALSE)
wtd.colMeans(data.frame(a = 1:4,
someNA = c(NA, 2, 3, 4)), wts = c(NA,NA,NA,NA), na.rm = FALSE)
wtd.colMeans(data.frame(a = 1:4,
someNA = c(NA, NA, NA, NA)), wts = c(NA,NA,NA,NA), na.rm = FALSE)
})
n <- 1e6
mydf <- data.frame(pop = 1000 + abs(rnorm(n, 1000, 200)),
v1 = runif(n, 0, 1),
v2 = rnorm(n, 100, 15),
REGION = c('R1', 'R2', sample(c('R1', 'R2', 'R3'), n-2,
replace = TRUE)),
stringsAsFactors = FALSE)
mydf$pop[mydf$REGION == 'R2'] <- 4 * mydf$pop[mydf$REGION == 'R2']
mydf$v1[mydf$REGION == 'R2'] <- 4 * mydf$v1[mydf$REGION == 'R2']
wtd.colMeans(mydf[ , 1:3])
wtd.colMeans(mydf[ , 1:3], wts = mydf$pop)
wtd.colMeans(mydf, by = 'REGION')
# R HANGS/STUCK: # wtd.colMeans(mydf[1:100, 1:3], by = mydf$REGION,
# wts = mydf$pop)
mydf2 <- data.frame(a = 1:3, b = c(1, 2, NA))
wtd.colMeans(mydf2)
wtd.colMeans(mydf2, na.rm = TRUE)