SSI: Similar Student Index


View source: R/SSI.R

Description

The similar student index uses a K nearest neighbor algorithm to generate a set of conditional norms for the outcome variable. The conditional norm for student i is constructed from the K students in the data most like student i, who serve as the comparison set.

Usage

SSI(...)
## Default S3 method:
SSI(mf, y, k, ...)
## S3 method for class 'formula'
SSI(formula, data, id, k, na.action, subset, ...)

Arguments

formula

a formula of the form lhs ~ rhs, where lhs is a numeric variable giving the data values and rhs gives the numeric conditioning variables used to identify the nearest neighbors.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which SSI is called.

na.action

a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").

subset

an optional vector specifying a subset of observations to be used.

id

the individual (student) id identifying records in the data

k

the number of nearest neighbors to choose. k cannot be larger than the total number of pairwise comparisons in the data.

mf

a model frame with the variables used for conditioning. Only implemented for the default method.

y

the numeric outcome variable. Only implemented for the default method.

...

Not implemented

Details

Implementation of the K nearest neighbor method is based on the Euclidean distance metric. Because the process identifies the k nearest neighbors for each record in the data, it can be relatively slow, executing in O(n^2 log n) time.
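
The procedure described above can be illustrated with a minimal brute-force sketch in base R. This is not the package's source code: the helper name knn_norm_sketch is hypothetical, and several details are assumptions made only for illustration, namely that student i is excluded from their own comparison set, that the z-score uses the mean and standard deviation of the k neighbors' outcomes, and that the percentile is the share of neighbors scoring at or below student i. Sorting one row of distances per record is what gives the O(n^2 log n) cost.

```r
# Hypothetical helper, NOT part of MiscPsycho: a brute-force sketch of
# conditional norms built from the k nearest neighbors of each record.
knn_norm_sketch <- function(y, X, k) {
  X <- as.matrix(X)
  n <- nrow(X)
  stopifnot(length(y) == n, k >= 2, k < n)

  # Full n x n matrix of pairwise Euclidean distances
  D <- as.matrix(dist(X, method = "euclidean"))

  z <- numeric(n)
  pct <- numeric(n)
  for (i in seq_len(n)) {
    d <- D[i, ]
    d[i] <- Inf                    # assumption: exclude student i from own comparison set
    nbrs <- order(d)[seq_len(k)]   # indices of the k closest students

    # assumption: norm is the mean and SD of the neighbors' outcomes
    z[i] <- (y[i] - mean(y[nbrs])) / sd(y[nbrs])
    # assumption: percentile is the share of neighbors at or below y[i]
    pct[i] <- 100 * mean(y[nbrs] <= y[i])
  }
  data.frame(Zscore = z, percentile = pct)
}

set.seed(1234)
tmp <- data.frame(ID = 1:100, mathScore = rnorm(100),
                  readScore = rnorm(100), scienceScore = rnorm(100))
out <- knn_norm_sketch(tmp$mathScore,
                       tmp[, c("readScore", "scienceScore")], k = 20)
head(out)
```

Results from this sketch need not match the output of SSI, since the package may define the comparison set and the percentile differently.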

Value

A list with class "SSI" containing the following components:

Zscore

the conditional z-score for each record in the data

percentile

the conditional percentile for each record in the data

ID

the individual's record id

Iterations

the number of Newton-Raphson iterations used

model.frame

the data matrix used for estimating the conditional norms. This data frame can differ from the original data depending on the use of na.action.

Author(s)

Harold Doran

Examples

## Generate sample data, then construct a norm for the math score
## based on the k = 20 other individuals in the data most like student i.
## readScore and scienceScore are used as the conditioning variables
## to compute the Euclidean distance.
set.seed(1234)
tmp <- data.frame(ID = 1:100, mathScore = rnorm(100), readScore = rnorm(100), scienceScore = rnorm(100))
(result <- SSI(mathScore ~ readScore + scienceScore, tmp, k = 20, id=ID, na.action = na.omit))
summary(result)
str(result)
head(result$model.frame)

wasabi1989/MiscPsycho documentation built on Jan. 19, 2020, 12:29 a.m.