R/rss.simulation.R

#' @details This function simulates balanced or unbalanced ranked set sampling (RSS).
#' The length of the sample allocation vector (nsamp) must equal the set size (H).
#' The dist parameter selects one of three population distributions: 'normal', 't', or 'lognorm'.
#' The delta parameter shifts the population mean.
#' The rho parameter controls ranking accuracy: it is the correlation between the outcome and an auxiliary variable used for ranking.
#' A value of rho = 1 indicates perfect ranking, while values between 0 and 1 give imperfect ranking under the linear ranking model
#' \eqn{X_i = Y_i + \epsilon_i} for \eqn{i=1,2,\cdots, n}, where the \eqn{\epsilon_i} are independent random variables from \eqn{N(0,\sigma^2)}.
#' The variance \eqn{\sigma^2} is set to \eqn{\mathrm{Var}(Y)(1-\rho^2)/\rho^2} so that the correlation between the outcome (Y) and the auxiliary variable (X) equals rho.
#' @title Generate example ranked set samples
#' @name rss.simulation
#' @description The rss.simulation function generates ranked set samples by simulating data from a specified population distribution ('normal', 't', or 'lognorm'), with options to adjust the mean (delta) and control ranking quality (rho).
#'
#' @param H The RSS set size.
#' @param nsamp A numeric vector specifying the sample allocation for each stratum.
#' @param dist A character string specifying the distribution to generate ranked set samples. Must be one of "normal", "t", or "lognorm".
#' @param rho The ranking quality or accuracy indicating the correlation between outcome and auxiliary variable. Must be between 0 and 1 for imperfect ranking, or exactly 1 for perfect ranking.
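#' @param delta The true population mean. It is added as a location shift to the simulated outcomes (for "lognorm" the resulting mean is approximately delta).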
#'
#' @return A data frame with one row per sampled unit and the following columns:
#' \item{rank}{The rank information assigned to each sample.}
#' \item{y}{The generated ranked set samples based on the specified distribution.}
#' @examples
#' ## Balanced RSS with a set size 3 and equal sample sizes of 6 for each stratum,
#' ## using imperfect ranking from a normal distribution with a mean of 0.
#' rss.data=rss.simulation(H=3,nsamp=c(6,6,6),dist="normal", rho=0.8,delta=0)
#'
#' ## Unbalanced RSS with a set size 3 and different sample sizes of 6, 10, and 8 for each stratum,
#' ## using perfect ranking from a t distribution with a mean of 0.
#' rss.data=rss.simulation(H=3,nsamp=c(6,10,8),dist="t", rho=1,delta=0)
#'
#' # Check the structure of the RSS data
#' colnames(rss.data) # Should include "y" and "rank"
#' head(rss.data$y)
#' head(rss.data$rank)
#'
#' @export
rss.simulation <- function(H, nsamp, dist, rho, delta)
{
  n=sum(nsamp)
  data=matrix(0,n,2)
  dist.set=c("normal","t","lognorm")
  if(H != length(nsamp)) stop("The set size H must equal the length of the sample allocation vector nsamp.", call. = FALSE)
  if(rho <0 || rho >1) stop("Invalid value for rho. It must be between 0 and 1 (imperfect ranking) or exactly 1 (perfect ranking).")

  if(!dist %in% dist.set) stop("Invalid distribution selected. Please choose from 'normal', 't', or 'lognorm'.")

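  for(h in seq_len(H)){
    # seq_len() skips a stratum with zero allocation instead of looping over c(1, 0)
    for(i in seq_len(nsamp[h])){
      # k is the row index: offset i by the sizes of all earlier strata
      k=i
      if(h>1){
        k=i+sum(nsamp[1:(h-1)])
      }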
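      # tdata1: outcomes (Y) for one set of H units
      # tdata2: auxiliary values (X = Y + error) used only for ranking
      if(dist=="normal"){
        # Var(Y) = 1, so the error SD is sqrt((1-rho^2)/rho^2)
        tdata1=stats::rnorm(H)+delta
        tdata2=tdata1+stats::rnorm(H)*sqrt((1-rho^2)/rho^2)
      }
      if(dist=="t"){
        # t with 5 degrees of freedom has variance 5/3, hence the sqrt(5/3) factor
        tdata1=stats::rt(H,5)+delta
        sigma.err=sqrt((1-rho^2)/rho^2)*sqrt(5/3)
        tdata2=tdata1+stats::rnorm(H)*sigma.err
      }
      if(dist=="lognorm"){
        # lognormal with sdlog^2 = 0.481 has mean about 1.27 and variance about 1;
        # subtracting 1.27 centers it near 0 before adding delta
        tdata1=stats::rlnorm(H,0,sqrt(0.481))-1.27+delta
        sigma.err=sqrt((1-rho^2)/rho^2)
        tdata2=tdata1+stats::rnorm(H)*sigma.err
      }
      # rank the set by the auxiliary variable
      rtdata2=rank(tdata2,ties.method='first')

      # keep the outcome of the unit whose auxiliary value has rank h
      data[k,1]=h
      data[k,2]=tdata1[rtdata2==h]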
    }
  }
  colnames(data) <- c("rank","y")
  data<-as.data.frame(data)
  return(data)
}
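
The error standard deviation used in rss.simulation follows from the linear ranking model X = Y + eps: Cor(X, Y) = sd(Y) / sqrt(Var(Y) + sigma^2), so setting sigma^2 = Var(Y) * (1 - rho^2) / rho^2 gives a correlation of rho. The sketch below checks this numerically and also checks the constants used in the "lognorm" branch. The helper check_rho and its arguments are hypothetical names introduced for illustration only; they are not part of generalRSS.

```r
# Hypothetical helper (not part of generalRSS): empirical correlation between
# an outcome Y and the auxiliary X = Y + eps under the linear ranking model.
check_rho <- function(rho, sd_y = 1, n = 1e5) {
  y <- stats::rnorm(n, sd = sd_y)
  # error SD chosen so that Cor(X, Y) = rho
  sigma_err <- sd_y * sqrt((1 - rho^2) / rho^2)
  x <- y + stats::rnorm(n, sd = sigma_err)
  stats::cor(x, y)
}

set.seed(1)
emp <- check_rho(0.8)   # should be close to 0.8, up to simulation noise

# Moments of a lognormal with meanlog = 0 and sdlog^2 = 0.481,
# the parameters used in the "lognorm" branch.
s2 <- 0.481
ln_mean <- exp(s2 / 2)              # close to the 1.27 subtracted in the code
ln_var  <- (exp(s2) - 1) * exp(s2)  # close to 1, matching the unit-variance error scaling
```

Because the lognormal variance is only approximately 1, the realized correlation in the "lognorm" case is approximately, not exactly, rho.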

generalRSS documentation built on April 4, 2025, 12:19 a.m.