hdfail: Hard drive failure dataset

hdfail R Documentation

Hard drive failure dataset

Description

This dataset contains the observed follow-up times and SMART statistics of 52k unique hard drives.

A large backup storage provider made daily snapshots of its drive population publicly available over roughly 2 years. On each day, the Self-Monitoring, Analysis, and Reporting Technology (SMART) statistics of operational drives are recorded. When a hard drive is no longer operational, it is marked as a failure and removed from subsequent daily snapshots. New hard drives are also continuously added to the population. In total, the data cover over 52k unique hard drives observed over approximately 2 years, of which 2885 (5.5%) failed.
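The failure proportion can be checked from the counts given on this page (52422 observations, 2885 failures). The sketch below uses only those two numbers; the variable names are illustrative and not part of the package.

```r
# Counts taken from this documentation page
n_drives   <- 52422  # observations in the data frame
n_failures <- 2885   # drives marked as failures

# Percentage of drives that failed, rounded to one decimal place
failure_pct <- round(100 * n_failures / n_drives, 1)
print(failure_pct)  # 5.5
```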

Usage

data("hdfail")

Format

A data frame with 52422 observations on the following 8 variables.

serial

unique serial number of the hard drive

model

hard drive model

time

the observed follow-up time

status

failure indicator

temp

temperature in Celsius

rsc

binary covariate, where 1 indicates that some sectors encountered read, write, or verification errors

rer

binary covariate, where 1 indicates a non-zero rate of hardware errors when reading data from disk

psc

binary covariate, where 1 indicates there were sectors waiting to be remapped due to an unrecoverable error.
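A quick way to confirm the variables listed above is to inspect the data frame after loading it. This is a minimal sketch that assumes the frailtySurv package is installed; it is skipped otherwise. The column names come from the list above, and the coding of status is not assumed here, only tabulated.

```r
# Inspect the dataset only if frailtySurv is available (assumption: it is installed)
if (requireNamespace("frailtySurv", quietly = TRUE)) {
  data("hdfail", package = "frailtySurv")

  str(hdfail)                  # column names and types
  print(table(hdfail$status))  # number of drives per value of the failure indicator
  print(summary(hdfail$temp))  # distribution of temperature in Celsius
} else {
  message("frailtySurv is not installed; skipping the inspection")
}
```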

Source

https://www.backblaze.com/cloud-storage/resources/hard-drive-test-data

Examples

## Not run: 
library(frailtySurv)  # provides hdfail and fitfrail
data(hdfail)

# Select only Western Digital hard drives
dat <- subset(hdfail, grepl("WDC", model))

# Fit a gamma frailty model with drives clustered by model,
# using the score-based fit method
fit.hd <- fitfrail(Surv(time, status) ~ temp + rer + rsc + psc + cluster(model),
                   dat, frailty="gamma", fitmethod="score")

fit.hd

## End(Not run)

frailtySurv documentation built on Aug. 14, 2023, 1:06 a.m.