flag_outliers: Flag outlier observations


View source: R/flag_outliers.R

Description

Flags observations whose values lie more than a chosen number of standard deviations from the facility, snu1 or country mean.

Usage

flag_outliers(data, sd_for_outlier = 2, flag_by = "facility",
  result = "outliers")

Arguments

data

The ANC-RT dataset. Run the functions check_data, data_clean and mt_adjust on the data first so that it is properly prepared for use here. The dataset must contain the following variables:

  • faciluid: Facility ID.

  • time: The time period over which the data was collected.

  • n_clients: The number of women from the specified facility, during the specified time period, that attended their first ANC visit.

  • n_status_c: The cleaned number of women from the specified facility, during the specified time period, that had their HIV status ascertained at their first ANC visit, either by testing or through previous knowledge (generated using the data_clean function).

  • testpos_c: The cleaned number of women from the specified facility, during the specified time period, that tested positive for HIV at their first ANC visit (generated using the data_clean function).

  • knownpos_c: The cleaned number of women from the specified facility, during the specified time period, that already knew that they were HIV-positive at their first ANC visit (generated using the data_clean function).

  • testneg_c: The cleaned number of women from the specified facility, during the specified time period, that tested negative for HIV at their first ANC visit (generated using the data_clean function).

  • totpos_c: The cleaned total number of positive HIV cases (generated using the data_clean function).

  • prv: The HIV prevalence from the specified facility at the specified time period (generated using the mt_adjust function).

  • cov: The HIV testing coverage from the specified facility at the specified time period (generated using the mt_adjust function).

  • snu1: The subnational unit 1 (only required if results are to be flagged by snu1).
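Because the function expects a specific set of columns, it can help to confirm they are present before calling it. The following is a minimal sketch in base R; missing_anc_columns is a hypothetical helper, not part of ANCRTAdjust, and the toy data frame holds invented values.

```r
# Hypothetical helper (not part of ANCRTAdjust): report which required
# columns are absent from a data frame before calling flag_outliers().
missing_anc_columns <- function(data, flag_by = "facility") {
  required <- c("faciluid", "time", "n_clients", "n_status_c", "testpos_c",
                "knownpos_c", "testneg_c", "totpos_c", "prv", "cov")
  # snu1 is only needed when results are flagged by subnational unit 1
  if (identical(flag_by, "snu1")) {
    required <- c(required, "snu1")
  }
  setdiff(required, names(data))
}

# Toy data frame with invented values, missing several columns on purpose
toy <- data.frame(faciluid = c("A", "A"), time = c(1, 2),
                  n_clients = c(100, 120), prv = c(0.05, 0.06))

missing_anc_columns(toy)                    # names of the absent columns
missing_anc_columns(toy, flag_by = "snu1")  # also reports snu1
```

An empty character vector means every required column was found.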

sd_for_outlier

Number of standard deviations from the mean beyond which an observation is flagged as an outlier (default is 2).
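To see how this argument moves the cutoffs, the sketch below computes the band of values that would not be flagged for a single variable. outlier_bounds is a hypothetical helper and the counts are invented; it assumes the sample standard deviation from sd(), which may differ from what the package computes internally.

```r
# Hypothetical helper (not part of ANCRTAdjust): lower and upper cutoffs
# implied by a given sd_for_outlier, using the sample standard deviation.
outlier_bounds <- function(x, sd_for_outlier = 2) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  c(lower = m - sd_for_outlier * s, upper = m + sd_for_outlier * s)
}

# Invented client counts for one facility over ten time periods
n_clients <- c(100, 102, 98, 101, 99, 100, 103, 97, 100, 300)

outlier_bounds(n_clients)                      # default: 2 standard deviations
outlier_bounds(n_clients, sd_for_outlier = 3)  # wider band, fewer flags
```

With these invented counts, the value 300 lies above the upper cutoff at 2 standard deviations but inside the band at 3.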

flag_by

Options include:

  • "facility" compares each observation's value to its facility's mean value and flags the observations that lie more than sd_for_outlier standard deviations (2 by default) above or below the facility mean.

  • "snu1" compares each observation's value to its subnational unit 1's mean value and flags the observations that lie more than sd_for_outlier standard deviations (2 by default) above or below the snu1 mean.

  • "country" compares each observation's value to the country's mean value and flags the observations that lie more than sd_for_outlier standard deviations (2 by default) above or below the country mean.
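The three options differ only in which group supplies the mean and standard deviation. The sketch below illustrates that idea in base R; flag_by_group is a hypothetical helper, the data are invented, and the use of sd() and the handling of missing values are assumptions rather than the package's confirmed behaviour.

```r
# Hypothetical helper (not part of ANCRTAdjust): flag values lying more than
# `sd_for_outlier` standard deviations from the mean of their group.
flag_by_group <- function(x, group, sd_for_outlier = 2) {
  # ave() applies FUN within each group and returns a vector aligned with x
  group_mean <- ave(x, group, FUN = function(v) mean(v, na.rm = TRUE))
  group_sd   <- ave(x, group, FUN = function(v) sd(v, na.rm = TRUE))
  flagged <- abs(x - group_mean) > sd_for_outlier * group_sd
  # Missing values, and groups with a single observation (sd is NA),
  # are treated as not flagged in this sketch
  as.integer(!is.na(flagged) & flagged)
}

# Invented client counts for two facilities, ten time periods each
toy <- data.frame(
  faciluid  = rep(c("A", "B"), each = 10),
  n_clients = c(100, 102, 98, 101, 99, 100, 103, 97, 100, 300,
                50, 52, 48, 51, 49, 50, 53, 47, 50, 80)
)

# flag_by = "facility" style: each value is compared with its own facility
toy$flag_facility <- flag_by_group(toy$n_clients, toy$faciluid)

# flag_by = "country" style: a single group, so one pooled mean and sd
toy$flag_country <- flag_by_group(toy$n_clients, rep("all", nrow(toy)))

# Rows flagged under either grouping
toy[toy$flag_facility == 1 | toy$flag_country == 1, ]
```

In this toy data the value 80 is unusual for facility B but not against the pooled mean, so it is flagged only under the facility grouping. Passing an snu1 column as the group would give the "snu1" style comparison.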

result

Options include:

  • "outliers" returns a dataset containing only the observations that have an outlier value for at least one of: n_clients, n_status_c, testpos_c, testneg_c, knownpos_c, totpos_c, prv or cov. Each of the eight variables is reported only where its value is considered an outlier; otherwise it is reported as "NA". For identification purposes, faciluid and time are also included.

  • "data" returns the complete dataset (that was originally input into the function) with the following additional variables:

    • flag_n_clients: A value of 1 indicates that the n_clients value is considered an outlier and a value of 0 indicates that it is not considered an outlier.

    • flag_n_status_c: A value of 1 indicates that the n_status_c value is considered an outlier and a value of 0 indicates that it is not considered an outlier.

    • flag_testpos_c: A value of 1 indicates that the testpos_c value is considered an outlier and a value of 0 indicates that it is not considered an outlier.

    • flag_testneg_c: A value of 1 indicates that the testneg_c value is considered an outlier and a value of 0 indicates that it is not considered an outlier.

    • flag_knownpos_c: A value of 1 indicates that the knownpos_c value is considered an outlier and a value of 0 indicates that it is not considered an outlier.

    • flag_totpos_c: A value of 1 indicates that the totpos_c value is considered an outlier and a value of 0 indicates that it is not considered an outlier.

    • flag_prv: A value of 1 indicates that the prv value is considered an outlier and a value of 0 indicates that it is not considered an outlier.

    • flag_cov: A value of 1 indicates that the cov value is considered an outlier and a value of 0 indicates that it is not considered an outlier.
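The difference between the two result options can be pictured with a small base R sketch. The values and flags below are invented and hard-coded, only two of the eight variables are shown, and the exact column order of the package's output is not confirmed here.

```r
# Toy input with invented values; only two of the eight variables are shown
toy <- data.frame(
  faciluid  = c("A", "A", "A", "B"),
  time      = c(1, 2, 3, 1),
  n_clients = c(100, 300, 98, 50),
  prv       = c(0.05, 0.06, 0.40, 0.04)
)

# Flags are hard-coded here purely to illustrate the output layout
flag_n_clients <- c(0L, 1L, 0L, 0L)
flag_prv       <- c(0L, 0L, 1L, 0L)

# result = "data" style: original columns plus one 0/1 flag per variable
as_data <- cbind(toy, flag_n_clients = flag_n_clients, flag_prv = flag_prv)

# result = "outliers" style: keep only rows with at least one flag and
# set to NA the values that were not flagged
any_flag <- flag_n_clients == 1L | flag_prv == 1L
as_outliers <- toy[any_flag, ]
as_outliers$n_clients[flag_n_clients[any_flag] == 0L] <- NA
as_outliers$prv[flag_prv[any_flag] == 0L] <- NA

as_data
as_outliers
```

The "data" layout keeps every row and is convenient for further filtering, while the "outliers" layout is a compact list of what to review.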

Details

This function flags outlier observations for the following variables: n_clients, n_status_c, testpos_c, testneg_c, knownpos_c, totpos_c, prv and cov. An observation is defined as an outlier if its value lies more than sd_for_outlier standard deviations (2 by default) above or below the mean value of the group selected with flag_by.

Value

A dataset containing either the flagged observations only or the full, original dataset with additional variables indicating flagged observations, depending on the result argument.

Author(s)

Mathieu Maheu-Giroux

Brittany Blouin


brittanyblouin/ANCRTAdjust documentation built on Oct. 28, 2019, 4:53 a.m.