# HPAStainR: Queries the Human Protein Atlas Staining Data for Multiple Proteins and Genes (in tnieuwe/HPAStainR_package)


## HPAStainR

### Description

Uses a protein/gene list to query Human Protein Atlas (HPA) staining data.

### Usage

    HPAStainR(
      gene_list,
      hpa_dat,
      cancer_dat = data.frame(),
      cancer_analysis = c("normal", "cancer", "both"),
      tissue_level = TRUE,
      stringency = c("normal", "high", "low"),
      scale_abundance = TRUE,
      round_to = 2,
      csv_names = TRUE,
      stained_gene_data = TRUE,
      tested_protein_column = TRUE,
      percent_or_count = c("percent", "count", "both"),
      drop_na_row = FALSE,
      test_type = c("fisher", "chi square"),
      adjusted_pvals = TRUE
    )

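Several arguments in the usage list a vector of choices as their default. As a minimal sketch, assuming these are resolved with base R's `match.arg()` or something equivalent (the package internals are not shown here), the first element acts as the default. The function `pick_test` below is hypothetical and only mirrors the `test_type` argument.

```r
# Hypothetical function mirroring the choice-style defaults in the usage above.
pick_test <- function(test_type = c("fisher", "chi square")) {
  # match.arg() returns the first choice when the argument is not supplied
  # and raises an error for values outside the listed choices.
  match.arg(test_type)
}

pick_test()              # "fisher"
pick_test("chi square")  # "chi square"
```
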

### Arguments

| Argument | Description |
|---|---|
| `gene_list` | A list of proteins or genes that you want to query the HPA staining data with. |
| `hpa_dat` | The data frame of normal HPA staining data, required to run HPAStainR. |
| `cancer_dat` | The data frame of pathologic HPA staining data, required for analyses that include cancer data. Defaults to an empty data frame. |
| `cancer_analysis` | A character string indicating inclusion of cancer data in the result. Must be one of 'normal' (default), 'cancer', or 'both'. |
| `tissue_level` | A boolean that determines whether tissue-level data for the cell types are included. Default is TRUE. |
| `stringency` | A character string indicating how stringent the confidence level of the staining findings has to be. Must be 'normal' (default), 'high', or 'low'. Stringency is based on the 'Reliability' column of the hpa_dat object, which ranges over "Enhanced", "Supported", "Approved", and "Uncertain" in decreasing order of certainty. Low stringency includes all data, normal stringency includes "Enhanced", "Supported", and "Approved", and high stringency includes only "Enhanced" and "Supported". Further information about these categories can be found at https://www.proteinatlas.org/about/assays+annotation |
| `scale_abundance` | A boolean that determines whether the Staining Score is scaled based on the size of the gene list. Default is TRUE. |
| `round_to` | A numeric that determines how many decimal places numeric outputs are rounded to. Default is 2. |
| `csv_names` | A boolean determining whether column names are suited for a csv file/pipeline or for presentation. Default is TRUE, giving csv names. |
| `stained_gene_data` | A boolean determining whether the output includes a list of which proteins stained. Default is TRUE. |
| `tested_protein_column` | A boolean determining whether the output includes a column listing which proteins were tested. Default is TRUE. |
| `percent_or_count` | A character string determining whether the percent of proteins stained, the count of proteins stained, or both are shown for high, medium, and low staining. Must be 'percent' (default), 'count', or 'both'. |
| `drop_na_row` | A boolean that determines whether cell types with no proteins tested are dropped. Default is FALSE, which keeps them. |
| `test_type` | A character vector, either "fisher" or "chi square", used to select the statistical test for determining cell type enrichment. The two options are Fisher's Exact Test and a Chi Square test. The original version of HPAStainR defaulted to the Chi Square test; however, because that test requires simulated values to run correctly, we suggest using Fisher's Exact Test for consistency. |
| `adjusted_pvals` | A boolean indicating whether the p-values are corrected for multiple testing. Default is TRUE. |

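The stringency levels described above map onto subsets of the 'Reliability' column. The following is a minimal sketch of that mapping on a toy data frame, not the package's own code: the helper `filter_by_stringency`, the `Gene` and `Level` columns, and all values are hypothetical. Only the 'Reliability' categories and the three stringency settings come from the argument description.

```r
# Toy stand-in for hpa_dat. Only the Reliability column and its four
# categories come from the documentation; everything else is made up.
toy_hpa <- data.frame(
  Gene = c("GENE1", "GENE2", "GENE3", "GENE4"),
  Level = c("High", "Medium", "Low", "Not detected"),
  Reliability = c("Enhanced", "Supported", "Approved", "Uncertain"),
  stringsAsFactors = FALSE
)

# Reliability categories kept at each stringency setting.
reliability_by_stringency <- list(
  low = c("Enhanced", "Supported", "Approved", "Uncertain"),
  normal = c("Enhanced", "Supported", "Approved"),
  high = c("Enhanced", "Supported")
)

# Hypothetical helper (not part of HPAStainR): keep only the rows whose
# Reliability is allowed at the requested stringency.
filter_by_stringency <- function(dat, stringency = c("normal", "high", "low")) {
  stringency <- match.arg(stringency)
  keep <- reliability_by_stringency[[stringency]]
  dat[dat$Reliability %in% keep, , drop = FALSE]
}

nrow(filter_by_stringency(toy_hpa, "low"))     # 4
nrow(filter_by_stringency(toy_hpa, "normal"))  # 3
nrow(filter_by_stringency(toy_hpa, "high"))    # 2
```
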
### Value

A tibble containing the results of HPAStainR.

### Details

The staining score is calculated as:

$$\left(\frac{h \times 100}{t}\right) + \left(\frac{m \times 50}{t}\right) + \left(\frac{l \times 25}{t}\right)$$

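A small worked example of the staining score may help. This sketch assumes that h, m, and l are the numbers of proteins with high, medium, and low staining in a cell type and that t is the number of proteins tested. That reading is inferred from the argument descriptions and is not stated in this section. The helper `staining_score` is hypothetical and the counts are made up.

```r
# Hypothetical helper (not part of HPAStainR) that evaluates the formula.
# h, m, l: assumed counts of proteins with high, medium, and low staining
# t: assumed number of proteins tested in the cell type
staining_score <- function(h, m, l, t) {
  (h * 100 / t) + (m * 50 / t) + (l * 25 / t)
}

# Made-up counts: 2 high, 1 medium, 1 low out of 10 tested
staining_score(h = 2, m = 1, l = 1, t = 10)  # 27.5
```

Under this reading the score is a weighted percentage: it reaches 100 when every tested protein stains high and 0 when none stain at all.
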
### Examples

   ## Below will give you the results found on the shiny app website
HPA_data$hpa_dat, HPA_data$cancer_dat,