top_percent: easy data frame clustering based on percentile


View source: R/top_percent.R

Description

This function enables comparison of data sets of different lengths. It is intended for gene lists that have associated numeric values. It is an alternative to clusterizer_oneR, which does not deal well with continuous values such as p-values or fold changes. Prioritization of the analyzed gene lists can be based on the scores assigned after data aggregation and counting. The function helps to avoid arbitrary selection of top candidates by subsetting the top percent of genes for a given cutoff. Genes at the cutoff that share the same value are all included. It generates a new column with TRUE or FALSE values indicating whether each gene was present in the top percent.
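The tie handling described above can be illustrated with a minimal base R sketch. It is not the wizbionet implementation: flag_top_percent is a hypothetical helper, and it assumes that higher scores rank first and that the threshold is the value found at the position given by ceiling(n * cutoff / 100) in the sorted vector.

```r
# Hypothetical helper, not part of wizbionet: flags the top `cutoff` percent
# of a numeric vector and keeps every value tied with the threshold value.
# Assumption: higher scores are better and there are no missing values.
flag_top_percent <- function(x, cutoff = 25) {
  # Number of top positions implied by the cutoff (at least one)
  k <- max(1, ceiling(length(x) * cutoff / 100))
  # Score found at the k-th position of the decreasingly sorted vector
  threshold <- sort(x, decreasing = TRUE)[k]
  # Every score at or above the threshold is flagged, including ties
  x >= threshold
}

scores <- c(66, 62, 54, 40, 34, 32, 32, 16, 15, 15)
flagged <- flag_top_percent(scores, cutoff = 60)

# 60 percent of 10 values is 6 positions, but the two tied scores of 32
# are both kept, so 7 values are flagged.
print(data.frame(scores, flagged))
```

With a strict cut at six positions, one of the two genes scoring 32 would be dropped arbitrarily; keeping both is the behavior the description refers to.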

Usage

top_percent(inputDF, landmark_col, cols_to_cluster, cutoff)

Arguments

inputDF

input data frame; it needs at least two columns: the one named in landmark_col and the one(s) named in cols_to_cluster

landmark_col

name of the column in the input data frame that identifies the items to analyze, for example a column with gene symbols (character)

cols_to_cluster

name of one or more columns in the input data frame with numeric scores (counts), for example the number of regulatory miRNAs for each gene or the number of data sets in which the gene was present

cutoff

percent of top hits to select; the default is 25 percent

Value

output column clus_...: logical value indicating whether the gene was within the top-percent cutoff. The column name includes the cutoff value

Author(s)

Zofia Wicik

Examples

# Example: create the input data frame, DE_miRNA
miR<-c('hsa-miR-497-5p', 'hsa-miR-106a-5p', 'hsa-miR-195-5p', 'hsa-miR-4753-3p',
'hsa-miR-493-5p', 'hsa-miR-450b-5p', 'hsa-miR-448', 'hsa-miR-1264', 'hsa-miR-541-5p',
'hsa-miR-449b-5p', 'hsa-miR-493-3p', 'hsa-miR-4731-3p', 'hsa-miR-106a-3p', 'hsa-miR-345-5p',
'hsa-miR-3612', 'hsa-miR-1343', 'hsa-miR-1197', 'hsa-miR-1229-3p', 'hsa-miR-4766-3p',
'hsa-miR-580-3p', 'hsa-miR-345-3p', 'hsa-miR-4714-5p')
values_A<- c(66, 62, 54, 40, 34, 32, 32, 16, 15, 15, 15, 14, 14, 9,
9, 9, 9, 8, 5, 5, 4, 1)
values_B<- c(3, 5, 12, 14, 7, 7, 7, 1, 1, 13, 20, 12, 15,
6, 2, 2, 1, 12, 21, 10, 13, 3)

DE_miRNA<- data.frame(miR,values_A,values_B)


# Set parameters
inputDF <- DE_miRNA
name_input_df <- "DE_miRNA"
landmark_col <- "miR"
cols_to_cluster <- c('values_A', 'values_B')
cutoff <- 20

# Run the function
temp <- top_percent(inputDF, landmark_col, cols_to_cluster, cutoff)

wizbionet/wizbionet documentation built on Sept. 9, 2020, 12:45 a.m.