col_agrecounter: data frame aggregator


View source: R/col_agrecounter_v1.R

Description

This function aggregates and summarizes complex data frames while preserving the column names, and adds a column with the suffix "_count" giving the number of occurrences.
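As a rough base-R analogy (not the wizbionet implementation), the core idea of collapsing one column within each value of a landmark column and counting the collapsed items can be sketched with aggregate(). The toy data frame and the alphabetical sorting of the joined values are assumptions made for illustration.

```r
# Base-R analogy only; NOT the wizbionet implementation.
# Hypothetical toy data: one row per (pre-miRNA, target) pair
toy <- data.frame(
  pre_miRNA = c("hsa-miR-195", "hsa-miR-195", "hsa-miR-4753"),
  Target    = c("PAX1", "CDH24", "TGFB3"),
  stringsAsFactors = FALSE
)

# One row per pre_miRNA; Target values joined with "|" (sorting is an assumption)
agg <- aggregate(Target ~ pre_miRNA, data = toy,
                 FUN = function(x) paste(sort(x), collapse = "|"))
names(agg)[names(agg) == "Target"] <- "Target_coll"

# Number of items in each collapsed string (split on the literal "|")
agg$Target_count <- lengths(strsplit(agg$Target_coll, "|", fixed = TRUE))
print(agg)
```

Here hsa-miR-195 ends up with Target_coll "CDH24|PAX1" and Target_count 2.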

Usage

col_agrecounter(inputDF, col_names, col_collapse, rows_collapse, control_col)

Arguments

inputDF

Data frame to be aggregated and counted (A&C).

col_names

Names of the columns to be aggregated and counted.

col_collapse

Name of the landmark column; the A&C columns are aggregated within each value of this column.

rows_collapse

Character vector with the contents of the landmark column (col_collapse=). It can easily be generated with the function wizbionet::col_to_string().

control_col

Additional landmark column that prevents deduplication based on col_collapse= alone; for example, when a gene has several transcripts and this information should be preserved.
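The effect of a control column on deduplication can be illustrated with base R's duplicated(). The gene/transcript table below is hypothetical, and this is only an illustration of the idea, not necessarily how col_agrecounter() handles it internally.

```r
# Hypothetical gene/transcript table (illustration only)
tx <- data.frame(
  gene       = c("G1", "G1", "G2"),
  transcript = c("T1", "T2", "T3"),
  stringsAsFactors = FALSE
)

# Deduplicating on the landmark column alone keeps one row per gene,
# so the second transcript of G1 is lost
by_gene <- tx[!duplicated(tx["gene"]), ]

# Including the control column keeps one row per gene/transcript pair
by_pair <- tx[!duplicated(tx[c("gene", "transcript")]), ]

nrow(by_gene)  # 2
nrow(by_pair)  # 3
```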

Value

The output is the input data frame with additional columns in which rows were aggregated. For each aggregated column, the function adds two columns with the suffixes "_coll" and "_count". The "_coll" column holds the aggregated strings (e.g. gene symbols), for example "CDH24|ONECUT2|PAX1|PTGER3". The "_count" column holds the number of items in the corresponding "_coll" column. The output is sorted based on the counts from control_col=. After using this function, you can apply clusterizer_OneR() or top_percent() to the col_collapse= column and the column with the suffix "_count".
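The following is a minimal base-R sketch of the documented output shape. The helper agrecounter_sketch() is hypothetical and not part of wizbionet. It assumes that values are de-duplicated and sorted before being joined with "|", that one row is kept per col_collapse/control_col pair, that the raw aggregated columns are dropped, and that rows are ordered by decreasing count; it also ignores rows_collapse. None of these details is confirmed by the documentation.

```r
# Hypothetical helper for illustration; NOT the wizbionet implementation.
agrecounter_sketch <- function(inputDF, col_names, col_collapse, control_col) {
  out <- inputDF
  for (col in col_names) {
    groups <- split(inputDF[[col]], inputDF[[col_collapse]])
    # Assumed: unique values, sorted, joined with "|"
    coll  <- vapply(groups, function(x) paste(sort(unique(x)), collapse = "|"),
                    character(1))
    count <- vapply(groups, function(x) length(unique(x)), integer(1))
    key <- as.character(out[[col_collapse]])
    out[[paste0(col, "_coll")]]  <- unname(coll[key])
    out[[paste0(col, "_count")]] <- unname(count[key])
  }
  # Assumed: one row per landmark/control pair; raw aggregated columns dropped
  out <- out[!duplicated(out[c(col_collapse, control_col)]),
             setdiff(names(out), col_names), drop = FALSE]
  # Assumed ordering: decreasing count of the first aggregated column
  out <- out[order(-out[[paste0(col_names[1], "_count")]]), , drop = FALSE]
  rownames(out) <- NULL
  out
}

dat1 <- data.frame(
  mature_miRNA = c("hsa-miR-195-5p", "hsa-miR-195-3p", "hsa-miR-195-5p",
                   "hsa-miR-195-5p", "hsa-miR-4753-5p", "hsa-miR-4753-3p"),
  pre_miRNA = c("hsa-miR-195", "hsa-miR-195", "hsa-miR-195", "hsa-miR-195",
                "hsa-miR-4753", "hsa-miR-4753"),
  Target = c("CDH24", "PAX1", "PTGER3", "ONECUT2", "TGFB3", "FGFR1"),
  stringsAsFactors = FALSE
)

res <- agrecounter_sketch(dat1, "Target", "pre_miRNA", "mature_miRNA")
print(res)
```

On this data the sketch gives Target_coll "CDH24|ONECUT2|PAX1|PTGER3" with Target_count 4 for both hsa-miR-195 mature forms, which matches the example string quoted in the Value section.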

Author(s)

Zofia Wicik

Examples

#Example
library(wizbionet)  # provides col_agrecounter() and col_to_string()

dat1 <- data.frame(
  mature_miRNA = c('hsa-miR-195-5p', 'hsa-miR-195-3p', 'hsa-miR-195-5p', 'hsa-miR-195-5p',
                   'hsa-miR-4753-5p', 'hsa-miR-4753-3p'),
  pre_miRNA = c('hsa-miR-195', 'hsa-miR-195', 'hsa-miR-195', 'hsa-miR-195',
                'hsa-miR-4753', 'hsa-miR-4753'),
  Target = c('CDH24', 'PAX1', 'PTGER3', 'ONECUT2', 'TGFB3', 'FGFR1'))


#set parameters####

#data frame for aggregation and counting (A&C)
inputDF <- dat1
#names of the columns to aggregate and count
col_names <- c("Target")
#landmark column; the A&C columns are aggregated within each of its values
col_collapse <- "pre_miRNA"
#contents of the landmark column "col_collapse" as a character vector.
#You can use the package function col_to_string()

rows_collapse <- col_to_string(inputDF$pre_miRNA)
#additional landmark column that prevents deduplication on col_collapse alone.
#Useful when you want to analyze pre-miRNAs instead of mature miRNAs
control_col <- "mature_miRNA"

#run function
output <- col_agrecounter(inputDF, col_names, col_collapse, rows_collapse, control_col)

wizbionet/wizbionet documentation built on Sept. 9, 2020, 12:45 a.m.