starfish_all: starfish_all

View source: R/starfish_all.r

starfish_all    R Documentation

starfish_all

Description

This function loads SV, CNV, and gender tables, identifies "connected" complex genome rearrangement (CGR) regions and complex SVs, constructs a CGR event-by-feature matrix, and infers CGR signatures by either classification or clustering (see cmethod).

Usage

starfish_all(
  sv_file,
  cnv_file,
  gender_file,
  prefix = "",
  genome_v = "hg19",
  cnv_factor = "auto",
  arm_del_rm = TRUE,
  plot = TRUE,
  cmethod = "class"
)

Arguments

sv_file

an SV data frame with 8 columns: "chrom1", "end1", "chrom2", "end2", "svtype" (DEL, DUP, h2hINV, t2tINV, or TRA), "strand1" (+/-), "strand2" (+/-), and "sample". Other svtypes, such as INV, INS, and BND, are not accepted.
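The column layout above can be sketched as a small data frame. This is a minimal illustration only: all values are made up, the chromosome naming ("1" rather than "chr1") is an assumption, and check_sv() is a hypothetical helper, not part of Starfish.

```r
# Toy SV table with the 8 documented columns; all values are invented.
sv <- data.frame(
  chrom1  = c("1", "1", "3"),
  end1    = c(1500000L, 2300000L, 500000L),
  chrom2  = c("1", "1", "7"),
  end2    = c(1900000L, 2800000L, 900000L),
  svtype  = c("DEL", "h2hINV", "TRA"),
  strand1 = c("+", "+", "-"),
  strand2 = c("-", "+", "+"),
  sample  = "sampleA",
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a Starfish function): checks the column names,
# the accepted svtype values, and the strand symbols described above.
check_sv <- function(sv) {
  required <- c("chrom1", "end1", "chrom2", "end2",
                "svtype", "strand1", "strand2", "sample")
  accepted <- c("DEL", "DUP", "h2hINV", "t2tINV", "TRA")
  missing_cols <- setdiff(required, names(sv))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  bad_types <- setdiff(unique(sv$svtype), accepted)
  if (length(bad_types) > 0) {
    stop("unsupported svtype: ", paste(bad_types, collapse = ", "))
  }
  if (!all(c(sv$strand1, sv$strand2) %in% c("+", "-"))) {
    stop("strand1 and strand2 must be '+' or '-'")
  }
  invisible(TRUE)
}

check_sv(sv)
```

Running such a check before calling starfish_all catches generic svtypes like INV or BND early, since the description states they are not accepted.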

cnv_file

a CNV data frame with 5 columns: "chromosome", "start", "end", "total_cn", and "sample". "total_cn" should contain absolute copy numbers.
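A matching CNV table might look like the following sketch. The segment values are invented, and check_cnv() is a hypothetical helper (not part of Starfish) that only tests what the description states: five named columns and non-negative absolute copy numbers.

```r
# Toy CNV table with the 5 documented columns; all values are invented.
cnv <- data.frame(
  chromosome = c("1", "1", "1"),
  start      = c(1L, 1500001L, 1900001L),
  end        = c(1500000L, 1900000L, 5000000L),
  total_cn   = c(2, 1, 2),   # absolute copy numbers, not log ratios
  sample     = "sampleA",
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a Starfish function).
check_cnv <- function(cnv) {
  required <- c("chromosome", "start", "end", "total_cn", "sample")
  stopifnot(
    all(required %in% names(cnv)),
    is.numeric(cnv$total_cn),
    all(cnv$total_cn >= 0),      # absolute copy numbers cannot be negative
    all(cnv$start <= cnv$end)
  )
  invisible(TRUE)
}

check_cnv(cnv)
```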

gender_file

a sample table with 2 columns: "sample" and "gender". Gender can be "Female", "female", "F", "f", "Male", "male", "M", or "m". If the gender is unknown, any other string can be given, such as "unknown", and the gender will be inferred from the CN baseline of chromosome X.
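The accepted labels can be illustrated with a short sketch. classify_gender() is a hypothetical helper written for this example; it only mirrors the label list above and does not reproduce how Starfish infers gender from chromosome X.

```r
# Toy gender table with the 2 documented columns; sample names are invented.
gender <- data.frame(
  sample = c("sampleA", "sampleB", "sampleC"),
  gender = c("F", "male", "unknown"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a Starfish function): groups the documented labels.
# Anything outside the two label sets is treated as "to be inferred".
classify_gender <- function(x) {
  ifelse(x %in% c("Female", "female", "F", "f"), "female",
         ifelse(x %in% c("Male", "male", "M", "m"), "male", "to_be_inferred"))
}

classify_gender(gender$gender)
# "female" "male" "to_be_inferred"
```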

prefix

the prefix for all intermediate files; default is "" (no prefix)

genome_v

the genome assembly used to call SVs and CNVs. It should be "hg19" or "hg38"; default is "hg19"

cnv_factor

the CN fluctuation above or below the baseline used to identify loss and gain fragments for samples with decimal CN. Default is "auto"; alternatively, users can provide a value between 0 and 1
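One plausible reading of this parameter is a symmetric tolerance around the baseline copy number. The sketch below illustrates that reading only; call_cn_state() is hypothetical, the baseline of 2 and factor of 0.3 are arbitrary, and Starfish's actual baseline estimation and "auto" behaviour are not described on this page.

```r
# Hypothetical illustration (not Starfish code): label segments as gain or
# loss when total_cn departs from the baseline by more than cnv_factor.
call_cn_state <- function(total_cn, baseline = 2, cnv_factor = 0.3) {
  stopifnot(is.numeric(cnv_factor), cnv_factor >= 0, cnv_factor <= 1)
  ifelse(total_cn > baseline + cnv_factor, "gain",
         ifelse(total_cn < baseline - cnv_factor, "loss", "neutral"))
}

call_cn_state(c(2.0, 2.4, 1.5, 2.2))
# "neutral" "gain" "loss" "neutral"
```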

arm_del_rm

logical; whether to remove arm-level deletions, default is TRUE

plot

logical; whether to plot "connected" CGRs, default is TRUE

cmethod

"class" to use a pre-constructed classifier built from the PCAWG dataset, or "cluster" to use de novo unsupervised clustering; default is "class"

Value

a list of files: $complex_sv contains the complex SVs, $starfish_call contains the connected CGRs, $feature_matrix contains the CGR feature matrix, and $chrss_signature shows the signatures computed and inferred from the PCAWG dataset. If cmethod is "class", the signature classification table and plot are generated; otherwise, clustering matrices and plots are stored under the "CGR_cluster" folder for K from 2 to 10.
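Examples

The following is a minimal sketch of how the arguments in Usage fit together, not an official example. build_starfish_args() is a hypothetical wrapper that validates the documented options before the call; the package name "Starfish" in the commented call is an assumption based on the repository name, and the one-row input tables are placeholders that would not yield meaningful CGR calls.

```r
# Hypothetical wrapper (not a Starfish function): validates the documented
# options and returns a named argument list matching the Usage section.
build_starfish_args <- function(sv, cnv, gender, prefix = "", genome_v = "hg19",
                                cnv_factor = "auto", arm_del_rm = TRUE,
                                plot = TRUE, cmethod = "class") {
  genome_v <- match.arg(genome_v, c("hg19", "hg38"))
  cmethod  <- match.arg(cmethod, c("class", "cluster"))
  if (!identical(cnv_factor, "auto")) {
    stopifnot(is.numeric(cnv_factor), length(cnv_factor) == 1,
              cnv_factor >= 0, cnv_factor <= 1)
  }
  stopifnot(is.data.frame(sv), is.data.frame(cnv), is.data.frame(gender),
            is.logical(arm_del_rm), is.logical(plot))
  list(sv_file = sv, cnv_file = cnv, gender_file = gender, prefix = prefix,
       genome_v = genome_v, cnv_factor = cnv_factor, arm_del_rm = arm_del_rm,
       plot = plot, cmethod = cmethod)
}

# Placeholder inputs with the documented column names; values are invented.
sv <- data.frame(chrom1 = "1", end1 = 1500000L, chrom2 = "1", end2 = 1900000L,
                 svtype = "DEL", strand1 = "+", strand2 = "-",
                 sample = "sampleA", stringsAsFactors = FALSE)
cnv <- data.frame(chromosome = "1", start = 1L, end = 5000000L,
                  total_cn = 2, sample = "sampleA", stringsAsFactors = FALSE)
gender <- data.frame(sample = "sampleA", gender = "female",
                     stringsAsFactors = FALSE)

args <- build_starfish_args(sv, cnv, gender, prefix = "demo",
                            genome_v = "hg38", cmethod = "cluster")

# Not run: requires the package (assumed to load as "Starfish") and real data.
# res <- do.call(Starfish::starfish_all, args)
# res$starfish_call    # connected CGRs, per the Value section
# res$feature_matrix   # CGR feature matrix, per the Value section
```

Validating genome_v and cmethod up front gives a clear error for a typo such as "hg37" before any intermediate files are written.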


yanglab-computationalgenomics/Starfish documentation built on July 27, 2022, 10:26 a.m.