pangenomes_from_files: Creates Pangenomes from file list
In irycisBioinfo/PATO: Pangenome Analysis Toolkit

Description

This function clusters the samples so that no two samples in a cluster are farther apart than a designated maximum distance, and builds a pan-genome from each resulting cluster. Pan-genomes can be filtered by genome frequency (i.e. the number of genomes that belong to the pan-genome). pangenomes_from_list() builds a new accnet object that relates each pan-genome to its genes/proteins. The user can also filter which proteins are included by their presence frequency, that is, the number of pan-genomes in the set that contain them.
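The description combines three steps: distance-based clustering of genomes, a minimum-size filter on the resulting pan-genomes, and a minimum-frequency filter on proteins. Below is a minimal base-R sketch of that logic. It assumes a precomputed pairwise distance matrix and uses complete-linkage hierarchical clustering, which is an assumption because this page does not say which clustering method the package uses. Cutting a complete-linkage tree at height `max_dist` guarantees that no two genomes in the same group are farther apart than `max_dist`. The helper names `cluster_genomes()` and `filter_proteins()`, the toy distances and the protein table are all hypothetical.

```r
# Sketch only (assumption): complete linkage stands in for whatever method
# the package actually uses.
# Steps 1 + 2: cluster genomes so that within-group distances never exceed
# max_dist, then drop groups with fewer than min_pange_size genomes.
cluster_genomes <- function(dist_mat, max_dist, min_pange_size) {
  hc <- hclust(as.dist(dist_mat), method = "complete")
  groups <- cutree(hc, h = max_dist)          # named vector of group ids
  sizes <- table(groups)
  keep <- names(sizes)[sizes >= min_pange_size]
  groups[as.character(groups) %in% keep]
}

# Step 3: keep proteins present in at least min_prot_freq pan-genomes.
# `edges` is a hypothetical pan-genome/protein table, one row per occurrence.
filter_proteins <- function(edges, min_prot_freq) {
  edges <- unique(edges)                      # count each pair only once
  freq <- table(edges$protein)
  keep <- names(freq)[freq >= min_prot_freq]
  edges[edges$protein %in% keep, , drop = FALSE]
}

# Toy distance matrix for five genomes (made-up values).
ids <- paste0("g", 1:5)
d <- matrix(c(
  0.000, 0.010, 0.020, 0.300, 0.310,
  0.010, 0.000, 0.015, 0.290, 0.300,
  0.020, 0.015, 0.000, 0.280, 0.320,
  0.300, 0.290, 0.280, 0.000, 0.010,
  0.310, 0.300, 0.320, 0.010, 0.000
), nrow = 5, dimnames = list(ids, ids))

pangenomes <- cluster_genomes(d, max_dist = 0.05, min_pange_size = 3)
print(pangenomes)   # only the g1-g3 group has at least 3 genomes

edges <- data.frame(
  pangenome = c("pan1", "pan2", "pan1", "pan1", "pan2", "pan3", "pan1"),
  protein   = c("protA", "protA", "protB", "protC", "protC", "protC", "protA"),
  stringsAsFactors = FALSE
)
kept <- filter_proteins(edges, min_prot_freq = 2)
print(unique(kept$protein))
```

Here protB occurs in only one pan-genome, so it is dropped with `min_prot_freq = 2`; the duplicated pan1/protA row is counted once.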

Usage

pangenomes_from_files(
  files,
  min_pange_size = 10,
  min_prot_freq = 2,
  file_type = "prot",
  distance,
  cluster,
  coverage = 0.8,
  identity = 0.8,
  evalue = 1e-06,
  n_cores,
  cov_mode = 0,
  cluster_mode = 0
)
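The `files` and `cluster` arguments are plain data.frames. The following is a minimal sketch of preparing them in base R, assuming protein FASTA files with a `.faa` extension in a single directory. The directory, the file names, the `path` column name of `files` and the cluster labels are all hypothetical. The documentation only fixes the `file` and `cluster` column names of `cluster`.

```r
# Create three tiny protein FASTA files in a temporary directory so the
# example is self-contained (hypothetical data).
fasta_dir <- file.path(tempdir(), "pato_genomes")
dir.create(fasta_dir, showWarnings = FALSE)
for (id in c("sampleA", "sampleB", "sampleC")) {
  writeLines(c(paste0(">", id, "_prot1"), "MKTAYIAKQR"),
             file.path(fasta_dir, paste0(id, ".faa")))
}

# `files`: one column holding the file paths (column name is an assumption).
files <- data.frame(
  path = list.files(fasta_dir, pattern = "\\.faa$", full.names = TRUE),
  stringsAsFactors = FALSE
)

# `cluster`: two columns, file and cluster, giving a precomputed grouping.
# list.files() returns sorted paths, so sampleA and sampleB share cluster 1.
cluster_df <- data.frame(
  file = files$path,
  cluster = c(1L, 1L, 2L),
  stringsAsFactors = FALSE
)
print(cluster_df)

# Not run: pass either `cluster` or `distance`, not both.
# pangenomes_from_files(files, cluster = cluster_df, n_cores = 2)
```

Whether the package expects the `file` values to match the `files` paths exactly is not stated on this page; the sketch simply reuses the same paths.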

Arguments

`files`	A data.frame with one column containing the paths of the files.
`min_pange_size`	Minimum number of genomes per pan-genome.
`min_prot_freq`	Minimum frequency of a protein (minimum number of pan-genomes in which it is present).
`file_type`	Type of FASTA file ("prot" or "nucl").
`distance`	Maximum distance (Mash distance) among genomes in each pan-genome. Mutually exclusive with cluster.
`cluster`	(Optional; mutually exclusive with distance.) A data.frame with two columns (file, cluster) describing the clustering of the files.
`coverage`	Minimum coverage (length) required to cluster.
`identity`	Minimum identity.
`evalue`	Maximum e-value.
`n_cores`	Number of cores to use.
`cov_mode`	Coverage mode: 0: coverage of query and target; 1: coverage of target; 2: coverage of query; 3: target seq. length needs to be at least x% of query length; 4: query seq. length needs
`cluster_mode`	Cluster mode: 0: set cover; 1: connected component; 2: greedy clustering by sequence length; 3: greedy clustering by sequence length (low memory).
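The alignment thresholds (`identity`, `evalue`, and `coverage` together with `cov_mode`) decide which protein pairs count as matches, and `cluster_mode` decides how those matches are grouped into clusters. Below is a minimal base-R sketch of that two-stage idea. It assumes a table of pairwise hits with hypothetical columns `qcov`, `tcov`, `qlen`, `tlen`, `pident` and `evalue`, with coverage and identity given as fractions. It implements only `cov_mode` 0 to 3 as described above and only `cluster_mode = 1` (connected components, via a simple union-find). It is not how the package's clustering backend is implemented.

```r
# Sketch only: coverage test for cov_mode 0-3, following the descriptions.
passes_coverage <- function(h, coverage, cov_mode) {
  switch(as.character(cov_mode),
    "0" = h$qcov >= coverage & h$tcov >= coverage,  # query and target
    "1" = h$tcov >= coverage,                       # target only
    "2" = h$qcov >= coverage,                       # query only
    "3" = h$tlen >= coverage * h$qlen,              # target vs query length
    stop("cov_mode not handled in this sketch: ", cov_mode))
}

# Keep hits that satisfy the identity, e-value and coverage thresholds.
filter_hits <- function(h, coverage = 0.8, identity = 0.8,
                        evalue = 1e-06, cov_mode = 0) {
  keep <- h$pident >= identity & h$evalue <= evalue &
    passes_coverage(h, coverage, cov_mode)
  h[keep, , drop = FALSE]
}

# cluster_mode = 1: connected components over the accepted hits, so any
# chain of matches puts sequences into the same cluster.
connected_components <- function(ids, from, to) {
  parent <- setNames(ids, ids)
  find <- function(x) {                 # follow parents up to the root
    while (parent[[x]] != x) x <- parent[[x]]
    x
  }
  for (i in seq_along(from)) {
    a <- find(from[i])
    b <- find(to[i])
    if (a != b) parent[[b]] <- a        # merge the two components
  }
  roots <- vapply(ids, find, character(1))
  setNames(as.integer(factor(roots, levels = unique(roots))), ids)
}

# Made-up hits between five proteins.
hits <- data.frame(
  query  = c("p1", "p2", "p3", "p4"),
  target = c("p2", "p3", "p4", "p5"),
  pident = c(0.95, 0.90, 0.70, 0.99),
  qcov   = c(0.90, 0.85, 0.95, 0.60),
  tcov   = c(0.88, 0.92, 0.90, 0.95),
  qlen   = c(300, 310, 320, 150),
  tlen   = c(310, 320, 330, 400),
  evalue = c(1e-50, 1e-40, 1e-30, 1e-20),
  stringsAsFactors = FALSE
)

ok <- filter_hits(hits, cov_mode = 0)
clusters <- connected_components(paste0("p", 1:5), ok$query, ok$target)
print(clusters)
```

With `cov_mode = 0` the p4/p5 hit fails because its query coverage is 0.60, so p4 and p5 stay in separate clusters; with `cov_mode = 1` only target coverage is checked, the hit passes, and p4 and p5 are merged.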