analyzePFAM: Import Result of PFAM analysis


View source: R/analyze_external_sequence_analysis.R

Description

Imports the result of Pfam (an external tool for sequence analysis of protein domains) into the IsoformSwitchAnalyzeR workflow. Please note that, because of the 'removeNoncodinORFs' option in analyzeCPAT and analyzeCPC2, we recommend running analyzeCPC2/analyzeCPAT before analyzePFAM, analyzeNetSurfP2 and analyzeSignalP if you have predicted the ORFs with analyzeORF.

Usage

analyzePFAM(
    switchAnalyzeRlist,
    pathToPFAMresultFile,
    showProgress=TRUE,
    quiet=FALSE
)

Arguments

switchAnalyzeRlist

A switchAnalyzeRlist object

pathToPFAMresultFile

A string indicating the full path to the Pfam result file(s). If multiple result files were created (multiple web-server runs), supply all the paths as a vector of strings. See Details for suggestions on how to run the Pfam tool and obtain its result.

showProgress

A logical indicating whether to make a progress bar (if TRUE) or not (if FALSE). Default is TRUE.

quiet

A logical indicating whether to avoid printing progress messages (incl. progress bar). Default is FALSE.
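
When several web-server runs produced several result files, the paths can be collected into one character vector before calling analyzePFAM(). The following is a minimal sketch using only base R; the directory name, the file names and the object mySwitchList are hypothetical placeholders, and the empty files are created only so that the sketch runs on its own.

```r
# Hypothetical directory and file names, created here only so the sketch is runnable
resultDir <- file.path(tempdir(), "pfam_results")
dir.create(resultDir, showWarnings = FALSE)
invisible(file.create(file.path(resultDir, c("pfam_subset_1.txt", "pfam_subset_2.txt"))))

# Collect the full paths of all result files into one character vector
pfamFiles <- list.files(
    path       = resultDir,
    pattern    = "^pfam_subset_.*\\.txt$",
    full.names = TRUE
)

# Sanity check before handing the paths to analyzePFAM()
stopifnot(all(file.exists(pfamFiles)))

# The vector could then be supplied as follows (not run; mySwitchList is a placeholder):
# mySwitchList <- analyzePFAM(
#     switchAnalyzeRlist   = mySwitchList,
#     pathToPFAMresultFile = pfamFiles
# )
```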

Details

A protein domain is a part of a protein which by itself can maintain a fixed three-dimensional structure. Protein domains are found in most proteins and usually have a specific function.

The PFAM webserver is quite strict with regard to the number of sequences in the uploaded files, so we suggest multiple runs, each with one of the files containing a subset of the sequences. See extractSequence for info on how to split the amino acid fasta files.

Notes on how to run the external tool:
Use default parameters. If you want to use the webserver, it is easily done as follows:
1) Go to https://www.ebi.ac.uk/Tools/hmmer/search/hmmscan
2) Switch to the "Upload a File" tab.
3) Upload the amino acid file (_AA) created with extractSequence and add your email address. This is important because there is currently no way of downloading the web output, so you need the result sent to your email.
4) Check that Pfam is selected in the "HMM database" window.
5) Submit your job.
6) Wait until you receive the email with the result (usually quite fast).
7) Copy/paste the result part of the email (ONLY what is below the line starting with "seq id") into an empty plain text document (Notepad, Sublime Text, TextEdit or similar; not Word).
8) Save the document and supply the path to that document to analyzePFAM().
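
If the whole email was saved to a text file, the copy/paste step could also be scripted. The sketch below uses only base R and is not part of IsoformSwitchAnalyzeR: the helper name keepPfamResultLines and the example lines are made up for illustration, and it assumes that everything below the first line starting with "seq id" is the result part.

```r
# Hypothetical helper (not part of IsoformSwitchAnalyzeR): keep only the lines
# below the first line that starts with "seq id".
keepPfamResultLines <- function(emailLines) {
    headerIndex <- grep("^seq id", emailLines)
    if (length(headerIndex) == 0) {
        stop("No line starting with 'seq id' was found.")
    }
    # Everything after the first header line is treated as the result part
    resultLines <- emailLines[seq_along(emailLines) > headerIndex[1]]
    # Drop empty or whitespace-only lines
    resultLines[nzchar(trimws(resultLines))]
}

# Made-up placeholder lines standing in for a saved email
emailLines <- c(
    "Your job has finished.",
    "seq id (remaining header text)",
    "isoform_1 (result fields)",
    "",
    "isoform_2 (result fields)"
)

# Write the trimmed result to a plain text file whose path
# could then be supplied to analyzePFAM()
trimmedFile <- tempfile(fileext = ".txt")
writeLines(keepPfamResultLines(emailLines), trimmedFile)
```

In practice emailLines would come from readLines() on the saved email file.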

To run PFAM locally you should use the pfam_scan.pl script as described in the readme at ftp://ftp.ebi.ac.uk/pub/databases/Pfam/Tools/ and supply the path to the result file to analyzePFAM().

Protein domains are only added to isoforms annotated as having an ORF, even if other isoforms exist in the file. This means that if you quantify the same isoforms many times, you can run Pfam once on all isoforms and then supply the entire file to analyzePFAM().

Please note that the analyzePFAM() function will automatically only import the Pfam results from the isoforms stored in the switchAnalyzeRlist - even if many more are stored in the result file.
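
The effect of this filtering can be illustrated with a small base R sketch. This is not the package's internal code; the column names and isoform ids below are made up, and real result files contain more columns.

```r
# Made-up Pfam result rows covering more isoforms than the analysis contains
pfamResult <- data.frame(
    seq_id = c("iso1", "iso2", "iso3", "iso4"),
    domain = c("domainA", "domainB", "domainA", "domainC"),
    stringsAsFactors = FALSE
)

# Hypothetical ids of the isoforms that are stored in the switchAnalyzeRlist
# and annotated with an ORF
isoformsWithOrf <- c("iso1", "iso3")

# Only rows belonging to those isoforms are kept; the rest are ignored
keptRows <- pfamResult[pfamResult$seq_id %in% isoformsWithOrf, ]
print(keptRows)
```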

Value

A column called 'domain_identified' is added to isoformFeatures, containing a binary indication (yes/no) of whether a transcript contains any protein domains or not. Furthermore, the data.frame 'domainAnalysis' is added to the switchAnalyzeRlist, containing the details about domain name(s) and position for each transcript (where domain(s) were found).

The data.frame added has one row per isoform and contains the columns:

Furthermore, depending on the exact tool used (local vs web-server), additional columns are added with information such as E score and type.
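
The added entries can be inspected with standard list and data.frame accessors. The following is a minimal sketch, assuming IsoformSwitchAnalyzeR is installed and using the example data and result file shipped with the package; the block is skipped if the package is not available.

```r
# Only run the sketch when the package is installed
pkgAvailable <- requireNamespace("IsoformSwitchAnalyzeR", quietly = TRUE)

if (pkgAvailable) {
    library(IsoformSwitchAnalyzeR)
    data("exampleSwitchListIntermediary")

    analyzedList <- analyzePFAM(
        switchAnalyzeRlist   = exampleSwitchListIntermediary,
        pathToPFAMresultFile = system.file(
            "extdata/pfam_results.txt",
            package = "IsoformSwitchAnalyzeR"
        ),
        showProgress = FALSE,
        quiet        = TRUE
    )

    # Count of isoforms with and without identified domains
    print(table(analyzedList$isoformFeatures$domain_identified, useNA = "ifany"))

    # First rows of the per-domain details
    print(head(analyzedList$domainAnalysis))
}
```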

Author(s)

Kristoffer Vitting-Seerup

References

See Also

createSwitchAnalyzeRlist
extractSequence
analyzeCPAT
analyzeSignalP
analyzeNetSurfP2
analyzeSwitchConsequences

Examples

### Load example data (matching the result files also stored in IsoformSwitchAnalyzeR)
data("exampleSwitchListIntermediary")
exampleSwitchListIntermediary

### Add PFAM analysis
exampleSwitchListAnalyzed <- analyzePFAM(
    switchAnalyzeRlist   = exampleSwitchListIntermediary,
    pathToPFAMresultFile = system.file("extdata/pfam_results.txt", package = "IsoformSwitchAnalyzeR"),
    showProgress=FALSE
    )

exampleSwitchListAnalyzed

IsoformSwitchAnalyzeR documentation built on Nov. 8, 2020, 5:36 p.m.