To go to the individual scripts/programs, please click on the name:
Determination of the parameters for fuzzy c-means cluster analysis
Assessment and Improvement of Statistical Tools for Comparative Proteomics
CrossTalkDB for Mass Spectrometry Data
CROSSWORK: Cross-link identification
GLYCANTHROPE: Glyco-peptide identification
MassAI: Protein identification / scan annotation
Developers: Thomas Aarup Hansen, Simone Sidoli (firstname.lastname@example.org) and Chrystian Ruminowicz (email@example.com)
Histone Coder counts the site-determining MS/MS ions in a given spectrum to assess whether a post-translational modification (PTM) can be localized unambiguously. The software lists the number and type of site-determining ions found between the PTM localization assigned by Mascot (Matrix Science) and the closest other amino acids that can host the modification. The PTMs included in the script are phosphorylation (S, T, Y), acetylation (K), mono- and dimethylation (K, R) and trimethylation (K). From the interface it is possible to select filters for spectrum score, fragment ion types and MS/MS tolerance for the search. As an additional feature, Histone Coder converts the Mascot output to the standard Brno nomenclature for PTMs (e.g. K4me3K14ac).
Requirements: The software requires Java. The input is a Mascot output file in .xml format.
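The Brno-style notation mentioned above can be illustrated with a short Python sketch. It assumes the modifications are already available as (residue, position, modification) tuples. The function name `to_brno` and the suffix table are hypothetical and are not taken from Histone Coder or from Mascot's modification names.

```python
# Illustrative sketch only: this is not Histone Coder's actual code.
# Hypothetical mapping from modification names to Brno-style suffixes.
BRNO_SUFFIX = {
    "acetyl": "ac",
    "methyl": "me1",
    "dimethyl": "me2",
    "trimethyl": "me3",
    "phospho": "ph",
}


def to_brno(mods):
    """Format (residue, position, modification) tuples as a Brno-style string.

    Modifications are written in order of sequence position, e.g. K4me3K14ac.
    """
    parts = []
    for residue, position, name in sorted(mods, key=lambda m: m[1]):
        parts.append(f"{residue}{position}{BRNO_SUFFIX[name.lower()]}")
    return "".join(parts)
```

For example, `to_brno([("K", 14, "acetyl"), ("K", 4, "trimethyl")])` gives the string used as the example above.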
Developers: Chrystian Ruminowicz (firstname.lastname@example.org) and Simone Sidoli (email@example.com)
isoScale is a proteomics tool that quantifies peptides based on the total ion intensity of their MS/MS spectra. The software also provides relative quantification of isobaric peptides co-fragmented in MS/MS spectra, i.e. peptides that share the same sequence but have distinct localizations of post-translational modifications (PTMs). The principle of the quantification is described in Pesavento et al. (Analytical Chemistry, 2006). We recommend the software for middle-down or top-down analyses, as it is suitable for direct comparison of single sequences with scrambled combinatorial PTMs. The user interface provides a choice of CID/HCD or ETD fragmentation. The software is suitable for both high- and low-resolution data, as it allows the choice of the MS/MS ion tolerance used to search for the fragment ions required to calculate the fragment ion relative ratio (FIRR) of isobaric peptides.
Requirements: The software requires Java. It takes two input files: the result file, which is the output of Histone Coder, and a data file, which is the Mascot output in .csv format.
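The idea behind a fragment ion relative ratio can be sketched as follows. This is one plausible reading, not isoScale's code: the ions unique to each of two co-fragmented isobaric species are matched to the spectrum within the chosen MS/MS tolerance, and their summed intensities are expressed as fractions. The function names, the tolerance in Da and the choice of the most intense peak per ion are assumptions; the exact calculation in isoScale and in Pesavento et al. may differ.

```python
def matched_intensity(peaks, theoretical_mz, tol):
    """Sum, over the theoretical m/z values, the most intense peak within +/- tol (Da)."""
    total = 0.0
    for mz in theoretical_mz:
        hits = [inten for peak_mz, inten in peaks if abs(peak_mz - mz) <= tol]
        if hits:
            total += max(hits)
    return total


def relative_ratios(peaks, ions_a, ions_b, tol=0.05):
    """Return the intensity fractions of species A and B, or None if nothing matches.

    peaks  : list of (m/z, intensity) tuples from one MS/MS spectrum
    ions_a : m/z values of fragment ions unique to species A (assumed input)
    ions_b : m/z values of fragment ions unique to species B (assumed input)
    """
    a = matched_intensity(peaks, ions_a, tol)
    b = matched_intensity(peaks, ions_b, tol)
    if a + b == 0:
        return None
    return a / (a + b), b / (a + b)
```

A wider tolerance would be chosen for low-resolution data and a narrower one for high-resolution data, which is why the tolerance is left as a parameter.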
Developer: Alistair Edwards, firstname.lastname@example.org
ReportSites is a software tool for distilling information about the physical distributions of post-translational modification sites from large-scale data sets.
The software can be found here: http://ptmtools.portjackson.org/
Reference: Edwards, A. V. G., Edwards, G. J., Larsen, M. R., Cordwell, S. J. (2012) ReportSites - a computational method to extract positional and physico-chemical information from large-scale proteomic post-translational modification datasets, Journal of Proteomics and Bioinformatics, 5: 104-107
Developer: Veit Schwämmle, email@example.com.
Bioinformatics, Vol. 26, No. 22, 15 November 2010, pp. 2841-8.
Fuzzy c-means clustering is widely used to identify cluster structures in high-dimensional data sets. One of its main limitations is the lack of a computationally fast method to set optimal values of algorithm parameters. Wrong parameter values may either lead to the inclusion of purely random fluctuations in the results or ignore potentially important data. The optimal solution has parameter values for which the clustering does not yield any results for a purely random data set but which detects cluster formation with maximum resolution on the edge of randomness. Taking the dimension of the set and the number of objects as input values instead of evaluating the entire data set allows us to propose a functional relationship determining the fuzzifier directly. This result speaks strongly against using a predefined fuzzifier as typically done in many previous studies. A comparison shows that the minimum distance between the centroids provides results that are at least equivalent or better than those obtained by other computationally more expensive validation indices.
Link to R script that automatically estimates the parameters for your data set
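To show where the fuzzifier enters the calculation, here is a minimal Python sketch of the standard fuzzy c-means membership formula together with the minimum centroid distance. It is not the linked R script and does not include the parameter estimation proposed in the paper. The function names are illustrative, and plain Euclidean distance is an assumption.

```python
import math


def memberships(point, centroids, m):
    """Fuzzy c-means membership of one point in each cluster, for fuzzifier m > 1."""
    dists = [math.dist(point, c) for c in centroids]
    # A point that coincides with one or more centroids is shared among them.
    if 0.0 in dists:
        zero_count = dists.count(0.0)
        return [1.0 / zero_count if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]


def min_centroid_distance(centroids):
    """Smallest pairwise Euclidean distance between cluster centroids."""
    return min(
        math.dist(a, b)
        for i, a in enumerate(centroids)
        for b in centroids[i + 1:]
    )
```

As m increases, the memberships of a point become more evenly spread across clusters. For a purely random data set, a suitable fuzzifier is one at which the centroids collapse and the minimum centroid distance approaches zero.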
Developers: Veit Schwämmle, Ileana Rodríguez León, and Ole Nørregaard Jensen
Veit Schwämmle, Ileana Rodríguez León, and Ole Nørregaard Jensen
J. Proteome Res., 2013, 12 (9), pp 3874–3883
Large-scale quantitative analyses of biological systems are often performed with few replicate experiments, leading to multiple nonidentical data sets due to missing values. For example, mass spectrometry driven proteomics experiments are frequently performed with few biological or technical replicates due to sample-scarcity or due to duty-cycle or sensitivity constraints, or limited capacity of the available instrumentation, leading to incomplete results where detection of significant feature changes becomes a challenge. This problem is further exacerbated for the detection of significant changes on the peptide level, for example, in phospho-proteomics experiments. In order to assess the extent of this problem and the implications for large-scale proteome analysis, we investigated and optimized the performance of three statistical approaches by using simulated and experimental data sets with varying numbers of missing values. We applied three tools, including standard t test, moderated t test, also known as limma, and rank products for the detection of significantly changing features in simulated and experimental proteomics data sets with missing values. The rank product method was improved to work with data sets containing missing values. Extensive analysis of simulated and experimental data sets revealed that the performance of the statistical analysis tools depended on simple properties of the data sets. High-confidence results were obtained by using the limma and rank products methods for analyses of triplicate data sets that exhibited more than 1000 features and more than 50% missing values. The maximum number of differentially represented features was identified by using limma and rank products methods in a complementary manner. We therefore recommend combined usage of these methods as a novel and optimal way to detect significantly changing features in these data sets. 
This approach is suitable for large quantitative data sets from stable isotope labeling and mass spectrometry experiments and should be applicable to large data sets of any type.
Link to R script that implements the improved rank products algorithm and the combined analysis.
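For readers unfamiliar with rank products, the following Python sketch shows one simple way to compute them when values are missing. It is not the linked R script and may differ from the improved algorithm in the paper. The assumptions are: each replicate is ranked over its observed values only, ranks are divided by the number of observed values so that replicates remain comparable, and the geometric mean is taken over the replicates in which a feature was observed. Ties are broken arbitrarily by input order.

```python
import math


def rank_products(log_ratios):
    """Rank product per feature from rows of log-ratios; None marks a missing value.

    Rows are features and columns are replicates. Within each replicate the
    largest log-ratio receives rank 1. Small rank products indicate features
    that are consistently near the top of the observed replicates.
    """
    n_features = len(log_ratios)
    n_reps = len(log_ratios[0])
    rel_ranks = [[None] * n_reps for _ in range(n_features)]
    for j in range(n_reps):
        observed = [i for i in range(n_features) if log_ratios[i][j] is not None]
        observed.sort(key=lambda i: log_ratios[i][j], reverse=True)
        for rank, i in enumerate(observed, start=1):
            rel_ranks[i][j] = rank / len(observed)
    result = []
    for row in rel_ranks:
        present = [r for r in row if r is not None]
        if not present:
            result.append(None)  # feature never observed
        else:
            # Geometric mean of the relative ranks that are available.
            result.append(math.exp(sum(math.log(r) for r in present) / len(present)))
    return result
```

Significance would normally be assessed against a null distribution, for example by permuting values within replicates. That step is omitted here.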
Developers: Veit Schwämmle, firstname.lastname@example.org.
Large-scale analysis of co-existing post-translational modifications on histone tails reveals global fine-structure of crosstalk
Veit Schwämmle, Claudia-Maria Aspalter, Simone Sidoli and Ole N. Jensen
Multiply modified histones are known to change the chromatin structure and control gene expression. The vast number of possible combinations of these marks complicates an understanding of their function. Recent mass spectrometry experiments are able to determine the co-occurrence of multiple histone marks. The increasing amount of information gathered from these experiments demands an organization of data from mass spectrometry experiments on histone marks. We present here a publicly accessible database that aims to collect such data. The software includes extensive search options and statistical tools to further analyze the retrieved data.
Find the crosstalkdb here
FORMAT MGF is a program that imports various types of MGF/MSM/PKL and LIST files from tandem mass-spectrometric data and formats them into a standard MGF file.
The program also features a number of noise-reducing filters, which can reduce the false-positive rate in subsequent searches with search engines (including CrossWork, Glycanthrope or MassAI).
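As an illustration of this kind of processing, the Python sketch below reads spectra from MGF text and keeps only the most intense peaks of each scan. The filters that FORMAT MGF actually applies are not described here, so the top-N filter is an assumption chosen as a common example of noise reduction. The function names are hypothetical, and each peak line is assumed to hold at least an m/z and an intensity.

```python
def read_mgf(lines):
    """Yield (headers, peaks) for each BEGIN IONS ... END IONS block."""
    headers, peaks, inside = {}, [], False
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            headers, peaks, inside = {}, [], True
        elif line == "END IONS":
            if inside:
                yield headers, peaks
            inside = False
        elif inside and line:
            if "=" in line and not line[0].isdigit():
                # Header line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, value = line.split("=", 1)
                headers[key] = value
            else:
                # Peak line: m/z and intensity (further columns are ignored).
                fields = line.split()
                peaks.append((float(fields[0]), float(fields[1])))


def keep_top_peaks(peaks, n):
    """Keep the n most intense peaks, returned in ascending m/z order."""
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:n]
    return sorted(strongest)
```

A file would be processed with something like `[(h, keep_top_peaks(p, 50)) for h, p in read_mgf(open(path))]`, where `path` and the cut-off of 50 are placeholders.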
CROSSWORK is a stand-alone software package for identifying cross-linked peptides from mass spectrometric datasets. Modified and non-cross-linked peptides are likewise identified.
The fragment ions are annotated for each scan, the cross-linked residues are highlighted and the results are displayed graphically. Graphical representations of the proteins involved and the position of their cross-links are likewise created. Results and graphical representations can be exported for later publication.
Glycanthrope is a program for identifying glycopeptides from mass-spectrometric datasets.
The program identifies both the peptide and glycan moieties from intact glycopeptides. Because the glycopeptides remain intact, there is no need to separate the glycans from the peptides enzymatically prior to the search.
Peptide and glycan fragment ions are annotated and displayed graphically. Result lists and graphics can be exported for later publication.
The program works with any combination of CID/HCD and ETD fragmentation and combines information from each type of fragmentation for better identification.
When working with complex samples, Glycanthrope searches can be assisted greatly by running the dataset(s) through MassAI first, in order to narrow down the number of proteins.
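Searching intact glycopeptides means that the precursor mass combines the peptide and its attached glycan. The Python sketch below adds standard monoisotopic monosaccharide residue masses to a peptide mass to give the expected mass for a given glycan composition. It illustrates the mass arithmetic only and is not Glycanthrope's search algorithm. The function name and the composition format are assumptions.

```python
# Monoisotopic residue masses (Da) of common monosaccharides.
MONOSACCHARIDE_MASS = {
    "Hex": 162.05282,
    "HexNAc": 203.07937,
    "dHex": 146.05791,
    "NeuAc": 291.09542,
}


def glycopeptide_mass(peptide_mass, composition):
    """Neutral mass of an intact glycopeptide.

    peptide_mass : neutral monoisotopic mass of the unmodified peptide (Da)
    composition  : dict of monosaccharide name -> count, e.g. {"HexNAc": 2, "Hex": 5}
    """
    glycan = sum(MONOSACCHARIDE_MASS[name] * count for name, count in composition.items())
    return peptide_mass + glycan
```

Because the glycan is added as residues, no water needs to be added or removed beyond what is already included in the peptide mass.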
MassAI is a wholly self-contained software package for peptide/protein identification from tandem mass-spectrometric datasets. MassAI can search batches of mass-spectrometric datasets (CID/ETD/HCD) against proteome-scale databases (UniProt) for complex samples, against a user-defined set of protein sequences for simple protein samples, or against a combination of the two.
MassAI features numerous in-silico proteases and can also perform non-specific protein digestion, as encountered when using pepsin or bacterial degradation of proteins.
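An in-silico protease can be pictured with the Python sketch below, which applies the commonly used trypsin rule: cleave C-terminal to K or R unless the next residue is P. This is a generic illustration and not MassAI's implementation. The function name and the handling of missed cleavages are assumptions.

```python
def digest_trypsin(sequence, missed_cleavages=0):
    """Return tryptic peptides, allowing up to the given number of missed cleavages."""
    if not sequence:
        return []
    # Positions at which the sequence is cut (start, internal sites, end).
    cuts = [0]
    for i, aa in enumerate(sequence[:-1]):
        if aa in "KR" and sequence[i + 1] != "P":
            cuts.append(i + 1)
    cuts.append(len(sequence))
    peptides = []
    for start in range(len(cuts) - 1):
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end < len(cuts):
                peptides.append(sequence[cuts[start]:cuts[end]])
    return peptides
```

A non-specific digest would instead enumerate every substring within a chosen length range, which produces far more candidate peptides and therefore a larger search space.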
Emphasis is placed on the score being analytical rather than statistical, which reduces the reliance on an arbitrary score. Instead, MassAI offers full transparency: every result from every peptide and from every protein is fully annotated and displayed.
MassAI also includes a number of data-mining routines for identifying peptides with both simple and complex modifications.
By annotating non-peptide fragmentation, such as that of glycosylations and polymers, the data-mining routines can be applied to clean up datasets: undesirable scans can be purged from the peak list(s), while desirable scans are retained. In this way, batches of peak lists can be cleaned and concentrated into a single new file (.mgf).
Such noise-reduced .mgf files can then be submitted for other types of search, including CROSSWORK and GLYCANTHROPE.
The built-in labelling feature can be used to compare labelled and non-labelled datasets (singly or in batch), and is well suited to analysing hydrogen-deuterium exchange (HDX) datasets at residue-level resolution.
The software with tutorials can be downloaded from: http://massai.dk/