Bioinformatics Tools

Saturday, August 29, 2015

Predicting Beta Barrel Outer Membrane Proteins (OMPs)

Here is a compilation of some commonly available tools for predicting beta-barrel outer membrane proteins (OMPs).


1.     Pred-TMBB (2004): http://bioinformatics.biol.uoa.gr/PRED-TMBB/input.jsp
PRED-TMBB: a web server for predicting the topology of beta-barrel outer membrane proteins. The beta-barrel outer membrane proteins constitute one of the two known structural classes of membrane proteins. Whereas there are several different web-based predictors for alpha-helical membrane proteins, currently there is no freely available prediction method for beta-barrel membrane proteins, at least with an acceptable level of accuracy. We present here a web server (PRED-TMBB, http://bioinformatics.biol.uoa.gr/PRED-TMBB) which is capable of predicting the transmembrane strands and the topology of beta-barrel outer membrane proteins of Gram-negative bacteria. The method is based on a Hidden Markov Model, trained according to the Conditional Maximum Likelihood criterion. The model was retrained and the training set now includes 16 non-homologous outer membrane proteins with structures known at atomic resolution. The user may submit one sequence at a time and has the option of choosing between three different decoding methods. The server reports the predicted topology of a given protein, a score indicating the probability of the protein being an outer membrane beta-barrel protein, posterior probabilities for the transmembrane strand prediction and a graphical representation of the assumed position of the transmembrane strands with respect to the lipid bilayer. http://nar.oxfordjournals.org/content/32/suppl_2/W400.long

2.      BOCTOPUS (2012): http://boctopus.cbr.su.se/
BOCTOPUS: improved topology prediction of transmembrane β barrel proteins
Transmembrane β barrel proteins (TMBs) are found in the outer membrane of Gram-negative bacteria, chloroplast and mitochondria. They play a major role in the translocation machinery, pore formation, membrane anchoring and ion exchange. TMBs are also promising targets for antimicrobial drugs and vaccines. Given the difficulty in membrane protein structure determination, computational methods to identify TMBs and predict the topology of TMBs are important. Results: Here, we present BOCTOPUS; an improved method for the topology prediction of TMBs by employing a combination of support vector machines (SVMs) and Hidden Markov Models (HMMs). The SVMs and HMMs account for local and global residue preferences, respectively. Based on a 10-fold cross-validation test, BOCTOPUS performs better than all existing methods, reaching a Q3 accuracy of 87%. Further, BOCTOPUS predicted the correct number of strands for 83% proteins in the dataset. BOCTOPUS might also help in reliable identification of TMBs by using it as an additional filter to methods specialized in this task. http://bioinformatics.oxfordjournals.org/content/28/4/516.long
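The Q3 accuracy quoted above is a per-residue measure: the fraction of residues whose predicted state matches the observed state, across three states. A minimal sketch in Python, assuming the three states are encoded as the letters i (inner loop), M (membrane strand) and o (outer loop). The encoding, the function name and the example strings are illustrative assumptions, not BOCTOPUS output.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of positions where the predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("topology strings must have equal length")
    if not observed:
        raise ValueError("topology strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 10-residue topologies (i = inner loop, M = strand, o = outer loop).
observed = "iiMMMMoooo"
predicted = "iiMMMooooo"
print(f"Q3 = {q3_accuracy(predicted, observed):.2f}")
```

Here one strand residue out of ten is mispredicted as an outer loop, so nine of ten positions agree.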

3.       TBBpred (2004): http://www.imtech.res.in/raghava/tbbpred/
Prediction of transmembrane regions of β-barrel proteins using ANN- and SVM-based methods. This article describes a method developed for predicting transmembrane β-barrel regions in membrane proteins using machine learning techniques: artificial neural network (ANN) and support vector machine (SVM). The ANN used in this study is a feed-forward neural network with a standard back-propagation training algorithm. The accuracy of the ANN-based method improved significantly, from 70.4% to 80.5%, when evolutionary information was added to a single sequence as a multiple sequence alignment obtained from PSI-BLAST. We have also developed an SVM-based method using a primary sequence as input and achieved an accuracy of 77.4%. The SVM model was modified by adding 36 physicochemical parameters to the amino acid sequence information. Finally, ANN- and SVM-based methods were combined to utilize the full potential of both techniques. The accuracy and Matthews correlation coefficient (MCC) value of SVM, ANN, and combined method are 78.5%, 80.5%, and 81.8%, and 0.55, 0.63, and 0.64, respectively. These methods were trained and tested on a nonredundant data set of 16 proteins, and performance was evaluated using “leave one out cross-validation” (LOOCV). http://onlinelibrary.wiley.com/doi/10.1002/prot.20092/abstract;jsessionid=F041C3CA2F5E53B83924D0D73D2832C7.f03t02

4.       BETAWARE (2013): http://www.biocomp.unibo.it/~savojard/betawarecl/
BETAWARE: a machine-learning tool to detect and predict transmembrane beta-barrel proteins in prokaryotes. The annotation of membrane proteins in proteomes is an important problem of Computational Biology, especially after the development of high-throughput techniques that allow fast and efficient genome sequencing. Among membrane proteins, transmembrane β-barrels (TMBBs) are poorly represented in the database of protein structures (PDB) and difficult to identify with experimental approaches. They are, however, extremely important, playing key roles in several cell functions and bacterial pathogenicity. TMBBs are included in the lipid bilayer with a β-barrel structure and are presently found in the outer membranes of Gram-negative bacteria, mitochondria and chloroplasts. Recently, we developed two top-performing methods based on machine-learning approaches to tackle both the detection of TMBBs in sets of proteins and the prediction of their topology. Here, we present our BETAWARE program that includes both approaches and can run as a standalone program on a linux-based computer to easily address in-home massive protein annotation or filtering. http://bioinformatics.oxfordjournals.org/content/29/4/504.abstract 

Prediction of the transmembrane strands and topology of β-barrel outer membrane proteins is of interest in current bioinformatics research. Several methods have been applied so far for this task, utilizing different algorithmic techniques and a number of freely available predictors exist. The methods can be grossly divided to those based on Hidden Markov Models (HMMs), on Neural Networks (NNs) and on Support Vector Machines (SVMs). In this work, we compare the different available methods for topology prediction of β-barrel outer membrane proteins. We evaluate their performance on a non-redundant dataset of 20 β-barrel outer membrane proteins of gram-negative bacteria, with structures known at atomic resolution. Also, we describe, for the first time, an effective way to combine the individual predictors, at will, to a single consensus prediction method. We assess the statistical significance of the performance of each prediction scheme and conclude that Hidden Markov Model based methods, HMM-B2TMR, ProfTMB and PRED-TMBB, are currently the best predictors, according to either the per-residue accuracy, the segments overlap measure (SOV) or the total number of proteins with correctly predicted topologies in the test set. Furthermore, we show that the available predictors perform better when only transmembrane β-barrel domains are used for prediction, rather than the precursor full-length sequences, even though the HMM-based predictors are not influenced significantly. The consensus prediction method performs significantly better than each individual available predictor, since it increases the accuracy up to 4% regarding SOV and up to 15% in correctly predicted topologies.
http://www.biomedcentral.com/1471-2105/6/7

7.       TMB-HUNT (2005): http://www.bioinformatics.leeds.ac.uk/betaBarrel/
TMB-Hunt: a web server to screen sequence sets for transmembrane β-barrel proteins. TMB-Hunt is a program that uses a modified k-nearest neighbour (k-NN) algorithm to classify protein sequences as transmembrane β-barrel (TMB) or non-TMB on the basis of whole sequence amino acid composition. By including differentially weighted amino acids, evolutionary information and by calibrating the scoring, a discrimination accuracy of 92.5% was achieved, as tested using a rigorous cross-validation procedure. The TMB-Hunt web server, available at www.bioinformatics.leeds.ac.uk/betaBarrel, allows screening of up to 10 000 sequences in a single query and provides results and key statistics in a simple colour coded format. http://nar.oxfordjournals.org/content/33/suppl_2/W188.long
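The core idea, classifying a sequence from its whole-sequence amino acid composition with k-nearest neighbours, can be sketched as follows. This is a plain, unweighted k-NN with Euclidean distance and majority vote. It does not reproduce TMB-Hunt's modifications (differential amino acid weighting, evolutionary information, score calibration), and the labelled sequences are made-up toy strings, not real proteins.

```python
from collections import Counter
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> list[float]:
    """Return the 20-dimensional amino acid frequency vector of a sequence."""
    seq = seq.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def knn_classify(query: str, training: list[tuple[str, str]], k: int = 3) -> str:
    """Label the query by majority vote among its k nearest training sequences."""
    q = composition(query)
    ranked = sorted(training, key=lambda item: dist(q, composition(item[0])))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set (hypothetical sequences, for illustration only).
training = [
    ("GYTYSGNATAGYNF", "TMB"),
    ("AYGYTFNSGATYQY", "TMB"),
    ("TYNGSAYGFTYAGN", "TMB"),
    ("KELLEKAEELLKRA", "non-TMB"),
    ("LEEKLKELAERLKK", "non-TMB"),
    ("EKLAELKRKAEELL", "non-TMB"),
]
print(knn_classify("GFTYNASGYTYGAN", training))
```

Because only composition is used, the order of residues is ignored, which is what makes this kind of screen cheap enough to run on thousands of sequences per query.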

8.       TMBPro (2008): suite of specialized predictors for predicting secondary structure, beta-contacts, and tertiary structure of Transmembrane Beta-Barrel (TMB) proteins. http://tmbpro.ics.uci.edu/
TMBpro: secondary structure, β-contact and tertiary structure prediction of transmembrane β-barrel proteins. Transmembrane β-barrel (TMB) proteins are embedded in the outer membranes of mitochondria, Gram-negative bacteria and chloroplasts. These proteins perform critical functions, including active ion-transport and passive nutrient intake. Therefore, there is a need for accurate prediction of secondary and tertiary structure of TMB proteins. Traditional homology modeling methods, however, fail on most TMB proteins since very few non-homologous TMB structures have been determined. Yet, because TMB structures conform to specific construction rules that restrict the conformational space drastically, it should be possible for methods that do not depend on target-template homology to be applied successfully. Results: We develop a suite (TMBpro) of specialized predictors for predicting secondary structure (TMBpro-SS), β-contacts (TMBpro-CON) and tertiary structure (TMBpro-3D) of transmembrane β-barrel proteins. We compare our results to the recent state-of-the-art predictors transFold and PRED-TMBB using their respective benchmark datasets, and leave-one-out cross-validation. Using the transFold dataset TMBpro predicts secondary structure with per-residue accuracy (Q2) of 77.8%, a correlation coefficient of 0.54, and TMBpro predicts β-contacts with precision of 0.65 and recall of 0.67. Using the PRED-TMBB dataset, TMBpro predicts secondary structure with Q2 of 88.3% and a correlation coefficient of 0.75. All of these performance results exceed previously published results by 4% or more. Working with the PRED-TMBB dataset, TMBpro predicts the tertiary structure of transmembrane segments with RMSD <6.0 Å for 9 of 14 proteins. For 6 of 14 predictions, the RMSD is <5.0 Å, with a GDT_TS score greater than 60.0. http://bioinformatics.oxfordjournals.org/content/24/4/513.long

9.       MCMBB Markov Chain Model Beta Barrels (2004): http://athina.biol.uoa.gr/bioinformatics/mcmbb/
The task of finding β-barrel outer membrane proteins of the gram-negative bacteria is of great importance in current Bioinformatics research. We developed a computational method, which discriminates β-barrel outer membrane proteins from globular ones and, also, from α-helical membrane proteins. The method is based on a 1st order Markov Chain model, which captures the alternating pattern of hydrophilic-hydrophobic residues occurring in the membrane-spanning beta-strands of beta-barrel outer membrane proteins. The model achieves high accuracy in discriminating outer membrane proteins, and could be used alone, or in conjunction with other more sophisticated methods, already available. http://www.academia.edu/316959/Finding_Beta-Barrel_Outer_Membrane_Proteins_With_a_Markov_Chain_Model
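To illustrate how a first-order Markov chain can pick up an alternating hydrophobic/hydrophilic pattern, here is a small Python sketch that scores a sequence by the log-odds of its residue-to-residue transitions under a "barrel" chain versus a background chain. Everything numeric here is an assumption made for illustration: the two-letter hydrophobic/polar alphabet, the residue grouping and the transition probabilities are invented, and the initial-state term is ignored. It is not the published MCMBB model or its parameters.

```python
from math import log

HYDROPHOBIC = set("AVLIMFWYC")  # assumed grouping, for illustration only


def to_hp(seq: str) -> str:
    """Reduce a protein sequence to a two-letter hydrophobic/polar alphabet."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in seq.upper())


# Hypothetical transition probabilities P(next | current).
# The "barrel" chain favours alternation (H->P, P->H); the background is uniform.
BARREL = {("H", "H"): 0.2, ("H", "P"): 0.8, ("P", "H"): 0.8, ("P", "P"): 0.2}
BACKGROUND = {("H", "H"): 0.5, ("H", "P"): 0.5, ("P", "H"): 0.5, ("P", "P"): 0.5}


def log_odds(seq: str) -> float:
    """Sum of log-likelihood ratios over consecutive residue pairs."""
    hp = to_hp(seq)
    return sum(log(BARREL[pair] / BACKGROUND[pair]) for pair in zip(hp, hp[1:]))


print(log_odds("VTVSLKVEYR"))  # strictly alternating pattern: positive score
print(log_odds("LLLLLAAAVV"))  # uninterrupted hydrophobic run: negative score
```

A positive score means the barrel chain explains the sequence better than the background. A real discriminator would estimate both transition tables from training sets and pick a decision threshold on the score.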

11.   transFold (2006): super-secondary structure prediction of transmembrane β-barrel proteins http://bioinformatics.bc.edu/clotelab/transFold/
transFold: a web server for predicting the structure and residue contacts of transmembrane beta-barrels. Transmembrane β-barrel (TMB) proteins are embedded in the outer membrane of Gram-negative bacteria, mitochondria and chloroplasts. The cellular location and functional diversity of β-barrel outer membrane proteins makes them an important protein class. At the present time, very few non-homologous TMB structures have been determined by X-ray diffraction because of the experimental difficulty encountered in crystallizing transmembrane (TM) proteins. The transFold web server uses pairwise inter-strand residue statistical potentials derived from globular (non-outer-membrane) proteins to predict the supersecondary structure of TMB. Unlike all previous approaches, transFold does not use machine learning methods such as hidden Markov models or neural networks; instead, transFold employs multi-tape S-attribute grammars to describe all potential conformations, and then applies dynamic programming to determine the global minimum energy supersecondary structure. The transFold web server not only predicts secondary structure and TMB topology, but is the only method which additionally predicts the side-chain orientation of transmembrane β-strand residues, inter-strand residue contacts and TM β-strand inclination with respect to the membrane. The program transFold currently outperforms all other methods for accuracy of β-barrel structure prediction. Available at http://bioinformatics.bc.edu/clotelab/transFold. http://nar.oxfordjournals.org/content/34/suppl_2/W189.full

BOMP: a program to predict integral β-barrel outer membrane proteins encoded within genomes of Gram-negative bacteria. This work describes the development of a program that predicts whether or not a polypeptide sequence from a Gram-negative bacterium is an integral β-barrel outer membrane protein. The program, called the β-barrel Outer Membrane protein Predictor (BOMP), is based on two separate components to recognize integral β-barrel proteins. The first component is a C-terminal pattern typical of many integral β-barrel proteins. The second component calculates an integral β-barrel score of the sequence based on the extent to which the sequence contains stretches of amino acids typical of transmembrane β-strands. The precision of the predictions was found to be 80% with a recall of 88% when tested on the proteins with SwissProt annotated subcellular localization in Escherichia coli K 12 (788 sequences) and Salmonella typhimurium (366 sequences). When tested on the predicted proteome of E.coli, BOMP found 103 of a total of 4346 polypeptide sequences to be possible integral β-barrel proteins. Of these, 36 were found by BLAST to lack similarity (E-value score < 1e−10) to proteins with annotated subcellular localization in SwissProt. BOMP predicted the content of integral β-barrels per predicted proteome of 10 different bacteria to range from 1.8 to 3%. BOMP is available at http://www.bioinfo.no/tools/bomp http://nar.oxfordjournals.org/content/32/suppl_2/W394.full

13.   TMBETA-net (2004): http://psfs.cbrc.jp/tmbeta-net/
TMBETA-NET: discrimination and prediction of membrane spanning beta-strands in outer membrane proteins. We have developed a web-server, TMBETA-NET for discriminating outer membrane proteins and predicting their membrane spanning beta-strand segments. The amino acid compositions of globular and outer membrane proteins have been systematically analyzed and a statistical method has been proposed for discriminating outer membrane proteins. The prediction of membrane spanning segments is mainly based on feed forward neural network and refined with beta-strand length. Our program takes the amino acid sequence as input and displays the type of the protein along with membrane-spanning beta-strand segments as a stretch of highlighted amino acid residues. Further, the probability of residues to be in transmembrane beta-strand has been provided with a coloring scheme. We observed that outer membrane proteins were discriminated with an accuracy of 89% and their membrane spanning beta-strand segments at an accuracy of 73% just from amino acid sequence information. The prediction server is available at http://psfs.cbrc.jp/tmbeta-net/. http://nar.oxfordjournals.org/content/33/suppl_2/W164.long

TMBB-DB: a transmembrane β-barrel proteome database. We previously reported the development of a highly accurate statistical algorithm for identifying β-barrel outer membrane proteins or transmembrane β-barrels (TMBBs), from genomic sequence data of Gram-negative bacteria (Freeman, T.C. and Wimley, W.C. (2010) Bioinformatics 26, 1965–1974). We have now applied this identification algorithm to all available Gram-negative bacterial genomes (over 600 chromosomes) and have constructed a publicly available, searchable, up-to-date, database of all proteins in these genomes. For each protein in the database, there is information on (i) β-barrel membrane protein probability for identification of β-barrels, (ii) β-strand and β-hairpin propensity for structure and topology prediction, (iii) signal sequence score because most TMBBs are secreted through the inner membrane translocon and, thus, have a signal sequence, and (iv) transmembrane α-helix predictions, for reducing false positive predictions. This information is sufficient for the accurate identification of most β-barrel membrane proteins in these genomes. In the database there are nearly 50 000 predicted TMBBs (out of 1.9 million total putative proteins). Of those, more than 15 000 are ‘hypothetical’ or ‘putative’ proteins, not previously identified as TMBBs. This wealth of genomic information is not available anywhere else. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3463127/



Thursday, July 16, 2015

Woods: A fast and accurate functional annotator and classifier of genomic and metagenomic sequences

Another recent publication from our lab.

Citation: 

Sharma, A. K., Gupta, A., Kumar, S., Dhakan, D. B., & Sharma, V. K. (2015). Woods: A fast and accurate functional annotator and classifier of genomic and metagenomic sequences. Genomics.

Functional annotation of gigantic metagenomic datasets is a time-consuming and computationally demanding task, and is currently a bottleneck for efficient analysis. The commonly used homology-based methods to functionally annotate and classify proteins are extremely slow.

Therefore, to achieve faster and more accurate functional annotation, we have developed an orthology-based functional classifier, 'Woods', using a combination of machine learning and similarity-based approaches. Woods displayed a precision of 98.79% on an independent genomic dataset, 96.66% on a simulated metagenomic dataset and >97% on two real metagenomic datasets. In addition, it ran more than 87 times faster than BLAST on the two real metagenomic datasets. Woods can be used as a highly efficient and accurate classifier with high-throughput capability, which facilitates its use on large metagenomic datasets.

The Woods web server is freely accessible at http://metagenomics.iiserb.ac.in/woods/index.php and http://metabiosys.iiserb.ac.in/woods/index.php. The standalone version of Woods can be downloaded from the above web servers and usage instructions are provided in Text S1 and also in the Tutorial section of the web server.

Thursday, February 5, 2015

MP3: A Software Tool for the Prediction of Pathogenic Proteins in Genomic and Metagenomic Data


The identification of virulent proteins in any de-novo sequenced genome is useful in estimating its pathogenic ability and understanding the mechanism of pathogenesis. Similarly, the identification of such proteins could be valuable in comparing the metagenome of healthy and diseased individuals and estimating the proportion of pathogenic species. However, the common challenge in both the above tasks is the identification of virulent proteins since a significant proportion of genomic and metagenomic proteins are novel and yet unannotated. The currently available tools which carry out the identification of virulent proteins provide limited accuracy and cannot be used on large datasets. 

Therefore, we have developed MP3, a standalone tool and web server for the prediction of pathogenic proteins in both genomic and metagenomic datasets. MP3 uses an integrated Support Vector Machine (SVM) and Hidden Markov Model (HMM) approach to carry out fast, sensitive and accurate prediction of pathogenic proteins. It displayed Sensitivity, Specificity, MCC and accuracy values of 92%, 100%, 0.92 and 96%, respectively, on a blind dataset constructed using complete proteins. On the two metagenomic blind datasets (Blind A: 51–100 amino acids and Blind B: 30–50 amino acids), it displayed Sensitivity, Specificity, MCC and accuracy values of 82.39%, 97.86%, 0.80 and 89.32% for Blind A and 71.60%, 94.48%, 0.67 and 81.86% for Blind B, respectively. In addition, the performance of MP3 was validated on selected bacterial genomic and real metagenomic datasets.
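The four measures reported here all derive from the counts of a binary confusion matrix. The Python sketch below shows the standard formulas. The counts in the example are hypothetical: they assume a balanced blind set of 100 pathogenic and 100 non-pathogenic proteins, which the text above does not state.

```python
from math import sqrt


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # MCC is conventionally set to 0 when any marginal total is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical counts: 100 pathogenic and 100 non-pathogenic proteins (assumed sizes).
m = binary_metrics(tp=92, fp=0, tn=100, fn=8)
print({name: round(value, 2) for name, value in m.items()})
```

With these assumed counts the function gives sensitivity 0.92, specificity 1.0, accuracy 0.96 and MCC 0.92, which agrees with the rounded values quoted for complete proteins. A different class balance would change the accuracy and MCC even at the same sensitivity and specificity.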

To our knowledge, MP3 is the only program that specializes in fast and accurate identification of partial pathogenic proteins predicted from short (100–150 bp) metagenomic reads and also performs exceptionally well on complete protein sequences. 

16S Classifier: A Tool for Fast and Accurate Taxonomic Classification of 16S rRNA Hypervariable Regions in Metagenomic Datasets


16S Classifier is freely available at http://metagenomics.iiserb.ac.in/16Sclassifier.

Sequencing of the 16S rRNA gene is commonly performed to estimate the microbial diversity in a metagenomic study. Rapid developments in genome sequencing technologies have shifted the focus to sequencing selected hypervariable regions (HVRs) of the 16S rRNA gene instead of the complete gene. Recent metagenomic projects sequence only a single HVR or a combination of two or more HVRs. At present there is no specialized method available for the correct identification and classification of species using short variable 16S rRNA sequences. Therefore, we have developed 16S Classifier using a machine learning method, Random Forest, for faster and more accurate taxonomic classification of short hypervariable regions of the 16S rRNA sequence. It displayed precision values of up to 0.91 on training datasets and up to 0.98 on the first test dataset. On real metagenomic datasets, it showed up to 99.7% accuracy at the phylum level and up to 99.0% accuracy at the genus level.

16S Classifier displayed up to 42.9%, 40.7%, 41.0%, 57.9% and 73.8% higher accuracy at phylum, class, order, family and genus levels, respectively, as compared to the commonly used RDP Classifier program. In addition, it is 7.5 times faster than RDP Classifier and 800 times faster than BLAST. 16S Classifier can be easily used with the QIIME pipeline, which is commonly used for 16S rRNA analysis.

To the best of our knowledge, 16S Classifier is the only available tool which can carry out efficient, sensitive and accurate taxonomic assignment of any of the 16S rRNA hypervariable regions which are commonly used in metagenomic projects. It also performed exceptionally well on complete 16S rRNA sequences (precision of 0.97) on the test dataset. Thus, the wide usage of this tool is anticipated in different metagenomic projects.

Instructions for running the stand-alone version of 16S Classifier on a Linux PC:

1. Download the zip file for a particular hypervariable region or for the complete 16S gene, freely available at http://metagenomics.iiserb.ac.in/16Sclassifier/download.html

2. Extract the zip file, which contains a model file (*.Rdata), a script file (*.sh) and an executable file (16sclassifier.exe).

Other dependencies:

1. Install R from http://cran.r-project.org/

2. Install the randomForest package: start R by typing R in a terminal, then run install.packages('randomForest')

# Command line usage #

./16sclassifier.exe 'queryfile' 'modelname'

The query file should be in FASTA format, and the model name must be one of v2, v3, v4, v5, v6, v7, v8, v23, v34, v35, v45, v56, v67, v78 or Complete16S.
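When several query files or regions need to be processed, the call above can be wrapped in a short script that checks the model name before launching the executable. The following Python sketch is one way to do that. The helper names build_command and classify and the file name reads.fasta are hypothetical. The executable name and argument order are taken from the usage line above, and the script is assumed to run from the directory containing the extracted files.

```python
import subprocess

# Model names listed in the usage instructions.
MODELS = {
    "v2", "v3", "v4", "v5", "v6", "v7", "v8",
    "v23", "v34", "v35", "v45", "v56", "v67", "v78",
    "Complete16S",
}


def build_command(query_file: str, model: str,
                  exe: str = "./16sclassifier.exe") -> list[str]:
    """Return the argument list for one classifier run, checking the model name."""
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(MODELS)}")
    return [exe, query_file, model]


def classify(query_file: str, model: str) -> None:
    """Run the classifier; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_command(query_file, model), check=True)


if __name__ == "__main__":
    # 'reads.fasta' is a placeholder file name.
    print(build_command("reads.fasta", "v34"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.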