Bioinformatics Tools

Sunday, February 26, 2012

In-silico Binding Site Prediction in Proteins

CASTp: http://sts.bioengr.uic.edu/castp/  Computed Atlas of Surface Topography of proteins (CASTp) provides an online resource for locating, delineating and measuring concave surface regions on three-dimensional structures of proteins. These include pockets located on protein surfaces and voids buried in the interior of proteins. The measurements include the area and volume of each pocket or void, computed with both the solvent accessible surface model (Richards' surface) and the molecular surface model (Connolly's surface), all calculated analytically. CASTp can be used to study surface features and functional regions of proteins. It includes a graphical user interface, flexible interactive visualization, and on-the-fly calculation for user-uploaded structures. CASTp is updated daily and can be accessed at http://cast.engr.uic.edu.

LigASite: http://www.bigre.ulb.ac.be/Users/benoit/LigASite/index.php?home is a gold-standard dataset of biologically relevant binding sites in protein structures. It consists of proteins with one unbound structure and at least one structure of the protein-ligand complex. Both a redundant and a non-redundant (sequence identity lower than 25%) version are available. Quaternary structures proposed by PISA (3) are used for all structures in the dataset.
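The non-redundant version can be pictured as a greedy filter: keep a structure only if it falls below the identity cutoff against everything already kept. This is a minimal sketch, not LigASite's actual procedure; it assumes the sequences are already aligned to a common length with '-' for gaps, and `percent_identity` and `non_redundant` are hypothetical helper names.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)


def non_redundant(aligned: dict, cutoff: float = 25.0) -> list:
    """Greedily keep entries whose identity to every kept entry is below `cutoff`."""
    kept = []
    for name, seq in aligned.items():
        if all(percent_identity(seq, aligned[k]) < cutoff for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy input: p2 is 90% identical to p1, so it is dropped.
seqs = {"p1": "ACDEFGHIKL", "p2": "ACDEFGHIKM", "p3": "WWWWWWWWWW"}
print(non_redundant(seqs))  # ['p1', 'p3']
```

Because the filter is greedy, which member of a redundant group survives depends on input order; a real pipeline would typically order entries by some quality criterion first.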

PDBeMotif: http://www.ebi.ac.uk/pdbe-site/pdbemotif/ is an extremely fast and powerful search tool that facilitates exploration of the Protein Data Bank (PDB) by combining protein sequence, chemical structure and 3D data in a single search. Currently it is the only tool that offers this kind of integration at this speed. PDBeMotif can be used to examine the characteristics of the binding sites of single proteins or of protein classes such as kinases, together with the conserved structural features of their immediate environments, either within the same species or across different species. For example, it can highlight a conserved activation loop common to protein kinases, which is important in regulating activity and is marked by conserved DFG and APE motifs at its start and end, respectively. Analyses done with PDBeMotif can also serve as a basis for predicting how modifications to small molecules that bind to the active and/or regulatory sites of proteins will affect their efficacy.
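As a purely sequence-level illustration of the DFG/APE example (PDBeMotif itself combines sequence, chemistry and 3D data), the sketch below pulls out the stretch from the first DFG motif through the next APE motif. The toy sequence is hypothetical, and a real kinase scan would need to cope with motif variants and with stray matches elsewhere in the sequence.

```python
def activation_loop(seq: str):
    """Return (start, end, segment) from the first 'DFG' through the next 'APE'.

    `end` is exclusive, so seq[start:end] == segment. Returns None if either
    motif is missing.
    """
    start = seq.find("DFG")
    if start == -1:
        return None
    ape = seq.find("APE", start + 3)  # search only downstream of DFG
    if ape == -1:
        return None
    end = ape + 3
    return start, end, seq[start:end]


print(activation_loop("XXDFGLARAPEXX"))  # (2, 11, 'DFGLARAPE')
```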

fPOP: http://pocket.uchicago.edu/fpop/ (footprinting Pockets Of Proteins) is a database of protein functional surfaces identified by shape analysis. In this relational database, we collected the spatial patterns of protein binding sites, in both holo and apo forms, from more than 40,000 structures. To identify protein binding sites, we model the shape of a split pocket induced by one or more bound ligands. Essentially, we use a purely geometric method to extract site-specific spatial patterns of split pockets as templates to match those from unbound structures. To perform an effective shape comparison, we use the Smith-Waterman algorithm to footprint an unbound pocket fragment against fragments selected from the canonical functional surfaces of >19,000 structures in SplitPocket (http://pocket.uchicago.edu/). The aligned unbound and split-pocket fragments are then superimposed, and their RMSD is used to evaluate local structural similarity and to detect split-pocket characteristics in the unbound structure. Furthermore, we conduct a large-scale computation to systematically identify binding sites of proteins. In addition to the geometric measurements, we extensively measure the propensity for surface conservation captured in evolutionary history.
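The Smith-Waterman step can be illustrated with a plain local-alignment scorer. fPOP aligns pocket fragments rather than arbitrary strings, and its scoring scheme is not given here, so the match, mismatch and linear gap values below are placeholder assumptions.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best score ending at a[i-1], b[j-1]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0 so an alignment can restart anywhere.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


print(smith_waterman("XXABCXX", "YYABCYY"))  # 6: the shared 'ABC' core
```

In the fPOP workflow described above, the best-scoring aligned fragments would then be superimposed and compared by RMSD; the score alone only ranks candidate matches.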

metaPocket: http://metapocket.eml.org/  is a meta server that identifies pockets on protein surfaces to predict ligand-binding sites. The identification of ligand-binding sites is often the starting point for protein function annotation and structure-based drug design. Many computational methods for the prediction of ligand-binding sites have been developed in recent decades. Here we present a consensus method, metaPocket, in which the sites predicted by four methods (LIGSITEcs, PASS, Q-SiteFinder and SURFNET) are combined to improve the prediction success rate. All these methods were evaluated on two datasets of 48 unbound/bound structures and 210 bound structures. The comparison shows that metaPocket improves the success rate from 70 to 75% at the top 1 prediction. MetaPocket is available at http://metapocket.eml.org.
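The consensus idea can be sketched as clustering predicted pocket centres from several methods and ranking clusters by their combined score. This is an illustrative simplification, not metaPocket's published procedure: the input format, the assumption that scores are already normalised to a common scale, the 8 Å clustering radius and the toy coordinates are all hypothetical.

```python
import math


def consensus_sites(predictions: dict, radius: float = 8.0) -> list:
    """Cluster pocket centres predicted by several methods.

    predictions maps a method name to a list of (x, y, z, score) tuples whose
    scores are assumed to be comparable across methods. Returns a list of
    (total_score, members) sorted from best to worst cluster.
    """
    points = [(m, p) for m, plist in predictions.items() for p in plist]
    clusters = []
    # Seed clusters with the highest-scoring predictions first.
    for method, p in sorted(points, key=lambda mp: -mp[1][3]):
        for c in clusters:
            if math.dist(p[:3], c["centre"]) <= radius:
                c["members"].append((method, p))
                c["score"] += p[3]
                break
        else:  # no existing cluster is close enough
            clusters.append({"centre": p[:3], "members": [(method, p)], "score": p[3]})
    clusters.sort(key=lambda c: -c["score"])
    return [(c["score"], c["members"]) for c in clusters]


preds = {
    "LIGSITEcs": [(0.0, 0.0, 0.0, 1.0), (20.0, 0.0, 0.0, 0.5)],
    "PASS": [(1.0, 0.0, 0.0, 0.8)],
    "SURFNET": [(0.5, 0.5, 0.0, 0.6)],
}
top_score, top_members = consensus_sites(preds)[0]
print(len(top_members))  # 3: three methods agree on the site near the origin
```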

PocketQuery: http://pocketquery.csb.pitt.edu/  is a web service for interactively exploring not only hot spot and anchor residues but also hot regions, defined as clusters of residues, at the interface of protein-protein interactions. An assortment of metrics, including changes in solvent accessible surface area, energy-based scores and sequence conservation, is available to screen and sort clusters of residues. PocketQuery was developed by David Koes from the Camacho Lab in the Department of Computational and Systems Biology at the University of Pittsburgh.

IBIS: http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.cgi is the NCBI Inferred Biomolecular Interactions Server. For a given protein sequence or structure query, IBIS reports physical interactions observed in experimentally-determined structures for this protein. IBIS also infers/predicts interacting partners and binding sites by homology, by inspecting the protein complexes formed by close homologs of a given query. To ensure biological relevance of inferred binding sites, the IBIS algorithm clusters binding sites formed by homologs based on binding site sequence and structure conservation.

3DLigandSite: http://www.sbg.bio.ic.ac.uk/~3dligandsite/ is an automated method for the prediction of ligand binding sites. Users can submit either a sequence or a protein structure. If a sequence is submitted, Phyre is run to predict the structure. The structure is then used to search a structural library to identify homologous structures with bound ligands. These ligands are superimposed onto the protein structure to predict a ligand binding site.
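The superposition step can be sketched with the Kabsch algorithm: find the rotation and translation that best map a homologue's matched atoms onto the query, then apply the same transform to the homologue's ligand. This is a generic sketch that assumes atom correspondences are already known; 3DLigandSite's own structural search and superposition tools are not reproduced here. It uses NumPy, and `kabsch` and `place_ligand` are hypothetical names.

```python
import numpy as np


def kabsch(P: np.ndarray, Q: np.ndarray):
    """Rigidly superimpose P onto Q (both N x 3, rows already matched).

    Returns (R, t, rmsd) such that P @ R.T + t approximates Q.
    """
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)  # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0  # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    diff = P @ R.T + t - Q
    rmsd = float(np.sqrt((diff ** 2).sum() / len(P)))
    return R, t, rmsd


def place_ligand(ligand: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Move ligand coordinates from the homologue's frame into the query's frame."""
    return ligand @ R.T + t


# Hypothetical check: the "query" is an exactly rotated and shifted copy of the homologue.
rng = np.random.default_rng(0)
homolog = rng.normal(size=(10, 3))
c, s = np.cos(np.pi / 2), np.sin(np.pi / 2)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
query = homolog @ Rz.T + np.array([1.0, 2.0, 3.0])
R, t, rmsd = kabsch(homolog, query)
print(round(rmsd, 6))  # 0.0 for an exact rigid-body copy
```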

SitesBase: http://www.modelling.leeds.ac.uk/sb/ is a database of known ligand binding sites within the PDB, navigable by PDB identifier or by three-letter ligand code (e.g. NAD). Each binding site has a frequently updated register of structurally similar binding sites sharing atomic similarity, detected by geometric hashing (Brakoulias and Jackson 2004). Multiple alignments, structural superpositions and links to other structural databases are also available, enabling further analysis.

PROSURFER: http://163.43.140.95/top contains information about the structural similarities of protein surfaces with respect to a query surface. A pocket search algorithm detected 48,347 potential ligand binding sites in 9,708 non-redundant protein entries from the PDB. All-against-all structural comparison was performed for the predicted sites, and similar sites with a Z-score ≥ 2.5 were selected. These results can be accessed by PDB code or ligand name.
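The Z-score selection can be sketched as standardising each query site's similarity scores and keeping the hits at or above 2.5. Which score distribution PROSURFER standardises against is not stated here, so computing the mean and standard deviation over one query's comparisons is an assumption, as are the function and site names.

```python
import statistics


def significant_hits(scores: dict, z_cutoff: float = 2.5) -> dict:
    """Return {site: z} for sites whose Z-score is at least `z_cutoff`.

    The mean and (population) standard deviation are taken over all
    comparison scores for one query site.
    """
    values = list(scores.values())
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return {}  # all scores identical: nothing stands out
    z = {site: (s - mu) / sigma for site, s in scores.items()}
    return {site: v for site, v in z.items() if v >= z_cutoff}


# Hypothetical scores: ten background comparisons and one clear outlier.
scores = {f"site{i}": 1.0 for i in range(10)}
scores["hit"] = 10.0
print(list(significant_hits(scores)))  # ['hit']
```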

KBDOCK: http://kbdock.loria.fr/index.php is a 3D database system that defines and spatially clusters protein binding sites for knowledge-based protein docking. KBDOCK integrates protein domain-domain interaction (DDI) information from 3DID and sequence alignments from PFAM with structural information from the PDB in order to analyse the spatial arrangements of DDIs by Pfam family, and to propose structural templates for protein docking.

Pocketome: http://www.pocketome.org/ The Pocketome is an encyclopedia of conformational ensembles of all druggable binding sites that can be identified experimentally from co-crystal structures in the Protein Data Bank.

sc-PDB: http://cheminfo.u-strasbg.fr:8080/scPDB/2011/db_search/about_scpdb.html  To assist structure-based approaches in drug design, we have processed the PDB to identify binding sites suitable for the docking of a drug-like ligand, and have thereby created a database called sc-PDB. The sc-PDB database provides separate MOL2 files for the ligand, its binding site and the corresponding protein chain(s). Ions and cofactors in the vicinity of the ligand are included in the protein. More details about the scope of sc-PDB, its content and its evolution during the 2004-2009 period are provided in a PDF document.
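Since sc-PDB ships ligands and sites as MOL2 files, a minimal reader for the @<TRIPOS>ATOM section is enough to get coordinates for downstream work. This sketch assumes well-formed, whitespace-separated atom records (atom id, name, x, y, z, SYBYL type, ...) and ignores bonds and other sections; a full cheminformatics toolkit is more robust. The sample record and function names are made up for illustration.

```python
def read_mol2_atoms(text: str) -> list:
    """Return (atom_name, sybyl_type, (x, y, z)) for each @<TRIPOS>ATOM record."""
    atoms = []
    in_atoms = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("@<TRIPOS>"):
            in_atoms = line == "@<TRIPOS>ATOM"  # only the ATOM section is parsed
            continue
        if in_atoms and line:
            f = line.split()
            atoms.append((f[1], f[5], (float(f[2]), float(f[3]), float(f[4]))))
    return atoms


def centroid(atoms: list) -> tuple:
    """Geometric centre of the parsed atoms."""
    n = len(atoms)
    return tuple(sum(a[2][k] for a in atoms) / n for k in range(3))


SAMPLE = """@<TRIPOS>MOLECULE
toy_ligand
 2 1 0 0 0
SMALL
NO_CHARGES

@<TRIPOS>ATOM
      1 C1      0.0000    0.0000    0.0000 C.3     1  LIG1   0.0000
      2 O1      1.4300    0.0000    0.0000 O.3     1  LIG1   0.0000
@<TRIPOS>BOND
     1     1     2 1
"""
print(read_mol2_atoms(SAMPLE)[1])  # ('O1', 'O.3', (1.43, 0.0, 0.0))
```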

The FunFOLD Binding Site Residue Prediction Server: BACKGROUND: The accurate prediction of ligand binding residues from amino acid sequences is important for the automated functional annotation of novel proteins. In the previous two CASP experiments, the most successful methods in the function prediction category were those which used structural superpositions of 3D models and related templates with bound ligands in order to identify putative contacting residues. However, whilst most of this prediction process can be automated, visual inspection and manual adjustment of parameters, such as the distance thresholds used for each target, have often been required to prevent over-prediction. Here we describe a novel method, FunFOLD, which uses an automatic approach for cluster identification and residue selection. The software provided can easily be integrated into existing fold recognition servers, requiring only a 3D model and a list of templates as inputs. A simple web interface is also provided, allowing access for non-expert users. The method has been benchmarked against the top servers and manual prediction groups tested at both CASP8 and CASP9.
RESULTS: The FunFOLD method shows a significant improvement over the best available servers and is shown to be competitive with the top manual prediction groups that were tested at CASP8. The FunFOLD method is also competitive with both the top server and manual methods tested at CASP9. When tested using common subsets of targets, the predictions from FunFOLD are shown to achieve significantly higher mean Matthews Correlation Coefficient (MCC) and Binding-site Distance Test (BDT) scores than all server methods that were tested at CASP8. Testing on the CASP9 set showed no statistically significant separation in performance between FunFOLD and the other top server groups tested.
CONCLUSIONS: The FunFOLD software is freely available as both a standalone package and a prediction server, providing competitive ligand binding site residue predictions for expert and non-expert users alike. The software provides a new fully automated approach for structure-based function prediction using 3D models of proteins.
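The MCC used in the FunFOLD benchmark treats each residue as a binary prediction (binding or not). A minimal sketch, assuming binding residues are given as sets of residue numbers and every other residue in the chain counts as a negative; the BDT score is not reproduced here, and the function name is hypothetical.

```python
import math


def binding_site_mcc(predicted: set, observed: set, n_residues: int) -> float:
    """Matthews correlation coefficient for a binding-residue prediction."""
    tp = len(predicted & observed)
    fp = len(predicted - observed)
    fn = len(observed - predicted)
    tn = n_residues - tp - fp - fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy example: 3 of 4 predicted residues are correct in a 100-residue chain,
# so MCC = (3*95 - 1*1) / sqrt(4*4*96*96) = 284/384.
print(binding_site_mcc({10, 11, 12, 13}, {11, 12, 13, 14}, 100))
```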

ProBiS: http://probis.cmm.ki.si/index.php is an algorithm for the detection of structurally similar protein binding sites by local structural alignment. Motivation: Exploitation of locally similar 3D patterns of physicochemical properties on the surface of a protein for detection of binding sites that may lack sequence and global structural conservation. Results: An algorithm, ProBiS, is described that detects structurally similar sites on protein surfaces by local surface structure alignment. It compares the query protein to members of a database of protein 3D structures and detects, with sub-residue precision, structurally similar sites as patterns of physicochemical properties on the protein surface. Using an efficient maximum clique algorithm, the program identifies proteins that share local structural similarities with the query protein and generates structure-based alignments of these proteins with the query. Structural similarity scores are calculated for the query protein's surface residues and are expressed as different colors on the query protein surface. The algorithm has been used successfully for the detection of protein–protein, protein–small ligand and protein–DNA binding sites. Availability: The software is available as a web tool, free of charge for academic users, at http://probis.cmm.ki.si
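The maximum-clique idea behind this kind of local structural matching can be sketched with a product graph: each vertex pairs a query surface point with a database point of the same physicochemical type, two vertices are joined when their intra-structure distances agree, and the largest clique is the largest mutually consistent set of correspondences. This is a generic textbook sketch (Bron-Kerbosch with pivoting), not ProBiS's own algorithm; the labels, coordinates and 1 Å tolerance are hypothetical.

```python
import math


def product_graph(A: list, B: list, tol: float = 1.0):
    """A, B: lists of (label, (x, y, z)). Returns (nodes, adjacency)."""
    nodes = [(i, j) for i, (la, _) in enumerate(A)
             for j, (lb, _) in enumerate(B) if la == lb]
    adj = {u: set() for u in range(len(nodes))}
    for u, (i1, j1) in enumerate(nodes):
        for v in range(u + 1, len(nodes)):
            i2, j2 = nodes[v]
            if i1 == i2 or j1 == j2:
                continue  # a point may be matched only once
            if abs(math.dist(A[i1][1], A[i2][1]) - math.dist(B[j1][1], B[j2][1])) <= tol:
                adj[u].add(v)
                adj[v].add(u)
    return nodes, adj


def max_clique(adj: dict) -> set:
    """Bron-Kerbosch with pivoting; returns one maximum clique."""
    best = set()

    def expand(R, P, X):
        nonlocal best
        if not P and not X:
            if len(R) > len(best):
                best = set(R)
            return
        pivot = max(P | X, key=lambda u: len(adj[u] & P))
        for v in list(P - adj[pivot]):
            expand(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}
            X = X | {v}

    expand(set(), set(adj), set())
    return best


query = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("aromatic", (0, 4, 0))]
target = [("donor", (10, 10, 10)), ("acceptor", (13, 10, 10)),
          ("aromatic", (10, 14, 10)), ("donor", (50, 50, 50))]
nodes, adj = product_graph(query, target)
print(sorted(nodes[k] for k in max_clique(adj)))  # [(0, 0), (1, 1), (2, 2)]
```

Maximum clique is NP-hard in general, which is why practical tools rely on efficient, heavily pruned clique search rather than this plain recursion.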

Active Site prediction: http://www.scfbio-iitd.res.in/dock/ActiveSite_new.jsp The Active Site Prediction of Protein server computes the cavities in a given protein structure.

DEPTH: http://mspc.bii.a-star.edu.sg/tankp/run_depth.html Depth is the distance of a residue or atom from the nearest bulk solvent. Accessible surface area is a parameter that is widely used in analyses of protein structure and stability. However, accessible surface area does not distinguish between atoms just below the protein surface and those in the core of the protein. To differentiate between such buried residues, we describe a computational procedure for calculating the depth of a residue from the protein surface.
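Following the definition above, per-atom depth can be sketched as the distance to the nearest bulk-solvent point, then averaged per residue. This is a brute-force illustration; it assumes you already have bulk water positions (for example from a solvated model with cavity waters removed), and the input layout and function name are hypothetical rather than DEPTH's actual procedure.

```python
import math
from collections import defaultdict


def residue_depth(atoms: list, solvent: list) -> dict:
    """atoms: list of (residue_id, (x, y, z)); solvent: list of bulk-water (x, y, z).

    Returns {residue_id: mean distance of its atoms to the nearest solvent point}.
    """
    per_residue = defaultdict(list)
    for res, xyz in atoms:
        per_residue[res].append(min(math.dist(xyz, w) for w in solvent))
    return {res: sum(d) / len(d) for res, d in per_residue.items()}


atoms = [("A1", (1.0, 0.0, 0.0)), ("A1", (3.0, 0.0, 0.0)), ("B2", (5.0, 0.0, 0.0))]
print(residue_depth(atoms, [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]))  # {'A1': 2.0, 'B2': 5.0}
```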

FINDSITE: http://cssb.biology.gatech.edu/findsite  FINDSITE is a threading-based binding site prediction/protein functional inference/ligand screening algorithm that detects common ligand binding sites in a set of evolutionarily related proteins. Crystal structures as well as protein models can be used as the target structures.

PocketDepth: http://proline.physics.iisc.ernet.in/pocketdepth/  is a new depth-based algorithm for identification of ligand binding sites. Abstract: Computational methods for identifying and predicting functional sites in protein structures are increasingly important in structural biology and bioinformatics, not only for understanding the function of the molecule in detail but also for structure-based design of possible ligands and potential drugs as well as modified protein molecules. While a few structure-based prediction methods are already available, given the complexity and diversity of protein structural types there is still a great need to explore newer methods and concepts to develop accurate, versatile and efficient binding site prediction algorithms. We have developed a new method, PocketDepth, for identification of binding sites in proteins. The method is purely geometry-based and proceeds in two stages: labeling of grid cells with depth factors, followed by depth-based clustering that uses neighbourhood information. Depth is an important parameter considered during protein structure visualization and analysis but has been used more often intuitively than systematically. Our current implementation of depth reflects how central a given sub-space is to a putative pocket rather than merely how far it is from the nearest external surface of the protein. We have tested the algorithm against PDBbind, a large curated set of 1091 proteins obtained from the PDB. A prediction was considered a true positive if the predicted pocket had at least 10% overlap with the actual ligand. The prediction accuracy on this set was about 96%. Moreover, 87% of the true positives were identified within the first five ranks for each protein, of which 55% are in the first rank itself. 77% of the predictions had at least 50% overlap with the experimentally observed ligand.
High prediction rates were again observed when the method was tested against a dataset of apo-proteins and compared with their respective ligand complexes. A comparison of our method with four other widely used methods for a chosen representative set is also presented.
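The 10% true-positive criterion can be made concrete with a grid: count the fraction of ligand atoms that fall in grid cells assigned to the predicted pocket. How PocketDepth measures overlap exactly is not spelled out here, so the atom-fraction definition, the 1 Å cell size and the function names are assumptions.

```python
import math


def grid_cell(xyz, spacing: float = 1.0) -> tuple:
    """Index of the cubic grid cell that contains a point."""
    return tuple(math.floor(c / spacing) for c in xyz)


def ligand_overlap(pocket_cells: set, ligand_atoms: list, spacing: float = 1.0) -> float:
    """Fraction of ligand atoms lying inside the predicted pocket's cells."""
    if not ligand_atoms:
        return 0.0
    inside = sum(1 for a in ligand_atoms if grid_cell(a, spacing) in pocket_cells)
    return inside / len(ligand_atoms)


def is_true_positive(pocket_cells, ligand_atoms, spacing=1.0, min_overlap=0.10) -> bool:
    """Apply the 'at least 10% overlap' rule under the atom-fraction assumption."""
    return ligand_overlap(pocket_cells, ligand_atoms, spacing) >= min_overlap


pocket = {(0, 0, 0), (1, 0, 0)}
ligand = [(0.5, 0.5, 0.5), (1.2, 0.1, 0.9), (5.0, 5.0, 5.0), (6.0, 6.0, 6.0)]
print(ligand_overlap(pocket, ligand), is_true_positive(pocket, ligand))  # 0.5 True
```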

GHECOM 1.0: http://strcomp.protein.osaka-u.ac.jp/ghecom/ (Grid-based HECOMi finder) is a program for finding multi-scale pockets on protein surfaces using mathematical morphology.
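The mathematical-morphology idea can be sketched in 2D: a pocket is empty space that becomes filled when the protein is "closed" (dilated, then eroded) with a probe larger than the empty space itself. GHECOM works on a 3D grid with several probe sizes; this 2D, single-probe version and its function names are a simplification for illustration only.

```python
def disk(r: int) -> list:
    """Offsets of a discrete disk-shaped probe (structuring element) of radius r."""
    return [(dx, dy) for dx in range(-r, r + 1) for dy in range(-r, r + 1)
            if dx * dx + dy * dy <= r * r]


def dilate(cells: set, probe: list) -> set:
    return {(x + dx, y + dy) for (x, y) in cells for (dx, dy) in probe}


def erode(cells: set, probe: list) -> set:
    # The probe contains (0, 0), so only cells already in the set can survive.
    return {(x, y) for (x, y) in cells
            if all((x + dx, y + dy) in cells for (dx, dy) in probe)}


def pockets(protein: set, r: int) -> set:
    """Empty cells filled by a morphological closing of the protein."""
    probe = disk(r)
    return erode(dilate(protein, probe), probe) - protein


# Hypothetical 9 x 9 block of "protein" with a narrow slot cut into its top edge.
block = {(x, y) for x in range(9) for y in range(9)} - {(4, 6), (4, 7), (4, 8)}
print(sorted(pockets(block, 1)))  # [(4, 6), (4, 7)]: the slot minus its open mouth
```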

Pocket-Finder: http://www.modelling.leeds.ac.uk/pocketfinder/ is based on the Ligsite algorithm written by Hendlich et al. (1997). Pocket-Finder was written to compare pocket detection with our new ligand binding site detection algorithm, Q-SiteFinder.
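The LIGSITE idea that Pocket-Finder builds on can be sketched on a grid: scan from each empty grid point along the x, y and z axes and the four cubic diagonals, and count a protein-solvent-protein (PSP) event whenever protein is hit in both directions; points with enough events are pocket points. The grid representation and the `min_psp` threshold below are illustrative assumptions, not Pocket-Finder's defaults.

```python
# x, y, z axes plus the four cubic diagonals; each is scanned both ways.
DIRECTIONS = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
              (1, 1, 1), (1, 1, -1), (1, -1, 1), (-1, 1, 1)]


def hits_protein(start, step, protein, shape) -> bool:
    """Walk from `start` along `step` and report whether a protein cell is met."""
    x, y, z = start
    while True:
        x, y, z = x + step[0], y + step[1], z + step[2]
        if not all(0 <= c < n for c, n in zip((x, y, z), shape)):
            return False  # left the grid without meeting protein
        if (x, y, z) in protein:
            return True


def ligsite_pockets(protein: set, shape: tuple, min_psp: int = 5) -> set:
    """Empty grid points with at least `min_psp` protein-solvent-protein events."""
    nx, ny, nz = shape
    pocket = set()
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                p = (x, y, z)
                if p in protein:
                    continue
                psp = sum(1 for d in DIRECTIONS
                          if hits_protein(p, d, protein, shape)
                          and hits_protein(p, tuple(-c for c in d), protein, shape))
                if psp >= min_psp:
                    pocket.add(p)
    return pocket


# Hypothetical 5 x 5 x 5 grid whose outer shell is protein: the 3 x 3 x 3 core is a pocket.
shell = {(x, y, z) for x in range(5) for y in range(5) for z in range(5)
         if 0 in (x, y, z) or 4 in (x, y, z)}
print(len(ligsite_pockets(shell, (5, 5, 5))))  # 27
```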

Screen2: http://luna.bioc.columbia.edu/honiglab/screen2/cgi-bin/screen2.cgi  is a tool for identifying protein cavities and computing cavity attributes that can be applied for classification and analysis. The original Screen, written by Murad Nayal, was dependent on the obsolete Irix platform and is no longer available. Screen2 was reengineered by Brian Y. Chen for efficiency and compatibility, and made accessible as a web service by Raquel Norel.

ConCavity: http://compbio.cs.princeton.edu/concavity/ Identifying a protein's functional sites is an important step towards characterizing its molecular function. Numerous structure- and sequence-based methods have been developed for this problem. Here we introduce ConCavity, a small molecule binding site prediction algorithm that integrates evolutionary sequence conservation estimates with structure-based methods for identifying protein surface cavities. In large-scale testing on a diverse set of single- and multi-chain protein structures, we show that ConCavity substantially outperforms existing methods for identifying both 3D ligand binding pockets and individual ligand binding residues. As part of our testing, we perform one of the first direct comparisons of conservation-based and structure-based methods. We find that the two approaches provide largely complementary information, which can be combined to improve upon either approach alone. We also demonstrate that ConCavity has state-of-the-art performance in predicting catalytic sites and drug binding pockets. Overall, the algorithms and analysis presented here significantly improve our ability to identify ligand binding sites and further advance our understanding of the relationship between evolutionary sequence conservation and structural and functional attributes of proteins. Data, source code, and prediction visualizations are available on the ConCavity web site (http://compbio.cs.princeton.edu/concavity/).
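To illustrate the kind of conservation estimate ConCavity combines with cavity detection, here is a simple per-column score: one minus the normalised Shannon entropy of an alignment column. ConCavity's own conservation measure and its way of combining it with structure are more elaborate; this entropy score and the weighted-sum combination are stand-ins, and the weight is a hypothetical parameter.

```python
import math
from collections import Counter


def column_conservation(column: str) -> float:
    """1 - (Shannon entropy / log2(20)) over non-gap residues of one alignment column.

    1.0 means fully conserved; values near 0 mean all 20 amino acids appear equally.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return 1.0 - entropy / math.log2(20)


def combined_score(pocket_score: float, conservation: float, weight: float = 0.5) -> float:
    """Hypothetical linear blend of a structural pocket score and conservation (both 0..1)."""
    return (1 - weight) * pocket_score + weight * conservation


print(column_conservation("HHHH-H"))  # 1.0: every non-gap residue is His
```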

MultiBind and MAPPIS: http://bioinfo3d.cs.tau.ac.il/MultiBind/index.html Web servers for multiple alignment of protein 3D binding sites and their interactions. Analysis of protein–ligand complexes and recognition of spatially conserved physico-chemical properties is important for the prediction of binding and function. Here, we present two webservers for multiple alignment and recognition of binding patterns shared by a set of protein structures. The first webserver, MultiBind (http://bioinfo3d.cs.tau.ac.il/MultiBind), performs multiple alignment of protein binding sites. It recognizes the common spatial chemical binding patterns even in the absence of similarity of the sequences or the folds of the compared proteins. The input to the MultiBind server is a set of protein-binding sites defined by interactions with small molecules. The output is a detailed list of the shared physico-chemical binding site properties. The second webserver, MAPPIS (http://bioinfo3d.cs.tau.ac.il/MAPPIS), aims to analyze protein–protein interactions. It performs multiple alignment of protein–protein interfaces (PPIs), which are regions of interaction between two protein molecules. MAPPIS recognizes the spatially conserved physico-chemical interactions, which often involve energetically important hot-spot residues that are crucial for protein–protein associations. The input to the MAPPIS server is a set of protein-protein complexes. The output is a detailed list of the shared interaction properties of the interfaces.

MolAxis: http://bioinfo3d.cs.tau.ac.il/MolAxis/  is a tool for the identification of high-clearance pathways or corridors, which represent molecular channels in the complement space of proteins. It is extremely efficient because it samples the medial axis of the complement of the molecule, reducing the problem dimension to two, since the medial axis is composed of surface patches. It is designed to analyze protein channels, calculate pore dimensions and analyze atom accessibility. MolAxis reads files in the standard Protein Data Bank (PDB) format containing a single frame or multiple frames generated by molecular dynamics (MD) simulations. MolAxis handles two distinct scenarios: it computes channels that connect a single point (like an inner chamber) to the bulk solvent, and it also computes transmembrane (TM) channels. MolAxis has a friendly web interface and also a stand-alone version, statically compiled for Linux, which can be downloaded from the MolAxis site.

fpocket: http://fpocket.sourceforge.net/ fpocket is a very fast open source protein pocket (cavity) detection algorithm based on Voronoi tessellation. It was developed in the C programming language and is currently available as a command-line program. A GUI is in development, and mdpocket (fpocket on MD trajectories) is already available. fpocket includes two other programs, dpocket and tpocket, which allow you to extract pocket descriptors and to test your own scoring functions, respectively. Furthermore, a druggability prediction score has recently been added to fpocket. As the algorithm is very fast, it can be used at large scale (on the whole PDB, for instance). If you use fpocket for publication, please cite: Vincent Le Guilloux, Peter Schmidtke and Pierre Tuffery, "Fpocket: An open source platform for ligand pocket detection", BMC Bioinformatics, 2009, 10:168
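fpocket's Voronoi-based approach rests on alpha spheres: spheres that touch four atom centres and contain no other atom. The sketch below computes the sphere through four atoms and checks the emptiness and radius conditions for one quadruple; fpocket itself enumerates candidate quadruples from a Voronoi tessellation (and clusters the resulting spheres into pockets), and the radius bounds used here are hypothetical placeholders rather than fpocket's defaults.

```python
import math


def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def circumsphere(p0, p1, p2, p3):
    """Centre and radius of the sphere through four non-coplanar points."""
    # |c - pi|^2 = |c - p0|^2  <=>  2 (pi - p0) . c = |pi|^2 - |p0|^2, for i = 1..3
    n0 = sum(v * v for v in p0)
    rows = [[2 * (p[k] - p0[k]) for k in range(3)] for p in (p1, p2, p3)]
    rhs = [sum(v * v for v in p) - n0 for p in (p1, p2, p3)]
    d = det3(rows)
    if abs(d) < 1e-12:
        raise ValueError("points are (nearly) coplanar")
    centre = []
    for col in range(3):  # Cramer's rule
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]
        centre.append(det3(m) / d)
    return tuple(centre), math.dist(centre, p0)


def is_alpha_sphere(quad, atoms, r_min, r_max) -> bool:
    """True if the sphere through `quad` has an allowed radius and no atom centre inside."""
    centre, r = circumsphere(*quad)
    if not r_min <= r <= r_max:
        return False
    return all(math.dist(centre, a) >= r - 1e-9 for a in atoms)


quad = [(0, 0, 0), (2, 0, 0), (0, 2, 0), (0, 0, 2)]
centre, radius = circumsphere(*quad)
print(centre, round(radius, 4))  # (1.0, 1.0, 1.0) 1.7321
```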

SuMo: http://sumo-pbil.ibcp.fr/cgi-bin/sumo-welcome allows you to screen the Protein Data Bank (PDB) for ligand binding sites matching your protein structure or, inversely, for protein structures matching a given site in your protein. This method is based neither on amino acid sequence nor on fold comparisons. Priority is given to biological relevance. SuMo uses its own heuristics for defining ligand binding sites. Automatically selected ligand binding sites are extracted from PDB structure files and stored in SuMo's own database.

CAVER: http://www.caver.cz/ CAVER is a software tool for analysis and visualization of tunnels and channels in protein structures. Tunnels are void pathways leading from a cavity buried in a protein core to the surrounding solvent. Unlike tunnels, channels pass through the protein structure, with both ends open to the surrounding solvent. Studying these pathways is highly important for drug design and molecular enzymology.

SiteHound: http://scbx.mssm.edu/sitehound/sitehound-download/download.html SiteHound identifies protein regions that are likely to interact with ligands. The only input files required by SiteHound are the PDB file of the protein and the Molecular Interaction Field (MIF), or Affinity Map, for that protein structure. EasyMIFs is provided as a tool to calculate MIFs; alternatively, AutoGrid (part of the AutoDock suite developed by Arthur Olson’s group at The Scripps Research Institute) or the SiteHound-web server can be used to produce Affinity Maps or MIFs. A Python script named 'auto.py' is provided in the package and can be used to perform binding site identification in a fully automated fashion. The script prepares the protein PDB file, computes a Molecular Interaction Field map with EasyMIFs and carries out binding site identification using SiteHound. It is also possible to use EasyMIFs and SiteHound separately.
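The step after computing a MIF can be sketched as: keep grid points with favourable (low) interaction energy, cluster neighbouring points, and rank clusters by total energy. This is a simplified illustration; the energy cutoff, the single-linkage rule, the linkage distance and the function name are assumptions and may not match SiteHound's actual settings.

```python
import math


def rank_sites(points: list, energy_cutoff: float, link_dist: float) -> list:
    """points: list of ((x, y, z), energy) from a MIF / affinity map.

    Returns clusters of favourable points, most negative total energy first.
    """
    kept = [(xyz, e) for xyz, e in points if e <= energy_cutoff]
    parent = list(range(len(kept)))

    def find(i):  # union-find root with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Single linkage: any two points within link_dist end up in the same cluster.
    for i in range(len(kept)):
        for j in range(i + 1, len(kept)):
            if math.dist(kept[i][0], kept[j][0]) <= link_dist:
                parent[find(i)] = find(j)
    clusters = {}
    for i, point in enumerate(kept):
        clusters.setdefault(find(i), []).append(point)
    return sorted(clusters.values(), key=lambda c: sum(e for _, e in c))


mif = [((0, 0, 0), -5.0), ((1, 0, 0), -4.0), ((10, 0, 0), -2.0),
       ((11, 0, 0), -2.0), ((20, 0, 0), 1.0)]
best = rank_sites(mif, energy_cutoff=-1.0, link_dist=1.5)[0]
print(sum(e for _, e in best))  # -9.0
```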

SURFNET: http://www.biochem.ucl.ac.uk/~roman/surfnet/surfnet.html The SURFNET program generates surfaces and void regions between surfaces from coordinate data supplied in a PDB file.
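SURFNET's void regions are usually described in terms of "gap spheres": a sphere placed between two atoms, touching both surfaces, is shrunk until no other atom penetrates it, and is kept only if its radius stays within set limits. The sketch below follows that description; the radius limits, the input layout and the function name are assumptions, and it does not reproduce SURFNET's surface contouring or output formats.

```python
import math


def gap_spheres(atoms: list, r_min: float = 1.0, r_max: float = 4.0) -> list:
    """atoms: list of ((x, y, z), vdw_radius). Returns [(centre, radius), ...]."""
    spheres = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            (a, ra), (b, rb) = atoms[i], atoms[j]
            d = math.dist(a, b)
            gap = d - ra - rb
            if gap <= 0:
                continue  # the two atoms touch or overlap: no gap to fill
            r = gap / 2
            if r > r_max:
                continue  # gap too wide to be treated as a cleft
            t = (ra + r) / d  # centre sits midway between the two atom surfaces
            centre = tuple(a[k] + t * (b[k] - a[k]) for k in range(3))
            # Shrink the sphere (centre fixed) until no other atom penetrates it.
            for k, (c, rc) in enumerate(atoms):
                if k not in (i, j):
                    r = min(r, math.dist(centre, c) - rc)
            if r >= r_min:
                spheres.append((centre, r))
    return spheres


atoms = [((0.0, 0.0, 0.0), 1.0), ((6.0, 0.0, 0.0), 1.0), ((3.0, 2.5, 0.0), 1.0)]
print(gap_spheres(atoms))  # [((3.0, 0.0, 0.0), 1.5)]
```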

MSPocket: http://appserver.biotec.tu-dresden.de/MSPocket/ is an orientation independent program for the detection and graphical analysis of protein surface pockets [Zhu2011]. The approach is based on the solvent excluded surfaces generated by MSMS [Sanner1996].

Pfinder : http://pdbfun.uniroma2.it/pfinder/index.html  Pfinder is a bioinformatic method for the prediction of phosphate-binding sites in protein structures. Given a protein structure, Pfinder compares it with a set of 215 highly conserved structural motifs known to bind the phosphate moiety of phosphorylated ligands.

VOIDOO: http://xray.bmc.uu.se/usf/voidoo.html is a program for detection of cavities in macromolecular structures. It uses an algorithm that makes it possible to detect even certain types of cavities that are connected to "the outside world". Three different types of cavity can be handled by VOIDOO: van der Waals cavities (the complement of the molecular van der Waals surface), probe-accessible cavities (the cavity volume that can be occupied by the centres of probe atoms) and MS-like probe-occupied cavities (the volume that can be occupied by probe atoms, i.e. including their radii).
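The difficulty VOIDOO addresses shows up in the simplest cavity detector, a flood fill: on a grid, empty cells not reachable from the grid boundary are enclosed cavities. The sketch below, with a hypothetical grid representation and function name, finds a sealed cavity but loses it as soon as a single-cell opening connects it to "the outside world", which is the kind of case VOIDOO's algorithm is designed to handle.

```python
from collections import deque

NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def enclosed_cavities(protein: set, shape: tuple) -> set:
    """Empty grid cells that cannot be reached from the grid boundary."""
    nx, ny, nz = shape
    cells = [(x, y, z) for x in range(nx) for y in range(ny) for z in range(nz)]

    def on_grid(c):
        return all(0 <= v < size for v, size in zip(c, shape))

    # Seed the flood fill with every empty cell on the grid boundary.
    outside = {c for c in cells if c not in protein
               and any(v in (0, size - 1) for v, size in zip(c, shape))}
    queue = deque(outside)
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in NEIGHBOURS:
            nb = (x + dx, y + dy, z + dz)
            if on_grid(nb) and nb not in protein and nb not in outside:
                outside.add(nb)
                queue.append(nb)
    return {c for c in cells if c not in protein and c not in outside}


# Hypothetical hollow cube of "protein" (cells 1..5) inside a 7 x 7 x 7 grid.
shell = {(x, y, z) for x in range(1, 6) for y in range(1, 6) for z in range(1, 6)
         if 1 in (x, y, z) or 5 in (x, y, z)}
print(len(enclosed_cavities(shell, (7, 7, 7))))                # 27: sealed core
print(len(enclosed_cavities(shell - {(3, 3, 5)}, (7, 7, 7))))  # 0: one opening hides it
```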

PocketPicker: http://gecco.org.chemie.uni-frankfurt.de/pocketpicker/index.html Background: Identification and evaluation of surface binding-pockets and occluded cavities are initial steps in protein structure-based drug design. Characterizing the active site's shape as well as the distribution of surrounding residues plays an important role for a variety of applications such as automated ligand docking or in situ modeling. Comparing the shape similarity of binding site geometries of related proteins provides further insights into the mechanisms of ligand binding. Results: We present PocketPicker, an automated grid-based technique for the prediction of protein binding pockets that specifies the shape of a potential binding-site with regard to its buriedness. The method was applied to a representative set of protein-ligand complexes and their corresponding apo-protein structures to evaluate the quality of binding-site predictions. The performance of the pocket detection routine was compared to results achieved with the existing methods CAST, LIGSITE, LIGSITEcs, PASS and SURFNET. Success rates of PocketPicker were comparable to those of LIGSITEcs and exceeded those of the other tools. We introduce a descriptor that translates the arrangement of grid points delineating a detected binding-site into a correlation vector. We show that this shape descriptor is suited for comparative analyses of similar binding-site geometry by examining induced-fit phenomena in aldose reductase. This new method uses information derived from calculations of the buriedness of potential binding-sites. Conclusion: The pocket prediction routine of PocketPicker is a useful tool for identification of potential protein binding-pockets. It produces a convenient representation of binding-site shapes including an intuitive description of their accessibility.
The shape-descriptor for automated classification of binding-site geometries can be used as an additional tool complementing elaborate manual inspections.

McVol: http://www.bisb.uni-bayreuth.de/index.php?page=data/mcvol/mcvol  This program was developed to calculate the molecular volume, solvent accessible volume and van der Waals volume of proteins by Monte Carlo integration. Based on these calculations, McVol is also able to identify internal cavities as well as surface clefts and fill these cavities with water molecules. Additionally, a membrane of dummy atoms can be placed as a disc around the protein. The program is available under the GNU General Public License. A precompiled binary (x86) can be downloaded free of charge from the McVol site once the associated paper is published.
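The Monte Carlo idea can be sketched for the simplest of the three volumes, the van der Waals volume taken as the union of atomic spheres: sample random points in a bounding box and count the fraction that land inside any sphere. This is a generic estimator, not McVol's implementation; the function name, sample count and seed are arbitrary choices.

```python
import math
import random


def monte_carlo_volume(spheres: list, n_samples: int = 200_000, seed: int = 0) -> float:
    """Estimate the volume of a union of spheres given as [((x, y, z), radius), ...]."""
    rng = random.Random(seed)
    lo = [min(c[k] - r for c, r in spheres) for k in range(3)]
    hi = [max(c[k] + r for c, r in spheres) for k in range(3)]
    box = math.prod(h - l for l, h in zip(lo, hi))  # bounding-box volume
    hits = 0
    for _ in range(n_samples):
        p = [rng.uniform(lo[k], hi[k]) for k in range(3)]
        if any(math.dist(p, c) <= r for c, r in spheres):
            hits += 1
    return box * hits / n_samples  # box volume times the fraction of points inside


print(monte_carlo_volume([((0.0, 0.0, 0.0), 1.0)]))  # close to 4/3 * pi, about 4.19
```

The statistical error shrinks only with the square root of the sample count, so halving the error costs four times as many samples.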

Pharmacological compound databases

Zinc: http://zinc.docking.org/ Welcome to ZINC, a free database of commercially-available compounds for virtual screening. ZINC contains over 14 million purchasable compounds in ready-to-dock, 3D formats. ZINC is provided by the Shoichet Laboratory in the Department of Pharmaceutical Chemistry at the University of California, San Francisco (UCSF). To cite ZINC, please reference: Irwin and Shoichet, J. Chem. Inf. Model. 2005;45(1):177-82. We thank NIGMS for financial support (GM71896).

PubChem: http://pubchem.ncbi.nlm.nih.gov/ PubChem, released in 2004, provides information on the biological activities of small molecules. PubChem is organized as three linked databases within NCBI's Entrez information retrieval system: PubChem Substance, PubChem Compound and PubChem BioAssay. PubChem also provides a fast chemical structure similarity search tool. More information about using each component database can be found through the links on the homepage. Links from PubChem's chemical structure records to other Entrez databases provide information on biological properties, including links to PubMed scientific literature and to NCBI's protein 3D structure resource. Links to PubChem's bioassay database present the results of biological screening, and links to depositor web sites provide further information. A PubChem FTP site, Download Facility, Power User Gateway (PUG), Standardization Service, Score Matrix Service, Structure Clustering and Deposition Gateway are also available. PubChem provides tips and example code that allow users to add the (free) PubChem search tool to their own sites. A PubChem publication site provides links to published articles.
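Programmatic access to PubChem can be illustrated with PUG REST, the REST-style interface that PubChem offers alongside the Power User Gateway. The helper below only builds a property-lookup URL (compound name in, selected properties out); the function name is hypothetical, and actually fetching the URL requires network access.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(name: str, properties: list, fmt: str = "JSON") -> str:
    """PUG REST URL that looks up a compound by name and returns selected properties."""
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/property/{','.join(properties)}/{fmt}"


url = property_url("aspirin", ["MolecularFormula", "MolecularWeight"])
print(url)
# https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/aspirin/property/MolecularFormula,MolecularWeight/JSON
# To fetch it (network required):
#   import json, urllib.request
#   data = json.load(urllib.request.urlopen(url))
```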

The DrugBank database: http://www.drugbank.ca/ is a unique bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. The database contains 6712 drug entries including 1441 FDA-approved small molecule drugs, 134 FDA-approved biotech (protein/peptide) drugs, 83 nutraceuticals and 5086 experimental drugs. Additionally, 4231 non-redundant protein (i.e. drug target/enzyme/transporter/carrier) sequences are linked to these drug entries. Each DrugCard entry contains more than 150 data fields with half of the information being devoted to drug/chemical data and the other half devoted to drug target or protein data. DrugBank is supported by David Wishart, Departments of Computing Science & Biological Sciences, University of Alberta. DrugBank is also supported by The Metabolomics Innovation Centre, a Genome Canada-funded core facility serving the scientific community and industry with world-class expertise and cutting-edge technologies in metabolomics. 

ChemDB: http://cdb.ics.uci.edu/index.htm offers several components. ChemicalSearch finds chemicals by basic criteria such as molecular weight and predicted logP, or by the more abstract notion of structural similarity. Virtual Chemical Space (retro-synthesis and combinatorial library design) lets users interactively deconstruct target compounds into component precursors and reconstruct similar building blocks into combinatorial libraries representing the "virtual chemical space" near the target compound. Reaction Explorer (Synthesis Explorer and Mechanism Explorer) is an interactive system for learning and practicing reactions, syntheses and mechanisms in organic chemistry, with advanced support for the automatic generation of random problems, curved-arrow mechanism diagrams, and inquiry-based learning. Datasets provides various chemical datasets annotated with interesting properties for training and testing machine-learning prediction and searching methods. Supplements hosts online articles relating to the system, with the supplementary data and figures referenced in them.
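Structural-similarity search of the kind ChemicalSearch offers is commonly based on the Tanimoto coefficient between binary fingerprints. A minimal sketch, assuming fingerprints are given as sets of "on" bit positions (real ones would come from a cheminformatics toolkit); ChemDB's own fingerprints and scoring are not reproduced, and the function names and toy library are made up.

```python
def tanimoto(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B| for fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_search(query: set, library: dict, threshold: float = 0.7) -> list:
    """Return (name, score) pairs at or above `threshold`, best first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])


library = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {7, 8}}
print(similarity_search({1, 2, 3}, library, threshold=0.5))  # [('a', 1.0), ('b', 0.5)]
```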

The Chapman & Hall/CRC Chemical Database is a structured database holding information on chemical substances. It includes descriptive and numerical data on chemical, physical and biological properties of compounds; systematic and common names of compounds; literature references; structure diagrams and their associated connection tables. The Dictionary of Natural Products Online is a subset of this database and includes all compounds contained in the Dictionary of Natural Products (Main Work and Supplements). The Dictionary of Natural Products (DNP) is the only comprehensive and fully-edited database on natural products. It arose as a daughter product of the well-known Dictionary of Organic Compounds (DOC) which, since its inception in the 1930s, has, through successive editions, always been a leading source of natural product information. In the early 1980s, following the publication of the Fifth Edition of DOC, the first to be founded on database methods, the Editors and contributors for the various classes of natural products embarked on a programme of enlargement, rationalisation and classification of the natural product entries, while at the same time keeping the coverage up-to-date. In 1992 the results of this major project, which had grown to match DOC in size, were separately published in both book (7 volumes) and CD-ROM format, leaving DOC with coverage of only the most widely distributed and/or practically important natural products. DNP compilation has since continued unabated through a combination of an exhaustive survey of current literature and of historical sources such as reviews, to pick up minor natural products and items of data previously overlooked.

The compilation of DNP is undertaken by a team of academics and freelancers who work closely with the in-house editorial staff at Chapman & Hall. Each contributor specialises in a particular natural product class (e.g. alkaloids) and is able to reorganise and classify the data in the light of new research so as to present it in the most consistent and logical manner possible. Thus the compilation team is able to reconcile errors and inconsistencies. The resulting on-line version represents an extremely well organised dictionary documenting virtually every known natural product. A valuable feature of the design is that closely related natural products (e.g. where one is a glycoside or simple ester of another) are organised into the same entry, thus simplifying and bringing out the underlying structural and biosynthetic relationships of the compounds. Structure diagrams are drawn and numbered in the most consistent way according to best stereochemical and biogenetic relationships. In addition, every natural product is indexed by structural/biogenetic type under one of more than 1000 headings, allowing the rapid location of all compounds in the category, even where they have undergone biogenetic modification and no longer share exactly the same skeleton. There is extensive (but not complete) coverage of natural products of unknown structure, and the coverage of these is currently being enhanced by various retrospective searches.

ChemSpider: http://www.chemspider.com/ is a free chemical structure database providing fast text and structure search access to over 26 million structures from hundreds of data sources.

ChemBank: http://chembank.broadinstitute.org/ is a public, web-based informatics environment created by the Broad Institute's Chemical Biology Program and funded in large part by the National Cancer Institute's Initiative for Chemical Genetics (ICG). This knowledge environment includes freely available data derived from small molecules and small-molecule screens, together with resources for studying those data to gain biological and medical insights. ChemBank is intended to guide chemists synthesizing novel compounds or libraries, to help biologists searching for small molecules that perturb specific biological pathways, and to speed up the process by which drug hunters discover new and effective medicines. ChemBank stores an increasingly varied set of cell measurements derived from, among other biological objects, cell lines treated with small molecules. Analysis tools are available, and more are being developed, for determining the relationships between cell states, cell measurements and small molecules. Currently, ChemBank stores information on hundreds of thousands of small molecules and hundreds of biomedically relevant assays performed at the ICG in collaborations involving biomedical researchers worldwide, who have agreed to perform their experiments in an open data-sharing environment. ChemBank's goal is to give life scientists unfettered access to biomedically relevant data and tools that were previously available almost exclusively in the private sector. The Broad Institute intends ChemBank to be a planning and discovery tool for chemists, biologists and drug hunters anywhere, the only requirements being a computer, Internet access, and a desire to extract knowledge from public experiments whose greatest value is likely to lie in their collective sum.

SuperDrug: http://bioinf.charite.de/superdrug/ Different resources exist for experimentally determined and computed three-dimensional (3D) structures of low-molecular-weight compounds, but for approved drugs no free, publicly accessible source of 3D structures and conformers is available. Furthermore, for selection purposes or for correlating structural similarity with medical application, it is desirable to assign each structure its Anatomical Therapeutic Chemical (ATC) classification code according to the WHO scheme. The SuperDrug database contains approximately 2500 3D structures of the active ingredients of essential marketed drugs. To account for structural flexibility, these are represented by about 10^5 (100,000) structural conformers. A web-query system allows searches by drug name, synonyms, trade name, trivial name, formula, CAS number, ATC code, etc. 2D similarity screening (Tanimoto coefficients) and an automatic 3D superposition procedure based on the conformer representation are implemented. Drug structures above a similarity threshold, as well as superimposed conformers, can be retrieved in MOL file format via a graphical interface. For academic use, the system is accessible at http://bioinf.charite.de/superdrug. Visualization requires the free 'Chime' browser plug-in from MDL.
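SuperDrug's 2D similarity screening is based on the Tanimoto coefficient, T(A, B) = |A ∩ B| / (|A| + |B| - |A ∩ B|), computed between binary fingerprints. Below is a minimal sketch of that coefficient and a threshold screen, assuming each fingerprint is stored as a Python set of its "on" bit positions. Which fingerprint SuperDrug actually uses is not stated here; the function names, the 0.7 default threshold and the toy fingerprints are hypothetical.

```python
from __future__ import annotations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of 'on' bits."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints (assumption)
    shared = len(fp_a & fp_b)  # bits set in both fingerprints
    return shared / (len(fp_a) + len(fp_b) - shared)  # |A ∩ B| / |A ∪ B|


def similarity_screen(query: set[int], library: dict[str, set[int]],
                      threshold: float = 0.7) -> list[tuple[str, float]]:
    """Return (name, score) for every entry scoring >= threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy fingerprints for hypothetical compounds (not real SuperDrug data).
query = {1, 2, 3, 4}
library = {"drug_A": {1, 2, 3, 4}, "drug_B": {1, 2, 3, 5}, "drug_C": {7, 8}}
print(similarity_screen(query, library, threshold=0.5))
# prints [('drug_A', 1.0), ('drug_B', 0.6)]
```

Sets keep the sketch readable; real tools usually store fingerprints as fixed-length bit vectors and count bits, which gives the same coefficient much faster.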

Ligand Expo: http://ligand-expo.rutgers.edu/ Ligand Expo (formerly Ligand Depot) provides chemical and structural information about small molecules within the structure entries of the Protein Data Bank. Tools are provided to search the PDB dictionary for chemical components, to identify structure entries containing particular small molecules, and to download the 3D structures of the small molecule components in the PDB entry. A sketch tool is also provided for building new chemical definitions from reported PDB chemical components.

Schrödinger has made available a set of the ligand decoys used in its Glide enrichment studies. The 1K Drug-Like Ligand Decoys Set was created by selecting 1000 ligands with "drug-like" properties from a one-million-compound library. The creation and application of the ligand set are presented in the following publications:

Friesner, R. A.; Banks, J. L.; Murphy, R. B.; Halgren, T. A.; Klicic, J. J.; Mainz, D. T.; Repasky, M. P.; Knoll, E. H.; Shaw, D. E.; Shelley, M.; Perry, J. K.; Francis, P.; Shenkin, P. S., "Glide: A New Approach for Rapid, Accurate Docking and Scoring. 1. Method and Assessment of Docking Accuracy", J. Med. Chem. 2004, 47, 1739-1749.

Halgren, T. A.; Murphy, R. B.; Friesner, R. A.; Beard, H. S.; Frye, L. L.; Pollard, W. T.; Banks, J. L., "Glide: A New Approach for Rapid, Accurate Docking and Scoring. 2. Enrichment Factors in Database Screening", J. Med. Chem. 2004, 47, 1750-1759.
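In an enrichment study, decoys like these are mixed with known actives, the whole set is docked and ranked by score, and one checks how many actives land near the top. A commonly used enrichment factor is EF = (actives in top slice / size of top slice) / (total actives / total compounds), where EF = 1 means no better than random. The sketch below assumes that definition; the Glide papers above may report variants of it, and the function name and the example ranking are hypothetical.

```python
from __future__ import annotations


def enrichment_factor(ranked_is_active: list[bool], fraction: float) -> float:
    """Enrichment factor in the top `fraction` of a ranked screening list.

    ranked_is_active: one flag per compound, best docking score first;
    True marks a known active, False marks a decoy.
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)  # True counts as 1
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one known active")
    n_sampled = max(1, round(fraction * n_total))  # size of the top slice
    hits_sampled = sum(ranked_is_active[:n_sampled])  # actives inside that slice
    # EF = (hits_sampled / n_sampled) / (n_actives / n_total), rearranged so that
    # everything stays an integer until the final division
    return (hits_sampled * n_total) / (n_sampled * n_actives)


# Hypothetical ranking: 5 actives among 95 decoys, 3 of them in the top 10.
ranking = ([True, False, True, False, False, True, False, False, False, False]
           + [False] * 88 + [True, True])
print(enrichment_factor(ranking, 0.10))  # prints 6.0
```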

SuperLigands: http://bioinf-tomcat.charite.de/superligands/ SuperLigands is an encyclopedia dedicated to a ligand-oriented view of protein structural space. The database contains small-molecule structures occurring as ligands in the Protein Data Bank, and integrates information on their drug-likeness and binding properties. A 3D superposition algorithm allows all ligands to be screened for possible scaffold hoppers, and a fingerprint-based 2D similarity screen is also available.
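SuperLigands reports drug-likeness, but which criterion it applies is not described here. As an illustration only, this sketch implements one widely used drug-likeness filter, Lipinski's rule of five (molecular weight ≤ 500, logP ≤ 5, at most 5 hydrogen-bond donors and 10 acceptors). It assumes the descriptor values have already been computed by some other tool; the class and function names, the one-violation tolerance and the example values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one ligand (values come from elsewhere)."""
    name: str
    mol_weight: float  # molecular weight in daltons
    logp: float        # (predicted) octanol/water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the rule-of-five limits a ligand exceeds."""
    limits_exceeded = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(limits_exceeded)


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    """Treat a ligand as drug-like if it breaks at most `max_violations` limits."""
    return lipinski_violations(d) <= max_violations


# Hypothetical ligands with made-up descriptor values.
ligands = [
    Descriptors("ligand_1", 350.4, 2.1, 2, 5),
    Descriptors("ligand_2", 720.9, 6.3, 4, 12),
]
for lig in ligands:
    print(lig.name, lipinski_violations(lig), is_drug_like(lig))
# prints:
# ligand_1 0 True
# ligand_2 3 False
```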

ChEBI: http://www.ebi.ac.uk/chebi/ Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds. The term ‘molecular entity’ refers to any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer, etc., identifiable as a separately distinguishable entity. The molecular entities in question are either products of nature or synthetic products used to intervene in the processes of living organisms. ChEBI incorporates an ontological classification, which specifies the relationships between molecular entities or classes of entities and their parents and/or children. ChEBI uses nomenclature, symbolism and terminology endorsed by international scientific bodies, including the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB).

Molecules directly encoded by the genome (e.g. nucleic acids, proteins and peptides derived from proteins by cleavage) are not as a rule included in ChEBI. All data in the database is non-proprietary or is derived from a non-proprietary source. It is thus freely accessible and available to anyone. In addition, each data item is fully traceable and explicitly referenced to the original source.

Wednesday, February 1, 2012

Working with multiple urls from text file to tabs and vice versa

Copy or save multiple tabs to a text file

There are various methods; I use this one for Firefox:

1. Install the send-tab-urls add-on in Firefox: https://addons.mozilla.org/en-US/firefox/addon/send-tab-urls/

2. Open all the URLs in separate tabs.

3. Go to File --> Send tab urls --> select your options and send to clipboard.

4. The URLs open in all tabs are copied to your clipboard. You can paste them into a text file and save it, or reuse them later to open the same tabs again.

Use: while looking for published articles on PubMed or Google, you can save the list of relevant articles from each search as a text file and reuse it whenever required. At home I did not have access to many journals, so I used to save the links in text files and then, at the institute, simply open and save all the articles I needed.

Open multiple urls in different tabs from a text file

1. Put all the URLs in a text file and copy them.

2. Open http://www.urlopener.com/index.php

3. Paste them into the space provided.

4. Click Submit and then click Open All.

Use: opening all the URLs in one go makes work faster. It is sensible to open a new Firefox window first, so the multiple URLs do not clutter your ongoing work.
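If you prefer not to rely on a website, Python's standard webbrowser module can do the same job locally. This is a minimal sketch, assuming a plain text file with one URL per line; skipping blank lines, lines starting with "#" and duplicate URLs are choices made for this sketch, and the function names and the file name in the usage line are hypothetical. Whether the first URL really opens in a new window depends on your browser.

```python
from __future__ import annotations

import webbrowser
from pathlib import Path


def parse_urls(text: str) -> list[str]:
    """Return stripped URLs, skipping blank lines, '#' comment lines and duplicates."""
    urls: list[str] = []
    seen: set[str] = set()
    for line in text.splitlines():
        url = line.strip()
        if not url or url.startswith("#") or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls


def open_urls_from_file(path: str) -> None:
    """Open every URL in a text file: the first in a new window, the rest as tabs."""
    urls = parse_urls(Path(path).read_text(encoding="utf-8"))
    for i, url in enumerate(urls):
        if i == 0:
            webbrowser.open_new(url)  # only a request; the browser may use a tab instead
        else:
            webbrowser.open_new_tab(url)
```

Usage: open_urls_from_file("pubmed_search_1.txt") opens every link saved from that search.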

Meanwhile, I also found a very useful tool for comparing lists, finding the items they have in common, and drawing Venn diagrams:

Please cite: Oliveros, J.C. (2007) VENNY. An interactive tool for comparing lists with Venn Diagrams. http://bioinfogp.cnb.csic.es/tools/venny/index.html
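VENNY does this interactively, but the computation underneath is plain set logic. Here is a minimal sketch that groups every item by the exact combination of lists it appears in, which gives one group per region of the Venn diagram. It assumes exact, case-sensitive matching (VENNY's own handling of case and whitespace may differ) and does not draw the diagram; the function name and the gene lists are hypothetical.

```python
from __future__ import annotations


def venn_regions(named_lists: dict[str, list[str]]) -> dict[tuple[str, ...], list[str]]:
    """Group every item by the exact combination of input lists it appears in."""
    sets = {name: set(items) for name, items in named_lists.items()}
    regions: dict[tuple[str, ...], list[str]] = {}
    for item in set().union(*sets.values()):
        # e.g. ("A",) means only in list A; ("A", "B") means shared by A and B
        membership = tuple(name for name, s in sets.items() if item in s)
        regions.setdefault(membership, []).append(item)
    return {key: sorted(items) for key, items in regions.items()}


# Hypothetical gene lists from two experiments.
regions = venn_regions({
    "A": ["TP53", "BRCA1", "EGFR", "MYC"],
    "B": ["EGFR", "MYC", "KRAS"],
})
for membership, items in sorted(regions.items()):
    print(membership, items)
# prints:
# ('A',) ['BRCA1', 'TP53']
# ('A', 'B') ['EGFR', 'MYC']
# ('B',) ['KRAS']
```

The same function works unchanged for three or four lists, since each region is simply identified by which lists contain the item.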

Happy surfing.