FBLD by Protein Local Similarity

One of my more recent papers, based on research and software development undertaken at MEDIT SA, has just gone to press. Entitled “Computational Fragment-Based Approach at PDB Scale by Protein Local Similarity”, it can be found in the Journal of Chemical Information and Modeling.

In a nutshell, this paper explores extensions to our MED-SuMo technology for Fragment-Based Lead Discovery (FBLD). It provides the results for two simple but representative drug design case studies. A third, more significant, case study will be covered in a separate paper that is currently under revision.

Entrez PubMed ACS Journals

This paper, as is often the case with those written in commercial environments, has acquired a lot of authors!

At least in this case, I could point to a specific contribution from each author, rather than having a collection of managers, and their managers, who might have proof-read the first draft.

In the paper, we define an object, the MED-Portion, that describes the structure of a chemical fragment coupled with the MED-SuMo representation of its protein-bound environment. A database of these MED-Portions, corresponding to the entire set of ligands seen in the Protein Data Bank (PDB), a repository of publicly available macromolecular structures, is mined using the MED-SuMo technology. MED-SuMo is based on an exceptionally rapid and precise search procedure for locating protein surface patches according to their putative chemical interactions. A MED-SuMo search is not dissimilar to a pharmacophore screen, but applied instead to known protein surfaces. The collated MED-Portions populate the target binding site, and their corresponding chemical fragments are fed into a de novo design procedure. In the paper, this step is performed via a procedure that resembles the BREED algorithm for generating hybrid molecules from sets of molecules aligned in a binding site. There are a number of alternative methods that we could apply, although none of them is described in this paper. Our hybridisation, filtering and analysis tools, including those used for this work, are packaged in a cheminformatics tool named MED-Hybridise.
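To give a flavour of what a BREED-like step involves, here is a minimal Python sketch of the first stage only: finding bonds from two fragments that overlay each other once the fragments sit in the same binding-site frame. A BREED-style method would then swap the substituents on either side of each matched bond to build hybrids; that part is not shown. This is not the MED-Hybridise implementation. The `Atom` and `Fragment` classes, the `matching_bonds` function and the 0.5 Å tolerance are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from itertools import product
from math import dist


@dataclass(frozen=True)
class Atom:
    element: str
    xyz: tuple  # (x, y, z) in angstroms, in the target binding-site frame


@dataclass
class Fragment:
    name: str
    atoms: list  # list of Atom
    bonds: list  # list of (i, j) index pairs into atoms


def matching_bonds(a, b, tol=0.5):
    """Return ((i, j), (k, l)) pairs where bond i-j of `a` overlays bond k-l of `b`.

    Two bonds are taken to match when both pairs of end atoms lie within
    `tol` angstroms of each other (hypothetical criterion and threshold).
    Both orientations of the second bond are tried.
    """
    matches = []
    for (i, j), (k, l) in product(a.bonds, b.bonds):
        for k2, l2 in ((k, l), (l, k)):
            if (dist(a.atoms[i].xyz, b.atoms[k2].xyz) <= tol
                    and dist(a.atoms[j].xyz, b.atoms[l2].xyz) <= tol):
                matches.append(((i, j), (k2, l2)))
                break  # one orientation is enough for this bond pair
    return matches


# Two toy fragments, already placed in a common frame (made-up coordinates).
frag_a = Fragment(
    "A",
    [Atom("C", (0.0, 0.0, 0.0)), Atom("C", (1.5, 0.0, 0.0)), Atom("N", (2.2, 1.2, 0.0))],
    [(0, 1), (1, 2)],
)
frag_b = Fragment(
    "B",
    [Atom("C", (1.6, 0.1, 0.0)), Atom("C", (0.1, 0.1, 0.0)), Atom("O", (-0.6, 1.3, 0.0))],
    [(0, 1), (1, 2)],
)

# The C-C bond of A overlays the C-C bond of B, traversed in reverse.
print(matching_bonds(frag_a, frag_b))
```

Each matched bond is a candidate cut point: a hybrid could keep one side of the bond from fragment A and the other side from fragment B. A real implementation would also have to check atom types, bond orders and valence, which this sketch ignores.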

This MED-Portions/MED-SuMo/MED-Hybridise protocol was applied to a Protein Kinase (as that seems to be the default system for demonstrating drug-discovery procedures) and a G-Protein Coupled Receptor (as those are the single most valuable drug targets to date, and they have historically posed intractable problems for structure-based drug design methods because minimal structural data was available). In both cases we retrieve known active molecules, showing that, at least at some level, the procedure gives relevant structures. We also generate novel molecules, presented in their predicted docked poses, that we postulate would also be active. These would be ideal starting points for further rational drug design.

One topic that the paper doesn’t cover is the technology stack used for our software. This is interesting, as it is quite different from the technology stack traditionally used for computational chemistry software. For each part of the system, “the best tool for the job” was selected, rather than sticking with one technology or programming language and bending it for use where better alternatives are available. In each case, we’ve selected validated technologies in preference to “the latest trend” and, where possible, we avoid being locked into one platform or operating system.

MED-SuMo itself is a client-server system. The server is written in OCaml, a functional language that is closely related to Microsoft’s new F# language that is gaining rapid traction. Currently, we only support the MED-SuMo server on Linux platforms, although in the past it was successfully run on Windows, and porting to non-Linux UNIX platforms should be trivial. We use Linux in-house, and no customer has yet asked for support on a different platform! Typically, we use SQLite as the database engine. The MED-SuMo server can be “programmed” using Lua scripts – just like many commercially available PC games! It also has a management interface built using PHP5.

We have a series of MED-SuMo clients, but the one we use in-house is a Windows GUI built using Microsoft’s .Net platform. For 3D visualisation in this GUI we use an ActiveX control written in C++ and OpenGL, although this is due to be replaced by a cross-platform solution in the near future. We wouldn’t choose to use ActiveX for new developments at MEDIT, but that component was an artefact of an earlier project that we decided to reuse. Other MED-SuMo clients are available but, more importantly, MED-SuMo is easy to integrate into any proprietary system via a simple API that encapsulates the client-server communications.

MED-Hybridise is also developed using C# on .Net, and it is based on our proprietary cross-platform .Net cheminformatics assembly (obviously using Mono for most platforms!). MED-Hybridise comes in both GUI and command-line editions.

When we began developing our C#/.Net cheminformatics code, we were very concerned that it would be too slow for any computationally expensive calculations. At that time, we used another proprietary C++ library for any “chemistry” in our applications. As it happened, we needn’t have worried. The .Net runtime is amazingly fast compared to our initial expectations for a virtual machine versus native code from C++. I guess our judgement was clouded by the abysmal performance for which Java was traditionally famed! As an example, for SMARTS matching, a typical cheminformatics task, with “realistic” queries against drug-like molecules, our C#/.Net code is approximately 30% slower than our native and fully optimised C++ version. That is perfectly acceptable given the huge developer productivity gain. Furthermore, as our GUIs are all developed for .Net, the cost of passing data across the managed-unmanaged boundary, if we wanted to continue using the C++ code for computations, would cancel out much of that small performance benefit.
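The interop argument can be put as a back-of-the-envelope model. The Python sketch below assumes that calling native code from a managed GUI costs the native compute time plus a fixed marshalling overhead per call, and that the managed code costs the native compute time scaled by the slowdown factor. Only the 30% figure comes from the measurement quoted above; the function names and the timings in the example are illustrative assumptions, not benchmark results.

```python
def break_even_overhead(native_compute_s, managed_slowdown=1.30):
    """Marshalling overhead (seconds per call) at which both routes cost the same."""
    return native_compute_s * (managed_slowdown - 1.0)


def native_still_faster(native_compute_s, interop_overhead_s, managed_slowdown=1.30):
    """True if native code plus marshalling beats fully managed code.

    Simplified model (an assumption, not a measurement):
      native route  = native compute time + marshalling overhead
      managed route = native compute time * slowdown factor
    """
    native_route = native_compute_s + interop_overhead_s
    managed_route = native_compute_s * managed_slowdown
    return native_route < managed_route


# Illustrative numbers only: a call whose native compute time is 1.0 ms.
print(native_still_faster(1.0e-3, 0.2e-3))  # small overhead: native route wins
print(native_still_faster(1.0e-3, 0.4e-3))  # larger overhead: managed route wins
```

Under this model, the native route only pays off while the marshalling overhead stays below about 30% of the native compute time, so many small calls across the boundary quickly erase the advantage.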

We have already sold some commercial licenses for our software that implements this technology, with the first deliveries to be made at the start of April. I’d better get back to my compiler now!

For further information, visit:

Finally, don’t ask me to pronounce my co-authors’ names! I think I get as far as “Fabrice” before I start making mistakes.
