Computer-engineered designs

Exploring the use of computationally designed nanomedicine

Methods Development

Accelerated virtual wet lab

Drug development is expensive and laborious, largely due to the costs of synthesis and physical testing of compounds, the vast majority of which are not viable as pharmaceuticals for the purpose under investigation. In other words, a huge amount of the cost associated with developing drugs is in making and testing compounds that ultimately fail.

The Caulfield team aims to increase the efficiency of drug development by implementing in silico methods for initial screenings, then testing a reduced subset of promising compounds in a physical setting. Because hundreds of thousands, or even millions, of compounds can be screened in silico, the lab's approaches offer promise for accelerating the progression of new drugs from bench to bedside.

Dr. Caulfield's lab uses molecular modeling and docking methods as an accelerated virtual laboratory to screen compounds. Model preparation, parameterization, the scoring function and ranking are all stages of the pipeline that can be optimized to make virtual screening more sensitive and accurate.

Algorithm development, programs and analytics for research

In the Caulfield lab, algorithms are developed to allow for improvements in:

  1. Conformational enhanced sampling and cryogenic electron microscopy (cryo-EM) fitting
  2. Quantum docking
  3. Drug discovery and design

Some of these algorithms rely upon machine learning and deep learning techniques.

Conformational enhanced sampling

For example, the lab uses conformational sampling to capture rare or complex features of the energy landscape, as well as the energetics and stabilities of complex formation, such as drug-target, protein-protein, nucleic acid-drug and nucleic acid-protein complexes. The lab has also pioneered an entropic manipulator method called Maxwell's demon molecular dynamics (see Animations). The goal of molecular dynamics simulations (MDS) is to investigate physically realistic motions of biologically relevant molecules. Because these investigations can be conducted at atomic resolution, they are complex. Furthermore, meaningful global molecular motions, such as a conformational change, are built up from millions of atomic-level motions.

Conformational sampling is the use of computational experiments on molecular motions to observe large-scale events such as an allosteric transition. It typically takes months or longer to accrue enough data from conformational sampling to observe the dynamical feature of interest. More exotic molecular motions or rare events may be nearly impossible to observe by merely sampling random atomic motions. A number of methods, known collectively as enhanced sampling, accelerate conformational sampling. Each has its own benefits and drawbacks.

In some cases, two experimentally determined structures are available, but nothing is known about how the molecule transitions between them. This type of transition can be explored with enhanced sampling molecular simulations. Dr. Caulfield pioneered the first entropy-based enhanced sampling method, Maxwell's demon molecular dynamics (MdMD). Most enhanced sampling (biasing) methods apply external forces or otherwise perturb interatomic bonds, and thereby use enthalpy to control the conformational sampling. MdMD applies no external forces. Instead, it uses a series of short unbiased simulations called sprints. If the assigned global variable moves toward the goal state relative to its value at the start of the sprint, then the sprint has brought the structure closer to the goal and is kept. Sprints that deviate are discarded and retried, and each retry yields a different result because of the random, Boltzmann-distributed sampling within each sprint. In this manner, the transition from state A to state B can be investigated with realistic physics and within feasible central processing unit (CPU) and human calendar time.
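The keep-or-retry sprint logic described above can be illustrated with a toy model. The sketch below is not the lab's MdMD code. It assumes a one-dimensional double-well potential, an overdamped Langevin integrator, and the distance to the target as the global variable. All function names and parameter values are hypothetical choices made for illustration.

```python
import math
import random


def force(x):
    """Force from a toy double-well potential U(x) = (x**2 - 1)**2 (an assumption)."""
    return -4.0 * x * (x * x - 1.0)


def run_sprint(x, rng, n_steps=50, dt=1e-3, kT=0.3):
    """One short, unbiased overdamped Langevin run; no bias force is added."""
    sigma = math.sqrt(2.0 * kT * dt)
    for _ in range(n_steps):
        x += force(x) * dt + sigma * rng.gauss(0.0, 1.0)
    return x


def mdmd_toy(start=-1.0, target=1.0, tol=0.05, max_sprints=100_000, seed=0):
    """Keep sprints that move the global variable toward the target; retry the rest."""
    rng = random.Random(seed)
    x, kept, discarded = start, 0, 0
    while abs(x - target) > tol and kept + discarded < max_sprints:
        trial = run_sprint(x, rng)
        if abs(trial - target) < abs(x - target):
            x = trial        # closer to state B: keep this sprint
            kept += 1
        else:
            discarded += 1   # discard and retry from the same starting point
    return x, kept, discarded


final_x, kept, discarded = mdmd_toy()
print(f"final x = {final_x:.3f}, kept = {kept}, discarded = {discarded}")
```

Each sprint evolves under the unmodified potential, so the only steering comes from selecting which sprints to keep. That selection step is what the toy model is meant to show.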

This method was successfully adopted for cryo-EM fitting (see Animations).

Quantum docking

Using the MdMD method, Dr. Caulfield's lab introduced a new kind of global variable between sprints: quantum mechanics (QM)-based calculations on the drug-protein interface, performed on a subset of atoms surrounding the ligand. These QM calculations give a meaningful estimate of the interaction chemistry and energies (Hartree-Fock) for the ligand, which then dictates the next MDS sprint iteration. Using this method, the lab has achieved docking scores 3 to 4 orders of magnitude greater in accuracy than existing commercial-grade docking software. The lab's quantum docking program, qDockMdMD, allows the final set of lead compounds to be refined through computational medicinal chemistry, adjusting R-groups and bioisosteres to improve binding to the protein target (see Animations).

Drug discovery and design

In addition to quantum docking, the lab has pioneered the use of layered biological and chemical data to create Z-scoring matrices for drug decomposition and reconstruction, which are used to build small, testable batches of compounds for screening (virtual to actual screening). De novo drug design draws on combined efforts with partner labs on drug discovery projects, using the methods above together with existing industry-adopted software.

Machine learning is a branch of artificial intelligence that uses systematic classification or grouping based on an internal logical framework to learn information about data and discover patterns in it. There are many different approaches to machine learning, such as support vector machines (SVMs). In general, supervised machine learning algorithms take labeled input data (training data) for which the classification is known, for instance, coins labeled heads or tails. The algorithm then constructs a logical model that explains the input data through a classification system, which may not resemble how human brains arrive at the same classification.
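The idea of learning a classification from labeled training data can be shown in a few lines. The sketch below uses a nearest-centroid classifier, a much simpler technique than an SVM, chosen only because it needs no external library. The two descriptors per compound and the "binder" and "non-binder" labels are hypothetical.

```python
def train_centroids(samples, labels):
    """Average the training samples of each class into one centroid per label."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )


# Hypothetical labeled training data: two descriptors per compound.
train_x = [(0.9, 0.8), (1.0, 1.1), (0.8, 1.0), (0.1, 0.2), (0.0, -0.1), (0.2, 0.1)]
train_y = ["binder"] * 3 + ["non-binder"] * 3

model = train_centroids(train_x, train_y)
print(predict(model, (0.85, 0.9)))
print(predict(model, (0.05, 0.0)))
```

The trained model here is just one averaged point per class, which illustrates that the internal classification logic need not resemble human reasoning.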

The lab's machine learning approaches can rely on k-means analyses within 3D quantitative structure-activity relationship (3D-QSAR) modeling, Maxwell's demon for compound iterations, and fingerprints (chemical space) using contextual or other methods. Combined with feedback from actual data, these methods allow the lab to predict half-maximal effective concentration (EC50) values, which can be combined with shape, docking and other metrics into a weighted scoring system for the drug design stages.
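One common way to combine metrics that sit on different scales is to standardize each one as a Z-score and then take a weighted sum. The sketch below illustrates that idea only. It is not the lab's scoring system, and the compound names, metric values and weights are hypothetical. It assumes every metric is oriented so that higher is better, for example pEC50 (the negative base-10 logarithm of EC50 in molar units) rather than EC50 itself.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of values to zero mean and unit (population) variance."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def rank_compounds(table, weights):
    """Return (name, combined score) pairs, best first."""
    names = list(table)
    totals = dict.fromkeys(names, 0.0)
    for metric, weight in weights.items():
        column = [table[name][metric] for name in names]
        for name, z in zip(names, zscores(column)):
            totals[name] += weight * z
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical values; every metric is oriented so that higher is better.
table = {
    "cpd_A": {"pEC50": 7.2, "shape": 0.81, "docking": 9.4},
    "cpd_B": {"pEC50": 6.1, "shape": 0.65, "docking": 7.9},
    "cpd_C": {"pEC50": 5.4, "shape": 0.70, "docking": 8.8},
}
weights = {"pEC50": 0.5, "shape": 0.2, "docking": 0.3}

ranking = rank_compounds(table, weights)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

Standardizing first keeps a metric with a large numeric range from dominating the sum, so the weights alone express how much each metric matters.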

How does this machine learning help? It is used to classify compounds as potentially potent binders or selective binders for drug discovery, which helps reduce the number of compounds that need to be physically synthesized and tested. It could also be used to group snapshots (instantaneous atomic coordinates) of molecular dynamics trajectories to determine which conformational states are visited. These groupings can then be analyzed statistically to indicate which states are more prevalent in solution, and the results can be corroborated with experimental data such as kinetics. In these cases, experimental data can then be fed into the system as external weights to improve the algorithm.
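Grouping trajectory snapshots into states and estimating how prevalent each state is can be sketched with a small k-means routine. This is an illustration under assumptions, not the lab's pipeline. The snapshots are synthetic, each is reduced to two hypothetical collective variables, and k-means is one possible grouping method among many.

```python
import random
from collections import Counter


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def kmeans(points, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(n_iter):
        labels = [nearest(p, centers) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return [nearest(p, centers) for p in points], centers


# Synthetic snapshots: 70 near one state and 30 near another (hypothetical data).
rng = random.Random(1)
state_a = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(70)]
state_b = [(rng.gauss(5.0, 0.3), rng.gauss(5.0, 0.3)) for _ in range(30)]
snapshots = state_a + state_b

labels, centers = kmeans(snapshots, k=2)
counts = Counter(labels)
populations = {state: n / len(snapshots) for state, n in counts.items()}
print(populations)
```

The fraction of snapshots in each cluster gives a rough estimate of state prevalence, which is the quantity that would be compared against experimental data.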