Visualizing the State of Ugi Reaction Experimentation
November 22nd, 2008
Reading through a sample of the UsefulChem Experiments / Reactions (http://usefulchem.wikispaces.com/All+Reactions) reveals the significance of the Ugi reaction. “The Ugi reaction is a multi-component reaction in organic chemistry involving a ketone or aldehyde, an amine, an isocyanide and a carboxylic acid to form a bis-amide” (http://en.wikipedia.org/wiki/Ugi_reaction). The “Applications” section of the Wikipedia page is also interesting, as it describes the significance of the reaction in terms of variations of the input components. Varying the inputs creates a complex combinatorics problem: the space of possible input combinations may be large because many molecules can fall into each of the four input component categories. Within each category the input molecules should be structurally similar, and they should therefore have similar functional characteristics, making it possible to use predictive models to describe the products of the reaction. I am not a chemist, but that is my naive understanding of the Ugi reaction.
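To make the size of that space concrete, here is a minimal Python sketch. The candidate lists are illustrative assumptions, far shorter than any real catalogue, and the names `candidates`, `space_size` and `enumerate_space` are invented for this example.

```python
from itertools import product
from math import prod

# Hypothetical, deliberately tiny candidate lists for each Ugi input category.
candidates = {
    "carbonyl": ["benzaldehyde", "furfural", "acetone"],
    "amine": ["furfurylamine", "benzylamine", "methylamine"],
    "isocyanide": ["tert-butyl isocyanide", "cyclohexyl isocyanide"],
    "acid": ["boc-glycine", "acetic acid", "benzoic acid"],
}


def space_size(cands):
    """Number of distinct four-component input combinations."""
    return prod(len(molecules) for molecules in cands.values())


def enumerate_space(cands):
    """Yield every combination as a (carbonyl, amine, isocyanide, acid) tuple."""
    return product(
        cands["carbonyl"], cands["amine"], cands["isocyanide"], cands["acid"]
    )


print(space_size(candidates))  # 3 * 3 * 2 * 3 = 54
```

The count is simply the product of the four list lengths, so it grows quickly: with 100 candidates in each category the same calculation gives 100^4, or one hundred million, possible combinations.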
A problem for the researcher, then, is to explore the combinatorial space with experimentation and to relate the results to a theoretical understanding. Given the size of the combinatorial space, a challenge is to perform enough experiments to get good coverage of it. One view of the space could be defined in theoretical terms by identifying the range of all possible input molecules. But what is the breadth of coverage that has already been achieved through experimentation?
UsefulChem Experiment 099 (http://usefulchem.wikispaces.com/Exp099) provides a good example of a Ugi reaction. Here the links to ChemSpider provide enough information for a spider to infer that the document describes a Ugi reaction. A spider could also identify the four input components. The ketone or aldehyde component is benzaldehyde. The amine is furfurylamine. The isocyanide is tert-butyl isocyanide and the carboxylic acid is Boc-glycine. The algorithms that would allow such a spider to make these inferences are an opportunity for research. In this best-case scenario, where UsefulChem has linked to ChemSpider, the ChemSpider record could be pulled via ChemSpider's web services layer. The InChI string should hold enough structural information for an algorithm to verify that the molecules fall into the necessary input component categories.
This hypothetical spider could then collect reaction information from open-notebook entries, journal articles, patents and other reports. In the cases where a Ugi reaction and its component molecules can be identified they might be cross-referenced with ChemSpider to collect property information. Such a dataset could be used to create a map of the coverage of the Ugi reaction space. The map could be annotated with references to the source reports that describe the reaction.
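As a rough illustration of what such a dataset might look like, the sketch below stores each reported reaction with its source and computes what fraction of a candidate space has been covered. It is only a sketch under stated assumptions: the `UgiReport` record, the function names, the candidate lists and the second report are all hypothetical. Only the first report reflects the components listed for Exp099.

```python
from dataclasses import dataclass

CATEGORIES = ("carbonyl", "amine", "isocyanide", "acid")


@dataclass(frozen=True)
class UgiReport:
    """One documented Ugi reaction and the report it came from."""
    carbonyl: str
    amine: str
    isocyanide: str
    acid: str
    source: str

    def key(self):
        # The four inputs identify a point in the reaction space.
        return tuple(getattr(self, category) for category in CATEGORIES)


def build_index(reports):
    """Map each input combination to the list of sources that report it."""
    index = {}
    for report in reports:
        index.setdefault(report.key(), []).append(report.source)
    return index


def coverage_fraction(index, candidates):
    """Fraction of the candidate space with at least one report."""
    total = 1
    for category in CATEGORIES:
        total *= len(candidates[category])
    covered = sum(
        1
        for key in index
        if all(name in candidates[c] for name, c in zip(key, CATEGORIES))
    )
    return covered / total


# Hypothetical candidate lists: 2 * 2 * 2 * 2 = 16 possible combinations.
candidates = {
    "carbonyl": ["benzaldehyde", "furfural"],
    "amine": ["furfurylamine", "benzylamine"],
    "isocyanide": ["tert-butyl isocyanide", "cyclohexyl isocyanide"],
    "acid": ["boc-glycine", "acetic acid"],
}

reports = [
    UgiReport("benzaldehyde", "furfurylamine", "tert-butyl isocyanide",
              "boc-glycine", "http://usefulchem.wikispaces.com/Exp099"),
    # Invented placeholder record, not a real report.
    UgiReport("benzaldehyde", "benzylamine", "tert-butyl isocyanide",
              "acetic acid", "hypothetical-report-1"),
]

index = build_index(reports)
fraction = coverage_fraction(index, candidates)
print(fraction)  # 2 of 16 combinations covered
```

Keeping the sources in the index is what would let a map of the space be annotated with references back to the reports that describe each reaction.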
There are many possibilities for how such a map could be drawn. One technique might be to create four scatter plots, one for each set of input components. A dimensionality reduction technique such as multi-dimensional scaling might be performed on the property information for each component set to reduce it to x and y coordinates. Rendering all four scatter plots as parallel planes in a three-dimensional space would allow lines to be drawn connecting the points across the plots. Each line would represent an individual Ugi reaction that has been performed and documented. An example of this kind of visualization is the “3D Parallel Coordinates” view described at http://bdtnp.lbl.gov/Fly-Net/content/bid/pcx/ParallelCoordinates/ParallelCoordinates.html.
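The geometry of that layout can be sketched in a few lines of Python. This assumes the dimensionality reduction step has already produced an (x, y) pair for every molecule. The coordinates below are invented placeholders rather than real property data, and `reaction_polyline` is a hypothetical helper name.

```python
# Order in which the four scatter plots are stacked along the z axis.
PLANES = ("carbonyl", "amine", "isocyanide", "acid")


def reaction_polyline(reaction, coords, spacing=1.0):
    """Return the four (x, y, z) points that one reaction line passes through.

    Each component's scatter plot sits on its own plane: z = 0 for the
    carbonyl plot, z = spacing for the amine plot, and so on.
    """
    points = []
    for i, category in enumerate(PLANES):
        x, y = coords[category][reaction[category]]
        points.append((x, y, i * spacing))
    return points


# Placeholder 2D coordinates, standing in for the output of the
# dimensionality reduction on each component set.
coords = {
    "carbonyl": {"benzaldehyde": (0.2, 0.7)},
    "amine": {"furfurylamine": (-0.4, 0.1)},
    "isocyanide": {"tert-butyl isocyanide": (0.5, -0.3)},
    "acid": {"boc-glycine": (-0.1, -0.6)},
}

# The components listed for UsefulChem Exp099.
exp099 = {
    "carbonyl": "benzaldehyde",
    "amine": "furfurylamine",
    "isocyanide": "tert-butyl isocyanide",
    "acid": "boc-glycine",
}

line = reaction_polyline(exp099, coords)
print(line)
```

Each polyline could then be handed to any 3D plotting tool. Reactions that share a component would meet at the same point on that component's plane, which is what would make clusters and gaps in the coverage visible.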
A 3D Parallel Coordinates view, or a similar visualization, of the state of Ugi reaction experiments could be a useful tool for researchers. The visualization would allow a researcher to situate their experiment in the context of existing work. This would provide a perspective that could not be obtained by simple keyword search or other existing literature and web search techniques. It would also provide insight to a modeler trying to refine theoretical chemical combinatorics models with experimental results. Additionally, it might help to highlight outliers and other unexpected results that deserve attention.
November 23rd, 2008 at 5:11 am
In terms of accessing the virtual space of possible Ugi products from lists of starting materials, Rajarshi Guha has created a web service for this using SMILES strings:
http://rguha.ath.cx/~rguha/cicc/combilib/vl
This is how we create virtual libraries and evaluate them in silico for their potential to inhibit certain enzymes of interest (like the malarial parasite’s falcipain-2). The compounds are then ranked as lists of SMILES codes.
For example see DEXP-014: http://usefulchem.wikispaces.com/D-EXP014
Note that the vast majority of these compounds have not been characterized in the literature. As we make and characterize them, we upload that information to ChemSpider. Eventually we publish some of the information in traditional peer-reviewed journals.