dP - Blog  


Visualizing the State of Ugi Reaction Experimentation

November 22nd, 2008

Reading through a sample of the UsefulChem Experiments / Reactions (http://usefulchem.wikispaces.com/All+Reactions) reveals the significance of the Ugi reaction.  “The Ugi reaction is a multi-component reaction in organic chemistry involving a ketone or aldehyde, an amine, an isocyanide and a carboxylic acid to form a bis-amide” (http://en.wikipedia.org/wiki/Ugi_reaction).  The “Applications” section of the Wikipedia page is also interesting, as it describes the significance of the reaction in terms of variations of the input components.  These variations create a complex combinatorics problem.  The space of possible input combinations may be large because of the range of molecules that can fall into the four input component categories.  The input molecules should be structurally similar, and they should therefore have similar functional characteristics, making it possible to use predictive models to describe the products of the reaction.  I am not a chemist, but that is my naive understanding of the Ugi reaction.
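
To give a sense of scale, the number of distinct four-component combinations is the product of the number of candidates in each category.  A quick sketch in Python, where the count of 100 candidates per category is an arbitrary assumption chosen only for illustration:

```python
from math import prod

# Assumed, purely illustrative counts of candidate molecules per category.
candidate_counts = {
    "aldehyde_or_ketone": 100,
    "amine": 100,
    "isocyanide": 100,
    "carboxylic_acid": 100,
}

# One Ugi reaction picks exactly one molecule from each category,
# so the size of the space is the product of the four counts.
space_size = prod(candidate_counts.values())
print(f"{space_size:,} possible input combinations")
```

Even modest candidate lists multiply out to a space that is far too large to cover exhaustively at the bench.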

A problem for the researcher, then, is to explore the combinatorial space with experimentation and to relate the results to a theoretical understanding.  Given the size of the combinatorial space, a challenge is to perform enough experiments to get good coverage of the space.  One view of the space could be defined in theoretical terms by identifying the range of all possible input molecules.  But what is the breadth of coverage that has already been achieved through experimentation?

UsefulChem Experiment 099 (http://usefulchem.wikispaces.com/Exp099) provides a good example of a Ugi reaction.  Here the links to ChemSpider provide enough information for a spider to infer that the document describes a Ugi reaction.  A spider could also identify the four input components.  The ketone or aldehyde component is benzaldehyde.  The amine is furfurylamine.  The isocyanide is tert-butyl isocyanide and the carboxylic acid is boc-glycine.  The algorithms that would allow such a spider to make these inferences are an opportunity for research.  In this best-case scenario where UsefulChem has linked to ChemSpider the ChemSpider record could be pulled via their web services layer.  The InChI string should hold enough structural information for an algorithm to verify that the molecules fall into the necessary input component categorizations.

This hypothetical spider could then collect reaction information from open-notebook entries, journal articles, patents and other reports.  In the cases where a Ugi reaction and its component molecules can be identified they might be cross-referenced with ChemSpider to collect property information.  Such a dataset could be used to create a map of the coverage of the Ugi reaction space.  The map could be annotated with references to the source reports that describe the reaction.
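
As a rough illustration of what such a dataset might look like, the sketch below groups source references by reaction and computes what fraction of a candidate space has at least one documented experiment.  Only the Exp099 components and URL come from the text above.  The other molecule names, the report labels and the function names are hypothetical placeholders, and this is one possible layout rather than a worked-out design.

```python
from collections import defaultdict
from itertools import product

# Candidate inputs per category. Only the first entry in each list comes from
# UsefulChem Exp099; the remaining names are hypothetical placeholders.
candidates = {
    "aldehyde_or_ketone": ["benzaldehyde", "aldehyde_2", "aldehyde_3"],
    "amine": ["furfurylamine", "amine_2"],
    "isocyanide": ["tert-butyl isocyanide", "isocyanide_2"],
    "carboxylic_acid": ["boc-glycine", "acid_2", "acid_3"],
}

# (reaction, source) pairs as a spider might emit them. Each reaction is an
# (aldehyde/ketone, amine, isocyanide, acid) tuple. The last two are made up.
reports = [
    (("benzaldehyde", "furfurylamine", "tert-butyl isocyanide", "boc-glycine"),
     "http://usefulchem.wikispaces.com/Exp099"),
    (("aldehyde_2", "furfurylamine", "tert-butyl isocyanide", "acid_2"),
     "hypothetical-report-A"),
    (("aldehyde_2", "furfurylamine", "tert-butyl isocyanide", "acid_2"),
     "hypothetical-report-B"),
]

def build_coverage_map(reports):
    """Group sources by reaction so each documented point keeps its references."""
    sources = defaultdict(set)
    for reaction, source in reports:
        sources[reaction].add(source)
    return {reaction: sorted(refs) for reaction, refs in sources.items()}

def coverage_fraction(coverage_map, candidates):
    """Fraction of the candidate space with at least one documented reaction."""
    space = set(product(*candidates.values()))
    documented = space & set(coverage_map)
    return len(documented) / len(space)

coverage_map = build_coverage_map(reports)
fraction = coverage_fraction(coverage_map, candidates)
print(f"{len(coverage_map)} documented reactions, "
      f"{fraction:.1%} of the candidate space")
```

Keeping the sources attached to each reaction is what would let a map link every plotted point back to the reports that describe it.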

There are many possibilities for how such a map could be drawn.  One technique might be to create four scatter plots, one for each set of the input components.  A dimensionality reduction technique such as multi-dimensional scaling might be performed on the property information for each component set to get it down to x and y coordinates.  Rendering all four scatter plots in a three-dimensional space would allow for lines to be drawn connecting the points across the plots.  These lines would represent individual Ugi reactions that have been performed and documented.  An example of this kind of visualization is the “3D Parallel Coordinates” view as described at (http://bdtnp.lbl.gov/Fly-Net/content/bid/pcx/ParallelCoordinates/ParallelCoordinates.html).
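
The geometry described above can be sketched in a few lines of Python.  The sketch assumes the dimensionality reduction has already been done, so each molecule has an (x, y) position within its own scatter plot.  The coordinate values, the placeholder molecule names and the function name are invented for illustration.  Each of the four plots is placed on its own plane along the z axis, and a reaction becomes a polyline with one vertex per plane.

```python
# Category order fixes which plane (z value) each scatter plot sits on.
CATEGORIES = ("aldehyde_or_ketone", "amine", "isocyanide", "carboxylic_acid")

# 2D positions per molecule, as they might come out of multi-dimensional
# scaling. All coordinate values here are hypothetical.
coords = {
    "aldehyde_or_ketone": {"benzaldehyde": (0.2, 0.7), "aldehyde_2": (0.9, 0.1)},
    "amine": {"furfurylamine": (0.5, 0.5)},
    "isocyanide": {"tert-butyl isocyanide": (0.3, 0.8)},
    "carboxylic_acid": {"boc-glycine": (0.6, 0.4), "acid_2": (0.1, 0.2)},
}

def reaction_polyline(reaction, coords, spacing=1.0):
    """Return four (x, y, z) vertices, one on each component plane."""
    vertices = []
    for plane, (category, molecule) in enumerate(zip(CATEGORIES, reaction)):
        x, y = coords[category][molecule]
        vertices.append((x, y, plane * spacing))
    return vertices

# The components of UsefulChem Exp099, in category order.
exp099 = ("benzaldehyde", "furfurylamine", "tert-butyl isocyanide", "boc-glycine")
for vertex in reaction_polyline(exp099, coords):
    print(vertex)
```

Any 3D plotting tool could then draw the scatter points on each plane and one line per documented reaction through the returned vertices.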

A 3D Parallel Coordinates View, or similar such visualization, of the state of Ugi reaction experiments could be a useful tool for researchers.  The visualization would allow a researcher to situate their experiment in the context of existing work.  This would provide a unique perspective that could not be obtained by simple keyword search or other existing literature and web search techniques.  It would also provide insight to a modeler who was trying to refine theoretical chemical combinatorics models with experimental results.  Additionally it might help to highlight outliers and other unexpected results that should be attended to.

Research Projects

November 18th, 2008

Below is a list of some of the research ideas I am currently working on.  These ideas all need further refinement to get more specific research questions and to build out the theoretical orientations, methodologies and evaluation strategies that will be used.  If these are subjects that interest you please let me know.

An Information Theoretic Framework for Visual Analytic Reasoning

This project is concerned with analyzing the structural properties of associative networks that are related to knowledge structures.  The knowledge structures may be concept maps representing the conceptualizations of an individual about a domain or they may be aggregate networks that describe conceptualizations across multiple actors within a domain.  The associative networks are not limited to the concept maps, but are used to aggregate heterogeneous data into a single integrated representation.  This representation is then further refined by relating the higher level concepts with their supporting data.  There are two primary research questions being pursued under this project:

  1. How can temporal patterns be distinguished from other structural patterns?
  2. Can information metrics be expanded to integrate latent semantics of the information with the structural and temporal properties?

Answers to these questions will likely take the form of measures and algorithms that relate patterns in such structures with sense-making and analytical reasoning processes.

Post-hoc Analysis of the VAST 2008 Challenge

Participation in the VAST 2008 Challenge was a rich experience that provided a lot of data from a longitudinal, purposeful application of Visual Analytics tools.  With the event over, the focus can shift from answering the questions of the Challenge itself to a more reflective posture of analyzing the process that was used.  It is hoped that a post-hoc analysis will help to identify opportunities for future research.  It is also expected that opportunities for generalizing the practice to other domains will be found.  Based on [ Liu, Z.; Nersessian, N. J. & Stasko, J. T., Distributed Cognition as a Theoretical Framework for Information Visualization, IEEE Transactions on Visualization and Computer Graphics, 2008, 14, 1173-1180 ] it appears that the distributed cognition framework will provide a useful perspective for analysis of the results.  The primary research question here is:

  1. What can a post-hoc analysis of the VAST 2008 Challenge teach us about the role of Visual Analytics in Distributed Cognition?

Knowledge Structure in Experimental Chemistry

Open notebook science (http://en.wikipedia.org/wiki/Open_notebook_science) offers a new and exciting source of data that has the potential to tell us a lot about how science is done.  Bibliometric research has been very productive for the study of knowledge domains.  Bibliometricians use formal research publications and their citations as the unit of analysis.  The act of citing a work is a behavioral indicator that hints at the intentions of the author.  With open notebook science the digital laboratory notebook record is now available as a unit of analysis.  Exploratory research in this area can help us answer the following questions:

  1. Do open notebook entries include new behavioral indicators that can be useful for analyzing knowledge structures?
  2. How can information science and systems take advantage of open notebook entries to support the hypothesis formulation and discovery processes in Chemistry?

All three of these projects tie together.  The data from a “Post-hoc Analysis of the VAST 2008 Challenge” and from “Knowledge Structure in Experimental Chemistry” might serve to inform the development of a robust “Information Theoretic Framework for Visual Analytic Reasoning.”  The Information Theoretic Framework might be combined with the Distributed Cognition Framework so that we have a way of studying how knowledge structures develop and change over time.

The Social Brain

November 13th, 2008

I had the opportunity to attend a great talk by Professor Clive Gamble at the University of Pennsylvania Museum of Archeology and Anthropology:

“Breaking the Mind Barrier: The Archeology and Evolution of Our Social Brain” with Professor Clive Gamble, Co-Director British Academy Centenary Project, Thursday November 13, 2008 6:00 PM, The University of Pennsylvania Museum of Archeology and Anthropology, http://www.museum.upenn.edu/gamble

Ideas I collected from the presentation
Restated (maybe misstated) by me, not quotations by the presenter.

Socialization mediated by tools / technology.  Emotion as a basis for social cohesiveness.  Getting group size to 150 required language.  Developed along with increase in brain size.  Childhood development: 3-year-olds think that other minds are thinking the same as theirs.  Five-year-olds recognize the existence of other minds that think differently.  Understanding the existence of other minds that think differently is necessary for the development of empathy, guilt and next order emotions.  These emotions are what create group cohesiveness.  Therefore, in order to have a group size of 150, the mind must be developed to recognize the existence of other minds that think differently.  This is a higher order development above self-awareness.

  1. Self-Awareness
  2. Three-year-old children who recognize the existence of another mind.  (This is why 3-year-old children can’t lie.)
  3. Five-year-old children who recognize the existence of another mind that thinks differently than their own.

My Thoughts

One consequence of this evolutionary development is the importance of the affective aspects of social computing.  Technology mediated socialization is based on the emotions that hold people together.  MySpace and Facebook have obvious affective components for maintaining cohesiveness of a group.  This is particularly evident in adolescents’ use of these systems for socialization.  Complaints from friends that I should stop posting work stuff and only post fun stuff to Facebook are consistent with this.  In posting work-related content I am inconsistent with the more affective kinds of bonds that form around personal content.  How can this inform the design of collaborative support for open-notebook science?

Chemistry sites are potentially less emotive than Facebook-style collaboration for users who treat them as pure reference systems, yet chemistry sites are potentially more emotive than Facebook for users who are moving the field forward.  To make scientific collaboration successful it may be necessary to engage users on a personal level in debates and opinions.

Visualization and knowledge maps can support this kind of engagement with the content by helping to make the debates more explicit.  This will allow users to situate themselves within the group’s emotional structure rather than simply the hierarchical or relation-clustering structures.  Ultimately this can lead to users having stronger feelings about their contributions in the group.  How does the limit of 150 relationships fit into this?  It was interesting to see this number 150 come up three days in a row: Tuesday while reading the November issue of Communications of the ACM, Wednesday during the meeting with Professor Bradley, Thursday on the slides during this presentation.  It is probably worth digging deeper to see if this number is being used correctly or if it has taken on a life of its own in scientific discourse.



Copyright © 2008 Don Pellegrino All Rights Reserved.