Professor Wolfgang Robien, University of Vienna, Austria, and Wiley-VCH are currently collaborating to build a 13C NMR spectroscopic data verification tool that is based on the CSEARCH robot referee system, developed by Professor Robien himself. Here he talks to Dr. Vera Koester for ChemViews Magazine about his motivations for developing the system and how it came into being.
What is the CSEARCH robot referee system?
The CSEARCH robot referee is a piece of software that allows a very detailed consistency check between a given chemical structure and its 13C NMR spectrum. Similar to the classification scheme used during peer-reviewing, the result of its analysis is either “Accept as it is”, “Minor revision necessary”, “Major revision necessary”, or “Reject”.
“Accept as it is” is a very rare case in which all signal assignments seem to be correct and, furthermore, there is no inconsistency in the underlying data contained in the knowledge base that are used for this specific request. A “Minor revision” is obtained when either a very minor assignment error within the given data is detected or an inconsistency is found in the underlying database taken from the literature. “Major revision” and “Reject” occur when a massive error in the particular request is detected. The decision engine is deliberately adjusted in a very precise way to provide maximum support to the user and to avoid as many errors as possible.
What is the knowledge base?
The knowledge base consists of some 340,000 well-assigned 13C NMR spectra taken from the public domain chemical literature.
The strategy used for this automatic structure verification starts quite simply with symmetry detection and the detection of stereocenters based on the given structure proposal, followed by a comparison of this information with the given spectral data. If exchangeable line assignments are given, the underlying logic is checked in greater detail, followed by a comparison of the experimental carbon chemical shift values with the predicted ones. The differences contribute heavily to the final classification. Afterwards, the two-dimensional topology of the given structure proposal is compared against the structures in the knowledge base to detect already known compounds. Sometimes the interpretation of the spectral data leads to an incorrect structure proposal, even though the correct structure might already be known.
To avoid publishing a “new” but incorrect structure, the spectrum is also used as a query that leads to possible alternative structure proposals for the given peak list. A comprehensive description of the technology, together with many examples of erroneous data, assignments, and structure proposals, is given in [1]. In principle, everything that can be checked automatically to verify the consistency between the given structure proposal and the experimental peak list is analyzed in extensive detail.
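As a rough illustration of the workflow described above, the following minimal Python sketch checks that symmetry-equivalent carbons carry the same assigned shift and then maps the largest deviation between experimental and predicted shifts onto the four referee verdicts. This is not the CSEARCH implementation: the function name `classify`, the input layout, and the thresholds of 3, 8, and 20 ppm are assumptions chosen purely for illustration, and the shift values in the usage example are made up.

```python
def classify(assigned, predicted, symmetry_classes,
             minor=3.0, major=8.0, reject=20.0):
    """Toy consistency check between a structure proposal and a 13C peak list.

    assigned / predicted: dicts mapping carbon index -> chemical shift (ppm).
    symmetry_classes: lists of carbon indices that are equivalent by symmetry.
    The thresholds (in ppm) are hypothetical, not those used by CSEARCH.
    """
    if assigned.keys() != predicted.keys():
        raise ValueError("assigned and predicted must cover the same carbons")

    # Step 1: symmetry-equivalent carbons must share one chemical shift.
    for group in symmetry_classes:
        if len({assigned[atom] for atom in group}) > 1:
            return "Major revision necessary"

    # Step 2: compare experimental shifts with predicted ones.
    worst = max(abs(assigned[atom] - predicted[atom]) for atom in assigned)
    if worst >= reject:
        return "Reject"
    if worst >= major:
        return "Major revision necessary"
    if worst >= minor:
        return "Minor revision necessary"
    return "Accept as it is"


# Illustrative two-carbon example with made-up shift values.
verdict = classify(
    assigned={1: 18.1, 2: 57.8},
    predicted={1: 18.5, 2: 58.0},
    symmetry_classes=[[1], [2]],
)
print(verdict)
```

A real system would also have to detect the symmetry classes and stereocenters from the structure itself, search the knowledge base for known compounds, and use the peak list as a query for alternative structures; the sketch leaves all of that out.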
Why did you develop CSEARCH?
I am frequently asked to review manuscripts with massive summaries of NMR-spectroscopic data. NMR-spectroscopy is a very important – maybe the most important – method for structure elucidation and structure verification. Despite this, NMR-spectroscopic data are usually summarized in the experimental section of a journal article or provided in its supporting information. Often they lack signal assignments.
A closer look into the chemical literature reveals that many chemical structures are wrong. This is proven by the number of errata that are published afterwards. On closer analysis, many of these errors turn out to be quite trivial. They can be easily avoided by systematic application of computer-supported technologies. This is exactly what the CSEARCH robot referee does.
And you are currently collaborating with Wiley-VCH?
I am very proud of the successful cooperation with Wiley, one of the big players in publishing chemistry-related journals. Their publication process is exactly the place where this methodology should be implemented.
The intention of the CSEARCH robot referee is to avoid the most common errors found in the literature. Therefore, this technology should be applied when you see your experimental data for the very first time. Furthermore, every published dataset should be labeled with an official seal that indicates its quality. The best place for this is during the manuscript submission process.
Who can or should use it?
Everybody who makes use of 13C NMR spectroscopy should use an automatic workflow like this for automatic structure verification whenever they are seeing their spectral data for the first time and again during the preparation of their manuscript.
Access to this technology is already supported by important software vendors in this field. Bruker’s TOPSPIN and CMC-se software suite as well as Mestrelab’s Mnova program allow researchers direct access to the CSEARCH robot referee with a few mouse clicks. This is exactly the right process to support generating high-quality datasets that can be published immediately and thus bring added value to the community as high-quality reference material.
What kind of training do you need?
A useful piece of software must be intuitive for the user with respect to understanding the necessary input data and interpreting the obtained output. In other words, no special training is needed for CSEARCH.
What do you think the organic lab of the future will look like?
I am quite sure that automation will be much more important than it is now, especially with respect to the teaching of laboratory skills. One step in the right direction is the increasing importance of desktop NMR spectroscopic equipment in student labs.
What skills will the chemist of the future need?
A better understanding of computerized workflows supported by simpler programming schemes and the ability to extract reliable information from the vast amount of data we already have and that we will produce over the next few decades.
Can you say something about your research?
My research deals with algorithm development for spectrum prediction, spectrum interpretation, structure verification, and structure elucidation.
What got you interested in chemistry?
I started studying mathematics and chemistry at the University of Vienna, Austria. For me, chemistry was more fascinating because of the combination of intellectual challenges and manual skills. But, as you can see from my ongoing research, I couldn’t keep away from mathematics.
So what drives you?
I want to ‘translate’ my curiosity into algorithms that might be useful to the community.
What other interests do you have?
I sit in front of my computers for many, many hours at a time. Therefore, sports activities are important to me. I played handball for 12 years, and later I focused on climbing and ice climbing for nearly 15 years. Now I prefer hiking and skiing.
Thank you very much for the interview.
Wolfgang Robien, born in 1956 in Vienna, Austria, studied mathematics and chemistry at the University of Vienna. He obtained his Ph.D. there in 1981. After postdoctoral research at Stanford University, CA, USA, he returned to Vienna and received his habilitation from the University of Vienna in 1988. Robien has been an associate professor at the University of Vienna since 1997.
- CSEARCH Robot Referee: Automatic Interpretation of 13C NMR Spectra
- [1] Wolfgang Robien, A Critical Evaluation of the Quality of Published 13C NMR Data in Natural Product Chemistry, in Progress in the Chemistry of Organic Natural Products (Eds: A. Douglas Kinghorn, Heinz Falk, Simon Gibbons, Jun’ichi Kobayashi), Springer, Germany, 2017, 137–215. ISBN: 9783319497129
The CSEARCH robot referee is also accessible via:
- Bruker’s portfolio of NMR spectroscopic software
- Mestrelab’s Mnova software