Product: ChemDraw
Do you have any additional documentation on Name to Structure?
Converting Chemical Names to Structures with NameStruct
NameStruct is Revvity Signals' comprehensive algorithm for converting English chemical names into chemical structure diagrams. It is designed to be as practical as possible, interpreting chemical names as they are actually used by chemists. In addition to recognizing most of the official rules and recommendations of the International Union of Pure and Applied Chemistry (IUPAC), the International Union of Biochemistry and Molecular Biology (IUBMB), and the Chemical Abstracts Service (CAS), NameStruct also recognizes the shorthand, slang, and neologisms of everyday usage. It is extremely tolerant of deviations from the "official" rules in regard to spaces, parentheses, and punctuation. Both regular names ("chlorobenzene") and CAS-style inverted names ("benzene, chloro-") are supported. In addition, it has an extensive algorithm for identifying common typos (typing errors, such as "mehtyl"), which increases the odds of generating structures for the names it is given.
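To make the difference between regular and CAS-style inverted names concrete, here is a minimal Python sketch that turns the simplest kind of inverted name back into its regular form. It is an illustration only and is not NameStruct's actual algorithm; the function name uninvert_cas_name is hypothetical, and the sketch assumes a single "parent, prefix-" pattern.

```python
def uninvert_cas_name(name: str) -> str:
    """Rewrite a simple CAS-style inverted name ("parent, prefix-") as "prefixparent".

    Toy illustration only: names that do not match the simple pattern
    are returned unchanged.
    """
    parent, separator, rest = name.partition(", ")
    if not separator or not rest.endswith("-"):
        return name  # not a simple inverted name
    # Drop the trailing hyphen and put the prefix in front of the parent.
    return rest[:-1] + parent


print(uninvert_cas_name("benzene, chloro-"))
print(uninvert_cas_name("benzene, 1,2-dichloro-"))
print(uninvert_cas_name("chlorobenzene"))
```

Real inverted names can carry several comma-separated segments (for example, a trailing "methyl ester"); this sketch deliberately leaves those untouched rather than guessing.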
NameStruct will try its best to generate a reasonable structure. However, for unspecific ("methyl phenol") or ambiguous ("2-chloroethylbenzene") input, it will display only the single structure that it deems most likely. In such cases, adding locants ("3-methyl phenol") or parentheses ("2-chloro(ethylbenzene)") will help ensure that the generated structure matches the one you had in mind. When names can be identified as ambiguous, the possible ambiguity may also be noted.
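The following Python sketch shows why the two example names above are unspecific or ambiguous: each one is compatible with more than one structure. The lookup table is hypothetical and hand-written for this illustration; the SMILES strings are additions that do not appear in the original text, and nothing here reflects how NameStruct itself ranks candidates.

```python
# Hypothetical, hand-written table: input name -> {fully specified name: SMILES}.
CANDIDATES = {
    "methyl phenol": {
        "2-methylphenol": "Cc1ccccc1O",
        "3-methylphenol": "Cc1cccc(O)c1",
        "4-methylphenol": "Cc1ccc(O)cc1",
    },
    "2-chloroethylbenzene": {
        "(2-chloroethyl)benzene": "ClCCc1ccccc1",
        "2-chloro(ethylbenzene)": "CCc1ccccc1Cl",
    },
}


def interpretations(name: str) -> dict:
    """Return every candidate structure recorded for a name in the toy table."""
    return CANDIDATES.get(name, {})


def is_ambiguous(name: str) -> bool:
    """A name is ambiguous here if the table lists more than one candidate."""
    return len(interpretations(name)) > 1


for name in CANDIDATES:
    print(name, "->", sorted(interpretations(name)))
```

Adding a locant or a pair of parentheses selects exactly one entry from each group, which is why those additions remove the ambiguity.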
A description of an older version of NameStruct was published as Brecher, J. "Name=Struct: A Practical Approach to the Sorry State of Real-Life Chemical Nomenclature." J. Chem. Inf. Comput. Sci. 1999, 39 (6), 943-950.
This introduction is divided into several sections of increasing technical detail. Because nomenclature can get very technical very quickly, we have tried to separate the discussion into two levels. Each topic is described first in general terms with limited details; that description should be accessible to most chemists. The general description is then followed by a link to more-detailed information.
General capabilities of NameStruct
NameStruct is designed to be as complete, accurate, and fast as possible, so that it can be used with confidence to interpret one name or a million, whether or not those names follow any official published nomenclature recommendations. NameStruct recognizes over 90% of organic nomenclature recommendations. While the figure is somewhat lower for inorganic nomenclature, all general procedures and all recommendations that occur with any frequency in real-life usage are recognized. Testing over many different data sources has shown that with a typical database that is a combination of well-formed names, trade names, trivial names, and incorrect or misspelled names, NameStruct will generate structures for about 70-90% of the names actually used. When running in batch mode, it can easily process over 30,000 names/minute, with an accuracy of greater than 99%.
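As a rough worked example of what these figures imply for a large batch, the Python sketch below estimates the run time and the number of structures generated for one million names. It uses only the numbers quoted above (30,000 names/minute and a 70-90% success rate); the function name batch_estimate is hypothetical, and because the quoted rate is a lower bound, the time is an upper bound.

```python
NAMES = 1_000_000
RATE_PER_MINUTE = 30_000      # quoted as "over 30,000 names/minute"
HIT_RATE_PERCENT = (70, 90)   # quoted as "about 70-90%" for a typical database


def batch_estimate(names, rate_per_minute, hit_rate_percent):
    """Return (minutes, fewest structures, most structures) for a batch run."""
    minutes = names / rate_per_minute
    low, high = hit_rate_percent
    return minutes, names * low // 100, names * high // 100


minutes, fewest, most = batch_estimate(NAMES, RATE_PER_MINUTE, HIT_RATE_PERCENT)
print(f"at most about {minutes:.1f} minutes")
print(f"roughly {fewest:,} to {most:,} structures")
```

Under these assumptions, a million names take a little over half an hour and yield somewhere between 700,000 and 900,000 structures.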
General limitations of NameStruct
For the most part, the only limitations of NameStruct are ones that are mandated by common sense.
As a component of some versions of ChemDraw, the interactive version of NameStruct runs under the same configurations as ChemDraw, which means that it is available for most modern Windows and Macintosh computers. The NameStruct algorithm itself is implemented in standards-compliant C++, and should be readily convertible to other systems; we simply haven't bothered to do so. Please contact your Revvity sales representative if you think you need NameStruct provided for some other operating system or in some other configuration.
There are very few limitations to NameStruct in a chemical sense -- there are no limits on the length of the name interpreted or the number of atoms in the resulting structure. All elements from hydrogen to lawrencium are supported, even though some of them (such as helium) will rarely appear in chemical names simply because they form so few compounds.
NameStruct does have some limitations in the types of structures it can generate. It is extremely difficult to generate good-looking structural diagrams for several classes of substances, including biological macromolecules (proteins, etc.), highly bridged ring systems (buckyballs), and polymers. Rather than producing incomprehensible diagrams for these cases, NameStruct refuses to generate a structure.
More significant are the limitations inherent to chemical nomenclature itself. Many of the names in common use to describe various substances have no systematic component at all. These include many pharmaceuticals ("Viagra"), pesticides, dyes ("Brilliant Green"), and others. Although NameStruct can interpret many of these so-called trivial names, that is not its primary focus. Revvity Signals offers several other products that are more appropriate for the interpretation of collections of asystematic names.
General issues related to nomenclature and NameStruct
In general, NameStruct is designed to be as smart as a real chemist -- if a human chemist can understand what structure is intended by a given name, then NameStruct should manage to do so as well. Chemical names come in many styles. Some names truly do conform to published nomenclature recommendations, most commonly from IUPAC, IUBMB, or CAS. Clearly, NameStruct needs to recognize these names, but that's only the start of the problem.
First, each of those organizations has changed its recommendations over time. There is no way to know which version of the recommendations was used to generate any given name, and so NameStruct must recognize names produced by all versions.
Second, many chemical names use trivial forms that have long been forbidden by all of those nomenclature bodies. Nonetheless, these trivial names are used frequently enough that most chemists will recognize their meaning, and so NameStruct should as well.
Finally, even though those organizations have published nomenclature recommendations, the recommendations are extremely complex and difficult to understand. Even the best-intentioned chemist will often produce names that -- technically or egregiously -- violate the published norms. As long as the meaning of the name remains clear, NameStruct should be able to handle it.
To achieve this goal, NameStruct attempts to be as flexible as possible. Capitalization, font type, and font style are completely ignored. Most punctuation is ignored as well, regardless of whether it is used correctly as per the published recommendations or not. Spelling, similarly, is important only for clarity: NameStruct will interpret many common misspellings correctly, but correctly spelled names are much more likely to be interpreted as intended. More recently, extensive typo recognition has been added, increasing the likelihood that names will be interpreted correctly even if they are not technically correct.
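The kind of tolerance described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not NameStruct's implementation: the vocabulary KNOWN_FRAGMENTS and both function names are hypothetical, normalization here only folds case and removes whitespace, and the typo check handles just one error type (two adjacent letters swapped, as in "mehtyl").

```python
# Hypothetical vocabulary of name fragments, for illustration only.
KNOWN_FRAGMENTS = {"methyl", "ethyl", "chloro", "benzene", "phenol"}


def normalize(name: str) -> str:
    """Fold case and remove all whitespace, so these differences are ignored."""
    return "".join(name.lower().split())


def fix_transposition(word: str, vocabulary: set):
    """Return a known fragment reachable by swapping two adjacent letters.

    Returns the word itself if it is already known, or None if no single
    adjacent swap produces a known fragment.
    """
    if word in vocabulary:
        return word
    for i in range(len(word) - 1):
        swapped = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        if swapped in vocabulary:
            return swapped
    return None


print(normalize("3-Methyl Phenol") == normalize("3-methylphenol"))
print(fix_transposition("mehtyl", KNOWN_FRAGMENTS))
```

A production-quality checker would also need to handle missing, extra, and substituted letters, and would have to weigh competing corrections against each other; the sketch only shows the basic idea of mapping a misspelled fragment back to a known one.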
Nomenclature classes handled by NameStruct
The shortest answer to the question, "What types of nomenclature can be recognized by NameStruct?" is "Just about everything!" However, we recognize that a longer answer might be slightly more useful. Accordingly, here is a more extensive discussion of the types of nomenclature supported.
NameStruct can recognize all types of parent structures including chains and rings, and, of course, various combinations of the two. A "parent structure" is the core unit that most chemists would recognize as the basic framework of a chemical structure, something like "ethane" or "benzene". Natural products are special kinds of parent structures that are commonly found in biological organisms. Stereochemistry is crucially important for natural products, but may be relevant in any compound containing an asymmetric double bond or tetrahedral center.
With occasional exceptions, most parent structures can be used as ligands -- that is, as fragments attached to some other parent structure ("methane" is a parent structure; "methyl" is a ligand). Most parent structures can also be converted to a variety of functional class derivatives. The most common functional class derivatives feature nitrogen and oxygen, and include amines and alcohols as well as many different types of acids, both organic (acetic acid) and inorganic (perchloric acid).
In addition to forming neutral derivatives, parent structures may become charged to form ions; those ions may, in turn, combine to form salts.
...and lots and lots of other nomenclature is also supported!
Many more examples of nomenclature supported, with the resulting chemical structures