Simplified Molecular Input Line Entry Specification
Assalamualaikum w.b.t., my fellow vicegerents of Earth! It is Wednesday, and as usual we are here to equip ourselves with some IT knowledge. Today we are going to learn a thing or two that will make you "smile". Why? We will see later on. Surely, most of you readers are science students taking organic chemistry as a required subject. And we all know that in organic chemistry we have to deal with complicated, sometimes long and highly branched structures and formulas of organic molecules. Have you ever wondered how these molecules and formulas are written in a computer?
Fortunately, there is a system that makes it simple: the Simplified Molecular Input Line Entry Specification, or SMILES. SMILES is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings. Most molecule editors can import SMILES strings and convert them back into two-dimensional drawings or three-dimensional models of the molecules. SMILES uses atomic symbols and a set of intuitive rules, and it works on hydrogen-suppressed molecular graphs (HSMG), in which hydrogen atoms are left implicit. In terms of a graph-based computational procedure, a SMILES string is obtained by printing the symbols of the nodes encountered in a depth-first tree traversal of a chemical graph.
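To make the depth-first idea concrete, here is a minimal Python sketch that writes a SMILES string for a small acyclic molecule given as a hydrogen-suppressed graph. The function name `to_smiles` and the atom-list/bond-list input are illustrative assumptions, not part of any SMILES tool. The sketch also assumes only single bonds and no rings, so it never needs bond symbols or ring-closure digits.

```python
def to_smiles(atoms, bonds, start=0):
    """Write SMILES for an acyclic, single-bonded, hydrogen-suppressed graph.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) atom-index pairs
    (Hypothetical helper: rings would make this recursion loop forever.)
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    def visit(atom, parent):
        out = atoms[atom]
        children = [n for n in neighbours[atom] if n != parent]
        # Every child except the last is written as a parenthesised branch;
        # the last child continues the main chain.
        for child in children[:-1]:
            out += "(" + visit(child, atom) + ")"
        if children:
            out += visit(children[-1], atom)
        return out

    return visit(start, None)
```

For example, `to_smiles(["C", "C", "C", "C", "O"], [(0, 1), (0, 2), (0, 3), (0, 4)], start=1)` gives `CC(C)(C)O` (tert-butanol). The output depends on the starting atom and the order of neighbours, which is why one molecule can have several valid SMILES strings.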
There are two types of SMILES:
- Canonical SMILES: a unique string generated for a given molecule by a canonicalization algorithm
- Isomeric SMILES
Isomeric SMILES refers to the version of the SMILES specification that includes extensions to support the specification of isotopes, chirality and configuration about double bonds. A notable feature of these rules is that they allow rigorous partial specification of chirality.
- E.g.:
- trans-1,2-dibromoethene: Br/C=C/Br
- cis-1,2-dibromoethene: Br/C=C\Br
- Chirality is indicated with the "@" symbol ("@" or "@@" for an anticlockwise or clockwise ordering of the neighbouring atoms).
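The cis/trans rule in these examples can be checked mechanically. The sketch below is a hypothetical helper, `double_bond_geometry`, that only understands the narrow pattern `X/C=C/Y` or `X/C=C\Y` used above (one substituent on each carbon); it is not a general stereochemistry parser.

```python
import re

# Narrow pattern only: one atom, a direction bond, C=C, a direction bond, one atom.
PATTERN = re.compile(r"^([A-Z][a-z]?)([/\\])C=C([/\\])([A-Z][a-z]?)$")

def double_bond_geometry(smiles):
    """Return "cis" or "trans" for strings like Br/C=C/Br (hypothetical helper)."""
    m = PATTERN.match(smiles)
    if m is None:
        raise ValueError(f"unsupported pattern: {smiles!r}")
    first, second = m.group(2), m.group(3)
    # Same direction symbol on both sides -> substituents on opposite sides (trans).
    return "trans" if first == second else "cis"
```

`double_bond_geometry("Br/C=C\\Br")` returns `"cis"`; note the doubled backslash needed inside a Python string literal. Matching symbols on both sides (`/.../` or `\...\`) mean trans.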
SMILES Bonds
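Bonds in a SMILES string are represented with the following symbols:

| Bond | Symbol |
|---|---|
| Single* | - |
| Double | = |
| Triple | # |
| Aromatic | : |

*Single (and aromatic) bonds between adjacent atoms are normally omitted, so CC means the same as C-C.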
Examples:
| Molecule | SMILES representation |
|---|---|
| Ethene | C=C |
| Chloroethene | ClC=C |
| 1,1-Dichloroethene | ClC(Cl)=C |
| cis-1,2-Dichloroethene | Cl/C=C\Cl |
| Trichloroethene | ClC(Cl)=CCl |
| Perchloroethene | ClC(Cl)=C(Cl)Cl |
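One detail these examples hide is that `Cl` is a single atom, not `C` followed by `l`. Here is a minimal tokenizer sketch, assuming a simplified token pattern rather than the complete SMILES grammar, that splits the strings above into atoms, bonds and branch symbols and counts the atoms. The names `tokenize` and `atom_counts` are hypothetical.

```python
import re
from collections import Counter

# Simplified token pattern (an assumption, not the full grammar): bracket atoms,
# two-letter organic atoms first so "Cl" is not read as "C" + "l", then
# one-letter and aromatic atoms, bond/branch symbols and ring labels.
TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|[-=#:/\\()]|%\d\d|\d")

def tokenize(smiles):
    tokens = TOKEN.findall(smiles)
    # findall silently skips characters it cannot match, so check nothing was lost.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognised characters in {smiles!r}")
    return tokens

def atom_counts(smiles):
    # Atoms are the tokens that start with a letter or with "[".
    return Counter(t for t in tokenize(smiles) if t[0].isalpha() or t[0] == "[")
```

For perchloroethene, `tokenize("ClC(Cl)=C(Cl)Cl")` yields `Cl`, `C`, `(`, `Cl`, `)`, `=`, `C`, `(`, `Cl`, `)`, `Cl`, so the string holds four chlorines, two carbons and one double bond.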
SMILES Branches
- Branches are represented by enclosure in parentheses.
- Can be nested or stacked
- C=(CC) is invalid, but
- C(=CC)C or C(CC)-C are valid SMILES
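The branch rules above can be checked with a few lines of code. This is a rough sketch that assumes we only check the rules listed here: parentheses must balance, and a branch must follow an atom, a ring digit or another branch rather than a bond symbol. The function name `branches_ok` is hypothetical.

```python
BOND_SYMBOLS = set("-=#:/\\")

def branches_ok(smiles):
    """Check parenthesis placement only; not a full SMILES validator."""
    depth = 0
    prev = ""
    for ch in smiles:
        if ch == "(":
            # A branch cannot open at the start, after a bond symbol, or right after "(".
            if prev == "" or prev in BOND_SYMBOLS or prev == "(":
                return False
            depth += 1
        elif ch == ")":
            # No unmatched ")", no empty branch "()", no dangling bond before ")".
            if depth == 0 or prev == "(" or prev in BOND_SYMBOLS:
                return False
            depth -= 1
        prev = ch
    return depth == 0
```

With this check, `C=(CC)` is rejected because "(" follows "=", while `C(=CC)C`, `C(CC)-C` and the stacked form `CC(C)(C)C` pass.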
SMILES Symbols
- Are strings of alphanumeric characters and certain punctuation symbols
- Terminates at the first space encountered when read from left to right.
- The organic subset (atoms that may be written without square brackets): B, C, N, O, P, S, F, Cl, Br, I
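The last two rules can be sketched as follows, assuming any whitespace (not only a space) ends the SMILES string and that we only want to know whether an element symbol may be written without brackets. `read_smiles` and `needs_brackets` are hypothetical names.

```python
ORGANIC_SUBSET = {"B", "C", "N", "O", "P", "S", "F", "Cl", "Br", "I"}

def read_smiles(line):
    # Keep everything up to the first whitespace; the rest (e.g. a name) is ignored.
    parts = line.split(maxsplit=1)
    return parts[0] if parts else ""

def needs_brackets(symbol):
    # Elements outside the organic subset must be written in brackets, e.g. [Fe].
    return symbol not in ORGANIC_SUBSET
```

So `read_smiles("CCO ethanol")` keeps only `CCO`, and `needs_brackets("Fe")` is true while `needs_brackets("Cl")` is false.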
Cyclic Structures
- Aliphatic or nonaromatic carbon: C
- Atom in an aromatic ring: lowercase letter
- Designate ring closure with pairs of matching digits, e.g:
- c1ccccc1 is Benzene, whereas
- C1CCCCC1 is Cyclohexane
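- The same number marks where a ring opens and where it closes; it is written immediately after the atom at each end.
- Ring-closure numbers are normally the single digits 1-9; numbers above 9 are written with a percent sign, e.g. %10.
- Each opening number must be matched by exactly one closing number; once a ring is closed, its number may be reused.
- An atom can carry two consecutive ring-closure numbers, e.g. naphthalene: c12ccccc1cccc2.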
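The ring-closure rules can be illustrated by pairing up the numbers in a string. This is a minimal sketch under a simplified token pattern (an assumption, not the full grammar); `ring_bonds` is a hypothetical helper that returns, for each ring closure, the indices of the two atoms joined by the extra bond.

```python
import re

# Simplified tokens: bracket atoms, Cl/Br, one-letter and aromatic atoms, and ring
# labels (single digit or %nn). Bonds and parentheses do not match and are skipped.
TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|%\d\d|\d")

def ring_bonds(smiles):
    """Return (opening_atom, closing_atom) index pairs for each ring closure."""
    open_rings = {}  # ring label -> index of the atom that opened it
    pairs = []
    atom = -1        # index of the most recent atom
    for tok in TOKEN.findall(smiles):
        if tok[0].isdigit() or tok[0] == "%":
            if tok in open_rings:
                # Second occurrence closes the ring; the label becomes free again.
                pairs.append((open_rings.pop(tok), atom))
            else:
                open_rings[tok] = atom
        else:
            atom += 1
    if open_rings:
        raise ValueError(f"unclosed ring label(s): {sorted(open_rings)}")
    return pairs
```

For naphthalene, `ring_bonds("c12ccccc1cccc2")` gives `[(0, 5), (0, 9)]`: atom 0 opens both rings, which close at atoms 5 and 9. Reusing a number after its ring is closed also works, e.g. `c1ccccc1c1ccccc1` gives `[(0, 5), (6, 11)]`.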
SMILES Charges
- [H+] proton
- [OH-] hydroxide anion
- [OH3+] hydronium cation
- [Fe++] iron(II) cation (can also be written [Fe+2])
- [NH4+] ammonium cation
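To show how the charge notation in these bracket atoms can be read, here is a minimal sketch. It assumes a simplified bracket atom with only an element symbol, an optional hydrogen count and an optional charge (no isotope, chirality or other extensions); `parse_bracket_atom` is a hypothetical name.

```python
import re

# Element symbol, optional H with optional count, optional charge ("+", "++", "+2", "-", ...).
BRACKET_ATOM = re.compile(r"\[([A-Z][a-z]?)(?:H(\d*))?([+-]\d+|[+-]+)?\]")

def parse_bracket_atom(text):
    """Return (element, hydrogen_count, charge) for a simple bracket atom."""
    m = BRACKET_ATOM.fullmatch(text)
    if m is None:
        raise ValueError(f"unsupported bracket atom: {text!r}")
    element, h_digits, charge_text = m.groups()
    if h_digits is None:
        h_count = 0  # no H written inside the brackets
    else:
        h_count = int(h_digits) if h_digits else 1  # "H" alone means one hydrogen
    if not charge_text:
        charge = 0
    elif charge_text[1:].isdigit():
        charge = int(charge_text)  # "+2" -> 2, "-1" -> -1
    else:
        # Repeated signs: "++" -> +2, "-" -> -1
        sign = 1 if charge_text[0] == "+" else -1
        charge = sign * len(charge_text)
    return element, h_count, charge
```

With this sketch, `[Fe++]` and `[Fe+2]` both read as iron with a charge of +2, and `[OH3+]` as oxygen carrying three hydrogens and a +1 charge.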
Another application is the SMILESCAS database:
http://esc.syrres.com/interkow/smilecas.htm
It contains over 103,000 SMILES notations. Entering a CAS Registry Number returns the corresponding SMILES, which can then be used for a structure search.
That's all for today's lesson and see you next time. Assalamualaikum w.b.t.
Bye!

