welcome peeps !

Sunday, 3 November 2013

SMILES

Simplified Molecular Input Line Entry Specification

Assalamualaikum w.b.t. my fellow vicegerents of Earth! It is Wednesday and as usual we are here to equip ourselves with some IT knowledge. Today, we are going to learn a thing or two that will make you "smile". Why? We will see later on.
Surely, most of you readers are science-based students taking organic chemistry as a required subject. We all know that in organic chemistry we have to deal with complicated, sometimes long and highly branched structures and formulas of organic molecules. Have you ever wondered how these molecules and formulas can be written on a computer?
Fortunately, there is a system that makes it simple: the Simplified Molecular Input Line Entry Specification, or SMILES. SMILES is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings. SMILES strings can be imported by most molecule editors for conversion back into two-dimensional drawings or three-dimensional models of the molecules. SMILES uses atomic symbols and a set of intuitive rules, and it works on hydrogen-suppressed molecular graphs (HSMG), in which hydrogen atoms are usually left out. In terms of a graph-based computational procedure, SMILES is a string obtained by printing the symbol nodes encountered in a depth-first tree traversal of a chemical graph.
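
To make the depth-first traversal idea concrete, here is a minimal Python sketch. It is only an illustration, not how any real toolkit does it: the function name write_smiles and the (atoms, bonds) input layout are made up for this post, and it only handles molecules without rings.

```python
def write_smiles(atoms, bonds, start=0):
    """Print atom symbols in depth-first order for an acyclic,
    hydrogen-suppressed molecular graph (hypothetical helper)."""
    # Adjacency list: atom index -> list of (neighbour index, bond symbol).
    neighbours = {index: [] for index in range(len(atoms))}
    for a, b, symbol in bonds:
        neighbours[a].append((b, symbol))
        neighbours[b].append((a, symbol))

    visited = set()

    def visit(index):
        visited.add(index)
        text = atoms[index]
        children = [(n, s) for n, s in neighbours[index] if n not in visited]
        for position, (child, symbol) in enumerate(children):
            piece = symbol + visit(child)
            # Every child except the last one is written as a branch.
            if position < len(children) - 1:
                piece = "(" + piece + ")"
            text += piece
        return text

    return visit(start)


# 1,1-dichloroethene: Cl-C(-Cl)=C, single bonds written as "" (omitted).
atoms = ["Cl", "C", "Cl", "C"]
bonds = [(0, 1, ""), (1, 2, ""), (1, 3, "=")]
print(write_smiles(atoms, bonds))
```

Starting from a different atom gives a different but equally valid string for the same molecule.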

There are two types of SMILES:

  1. Canonical SMILES
  2. Isomeric SMILES
Canonical SMILES refers to the version of the SMILES specification that includes rules for ensuring that each distinct chemical molecule has a single unique SMILES representation. A common application of Canonical SMILES is for indexing and ensuring uniqueness of molecules in a database.
Isomeric SMILES refers to the version of the SMILES specification that includes extensions to support the specification of isotopes, chirality and configuration about double bonds. A notable feature of these rules is that they allow rigorous partial specification of chirality.
  • Configuration about a double bond is written with the "/" and "\" bond symbols, e.g.:
    • trans-1,2-dibromoethene: Br/C=C/Br
    • cis-1,2-dibromoethene: Br/C=C\Br
  • Chirality is indicated by the "@" symbol.
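
The two dibromoethene strings differ only in the direction of the second slash. A small Python sketch of that rule follows. The function name double_bond_geometry is hypothetical, and it only looks at the simple pattern where one substituent is written just before C=C and the other just after it. Anything more complex needs a real SMILES parser.

```python
import re

# One slash before "C=C" and one slash after it.
PATTERN = re.compile(r"([/\\])C=C([/\\])")


def double_bond_geometry(smiles):
    """Return 'trans', 'cis', or None for the simple A/C=C/B pattern."""
    match = PATTERN.search(smiles)
    if match is None:
        return None  # no directional bonds around a C=C found
    first, second = match.groups()
    # Same direction on both sides means opposite sides of the double bond.
    return "trans" if first == second else "cis"


print(double_bond_geometry(r"Br/C=C/Br"))
print(double_bond_geometry(r"Br/C=C\Br"))
```

Raw strings (r"...") are used so that Python does not treat the backslash as an escape character.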

SMILES Bonds

Bonds are represented with the following symbols:
Bond        Symbol
Single*     -
Double      =
Triple      #
Aromatic    :
*can be omitted

Examples:
Molecule                  SMILES representation
Ethene                    C=C
Chloroethene              ClC=C
1,1-Dichloroethene        ClC(Cl)=C
cis-1,2-Dichloroethene    Cl/C=C\Cl
Trichloroethene           ClC(Cl)=CCl
Perchloroethene           ClC(Cl)=C(Cl)Cl
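
As a quick check on the bond symbols, the short Python sketch below counts the bond symbols that are actually written in a string. The names BOND_NAMES and count_explicit_bonds are made up for this post. Omitted single bonds are not counted, and anything inside square brackets is skipped so that a charge such as "-" is not mistaken for a bond.

```python
BOND_NAMES = {"-": "single", "=": "double", "#": "triple", ":": "aromatic"}


def count_explicit_bonds(smiles):
    """Count the bond symbols written out in a SMILES string."""
    counts = {}
    in_bracket = False
    for char in smiles:
        if char == "[":
            in_bracket = True
        elif char == "]":
            in_bracket = False
        elif not in_bracket and char in BOND_NAMES:
            name = BOND_NAMES[char]
            counts[name] = counts.get(name, 0) + 1
    return counts


print(count_explicit_bonds("ClC(Cl)=C(Cl)Cl"))
```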

SMILES Branches

  • Branches are represented by enclosure in parentheses.
  • They can be nested or stacked.
A branch cannot immediately follow a double or triple bond symbol; the bond symbol goes inside the parentheses instead. E.g.:
  • C=(CC) is invalid, but
  • C(=CC)C or C(CC)-C are valid SMILES
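
These branch rules are easy to check mechanically. Here is a rough Python sketch. The function name check_branches is hypothetical, and it only checks the parentheses rules from this section, nothing else about the string.

```python
def check_branches(smiles):
    """Return a list of branch problems; an empty list means none were found."""
    problems = []
    depth = 0
    previous = ""
    for position, char in enumerate(smiles):
        if char == "(":
            # A branch may not open right after a double or triple bond symbol.
            if previous in ("=", "#"):
                problems.append(
                    f"branch opens right after '{previous}' at position {position}"
                )
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                problems.append(f"unmatched ')' at position {position}")
                depth = 0
        previous = char
    if depth > 0:
        problems.append("unclosed '('")
    return problems


for example in ["C=(CC)", "C(=CC)C", "C(CC)-C"]:
    print(example, check_branches(example))
```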

SMILES Symbols

  • SMILES notations are strings of alphanumeric characters and certain punctuation symbols.
  • The string terminates at the first space encountered when read left to right.
  • The organic subset (atoms that can be written without square brackets): B, C, N, O, P, S, F, Cl, Br, I
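
Here is a small Python sketch of how a program might split a SMILES string into symbols. It is an illustration only: the names TOKEN and tokenize are made up, and it knows just the organic subset, lowercase aromatic atoms, bracket atoms as one unit, bond symbols, parentheses and ring digits.

```python
import re

# Two-letter symbols come first so "Cl" is not read as "C" followed by "l".
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]|\[[^\]]*\]|[-=#:/\\()]|\d")


def tokenize(smiles):
    """Split a SMILES string into atom, bond, branch and ring-digit tokens."""
    smiles = smiles.split(" ")[0]  # the notation ends at the first space
    tokens = []
    position = 0
    while position < len(smiles):
        match = TOKEN.match(smiles, position)
        if match is None:
            raise ValueError(f"unexpected character at position {position}")
        tokens.append(match.group())
        position = match.end()
    return tokens


print(tokenize("ClC(Cl)=C"))
print(tokenize("c1ccccc1 benzene"))
```

In the second call everything after the space is ignored, which is why a name can follow the notation on the same line.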

Cyclic Structures

  • Aliphatic or nonaromatic carbon: uppercase C
  • Atom in an aromatic ring: lowercase letter (e.g. c)
  • Designate ring closure with pairs of matching digits, e.g.:
    • c1ccccc1 is benzene, whereas
    • C1CCCCC1 is cyclohexane
  • The same number marks the start and the end of a ring and is entered immediately after the start and end atoms.
  • Ring numbers are single digits, by convention 1-9.
  • Each ring number appears exactly twice per ring: once where the ring opens and once where it closes.
  • An atom can carry two consecutive ring numbers, e.g. naphthalene: c12ccccc1cccc2.
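
To see how the matching digits pair up, here is a small Python sketch. The names ATOM_OR_DIGIT and ring_closures are hypothetical, and the sketch assumes the string contains only organic-subset and aromatic atoms (no square brackets). It reports which atom opens and which atom closes each ring, counting atoms from 0.

```python
import re

# Atoms of the organic subset, lowercase aromatic atoms, or a ring digit.
ATOM_OR_DIGIT = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]|\d")


def ring_closures(smiles):
    """Return (digit, opening atom index, closing atom index) for each ring."""
    open_rings = {}  # digit -> index of the atom that opened the ring
    closures = []
    atom_index = -1
    for token in ATOM_OR_DIGIT.findall(smiles):
        if token.isdigit():
            if token in open_rings:
                closures.append((token, open_rings.pop(token), atom_index))
            else:
                open_rings[token] = atom_index
        else:
            atom_index += 1
    if open_rings:
        raise ValueError(f"unclosed ring digits: {sorted(open_rings)}")
    return closures


print(ring_closures("c1ccccc1"))
print(ring_closures("c12ccccc1cccc2"))
```

For naphthalene, atom 0 opens both rings, which is exactly the "two consecutive numbers" case.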

SMILES Charges

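Charged atoms are written inside square brackets, with any attached hydrogens and the charge after the atom symbol:
  • [H+] proton
  • [OH-] hydroxide anion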
  • [OH3+] hydronium cation
  • [Fe++] iron(II) cation
  • [NH4+] ammonium cation
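
Here is a rough Python sketch that reads the element, hydrogen count and charge out of bracket atoms like the ones above. The names BRACKET_ATOM and parse_bracket_atom are made up for this post, and the pattern is deliberately simplified: it ignores isotopes and chirality, which real bracket atoms can also carry.

```python
import re

# [ element, optional H with count, optional charge signs, optional number ]
BRACKET_ATOM = re.compile(r"\[([A-Z][a-z]?)(?:H(\d*))?(\++|-+)?(\d*)\]")


def parse_bracket_atom(text):
    """Return (element, hydrogen count, charge) for a simple bracket atom."""
    match = BRACKET_ATOM.fullmatch(text)
    if match is None:
        raise ValueError(f"not a simple bracket atom: {text}")
    element, h_digits, sign, magnitude = match.groups()
    if h_digits is None:
        hydrogens = 0  # no H written inside the brackets
    else:
        hydrogens = int(h_digits) if h_digits else 1
    if sign is None:
        charge = 0
    else:
        # "++" means 2; "+2" also means 2.
        size = int(magnitude) if magnitude else len(sign)
        charge = size if sign[0] == "+" else -size
    return element, hydrogens, charge


for atom in ["[H+]", "[OH-]", "[OH3+]", "[Fe++]", "[NH4+]"]:
    print(atom, parse_bracket_atom(atom))
```
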
When writing SMILES, avoid two consecutive left parentheses if possible, and strive for the fewest possible branches.

Another application is the SMILESCAS Database:
http://esc.syrres.com/interkow/smilecas.htm
It holds over 103,000 SMILES notations. Entering a CAS Registry Number leads to the SMILES notation and thence to a structure search.

That's all for today's lesson and see you next time. Assalamualaikum w.b.t.
Bye!