As some of you know, RDKit is an open-source cheminformatics toolkit that is widely used in bioinformatics research. One of its features is the conversion of SMILES strings into 2D and 3D structures.
The module interface has two tabs: Manage SMILES and Replace fragments. I will present them one by one, as they are fully independent. Each tab has a menu bar that contains all the actions with their corresponding shortcuts. As in the previous version of the module, SMILES strings are presented in a table with their corresponding names and 2D depictions.
- Manage SMILES
- Replace fragments
SMILES files (.smi) or text files (.txt) containing SMILES strings can be imported by clicking Open in the File drop-down menu of the menu bar. The files must follow the RDKit .smi format (image below), with the SMILES string in the first column and the molecule name in the second.
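To make the expected file layout concrete, here is a minimal sketch of a reader for this two-column format, written in plain Python. The function name `parse_smi` is hypothetical, and I assume whitespace-separated columns with no header line. This is not the module's own import code. When a line has no name, the sketch falls back to the SMILES code, which mirrors how the table names unnamed entries.

```python
from typing import List, Tuple


def parse_smi(text: str) -> List[Tuple[str, str]]:
    """Parse .smi-style text: SMILES in column 1, optional name in column 2.

    Assumptions (not taken from the module): columns are separated by
    whitespace and the file has no header line.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)  # split on the first run of whitespace
        smiles = parts[0]
        # Fall back to the SMILES code when no name is given.
        name = parts[1].strip() if len(parts) > 1 else smiles
        records.append((smiles, name))
    return records


example = """CCO ethanol
c1ccccc1 benzene

CC(=O)O
"""
records = parse_smi(example)
for smiles, name in records:
    print(f"{smiles}\t{name}")
```

RDKit also ships its own reader for such files (`Chem.SmilesMolSupplier`), which may be the better choice when you want molecule objects directly rather than raw strings.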
Adding, modifying and removing SMILES
You can add SMILES strings manually using the Add line action in the Edit drop-down menu or the Add button just below the SMILES table. In the same way, each line can be removed and the table can be cleared. To modify a SMILES string, edit the corresponding Code cell in the table.
A name can also be assigned to each SMILES string. If no name has been assigned, the SMILES code itself is used as the name.
Generating, opening, updating and saving 2D depictions
Using RDKit functions, 2D depictions of the SMILES strings are generated on the fly. When a SMILES string is invalid, the corresponding line is highlighted and an error image is shown instead of the 2D depiction. You can open a large view of a 2D depiction either by double-clicking on it or by right-clicking on the image and choosing the Open action in the context menu.
At any time, the 2D depictions presented in the table can be hidden by clicking the Hide 2D image action in the View drop-down menu or by clicking the 2D header cell of the table. Note that even when the 2D depictions are hidden, it is still possible to open them in the same ways.
You can open several windows to compare several molecules, or navigate through all of them in a single window using the navigation buttons. For large molecules, you can zoom in and out of the images. Note that when the SMILES code of a molecule is modified, the corresponding image is updated automatically (the same applies to the molecule’s name).
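For readers who want to reproduce this behaviour in a script, the sketch below shows one way to detect an invalid SMILES string and generate a 2D depiction with RDKit. The helper name `depict` and the 300x300 size are my own choices. The module may well use different RDKit calls internally.

```python
from rdkit import Chem, RDLogger
from rdkit.Chem import Draw

# Silence RDKit's parse-error messages; invalid input is handled below.
RDLogger.DisableLog("rdApp.error")


def depict(smiles, size=(300, 300)):
    """Return a PIL image of the molecule, or None if the SMILES is invalid.

    Hypothetical helper: a caller could show an error image and highlight
    the table row whenever None is returned.
    """
    mol = Chem.MolFromSmiles(smiles)  # returns None when parsing fails
    if mol is None:
        return None
    return Draw.MolToImage(mol, size=size)


image = depict("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
broken = depict("C1CC")                  # ring opened but never closed
print("valid depiction:", image is not None)
print("invalid SMILES detected:", broken is None)
```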
Finally, you can save the 2D depiction as a single PNG or SVG image either from the right-click context menu or from the large view window.
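If you need the same export outside of SAMSON, a minimal sketch with RDKit's `rdMolDraw2D` drawers could look like the following. The function name `save_depiction`, the output file names and the image size are assumptions made for illustration, and PNG output requires an RDKit build with Cairo support.

```python
import os
import tempfile

from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D


def save_depiction(smiles, path, size=(400, 400)):
    """Write a 2D depiction to `path`; the extension selects SVG or PNG."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"invalid SMILES: {smiles!r}")
    ext = os.path.splitext(path)[1].lower()
    if ext == ".svg":
        drawer = rdMolDraw2D.MolDraw2DSVG(size[0], size[1])
    elif ext == ".png":
        drawer = rdMolDraw2D.MolDraw2DCairo(size[0], size[1])
    else:
        raise ValueError(f"unsupported image type: {ext!r}")
    # Computes 2D coordinates if needed, then draws the molecule.
    rdMolDraw2D.PrepareAndDrawMolecule(drawer, mol)
    drawer.FinishDrawing()
    data = drawer.GetDrawingText()  # str for SVG, bytes for PNG
    with open(path, "w" if ext == ".svg" else "wb") as handle:
        handle.write(data)
    return path


out_dir = tempfile.mkdtemp()
svg_path = save_depiction("c1ccccc1O", os.path.join(out_dir, "phenol.svg"))
png_path = save_depiction("c1ccccc1O", os.path.join(out_dir, "phenol.png"))
print(svg_path, png_path)
```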
Grid images of 2D depictions
It is also possible to save the 2D depictions in a grid by clicking on the Save as grid image action of the File drop-down menu. You can also save only the selected 2D depictions by choosing the Save selection as grid image action. A popup window will appear in which you can set the parameters of the grid (number of columns, size of a single panel, and whether the molecule’s name should be displayed).
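The three grid parameters map naturally onto RDKit's `Draw.MolsToGridImage` function. The sketch below assumes that mapping (`molsPerRow` for the number of columns, `subImgSize` for the panel size, `legends` for the names). The wrapper name `grid_image` and the example molecules are hypothetical.

```python
from rdkit import Chem
from rdkit.Chem import Draw


def grid_image(records, columns=3, panel_size=(200, 200), show_names=True):
    """Build a grid image from (smiles, name) pairs, skipping invalid SMILES."""
    mols, names = [], []
    for smiles, name in records:
        mol = Chem.MolFromSmiles(smiles)
        if mol is not None:
            mols.append(mol)
            names.append(name)
    legends = names if show_names else None
    return Draw.MolsToGridImage(
        mols, molsPerRow=columns, subImgSize=panel_size, legends=legends
    )


records = [
    ("CCO", "ethanol"),
    ("c1ccccc1", "benzene"),
    ("CC(=O)O", "acetic acid"),
    ("CCN", "ethylamine"),
]
grid = grid_image(records, columns=3, panel_size=(200, 200))
print(grid.size)
```

Outside of a notebook the result is a PIL image, so `grid.save("grid.png")` writes it to disk.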
Generating 3D structures
The main feature of this SAMSON module is to generate 3D structures from SMILES codes. After selecting molecules in the table, you can click on the Selected SMILES string to Document action in the Export drop-down menu. The 3D structures of the selected molecules with their corresponding names will be added to the active document.
The generation of the 3D structure can also be done for a single SMILES string either by clicking on the Generate 3D structure action in the right-click context menu or by clicking on the Generate 3D structure button in the large view window.
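For reference, here is a minimal sketch of how a 3D structure can be obtained from a SMILES string with RDKit alone: add hydrogens, embed a conformer, then relax it with a force field. The helper name `smiles_to_3d`, the fixed random seed and the choice of MMFF are my assumptions. The module's actual pipeline is not described here and may differ.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def smiles_to_3d(smiles, name=None, seed=42):
    """Return an RDKit molecule with one 3D conformer (hypothetical helper)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"invalid SMILES: {smiles!r}")
    mol = Chem.AddHs(mol)  # explicit hydrogens are needed for sensible geometry
    # EmbedMolecule returns the new conformer id, or -1 on failure.
    if AllChem.EmbedMolecule(mol, randomSeed=seed) == -1:
        raise RuntimeError(f"3D embedding failed for {smiles!r}")
    AllChem.MMFFOptimizeMolecule(mol)  # quick force-field relaxation
    # Unnamed molecules take their SMILES code as name, as in the table.
    mol.SetProp("_Name", name or smiles)
    return mol


ethanol = smiles_to_3d("CCO", name="ethanol")
print(Chem.MolToMolBlock(ethanol))  # MOL block with 3D coordinates
```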
One feature that I have improved in this new version is the substructure search. It is now possible to perform a more advanced search by combining several patterns. For example, let us generate only the molecules that have a fluorine atom and a nitrogen atom separated in space. In the Search bar, enter F for fluorine, then press Select visible. This selects the SMILES strings that contain a fluorine atom (two SMILES strings in the animation below). Then clear the search bar; note that the selection remains unchanged (in the animation below, two SMILES strings are still selected). Now enter N for nitrogen: the search result shows only one of the selected SMILES strings (the other one does not contain nitrogen). Finally, press the Unselect hidden button to unselect the molecule that is not displayed and keep only the ones that contain a nitrogen atom.
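The logic of this combined search can be sketched with RDKit substructure matching and ordinary Python sets: Select visible adds the current search hits to the selection, and Unselect hidden keeps only the selected entries that are also hits of the new search. The table content and the helper name `visible` are made up for illustration, and I assume the patterns are interpreted as SMARTS (where `N` matches aliphatic nitrogen only). How the module interprets the search text is not stated here.

```python
from rdkit import Chem

# Made-up table content for illustration.
table = ["Fc1ccccc1", "NCCF", "CCO", "CCN"]


def visible(pattern, smiles_list):
    """Return the entries that contain the substructure `pattern`.

    Assumption: `pattern` is parsed as SMARTS.
    """
    query = Chem.MolFromSmarts(pattern)
    hits = set()
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        if mol is not None and mol.HasSubstructMatch(query):
            hits.add(smiles)
    return hits


selection = set()
selection |= visible("F", table)  # search "F", then Select visible
after_fluorine = set(selection)   # clearing the search keeps the selection
selection &= visible("N", table)  # search "N", then Unselect hidden
print(sorted(after_fluorine))
print(sorted(selection))
```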
This tab presents the main new feature of this version of the RDKit-SMILES Manager module. It lets you quickly generate thousands of different molecules by replacing a specified fragment of a molecule with several other fragments. This feature was recently added in RDKit to provide isosteric replacement, and it can be very useful for quickly generating a library of slightly different molecules in order to test their properties.
Note that each table is presented in the same way as the SMILES table in the Manage SMILES tab, with all the same buttons. One difference is that saving the table as a grid or as a file is done with buttons at the bottom right of each table.
To replace fragments, you will need to:
- First, provide one or several initial molecules via the Open initial molecules action in the File drop-down menu: import them from a file (you can download the initial molecule file used in this tutorial), add them manually, or import them from the SAMSON Document.
- Enter the fragment that you want to replace in the initial molecules, either by typing it manually or by importing it from the active SAMSON Document. In this tutorial, the target fragment to replace is [*:1]C(=O)NC[*:2].
- Provide one or more “Replaced with fragments” (the fragments that will take the place of the target fragment), either by importing them from a file (you can download the fragments file used in this tutorial) or by adding them manually.
- In the Run drop-down menu, click Replace fragments.
- Results of the replacements are stored in the Results table.
Note that in RDKit the fragments must have the following syntax: [*:1]C(=O)NC[*:2] with [*:1] and [*:2] at the two extremities of the pattern.
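To give an idea of what such a replacement does, here is a minimal sketch that uses RDKit reaction SMARTS as a stand-in. This is an assumption on my side: it is not necessarily the RDKit function that the module calls. The helper name `replace_fragment`, the initial molecule and the two replacement fragments are hypothetical examples. The labels [*:1] and [*:2] tell RDKit which attachment points of the target fragment correspond to which attachment points of the replacement.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def replace_fragment(smiles, target, replacements):
    """Return canonical SMILES of the molecules obtained by swapping `target`
    for each fragment in `replacements` (stand-in based on reaction SMARTS)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"invalid SMILES: {smiles!r}")
    results = []
    for fragment in replacements:
        rxn = AllChem.ReactionFromSmarts(f"{target}>>{fragment}")
        for products in rxn.RunReactants((mol,)):
            product = products[0]
            try:
                Chem.SanitizeMol(product)  # recompute valences and aromaticity
            except Exception:
                continue  # skip chemically invalid products
            canonical = Chem.MolToSmiles(product)
            if canonical not in results:  # drop duplicate matches
                results.append(canonical)
    return results


target = "[*:1]C(=O)NC[*:2]"
replacements = ["[*:1]S(=O)(=O)NC[*:2]", "[*:1]C(=O)OC[*:2]"]
results = replace_fragment("CC(=O)NCc1ccccc1", target, replacements)
for smiles in results:
    print(smiles)
```

With N-benzylacetamide as the initial molecule, the amide linker is swapped for a sulfonamide and for an ester, giving two new molecules that keep the methyl and phenyl ends intact.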
The resulting molecules can then be converted into 3D structures, modified or saved in a file or a grid image.
That’s all for this tutorial about the RDKit-SMILES Manager module of SAMSON.
You may get this new SAMSON module here.
For remarks or suggestions, please let us know in the comments below or on the SAMSON forum.