Protein Preparation & Validation#
Before you run docking, simulation, or any other molecular-design workflow, make sure your protein model is valid. SAMSON gives you fast, reliable tools to prepare, validate, and fix proteins so you can focus on science, not file wrangling.
Why Preparation Matters#
A well-prepared and validated structure avoids bugs and boosts the accuracy of downstream protein-modeling tasks such as molecular dynamics, drug-design screening, and binding-energy calculations.
- Remove alternate locations and molecules that downstream tasks do not need, such as solvent and co-factors.
- Add missing atoms - both heavy atoms and hydrogens.
- Check geometry - avoid simulation crashes.
One-Click Protein Preparation#
Use Home > Prepare to prepare your structure in a single step:
- Remove alternate locations (keeps the highest-occupancy atoms). Lower-occupancy atoms are removed. If atoms have the same occupancy, or no occupancy is specified, the atoms with alternate location A are kept if present, otherwise those with B, and so on.
- Delete ligands you don't need, including covalently attached ligands, co-factors and other small molecules.
- Strip water molecules.
- Clear monatomic ions.
- Add hydrogens by residue type (for standard residues) or valence.
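The alternate-location rule in the first step can be illustrated with a short Python sketch. It is not SAMSON's implementation: the function name, the dictionary keys (`site`, `altloc`, `occupancy`), and the choice to resolve each atom site independently are assumptions made for illustration.

```python
from collections import defaultdict


def select_alt_locations(atoms):
    """Keep one alternate location per atom site.

    Each atom is a dict with hypothetical keys:
      'site'      - hashable identifier (e.g. chain, residue number, atom name)
      'altloc'    - alternate location label, '' when there is none
      'occupancy' - float, or None when no occupancy is specified
    """
    groups = defaultdict(list)
    for atom in atoms:
        groups[atom["site"]].append(atom)

    kept = []
    for candidates in groups.values():
        # Highest occupancy wins; a missing occupancy is treated as 0.0.
        # Ties fall back to the alphabetically first label: 'A', then 'B', ...
        best = min(
            candidates,
            key=lambda a: (-(a["occupancy"] or 0.0), a["altloc"]),
        )
        kept.append(best)
    return kept
```

Sorting on the pair (negative occupancy, label) applies both parts of the rule in a single pass: occupancy decides first, and the label only matters when occupancies are equal or absent.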
More Control?
Each option above can be run manually via Home > Validate or the Select menu. See details below in Manual Checks and Fixes.
Batch preparation#
Processing dozens or hundreds of PDB codes? Try the Batch Protein Prepare extension. It automatically downloads structures and applies the same cleaning steps you see in Home > Prepare.
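For readers who script the download step themselves, the following Python sketch fetches structures by PDB code. It does not describe how the Batch Protein Prepare extension works internally. The function names are hypothetical, the URL pattern is the public RCSB file-download address, and the sketch assumes classic four-character PDB codes.

```python
import urllib.request
from pathlib import Path

# Public RCSB file server pattern for PDB-format entries.
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{code}.pdb"


def rcsb_url(code):
    """Return the download URL for a four-character PDB code."""
    code = code.strip().upper()
    if len(code) != 4 or not code.isalnum():
        raise ValueError(f"not a valid PDB code: {code!r}")
    return RCSB_DOWNLOAD.format(code=code)


def download_structures(codes, out_dir):
    """Download each structure into out_dir and return the saved paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for code in codes:
        url = rcsb_url(code)
        target = out_dir / url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, str(target))
        saved.append(target)
    return saved
```

A call such as `download_structures(["1CRN", "4HHB"], "structures")` would save one file per code. The cleaning steps would then be applied to each saved file.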
Manual Checks and Fixes#
Remove alternate locations#
- Open Home > Validate and go to the Alt. locations tab.
- Click Find alternate locations and review the list.
- Press Remove alt. locations to keep only the highest-occupancy atoms.
Tip
You can also perform other checks for your system if necessary, e.g. check bond lengths, non-standard residues, and clashes. See details below in Validate a protein system.
Strip Ligands, Water, or Ions#
- Select > Biology > Ligands (or Water / Ions).
- In the pop-up context toolbar, click the delete icon, or in the Document view click Current selection > Erase selection.
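Outside SAMSON, the same three categories can be approximated directly on PDB-format text. The Python sketch below is a rough heuristic, not SAMSON's selection logic. The function names and the set of water residue names are assumptions, and it treats any single-atom non-water HETATM residue as a monatomic ion. Modified residues stored as HETATM records would be classified as ligands, so review the result before deleting anything.

```python
from collections import Counter

# Assumed set of residue names used for water.
WATER_NAMES = {"HOH", "WAT", "DOD"}


def residue_key(line):
    """Identify a residue from fixed PDB columns:
    chain ID, residue number, insertion code, residue name."""
    return (line[21], line[22:26], line[26], line[17:20].strip())


def classify_hetero_residues(lines):
    """Map each HETATM residue to 'water', 'ion' or 'ligand'."""
    hetatm = [line for line in lines if line.startswith("HETATM")]
    atom_counts = Counter(residue_key(line) for line in hetatm)
    classes = {}
    for key, n_atoms in atom_counts.items():
        if key[3] in WATER_NAMES:
            classes[key] = "water"
        elif n_atoms == 1:
            classes[key] = "ion"      # single-atom residue: monatomic ion
        else:
            classes[key] = "ligand"
    return classes


def strip_categories(lines, remove=("water", "ion", "ligand")):
    """Return the lines without HETATM records of the given categories."""
    classes = classify_hetero_residues(lines)
    kept = []
    for line in lines:
        if line.startswith("HETATM") and classes[residue_key(line)] in remove:
            continue
        kept.append(line)
    return kept
```

Passing `remove=("water",)` strips only water and leaves ions and ligands in place, which mirrors choosing a single category in the Select menu.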
Add hydrogens#
- Edit > Add hydrogens adds standard hydrogens at pH 7: by amino acid or nucleic acid type for standard residues, and by valence for everything else.
- Select a sub-structure first to limit the scope.
Protonation States
Need pH-specific hydrogens? Use PDBFixer (see below) to protonate at any pH.
Minimizing after adding hydrogens
Some downstream tasks require minimizing the system after adding hydrogens, but not all of them do. For example, AutoDock Vina needs only polar hydrogens for determining hydrogen bonds.
Validate a protein system#
The Structure Validation module (Home > Validate) lets you inspect and repair specific issues:
- Alternate locations - find & remove lower-occupancy atoms.
- Bond lengths - find bonds outside expected ranges.
- Non-standard residues - detect and relabel aliases. Note that when loading e.g. a PDB file, SAMSON tries to process the known aliases for non-standard residues and associates the proper residue types while keeping the non-standard residue names intact.
- Clashes and contacts - highlight steric clashes. If you have clashes between side chains and want to fix them, you can use the Rotamers editor to change the side chains based on the backbone-dependent rotamer library [1].
- Tools to keep atom and node lists neat for simulations:
    - Merge nodes.
    - Reorder atoms in a connected component.
    - Renumber atom serial numbers.
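To make the bond-length check concrete, here is a minimal Python sketch that flags bonds outside an expected range. It is not the Structure Validation module's algorithm. The range is an assumed heuristic built from tabulated covalent radii: a bond passes if its length lies between a fraction of the summed radii and the sum plus a slack. The function name and the `shrink` and `slack` parameters are hypothetical.

```python
import math

# Covalent radii in angstroms for a few common elements (tabulated values).
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}


def bond_length_outliers(atoms, bonds, shrink=0.7, slack=0.3):
    """Return (i, j, length) for bonds outside the assumed range.

    atoms - list of (element, (x, y, z)) tuples, coordinates in angstroms
    bonds - list of (i, j) index pairs into atoms
    """
    outliers = []
    for i, j in bonds:
        (elem_i, pos_i), (elem_j, pos_j) = atoms[i], atoms[j]
        length = math.dist(pos_i, pos_j)
        reference = COVALENT_RADIUS[elem_i] + COVALENT_RADIUS[elem_j]
        # Lower bound leaves room for double and triple bonds,
        # upper bound for slightly stretched single bonds.
        if not (shrink * reference <= length <= reference + slack):
            outliers.append((i, j, length))
    return outliers
```

A wide range like this catches grossly wrong bonds, such as atoms bonded across a gap, without flagging ordinary carbonyl or peptide bonds.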
Fixing Deeper Problems with PDBFixer#
When a protein structure is missing residues or atoms, use the PDBFixer extension (powered by the PDBFixer Python package [2, 3]).
It works on proteins loaded in SAMSON and on PDB and mmCIF/PDBx files, including a whole folder of files in one batch. It can do the following:
- Remove water and/or heterogens (ligands, co-factors, ions).
- Add missing residues based on SEQRES records (build missing loops).
- Convert non-standard residues to their standard equivalents.
- Add missing heavy atoms and side chains.
- Add hydrogens for specified pH.
- Resolve alternate locations automatically, keeping only the atoms with the highest occupancy.
- Build explicit-solvent water boxes neutralized with specified ions.
- Build lipid membranes (from a list of available lipids) around membrane proteins, solvated with water and neutralized with specified ions.
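Several of these operations correspond to calls in the PDBFixer Python package itself, so a typical repair sequence can be sketched in a script. This shows the package used on its own, not how the SAMSON extension invokes it. The function name `fix_structure`, its parameters, and the file paths are hypothetical, and the sketch assumes the `pdbfixer` and `openmm` packages are installed.

```python
def fix_structure(in_path, out_path, ph=7.0, keep_water=False):
    """Repair a PDB file with PDBFixer and write the result to out_path."""
    # Imported here so this module still loads where PDBFixer is absent.
    from pdbfixer import PDBFixer
    from openmm.app import PDBFile

    fixer = PDBFixer(filename=in_path)

    # Locate gaps in the chains relative to the SEQRES records.
    fixer.findMissingResidues()

    # Convert non-standard residues to their standard equivalents.
    fixer.findNonstandardResidues()
    fixer.replaceNonstandardResidues()

    # Remove heterogens; water is kept only if requested.
    fixer.removeHeterogens(keepWater=keep_water)

    # Build missing heavy atoms (and the missing residues found above).
    fixer.findMissingAtoms()
    fixer.addMissingAtoms()

    # Add hydrogens appropriate for the requested pH.
    fixer.addMissingHydrogens(ph)

    with open(out_path, "w") as handle:
        PDBFile.writeFile(fixer.topology, fixer.positions, handle, keepIds=True)
```

A call such as `fix_structure("input.pdb", "fixed.pdb", ph=6.5)` would write a repaired copy and leave the original file untouched.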
References#
1. Shapovalov, M.S., and Dunbrack, R.L., Jr. (2011). A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions. Structure, 19, 844-858. https://doi.org/10.1016/j.str.2011.03.019
2. PDBFixer Python package. https://github.com/openmm/pdbfixer
3. Eastman, P. et al. (2017). OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLOS Computational Biology, 13(7), e1005659. https://doi.org/10.1371/journal.pcbi.1005659