.. Written by Konrad Hinsen
.. License: CC-BY 3.0
.. index::
   single: PDB
Mosaic PDB convention
#####################
Mosaic can be used to store molecular models from the
`Protein Data Bank `_ (PDB). The main
application is to use such models as the starting point for
molecular simulations. The following conventions describe
how a PDB structure is stored in terms of Mosaic data items.
Only the structure itself can be stored. Experimental data
(structure factors etc.) and metadata describing the experiment
or the refinement process cannot.
The PDB's official data format is called
`PDBx/mmCIF `_. In the conversion from
PDBx/mmCIF to Mosaic, as much information as possible is
carried over without modification. In particular, residue and atom
names are kept unchanged.
.. index::
   single: crystallographic structures
Crystallographic structures
---------------------------
A crystallographic structure is represented by two required
data items:
- A universe defining the molecular structures and, in the
case of crystals, the symmetries. The atoms in the universe
have multiple sites if the PDB structure contains alternate
locations.
- A configuration providing the positions for all sites and
the shape of the unit cell in the case of crystals.
Additional information from the PDB entry can be provided by
optional data items:
- The occupancy of each site can be provided as a
:ref:`property` of
:ref:`type` "site" or "template_site"
with an empty :ref:`units` string.
Each value is a scalar of type "float32" or "float64" in
the interval [0..1]. If no occupancy values are provided,
the occupancy of all sites is assumed to be 1.
- An anisotropic displacement parameter for each site can be provided
as a :ref:`property` of
:ref:`type` "site" or "template_site". A
valid :ref:`units` string must be provided;
the preferred units are "nm2". Each value is an array of shape "6"
and of type "float32" or "float64". The order of the elements is
[1][1], [2][2], [3][3], [2][3], [1][3], [1][2]. For the precise
definition of the anisotropic displacement parameters, see the PDB
documentation for items ``_atom_site.aniso_U[1][1]`` to
``_atom_site.aniso_U[3][3]``.
- An isotropic displacement parameter for each site can be provided
as a :ref:`property` of
:ref:`type` "site" or "template_site". A
valid :ref:`units` string must be provided;
the preferred units are "nm2". Each value is a scalar of type
"float32" or "float64". An isotropic displacement parameter of
value ``x`` is equivalent to an anisotropic displacement parameter
of value ``[x x x 0 0 0]``.
If anisotropic displacement parameters are provided, then no isotropic
displacement parameters may be given, in order to prevent
inconsistencies in the data.
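The element order and the isotropic equivalence described above can be
illustrated with a minimal sketch in plain Python. The function names
(``adp_to_matrix``, ``isotropic_to_anisotropic``, ``angstrom2_to_nm2``) are
hypothetical helpers introduced for this illustration only; they are not part
of the Mosaic API. The unit conversion assumes input values given in Å2.

```python
# Zero-based index pairs in the order used by this convention:
# [1][1], [2][2], [3][3], [2][3], [1][3], [1][2]
ADP_INDEX_ORDER = ((0, 0), (1, 1), (2, 2), (1, 2), (0, 2), (0, 1))


def adp_to_matrix(adp):
    """Expand a 6-element displacement parameter into a symmetric 3x3 matrix."""
    if len(adp) != 6:
        raise ValueError("an anisotropic displacement parameter has 6 elements")
    u = [[0.0] * 3 for _ in range(3)]
    for value, (i, j) in zip(adp, ADP_INDEX_ORDER):
        u[i][j] = float(value)
        u[j][i] = float(value)  # the matrix is symmetric
    return u


def isotropic_to_anisotropic(x):
    """Return the anisotropic equivalent [x x x 0 0 0] of an isotropic value."""
    return [x, x, x, 0.0, 0.0, 0.0]


def angstrom2_to_nm2(value):
    """Convert a value from square angstroms to nm2 (1 angstrom = 0.1 nm)."""
    return value / 100.0
```

For an isotropic value, ``adp_to_matrix(isotropic_to_anisotropic(x))`` is
``x`` times the identity matrix, which is why storing both kinds of parameter
for the same site would be redundant.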
Heterogeneous sequences
~~~~~~~~~~~~~~~~~~~~~~~
In a heterogeneous sequence, a specific position can be taken by
different residues in different copies of the molecule. In a PDB
entry, heterogeneous sequences are marked as such, and contain
multiple residues that share a residue number but differ in
chemical component. All the atoms in these residues have
occupancies smaller than 1.
In Mosaic, a heterogeneous sequence is represented by a single polymer
fragment. The fragment at a heterogeneous position has no atoms of its
own and as many subfragments as there are residue variants at this
position. Each subfragment is one of the residue variants. There are no
bonds between atoms of different residue variants, but each variant can
have bonds to the neighboring residue(s).
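The nesting described above can be pictured with a small stand-alone sketch.
The ``Atom`` and ``Fragment`` classes and the ``heterogeneous_position``
function are hypothetical and written for this illustration only. They are not
the Mosaic data model, and the residue names and occupancies are made-up
example values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Atom:
    name: str
    occupancy: float = 1.0


@dataclass
class Fragment:
    label: str
    atoms: List[Atom] = field(default_factory=list)
    subfragments: List["Fragment"] = field(default_factory=list)


def heterogeneous_position(
    label: str, variants: Dict[str, List[Tuple[str, float]]]
) -> Fragment:
    """Build the fragment for one heterogeneous position.

    The returned fragment has no atoms of its own; each residue
    variant becomes one subfragment holding that variant's atoms.
    """
    subfragments = [
        Fragment(label=name, atoms=[Atom(n, occ) for n, occ in atoms])
        for name, atoms in variants.items()
    ]
    return Fragment(label=label, atoms=[], subfragments=subfragments)


# Illustrative values only: two variants sharing residue number 22.
position = heterogeneous_position(
    "22",
    {"SER": [("CA", 0.6), ("OG", 0.6)], "THR": [("CA", 0.4), ("OG1", 0.4)]},
)
```

In this sketch, bonds are left out entirely. A fuller model would list bonds
within each variant and from each variant to the neighboring residues, but
never between atoms of two variants.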
.. index::
   single: NMR structures
NMR structures
--------------
An NMR structure is represented by the following data items:
- A universe defining the molecular structures.
- One configuration per model contained in the PDB entry.
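As a rough illustration of this layout, the sketch below treats each model as
a list of (x, y, z) positions and checks that every model covers the same
number of sites. It assumes that all configurations refer to the sites of the
one shared universe. The function ``count_nmr_models`` and the coordinate
values are hypothetical and not part of Mosaic.

```python
def count_nmr_models(n_sites, configurations):
    """Check that each model has one (x, y, z) position per site.

    Returns the number of models if all of them are consistent.
    """
    for index, positions in enumerate(configurations):
        if len(positions) != n_sites:
            raise ValueError(
                f"model {index} has {len(positions)} positions, expected {n_sites}"
            )
        if any(len(p) != 3 for p in positions):
            raise ValueError(f"model {index} contains a position that is not 3D")
    return len(configurations)


# Illustrative values only: two sites, two models, positions in nm.
models = [
    [(0.0, 0.0, 0.0), (0.15, 0.0, 0.0)],
    [(0.0, 0.01, 0.0), (0.15, 0.01, 0.0)],
]
n_models = count_nmr_models(2, models)
```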