protomo user’s guide, version 2.2
protomo is a software package used in electron tomography for marker-free alignment and 3D reconstruction of tilt series. The marker-free alignment is based on cross-correlation methods and projection matching. It also includes the refinement of geometric parameters of the tilt series. 3D reconstruction is carried out by weighted back-projection with general weighting functions that allow varying tilt angle increments (Winkler and Taylor, 2006). The software was originally developed for thin sections of insect flight muscle and paracrystalline protein arrays (Taylor et al., 1997), but has since been successfully applied to various specimens in cryo-electron tomography. Version 2 also supports the alignment of dual-axis tilt series (Winkler and Taylor, 2013).
TIFF library (www.libtiff.org) (mandatory)
Fourier transform libraries, optional but recommended. If no additional libraries are installed, the fall-back routines from FFTPACK (www.netlib.org/fftpack) are used which are built-in.
GTK+, the GIMP toolkit (www.gtk.org)
GtkGLExt, OpenGL Extension to GTK+ (gtkglext.sourceforge.net)
GNU plotutils (www.gnu.org/software/plotutils)
A postscript viewer
Python version 2.6 (www.python.org)
Note: If some of the above packages or libraries are not present at runtime (Fourier transforms, display related libraries, etc.), the corresponding functionality will be disabled in the Python tomography extension module. Tilt series alignment will not be affected.
Currently, protomo is available for Linux on 32-bit and 64-bit architectures (Intel i686, AMD64) only. The software can be installed anywhere in the file system, including a user’s home directory. The programs do not need superuser privileges to run; however, “root” privileges may be required to install the software in directories that are typically writable by root only, such as /usr/local. The software is unpacked by typing the following commands at the shell prompt:
where /path/to/ is the absolute path where the downloaded tar file was stored, and x is the release number. This will create a new subdirectory protomo-2.2.x in /usr/local. The bash shell script setup.sh must be adapted to the installation: the environment variable I3ROOT must point to the installation directory, in our case /usr/local/protomo-2.2.x, and the variable I3DEPLIB must point to the location of the third-party libraries (Fourier transforms, etc.), that are distributed separately. If these libraries are already provided by the system, it is recommended to make a symbolic link to the actual library in the I3DEPLIB directory. Similarly, the EMAN2DIR variable must point to the EMAN2 installation. Other variables should not be changed. The setup script must be “sourced” before the first protomo session or program invocation in the current shell, or alternatively, the script can be executed at login time by putting it in the shell startup files .profile or .bash_profile. To make sure that the correct versions of executables and libraries are loaded, the environment variables PATH and LD_LIBRARY_PATH should be examined. Library dependencies can be printed with the Linux system utility “ldd”. When setup.sh is sourced without any arguments, PATH, LD_LIBRARY_PATH and PYTHONPATH are reset. This is useful for debugging purposes. If previous settings of these variables are to be preserved, it should be called as follows:
We define a Cartesian coordinate system
which is fixed with respect to the specimen, a coordinate system
which is the microscope coordinate system, and finally, for each projection image, a coordinate system
is the number of images in the tilt series.
is defined by the pixel raster, and the origin
is usually located at the bottom left of the image. We assume that the two origins
are identical, and the two planes spanned by the vector pairs
(the projection plane) are parallel.
is the projection direction. The projection of
onto the image plane is denoted as
is implicitly defined by choosing one of the projection images as a reference image, usually the projection of the untilted specimen.
The tilt azimuth angle
measured anti-clockwise from the
axis, indicates the azimuthal direction of the tilt axis
. An optional elevation angle
can also be specified (not shown in Figure 1). This angle rotates the axis
out of the red plane. A non-zero elevation angle means that the tilt axis is not perpendicular to the electron beam direction. The angles
are the specimen tilt angles corresponding to the
image. The in-plane rotation angles
align the projected tilt axes
in each of the images to a common axis. The green plane corresponds to the grid supporting the specimen, which in its untilted state coincides with the red plane. The rotation
from the green plane to the yellow plane is the specimen orientation relative to the grid (see Figure 1). It is specified by three Euler angles:
, a rotation about the
-axis, followed by a rotation
about the new
-axis, and finally a rotation
about the new
All geometric transformations are specified as transformations of coordinate axes. For example, the total specimen tilt is
is the rotation matrix of a rotation about the tilt axis
with the angle
, for the
projection. The matrix
is the transformation of the coordinate system
In the software, on the command line or in parameter files, the matrix is specified by listing its elements
in the order of increasing indices, starting with the first row, then the second row, etc. Note that the transformations also require an origin, which can be defined anywhere in an image. By default it is the center of the image.
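The element ordering described above can be illustrated with a short Python sketch. The helper name `matrix_to_params` and the example matrix are assumptions made for illustration only; they are not part of protomo.

```python
def matrix_to_params(matrix):
    """Return the matrix elements in row-major order: first row, then second row, etc."""
    return [element for row in matrix for element in row]


# Hypothetical 2x2 in-plane transformation (values chosen for illustration only).
example = [[0.98, -0.17],
           [0.17, 0.98]]

# Elements as they would be listed one after another.
print(" ".join(str(value) for value in matrix_to_params(example)))
```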
For dual-axis tilt series we define separate tilt azimuth angles
and elevation angles
for each tilt axis
. We also assign a separate orientation
to the group of images that belong to tilt axis
. In addition, for data collection schemes that involve the reversal of the tilt stage motion, subgroups can be formed for each tilt axis, for example a group for images with tilt angles 0…+60° and a second group for the images with tilt angles 0…-60°. This takes into account that the orientation could be slightly different for the two groups due to hysteresis effects.
protomo supports the following file formats: CCP4, EM, FFF, IMAGIC, MRC, SPIDER, SUPRIM, and TIFF. It can detect and read these formats transparently. The automatic format detection is based on the file contents rather than file name suffixes. Map files are written as CCP4 files and diagnostic output files are written in the native FFF format (Winkler, 2007). While the default output file format can be changed to any of the supported formats, it is not recommended to do so, because not all of the image formats can store the metadata required by the software. In particular, a coordinate system must be associated with each image. In the FFF format, the coordinates of the lower left corner are stored for this purpose, and cropping images with protomo utilities preserves this information.
The geometry information (see below, section “Geometry files”) and alignment information are stored in a binary database for each tilt series. These geometry metadata files have a file name suffix of “.i3t” and are generated the first time a tilt series is created by importing a text file. Alignment and geometry re-evaluation modify the contents of the database to keep track of the geometric parameters during processing. For diagnostic purposes the data can be re-exported in text form at any stage of processing.
FFTPACK (www.netlib.org/fftpack), Fortran subroutines for complex and real sequences, developed by Paul Swarztrauber.
GSL FFT, (www.gnu.org/software/gsl), part of the GNU scientific library. This is a reimplementation of FFTPACK. In protomo, it performs slightly worse than the original FFTPACK version and is therefore not recommended.
FFTW (www.fftw.org), the “Fastest Fourier Transform in the West”.
djbfft (cr.yp.to/djbfft.html), an extremely fast FFT implementation that provides powers of two complex and real FFTs. In protomo, these routines are faster than FFTW.
Fourier transform modules are loaded during program startup. A fall-back sequence can be defined to select the optimal algorithm from the installed modules (the default sequence is djbfft, FFTW, FFTPACK). When a Fourier transform routine is invoked, the sequence is searched for the requested routine. If a particular transform type or transform size is not available in one module, the next module in the defined sequence is selected. For instance, a call for a transform length that is not a power of two would skip djbfft and use the FFTW algorithm in the default sequence.
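The fall-back search can be sketched in a few lines of Python. This is only an illustration of the selection logic described above: the capability predicates are simplified assumptions (djbfft handles powers of two, the other modules handle any positive length), and the function names are hypothetical rather than protomo's own.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ..."""
    return n > 0 and (n & (n - 1)) == 0


# Each entry: (module name, predicate telling whether the module handles length n).
# The predicates are simplified assumptions, not protomo's actual capability tables.
DEFAULT_SEQUENCE = [
    ("djbfft", is_power_of_two),
    ("FFTW", lambda n: n > 0),
    ("FFTPACK", lambda n: n > 0),
]


def select_module(length, sequence=DEFAULT_SEQUENCE, installed=None):
    """Return the first module in the sequence that is installed and supports `length`."""
    for name, supports in sequence:
        if installed is not None and name not in installed:
            continue  # module not present at runtime, try the next one
        if supports(length):
            return name
    raise LookupError("no Fourier transform module available for length %d" % length)


print(select_module(512))                         # power of two
print(select_module(500))                         # not a power of two
print(select_module(512, installed={"FFTPACK"}))  # only the built-in fall-back present
```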
The basic unit for real space images is the pixel. All parameters pertaining to real space images are specified with pixels as the unit. However, in Fourier space a pixel does not correspond directly to a fixed spatial frequency because the Discrete Fourier Transform introduces a dependency of spatial frequency on the number of samples in real space. To make the parameter specification for Fourier space images independent of the corresponding real space image size, the unit is defined as one reciprocal real space pixel for parameters pertaining to Fourier space images. With this definition, the coordinate range of Fourier space images is always -0.5…+0.5. Note that, if in the following “pixels” is specified as a unit, it always refers to real space pixels.
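To make the unit definition concrete, the following Python sketch converts a sample index of a Discrete Fourier Transform into a coordinate in reciprocal pixels. It assumes the usual DFT index layout (non-negative frequencies first, negative frequencies in the second half); the function name is hypothetical.

```python
def fourier_coordinate(index, n):
    """Spatial frequency in reciprocal pixels of DFT sample `index` for n real-space samples."""
    k = index if index < (n + 1) // 2 else index - n  # wrap the upper half to negative frequencies
    return k / float(n)


# The same spatial frequency falls on different sample indices for different image sizes,
# which is why Fourier space parameters are given in reciprocal pixels rather than samples.
for n in (512, 1024):
    index = n // 4
    print(n, index, fourier_coordinate(index, n))
```

In this convention a feature that repeats every p pixels appears at a coordinate of 1/p, independent of the image size.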
Programs that accept command line parameters are invoked at the shell prompt like other Unix commands by specifying the name of the executable first, followed by options and then file name(s). Options are prefixed by a “-” sign and can have zero or more parameters, which are all separated by spaces. To indicate the end of the option part on the command line, an optional double dash “--” can be placed before the first file name. This avoids problems with parsing the command line if a file name starts with a number.
Parameter files are simple text files with a free format. Keywords and numbers are separated by “white space”, i. e. any number of consecutive, non-printing characters such as spaces, tabs, new line, etc. Parameters for a particular application are specified in sections which consist of a section name followed by the section body enclosed in braces. For instance
is a section that declares a parameter “sampling”. A section can contain nested subsections, and parameters with the same name declared within different subsections represent distinct values. Parameters are grouped in subsections according to various processing functions. The protomo software uses parameters defined in the section named “tiltseries” only and ignores all other sections. When read, the entire parameter file is parsed though, and all section and parameter specifications must therefore be syntactically correct. Comments can be included as follows and they can be nested and span multiple lines:
Parameters are declared with a parameter name, followed by a colon, followed by the parameter value. The parameter values can be of type boolean, integer, floating point and string. Boolean values are designated by the keywords true or false, strings are enclosed in double quotes. To define a parameter that consists of two or more values, the comma separated items are enclosed in braces. Variables can be defined with the text sequence “variable name”, “equal sign”, “value” and simple arithmetic operations can be performed: addition (+), subtraction (-), multiplication (*), and division (/).
In the following example, a window is defined and reduced in size according to the specified sampling factor, so that it covers the same area in the original image when the sampling factor is changed:
If parameters need to be specified elsewhere on a command line, the corresponding section names in the parameter file are concatenated and separated by dots. The top level name (tiltseries) can be omitted in protomo applications. The name of the size parameter in the above example would therefore be specified as “window.size”.
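The naming rule can be mimicked in Python by flattening nested sections into dotted names. The nested dictionary below merely stands in for a parsed parameter file; its contents and the helper `dotted_names` are assumptions for illustration and do not reflect how protomo stores parameters internally.

```python
def dotted_names(section, prefix=""):
    """Flatten nested sections into a mapping from dotted parameter names to values."""
    names = {}
    for key, value in section.items():
        name = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            names.update(dotted_names(value, name))  # descend into the subsection
        else:
            names[name] = value
    return names


# Hypothetical parameter values, for illustration only.
params = {"tiltseries": {"sampling": 2, "window": {"size": [256, 256]}}}

# The top level name (tiltseries) is omitted by starting below it.
for name, value in sorted(dotted_names(params["tiltseries"]).items()):
    print(name, value)
```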
Like parameter files, the geometry files are text files using a free format, i. e. keywords, identifiers and numbers can be separated by any amount of white space. All definitions are enclosed by keywords as follows:
where identifier is an alphanumeric text string that defines a name for the tilt series. definitions is a list of subsections. Each subsection can be either a tilt axis definition, a specimen orientation definition, a reference definition, or an image parameter definition. The following notation applies: keywords are specified in upper case, words in italic are substituted by the pertaining parameter, and text in angle brackets is optional. Parameters that are not explicitly specified are assumed to be zero.
Angles are specified in degrees. number is a non-negative integer number. The AXIS subsection defines the two angles
, and the ORIENTATION subsection the rotation
. The REFERENCE IMAGE specification defines the image used as coordinate reference. If omitted, the image with a tilt angle closest to 0° is automatically selected. These subsections are followed by multiple IMAGE subsections defined below.
name is a file identifier. The actual file name is constructed as explained in the next section by prepending a file path and appending a suffix. The first character of a file identifier must be a letter and cannot be a number. Image files are either separate 2D images or image stacks stored in a “pseudo-3D” image. The first variant of the file identifier definition assumes a 2D image, the second one an image stack, for which index refers to the section number within the stack (the pseudo z-coordinate). x and y are the coordinates of point
in the image, where the index i corresponds to number defined with the IMAGE keyword. The angles theta and alpha are denoted as
in Figure 1. factor is an isotropic scale factor that modifies the sampling of the image. If unspecified, it is assumed to be 1.
For dual-axis tilt series, the AXIS subsection is repeated and the images and parameters listed after the second axis definition belong to the second tilt axis. Similarly, a repeated ORIENTATION subsection within an AXIS subsection defines a separate image group with its own set of parameters.
These parameters control the input and output of image data, diagnostic terminal output, and the sampling of the input images (Table 1). They are specified within a top level section called tiltseries. When a tilt series is processed, the raw images are first located by file name. File names are constructed from the identifiers specified in the geometry file by appending a suffix defined in the parameter file. If a path list was also defined, the files are searched for in the directories given in that list. The images are first preprocessed if requested (see below), and stored in a cache file. If the parameter cachedir is defined, the cache file will be created in that directory, otherwise the current working directory is used. If binning is requested and the sampling factor is greater than or equal to two, the raw or preprocessed images are binned and stored in an additional cache file. The binning factor is the sampling factor truncated to an integer value. All cache files can be deleted at any time and will be automatically regenerated when needed. By default, other output files are created in the directory defined by the parameter outdir or the current working directory if the parameter is undefined. The selection and exclusion parameters have the effect of ignoring images in the tilt series without having to remove them explicitly. The format of a selection specification is a comma-separated list of image numbers or image number ranges. The parameter is a character string and must therefore be enclosed in double quotes. By default, all images are selected. If both selection and exclusion parameters are present, selection takes place before exclusion.
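The interplay of selection and exclusion can be sketched as follows. The sketch assumes that a range is written with a hyphen, as in "5-6"; the exact range notation accepted by protomo is not stated here, and the function names are hypothetical.

```python
def parse_image_list(spec):
    """Parse a comma-separated list of image numbers or ranges, e.g. "1-5,8" (hyphen assumed)."""
    numbers = set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            first, last = item.split("-", 1)
            numbers.update(range(int(first), int(last) + 1))
        else:
            numbers.add(int(item))
    return numbers


def active_images(all_images, select=None, exclude=None):
    """Apply the selection first, then the exclusion; by default all images are selected."""
    chosen = set(all_images)
    if select is not None:
        chosen &= parse_image_list(select)
    if exclude is not None:
        chosen -= parse_image_list(exclude)
    return sorted(chosen)


print(active_images(range(1, 11), select="1-8", exclude="3,5-6"))
```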
The preprocessing is carried out in one or two passes. In each pass, image statistics are first computed from the whole image or a region thereof, which is reduced by a border of fixed width. From the selected region, a linear density gradient can optionally be subtracted and/or densities can be thresholded below and above a specified multiple of the density standard deviation, or alternatively, below and above absolute density values. Additionally, a median or Gaussian filter can be applied. All preprocessing parameters are specified in a section called preprocess (Table 2). If a subsection mask is present within the preprocess section, then the first pass generates a binary mask that indicates which pixels are to be set to a locally computed mean value in the second pass. Otherwise, if the subsection mask is absent, only one pass is carried out with the parameters specified in the preprocess section. The parameter grow applies to the binary image only, and it increases contiguous areas of selected pixels at the perimeter by the number of pixels specified. Preprocessing can be turned off without removing the preprocess section from the parameter file (see parameter preprocessing under general parameters).
When a tilt series is aligned, three sections are relevant for specifying parameters. These are window, reference, and align. Windows are extracted from the source images according to the parameters specified in the window section (Table 3). The source image can be a raw, a preprocessed, or a binned image, depending on the preprocessing and resampling parameters. The resampling is carried out with linear interpolation and uses the stored geometric parameters. The extracted, resulting window size after resampling is the size specified with the size parameter in the window section. If binning was selected, the window is extracted from the cached, binned data, otherwise it is extracted from the preprocessed or raw data. In the former case, the sampling factor used in the linear interpolation is automatically adjusted to take into account the binning factor, so that the overall sampling factor remains as specified by the sampling parameter. If the resampled area does not lie completely within the source image, an error is generated and the alignment is terminated. This can occur when the region of interest lies close to the edge of a raw image, or when it shifts by large amounts due to poor tracking during data collection. The error condition can be relaxed by specifying a minimal fraction of the extracted area that must lie within the source image. The pertaining parameter is called area, with a default value of 0.95. Extracted areas not covered by the source image are filled with zeroes.
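Two of the rules above lend themselves to a small numerical sketch: the split of the sampling factor into an integer binning factor and a residual interpolation factor, and the test of the covered area fraction against the area parameter. The sketch treats the window as an axis-aligned rectangle in source image coordinates, which ignores the rotation that the stored geometry may apply, and it assumes that a fraction equal to the limit is accepted. All function names are hypothetical.

```python
def split_sampling(sampling, binning=True):
    """Return (binning factor, residual interpolation factor); their product is `sampling`."""
    bin_factor = int(sampling) if binning and sampling >= 2 else 1
    return bin_factor, sampling / float(bin_factor)


def covered_fraction(x0, y0, width, height, image_width, image_height):
    """Fraction of the window [x0, x0+width) x [y0, y0+height) that lies inside the image."""
    overlap_x = max(0, min(x0 + width, image_width) - max(x0, 0))
    overlap_y = max(0, min(y0 + height, image_height) - max(y0, 0))
    return (overlap_x * overlap_y) / float(width * height)


def window_accepted(fraction, area=0.95):
    """Compare the covered fraction with the minimal fraction given by the area parameter."""
    return fraction >= area


print(split_sampling(2.5))
fraction = covered_fraction(-10, 0, 100, 100, 1000, 1000)  # window sticks out by 10 pixels
print(fraction, window_accepted(fraction), window_accepted(fraction, area=0.85))
```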
The alignment reference construction is controlled by the sections window and reference. The size of the area that will be used for alignment depends on the windowing parameters; the reference construction algorithm depends on the parameters in the reference section (Table 5). The presence of the parameter body indicates that the reference is to be calculated by reprojecting a preliminary back-projection map, based on already aligned images in the tilt series. New images that have been aligned but not yet been merged into the reference constructed in previous cycles are extracted as described above, then masked and merged into the reference at this point. The parameters that apply to this step are listed in table 4. Rectangular and ellipsoidal masks should be apodized to avoid ripple effects when Fourier transforms are involved. Once the reference has been updated with the new images, a reprojection is computed with the geometric parameters of the image that will be aligned.
If no back-projection options are present, the nearest neighbor image is selected as a reference. The selection and exclusion parameters within the reference section define which images in the tilt series contribute to the reference construction. The format is the same as explained in “General options”. The image produced by the selected algorithm is subsequently filtered. Low-pass and high-pass filters are specified with the same parameter names as the real space masks, but only the diameter and apodization parameters make sense in this context. Units for Fourier space filter limits are reciprocal pixels. Note that the filter limit parameters are the lengths of the principal axes (“diameter”) of an elliptical or ellipsoidal region, not the spatial frequency of the cutoff, i. e. a value of 1 would filter at the Nyquist frequency (the Nyquist frequency itself has a value of 0.5 under our definition). If the image of such a transform is displayed, the filter would appear as an ellipse or circle inscribed in a rectangle or square, which represents the image.
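The relation between the diameter parameter and the cutoff frequency can be written down in a short Python sketch. It models a hard-edged elliptical low-pass region only; apodization is ignored, and the function names are hypothetical.

```python
def cutoff_from_diameter(diameter):
    """Cutoff spatial frequency (reciprocal pixels) along one principal axis."""
    return diameter / 2.0


def passes_lowpass(fx, fy, diameter_x, diameter_y):
    """True if the frequency (fx, fy) lies inside the ellipse with the given principal axes."""
    rx = diameter_x / 2.0
    ry = diameter_y / 2.0
    return (fx / rx) ** 2 + (fy / ry) ** 2 <= 1.0


print(cutoff_from_diameter(1.0))          # a diameter of 1 corresponds to the Nyquist frequency
print(passes_lowpass(0.2, 0.0, 0.5, 0.5))
print(passes_lowpass(0.3, 0.0, 0.5, 0.5))
```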
Several cross-correlation methods are available: the conventional cross-correlation “xcf”, the mutual correlation “mcf”, phase only correlation “pcf”, and phase doubled correlation “dbl”. For diagnostic purposes, an image stack with the cross-correlation functions can be written to a file. The size parameter specifies the image window that is extracted from the cross-correlation function and written to the file. It does not affect the cross-correlation calculation nor the peak search described below, which are always based on the image window size defined in the window section.
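The difference between the correlation variants lies in how the Fourier coefficients of the correlation are weighted. The following one-dimensional Python sketch uses definitions that are common in the literature and assumed here: the product C = F·conj(G) for “xcf”, C/√|C| for “mcf”, and C/|C| for “pcf”. Whether protomo uses exactly these normalizations is not stated in this guide, and “dbl” is left out because its definition is not given. The naive transform and all function names are for illustration only.

```python
import cmath


def dft(values, sign=-1):
    """Naive discrete Fourier transform (sign=-1 forward, sign=+1 unnormalized inverse)."""
    n = len(values)
    return [sum(values[t] * cmath.exp(sign * 2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def correlate(image, reference, mode="xcf"):
    """Circular cross-correlation of two equally long 1D signals with optional weighting."""
    f = dft(image)
    g = dft(reference)
    product = [a * b.conjugate() for a, b in zip(f, g)]
    eps = 1e-12  # guard against division by (almost) zero amplitudes
    if mode == "mcf":
        product = [c / abs(c) ** 0.5 if abs(c) > eps else 0.0 for c in product]
    elif mode == "pcf":
        product = [c / abs(c) if abs(c) > eps else 0.0 for c in product]
    elif mode != "xcf":
        raise ValueError("unsupported mode: %r" % (mode,))
    n = len(product)
    return [v.real / n for v in dft(product, sign=+1)]


reference = [0, 0, 1, 3, 1, 0, 0, 0]
image = reference[-2:] + reference[:-2]  # the reference shifted right by two samples

for mode in ("xcf", "mcf", "pcf"):
    ccf = correlate(image, reference, mode)
    print(mode, "peak at shift", ccf.index(max(ccf)))
```

The stronger the amplitude normalization, the sharper the correlation peak, at the cost of a higher sensitivity to noise.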
By default, the whole cross-correlation function is searched for a maximum. The radius parameter restricts the search for the highest correlation peak to a smaller ellipsoidal or circular region. The two required values define the principal axes of this region. If cmdiameter is also present, a refined value (i. e. the center of mass) of the peak position and the peak height is calculated within a region centered at the maximum that was obtained by the preceding peak search. This region for the center of mass calculation has the dimensions defined by the cmdiameter values.
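The two-stage peak search can be sketched as follows. Several details are assumptions made for this illustration: the search region is centered on the array center, the radius values are interpreted as semi-axes and the cmdiameter values as full axis lengths, and negative correlation values are ignored in the center of mass. The refinement of the peak height is not reproduced, and the function name is hypothetical.

```python
def find_peak(ccf, radius=None, cmdiameter=None):
    """Return the (x, y) position of the correlation maximum in array coordinates."""
    ny, nx = len(ccf), len(ccf[0])
    cx, cy = nx // 2, ny // 2  # assumed origin of the search region

    def inside(x, y, x0, y0, ax, ay):
        return ((x - x0) / float(ax)) ** 2 + ((y - y0) / float(ay)) ** 2 <= 1.0

    # Stage 1: highest value, optionally restricted to an elliptical region.
    candidates = [(x, y) for y in range(ny) for x in range(nx)
                  if radius is None or inside(x, y, cx, cy, radius[0], radius[1])]
    px, py = max(candidates, key=lambda p: ccf[p[1]][p[0]])
    if cmdiameter is None:
        return float(px), float(py)

    # Stage 2: center of mass within an ellipse centered at the maximum.
    ax, ay = cmdiameter[0] / 2.0, cmdiameter[1] / 2.0
    total = sum_x = sum_y = 0.0
    for y in range(ny):
        for x in range(nx):
            if inside(x, y, px, py, ax, ay):
                weight = max(ccf[y][x], 0.0)
                total += weight
                sum_x += weight * x
                sum_y += weight * y
    if total == 0.0:
        return float(px), float(py)
    return sum_x / total, sum_y / total


# Hypothetical correlation function: a spurious maximum in the corner, the true peak near the center.
demo = [[0.0] * 9 for _ in range(9)]
demo[0][0] = 10.0
demo[4][5] = 4.0
demo[4][6] = 2.0

print(find_peak(demo))
print(find_peak(demo, radius=(3, 3)))
print(find_peak(demo, radius=(3, 3), cmdiameter=(3, 3)))
```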
A coarse alignment can be carried out interactively with the graphical tool described in section “Standalone graphical tools”. It is primarily used to correct misaligned images or do an initial alignment of a raw tilt series when large shifts occurred during data collection. Individual images can be aligned manually, or a whole series can be aligned automatically with this tool. The automated alignment function performs a simple, sequential cross-correlation between neighboring image pairs. Differences in the foreshortening of the images due to the different tilt angles are accounted for in the cross-correlation computation. Area matching is not implemented in the graphical tool. For area matching, the Python module (see section “Python classes for tomography”) has to be used.
Mask parameters for aligned image, see table 4
See table 6.
See table 7.
Area matching operates on whole tilt series and starts with the image defined as the geometry reference in the geometry parameter file. This image is the first alignment reference in the iterative procedure. The procedure aligns the neighboring images at the next lower and higher tilt angle to this reference. The images that are being aligned are extracted and resampled with the stored geometry parameters, or modified parameters if certain alignment parameters are set (cf. Table 8). The mask applied to these images is also specified in the align section and can be different from the mask applied to the images involved in the reference construction. After successful completion of the alignment, the two aligned images are merged with the previous reference images, generating an updated preliminary map. From the new map, projections are computed to align the next two neighbors. This process is repeated until all images are aligned, or one of the termination criteria listed in table 8 is met. Early termination of the alignment is useful to reduce the computational effort in the initial cycles, until a sufficiently accurate geometry has been determined.
Dual-axis or multiple-axis tilt series can be aligned in two ways, either separately or simultaneously. For the first variant, the images are grouped by tilt axis first and the groups are aligned and merged one after another. For the second variant, for which the parameter startangle must be set, all images are aligned and merged in the order of increasing tilt angle magnitude, alternating between the axis groups. If startangle is greater than zero, the images up to startangle are aligned according to the first variant, at which point the alignment mode switches to the second variant.
are the input transformations, and
the corrections applied. To interpret the corrections, we compute the singular value decomposition:
The singular values of the diagonal matrix
indicate a stretch/compression correction in two orthogonal directions with an orientation angle given by the orthogonal matrix
indicates a correction of the in-plane rotation. A text file with correction factors and angles can be written to disk for inspection or plotting.
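For a 2×2 correction matrix the decomposition can be evaluated in closed form. The Python sketch below assumes a matrix with positive determinant written as A = R(φ)·diag(s1, s2)·R(θ), where R denotes an anti-clockwise rotation; the factor ordering and the angle conventions are assumptions, since they may differ from the ones protomo writes to its text file. The function name is hypothetical.

```python
import math


def interpret_correction(a, b, c, d):
    """Decompose the matrix [[a, b], [c, d]] (positive determinant assumed).

    Returns (s1, s2, stretch_angle, rotation_angle) with angles in degrees:
      s1 >= s2        singular values, i.e. the stretch/compression factors
      stretch_angle   direction of the s1 axis before the rotation is applied
      rotation_angle  net in-plane rotation, the angle of R(phi + theta)
    """
    e, f = (a + d) / 2.0, (a - d) / 2.0
    g, h = (c + b) / 2.0, (c - b) / 2.0
    q, r = math.hypot(e, h), math.hypot(f, g)
    s1, s2 = q + r, q - r
    a1 = math.atan2(g, f)  # phi - theta
    a2 = math.atan2(h, e)  # phi + theta
    stretch = math.degrees((a1 - a2) / 2.0)   # equals -theta
    stretch = (stretch + 90.0) % 180.0 - 90.0  # an axis direction is defined modulo 180 degrees
    return s1, s2, stretch, math.degrees(a2)


# Hypothetical correction matrix, for illustration only.
print(interpret_correction(1.02, -0.05, 0.04, 0.99))
```

Singular values close to 1 together with a small rotation angle indicate that the input transformation needed little correction.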
The geometry refinement tries to reduce matrix
to a unit matrix in the equation
by varying the alignment transformations
. The matrices
were obtained by area matching, and are constant here. After refinement,
define the new transformations.
3D reconstruction is carried out by weighted back-projection with general weighting functions. The back-projection parameters have the same meaning as in the reference construction section (cf. Table 5). The size of the map is set with the parameter in the map section, and is independent of the window size used during alignment. Additionally, a low-pass filter can be applied to the images for the map calculation to filter out the signal beyond the resolution limit.
See table 5
See table 4.
Tilt series alignment is carried out at the Python command level in SPARX. SPARX is a Python and EMAN2 based environment for image processing in electron microscopy. The protomo software package provides a Python extension module that integrates with SPARX/EMAN2 and provides the functionality needed for tilt series alignment and 3D reconstruction.
The protomo Python extension module is loaded with the following command at the Python or SPARX command prompt:
It makes four classes available for the current session: protomo.param for defining processing parameters (Table 11), protomo.geom for geometric parameters (Table 12), protomo.series for managing the alignment process (Table 14), and protomo.image for image manipulation and image input/output (Table 13). protomo.param and protomo.geom parse and convert text files containing parameters to an internal representation that is used by the other objects. During creation of a series object instance, a copy of the internal data is generated and associated with the new instance, so that modifications of individual parameters will not affect the original object instances. Since all object instances are stored in memory, care should be taken to delete unused instances, especially large images, to avoid memory allocation failures.
A typical alignment session first creates a parameter and geometry object instance from which a series object instance is then created that keeps track of the alignment parameters and the alignment status. The geometry and parameter object instances are stored in memory and are not persistent between sessions. The series object instance however, is backed by the geometry metadata file which keeps track of the alignment status, so that a session can be restarted. If the metadata file exists at startup, the geometry object is no longer needed to create the series object instance, since all the relevant data is stored in the metadata file.
The actual alignment is carried out with the align method. If interrupted, it can be restarted and it will resume with the first unaligned images. After the alignment is completed, the geometry is re-evaluated. In the first few cycles it is not necessary to align all images in the tilt series to compute the new geometry. This can save time if the initial estimate is not very accurate. The geometry is re-evaluated with the fit method. A minimal number of a few aligned images is required as input. If images are explicitly excluded from the alignment with the selection/exclusion parameters, a subsequent re-evaluation is based only on the newly aligned images, which could result in undesirable changes of the geometric parameters if those images show poor alignment. The re-evaluation can be carried out multiple times with different parameters. Previous results are overwritten with each invocation, because the new, re-evaluated geometry is stored in a temporary location first, and will be lost if the session terminates. The new geometry is only saved permanently for use in the next alignment cycle after the update method has been called.
The program “i3display” is a command line front end equivalent to the display method of an image object, and is invoked at the shell prompt as follows:
It displays the image with the file name image, and if the option -r is given, thresholds the densities below min and above max. If the option is absent, it first scans the image to find the minimal and maximal densities and uses these to scale the densities for displaying the image. It recognizes the keys/buttons listed in table 15. Stacks of 3D images cannot be displayed directly. Use the program “i3montage” to create a montage first, and display the montage.
The program “tomoalign-gui” is used for manual alignment of tilt series, or for displaying and saving animated sequences of tilt series:
If the -log option is present, diagnostic information is printed to the terminal. Option -zoom sets the initial zoom factor, and -r the density scaling as in the program “i3display”. If the specified tilt series is a new series and the geometry metadata file does not exist yet, the option -tlt is mandatory to specify the geometry parameter file. Key/button actions to display and manipulate the images are listed in table 16.
On startup, or when the overlay mode is selected from the menu, the program displays two images, a reference image in red, and the image to be aligned in green. The superposition of the two colors results in a more or less grey tone image if the two superimposed images are in register. To align manually, the green image can be dragged to the aligned position while holding down the left mouse button. The automatic alignment function, selected from the menu, performs a cross-correlation alignment. The differences in foreshortening of the images are compensated, but the reference construction scheme used in area matching is not applied: the two images are simply cross-correlated and the displacement is calculated from the correlation maximum.
The program tomoinit can be used to create the geometry metadata file (“.i3t”) from the geometry parameter file (“.tlt”):
geom is a geometry parameter file, and param a processing parameter file. The parameter files can be parsed to check for correctness with the programs “tomotilt” and “tomoparam”, respectively.
Taylor et al. (1997): Taylor, K. A., Tang, J., Cheng, Y. and Winkler, H., “The use of electron tomography for structural analysis of disordered protein arrays”, J. Struct. Biol. (1997), 372-386.
Winkler (2007): Winkler, H., “3D reconstruction and processing of volumetric data in cryo-electron tomography”, J. Struct. Biol. (2007), 126-137.
Winkler and Taylor (2006): Winkler, H. and Taylor, K. A., “Accurate marker-free alignment with simultaneous geometry determination and reconstruction of tilt series in electron tomography”, Ultramicroscopy (2006), 240-254.