• Welcome to the e-NMR Homepage >>


Gromacs on the GRID

Tutorial for performing MD simulations in a GRID environment

Introduction

The aim of this tutorial is to introduce the user to performing molecular dynamics simulations in a GRID environment. It is not a tutorial on molecular dynamics. For those interested in learning that technique, a number of tutorials is available online, e.g. at http://nmr.chem.uu.nl/~tsjerk/course/molmod/.

The emphasis in this tutorial lies on the interaction with the GRID resources and how to use these for performing protocollized molecular simulations. For this tutorial it is necessary to have access to the GRID and be logged in at an user interface (UI). GRID access can only be provided to those who posess a valid certificate. To be able to follow the tutorial it is also necessary to have registered the certificate with the eNMR (enmr.eu) virtual organisation (VO).

Getting started

After successfully logging in to a UI, first make a directory in which to put the files for the tutorial. A number of files are necessary to run the tutorial, which are listed below. Download each of these files and save them in the directory you just made.

LeuE.pdb
This is a structure file in PDB format containing the endorphic neuropeptides Leu-Enkephalin (LeuE.pdb).
mdsetup.sh
This is a protocol script for setting up and running MD simulations using Gromacs.

Molecular Dynamics Simulations

Molecular dynamics (MD) simulations are useful to gain insight into properties of molecules and to test hypotheses arising from experimental investigation, as well as to provide suggestions for new experiments. The relatively low cost of the technique, requiring none other than computers, has contributed to the establishment of MD as a standard tool within structural biology to complement experimental results.
Molecular dynamics simulations are based on the assumption that the behaviour of molecules can often be explained by classical mechanics of atomic particles. The classical mechanics follows from Newton's equations of motion which are integrated with very small time steps for each particle in the system for many steps. For this, it is necessary to know the positions, velocities and forces of and acting on to particles. The latter are determined from the positions of the particles and the interactions, which are laid down in the force field.
It follows that to start an MD simulation it is necessary to have a description of the system to be studied in terms of the positions of the particles and a description of the interactions. For this part of the tutorial we use LeuE.pdb, that contains the list of atoms and their positions for Leu-enkephalin.
The MD process is protocollized to facilitate automated processing, as is required on the GRID. This protocol is laid down in the script mdsetup.sh. Running mdsetup.sh -h will provide information on the available options. To understand the protocol have a look at the workflow depicting the different steps in the process. In short, the coordinate file is used to construct a topology by looking up the building blocks, the amino acids, in a library containing the interactions. Then the system is energy minimized to remove strain, solvated and energy minimized again. Subsequently, a number of short MD simulations are performed to equilibrate the system, and finally the production simulation can be started.
All this will be performed on the GRID, using the single coordinate file LeuE.pdb as the input. The procedure used here will work for most standard systems, containing protein and/or DNA, although it does not allow processing of systems with non-standard residues or ligands.

The gLite GRID interface

Writing a job file using JDL

For this tutorial we use the gLite middleware to submit and manage the jobs. The first step is preparing the job using the gLite Job Description Language (JDL). The JDL language allows specifying many options for control of the jobs to be run on GRID resources. Among the most important are the ones relating to the executable to be run, the input and the output.
The executable we will use is the script mdsetup.sh that forms a simplified interface to the Gromacs package. It allows controlling a number of parameters such as the force field to be used and the length of the simulation. Here a short simulation (0.1 ns) will be run using the GROMOS96 45a3 force field (G45a3). Electrostatic interactions will be treated using a cut-off with reaction-field correction (RF). On the command line this would be done using:

mdsetup.sh -f LeuE.pdb -ff G45a3 -elec RF -time 0.1

This will typically create many files (+/- 70). To reduce the number of files to be transferred, it is possible to create an archive of the output files, using the option "-archive":

mdsetup.sh -f LeuE.pdb -ff G45a3 -elec RF -time 0.1 -archive mdresults.tgz

In the job file this is specified as follows:

Executable = "mdsetup.sh";
Arguments = "-f LeuE.pdb -ff G45a3 -elec RF -time 0.1 -archive mdresults.tgz";

With this, the input is also already defined, namely the script to be run and the file LeuE.pdb that is used by mdsetup.sh as input. These files have to be transferred to the Working Node (WN), which is specified through the InputSandBox:

InputSandBox = {"mdsetup.sh", "LeuE.pdb"};

Likewise, the archive mdresults.tgz that will contain the results from the run has to be transferred back from the WN upon completion:

OutputSandBox = "mdresults.tgz";

With this, most of the job file is defined. However, since Gromacs is not installed on all sites available, we have to make sure that the job ends up on one of the sites that does have it installed. In other words, the job most be sent to a site which carries a tag indicating Gromacs is installed. The relevant tag is "VO-enmr.eu-GROMACS3.3.3". This can be specified in the job file through the Requirements:

Requirements = Member("VO-enmr.eu-GROMACS3.3.3",other.GlueHostApplicationSoftwareRunTimeEnvironment);

Accordingly, the JDL file to submit a simulation job with Gromacs is given by:

Executable = "mdsetup.sh";
Arguments = "-f LeuE.pdb -ff G45a3 -elec RF -time 0.1 -archive mdresults.tgz";
InputSandBox = {"mdsetup.sh", "LeuE.pdb"};
OutputSandBox = "mdresults.tgz";
Requirements = Member("VO-enmr.eu-GROMACS3.3.3",other.GlueHostApplicationSoftwareRunTimeEnvironment);

Submitting a job using the gLite middleware

The job file can be downloaded here. Note that this is a minimal job file, which means that a lot of options are set to default values. After making sure that the directory now contains the files mdsetup.sh, LeuE.pdb and gmx.jdl, the job can be submitted. For this, it is required to have a valid proxy. The proxy can be checked using

voms-proxy-info

This will show the status of your proxy if you have one, or display "Couldn't find a valid proxy.". If there is no valid proxy or if there is no time left on the current proxy, then start one, specifying enmr.eu as the virtual organisation.

voms-proxy-init -voms enmr.eu

Enter the password that corresponds to your certificate and afterwards check the proxy again using 'voms-proxy-info'. If the proxy is established the job can be submitted to the GRID, using 'glite-wms-job-submit'. This command takes the job file as input. It has several options for more advanced control. Upon submission the job will get an ID, that can be stored to a file with the option '-o'. To see a list of all options available, run the command with the option '--help' (glite-wms-job-submit --help). Now submit the job using:

glite-wms-job-submit -a -o gmx.id gmx.jdl

The job is now sent to the Resource Broker (RB) that tries to find the best matching CE. This choice depends on the requirements of the job, and possibly on criteria that can be specified in the job file regarding the ranking of sites. It is possible to list the CEs that match for a certain job using the command 'glite-wms-job-list-match':

glite-wms-job-list-match -a gmx.jdl

To see the CE to which the job has been sent by the RB, check the status of the job. This can be done using the command 'glite-job-status':

glite-job-status -i gmx.id

This gives a brief overview of the status of the job. For instance, the Destination gives the CE and the queue to which the job has been sent. But it also shows the submission time and the current status. Probably, the current status is set to 'Scheduled', indicating that the job is in the queue of the target site, or 'Running', which means that the job is already being executed. When the run is finished, the status will be set to 'Done'.
To get more elaborate information about the job, the verbosity level of glite-job-status can be increased. This is controlled through the option -v. By default the verbosity level is set to 1, but values from 0 to 3 are possible. A full account of the job status can be obtained by:

glite-job-status -v 3 -i gmx.id

The output now also contains detailed information about the CE the job was sent to as well as a description of the job as it was sent to the CE. The latter can be found under the line starting with '- Jdl'. Note how the job has been expanded from only 5 lines to about 25, by explicitly listing default settings.

If the job is still reported 'Scheduled' or 'Running', it's a good time to get a coffee. The run should take about 15 minutes in total. If the status is reported 'Done' then the output files can be retrieved using:

glite-wms-job-output -i gmx.id

This will transfer the output file 'mdresults.tgz', as specified through the OutputSandBox, to the UI. Unpack the file to have a look at the results:

tar xvzf mdresults.tgz

As was mentioned before the run has generated many files if processed succesfully. The 'production' MD results are tagged with 'fullmd':

ls *fullmd*

All other files were generated during setting up and equilibration of the system and are mainly for checking the process and debugging. The production MD results can be further analyzed using the methods described in the MD tutorial.

That's it! Note that this protocol will probably work for most protein structures, provided that there are no missing atoms or missing residues and no non-standard groups. Be sure to set the simulation time to something sensible, as 0.25 ns is far too short for most purposes to see effects.