Typical workflow

mutpath installs a number of scripts that perform the main operations of the package. A mutational path can be built by using these scripts along with BEAST. By default, the mutpath scripts are installed so that they can be run directory from the command line, taking a user-specified and created input text file.

The scripts themselves are documented individually in the following sections. This section gives an overall workflow of how these scripts can be used in conjunction with BEAST.

Create the BEAST XML file

You may have your own strategy to create the BEAST input XML file. If not, mutpath has a script to do this.

Specifically, you can use mutpath_parse_to_beastxml.py to create an input BEAST XML file from the FASTA sequence files. You can do this by manually creating the text input file and then running:

mutpath_parse_to_beastxml.py infile.txt

where infile.txt is a text file that you have created that specifies the input variables. Note that this script will create files that provide information about sequences that are outliers in terms of their divergence. If you think that these might be mis-annotated sequences that should not be part of the tree, specify them for exclusion using the options provided in infile.txt and then run the script again. The final output of interest will be a BEAST XML file.

Run BEAST

You will want to run BEAST on the XML file that you have created. Note that you will need to use BEAGLE. Below is an example command if you have installed both BEAST and BEAGLE locally, and have an XML file named infile.xml:

java -Xmx2024m -Xms2024m -Djava.library.path=/home/jbloom/BEAGLE_libs/lib -cp ~/BEAST/build/dist/beast.jar dr.app.beast.BeastMain -beagle infile.xml

This will create a .trees and .log file with the prefix specified in infile.xml. You may want to run several replicates of BEAST in different directories to create more MCMC data to analyze. Use Tracer to examine the .log files to see if you run enough steps of MCMC to give good effective sample sizes.

Compact the .trees file

Running BEAST using the Markov Jumps mutation mapping will create a very large output .trees file. It may be helpful (although it is optional) to compact this file by removing unnecessary information. You can do this using mutpath_compact_trees.py by:

mutpath_compact_trees.py treefile.trees

This will create a more compact file called treefile_compact.trees which you can use for all subsequent commands. You can delete the larger file using:

rm treefile.trees

This command is not necessary, but will greatly reduce (typically by around 20-fold) the size of the .trees files.

Get the mutational paths

After running BEAST and possibly compacing the mutational paths with mutpath_compact_trees.py, you will have a large collection of trees in one or more .trees files. Use the script mutpath_get_paths.py to get the mutational paths from these trees. This script also merges all of the trees (without the mutation annotations) into a single .trees file that you can analyze with TreeAnnotator to make a maximum-clade credibility tree. The output of this script is a .txt file that contains the mutational paths between the specified starting and ending sequence. To do this, you manually create the text input file and then run:

mutpath_get_paths.py infile.txt

The .txt file containing the mutational paths can then be analyzed by later scripts.

Making and annotating the maximum clade credibility tree

After running mutpath_get_paths.py, all of the various .trees files will have had their non-burnin trees combined into a single merged .trees file. Let’s say we called that file merged_trees.trees. You can then use TreeAnnotator to make a maximum clade credibility tree, which in turn can be visualized by FigTree. To do this, make sure that you have TreeAnnotator installed such that the treeannotator executable is accessible, and then run:

treeannotator merged_trees.trees maxcladecredibility.trees

to create the maximum clade credibility tree in maxcladecredibility.trees. This tree can then be visualized with FigTree. In order to format it for better viewing with FigTree, you can use the script mutpath_annotate_tree.py. Create an input file specifying how you want to label tip nodes in this tree, and then run:

mutpath_annotate_tree.py infile.txt

This will create an annotated .tree file with the name that you specify in the input file. You can visualize this tree (and if you want further annotate it) using FigTree.

Constructing the mutational trajectory

After running mutpath_get_paths.py to create the .txt file containing the mutational paths, you can then use mutpath_make_digraph.py to visualize and analyze the mutational trajectory. The mutational trajectory is the path from the starting sequence to the ending sequence through sequence space, and the probability of different paths in the trajectory is taken as proportional to the number of paths in the .txt mutational path file that take that portion of the trajectory. To create these paths, create an input file. Then run the script:

mutpath_make_digraph.py infile.txt

This will generate a DOT file containing the mutational trajectory as a directed graph through protein sequence space that can be visualized in GraphViz. You can also use GraphViz to save this trajectory in a PDF or other file format. In addition, this script will output information about the dates of mutations and the times for which nodes persisted.