mapmuts_preferencemeans.py

This is a simple script that computes the mean of the amino acid preferences or differential preferences inferred by several runs of mapmuts_inferpreferences.py or mapmuts_inferdifferentialpreferences.py.

Preferences

If using amino-acid preferences (preferencefiles argument in Input file), then we are working with the following data.

For each residue r in a protein, the script mapmuts_inferpreferences.py computes the preference of that site for each amino acid a, which is denoted as \(\pi_{r,a}\). These preferences sum to one, so \(1 = \sum_a \pi_{r,a}\). It is possible that you have run several replicates of mapmuts_inferpreferences.py, such as for several different replicates of an experiment. You then might want to compute the average preference for each amino acid taken over the replicates. This script does that, and writes the output to a file.

Specifically, let \(\pi_{r,a}^i\) denote the preference for amino acid \(a\) at site \(r\) computed for replicate \(i\), where \(i = 1, \ldots, N\) and \(N\) is the number of replicates. Then the mean preference of site \(r\) for amino acid \(a\) is \(\langle \pi_{r,a} \rangle = \frac{1}{N} \sum_i \pi_{r,a}^i\). Note that because the preferences sum to one for each individual replicate, it will also be the case that \(1 = \sum_a \langle \pi_{r,a} \rangle\).
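The averaging described above can be sketched in a few lines of Python. This is only an illustration of the formula, not the code used by the script: the function name mean_preferences and the nested-dictionary layout are assumptions made for this example, and the values are invented.

```python
def mean_preferences(replicates):
    """Average per-site preferences over replicates.

    `replicates` is a list of dicts mapping site r -> {amino acid a: pi_{r,a}}.
    This layout is hypothetical; it is not the mapmuts file format.
    """
    n = len(replicates)
    means = {}
    for r in replicates[0]:
        means[r] = {
            a: sum(rep[r][a] for rep in replicates) / n
            for a in replicates[0][r]
        }
    return means


# Two made-up replicates, one site, three amino acids for brevity.
rep1 = {1: {"A": 0.5, "C": 0.25, "D": 0.25}}
rep2 = {1: {"A": 0.25, "C": 0.25, "D": 0.5}}
means = mean_preferences([rep1, rep2])

# Each replicate sums to one at the site, so the means do as well.
total = sum(means[1].values())
```

Here the mean preference of site 1 for amino acid A is (0.5 + 0.25) / 2 = 0.375, and the three means still sum to one.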

Differential preferences

If using differential amino-acid preferences (differentialpreferencefiles argument in Input file), then we are working with the following data.

For each residue \(r\) in a protein, the script mapmuts_inferdifferentialpreferences.py computes the differential preference \(\Delta\pi_{r,a}\) of that site for amino acid \(a\). These differential preferences sum to zero, so \(0 = \sum_a \Delta\pi_{r,a}\). This script is designed for the case when you have performed several runs of mapmuts_inferdifferentialpreferences.py on different samples, and want to compute the average.

Specifically, let \(\Delta\pi_{r,a}^i\) denote the differential preference of site \(r\) for amino acid \(a\) computed for replicate \(i\), where \(i = 1, \ldots, N\) and \(N\) is the number of replicates. Then the mean differential preference of site \(r\) for amino acid \(a\) is \(\langle \Delta\pi_{r,a} \rangle = \frac{1}{N} \sum_i \Delta\pi_{r,a}^i\). Note that because the differential preferences sum to zero for each individual replicate, it will also be the case that the average differential preferences sum to zero.
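The zero-sum property of the averages follows from exchanging the order of the two sums, which can be written out as:

\[
\sum_a \langle \Delta\pi_{r,a} \rangle
= \sum_a \frac{1}{N} \sum_{i=1}^{N} \Delta\pi_{r,a}^i
= \frac{1}{N} \sum_{i=1}^{N} \sum_a \Delta\pi_{r,a}^i
= \frac{1}{N} \sum_{i=1}^{N} 0
= 0.
\]

The same argument, with each inner sum equal to one instead of zero, gives \(1 = \sum_a \langle \pi_{r,a} \rangle\) for the mean preferences.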

Dependencies

This script has no dependencies outside of mapmuts and the standard Python library.

Running the script

To run this script from the prompt, first create a text infile of the format described below. Then simply run the script with the single text file as the argument, as in:

mapmuts_preferencemeans.py infile.txt

Input file

The input file is a text file with a series of key / value pairs, one per line. The required keys are indicated below. Individual values (such as file names) should not include spaces, since spaces separate the entries on a line.

Lines beginning with # and empty lines are ignored.

Keys for the input file:

  • preferencefiles : You should specify either this argument or differentialpreferencefiles, but not both. If using preferencefiles, this argument should be followed by a list of two or more of the *_equilibriumpreferences.txt files created by mapmuts_inferpreferences.py (if there were just one file, you wouldn’t need to run this script).

    Note that it is optional whether a preference is specified for stop codons (indicated by a * character) in the last column. But all files specified here must be consistent – they either all must specify a preference for a stop codon, or all must not specify a preference for a stop codon.

  • differentialpreferencefiles : You should specify either this argument or preferencefiles, but not both. If using differentialpreferencefiles, this argument should be followed by a list of the differentialpreferences_selection_*.txt files created by mapmuts_inferdifferentialpreferences.py. You must specify at least two such files (if there were just one, you wouldn’t need to run this script).

    Note that it is optional whether a differential preference is specified for stop codons (indicated by a * character) in the last column. But all files specified here must be consistent: they either all must specify a differential preference for a stop codon, or all must not specify one.

  • outfile is the name of the created output file. This output file is in the same format as the input files in preferencefiles or differentialpreferencefiles, but now contains the average preferences or differential preferences.

  • includestop specifies whether we include stop codons (denoted by a * character) as possible amino acids in the generated outfile. If the input preferencefiles / differentialpreferencefiles do not specify stop codons as an amino acid, then no stop codons are ever included in outfile and this option is meaningless. But if the input files do specify stop codon preferences, then includestop is meaningful. In this case:

    • If includestop is True then outfile also contains these stop codons as possible amino acids.
    • If includestop is False, then stop codons are not included as possible amino acids. In this case, the \(\pi_{r,a}\) values for the 20 non-stop amino acids are normalized so that they sum to one.
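The renormalization applied when includestop is False can be illustrated with a short sketch. It assumes that "normalized" means dividing each remaining preference by the sum of the non-stop preferences; the function name drop_stop and the example values are hypothetical and are not taken from mapmuts.

```python
def drop_stop(prefs, stop="*"):
    """Remove the stop codon entry and rescale the rest to sum to one.

    Assumes normalization means dividing by the sum of the kept values.
    """
    kept = {a: p for a, p in prefs.items() if a != stop}
    total = sum(kept.values())
    return {a: p / total for a, p in kept.items()}


# Made-up preferences for one site, with only two amino acids plus stop.
site = {"A": 0.5, "C": 0.25, "*": 0.25}
renorm = drop_stop(site)
renorm_total = sum(renorm.values())
```

In this example the non-stop preferences 0.5 and 0.25 are divided by 0.75, so their relative sizes are unchanged and they again sum to one.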

Example input file

Here is an example input file:

# Input file for mapmuts_preferencemeans.py
preferencefiles WT-1_equilibriumpreferences.txt WT-2_equilibriumpreferences.txt N334H-1_equilibriumpreferences.txt N334H-2_equilibriumpreferences.txt
outfile mean_equilibriumpreferences.txt
includestop False
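To make the input format concrete, here is a minimal sketch of how such a file could be read. It is not the parser used by the script: the function name parse_infile is hypothetical, and the sketch only assumes what is stated above (whitespace-separated key / value lines, with lines beginning with # and empty lines ignored). The file list is shortened to two entries for brevity.

```python
def parse_infile(text):
    """Parse key / value lines, skipping blank lines and # comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # First token is the key, the remaining tokens are its values.
        key, *values = line.split()
        params[key] = values
    return params


example = """\
# Input file for mapmuts_preferencemeans.py
preferencefiles WT-1_equilibriumpreferences.txt WT-2_equilibriumpreferences.txt

outfile mean_equilibriumpreferences.txt
includestop False
"""
params = parse_infile(example)

# Exactly one of the two file-list keys should be given, with two or more files.
has_prefs = "preferencefiles" in params
has_diff = "differentialpreferencefiles" in params
exactly_one = has_prefs != has_diff
files = params["preferencefiles" if has_prefs else "differentialpreferencefiles"]
enough_files = len(files) >= 2
```

Keeping every value as a list of tokens treats the multi-file keys and the single-value keys uniformly; a real parser would also convert includestop to a Boolean.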

Output

The script writes some brief output to standard output.

The file specified by outfile is created (or overwritten if it already exists). This file has exactly the same format as the input files listed in preferencefiles or differentialpreferencefiles, except that the indicated preferences / differential preferences are now average values. The entropies or RMS differential preferences (third column of these files) are recomputed from the averages.

The entries for the wildtype amino acid (second column, WT_AA) show either the single wildtype amino acid (if identical in all files) or a comma delimited list of all wildtype amino acids in the different files (such as N,N,H,H if the preferencefiles have wildtype amino acids N, N, H, and H for four such files).
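The rule for the WT_AA column can be sketched as follows. The function name merge_wildtype is hypothetical, and this is only an illustration of the behavior described above, not the script's own code.

```python
def merge_wildtype(wts):
    """Return one residue if all files agree, else a comma-delimited list."""
    if len(set(wts)) == 1:
        return wts[0]
    return ",".join(wts)


same = merge_wildtype(["N", "N", "N", "N"])    # all files agree
mixed = merge_wildtype(["N", "N", "H", "H"])   # files disagree
```

With four files whose wildtype residues are N, N, H, and H, this gives N,N,H,H, matching the example above; when all four files list N, it gives just N.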

The outfile created by this script is valid input to mapmuts_siteprofileplots.py.