epitopefinder_selectsites.py

This script helps with analysis of output from epitopefinder_getepitopes.py. Specifically, it allows you to specify a subset of sites in the protein. It then generates a file that lists the number of epitopes for each of the selected sites, sorted in order from most to least epitopes.

Running the script

epitopefinder_selectsites.py takes as input the name of a single file, the format of which is detailed below. If you have installed the package so that the scripts are in the search path, you can run this script directly from the command line. For example, if your input file is named infile.txt, run:

epitopefinder_selectsites.py infile.txt

If the script is not executable on your platform, then run:

python epitopefinder_selectsites.py infile.txt

This will create the output described below.

Input file format

The input file is a text file that should contain the following lines in the order indicated below. Empty lines or lines beginning with # are ignored. Each entry takes the form of a key followed by its value:

  • epitopesbysitefile is the name of a file listing the epitopes by site, in the format that epitopefinder_getepitopes.py writes to the file named by its own epitopesbysitefile key. This file must already exist; it contains the input data used by this script.

  • selectsitesfile is the name of the output file that we create. This file lists the number of epitopes per site for the selected sites specified by sites.

  • sites is a list of numbers specifying the sites for which we are trying to find the epitopes (in the same numbering scheme used by epitopefinder_getepitopes.py, which is typically sequential 1, 2, ... numbering). All of these sites must be present in epitopesbysitefile or an exception will be raised. If a site is listed multiple times, how it is handled depends on the value of retainmultiple.

  • retainmultiple specifies what happens if a site is listed multiple times in sites. The possible values are True and False. If retainmultiple is True, a site listed multiple times in sites is also listed multiple times in the created selectsitesfile. If retainmultiple is False, each site is listed only once in the created selectsitesfile, even if it is listed multiple times in sites.

    This distinction is important if you are subsequently comparing the selected sites to the full set of sites in epitopesbysitefile using epitopefinder_plotdistributioncomparison.py. For instance, if you are looking at the set of sites that actually substitute during a protein’s evolution, you need to decide how to handle sites that substitute more than once. If you set retainmultiple to True, then sites that substitute multiple times will appear multiple times in the created selectsitesfile, which corresponds to counting each substitution rather than each site. In that case you will probably want to set pvaluewithreplacement to True in the input file for epitopefinder_plotdistributioncomparison.py. If you set retainmultiple to False here, then you will probably want to set pvaluewithreplacement to False in that input file.

Example input file

Here is an example input file:

# input file for epitopefinder_selectsites.py
epitopesbysitefile epitopesbysite_humanH3N2.csv
selectsitesfile selectedsites_humanH3N2.csv
sites 470 102 373 472 77 217 456 459 343 423 217 259 334 411 425 286 373 421 186 103 259 65 197 375 384 127 239 18 98 103 77 146 406 425 136 280 312 52 131 217
retainmultiple True
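
To make the key/value layout concrete, here is a minimal Python sketch of how a file like the one above could be read. It is not the script's own parser: the function name parse_input is hypothetical, the sites list is abbreviated for brevity, and the choice to strip leading whitespace before checking for # is an assumption.

```python
def parse_input(text):
    """Parse 'key value' lines, skipping empty lines and lines starting with #."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()  # assumption: surrounding whitespace is not significant
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # key, then the rest of the line as its value
        entries[parts[0]] = parts[1] if len(parts) > 1 else ""
    # Convert the two structured values to Python types.
    entries["sites"] = [int(s) for s in entries["sites"].split()]
    if entries["retainmultiple"] not in ("True", "False"):
        raise ValueError("retainmultiple must be True or False")
    entries["retainmultiple"] = entries["retainmultiple"] == "True"
    return entries


# Abbreviated version of the example input file (only three sites shown).
example = """\
# input file for epitopefinder_selectsites.py
epitopesbysitefile epitopesbysite_humanH3N2.csv
selectsitesfile selectedsites_humanH3N2.csv
sites 470 102 373
retainmultiple True
"""
config = parse_input(example)
```

After this runs, config["sites"] is a list of integers and config["retainmultiple"] is a Python bool, which is a convenient form for the selection step.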

Output file

The output of this script is the file selectsitesfile. This file lists the number of epitopes per site sorted from the most to least epitopes. For example, running the script with the example input file above might generate a file with the following first few lines:

Site,NumberUniqueEpitopes
384,6
146,5
343,2
259,2
259,2
... (more lines will follow)

Note how site 259 is listed twice in this file since it is specified twice by sites in the input file and retainmultiple is True.
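
The effect of retainmultiple on the output can be sketched in a few lines of Python. This is an illustration of the behavior described above, not the script's actual code: the function name select_sites is hypothetical, the exception type is a guess (the script is only documented to raise an exception), and keeping tied sites in their input order is an assumption. The counts are the ones shown in the example output.

```python
def select_sites(counts, sites, retainmultiple):
    """Return (site, count) rows for the selected sites, most epitopes first."""
    missing = [s for s in sites if s not in counts]
    if missing:
        # Assumption: the real script raises some exception; the type may differ.
        raise ValueError(f"sites not in epitopesbysitefile: {missing}")
    if not retainmultiple:
        # Keep only the first occurrence of each site.
        sites = list(dict.fromkeys(sites))
    # sorted() is stable, so sites with equal counts keep their input order.
    return sorted(((s, counts[s]) for s in sites),
                  key=lambda row: row[1], reverse=True)


# Epitope counts taken from the example output above.
counts = {384: 6, 146: 5, 343: 2, 259: 2}
sites = [343, 259, 259, 384, 146]

with_repeats = select_sites(counts, sites, retainmultiple=True)
without_repeats = select_sites(counts, sites, retainmultiple=False)
```

Here with_repeats contains site 259 twice, matching the example output, while without_repeats contains it once.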
