.. _epitopefinder_selectsites.py:

================================================
epitopefinder_selectsites.py
================================================

This script helps with analysis of output from :ref:`epitopefinder_getepitopes.py`. Specifically, it allows you to specify a subset of sites in the protein. It then generates a file that lists the number of epitopes for each of the selected sites, sorted in order from most to least epitopes.

Running the script
---------------------
:ref:`epitopefinder_selectsites.py` takes as input the name of a single file, the format of which is detailed below.

If you have installed the package so that the scripts are in the search path, you can run this script directly from the command line. For example, if you called your input file ``infile.txt`` then run::

    epitopefinder_selectsites.py infile.txt

If the script is not executable on your platform, then run::

    python epitopefinder_selectsites.py infile.txt

This will create the output described below.

Input file format
---------------------
The input file is a text file that should contain the following lines in the order indicated below. Empty lines or lines beginning with # are ignored. Each entry takes the form of a key followed by its value:

* *epitopesbysitefile* is the name of a file listing the epitopes by site. This is the file that :ref:`epitopefinder_getepitopes.py` creates under the name given as the value of its own *epitopesbysitefile* key. This file must already exist, and contains the input data used by this script.

* *selectsitesfile* is the name of the output file that we create. This file lists the number of epitopes per site for the selected sites specified by *sites*.

* *sites* is a list of numbers specifying the sites for which we are trying to find the epitopes (in the same numbering scheme used by :ref:`epitopefinder_getepitopes.py`, which is typically sequential 1, 2, ... numbering). All of these sites must be present in *epitopesbysitefile* or an exception will be raised.
  If a site is listed multiple times, how it is handled depends on the value of *retainmultiple*.

* *retainmultiple* specifies what we do if a site is listed multiple times in *sites*. The possible values are *True* and *False*. If *retainmultiple* is *True*, then a site that is listed multiple times in *sites* is also listed multiple times in the created *selectsitesfile*. If *retainmultiple* is *False*, then each site is listed only once in the created *selectsitesfile*, even if it is listed multiple times in *sites*.

  This distinction is important if you are subsequently comparing the selected sites to the full set of sites in *epitopesbysitefile* using :ref:`epitopefinder_plotdistributioncomparison.py`. For instance, if you are looking at the set of sites that actually substitute during a protein's evolution, you need to decide how to handle sites that substitute more than once. If you set *retainmultiple* to *True*, then sites that substitute multiple times will appear multiple times in the created *selectsitesfile*, which corresponds to counting each substitution rather than each site. So if you set *retainmultiple* to *True* here, then you will probably want to set *pvaluewithreplacement* to *True* in the input file for :ref:`epitopefinder_plotdistributioncomparison.py`. If you set *retainmultiple* to *False* here, then you will probably want to set *pvaluewithreplacement* to *False* in the input file for :ref:`epitopefinder_plotdistributioncomparison.py`.

Example input file
---------------------
Here is an example input file::

    # input file for epitopefinder_selectsites.py
    epitopesbysitefile epitopesbysite_humanH3N2.csv
    selectsitesfile selectedsites_humanH3N2.csv
    sites 470 102 373 472 77 217 456 459 343 423 217 259 334 411 425 286 373 421 186 103 259 65 197 375 384 127 239 18 98 103 77 146 406 425 136 280 312 52 131 217
    retainmultiple True

Output file
-------------------
The output of this script is the file *selectsitesfile*.
This file lists the number of epitopes per site, sorted from most to least epitopes. For example, running the script with the example input file above might generate a file with the following first few lines::

    Site,NumberUniqueEpitopes
    384,6
    146,5
    343,2
    259,2
    259,2
    ... (more lines will follow)

Note how site 259 is listed twice in this file since it is specified twice by *sites* in the input file and *retainmultiple* is *True*.

.. include:: weblinks.txt
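
As a rough illustration of how *retainmultiple* affects the selected sites, the selection step can be sketched in Python as follows. This is a minimal sketch and not the script's actual code: the function ``select_sites`` and the in-memory ``counts`` dictionary are hypothetical stand-ins for the data read from *epitopesbysitefile*, the ``ValueError`` is only an assumed choice of exception type, and the order in which the real script lists sites with equal numbers of epitopes is not specified here. The counts in the usage example are taken from the example output above.

```python
def select_sites(counts, sites, retainmultiple):
    """Return (site, n_epitopes) pairs for the selected sites, most epitopes first.

    counts: dict mapping site number -> number of epitopes (a hypothetical
        in-memory stand-in for the contents of epitopesbysitefile).
    sites: list of selected site numbers, possibly containing repeats.
    retainmultiple: if True keep repeated sites, otherwise list each site once.
    """
    # Every selected site must be present in the input data.
    missing = [s for s in sites if s not in counts]
    if missing:
        raise ValueError("sites not found in counts: %s" % missing)
    if not retainmultiple:
        # dict.fromkeys drops repeats and keeps first-seen order (Python 3.7+).
        sites = list(dict.fromkeys(sites))
    # sorted() is stable, so sites with equal counts keep their input order.
    return sorted(((s, counts[s]) for s in sites),
                  key=lambda pair: pair[1], reverse=True)


# Usage example with counts copied from the example output above.
counts = {384: 6, 146: 5, 343: 2, 259: 2}
with_repeats = select_sites(counts, [259, 384, 259, 146], True)
without_repeats = select_sites(counts, [259, 384, 259, 146], False)
print(with_repeats)
print(without_repeats)
```

With *retainmultiple* set to *True*, site 259 appears twice in the result (counting each occurrence), whereas with *False* it appears once (counting each site).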