INFOSTIP: INdividuals FOr Sequencing by Total Information Potential
We have designed a tool for informed selection of individuals to sequence from a
population. It takes as input a collection of shared regions for each pair of
samples (in the form of a .match file) and a sequencing budget. It has been
implemented in C++.
Installation instructions
1. Download the source code.
2. Extract the archive using the command "tar -zxvf ImputationTool.tar.gz".
3. Then run the command "make install".
Usage
Command: "./infostip <Location-of-match-files> <Match-file-prefix> <No-of-chromosomes> <Sequencing-budget> > <Output-file>"
Optional Input: "<No-of-SNPs>"
Options
1. Location-of-match-files: Path to the folder containing the .match
files (e.g. /Path_to_folder/).
2. Match-file-prefix: Match files should be named in the format
prefix.chromosome_no.match (e.g. ABC.1.match). Here the prefix is "ABC." and
chromosome_no is "1". Enter the prefix, including its trailing dot ("ABC."),
on the command line.
3. No-of-chromosomes: Number of chromosomes on which you want to run the
method.
4. Sequencing-budget: Number of individuals you want to select from the
population to sequence.
5. Optional Input - No-of-SNPs: Specify this parameter if you want to
consider only shared regions containing more SNPs than the value you
specify here.
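The naming convention above can be illustrated with a small sketch. It is not part of INFOSTIP: the helper name match_file_paths is hypothetical, and it assumes that chromosomes are numbered 1 through No-of-chromosomes and that the prefix is passed with its trailing dot.

```python
from pathlib import PurePosixPath

def match_file_paths(location, prefix, n_chromosomes):
    """List the .match file paths expected for chromosomes 1..n_chromosomes.

    Assumptions (not confirmed by the tool): chromosomes are numbered
    from 1, and the prefix already ends with "." (e.g. "ABC.").
    """
    folder = PurePosixPath(location)
    return [
        str(folder / f"{prefix}{chrom}.match")
        for chrom in range(1, n_chromosomes + 1)
    ]

# Example: three chromosomes with the prefix "ABC." from the text above.
paths = match_file_paths("/Path_to_folder/", "ABC.", 3)
for path in paths:
    print(path)
```

Checking that every listed file exists before launching the tool may help catch a missing trailing dot in the prefix early.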
Output
1. Individual Picked: ID of the individual picked.
2. Utility: Total length (in bp) of the shared regions that an individual
shares with all unsequenced individuals across all chromosomes.
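To make the Utility column concrete, here is a rough sketch of how such a score could be computed. It is an illustration only, not INFOSTIP's actual implementation. It assumes that segment length is (end - start) in bp, that overlapping segments are simply summed, and that individuals are picked greedily, one per round, with ties broken by the smallest ID. The function names and the toy segments are made up.

```python
from collections import defaultdict

def utilities(segments, sequenced):
    """Total shared length (bp) per individual, over unsequenced pairs only.

    segments: iterable of (id1, id2, start_bp, end_bp) tuples.
    sequenced: set of IDs already picked; their segments are skipped.
    """
    totals = defaultdict(int)
    for id1, id2, start_bp, end_bp in segments:
        if id1 in sequenced or id2 in sequenced:
            continue
        length = end_bp - start_bp  # assumption: no +1 for inclusive ends
        totals[id1] += length
        totals[id2] += length
    return dict(totals)

def pick_greedy(segments, budget):
    """Pick up to `budget` individuals, highest utility first (assumed strategy)."""
    sequenced = set()
    picks = []
    for _ in range(budget):
        scores = utilities(segments, sequenced)
        if not scores:
            break
        # Iterating over sorted IDs makes ties resolve to the smallest ID.
        best = max(sorted(scores), key=scores.get)
        picks.append((best, scores[best]))
        sequenced.add(best)
    return picks

# Toy data (made up): four individuals, segments pooled across chromosomes.
toy_segments = [
    ("A", "B", 100, 600),
    ("A", "C", 1000, 1300),
    ("B", "C", 0, 200),
    ("C", "D", 50, 150),
]
for individual, utility in pick_greedy(toy_segments, budget=2):
    print(individual, utility)
```

With this toy data the sketch picks A first (utility 800) and then C (utility 300), because segments involving A no longer count once A has been picked.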
Match file format
Each line in the .match file represents a shared segment for a pair of
individuals with the following fields:
1. Family ID 1
2. Individual ID 1
3. Family ID 2
4. Individual ID 2
5. Chromosome
6. Segment start (bp)
7. Segment end (bp)
8. Segment start (SNP)
9. Segment end (SNP)
10. Total SNPs in segment
11. Genetic length of segment
12. Units for genetic length (cM or MB)
13. Mismatching SNPs in segment
14. 1 if Individual 1 is homozygous in match; 0 otherwise
15. 1 if Individual 2 is homozygous in match; 0 otherwise
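The 15 fields listed above can be read with a short parser such as the sketch below. It assumes the fields are separated by whitespace (spaces or tabs) and that SNP positions in fields 8 and 9 are kept as text. The names MatchSegment, parse_match_line and has_enough_snps are hypothetical, and the sample line is made up for illustration.

```python
from typing import NamedTuple

class MatchSegment(NamedTuple):
    family_id_1: str
    individual_id_1: str
    family_id_2: str
    individual_id_2: str
    chromosome: str
    start_bp: int
    end_bp: int
    start_snp: str
    end_snp: str
    total_snps: int
    genetic_length: float
    length_units: str        # "cM" or "MB"
    mismatching_snps: int
    homozygous_1: bool
    homozygous_2: bool

def parse_match_line(line):
    """Parse one .match line, assuming 15 whitespace-separated fields."""
    f = line.split()
    if len(f) != 15:
        raise ValueError(f"expected 15 fields, got {len(f)}")
    return MatchSegment(
        f[0], f[1], f[2], f[3], f[4],
        int(f[5]), int(f[6]), f[7], f[8],
        int(f[9]), float(f[10]), f[11], int(f[12]),
        f[13] == "1", f[14] == "1",
    )

def has_enough_snps(segment, min_snps):
    """Mirror the optional No-of-SNPs filter: keep segments with more SNPs."""
    return segment.total_snps > min_snps

# Made-up example line, not taken from real data.
example = "FAM1 IND1 FAM2 IND2 1 1000000 2500000 rs100 rs200 150 1.8 cM 0 0 1"
segment = parse_match_line(example)
print(segment.individual_id_1, segment.individual_id_2,
      segment.end_bp - segment.start_bp, has_enough_snps(segment, 100))
```

Splitting on any whitespace keeps the parser tolerant of mixed spaces and tabs; a stricter parser could be substituted if the exact delimiter is known.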
Match files can be generated using GERMLINE.
NSF CAREER award