This page provides links to HADiT, the Java software that implements the Amplification Distortion Test (ADT). The following sections guide you through downloading, building, and running HADiT.
The program was developed by Itsik Pe'er's Lab of Computational Genetics at Columbia University. It is built in Java 1.5 and has been tested on both Windows and Linux. The source code is distributed here in a jar package under the GPL license.
Dependencies
HADiT depends on the following publicly available libraries. Please download the indicated versions in order to compile HADiT.
- Colt Math Library (version 1.2.0): http://acs.lbl.gov/~hoschek/colt/
- Commons Math Library (version 1.1): http://commons.apache.org/math/
- JFreeChart (version 1.0.6): http://www.jfree.org/jfreechart/
Installation and Usage
HADiT requires Java Development Kit (JDK) 1.5 or higher in order to compile. Assuming that the source code is in the directory $PROJECT_DIR/src, the libraries are in $PROJECT_DIR/lib, and you are in the source code directory, the command to compile the source code and link in the libraries on UNIX systems is:
javac *.java -d ../bin/ -classpath .:../lib/commons-math-1.1/commons-math-1.1.jar:../lib/jfreechart-1.0.6/lib/jcommon-1.0.10.jar:../lib/jfreechart-1.0.6/lib/jfreechart-1.0.6.jar:../lib/jfreechart-1.0.6/lib/jfreechart-1.0.6-swt.jar:../lib/colt/lib/colt.jar:../lib/colt/lib/concurrent.jar
The class files will be placed in the $PROJECT_DIR/bin directory (make sure you create it before compiling).
Running HADiT
You will first need to download several sample data files. These files contain simulated data rather than real data. They hold the SNP and CNA information, the sample information, and the nucleotide map at each SNP marker. These files can be downloaded in this rar file. Unrar the files into a directory of your choice (using WinRAR, for example), which we will refer to as $DATA.
Running HADiT on this data means running the ADT on it. The command for doing this is:
java -cp . Hadit -allmulti ascnprefix=$DATA\ascn.chr. ascnsuffix=.txt outputdir=$DATA\Results\ chromrange={1-22} snpmap=$DATA\Simulated.snpMap.txt samplefilter=$DATA\Simulated.uniqueSamples.txt cancermap=$DATA\Simulated.uniqueSamples.txt tasklist=$DATA\TaskList.AmplificationDistortion.txt
The output will reside in the $DATA\Results\ directory (make sure you create it before running HADiT).
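The ascnprefix, ascnsuffix, and chromrange arguments in the command above suggest one input file per chromosome. The following is a minimal sketch, not part of HADiT, that lists the file names one might expect to find in $DATA before running. It assumes, without confirmation from the HADiT source, that each name is the prefix, the chromosome number, and the suffix concatenated. The class name AscnInputs and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HADiT.
class AscnInputs {
    /** Expands a range such as "{1-22}" into the integers 1..22. */
    public static List<Integer> expandRange(String chromRange) {
        String body = chromRange.replace("{", "").replace("}", "");
        String[] ends = body.split("-");
        int first = Integer.parseInt(ends[0].trim());
        int last = Integer.parseInt(ends[ends.length - 1].trim());
        List<Integer> chromosomes = new ArrayList<Integer>();
        for (int c = first; c <= last; c++) {
            chromosomes.add(c);
        }
        return chromosomes;
    }

    /**
     * Builds one file name per chromosome.
     * ASSUMPTION: name = prefix + chromosome number + suffix.
     */
    public static List<String> expectedFiles(String prefix, String suffix, String chromRange) {
        List<String> names = new ArrayList<String>();
        for (int c : expandRange(chromRange)) {
            names.add(prefix + c + suffix);
        }
        return names;
    }

    public static void main(String[] args) {
        // Values copied from the command above; prints one candidate name per line.
        for (String name : expectedFiles("ascn.chr.", ".txt", "{1-22}")) {
            System.out.println(name);
        }
    }
}
```

If the printed names do not match the files in the unpacked archive, the naming assumption is wrong and the archive contents should be taken as authoritative.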
The most relevant output files will end in the .CountsSplit.txt extension, one file per chromosome. These files contain amplification distortion LOD scores for each allele or haplotype starting at each SNP. The columns are:
- Sliding Window Number
- Chromosome
- Position Start
- Position End
- rsID (without the “rs” prefix)
- Number of amplified alleles or haplotypes within that window
- The allele or haplotype
- Number of amplified instances of that allele or haplotype. If we are examining a single SNP (sliding window size of 1), this indicates the number of amplified instances of that allele within amplified heterozygous calls only.
- Number of non-amplified instances of that allele or haplotype. If we are examining a single SNP (sliding window size of 1), this indicates the number of non-amplified instances of that allele within amplified heterozygous calls only.
- The p-value of the binomial test for testing the number of amplified instances of the allele or haplotype
- The p-value of the binomial test for testing the number of non-amplified instances of the allele or haplotype
- A boolean indicator variable depicting whether column 10 is nominally significant (p ≤ 0.05) or not.
- The LOD score, which is -log10(column 10)
Columns after these contain information that can be ignored.
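The relationship between column 10, the significance indicator, and the LOD score can be sketched as follows. This is an illustration only, not HADiT code: the class name CountsSplitRow is hypothetical, the tab delimiter is an assumption about the file format, and the sample row is made up.

```java
// Hypothetical helper, not part of HADiT.
class CountsSplitRow {
    // 1-based column number taken from the column list above.
    static final int COL_P_AMPLIFIED = 10;

    /** LOD score as defined above: -log10 of the column-10 p-value. */
    public static double lod(double pValue) {
        return -Math.log10(pValue);
    }

    /** Nominal significance as defined above: p <= 0.05. */
    public static boolean nominallySignificant(double pValue) {
        return pValue <= 0.05;
    }

    /**
     * Reads the column-10 p-value from one row.
     * ASSUMPTION: fields are tab-separated.
     */
    public static double pValueOf(String line) {
        String[] fields = line.split("\t");
        return Double.parseDouble(fields[COL_P_AMPLIFIED - 1]);
    }

    public static void main(String[] args) {
        // Made-up row for illustration only, not real HADiT output.
        String row = "1\t1\t1000\t1000\t12345\t20\tA\t15\t5\t0.01\t0.99\ttrue\t2.0";
        double p = pValueOf(row);
        System.out.println("LOD = " + lod(p) + ", nominally significant = " + nominallySignificant(p));
    }
}
```

Recomputing the LOD score from column 10 in this way can serve as a quick consistency check on the last documented column of a row.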
Thus, ADT returns LOD scores for every allele or haplotype. However, only a fraction of LOD scores are significant genome-wide. To calculate the genome-wide significance threshold, run the following command:
java -cp . Hadit -allmulti ascnprefix=$DATA\ascn.chr. ascnsuffix=.txt outputdir=$DATA\Results\Perm\ chromrange={1-22} snpmap=$DATA\Simulated.snpMap.txt samplefilter=$DATA\Simulated.uniqueSamples.txt cancermap=$DATA\Simulated.uniqueSamples.txt tasklist=$DATA\TaskList.PermutationTesting.txt
The output will reside in the $DATA\Results\Perm\ directory (make sure you create it before running HADiT).
The relevant files are those starting with the prefix “Top_”.