HADiT: Haplotype Amplification Distortion in Tumors

This page provides links to HADiT, a Java program that implements the Amplification Distortion Test (ADT).  The following sections guide you through downloading, building, and running HADiT.

The program was developed by Itsik Pe'er's Lab of Computational Genetics at Columbia University. It is written for Java 1.5 and has been tested on both Windows and Linux.  The source code is distributed here as a jar package under the GPL license.

  Download: SourceForge HADiT Project Page

Dependencies

HADiT depends on the following publicly available libraries.  Please download the indicated versions in order to compile HADiT.

  1. Colt Math Library (version 1.2.0): http://acs.lbl.gov/~hoschek/colt/
  2. Commons Math Library (version 1.1): http://commons.apache.org/math/
  3. JFreeChart (version 1.0.6): http://www.jfree.org/jfreechart/


Installation and Usage

HADiT requires the Java Development Kit (JDK) 1.5 or higher to compile.  Assuming that the source code is in $PROJECT_DIR/src, the libraries are in $PROJECT_DIR/lib, and your current directory is $PROJECT_DIR/src, the following command compiles the source code against the libraries on UNIX systems:

javac *.java -d ../bin/ -classpath .:../lib/commons-math-1.1/commons-math-1.1.jar:../lib/jfreechart-1.0.6/lib/jcommon-1.0.10.jar:../lib/jfreechart-1.0.6/lib/jfreechart-1.0.6.jar:../lib/jfreechart-1.0.6/lib/jfreechart-1.0.6-swt.jar:../lib/colt/lib/colt.jar:../lib/colt/lib/concurrent.jar

The class files will be placed in the $PROJECT_DIR/bin directory (make sure you create it before compiling).

Running HADiT

You will first need to download several sample data files.  These files contain simulated rather than real data: the SNP and CNA information, the sample information, and the nucleotide map at each SNP marker.  The files can be downloaded as a rar file.  Extract the files (using WinRAR, for example) into a directory of your choice, referred to below as $DATA.

Running HADiT on this data runs the ADT on it.  The command is:

java -cp . Haditallmulti ascnprefix=$DATA\ascn.chr. ascnsuffix=.txt outputdir=$DATA\Results\ chromrange={1-22} snpmap=$DATA\Simulated.snpMap.txt samplefilter=$DATA\Simulated.uniqueSamples.txt cancermap=$DATA\Simulated.uniqueSamples.txt tasklist=$DATA\TaskList.AmplificationDistortion.txt

The output will reside in the $DATA\Results\ directory (make sure you create it before running HADiT).
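Because the output directories must exist before HADiT runs, one way to prepare them is a small helper like the sketch below. It is not part of HADiT; the class name PrepareOutputDirs and the method prepare are hypothetical. It creates both output directories used on this page ($DATA\Results\ and $DATA\Results\Perm\) under a data directory passed as the first argument.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of HADiT: creates the output directories
// that the commands on this page expect to exist.
final class PrepareOutputDirs {

    /** Creates dataDir/Results and dataDir/Results/Perm; returns the Perm path. */
    static Path prepare(Path dataDir) {
        Path perm = dataDir.resolve("Results").resolve("Perm");
        try {
            // createDirectories also creates missing parents (here: Results)
            // and does nothing if the directories already exist.
            return Files.createDirectories(perm);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass the $DATA directory as the first argument; defaults to ".".
        Path dataDir = Paths.get(args.length > 0 ? args[0] : ".");
        Path perm = prepare(dataDir);
        System.out.println("Output directories ready: " + perm.getParent() + " and " + perm);
    }
}
```

Note that this sketch uses java.nio.file, so it needs Java 8 or later even though HADiT itself compiles with JDK 1.5.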

The most relevant output files will end in the .CountsSplit.txt extension, one file per chromosome.  These files contain amplification distortion LOD scores for each allele or haplotype starting at each SNP.  The columns are:

  1. Sliding Window Number
  2. Chromosome
  3. Position Start
  4. Position End
  5. rsID (without the “rs” prefix)
  6. Number of amplified alleles or haplotypes within that window
  7. The allele or haplotype
  8. Number of amplified instances of that allele or haplotype.  If we are examining a single SNP (sliding window size of 1), this indicates the number of amplified instances of that allele within amplified heterozygous calls only.
  9. Number of non-amplified instances of that allele or haplotype.  If we are examining a single SNP (sliding window size of 1), this indicates the number of non-amplified instances of that allele within amplified heterozygous calls only.
  10. The p-value of the binomial test on the number of amplified instances of the allele or haplotype
  11. The p-value of the binomial test on the number of non-amplified instances of the allele or haplotype
  12. A boolean indicator of whether column 10 is nominally significant (p ≤ 0.05)
  13. The LOD score, which is –log10(column 10)
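To make the relationship between columns 8 to 10 and column 13 concrete, here is a minimal sketch of a binomial p-value and its LOD score. It assumes a null amplification probability of 0.5 and a one-sided (upper-tail) test; this page states neither, so HADiT's own computation may differ. The class and method names are hypothetical.

```java
// Illustrative sketch only, not HADiT's implementation.
// Assumptions: null probability 0.5, one-sided upper-tail binomial test.
final class AdtBinomialSketch {

    /** log of the binomial coefficient C(n, k), computed as a sum of logs. */
    static double logChoose(int n, int k) {
        double r = 0.0;
        for (int i = 1; i <= k; i++) {
            r += Math.log(n - k + i) - Math.log(i);
        }
        return r;
    }

    /** P(X >= k) for X ~ Binomial(n, 0.5). */
    static double upperTail(int k, int n) {
        double sum = 0.0;
        for (int i = Math.max(k, 0); i <= n; i++) {
            sum += Math.exp(logChoose(n, i) + n * Math.log(0.5));
        }
        return Math.min(1.0, sum); // guard against rounding slightly above 1
    }

    /** LOD score as defined for column 13: -log10(p). */
    static double lod(double p) {
        return -Math.log10(p);
    }

    public static void main(String[] args) {
        // Made-up counts: 9 amplified and 1 non-amplified instance.
        int amplified = 9;
        int nonAmplified = 1;
        double p = upperTail(amplified, amplified + nonAmplified);
        System.out.printf("p = %.6f, LOD = %.3f%n", p, lod(p));
    }
}
```

With these made-up counts the p-value is 11/1024, since only the outcomes with 9 or 10 amplified instances out of 10 are at least as extreme.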

Any columns after column 13 can be ignored.
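The column layout above can be read with a few lines of Java. The sketch below is a hypothetical reader, not part of HADiT: it assumes the fields are tab-separated, that the counts in columns 8 and 9 are integers, and that the line is a data row rather than a header. It also checks that column 13 agrees with -log10 of column 10.

```java
// Hypothetical reader for one line of a .CountsSplit.txt file.
// Assumes tab-separated fields in the documented column order.
final class CountsSplitRow {
    final String chromosome;  // column 2
    final long start;         // column 3
    final long end;           // column 4
    final String rsId;        // column 5 (without the "rs" prefix)
    final String allele;      // column 7
    final int amplified;      // column 8
    final int nonAmplified;   // column 9
    final double pAmplified;  // column 10
    final double lod;         // column 13

    private CountsSplitRow(String[] f) {
        chromosome = f[1].trim();
        start = Long.parseLong(f[2].trim());
        end = Long.parseLong(f[3].trim());
        rsId = f[4].trim();
        allele = f[6].trim();
        amplified = Integer.parseInt(f[7].trim());
        nonAmplified = Integer.parseInt(f[8].trim());
        pAmplified = Double.parseDouble(f[9].trim());
        lod = Double.parseDouble(f[12].trim());
    }

    /** Parses one data line; columns beyond the 13th are ignored. */
    static CountsSplitRow parse(String line) {
        String[] f = line.split("\t");
        if (f.length < 13) {
            throw new IllegalArgumentException("expected at least 13 columns, got " + f.length);
        }
        return new CountsSplitRow(f);
    }

    /** True if the stored LOD matches -log10(column 10) within the tolerance. */
    boolean lodIsConsistent(double tolerance) {
        return Math.abs(lod + Math.log10(pAmplified)) <= tolerance;
    }

    public static void main(String[] args) {
        // Made-up example line, for illustration only.
        String line = "1\t1\t1000\t2000\t12345\t10\tA\t9\t1\t0.01\t0.99\ttrue\t2.0";
        CountsSplitRow row = parse(line);
        System.out.println("rs" + row.rsId + " allele " + row.allele + " LOD " + row.lod);
    }
}
```

If the real files use a different delimiter or include a header row, the split pattern and the parsing of the first line would need to be adjusted.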

Thus, ADT returns LOD scores for every allele or haplotype.  However, only a fraction of LOD scores are significant genome-wide.  To calculate the genome-wide significance threshold, run the following command:

java -cp . Haditallmulti ascnprefix=$DATA\ascn.chr. ascnsuffix=.txt outputdir=$DATA\Results\Perm\ chromrange={1-22} snpmap=$DATA\Simulated.snpMap.txt samplefilter=$DATA\Simulated.uniqueSamples.txt cancermap=$DATA\Simulated.uniqueSamples.txt tasklist=$DATA\TaskList.PermutationTesting.txt

The output will reside in the $DATA\Results\Perm\ directory (make sure you create it before running HADiT).

The relevant files are those starting with the prefix “Top_”.
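To pick out those files after a permutation run, a short directory listing is enough. The following is a sketch with hypothetical names (ListTopFiles, topFiles), not part of HADiT; it relies only on the "Top_" prefix mentioned above and makes no assumption about the files' contents.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of HADiT: lists the "Top_" output files.
final class ListTopFiles {

    /** Returns the sorted names of entries in dir whose name starts with "Top_". */
    static List<String> topFiles(Path dir) {
        List<String> names = new ArrayList<String>();
        // The glob "Top_*" is matched against each entry's file name.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "Top_*")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Pass the permutation output directory as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (String name : topFiles(dir)) {
            System.out.println(name);
        }
    }
}
```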