Usage

From the command line, extract DASH with tar xzvf dash-X-X-X.tar.gz, enter the extracted directory, and compile with make all. A simple test case using inputs from the test subdirectory can be run with make test.

DASH uses a modified version of the Boost Graph Library subgraph.hpp header, with all of the necessary files provided in this distribution. If you run into Boost-related issues when compiling, make sure that a system-installed copy of Boost is not superseding the one bundled here.

Input

DASH reads IBD segments from standard input, one segment per line, with whitespace-delimited columns in the following order:

* Family ID 1
* Individual ID 1
* Family ID 2
* Individual ID 2
* Segment start (bp)
* Segment end (bp)
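
Before piping a file into DASH it can be useful to check that every line has the six columns listed above. The following is a minimal Python sketch of such a check. The names parse_segment and Segment are illustrative and not part of DASH, and rejecting segments whose end precedes their start is an assumption of this sketch, not documented DASH behaviour.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """One IBD segment, in the column order DASH expects on stdin."""
    fid1: str
    iid1: str
    fid2: str
    iid2: str
    start: int  # segment start (bp)
    end: int    # segment end (bp)


def parse_segment(line: str) -> Segment:
    """Parse one whitespace-delimited segment line into a Segment."""
    fields = line.split()  # splits on any run of spaces or tabs
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
    fid1, iid1, fid2, iid2, start, end = fields
    seg = Segment(fid1, iid1, fid2, iid2, int(start), int(end))
    # Assumption of this sketch: a segment must not end before it starts.
    if seg.end < seg.start:
        raise ValueError(f"segment end precedes start: {line!r}")
    return seg
```

Calling parse_segment on each line of a match file and stopping at the first ValueError is usually enough to catch a wrong column selection before running DASH.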

Execution

DASH makes several assumptions about the structure of the shared segments. First, all segments are expected to be on the same chromosome; we recommend splitting genomic data by chromosome, which also makes the analysis easy to parallelize. More importantly, DASH assumes that each individual in the pair represents a haploid sample. While DASH allows for some degree of error and attempts to exclude individuals from a haplotype to which they are loosely connected, when a single input individual shares both of its haplotypes with many other samples, DASH will place that individual into the single most likely haplotype cluster rather than both.

In practice, we have addressed this by using the GERMLINE algorithm in -haploid mode to detect IBD segments, where each individual is split into its two respective haplotypes. See the example below.

We highly recommend providing DASH with a PLINK-formatted list of all samples through the -fam parameter. With this flag, DASH will also encode each haplotype as a genetic marker (where carriers have the minor allele and non-carriers have the major allele) and output a ped file that can be used directly for association with any software that takes PLINK-format input. The user can generate corresponding map files from the .hcl output as they see fit. If the fam file is provided, the input IDs in the matches are presumed to have a .0/.1 suffix specifying which of the individual's two haplotypes the segment involves.
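
To illustrate the .0/.1 suffix convention, here is a small Python sketch that splits a haploid ID back into the sample ID and the haplotype index. The function name split_haplotype_id is hypothetical, and the sketch assumes the suffix is attached to the individual ID and is always the text after the last dot.

```python
from typing import Tuple


def split_haplotype_id(hap_id: str) -> Tuple[str, int]:
    """Split 'SAMPLE.0' or 'SAMPLE.1' into (sample ID, haplotype index)."""
    # rpartition splits on the LAST dot, so sample IDs may contain dots.
    sample, sep, suffix = hap_id.rpartition(".")
    if not sep or not sample or suffix not in ("0", "1"):
        raise ValueError(f"ID lacks a .0/.1 haplotype suffix: {hap_id!r}")
    return sample, int(suffix)
```

A check like this on the individual ID columns can confirm that a match file was produced in haploid mode before it is combined with a fam file.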

The algorithm is run by calling:
cat [ ibd input ] | ./dash [ optional flags ] [ output prefix ]

which will generate a set of hcl, log, and optionally ped files with the output prefix.

Output

As it runs, DASH generates a .hcl (Haplotype CLuster) file in which each line represents one cluster/haplotype, with the following tab-separated fields:

* Cluster identifier
* Cluster start position
* Cluster end position
* Family and Sample ID for cluster carriers ...
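
For downstream scripting, a line of the .hcl file can be read into a small record. The Python sketch below is one way to do it. The names Cluster and parse_hcl_line are illustrative, and the sketch assumes that everything after the first three fields is a sequence of family ID / sample ID pairs and that IDs contain no whitespace.

```python
from typing import List, NamedTuple, Tuple


class Cluster(NamedTuple):
    cluster_id: str
    start: int                       # cluster start position
    end: int                         # cluster end position
    carriers: List[Tuple[str, str]]  # (family ID, sample ID) pairs


def parse_hcl_line(line: str) -> Cluster:
    """Parse one .hcl line into a Cluster record."""
    # Splitting on any whitespace handles IDs separated by tabs or spaces.
    fields = line.split()
    if len(fields) < 3 or (len(fields) - 3) % 2 != 0:
        raise ValueError(f"unexpected number of fields: {line!r}")
    cluster_id, start, end = fields[:3]
    ids = fields[3:]
    # Pair up alternating family and sample IDs.
    carriers = list(zip(ids[0::2], ids[1::2]))
    return Cluster(cluster_id, int(start), int(end), carriers)
```

The length of the carriers list gives the cluster size, which can be compared against the minimum set with -min.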

Example

A typical analysis, first generating IBD segments with our GERMLINE algorithm, would be the following:
germline -haploid
cut -f 1,2,4,10,11 germline.match | dash -fam my_samples.fam my_clusters
cut -f 1-3 my_clusters.hcl | awk '{ print 0,$1,0,int(($2+$3)/2) }' > my_clusters.map
plink --ped my_clusters.ped --map my_clusters.map --pheno my_trait --assoc
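
The awk step above writes one PLINK map line per cluster: chromosome code 0, the cluster identifier as the marker name, genetic distance 0, and the midpoint of the cluster as the base-pair position. A minimal Python sketch of the same computation, with the hypothetical helper name hcl_to_map_line, is shown here for readers who prefer to do this step in a script.

```python
def hcl_to_map_line(hcl_line: str) -> str:
    """Turn one .hcl line into a PLINK map line placed at the cluster midpoint."""
    # Only the first three fields (identifier, start, end) are needed.
    cluster_id, start, end = hcl_line.split()[:3]
    # Integer division matches awk's int() for non-negative positions.
    midpoint = (int(start) + int(end)) // 2
    # Columns: chromosome, marker name, genetic distance, bp position.
    return f"0 {cluster_id} 0 {midpoint}"
```

Since all input segments are expected to be on one chromosome, the leading 0 could be replaced with the real chromosome code if the association software requires it.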

Options

The program has several command line options to direct the clustering process:

Flag      Default  Description
-help     -        Print this list of commands.
-fam      -        PLINK format .fam file listing sample IDs. Used to generate ped/map files (see above).
-win      500000   Sliding window size.
-density  0.6      Minimum cluster density.
-r2       0.95     Maximum r^2 for which two haplotypes are considered different and printed; set to 1 to print all.
-min      4        Minimum haplotype/cluster size.
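
When DASH is launched from a script, the options can be assembled programmatically. The Python sketch below builds an argument list using the flags and defaults listed above. The function name dash_command is hypothetical, and the sketch assumes that every flag other than -help takes its value as the following argument, as -fam does in the example.

```python
from typing import List, Optional


def dash_command(prefix: str,
                 fam: Optional[str] = None,
                 win: int = 500000,
                 density: float = 0.6,
                 r2: float = 0.95,
                 min_size: int = 4,
                 dash_path: str = "./dash") -> List[str]:
    """Build the DASH argument list: flags first, output prefix last."""
    cmd = [dash_path]
    if fam is not None:
        cmd += ["-fam", fam]  # only passed when a fam file is supplied
    cmd += ["-win", str(win),
            "-density", str(density),
            "-r2", str(r2),
            "-min", str(min_size)]
    cmd.append(prefix)
    return cmd
```

The resulting list can be passed to subprocess.run with the IBD segment file opened as stdin, which mirrors piping the segments into ./dash on the command line.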