GermLineUsage

Usage

From the command line, extract GERMLINE with tar xzvf germline-X-X-X.zip, enter the extracted directory, and compile with make all. A simple test case using shortened HapMap samples can be run with make test. The executable is run as germline <options>; it prompts the user for input/output file information and then runs the algorithm.

Input
GERMLINE accepts as input the following formats:

  • PLINK / ped+map
  • PHASE / HapMap

NOTE: Although the PLINK format is not intended for haplotypes, GERMLINE expects the two alleles at each marker to appear in a consistent order; i.e. the first allele always corresponds to one haplotype and the second allele to the other. PLINK arbitrarily re-orders alleles when processing files, so we do not recommend handling phased data with PLINK prior to GERMLINE analysis because the haplotypes may not remain intact. To target specific regions, use the -from_snp and -to_snp flags instead.
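The allele-order convention in the note above can be illustrated with a minimal Python sketch. It assumes the standard PLINK ped layout (six leading columns, then one whitespace-separated allele pair per marker); the function name split_ped_haplotypes and the sample line are hypothetical and are not part of GERMLINE.

```python
def split_ped_haplotypes(line):
    """Split one ped line into its metadata and two haplotypes.

    Assumes the standard ped layout: family ID, individual ID,
    paternal ID, maternal ID, sex, phenotype, then two alleles per marker.
    """
    fields = line.split()
    meta, alleles = fields[:6], fields[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("expected an even number of allele columns")
    # First allele of every pair forms one haplotype, second allele the other.
    hap_a = alleles[0::2]
    hap_b = alleles[1::2]
    return meta, hap_a, hap_b


# Hypothetical three-marker sample line, for illustration only.
meta, hap_a, hap_b = split_ped_haplotypes("FAM1 IND1 0 0 1 -9 A G C C T A")
print(hap_a)  # ['A', 'C', 'T']
print(hap_b)  # ['G', 'C', 'A']
```

If a tool swaps the two alleles at even one marker, hap_a and hap_b no longer correspond to the phased haplotypes, which is why re-ordering breaks the input.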

Output

Upon completion, GERMLINE generates a .match file and a .log file in the specified location. Each line in the .match file corresponds to one pairwise shared segment and contains the following fields, in order:

  • Family ID 1
  • Individual ID 1
  • Family ID 2
  • Individual ID 2
  • Chromosome
  • Segment start (bp)
  • Segment end (bp)
  • Segment start (SNP)
  • Segment end (SNP)
  • Total SNPs in segment
  • Genetic length of segment
  • Units for genetic length (cM or MB)
  • Mismatching SNPs in segment
  • 1 if Individual 1 is homozygous in match; 0 otherwise
  • 1 if Individual 2 is homozygous in match; 0 otherwise
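The field list above maps naturally onto a small record type. The following Python sketch assumes that the 15 fields are separated by whitespace and that IDs contain no spaces; the names Match and parse_match_line are hypothetical, and the sample line is invented for illustration rather than taken from real GERMLINE output.

```python
from typing import NamedTuple


class Match(NamedTuple):
    fam1: str
    ind1: str
    fam2: str
    ind2: str
    chrom: str
    start_bp: int
    end_bp: int
    start_snp: str
    end_snp: str
    n_snps: int
    length: float      # genetic length of the segment
    units: str         # "cM" or "MB"
    n_mismatch: int
    ind1_homoz: bool
    ind2_homoz: bool


def parse_match_line(line: str) -> Match:
    """Parse one line of a .match file (assumed whitespace-delimited)."""
    f = line.split()
    if len(f) != 15:
        raise ValueError(f"expected 15 fields, got {len(f)}")
    return Match(
        f[0], f[1], f[2], f[3], f[4],
        int(f[5]), int(f[6]),
        f[7], f[8],
        int(f[9]), float(f[10]), f[11], int(f[12]),
        f[13] == "1", f[14] == "1",
    )


# Invented sample line, for illustration only.
m = parse_match_line("F1 I1 F2 I2 1 1000 2000 rs1 rs2 50 3.5 cM 0 0 1")
print(m.n_snps, m.length, m.units)  # 50 3.5 cM
```

Raising an error on an unexpected field count makes it easier to notice if the delimiter assumption does not hold for a given file.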

Binary Output

To save space, GERMLINE can also generate binary output using the -bin_out flag. This flag will generate three files:

  • *.bsid   Two columns per line for each sample: FAM ID,SAMPLE ID.
  • *.bmid   Four columns per line for each marker: CHROMOSOME,RSID,GENETIC DISTANCE,PHYSICAL DISTANCE.
  • *.bmatch Binary match file containing integer pointers to samples (from bsid file), markers (from bmid file) and boolean meta-data.

The binary files can be converted back to the standard flat format described above by using the parse_bmatch utility provided with the code. Load the three generated files using parse_bmatch [BMATCH FILE] [BSID FILE] [BMID FILE] and the flat match output will be printed to standard out. See the parse_bmatch.cpp code for binary format details.

Options

The program has several command line options to direct the segmental sharing process:

Flag         Default  Description
-map         -        File location for genetic distance map. Uses the PLINK map format.
-min_m       3        Minimum length for match to be used for imputation (in cM or MB).
-err_hom     2        The maximum number of mismatching homozygous markers for a slice to still be considered part of a match.
-err_het     0        The maximum number of mismatching heterozygous markers for a slice to still be considered part of a match.
-from_snp    -        Indicate the ID of the first SNP to start processing from.
-to_snp      -        Indicate the ID of the last SNP to end processing with.
-h_extend    -        Extends from exact seeds using haplotypes rather than genotypes; useful when data is well-phased (e.g. trios).
-homoz       -        Allow self matches (test for homozygosity).
-homoz-only  -        Analyze and report only auto/homo-zygous segments; no IBD is reported, but the analysis is significantly faster.
-haploid     -        Treat each input individual as two distinct and separate haplotypes. Output IDs will have a .0/.1 suffix corresponding to each haplotype. The -err_het flag will have no effect in this analysis.
-bin_out     -        Generate output matches in binary format; creates *.bmatch, *.bsid and *.bmid files. These files can be converted to flat output using the parse_bmatch utility included and compiled in the package.
-bits        128      Size of each slice (in markers) used for exact matching seeds.
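When GERMLINE is driven from a script, the options above can be assembled into an argument list. The Python sketch below only builds the list and does not run anything. It assumes that flags described with a value (-map, -min_m, -err_hom, -err_het, -from_snp, -to_snp, -bits) take a single argument after the flag and that the remaining flags are bare switches; that split is inferred from the descriptions, and the helper build_argv is hypothetical.

```python
# Flags assumed to take one value after the flag name.
VALUED = {"-map", "-min_m", "-err_hom", "-err_het",
          "-from_snp", "-to_snp", "-bits"}
# Flags assumed to be bare on/off switches.
SWITCHES = {"-h_extend", "-homoz", "-homoz-only", "-haploid", "-bin_out"}


def build_argv(options, executable="germline"):
    """Turn a {flag: value} mapping into an argument list."""
    argv = [executable]
    for flag, value in options.items():
        if flag in SWITCHES:
            if value:  # include the switch only when enabled
                argv.append(flag)
        elif flag in VALUED:
            argv += [flag, str(value)]
        else:
            raise ValueError(f"unknown flag: {flag}")
    return argv


print(build_argv({"-min_m": 5, "-bits": 64, "-bin_out": True}))
# ['germline', '-min_m', '5', '-bits', '64', '-bin_out']
```

Rejecting unknown flags up front catches typos such as -homoz_only before the program is launched.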