MutaGeneSys Project Page

MutaGeneSys uses genome-wide genotype data to estimate individual disease susceptibility. Our system integrates three data sources: the International HapMap Project (hapmap.org), whole-genome marker correlation data (description), and the Online Mendelian Inheritance in Man database (OMIM). MutaGeneSys accepts SNP data of individuals as query input and delivers disease susceptibility hypotheses even if the original set of typed SNPs is incomplete. Our system produces predictions specific to population, genotyping technology, and confidence level in interactive time.

There are several ways to use MutaGeneSys:

MutaGeneSys is available for use free of charge. If you use our system for scientific research that results in a publication, please cite our most recent publication (see here).

FAQ for the On-Line Version of MutaGeneSys

Q When will the full version of MutaGeneSys be available as a web service?
A MutaGeneSys is scheduled for release on October 18th, 2007.

Q How do I enter query SNPs?
A (short) rs140504:AA, rs140504:AC, rs140504:A are all valid; rs140504:A? is invalid
A (long) SNPs are specified using reference cluster IDs (rs#). See dbSNP for a complete set of reference cluster IDs. An rs# is followed by ":" and then by one or two typed alleles. rs140504:AA, rs140504:AC, rs140504:A are all valid entries. If only one allele was typed, enter rs140504:A, NOT rs140504:A?.
A space-separated list of SNPs may be entered directly into the text field. If you wish to enter more than 1024 characters, please use the file upload feature instead.
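The entry format described above can be checked before a query is submitted. The following is a minimal sketch, not part of MutaGeneSys itself: the function name `parse_snp_query` is hypothetical, and the restriction of alleles to the letters A, C, G, and T is an assumption that the text above does not state explicitly.

```python
import re

# Assumption: alleles are limited to the four nucleotide letters A, C, G, T.
# An entry is "rs" + digits, then ":", then one or two typed alleles.
_SNP_ENTRY = re.compile(r"rs(\d+):([ACGT]{1,2})")


def parse_snp_query(text):
    """Split a space-separated query into (rs_id, alleles) pairs.

    Raises ValueError on the first entry that does not match the format,
    for example "rs140504:A?".
    """
    parsed = []
    for token in text.split():
        match = _SNP_ENTRY.fullmatch(token)
        if match is None:
            raise ValueError(f"invalid SNP entry: {token!r}")
        parsed.append(("rs" + match.group(1), match.group(2)))
    return parsed
```

For example, `parse_snp_query("rs140504:AA rs140504:A")` yields two pairs, while a query containing `rs140504:A?` raises `ValueError`.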

Q How do I use the file upload feature?
A Click the "Browse..." button, locate the SNP file, and click "Open". Entries in the file must be space-separated. You may break the file into multiple lines, or enter all SNPs on a single line. A sample SNP file is available here.
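Because entries may sit on one line or be spread over several, a file can be read by splitting on any whitespace. The sketch below illustrates this under stated assumptions: `read_snp_file` is a hypothetical helper, and UTF-8 encoding is assumed since the page does not specify one.

```python
from pathlib import Path


def read_snp_file(path):
    """Return the whitespace-separated entries of an SNP file, in order."""
    # str.split() with no arguments splits on any run of spaces, tabs, or
    # newlines, so single-line and multi-line files yield the same list.
    return Path(path).read_text(encoding="utf-8").split()
```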

Q How does MutaGeneSys use population selection?
A If you wish to restrict diagnostic hypotheses to a specific population, please select your group from the population pull-down. We use standard HapMap definitions of populations. EU stands for Utah residents with ancestry from northern and western Europe. JPT+CHB stands for Japanese in Tokyo, Japan and Han Chinese in Beijing, China. YRI stands for Yoruba in Ibadan, Nigeria.

Q What is the coefficient of determination?
A MutaGeneSys uses a whole-genome marker correlation dataset to identify whether any of the query SNPs are linked with causal SNPs reported by OMIM (i.e., whether the query set contains any proxies for causal SNPs). Each correlation between a proxy and a causal SNP is quantified by a coefficient of determination (the squared Pearson correlation coefficient), a number between 0 and 1. The closer this coefficient is to 1, the stronger the correlation.
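To make the definition concrete, the sketch below computes a squared Pearson correlation coefficient for two equal-length numeric sequences. It is an illustration only, not the procedure used to build the marker correlation dataset: the function name and the encoding of each marker as a list of 0/1 allele indicators are assumptions.

```python
def coefficient_of_determination(xs, ys):
    """Return the squared Pearson correlation of two equal-length sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("sequences must be non-empty and of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    # r^2 = cov^2 / (var_x * var_y), always between 0 and 1.
    return (cov * cov) / (var_x * var_y)
```

Two markers whose indicators always agree (or always disagree) give a value of 1, while markers that vary independently of each other give a value near 0.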

Q How does MutaGeneSys use the genotyping technology pull-down?
A MutaGeneSys uses marker association data based on genome-wide SNP arrays. We work with SNP array data from two product lines: Affymetrix GeneChip and Illumina HumanHap. If one specific technology, say Affymetrix GeneChip, is selected from the technology pull-down, MutaGeneSys will only use Affymetrix data to estimate disease susceptibility.

Q How do I interpret the output?
A MutaGeneSys processes a query and displays results on the Query Results page. Output is presented in the following columns:

Q How much data is there in MutaGeneSys?
A MutaGeneSys currently contains 906 single-marker associations and 393 two-marker associations. These are specific to population, genotyping technology, and resolution. Single-marker associations also include the trivial associations of an OMIM SNP with itself. The complete MutaGeneSys dataset can be downloaded from this site.

Q Where can I find more information about MutaGeneSys?
A Please refer to our Publications section. If your questions are not answered after you read the technical report, contact us.

Installing MutaGeneSys

To run MutaGeneSys on your own machine, please follow the instructions below. In this distribution, you will find the following files:

Please address your questions regarding this software to Julia Stoyanovich.

Download the MutaGeneSys Dataset

You may download the latest version of MutaGeneSys findings in comma-separated format (CSV). These findings are based on the B36 release of HapMap, and the SNP locus field refers to coordinates in that release. The first line of each file documents the fields.

Single_Marker_B36.csv
Two_Marker_B36.csv
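Since the first line of each file documents the fields, the files can be loaded without hard-coding column names. The following is a minimal sketch using Python's standard `csv` module; the helper name `load_associations` is hypothetical, and UTF-8 encoding is an assumption.

```python
import csv


def load_associations(path):
    """Read a CSV file whose first line names the fields.

    Returns (fieldnames, rows), where each row is a dict keyed by the
    field names taken from the header line.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return reader.fieldnames, rows
```

A call such as `fields, rows = load_associations("Single_Marker_B36.csv")` then exposes the documented field names in `fields`, so they can be inspected before any column is referenced by name.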

Publications

Credits and Fine Print

This material is based in part upon work supported by the National Institutes of Health grant 1U54CA121852-01A1. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Institutes of Health.

This material is based in part upon work supported by the National Science Foundation under Grant IIS-0121239. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.