PROJECT RESOURCES



This page will incorporate project ideas incrementally. Please consult the Older Project Resources Guide for additional project ideas and references.

General Bioinformatics References:

The following links provide access to general bioinformatics journals, conferences, datasets and research organizations.

PROJECT AREAS

Comparative Surveys


Consider competing techniques that address a given bioinformatics problem; establish metrics and scenarios to compare their performance; and pursue a comparative evaluation. Examples of areas addressed by multiple techniques include:

  • Multiple Sequence Alignment
  • Computing phylogeny
  • Extracting regulatory network models from microarray expression data
  • Discovering sequence motifs
  • Computing protein structural features using sequence homology
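
As one illustration of establishing a metric, the sketch below scores a multiple sequence alignment with a simple sum-of-pairs objective, so that alignments produced by competing tools could be compared on the same footing. The function name and the match/mismatch/gap values are assumptions chosen for illustration; a real evaluation would more likely use a substitution matrix or a curated reference alignment.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an alignment given as equal-length gapped rows.

    The scoring values are illustrative assumptions, not a standard.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    score = 0
    for column in zip(*alignment):
        # Score every unordered pair of symbols in this column.
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue  # gap-gap pairs contribute nothing in this sketch
            if x == "-" or y == "-":
                score += gap
            elif x == y:
                score += match
            else:
                score += mismatch
    return score


print(sum_of_pairs(["AC-", "ACG", "A-G"]))
```

Scoring the outputs of two tools on the same input with the same function gives a first, crude basis for comparison; running time and memory use would be measured separately.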

    New Sequence Analysis Techniques


    Several areas of sequence analysis may admit new techniques.

    Multiple Sequence Alignment (MSA)

    Recent work (e.g., MUSCLE and ProbCons; see the references below) has demonstrated significant improvements in MSA. Several of the ideas underlying these improvements may admit interesting extensions.

  • Seeding the MSA using a FASTA/BLAST-like strategy
  • Filtering low-scoring alignments to accelerate MSA computations
  • Using machine-learning training to improve the MSA guide tree and/or adapt the scoring to best reflect evolutionary distances
  • Developing new metrics for biological fidelity of an MSA; such metrics may possibly reduce the computational complexity of MSA
  • Using simulated annealing (or other global optimization) techniques to improve MSA

    A term project may exploit these or other principles to design new algorithms to improve MSA quality/speed.
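
    The seeding idea listed above can be sketched for a single pair of sequences as follows. This is a minimal illustration of k-mer seeding in the spirit of FASTA/BLAST, not the method of either tool; the function names and the default k are assumptions.

```python
from collections import defaultdict


def kmer_index(seq, k):
    """Map each k-mer of seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def seed_matches(a, b, k=3):
    """Return sorted (i, j) pairs such that a[i:i+k] == b[j:j+k]."""
    index = kmer_index(b, k)
    return sorted(
        (i, j)
        for i in range(len(a) - k + 1)
        for j in index.get(a[i:i + k], [])
    )


def best_diagonal(a, b, k=3):
    """Return (offset i - j, seed count) for the diagonal with most seeds.

    Ties are broken in favour of the diagonal closest to the main one.
    Returns None when the two sequences share no k-mer.
    """
    counts = defaultdict(int)
    for i, j in seed_matches(a, b, k):
        counts[i - j] += 1
    if not counts:
        return None
    return max(counts.items(), key=lambda item: (item[1], -abs(item[0])))


print(seed_matches("ACGT", "TACG"))
print(best_diagonal("ACGTACGT", "ACGTACGT", k=4))
```

    A seeded aligner would restrict the dynamic-programming computation to a band around the best diagonals instead of filling the full matrix, which is where the speed-up would come from.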

    References:

  • Do, C., Mahabhashyam, M., Brudno, M., and Batzoglou, S. 2005. ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Research. 15: 330-340.
  • Edgar, R. 2004. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 5: 113.
  • Wallace, I., Blackshields, G., and Higgins, D. 2005. Multiple sequence alignments. Current Opinion in Structural Biology. 15: 261-266.
  • Edgar, R., and Batzoglou, S. 2006. Multiple sequence alignment. Current Opinion in Structural Biology. 16: 368-373.
  • Sze, S.-H., Lu, Y., and Yang, Q. 2006. A polynomial time solvable formulation of multiple sequence alignment. Journal of Computational Biology. 13: 309-319.

    Spectral Sequence Analysis Techniques

    Spectral analysis, most notably Fourier analysis, has played an extensive role in analyzing a broad range of natural phenomena. Initial applications of spectral techniques to sequence analysis have shown preliminary success (see references below). Your project may investigate new or improved applications of spectral techniques.

    One may view a DNA sequence (and, analogously, a protein sequence) as a composition of four binary waves, one for each nucleotide. For example, the sequence ACGAATTACGAA may be uniquely represented by the indicator sequences for the occurrences of A, C, T and G: XA=100110010011, XC=010000001000, XT=000001100000 and XG=001000000100. These indicator sequences may, in turn, be represented by their respective discrete Fourier transforms and analyzed spectrally. Such spectral analysis of protein-coding regions (exons) can often disclose a period-3 sequence structure, corresponding to codon periodicity. Unfortunately, evolutionary changes act non-linearly on these DNA indicator sequences and thus do not conserve periodicity features, limiting the power of spectral analysis.
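
    The indicator-sequence representation can be sketched in a few lines. The code below rebuilds the four indicator sequences for the example above and evaluates the combined power of their discrete Fourier transforms at one frequency index. The function names are assumptions, and the direct summation is written for clarity rather than speed; an FFT routine would be used for long sequences.

```python
import cmath


def indicator_sequences(seq, alphabet="ACGT"):
    """Return {symbol: [0 or 1, ...]} marking where each symbol occurs."""
    return {s: [1 if ch == s else 0 for ch in seq] for s in alphabet}


def dft_power(x, k):
    """Squared magnitude of the k-th DFT coefficient of the sequence x."""
    n = len(x)
    coeff = sum(
        v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(x)
    )
    return abs(coeff) ** 2


def total_power(seq, k):
    """Sum of the four indicator-sequence power spectra at index k."""
    return sum(dft_power(x, k) for x in indicator_sequences(seq).values())


seq = "ACGAATTACGAA"
for symbol, bits in indicator_sequences(seq).items():
    print("X" + symbol, "".join(map(str, bits)))

# A period-3 component corresponds to frequency index k = N/3,
# so N is assumed to be divisible by 3 here.
print(total_power(seq, len(seq) // 3))
```

    Sliding such a computation along a genome in fixed-size windows and plotting the power at k = N/3 is one simple way to look for candidate coding regions.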

    Protein structure, in contrast with DNA sequences, exhibits a high degree of periodicity which is, furthermore, highly conserved through evolutionary changes. Protein folds, such as alpha helices or beta sheets, typically exhibit long-range periodicity which is retained through evolution. Thus, a spectral representation of folds, with evolutionary operators represented in terms of respective conformation changes, may admit valuable analyses of protein features.

    A term project may explore some such initial applications of spectral techniques to protein structure analysis. For example, you may investigate the spectral structure of transmembrane proteins; many of these, such as G protein-coupled receptors, have 7 highly organized helices. The statistical relationships between these conformations and the underlying protein sequences admit accurate prediction of transmembrane domains by Hidden Markov Models (HMM). This suggests that spectral analysis of the underlying protein sequences and respective folds may potentially be used for accurate discovery of transmembrane proteins and, perhaps, for computing their conformations.
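
    One possible starting point, offered only as a sketch, is to map each residue to a numeric hydropathy value and measure the spectral power of the resulting signal at the alpha-helix period of about 3.6 residues. The choice of the Kyte-Doolittle hydropathy scale as the encoding, the function names, and the toy peptide are all assumptions made for illustration; other encodings may suit transmembrane proteins better.

```python
import cmath

# Kyte-Doolittle hydropathy values (one possible numeric encoding).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy(seq):
    """Convert a protein sequence into a list of hydropathy values."""
    return [KD[residue] for residue in seq]


def power_at_period(values, period):
    """Power of the mean-removed signal at a given period (in residues).

    The period need not divide the signal length, so the Fourier sum is
    evaluated directly at frequency 1/period.
    """
    n = len(values)
    mean = sum(values) / n
    coeff = sum(
        (v - mean) * cmath.exp(-2j * cmath.pi * t / period)
        for t, v in enumerate(values)
    )
    return abs(coeff) ** 2 / n


# Toy peptide, invented for this example only.
signal = hydropathy("LKKLLKKLLKKLLKKL")
print(power_at_period(signal, 3.6))  # alpha-helix period
print(power_at_period(signal, 2.0))  # beta-strand period
```

    Computing this quantity in a sliding window along a sequence yields a profile that could be compared against annotated transmembrane segments or against HMM-based predictions.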

    References:

  • R. F. Voss, "Evolution of long-range fractal correlations and 1/f noise in DNA base sequences," Physical Review Letters, vol. 68, no. 25, pp. 3805-3808, June 1992.
  • Anastassiou, D. 2000. Frequency-domain analysis of biomolecular sequences. Bioinformatics 16: 1073-1082.
  • Kotlar D. and Lavner Y., Gene prediction by spectral rotation measure: a new method for identifying protein coding regions, Genome Res. 2003 13: 1930-1937.
  • P. P. Vaidyanathan and B-J. Yoon, The role of signal-processing concepts in genomics and proteomics, 2004; www.systems.caltech.edu/dsp/students/bjyoon/journal/franklin_dsp_genomics.pdf
  • D. Sussillo, A. Kundaje and D. Anastassiou, Spectrogram analysis of genomics, EURASIP Journal on Applied Signal Processing 2004 (2004) 29-42.
  • J. Gao, Y. Qi, Y. Cao, and W. W. Tung, Protein Coding Sequence Identification by Simultaneously Characterizing the Periodic and Random Features of DNA Sequences, J Biomed Biotechnology 2005(2).



    Additional Sections To Be Added


    For now, please consult the Older Project Resources Guide.