This work addresses two major shortcomings of the standard approach to protein modeling, the linear hidden Markov model (HMM). First, linear HMMs typically contain several thousand parameters and therefore require large training sets, on the order of 200 protein sequences. Second, these HMMs imply a relatively simple model of molecular evolution. The Meta-MEME software toolkit builds motif-based HMMs that focus on the biologically important motif regions, which are highly conserved throughout the protein family due to functional or structural constraints. Meta-MEME models are smaller than standard HMMs, allowing for smaller training sets and faster database searching. Furthermore, Meta-MEME employs a non-linear topology that allows for the representation of large-scale evolutionary events, such as the deletion, copying and shuffling of protein domains.
The models produced by Meta-MEME provide biologists with insight into the general characteristics of a given family of related proteins. The models may also be used to produce multiple alignments and to search for remote homologs. For smaller training sets, Meta-MEME detects homologs more accurately than standard HMMs.