(a) Hidden Markov models (HMMs) are used to identify genes in genome sequencing projects. Describe how you would build a hidden Markov model to identify genes in a genome sequence.
(b) Give one other application of hidden Markov models.
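(a) Hidden Markov models are probabilistic models in which the observed data (for instance, a DNA sequence) are treated as a series of outputs, each emitted by one of several hidden internal states. The sequence of hidden states itself follows a Markov chain, so the state at each position depends only on the state at the previous position.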
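The phrase "outputs generated by hidden states" can be made concrete with a small calculation: the probability of an observed sequence is a sum over every possible hidden state path, which the forward algorithm computes efficiently. The sketch below assumes a hypothetical two-state model (an A/T-rich and a G/C-rich state) with made-up probabilities; it illustrates the definition only and is not a gene model. The `brute_force` function is included solely to confirm the forward result on short inputs.

```python
from itertools import product

# Hypothetical two-state HMM over DNA; all numbers are illustrative placeholders.
STATES = ("AT_rich", "GC_rich")
START = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANS = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.2, "GC_rich": 0.8},
}
EMIT = {
    "AT_rich": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "GC_rich": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}


def forward(seq):
    """Return P(seq), summed over every hidden state path (forward algorithm)."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: START[s] * EMIT[s][seq[0]] for s in STATES}
    for base in seq[1:]:
        alpha = {
            t: sum(alpha[s] * TRANS[s][t] for s in STATES) * EMIT[t][base]
            for t in STATES
        }
    return sum(alpha.values())


def brute_force(seq):
    """Same quantity by explicitly enumerating all hidden paths (short inputs only)."""
    total = 0.0
    for path in product(STATES, repeat=len(seq)):
        p = START[path[0]] * EMIT[path[0]][seq[0]]
        for prev, cur, base in zip(path, path[1:], seq[1:]):
            p *= TRANS[prev][cur] * EMIT[cur][base]
        total += p
    return total


if __name__ == "__main__":
    print(forward("ATGCGC"), brute_force("ATGCGC"))
```

Because the states are hidden, the same sequence can be produced by many different paths; the forward recursion keeps the cost linear in the sequence length instead of exponential.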
As described above, HMMs can be used effectively to describe biological sequences. As a basic example, consider an HMM that models protein-coding genes in eukaryotes. It is well known that certain protein-coding regions show codon bias. This non-uniform use of codons produces different symbol statistics at the different codon positions (Henderson et al., 1997) and is also a source of the period-3 property of coding regions (Yoon, 2009). Introns, which are spliced out and never translated into amino acids, do not show these properties. It is therefore necessary to incorporate these codon statistics when modeling protein-coding genes and constructing a gene finder.
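A minimal sketch of how the position-specific codon statistics could be measured: it tallies base frequencies separately at codon positions 1, 2 and 3 of an in-frame coding sequence. The function name and the toy sequence are hypothetical, and a real estimate would pool many annotated genes rather than a single short sequence.

```python
from collections import Counter


def codon_position_frequencies(cds):
    """Return, for codon positions 1, 2 and 3, the relative frequency of each base.

    Assumes `cds` is an in-frame coding sequence whose length is a multiple of 3.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    freqs = []
    for k in range(3):
        counts = Counter(cds[k::3])  # bases at codon position k + 1
        n = sum(counts.values())
        freqs.append({b: counts.get(b, 0) / n for b in "ACGT"})
    return freqs


if __name__ == "__main__":
    toy_cds = "ATGGCTGCAGCGAAATAA"  # hypothetical toy sequence, not a real gene
    for k, f in enumerate(codon_position_frequencies(toy_cds), start=1):
        print(k, f)
```

Distributions like these, one per codon position, are what separate exon states would use as emission probabilities, whereas an intron state would use a single distribution for every position.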
Fig. 1 shows a toy HMM for modeling eukaryotic genes. The model aims to capture the statistical differences between exons and introns. It has four states: E1, E2 and E3 model the base statistics in exons, and each state Ek (k = 1, 2, 3) uses its own set of emission probabilities to represent the symbol statistics at the kth codon position. The state I models the base statistics in introns. The transitions E1 → E2 → E3 → E1 make the exon states cycle through the codon positions in order, while E3 → I, I → I and I → E1 let an exon end after a complete codon, an intron of any length follow, and the next exon begin. As a result, this HMM can represent genes with several exons, where the exons may contain a variable number of codons and the introns may also have variable lengths.
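The four-state topology can be written down as a transition table. This sketch assumes the transitions E1 → E2 → E3 → E1, E3 → I, I → I and I → E1 (a reading of the toy model consistent with the text), and all probabilities are hypothetical placeholders. The helpers `path_is_valid` and `exon_lengths` are also hypothetical; they show the consequence of this topology: for a path that starts in E1 and ends in E3, every exon run covers a whole number of codons, while intron runs can have any length.

```python
# Hypothetical parameters for the four-state toy gene model (E1, E2, E3, I).
# Only the topology follows the text; every number is a placeholder that would
# normally be estimated from annotated genes.
STATES = ("E1", "E2", "E3", "I")

TRANS = {
    "E1": {"E2": 1.0},            # codon position 1 is always followed by position 2
    "E2": {"E3": 1.0},            # ... and position 2 by position 3
    "E3": {"E1": 0.9, "I": 0.1},  # after a full codon: next codon, or the exon ends
    "I": {"I": 0.9, "E1": 0.1},   # intron continues, or the next exon starts
}


def allowed(prev, cur):
    """True if the toy topology permits the transition prev -> cur."""
    return TRANS[prev].get(cur, 0.0) > 0.0


def path_is_valid(path):
    """True if a state path uses only transitions permitted by the topology."""
    return all(allowed(p, c) for p, c in zip(path, path[1:]))


def exon_lengths(path):
    """Lengths of the maximal runs of exon states (E1/E2/E3) in a state path."""
    lengths, run = [], 0
    for s in path:
        if s == "I":
            if run:
                lengths.append(run)
            run = 0
        else:
            run += 1
    if run:
        lengths.append(run)
    return lengths


if __name__ == "__main__":
    demo = ["E1", "E2", "E3", "E1", "E2", "E3", "I", "I", "E1", "E2", "E3"]
    print(path_is_valid(demo), exon_lengths(demo))
```

Encoding the codon cycle in the transitions, rather than in the emissions, is what lets a single small model enforce the reading frame across exons of any length.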
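This example shows that, if we know the structure and the essential characteristics of the biological sequences of interest, constructing the corresponding HMM is relatively straightforward and can be done intuitively. Once the topology is fixed, the emission and transition probabilities can be estimated from annotated training sequences (or with the Baum-Welch algorithm when annotations are not available), and genes in a new sequence are then predicted by using the Viterbi algorithm to find its most probable state path: positions assigned to E1, E2 or E3 are labelled as exon, and positions assigned to I as intron.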
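Using such a model as a gene finder means decoding: finding the most probable hidden state path for a new sequence with the Viterbi algorithm, then reading exons off the E1/E2/E3 positions and introns off the I positions. The sketch below is a minimal log-space Viterbi for the four-state toy model, assuming the codon-cycle topology E1 → E2 → E3 → E1 with exits E3 → I and I → E1. The start, transition and emission probabilities are invented placeholders, and `viterbi`, `annotate` and `safe_log` are hypothetical names; a real gene finder would estimate its parameters from annotated genes and model additional signals such as splice sites and start/stop codons.

```python
import math

# Hypothetical toy parameters; only the topology follows the four-state model.
STATES = ("E1", "E2", "E3", "I")
START = {"E1": 0.5, "E2": 0.0, "E3": 0.0, "I": 0.5}  # assumption: begin at a codon start or in an intron
TRANS = {
    "E1": {"E1": 0.0, "E2": 1.0, "E3": 0.0, "I": 0.0},
    "E2": {"E1": 0.0, "E2": 0.0, "E3": 1.0, "I": 0.0},
    "E3": {"E1": 0.9, "E2": 0.0, "E3": 0.0, "I": 0.1},
    "I": {"E1": 0.1, "E2": 0.0, "E3": 0.0, "I": 0.9},
}
EMIT = {  # one distribution per codon position, plus one for introns
    "E1": {"A": 0.2, "C": 0.2, "G": 0.4, "T": 0.2},
    "E2": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "E3": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "I": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def safe_log(p):
    """log(p), with log(0) mapped to -inf so forbidden moves are never chosen."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq):
    """Return (log-probability, state path) of the most probable path for seq."""
    score = {s: safe_log(START[s]) + safe_log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][t] = best predecessor of state t at position i + 1
    for base in seq[1:]:
        ptr, new = {}, {}
        for t in STATES:
            best = max(STATES, key=lambda s: score[s] + safe_log(TRANS[s][t]))
            ptr[t] = best
            new[t] = score[best] + safe_log(TRANS[best][t]) + safe_log(EMIT[t][base])
        back.append(ptr)
        score = new
    last = max(STATES, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):  # trace the best predecessors back to the start
        path.append(ptr[path[-1]])
    return score[last], path[::-1]


def annotate(path):
    """Collapse a state path to 'E' (exon) / 'I' (intron) labels per position."""
    return "".join("I" if s == "I" else "E" for s in path)


if __name__ == "__main__":
    logp, path = viterbi("GACGTTAGCGGCATTTATA")
    print(annotate(path), logp)
```

Working in log space avoids numerical underflow on long sequences, and forbidden transitions get a log-probability of minus infinity, so the decoded path always respects the codon cycle.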
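(b) Because hidden Markov models are general statistical models of sequential data, they are applied in a wide range of fields. A classic example is speech recognition, where the hidden states correspond to speech units such as phonemes and the observations are the acoustic signal. HMMs are similarly used in handwriting recognition, and have also been applied in statistical mechanics and in chemistry, including thermodynamics.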
Henderson, J., Salzberg, S., & Fasman, K. H. (1997). Finding genes in DNA with a hidden Markov model. Journal of Computational Biology, 4(2), 127-141.
Kulp, D., Haussler, D., Reese, M. G., & Eeckman, F. H. (1996). A generalized hidden Markov model for the recognition of human genes in DNA. In Proceedings of the International Conference on Intelligent Systems for Molecular Biology, St. Louis (pp. 134-142).
Yoon, B. J. (2009). Hidden Markov models and their applications in biological sequence analysis. Current Genomics, 10(6), 402-415.