# Answer to Question #254629 in Bioinformatics for area08

Question #254629

The frequent words with mismatches problem

One way to solve the Frequent Words with Mismatches problem is to generate all 4^k k-mers Pattern, compute ApproximatePatternCount(Text, Pattern, d) for each k-mer Pattern, and then find the k-mers with the maximum number of approximate occurrences. This approach is inefficient in practice, since many of the 4^k k-mers need not be considered: neither they nor their mutated versions (with up to d mismatches) appear in Text.

Genome= GCAAAATGGAGCAGGATCAGCAAAATGGAAAATAAATGGAGGATCAAAATAAATGGAGGAGGAAAATGGAGGAAAATAAATGGATCAGGAAAATGCAGCAGGATCATCATCAGGAGCAGGATCAAAATTCAGGAGCAGGAGGATCAGCATCAGGAGGATCAGCAGGAAAATGCAGGAGGAGGAGGAAAATTCAAAATGGAGGAGGAGGAGCATCAGCAGCATCAGGAGGAGGATCAGCAGCAGGAGGAGGAGGAGGAAAATGGAGGAGGAGCAGGAGGAGCATCAGGAGGATCAGGAGCATCAGCAAAATTCAAAATGGAGGAAAATGCAGGAAAATGGAGCAGGAAAATAAATTCATCAAAATGCAGGAGGA

k= 6

d= 2

2021-10-22T13:42:02-0400

You can see that ACTAT is a most frequent 5-mer of ACAACTATGCATACTATCGGGAACTATCCT, and ATA is a most frequent 3-mer of CGATATATCCATAG.
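
Both examples above are exact-match cases (d = 0). For the question as posed, where up to d mismatches are allowed, one option is to count only the k-mers that lie within Hamming distance d of some k-mer occurring in Text, instead of enumerating all 4^k candidates. The Python sketch below illustrates that idea. The function names `hamming_distance`, `neighbors`, and `frequent_words_with_mismatches` are illustrative choices, not taken from the question, and the code assumes the alphabet is A, C, G, T.

```python
from collections import Counter


def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(pattern, d):
    """All strings within Hamming distance d of pattern (alphabet ACGT)."""
    if d == 0:
        return {pattern}
    if len(pattern) == 1:
        return set("ACGT")
    result = set()
    for suffix in neighbors(pattern[1:], d):
        if hamming_distance(pattern[1:], suffix) < d:
            # Mismatch budget remains, so the first symbol may be any base.
            result.update(base + suffix for base in "ACGT")
        else:
            # Budget is used up, so the first symbol must match.
            result.add(pattern[0] + suffix)
    return result


def frequent_words_with_mismatches(text, k, d):
    """Sorted list of k-mers with the most approximate occurrences in text."""
    counts = Counter()
    for i in range(len(text) - k + 1):
        # Every neighbor of this window gains one approximate occurrence.
        counts.update(neighbors(text[i:i + k], d))
    if not counts:
        return []
    best = max(counts.values())
    return sorted(p for p, c in counts.items() if c == best)
```

Calling `frequent_words_with_mismatches(genome, 6, 2)`, with `genome` set to the Genome string from the question, returns the most frequent 6-mers with up to 2 mismatches. With d = 0 the same function reduces to the exact-match case shown in the two examples above.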

