Molecular Evolution

A natural question for a mathematical scientist, in any field with enough data, is: what are the underlying dynamics, and the equations describing them? For Newtonian systems, this gives us the laws of motion; for statistical mechanics, we get Brownian motion and diffusion.

In molecular evolution, the underlying dynamics involve sequences, populations, and trees. The equations describe mutation of sequences, migration of lineages, and genealogical coalescence of samples. These equations are necessarily statistical, both because evolution is inherently random and because, as in statistical physics, many of the complicated deterministic parts can be approximated by random models.
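As a minimal illustration of what such a statistical description can look like, the sketch below simulates the time to the most recent common ancestor (TMRCA) of a sample. It assumes the standard Kingman coalescent (a neutral, well-mixed population of constant size), which is a textbook idealization and not necessarily any model used in the research described here. Under that assumption, while k lineages remain, the waiting time to the next coalescence is exponential with rate k(k-1)/2 in coalescent time units, so the expected TMRCA of n samples is 2(1 - 1/n). The function names are illustrative only.

```python
import random


def simulate_tmrca(n, rng):
    """Draw one TMRCA for n sampled lineages, in coalescent time units.

    Assumes the standard Kingman coalescent: with k lineages remaining,
    the waiting time to the next pairwise merger is exponential with
    rate k * (k - 1) / 2.
    """
    t = 0.0
    k = n
    while k > 1:
        rate = k * (k - 1) / 2.0
        t += rng.expovariate(rate)  # waiting time until two lineages merge
        k -= 1                      # one merger leaves one fewer lineage
    return t


def expected_tmrca(n):
    """Analytic mean TMRCA: sum over k = 2..n of 2 / (k * (k - 1))."""
    return 2.0 * (1.0 - 1.0 / n)


rng = random.Random(1)  # fixed seed so the run is reproducible
n = 10
draws = [simulate_tmrca(n, rng) for _ in range(20000)]
mean_tmrca = sum(draws) / len(draws)
print(f"simulated mean TMRCA: {mean_tmrca:.3f}")
print(f"analytic mean TMRCA:  {expected_tmrca(n):.3f}")
```

The simulated mean should agree with the analytic value up to sampling noise. Migration between subpopulations, as in the structured coalescent, would add further competing events to the same kind of waiting-time scheme.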

Modeling of evolutionary processes often opens up new mathematical territory. For example, a DNA sequence experiencing a neighbor-dependent point substitution process is rather like a one-dimensional Ising model. Such models are well studied in condensed matter physics (leading to the Potts models used in the 2010s to find protein folding contacts). However, real DNA sequences also experience insertions and deletions (indels), which have received far less attention in physics.

One part of our group’s research program has been to develop and analyze indel models, with the goal of reconstructing the natural history of genomes, especially mobile elements. We are also interested in using the structured coalescent to model viral phylodynamics, and in using evolutionary models to annotate viral genomes with conserved RNA and protein structure.