I have a second-order Markov chain with 4 states {A, T, C, G} (the 4 DNA nucleotides).
The transition matrix (rows: the two previous nucleotides; columns: the next nucleotide) looks like this:

         A    T    C    G
    AA [0.1, 0.6, 0.2, 0.1]
    AT [0.3, 0.1, 0.5, 0.1]
    AC [0.5, 0.3, 0.0, 0.2]
    AG [..., ..., ..., ...]
    TA [..., ..., ..., ...]
    TT [..., ..., ..., ...]
    TC [..., ..., ..., ...]
    TG [..., ..., ..., ...]
    CA [..., ..., ..., ...]
    CT [..., ..., ..., ...]
    CG [..., ..., ..., ...]
    GA [..., ..., ..., ...]
    GT [..., ..., ..., ...]
    GC [..., ..., ..., ...]
    GG [..., ..., ..., ...]

I want to calculate the stationary probability vector of the 4 states, i.e. the distribution to which the chain converges. The Markov chain is regular.
For a first-order Markov chain this is easy: compute $\lim_{n\rightarrow \infty} P^n$. For a regular chain every row of this limit equals the stationary vector $\pi$, which also satisfies $\pi P = \pi$.
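A minimal NumPy sketch of the first-order case, assuming a made-up 4x4 matrix `P` (it is not from the question): it takes a high power of `P` as a stand-in for the limit and cross-checks the result against the left eigenvector of `P` for eigenvalue 1.

```python
import numpy as np

# Hypothetical first-order matrix (rows: current state A, T, C, G;
# columns: next state). Values are made up; each row sums to 1.
P = np.array([
    [0.30, 0.20, 0.30, 0.20],
    [0.10, 0.40, 0.40, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.20, 0.30, 0.10, 0.40],
])

# For a regular chain every row of P^n converges to the stationary vector,
# so a large n approximates the limit.
Pn = np.linalg.matrix_power(P, 200)
pi_limit = Pn[0]

# Same vector via linear algebra: pi P = pi means pi is an eigenvector of P.T
# for eigenvalue 1; normalise it so the entries sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(eigvals - 1.0))
pi_eig = np.real(eigvecs[:, k])
pi_eig = pi_eig / pi_eig.sum()

print(dict(zip("ATCG", pi_limit.round(4))))
```

The eigenvector route avoids choosing an exponent and is usually the more robust of the two.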
I do not know how to approach the problem for a second-order Markov chain, where the transition matrix is $16 \times 4$ rather than square, so $P^n$ is not even defined.
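One standard way around this (a sketch, not necessarily the only approach) is to treat each pair of consecutive nucleotides as a single state. The pair chain is first order on 16 states: from $(x, y)$ it moves to $(y, z)$ with probability $p(z \mid x, y)$, the entry in row $xy$, column $z$ of the matrix above, and to every other pair with probability 0. Its stationary vector $\pi(x, y)$ can be found as in the first-order case, and the nucleotide frequencies are the marginal $\pi(z) = \sum_x \pi(x, z)$. The code assumes rows ordered AA, AT, ..., GG as in the question; because 13 rows are elided there, they are filled with random placeholder values, so the printed numbers are only illustrative. The helper names `lift_second_order` and `stationary` are hypothetical.

```python
import itertools
import numpy as np

NUC = "ATCG"
# Pair states in the question's row order: AA, AT, AC, AG, TA, ..., GG
PAIRS = ["".join(p) for p in itertools.product(NUC, repeat=2)]


def lift_second_order(R):
    """Hypothetical helper: turn the 16x4 second-order matrix R (row = previous
    pair xy, column = next nucleotide z) into a 16x16 first-order matrix on
    pairs, where (x, y) -> (y, z) has probability R[xy, z]."""
    Q = np.zeros((16, 16))
    for i, xy in enumerate(PAIRS):
        for j, z in enumerate(NUC):
            Q[i, PAIRS.index(xy[1] + z)] = R[i, j]
    return Q


def stationary(Q):
    """Stationary vector of a regular chain: the left eigenvector of Q for
    eigenvalue 1, normalised to sum to 1."""
    vals, vecs = np.linalg.eig(Q.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()


# Placeholder 16x4 matrix: rows AA, AT, AC are from the question; the other
# 13 rows stand in for the elided "..." and are random, NOT real data.
rng = np.random.default_rng(0)
R = rng.dirichlet(np.ones(4), size=16)
R[0] = [0.1, 0.6, 0.2, 0.1]  # AA
R[1] = [0.3, 0.1, 0.5, 0.1]  # AT
R[2] = [0.5, 0.3, 0.0, 0.2]  # AC

Q = lift_second_order(R)
pi_pairs = stationary(Q)  # stationary distribution over the 16 pairs

# pi_pairs.reshape(4, 4)[x, y] = pi(x, y). Summing over x (axis 0) gives the
# distribution of the most recent nucleotide; at stationarity it equals the
# sum over y, i.e. the distribution of the earlier one.
pi_nuc = pi_pairs.reshape(4, 4).sum(axis=0)
print(dict(zip(NUC, pi_nuc.round(4))))
```

Only the lifting step is specific to second order; `stationary` is the same computation used for any first-order chain.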
Also, since I only have a limited dataset from which to estimate the transition matrix: can I take the stationary distribution of the 4 nucleotides as the theoretical distribution I would observe if I had a much larger pool to draw from (with the same transition matrix)?
In other words, can I treat the stationary distribution as an estimate of the theoretical nucleotide frequencies implied by a transition matrix estimated from limited data?
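A hedged way to explore this empirically is a small simulation under assumed conditions: pick a hypothetical "true" second-order matrix `R_true` (random here), draw sequences of increasing length from it, estimate the matrix from each sequence by counting transitions, and compare the stationary distribution of the estimate with that of the true matrix and with the raw nucleotide frequencies of the same sequence. The add-one pseudocount in the estimator is an assumption, made only so that pairs never seen in a short sequence do not produce empty rows. The helpers `simulate`, `estimate` and `stationary_nuc` are hypothetical names.

```python
import numpy as np


def simulate(R, n, rng):
    """Hypothetical helper: draw n nucleotides (as indices 0..3 = A, T, C, G)
    from a second-order chain with 16x4 matrix R (row index 4*x + y)."""
    seq = list(rng.integers(0, 4, size=2))  # arbitrary starting pair
    for _ in range(n - 2):
        row = 4 * seq[-2] + seq[-1]
        seq.append(rng.choice(4, p=R[row]))
    return np.array(seq)


def estimate(seq, pseudocount=1.0):
    """Hypothetical helper: count xy -> z transitions and normalise each row.
    The add-one pseudocount is an assumption that keeps unseen pairs defined."""
    counts = np.full((16, 4), pseudocount)
    for x, y, z in zip(seq[:-2], seq[1:-1], seq[2:]):
        counts[4 * x + y, z] += 1
    return counts / counts.sum(axis=1, keepdims=True)


def stationary_nuc(R):
    """Nucleotide frequencies implied by R: lift to the 16-state pair chain
    (x, y) -> (y, z), take its stationary vector, then marginalise."""
    Q = np.zeros((16, 16))
    for i in range(16):
        y = i % 4
        for z in range(4):
            Q[i, 4 * y + z] = R[i, z]
    vals, vecs = np.linalg.eig(Q.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    v = v / v.sum()
    return v.reshape(4, 4).sum(axis=0)


rng = np.random.default_rng(1)
R_true = rng.dirichlet(np.ones(4), size=16)  # hypothetical "true" matrix
pi_true = stationary_nuc(R_true)

for n in (200, 2_000, 20_000):
    seq = simulate(R_true, n, rng)
    R_hat = estimate(seq)
    pi_hat = stationary_nuc(R_hat)
    freq = np.bincount(seq, minlength=4) / n  # raw nucleotide frequencies
    print(n,
          "max|pi_hat - pi_true| =", round(float(np.abs(pi_hat - pi_true).max()), 4),
          "max|pi_hat - freq| =", round(float(np.abs(pi_hat - freq).max()), 4))
```

The first difference shows how far the stationary distribution of the estimated matrix is from that of the generating matrix at each sample size; the second shows how it relates to simply counting nucleotides in the same sample. The seed and matrix are arbitrary, so it is worth rerunning with other values before reading much into any single run.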