In general, you prove things by induction on well-ordered sets. Recall that a well-ordered set is a totally ordered set $(N,\leqslant)$ such that every non-empty subset has a minimum. If $N$ is a well-ordered set and $A\subset N$ is such that $a\in A$ whenever $\{b\in A:b<a\}\subset A$, then $A=N$. If it were not so, then $N\setminus A$ would have a minimum, say $a$, but then this contradicts the hypothesis, since $\{b\in A: b<a\}\subset A$.
If $N$ is a well-ordered set and for each $a\in N$, $P(a)$ is a proposition, then you can define $A$ to be the set of $a\in N$ such that $P(a)$ is true. Proving $P$ by induction is showing that $A$, defined in this way, has the property above.
I state it in this level of generality, because one can prove things by induction on ordinals, not just $\mathbb{N}$.
Nested induction as you've described it in the problem can be proving the statement by induction on $N=\mathbb{N}\times \mathbb{N}$, which is well-ordered lexicographically. So $(m,n)\leqslant (m',n')$ iff $m<m'$ or $m=m'$ and $n<n'$. OR, of course, you could use antilexicographic ordering, which would likely make a difference, since you can't yet use the fact that addition is commutative.
As a matter of practice, not logical necessity, when you're proving that $A$ has the condition stated above, you consider the cases $a=\min N$ and $a\neq \min N$ separately. Sometimes you handle several values of $a$ as isolated cases. These are the base cases vs. inductive steps. But it may be the case that you don't really have a "base case" vs "inductive step," but rather you have some $a$ which, by the nature of $A$, have to be dealt with differently than other $a$. Quite often, when proving things on ordinals, there's a base case of $0$, a limit ordinal case, and a successor case. Since $\mathbb{N}$ does not contain any limit ordinals (in fact, it is the smallest limit ordinal), the third case doesn't come up in induction on $\mathbb{N}$.
In your case, I think pairs of the form $(m,0)$ will be dealt with separately than $(m,n+1)$. The cases $(m,0)$ can be thought of as base cases, but it might be confusing, since they're interspersed throughout, rather than all at the beginning. You can also think about this as nested induction with an outer hypothesis, and within each outer hypothesis, you have an inner hypothesis. The outer hypothesis would be something like $P(m):$ For all $n$, $s(m+n)=s(m)+n$. The inner would be different for each $m$. $P_m(n)$ would be that $s(m+n)=s(m)+n$(this is for a fixed $n$, not all $n$). Then the $(m,0)$ case is the base case of the inner induction on $P_m$. But for each $m$, you have such a base case to prove $P(m)$. So you prove the result by induction on $n$ with $m$ held fixed. Then you prove $P_{m+1}$ using that $P_k$ holds for all $k\leqslant m$, which is the outer induction. Of course, you could switch which is the outer vs. inner. Have $n$ on the outside and $m$ on the inside. Since your goal is to prove commutivity of addition, you aren't using commutivity of addition, so whether you treat $m$ or $n$ as the variable of the inner vs. outer hypothesis will likely make a difference. So it could make things hard if you pick the wrong order. As someone who frequently does double induction proofs on ordinals (where addition and multiplication are not commutative, order definitely matters).
It's worth mentioning that induction on $\mathbb{N}\times \mathbb{N}$ ordered lexicographically is essentially the same as induction on $\omega^2$. You could have the statement $P(\omega m+n):$ $s(m+n)=s(m)+n$. Prove this by induction on $\omega^2$. Then the $(m,0)$ cases are either the $0$ case if $m=0$ or the limit ordinal cases when $m>0$.
The definitions are accurate, the naturals are a bit weirdly defined as strings. Let $\Sigma = {0, s}$, so we have $0 = 0$, $1 = s0$, $2 = ss0$ etc.
– Kliker Feb 29 '24 at 16:37