What is the probability that $n$-byte random byte array is a valid UTF-8 string?
It doesn't care if it's NFC or NFD.
What is the probability that $n$-byte random byte array is a valid UTF-8 string?
It doesn't care if it's NFC or NFD.
I will ignore the part about NFC and NFD in the question. Browsing over the standards, none of these normalized forms are closed under string concatenation nor there are simple criteria to determine whether something is in the normalized forms without some sort of database lookup.
I will stick to what is considered to be legal UTF-8 as specified in RFC 3629. In particular, we will ignore the possibility that the data stream begins with a BOM (byte order marker).
According to RFC 3629, UTF-8 encodes each of the $1,\!112,\!064$ valid code points in the Unicode code space using one to four 8-bit bytes. They have following bit patterns
$$ \begin{array}{|c:c:l|} \hline\\ k&\alpha_k&\text{bit patterns}\\ \hline\\ 1&128& \text{0xxxxxxx}\\ 2&30\times 64 & \text{110xxxxx}\; \text{10xxxxxx}\\ 3&15\times 64^2& \text{1110xxxx}\; \text{10xxxxxx}\; \text{10xxxxxx}\\ 4&4\times 64^3 & \text{11110xxx}\; \text{10xxxxxx}\; \text{10xxxxxx}\; \text{10xxxxxx}\\ \hline \end{array} $$ Not all the bit patterns corresponds to valid unicode code points. The column $\alpha_k$ above is the number of valid code points which uses $k$ bytes for encoding.
For $n \in \mathbb{Z}_{+}$, let $N_n$ and $P_n = \frac{N_n}{256^n}$ be the number and probability of valid UTF-8 string composed of $n$ bytes. Let $$ A(t) = \frac{\alpha_1}{256} t + \frac{\alpha_2}{256^2} t^2 + \frac{\alpha_3}{256^3} t^3 + \frac{\alpha_4}{256^4} t^4 = \frac{t}{2}+\frac{15}{512} t^2+ \frac{15}{4096}t^3+\frac{1}{4096}t^4. $$ By building up UTF-8 string from the one to four bytes UTF-8 representations of valid code points, we find:
$$1 + \sum_{n=1}^\infty N_n t^n = 1 + A(256t) + A(256t)^2 + A(256 t)^3 + \cdots = \frac{1}{1-A(256t)}$$
This leads to $$1 + \sum_{n=1}^\infty P_n t^n = \frac{1}{1-A\left(t\right)} = \frac{1}{1-\frac{t}{2}-\frac{15}{512} t^2- \frac{15}{4096}t^3-\frac{1}{4096}t^4} \tag{*1} $$ and hence the probabilities $P_n$ satisfy following recurrence relation:
$$P_n - \frac12 P_{n-1} - \frac{15}{512} P_{n-2} - \frac{15}{4096} P_{n-3} - \frac{1}{4096} P_{n-4} = \begin{cases}0,& n > 0\\1,& n = 0\end{cases}\tag{*2}$$
One can then use this recurrence relation to compute $P_n$ for any $n$ explicitly. Please note that for simplicity of presentation, we have set $P_0 = 1$ and $P_k = 0$ for negative $k$ in $(*2)$.
To deduce the asymptotic behavior of $P_n$. Let $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ be the four roots of the equation $A(t) = 1$. Numerically, they are $$\begin{cases} \lambda_1 &= 1.770796029915762\\ \lambda_2 &= -16.29524359356811\\ \lambda_3, \lambda_4 &= -0.23777621817382 \pm 11.91183775521532 i \end{cases}$$ Notice $$\frac{1}{1-A(t)} = \frac{1}{\prod_{i=1}^4\left( 1 - \lambda_i^{-1} t\right)} = \sum_{i=1}^4 \frac{1}{\lambda_i A'(\lambda_i)\left(1 - \lambda_i^{-1} t\right)} = \sum_{i=1}^4 \frac{1}{\lambda_i A'(\lambda_i)}\sum_{n=0}^\infty \frac{t^n}{\lambda_i^n} $$ This leads to $$P_n = \sum_{i=1}^4 \frac{1}{\lambda_i A'(\lambda_i)}\lambda_i^{-n}$$
For large $n$, above expression will be dominated by the contribution from $\lambda_1$, the smallest root in magnitude. As a result, we get following approximation:
$$P_n \sim \frac{1}{\lambda_1A'(\lambda_1)} \lambda_1^{-n} \sim 0.87739479563671 \times 0.56471777839234^n\tag{*3} $$
Following is a table of the exact and approximated values of $P_n$ for $1 \le n \le 10$. As one can see, the approximation $(*3)$ is pretty good, the relative error falls below $10^{-4}$ starting at $n = 3$.
$$ \begin{array}{|r:cl:l|} \hline\\ n & P_n &&P_n (\text{approx})\\ \hline\\ 1 & \frac{1}{2} &= 0.5 & 0.49548043976496\\ 2 & \frac{143}{512} &= 0.279296875 & 0.27980661318093\\ 3 & \frac{647}{4096} &= 0.157958984375 & 0.15801176897502\\ 4 & \frac{23393}{262144} &= 0.089237213134766 & 0.089232055135417\\ 5 & \frac{52839}{1048576} &= 0.05039119720459 & 0.050390927937455\\ 6 & \frac{3819383}{134217728} &= 0.028456620872021 & 0.028456652875968\\ 7 & \frac{17255005}{1073741824} &= 0.016069975681603 & 0.016069977792599\\ 8 & \frac{623629417}{68719476736} &= 0.0090750024100998 & 0.0090750021578506\\ 9 & \frac{176087305}{34359738368} &= 0.0051248150703032 & 0.0051248150574871\\ 10 &\frac{101826182527}{35184372088832}&= 0.0028940741721897 & 0.0028940741739357\\ \hline \end{array} $$
Finally, there is one more point I would like to comment.
The exact value of $P_2$ is $\frac{143}{512}$. Out of the $2^{16}$ random two-byte byte-sequences, $2^{14}$ of them are valid UTF-8 strings because they composed of purely US-ASCII characters. If one subtract these from $P_2$, we find the probability of a random two-byte byte-sequence which is not pure ASCII being valid UTF-8 is
$$\frac{\frac{143}{512}-\frac14}{1 - \frac14} = \frac{5}{128} \sim 3.90625\%$$
Reproducing what has been quoted on the wiki page of UTF-8.