There are two parts to MCMC. The first part is Markov Chain and the second is Monte Carlo. Mostly, MCMC methods are used to do multi-dimensional integration when analytic methods are difficult or impossible. Difficulties may arise because regions are partly
described in terms of functions that do not have integrals in closed
form. Also, it is sometimes difficult to visualize
the regions of integration in higher dimensions.
Markov Chain. The task is to describe a Markov Chain with a stationary distribution from which the answer to an integration
problem can be deduced. Then the chain is simulated through very
many steps. After an initial chunk of steps called the 'burn-in period' one supposes that the simulation has settled down to match
the stationary distribution. Many additional steps beyond the burn-in period are then taken as a sample from the stationary distribution.
Typically, because of Markov dependence, these observations will not be independent. (More about that below.)
Monte Carlo. The Markov Chain must be described in such a way
that it can be simulated. The simulation should reach steady state
in a reasonable length of time. It should 'mix well', meaning that
it should move freely over the state space.
Often graphical methods are used to judge suitability. 'History plots' showing the movement of the chain over time in various dimensions can show how freely the chain moves around its state space. Plots of the autocorrelation function (ACF plots) are used to judge how persistent Markov dependence is. For example, an ACF plot may show that observations that are n steps apart are very nearly independent. In that case, one might 'thin' the steps of the chain, using only every nth step, to get a sample of almost-independent observations.
Simple 3-D example. Suppose we want to know the volume beneath
a correlated normal bivariate density function and above a certain
region of the plane. MCMC is one method of finding such a volume. (We choose it as an example here because it is relatively simple and because results can be verified by other
approximate methods.)
Specifically, suppose $X$ and $Y$ both have mean 0 and standard deviation 1, and that they have correlation $\rho = 0.8.$ We seek the probability that $W = \max(X, Y)$ exceeds 1.25. We make a Markov Chain using the relationship $$X|Y = y \sim Norm(\rho y, \sqrt{1 - \rho^2})$$ and the symmetric one for $Y|X = x.$ Starting at some arbitrary $y_1$ we sample the point $x_2$ from $X|Y = y_1$ and then $y_2$ from $Y|X = x_2$, and continue iteratively to obtain $(X_3, Y_3)$ from $(x_2, y_2).$
Clearly, this is a Markov Process because
the state at step $n + 1$ depends only on the state at step $n$ and a well defined probability rule for making the transition. In R, the
process can be simulated through many steps using the code below.
We assume steady state is reached before 15,000 of the 60,000 iterations and base the value $P\{W > 1.25\} \approx 0.16$ on the remaining steps. Because of Markov dependence these 45,000 observations are
not independent, but ACF analysis shows they have accuracy equivalent to that of more than 5000 independent observations.
This answer agrees with results from other methods. However, there are multidimensional integration problems for which MCMC methods are currently the only known tractable methods of approximation.
m = 60000; b = m/4; rho = 0.8; sg = sqrt(1 - rho^2)
x = y = numeric(m); y[1] = 3
for (i in 2:m) {
x[i] = rnorm(1, rho*y[i-1], sg)
y[i] = rnorm(1, rho*x[i], sg) }
mx = pmax(x[b:m], y[b:m])
mean(mx > 1.25)
## 0.1595691
Many practitioners would call this particular MCMC method a Gibbs Sampler because of the way it uses conditional distributions. Another MCMC method would be to use the Metropolis-Hastings algorithm. MCMC methods are widely used in Bayesian statistical inference because
difficult multidimensional integration problems often arise
in that area. One of the earliest uses of MCMC methods was in the Manhattan Project at the end of World War II, although the terminology 'MCMC' was not in use at that time.
Acknowledgment: The example above is similar to Example 7.9 in Suess and Trumbo: Introduction to Probability Simulation and Gibbs Sampling with R, Springer, 2010.