The number of possible linear combinations $\sum_{k=1}^i \alpha_k c_{j_k}$ of exactly $i$ vectors from $\{c_1,\ldots,c_j\}$ such that each of the coefficients $\alpha_k$ is nonzero is equal to $|\{ (\alpha_1,\ldots,\alpha_i): \alpha_1 \ne 0, \ldots, \alpha_i \ne 0 \}| = (q-1)^i$.
We want to pick a $c_j$ which is not in the linear span $W$ of $d-2$ or fewer vectors from $\{c_1,\ldots,c_{j-1} \}$. The set $W$ contains the $0$ vector. The subspace $W$ also contains all nonzero scalar multiples of the vectors $c_1,\ldots,c_{j-1}$. The number of such nonzero scalar multiples is ${j-1 \choose 1}(q-1)$ because there are ${j-1 \choose 1}$ ways to pick a vector and $q-1$ ways to pick the nonzero coefficient. The set $W$ also contains all linear combinations of the form $\alpha_1 c_1 + \alpha_2 c_3$, where $\alpha_1$ and $\alpha_2$ are nonzero, and $c_1$ and $c_3$ are two specific vectors in $\{c_1,\ldots,c_{j-1} \}$. The number of such linear combinations is ${j-1 \choose 2}(q-1)^2$ because there are 2 ways to choose the two vectors and $(q-1)^2$ ways in which both coefficients can be nonzero (the case where one of the coefficients is zero gives vectors which we already counted in the previous step). Continuing in this manner, we see that the number of vectors in $W$ is at most $\sum_{i=0}^{d-2} {j-1 \choose i}(q-1)^i$. This value is maximal when $j$ is largest possible, ie when $j=n$.
In the last step, suppose we have $\{c_1,\ldots,c_{n-1}\}$ as the columns of the parity-check matrix $H$. Suppose $\sum_{i=0}^{d-2} {n-1 \choose i}(q-1)^i < q^{n-k}$. Then, there exists a vector $c_n$ in $\mathbb{F}_q^{n-k}$ which is not in $W$. Putting $c_n$ as the last column of $H$ ensures that the following condition $P$ continues to be satisfied: any $d-1$ columns of $H$ are linearly independent. This ensures that the minimum distance of the code is at least $d$.
We say at most $d$ rather than equal to $d$ because it is possible that a stronger condition than $P$ is satisfied by the code corresponding to the matrix $H$ we constructed - perhaps every $d$ columns of $H$ are linearly independent, in which case the minimum distance is at least $d+1$. Also, if the constructed matrix $H$ is full rank (ie has rank $n-k$), then the dimension of the code is $k$. But if $H$ is not full rank, then the dimension of the code will be larger than $k$.
In other words, have only proved that (if the bound is satisfied then) there exists an $(n-k)\times n$ matrix $H$ satisfying the condition that any $d-1$ columns are linearly independent. We have not proved that the value of $d-1$ is maximal with respect to the property of linear independence or that $H$ is full rank.