Your intuition is correct; the language
$$
L=\{\mathtt{0}^n\mathtt{1}^n\mathtt{0}^n\mathtt{1}^n\mid n\ge 0\}
$$
is not context-free. Your informal argument—that we could use the stack to match up the first groups of $\mathtt{0}$s and $\mathtt{1}$s but then we've lost track of how many there were—is good, but of course isn't a proof.
For a proof, let's take a nearly-identical language
$$
L_1=\{\mathtt{a}^n\mathtt{b}^n\mathtt{c}^n\mathtt{d}^n\mid n\ge 0\}
$$
where $\mathtt{a}\ne \mathtt{b}, \mathtt{b}\ne \mathtt{c}, \mathtt{c}\ne \mathtt{d}$. If we could show this isn't a CFL then we could conclude that the original language, $L$, wasn't a CFL.
The next step depends, as dtldarek noted, upon whether or not you know that
$$
L_2=\{\mathtt{a}^n\mathtt{b}^n\mathtt{c}^n\mid n\ge 0\}
$$
isn't a CFL. If you know that fact, you can pass to the next step. If you don't, then you can ask about a proof using the Pumping Lemma for context-free languages, and I'll be happy to expand my answer. For now, let's simply accept that $L_2$ isn't context-free.
Now assume, to the contrary, that $L_1$ was a CFL. Consider the language
$$
R=\{\mathtt{a}, \mathtt{b},\mathtt{c}\}^*
$$
namely the set of all strings over the alphabet $\{\mathtt{a}, \mathtt{b}, \mathtt{c}\}$. This is clearly a regular language. We know that the intersection of two CFLs isn't necessarily a CFL, but the intersection of a CFL and a regular language will always be a CFL. Keep this in mind and note that
$$
L_2=L_1\cap R
$$
and if $L_1$ were a CFL, then we'd have that $L_2$, being the intersection of a CFL and a regular language, would have to be a CFL, which it isn't. Thus, we conclude that $L_1$ isn't a CFL and so your original language, $L$, isn't either.