I'm going to approach this from a less formal direction.
Suppose you have $t(x) = k * u(x)$. $t$ is a function that is another function, $u$, times a constant $k$. The slope of $t$ is then $k$ times the slope of $u$, right? Or, $t' = k * u'$.
If you have $h(x) = f(x)*g(x)$ what happens to the slope? Well, we can figure it out formally or we can tell ourselves a little story.
Imagine if $g(x)$ did not vary -- it was equal to a constant $k$. Then the slope would be the slope of $f(x)$ times the constant $k$.
Now imagine if $f(x)$ did not vary -- it was equal to a constant $m$. Then the slope would be the slope of $g(x)$ times the constant $m$.
Now let both vary. We end up with the rule $h'(x) = k*f'(x) + m*g'(x)$, where $m=f(x)$ and $k=g(x)$. When you nudge $x$ a little, each factor changes the product by its own amount, and (to first order) those two changes simply add.
Once we replace $m$ and $k$ with $f(x)$ and $g(x)$ respectively:
$$h'(x) = g(x)f'(x) + g'(x)f(x)$$
which is the product rule.
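A quick numerical sanity check of this rule, as a sketch. It assumes two example factors that are not from the discussion above ($f(x) = \sin x$ and $g(x) = x^3 + 1$) and a hypothetical helper `numeric_slope` that estimates a slope with a central difference. The rule's prediction should agree with the measured slope of the product to several decimal places.

```python
import math

def numeric_slope(func, x, dx=1e-6):
    """Central-difference estimate of the slope of func at x (hypothetical helper)."""
    return (func(x + dx) - func(x - dx)) / (2 * dx)

# Example factors, chosen only for illustration.
def f(x):
    return math.sin(x)

def f_prime(x):
    return math.cos(x)

def g(x):
    return x**3 + 1

def g_prime(x):
    return 3 * x**2

def h(x):
    return f(x) * g(x)

def h_prime_product_rule(x):
    # h'(x) = g(x) f'(x) + g'(x) f(x)
    return g(x) * f_prime(x) + g_prime(x) * f(x)

for x in [0.0, 0.5, 1.3, 2.0]:
    print(f"x={x}: numeric {numeric_slope(h, x):.6f}, "
          f"product rule {h_prime_product_rule(x):.6f}")
```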
Intuitively, the rule says that the slope at that point is the sum of the two components' contributions, and each component contributes its own slope multiplied by the current value of the other component.
In a real proof, there are second-order effects that the above glosses over (the product of the two small changes). A formal proof shows that these second-order effects can be neglected, because they disappear in the limit.
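To see where the second-order effect comes from, write $\Delta f = f(x+\Delta x) - f(x)$ and $\Delta g = g(x+\Delta x) - g(x)$, and expand the change in the product $h = fg$:

$$\begin{aligned}
\Delta h &= (f + \Delta f)(g + \Delta g) - fg \\
&= g\,\Delta f + f\,\Delta g + \Delta f\,\Delta g, \\
\frac{\Delta h}{\Delta x} &= g\,\frac{\Delta f}{\Delta x} + f\,\frac{\Delta g}{\Delta x} + \frac{\Delta f}{\Delta x}\,\Delta g.
\end{aligned}$$

As $\Delta x \to 0$, the first two terms tend to $g(x)f'(x)$ and $f(x)g'(x)$. The last term is something that tends to $f'(x)$ multiplied by $\Delta g$, which tends to $0$ because $g$ is continuous wherever it is differentiable. That cross term $\Delta f\,\Delta g$ is the second-order effect that vanishes in the limit.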
Now let us apply this to $f(x) = x^2$. If $h(x) = x$, then $f(x) = h(x)*h(x)$.
We apply the product rule: $f'(x) = h(x)h'(x) + h'(x)h(x)$.
If we know that $h'(x) = 1$ (because we know the function $y=x$ has a slope of 1), then we get:
$$f'(x) = x * 1 + 1 * x = 2x$$
Relating this back to the product rule: each of the two $x$ factors contributes a slope of 1, times the magnitude of the other factor (which is $x$), and the two contributions are added together.
We can then look at $x^3$ or $x^4$. For $x^4$, write it as $x^2 * x^2$, so the derivative is $2x * x^2 + x^2 * 2x = 4x^3$. For $x^3$, write it as $x * x^2$, so the derivative is $1 * x^2 + x * 2x = 3x^2$.
This pattern continues for every positive integer $n$: the derivative of $x^n$ is $n x^{n-1}$. There are many ways to prove this, from combinatorial arguments to calculus to induction using the product rule.
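The product rule can also be run mechanically. The following is a minimal sketch using a technique not mentioned above, forward-mode automatic differentiation with "dual numbers": each value carries its slope alongside it, and multiplication applies exactly the rule $h' = g f' + g' f$. Building $x^n$ as $x * x * \cdots * x$ then produces $n x^{n-1}$ as the slope, which is the induction argument carried out step by step. The names `Dual` and `power_by_products` are hypothetical, for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """A value paired with its slope (derivative) at the current point."""
    value: float
    slope: float

    def __mul__(self, other: "Dual") -> "Dual":
        # Product rule: (f*g)' = g*f' + g'*f
        return Dual(
            value=self.value * other.value,
            slope=other.value * self.slope + other.slope * self.value,
        )

def power_by_products(x: float, n: int) -> Dual:
    """Build x**n from n >= 1 factors of x, tracking the slope via the product rule."""
    identity_x = Dual(value=x, slope=1.0)  # y = x has slope 1
    result = identity_x
    for _ in range(n - 1):
        result = result * identity_x
    return result

x = 3.0
for n in range(1, 6):
    d = power_by_products(x, n)
    # Compare the tracked slope with the power-rule formula n * x**(n-1).
    print(n, d.value, d.slope, n * x ** (n - 1))
```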
The derivative is often defined as a limit. But the derivative is really a way to approximate a function with a linear one around a particular point. The statement $f'(7) = k$ means that, close enough to $7$, the function $g(x) = k(x-7)$ is a "good" approximation of $f(x)-f(7)$, in the sense that the error shrinks faster than $x-7$ does.
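A numerical illustration of the "good approximation" idea, assuming the example $f(x) = x^2$ (so $f'(7) = 14$ by the $2x$ result above). The hypothetical helper `approximation_error` measures how far $f(7 + dx) - f(7)$ is from the linear guess $14\,dx$. The error itself shrinks, and more importantly error divided by $dx$ also shrinks toward $0$.

```python
def f(x):
    """Example function (an assumption for illustration): f(x) = x**2."""
    return x ** 2

POINT = 7.0
SLOPE = 2 * POINT  # f'(7) = 14, using the 2x result for x**2

def approximation_error(dx):
    """How far f(7 + dx) - f(7) is from the linear guess SLOPE * dx."""
    x = POINT + dx
    return (f(x) - f(POINT)) - SLOPE * (x - POINT)

for dx in [1.0, 0.1, 0.01, 0.001]:
    err = approximation_error(dx)
    # For x**2 the error is (x - 7)**2 in exact arithmetic,
    # so err / dx shrinks roughly in proportion to dx.
    print(f"dx={dx}: error={err:.10f}, error/dx={err / dx:.10f}")
```

For a general differentiable function the error is not exactly $(x-7)^2$, but the ratio error/$(x-7)$ still tends to $0$; that is what "good" means here.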
The formalism -- as a limit -- came long after people were working with derivatives in the real world. It put derivatives on solid enough ground that we can work out the places where the intuitive definition is likely to fall apart, instead of blindly walking over a logical cliff in the middle of a mathematical argument, with little ability to work out where we went wrong.