Let $p$ be a degree $m\in \mathbb N$ polynomial and WLOG assume
$1=p'(1)=\max_{z\in S^1}\big\vert p'(z)\big\vert\leq \max_{z\in S^1} m \cdot \big\vert p (z)\big\vert =1$
i.e. we take Bernstein's Inequality as already proven in general and investigate the equality conditions in a particularly simple form.
justification: if we had some other equality case, we could instead run the argument on $r(z):=\lambda \cdot p\big(\alpha \cdot z\big)$, for $\lambda \in \mathbb C-\big\{0\big\}$ and $\alpha \in S^1$ such that $r'(1)=1=\max_{z\in S^1} m \cdot \big\vert r (z)\big\vert $; the argument below would then prove that $r(z)$ must be a monomial, which implies $p(z)$ is as well. Thus it suffices to assume WLOG that the normalization displayed above holds.
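The normalization can be checked numerically. The following is a minimal sketch, assuming polynomials are stored as coefficient lists and maxima over $S^1$ are approximated on a finite grid; the helper names (`normalize`, `max_on_circle`) and the two sample polynomials are hypothetical and not part of the argument. It builds $r(z)=\lambda\cdot p(\alpha\cdot z)$ with $r'(1)=1$ and reports $m\cdot\max\vert r\vert$, which equals $1$ exactly in an equality case and exceeds $1$ otherwise.

```python
import cmath
import math

def poly_eval(coeffs, z):
    """Evaluate sum(coeffs[j] * z**j) by Horner's rule; coeffs[0] is the constant term."""
    acc = 0j
    for c in reversed(coeffs):
        acc = acc * z + c
    return acc

def poly_deriv(coeffs):
    """Coefficients of the derivative polynomial."""
    return [j * c for j, c in enumerate(coeffs)][1:]

def circle_grid(samples=4096):
    """Equally spaced sample points on the unit circle."""
    return [cmath.exp(2j * math.pi * s / samples) for s in range(samples)]

def max_on_circle(coeffs, samples=4096):
    """Approximate max |p(z)| over the unit circle by sampling (an assumption)."""
    return max(abs(poly_eval(coeffs, z)) for z in circle_grid(samples))

def normalize(coeffs, samples=4096):
    """Coefficients of r(z) = lam * p(alpha * z), with r'(1) = 1 and |r'| maximal at z = 1."""
    dcoeffs = poly_deriv(coeffs)
    # alpha: grid point where |p'| is largest; lam rescales so that r'(1) = 1.
    alpha = max(circle_grid(samples), key=lambda z: abs(poly_eval(dcoeffs, z)))
    lam = 1 / (alpha * poly_eval(dcoeffs, alpha))
    return [lam * c * alpha ** j for j, c in enumerate(coeffs)]

m = 3
extremal = [0, 0, 0, 2j]             # p(z) = 2i z^3, an equality case (hypothetical sample)
generic = [0.2, -0.1j, 0.3, 0.25]    # an arbitrary degree-3 polynomial (hypothetical sample)

for name, coeffs in (("extremal", extremal), ("generic", generic)):
    r = normalize(coeffs)
    slope = poly_eval(poly_deriv(r), 1)
    print(name, "r'(1) =", slope, " m*max|r| =", m * max_on_circle(r))
```

For the monomial the normalized polynomial is $z^3/3$, matching the claim that the rescaling preserves monomials.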
remark:
The inequality linked below, combined with Gauss-Lucas, gives the key geometric insight behind this problem. One could also combine them to verify the Bernstein equality conditions by writing $\frac{m\cdot p(z)}{z^m} = 1 +\big[H(z)\big]^k$ for $z\in B(1,\delta)$ where $H$ is univalent and investigate the (lack of) angle preserving properties of $H$, though for purposes of making the argument exact, I find it preferable to use winding numbers (as defined in Beardon's Complex Analysis or e.g. chapter 3 of Fulton's Algebraic Topology).
proof:
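When $m=1$, equality in Bernstein trivially implies $p(z) = z^m=z$ by inspection: writing $p(z)=z+b$ (since $p'(1)=1$), we have $\max_{z\in S^1}\big\vert p(z)\big\vert = 1+\vert b\vert = 1$, which forces $b=0$.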
For $m\geq 2$: Suppose for contradiction that the polynomial $P(z):=\big(m\cdot p(z)-z^m\big)$ is not the zero polynomial.
(i.) $\text{local degree (k) of P must be }\geq 2$
Now $P'(1)=m\cdot p'(1)-m=0$ and by Gauss-Lucas the roots of $P'(z)$ are in the convex hull of the roots of $P(z)$, but we also know for any $\delta\gt 0$ that $P(z)$ has exactly $m$ roots in the ball $B\big(0,1+\delta\big)$ per Rouché since $\big\vert z \big\vert \gt 1\implies \vert z\vert^m \gt m\cdot \vert p(z)\vert$. Combining these two points: all the roots of $P(z)$ are in the closed unit disc and $1$ is in the convex hull of those roots; since $1$ is an extreme point of the closed unit disc, $1$ must be a root for $P$ as well, i.e. $P(1)=0=P'(1)$. The Argument Principle tells us that the local degree of $P(z)$ at $1$ (i.e. the winding number $n\Big(P\circ \gamma,0\Big)$ as defined below in (ii.)) is $k\geq 2$.
Ref: "Suppose $P(z) = a_0+a_1z + \dots + a_nz^n$ is bounded by 1 for $|z|\leq 1$. Show that $|P(z)|\leq |z^n|$ for all $|z^n|\geq 1$"; note the inequality must be strict as $m\cdot p(z)\propto z^m$ iff $m\cdot p(z)=z^m$ since $p'(1)=1$ but this cannot occur since we have assumed $P(z)$ is not the zero polynomial.
(ii.) $\text{local degree (k) of P must be } \lt 2$
This relies on a lemma for estimating the winding number of a non-closed curve, whose proof has been deferred to the end.
Define $\gamma: \big[\frac{-1}{4},\frac{3}{4} \big]\longrightarrow \mathbb C$ given by $\gamma(t):=1+ r\cdot\exp\big(2\pi i \cdot t\big)$ for $r\gt 0$ small enough. Application of the Argument Principle tells us
$k =n\Big(P\circ \gamma,0\Big)$
$=n\Big(\big(z^m\cdot (\frac{m\cdot p(z)}{z^m} - 1)\big)\circ \gamma,0\Big)=n\Big(\big(z^m\big)\circ \gamma,0\Big)+n\Big(\big(\frac{m\cdot p(z)}{z^m} - 1\big)\circ \gamma,0\Big)$
$=0+n\Big(\frac{m\cdot p(z)}{z^m}\circ \gamma,1\Big)$
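To see the two winding numbers above concretely, here is a minimal numerical sketch. It assumes the winding number of a (possibly non-closed) curve is approximated by summing small increments of the argument; the function `winding_number` and the sample polynomial with a double zero at $1$ are hypothetical illustrations, not the $P$ of the proof.

```python
import cmath
import math

def winding_number(curve, w, a, b, steps=20000):
    """Approximate n(curve, w) over [a, b] as (total change of arg(curve(t) - w)) / (2*pi).

    Also works for non-closed curves, where the result need not be an integer.
    Assumes the curve avoids w and that consecutive samples differ in argument
    by much less than pi.
    """
    total = 0.0
    prev = curve(a) - w
    for s in range(1, steps + 1):
        cur = curve(a + (b - a) * s / steps) - w
        total += cmath.phase(cur / prev)  # signed increment of the argument
        prev = cur
    return total / (2 * math.pi)

r = 0.05
gamma = lambda t: 1 + r * cmath.exp(2j * math.pi * t)  # small circle around 1

# Hypothetical polynomial with a double zero at z = 1, so its local degree there is 2.
P = lambda z: (z - 1) ** 2 * (z + 2)

k = winding_number(lambda t: P(gamma(t)), 0, -0.25, 0.75)
n_zm = winding_number(lambda t: gamma(t) ** 3, 0, -0.25, 0.75)
print("n(P o gamma, 0) ~", k)
print("n(z^3 o gamma, 0) ~", n_zm)
```

The second value is (numerically) zero because the small circle around $1$ does not enclose the origin, which is the reason the $z^m$ term contributes nothing.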
Next, define $h(z):= \frac{m\cdot p(z)}{z^m} =1 + (z-1)^k\cdot g(z)$ and $q(z) :=1 + (z-1)^k\cdot \lambda$
where $g(z)$ is an analytic function that is non-zero in a neighborhood of $1$ and $g(1)=\lambda \in \mathbb C-\big\{0\big\}$.
Now with $a:=\frac{-1}{4}$ and $b:=\frac{1}{4}$ we may apply the below Key Lemma with $\gamma_1(t):=h\circ \gamma_{\big\vert [a,b]}$ and $\sigma_1(t):=q\circ \gamma_{\big\vert [a,b]}$.
In words: we restrict $\gamma$ to the counter clockwise rotation from $1-r\cdot i$ to $1+r\cdot i$, which lies outside $S^1$, and compose with $h$ and $q$ respectively. To be clear, with $w:=1$, for any $\epsilon \in (0,1)$ continuity of $g$ tells us there is a $\delta \gt 0$ such that for all $r\in(0,\delta)$ we have, for arbitrary $t\in [a,b]$
$\Big \vert \gamma_1(t)-\sigma_1(t)\Big \vert =\Big \vert 1 + (z-1)^k\cdot g(z)-\big(1 + (z-1)^k\cdot \lambda\big)\Big \vert = \Big\vert z-1\Big\vert^k\cdot \Big \vert g(z) -\lambda\Big \vert= r^k\cdot \Big \vert g(z) -\lambda\Big \vert$
$\lt \epsilon \cdot r^k\cdot \Big \vert \lambda\Big \vert = \epsilon \cdot \Big \vert 1 + (z-1)^k\cdot \lambda -1 \Big \vert= \epsilon \cdot \Big \vert \sigma_1(t)-w\Big \vert $
And selecting $\epsilon:=\frac{\pi}{4+\pi}$ gives
$ \frac{k}{2}-\Big \vert n\big(\gamma_1, 1\big)\Big \vert$
$ \leq \Big \vert n\big(\gamma_1, 1\big)-\frac{k}{2}\Big \vert= \Big \vert n\big(\gamma_1, 1\big)-n\big(\sigma_1,1\big)\Big \vert = \Big \vert n\big(\gamma_1, w\big)-n\big(\sigma_1,w\big)\Big \vert $
$\lt \frac{ \left(\frac{\pi}{4+\pi}\right)}{\pi\cdot\left(1-(\frac{\pi}{4+\pi})\right)}=\frac{1}{4}$
$\implies \frac{k}{2}\lt \Big \vert n\big(\gamma_1, 1\big)\Big \vert + \frac{1}{4} \lt \frac{1}{2} + \frac{1}{4} \lt 1\implies k \lt 2$
where the key step is that $\big \vert n\big(\gamma_1, 1\big)\big \vert \lt \frac{1}{2}$ because the link in (i.) tells us that the curve $\gamma_1$ is entirely contained in the open unit disc (the arc lies outside $S^1$, where $\vert h(z)\vert \lt 1$); equivalently, after translation by $-1$, the curve is contained in the (open) left half plane hence it winds around zero less than $\frac{1}{2}$ of a time. This follows from the definition of the winding number but is useful enough that it is stated explicitly under $I5$ in Chapter 7 of Beardon's Complex Analysis.
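Two numerical facts used in the estimate above can be checked directly: that $\epsilon=\frac{\pi}{4+\pi}$ turns the lemma's bound into exactly $\frac{1}{4}$, and that $n\big(\sigma_1,1\big)=\frac{k}{2}$ on the half circle. The sketch below assumes the same increment-summing approximation of the winding number; the values of `r` and `lam` are arbitrary hypothetical choices.

```python
import cmath
import math

def winding_number(curve, w, a, b, steps=20000):
    """Approximate n(curve, w) over [a, b] by summing small argument increments."""
    total = 0.0
    prev = curve(a) - w
    for s in range(1, steps + 1):
        cur = curve(a + (b - a) * s / steps) - w
        total += cmath.phase(cur / prev)
        prev = cur
    return total / (2 * math.pi)

# The choice of epsilon made in the proof and the resulting bound of the Key Lemma.
eps = math.pi / (4 + math.pi)
bound = eps / (math.pi * (1 - eps))
print("bound =", bound)

# sigma_1 = q o gamma on [-1/4, 1/4] with q(z) = 1 + lam * (z - 1)^k.
r, lam = 0.05, 0.7 - 0.4j  # hypothetical radius and nonzero constant

def sigma1(k):
    return lambda t: 1 + lam * (r * cmath.exp(2j * math.pi * t)) ** k

half_turns = {k: winding_number(sigma1(k), 1, -0.25, 0.25) for k in (1, 2, 3)}
for k, n in half_turns.items():
    print("k =", k, " n(sigma_1, 1) ~", n)
```

The half circle contributes half a turn per unit of local degree, which is why $k\geq 2$ would force at least one full turn from $\sigma_1$.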
After combining (i.) and (ii.) we conclude $2\leq k\lt 2$ which is a contradiction.
$\implies P(z) \text{ is the zero polynomial}$
Key Lemma
Consider the not necessarily closed curves $\gamma_1, \sigma_1: [a,b]\longrightarrow \mathbb C$, where $-1\leq a\leq b\leq 1$, each of the form
$\gamma_1(t) = f\Big(c + r\cdot\exp\big(2\pi i \cdot t\big)\Big)$ and $\sigma_1(t) = g\Big(c + r\cdot\exp\big(2\pi i \cdot t\big)\Big)$
and further suppose we have the relationship that for any $\epsilon \in (0,1)$, there exists some $\delta \gt 0$ such that we have
$\big \vert \gamma_1(t)-\sigma_1(t)\big \vert \lt \epsilon \cdot \big \vert \sigma_1(t)-w\big \vert$
for all $t\in [a,b]$, for any $r \in (0,\delta)$
If the above conditions hold, then we may conclude
$\big \vert n\big(\gamma_1, w\big)-n\big(\sigma_1,w\big)\big \vert \lt \frac{ \epsilon}{\pi \cdot(1-\epsilon)}$
This is essentially a continuity argument that makes the following somewhat obvious point: if a criterion of closeness is met between the two curves, then the associated winding numbers must be extremely close. The below mimics Beardon's analytic proof of Dog-on-a-Leash Lemma ($I6$ in chp 7 of Complex Analysis), to extend the result to consider non-closed curves under the above conditions.
proof:
$\Gamma(t) := \frac{\gamma_1(t)-\sigma_1(t)}{\sigma_1(t)-w}\implies \big \vert \Gamma(t)\big \vert=\big \vert\frac{\gamma_1(t)-\sigma_1(t)}{\sigma_1(t)-w}\big \vert\lt \epsilon$
Thus all points of $\Gamma(t) +1$ lie within an epsilon radius of $1$ hence
$\big \vert n\big(\Gamma(t)+1,0\big)\big \vert=\frac{1}{2\pi}\big \vert \text{Arg}_{-\pi}\big(\Gamma(b)+1\big)-\text{Arg}_{-\pi}\big(\Gamma(a)+1\big)\big \vert \leq \frac{1}{2\pi}\Big(\big \vert \text{Arg}_{-\pi}\big(\Gamma(b)+1\big)\big \vert + \big \vert\text{Arg}_{-\pi}\big(\Gamma(a)+1\big)\big \vert\Big) \lt\frac{1}{2\pi}\frac{2 \epsilon}{1-\epsilon}=\frac{ \epsilon}{\pi \cdot(1-\epsilon)}$
where the upper bound comes from routine bounding of the standard logarithm associated with angles $\in (-\pi, \pi)$ (see the explicit computation at the end).
Finally, the definition of $\Gamma(t)$ implies
$\gamma_1(t)-w = \big(\Gamma(t)+1\big)\cdot\big(\sigma_1(t)-w\big)$
$\implies n\Big(\gamma_1,w\Big)= n\Big(\gamma_1-w,0\Big) = n\Big(\big(\Gamma+1\big)\cdot\big(\sigma_1-w\big),0\Big)= n\Big(\Gamma+1,0\Big)+n\Big(\sigma_1,w\Big)$
$\implies \left \vert n\Big(\sigma_1,w\Big)-n\Big(\gamma_1,w\Big)\right \vert= \left \vert n\Big(\Gamma+1,0\Big)\right \vert \lt \frac{ \epsilon}{\pi \cdot(1-\epsilon)}$
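The conclusion of the Key Lemma can be illustrated numerically. This is a minimal sketch under stated assumptions: the perturbed function `h`, the constants `r` and `lam`, and the way `eps` is estimated from a sample grid are all hypothetical choices made for illustration, and the winding number is approximated by summing argument increments.

```python
import cmath
import math

def winding_number(curve, w, a, b, steps=20000):
    """Approximate n(curve, w) over [a, b] by summing small argument increments."""
    total = 0.0
    prev = curve(a) - w
    for s in range(1, steps + 1):
        cur = curve(a + (b - a) * s / steps) - w
        total += cmath.phase(cur / prev)
        prev = cur
    return total / (2 * math.pi)

r, lam, k, w = 0.05, 0.7 - 0.4j, 2, 1          # hypothetical parameters
circle = lambda t: 1 + r * cmath.exp(2j * math.pi * t)
h = lambda z: 1 + (z - 1) ** k * lam * (1 + 0.5 * (z - 1))  # hypothetical perturbation
q = lambda z: 1 + (z - 1) ** k * lam

gamma1 = lambda t: h(circle(t))
sigma1 = lambda t: q(circle(t))
a, b = -0.25, 0.25

# Estimate an epsilon satisfying |gamma1 - sigma1| < eps * |sigma1 - w| on a grid,
# padded by 1% so that the inequality is strict at the sampled points.
grid = [a + (b - a) * s / 2000 for s in range(2001)]
eps = 1.01 * max(abs(gamma1(t) - sigma1(t)) / abs(sigma1(t) - w) for t in grid)

gap = abs(winding_number(gamma1, w, a, b) - winding_number(sigma1, w, a, b))
bound = eps / (math.pi * (1 - eps))
print("eps =", eps, " gap =", gap, " bound =", bound)
```

In this example the gap sits just under the bound, which suggests the estimate is not far from sharp for small $\epsilon$.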
To be explicit: for $z \in \mathbb C-L_{-\pi}$ (i.e. in the standard slit plane)
$\big\vert \text{Arg}_{-\pi}(z)\big \vert \leq \sqrt{\big\vert \ln(\vert z \vert)\big \vert^2+\big\vert \text{Arg}_{-\pi}(z)\big \vert^2} = \big\vert\text{Log}_{-\pi}(z)\big\vert$
and in particular since $\Gamma(t)+1 \in B\big(1,\epsilon\big)$ for any selection of $t$, write $\Gamma(t)+1=1-u$ with $u:=-\Gamma(t)$ and $\vert u\vert\lt\epsilon$ (the letter $u$ is used to avoid a clash with the base point $w$ of the lemma), so that $\text{Log}_{-\pi}(1-u) = -\sum_{j=1}^\infty \frac{u^j}{j}$
$\big\vert \text{Arg}_{-\pi}\big(\Gamma(t)+1\big)\big \vert=\big\vert \text{Arg}_{-\pi}(1-u)\big \vert\leq \big\vert\text{Log}_{-\pi}(1-u)\big\vert =\big\vert \sum_{j=1}^\infty \frac{u^j}{j}\big\vert\leq \sum_{j=1}^\infty \frac{\vert u\vert^j}{j}\lt \sum_{j=1}^\infty \epsilon^j=\frac{\epsilon}{1-\epsilon}$
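The chain of inequalities above can be spot-checked on a grid of points of the form $1-u$ with $\vert u\vert\lt\epsilon$. The sketch below is only a sanity check under the assumption that a finite polar grid is representative; the function name `log_bound_check` and the grid sizes are hypothetical. It uses the principal branch from Python's `cmath`, whose arguments lie in $(-\pi,\pi]$.

```python
import cmath
import math

def log_bound_check(eps, radial=50, angular=360):
    """Return (max |Arg(1-u)|, max |Log(1-u)|, eps/(1-eps)) over a polar grid with |u| < eps."""
    max_arg = 0.0
    max_log = 0.0
    for i in range(1, radial + 1):
        rho = eps * i / (radial + 1)          # radii strictly below eps
        for j in range(angular):
            u = rho * cmath.exp(2j * math.pi * j / angular)
            z = 1 - u
            max_arg = max(max_arg, abs(cmath.phase(z)))
            max_log = max(max_log, abs(cmath.log(z)))
    return max_arg, max_log, eps / (1 - eps)

eps = math.pi / (4 + math.pi)                  # the value chosen in the proof
max_arg, max_log, bound = log_bound_check(eps)
print("max|Arg| =", max_arg, " max|Log| =", max_log, " eps/(1-eps) =", bound)
```

For this $\epsilon$ the bound $\frac{\epsilon}{1-\epsilon}$ equals $\frac{\pi}{4}$, and the sampled values stay comfortably below it.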