Note: your matrices are unitary, hence normal, i.e. $A^*A=AA^*$. Everything here is true more generally if $T$ is contained in the set of normal matrices. Except that the diagonal elements $\lambda_j$ in the second property, need not be in the unit circle in this case. Since these are the eigenvalues of $A$, they are obviously in the unit circle with the additional assumnption that $A$ is unitary. So I'll do it in the general normal case. Recall that a matrix is normal if and only if it is diagonalizable in an orthonormal basis. See here, in case.
First part: $S=T\cup T^*$ is a Schur set.
This follows trivially from the definition of $S$, since we added to $T$ set of its adjoints.
This relies on a general fact for operators on Hilbert space. It is called Fuglede's theorem: if $A,B$ are bounded linear operators on $H$, if $A$ is normal, and if $AB=BA$, then$A^*B=BA^*$. For matrices, it can alternatively be deduced from the following fact.
Remark: in your case, where matrices are unitary, this is actually much easier. Indeed $A^*=A^{-1}$ for every $A\in S$. And it straightforward to check that if $B$ commutes with $A$, it commutes with $A^{-1}$. SO if you are not really interested in the normal case, you can go to second part directly.
Fact: a matrix $A$ is normal if and only if $A^*=p(A)$ for some polynomial $p\in\mathbb{C}[X]$.
Proof of fact:
The implication $\Leftarrow$ is clear. For the converse, take a unitary matrix $U$ such that $A=UDU^*$ with $D$ diagonal, $D=\mbox{diag}\{\lambda_1,\ldots,\lambda_n\}$. Then $A^*=UD^*A^*$ and $p(A)=Up(D)U^*$ for every polynomial $p$. So it suffices to find $p$, such that $D^*=p(D)$, i.e. $p(\lambda_j)=\overline{\lambda_j}$ for every $j$. This can be done by Lagrange interpolation. QED.
Proof of 2:
Now if $A,B$ belong to $T$, we have $AB=BA$, whence $B^*A^*=(AB)^*=(BA)^*=A^*B^*$. And by the fact, $A^*=p(A)$, so $B$ commutes with $p(A)=A^*$ and, likewise, $A$ commutes with $B^*$. If follows that any two matrices in $S$ commute.
Second part: every matrix in $S$ is normal, hence diagonalizable in an orthonormal basis. The fact that $S$ is a Schur set guarantees (and is therefore equivalent) that they can be diagonalized simultaneously in the same orthonormal basis.
Proof:
We prove this by induction on $n$, the size of the matrices. For $n=1$, this is trivial. Assume this is true in any dimension $1\leq k\leq n-1$. If all the elements of $S$ are scalar, the result is trivial. So assume there exists $A_0\in S$ which is not scalar. It follows that it has an eigenvalue $\lambda_0$ such that $V_0:=\mbox{ker}(A_0-\lambda_0I)$ is non trivial, i.e. has dimension $0<d<n$. This yields a nontrivial orthogonal decomposition $\mathbb{C}^n=V_0\oplus V_0^\perp$.
The key observation comes now: this decomposition is invariant under every element of $S$.
Indeed, if $A$ belongs to $S$, it commutes with $A_0$, so it leaves invariant its eigenspaces, $V_0$ in particular: $AV_0\subseteq V_0$. But $A^*$ also commutes with $A_0$, so $A^*V_0\subseteq V_0$. Now the latter is equivalent, as usual, to $AV_0^\perp\subseteq V_0^\perp$.
Taking an orthonormal basis relative to this orthogonal decomposition, we get a unitary matrix $U$ such that
$$U^*AU=\pmatrix{B&0\\0&C}\qquad\forall A\in S.$$
Now observe that each diagonal block of $S$ induces a Schur set of smaller size. So it only remains to apply the induction hypothesis to both. QED.