A 3D rotation around an arbitrary point $(x_0, y_0, z_0)$ is described by
$$\left[ \begin{matrix} x^\prime \\ y^\prime \\ z^\prime \end{matrix} \right] =
\left[ \begin{matrix} X_x & Y_x & Z_x \\ X_y & Y_y & Z_y \\ X_z & Y_z & Z_z \end{matrix} \right] \left[ \begin{matrix} x - x_0 \\ y - y_0 \\ z - z_0 \end{matrix} \right] + \left[ \begin{matrix} x_0 \\ y_0 \\ z_0 \end{matrix} \right]$$
which, as OP noted, first subtracts the center point, rotates around the origin, then adds back the center point; equivalently written as
$$\vec{p}^\prime = \mathbf{R} \left( \vec{p} - \vec{p}_0 \right) + \vec{p}_0$$
We can combine the two translations, saving three subtractions per point – not much, but might help in a computer program. This is because
$$\vec{p}^\prime = \mathbf{R} \vec{p} + \left( \vec{p}_0 - \mathbf{R} \vec{p}_0 \right)$$
In other words, you can use the simple form, either
$$\left[ \begin{matrix} x^\prime \\ y^\prime \\ z^\prime \end{matrix} \right] =
\left[ \begin{matrix} X_x & Y_x & Z_x \\ X_y & Y_y & Z_y \\ X_z & Y_z & Z_z \end{matrix} \right] \left[ \begin{matrix} x \\ y \\ z \end{matrix} \right] + \left[ \begin{matrix} T_x \\ T_y \\ T_z \end{matrix} \right]$$
or, equivalently,
$$\left[ \begin{matrix} x^\prime \\ y^\prime \\ z^\prime \\ 1 \end{matrix} \right] = \left[ \begin{matrix} X_x & Y_x & Z_x & T_x \\ X_y & Y_y & Z_y & T_y \\ X_z & Y_z & Z_z & T_z \\ 0 & 0 & 0 & 1 \end{matrix} \right] \left[ \begin{matrix} x \\ y \\ z \\ 1 \end{matrix} \right]$$
where
$$\left[ \begin{matrix} T_x \\ T_y \\ T_z \end{matrix} \right] =
\left[ \begin{matrix} x_0 \\ y_0 \\ z_0 \end{matrix} \right] -
\left[ \begin{matrix} X_x & Y_x & Z_x \\ X_y & Y_y & Z_y \\ X_z & Y_z & Z_z \end{matrix} \right] \left[ \begin{matrix} x_0 \\ y_0 \\ z_0 \end{matrix} \right]$$
to apply a rotation $\mathbf{R}$ around a centerpoint $(x_0, y_0, z_0)$.
The 4×4 matrix form is particularly useful if you use SIMD, like SSE or AVX, with four-component vectors; that's one reason why many 3D libraries use it. Another is that the same form can be used for projection.
For the time being I have to implement every basic arithmetic operation for the linear algebra, and the bulk of the work is in calculating the rotation matrix itself. I should have been clearer in the questions, but I was also hoping for a method that required fewer arithmetic operations than multiplying 5 3x3 matrices. Any hope for that?
– K0ICHI Jun 11 '20 at 13:43