I recently read the paper "Nonparametric Variational Auto-encoders for Hierarchical Representation Learning", and I am confused about how the nCRP prior is ever conditioned on the input data sequence. All the nCRP ever sees is the latent $z_{mn}$ output by the VAE encoder. Is this because, during training, the nCRP ultimately outputs $z_{mn}$ after receiving the conditioned VAE encoder output?
Note: by "conditioned" I mean that, during training, the $q(z \mid x_i)$ term gives a specific instance of $z$ for a specific $x_i$.
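To make the notion of "conditioned" in the note concrete, here is a minimal NumPy sketch of an amortized encoder: each input $x_i$ is mapped to the parameters of $q(z \mid x_i)$, a reparameterized sample gives one specific $z$ for that $x_i$, and a prior term is then evaluated only at that $z$. Everything here is a hypothetical stand-in, not the paper's model: the linear `encode` function, the helper names, and a single fixed Gaussian node (`node_mean`, `node_var`) used in place of the nCRP tree.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear encoder: maps an input x_i to the mean and log-variance of q(z | x_i).
D_X, D_Z = 4, 2
W_mu = rng.normal(size=(D_Z, D_X))
W_logvar = rng.normal(scale=0.1, size=(D_Z, D_X))

def encode(x):
    """Return the parameters (mean, log-variance) of the Gaussian q(z | x) for one input x."""
    return W_mu @ x, W_logvar @ x

def sample_z(mu, logvar, rng):
    """Reparameterized draw z = mu + sigma * eps: one specific z for this particular x."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def log_gaussian(z, mean, var):
    """Log density of an isotropic Gaussian N(mean, var * I) evaluated at z."""
    d = z.shape[-1]
    return -0.5 * (d * np.log(2 * np.pi * var) + np.sum((z - mean) ** 2) / var)

# Assumed stand-in for the nCRP: the prior scores z only through the parameters
# of the tree node it is assigned to (here a single fixed node).
node_mean = np.zeros(D_Z)
node_var = 1.0

X = rng.normal(size=(3, D_X))  # three toy inputs x_1, x_2, x_3
for i, x in enumerate(X):
    mu, logvar = encode(x)
    z = sample_z(mu, logvar, rng)
    # The prior term never receives x directly, only the z produced from x.
    print(i, z, log_gaussian(z, node_mean, node_var))
```

In this toy setup, the prior term depends on $x_i$ only through the sampled $z$, since a different input changes `mu` and `logvar` and therefore the point at which the prior is evaluated.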