In my Natural Language Processing class, we just talked about the generalization of a Jacobian matrix. So far I've been following the material okay, but now I'm very confused.
I came across the slides for Natural Language Processing with Deep Learning (CS224N/Ling284), which talk about the Jacobian as a generalization of the gradient.
I'm not clear on when and why to perform a generalization.
I know there is a lot of material about this on the internet, and trust me, I've googled it. I found answers about generalizing matrices and distributions, but no proper explanation of "generalization" itself or why it is so important.