To compress a generative model with knowledge distillation, you train a smaller student model to mimic the outputs of a larger teacher model (e.g., its logits or intermediate features) while optimizing a weighted combination of a task-specific loss and a distillation loss. Here is an example you can refer to:
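The following is a minimal PyTorch sketch of a single distillation training step, not a full recipe: it assumes both models return raw logits of shape (batch, seq_len, vocab_size), and the function name `distillation_step` and the hyperparameters `temperature` and `alpha` are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, input_ids, labels, optimizer,
                      temperature=2.0, alpha=0.5):
    """One training step combining a task loss with a distillation loss.

    Assumes `student(input_ids)` and `teacher(input_ids)` return raw logits
    of shape (batch, seq_len, vocab_size); adapt to your model's interface.
    """
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(input_ids)          # frozen teacher forward pass

    student_logits = student(input_ids)

    vocab_size = student_logits.size(-1)

    # Task-specific loss: standard next-token cross-entropy against the labels.
    task_loss = F.cross_entropy(
        student_logits.view(-1, vocab_size),
        labels.view(-1),
        ignore_index=-100,
    )

    # Distillation loss: KL divergence between temperature-softened distributions.
    # Multiplying by T^2 keeps gradient magnitudes comparable across temperatures.
    distill_loss = F.kl_div(
        F.log_softmax(student_logits.view(-1, vocab_size) / temperature, dim=-1),
        F.softmax(teacher_logits.view(-1, vocab_size) / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Weighted combination of the two objectives.
    loss = alpha * task_loss + (1.0 - alpha) * distill_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```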
In the sketch above, we are using the following:
- a temperature that softens both the teacher's and the student's output distributions, so the student can learn from the teacher's relative probabilities across the whole vocabulary;
- a KL-divergence distillation loss between the softened distributions;
- a standard cross-entropy task loss against the ground-truth labels;
- a weighting factor alpha that balances the task loss against the distillation loss.
By following this approach, you can use knowledge distillation to compress a generative model while keeping the performance loss small.