You can evaluate the quality of generated outputs using the following techniques:
- BLEU score [14]: measures n-gram overlap between the output and a reference; used for generated code.
- ROUGE score: used to evaluate the quality of generated text summaries.
Here is the code reference:
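A minimal sketch of the idea behind both metrics, assuming simplified unigram-only versions: clipped unigram precision in the style of BLEU (with no brevity penalty and no higher-order n-grams) and ROUGE-1 recall. The function names and whitespace tokenization are illustrative assumptions, not a standard library API; for real evaluations, a maintained implementation is the better choice.

```python
from collections import Counter


def clipped_unigram_precision(candidate: str, reference: str) -> float:
    """BLEU-style clipped unigram precision (simplified: no brevity penalty)."""
    cand_tokens = candidate.split()
    if not cand_tokens:
        return 0.0
    cand_counts = Counter(cand_tokens)
    ref_counts = Counter(reference.split())
    # Each candidate token is credited at most as often as it occurs in the reference.
    overlap = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    return overlap / len(cand_tokens)


def rouge1_recall(candidate: str, reference: str) -> float:
    """ROUGE-1 recall: fraction of reference unigrams recovered by the candidate."""
    ref_tokens = reference.split()
    if not ref_tokens:
        return 0.0
    ref_counts = Counter(ref_tokens)
    cand_counts = Counter(candidate.split())
    overlap = sum(min(n, cand_counts[tok]) for tok, n in ref_counts.items())
    return overlap / len(ref_tokens)


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    reference = "the cat is on the mat"
    print(clipped_unigram_precision(generated, reference))
    print(rouge1_recall(generated, reference))
```

Precision asks how much of the generated text is supported by the reference, while recall asks how much of the reference the generated text covers, which is why the recall-oriented measure suits summarization.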
Note that, besides ROUGE, you can also use perplexity and human-aligned criteria such as coherence, sentiment, or relevance to the content.
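Perplexity is the exponential of the average negative log-likelihood per token, so lower values mean the model found the text less surprising. A minimal sketch, assuming you already have per-token natural-log probabilities from a model (the function name `perplexity` and its input format are illustrative):

```python
import math


def perplexity(token_log_probs: list[float]) -> float:
    """Perplexity from per-token natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood per token, then exponentiate.
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


if __name__ == "__main__":
    # A model that assigns probability 0.25 to every token.
    print(perplexity([math.log(0.25)] * 4))
```

A model that assigns probability 0.25 to every token is as uncertain as a uniform choice among four options, which corresponds to a perplexity of about 4.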
By following these techniques, you can evaluate the quality of the generated output.