The most efficient techniques for caching or pre-computing frequently generated responses are as follows:
- Response Caching
- Memoization
- Embeddings Caching
- Indexing
- Pre-Training with Fixed Responses
Note that these techniques also reduce model load and improve efficiency, because repeated requests are served from stored results instead of triggering a new model call.