You can set up an attention visualization tool in code to interpret and debug transformer model outputs. The short example below is a minimal sketch using PyTorch, Matplotlib, and the Hugging Face transformers library to plot attention weights from a BERT model; the `bert-base-uncased` checkpoint and the sample sentence are illustrative assumptions, so substitute your own model and inputs:
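```python
import torch
import matplotlib.pyplot as plt
from transformers import BertTokenizer, BertModel

# Assumption: the bert-base-uncased checkpoint is used for illustration.
# output_attentions=True makes the model return per-layer attention weights.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

# Tokenize an example sentence (illustrative input).
sentence = "The cat sat on the mat."
inputs = tokenizer(sentence, return_tensors="pt")
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# Forward pass; outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
with torch.no_grad():
    outputs = model(**inputs)
attentions = outputs.attentions

# Select Layer 1, Head 1 (zero-based indices in code).
layer, head = 0, 0
attn = attentions[layer][0, head].numpy()  # (seq_len, seq_len)

# Plot the attention weights as a heatmap with token labels on both axes.
fig, ax = plt.subplots(figsize=(8, 8))
im = ax.imshow(attn, cmap="viridis")
ax.set_xticks(range(len(tokens)))
ax.set_yticks(range(len(tokens)))
ax.set_xticklabels(tokens, rotation=90)
ax.set_yticklabels(tokens)
ax.set_xlabel("Key tokens")
ax.set_ylabel("Query tokens")
ax.set_title(f"Attention weights: Layer {layer + 1}, Head {head + 1}")
fig.colorbar(im, ax=ax)
plt.tight_layout()
plt.show()
```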
The code above plots the attention map for Layer 1, Head 1 (indices 0 in code). To analyze other layers or heads, change the `layer` and `head` indices, or loop over all entries of `outputs.attentions` to produce a grid of heatmaps.
With this setup, you can see which tokens each head attends to, which helps you interpret and debug transformer model outputs.