You can detect nonsensical sequences in generated text using NLTK's semantic tools, such as WordNet. By analyzing how words relate to each other and checking for semantic coherence, you can flag sequences whose meanings are disconnected or contradictory.
The approach uses the following steps:
- WordNet Synsets: Uses wordnet.synsets to fetch meanings of words.
- Wu-Palmer Similarity: Measures semantic similarity between consecutive words.
- Threshold: Flags text as nonsensical if the similarity of any consecutive word pair falls below a chosen threshold (e.g., 0.1).
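Here is a minimal sketch of the steps above. It assumes NLTK and its WordNet corpus are installed (`nltk.download("wordnet")`). To keep it simple, it uses a regex tokenizer and a small hand-written stopword list instead of NLTK's tokenizers and stopword corpus. It also scores each word pair by the highest Wu-Palmer similarity over all of their sense pairs. The names `wup_word_similarity`, `flag_nonsensical`, and `STOPWORDS` are illustrative, not part of NLTK.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical, deliberately tiny stopword list (an assumption); function words
# would otherwise be compared against content words and skew the scores.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on",
             "is", "are", "was", "were", "it", "this", "that"}


def wup_word_similarity(word1: str, word2: str) -> float:
    """Highest Wu-Palmer similarity over all synset pairs of two words.

    Returns 0.0 if either word has no synsets or no pair is comparable.
    """
    # Imported lazily so the flagging logic below can run with another scorer.
    from nltk.corpus import wordnet as wn

    best = 0.0
    for s1 in wn.synsets(word1):
        for s2 in wn.synsets(word2):
            sim = s1.wup_similarity(s2)  # may be None for unrelated synsets
            if sim is not None and sim > best:
                best = sim
    return best


def flag_nonsensical(
    text: str,
    threshold: float = 0.1,
    similarity: Callable[[str, str], float] = wup_word_similarity,
) -> Tuple[bool, List[Tuple[str, str, float]]]:
    """Flag text if any consecutive content-word pair scores below threshold.

    Returns (flagged, low_pairs), where low_pairs lists (word1, word2, score).
    """
    # Lowercase, keep alphabetic tokens, drop stopwords.
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    low_pairs = []
    for w1, w2 in zip(words, words[1:]):
        score = similarity(w1, w2)
        if score < threshold:
            low_pairs.append((w1, w2, score))
    return bool(low_pairs), low_pairs


if __name__ == "__main__":
    import nltk

    nltk.download("wordnet", quiet=True)
    for sentence in ["The dog chased the cat", "Purple idea sleeps furiously"]:
        flagged, pairs = flag_nonsensical(sentence)
        print(sentence, "->", "nonsensical" if flagged else "ok", pairs)
```

Taking the maximum over senses gives each word pair the benefit of the doubt. Words with no synsets at all score 0.0 and are always flagged. Because Wu-Palmer scores between many real words rarely fall very low, you will likely need to tune the threshold on your own data.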
Note that this is only a basic heuristic. For more robust detection, you can combine it with contextual embeddings (e.g., from BERT).