You can transform text data for classification with TfidfVectorizer in scikit-learn, which converts raw text into numerical features weighted by term frequency-inverse document frequency (TF-IDF).
Here is the code snippet you can refer to:
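A minimal sketch of such a snippet, assuming a tiny made-up dataset: the `texts` and `labels` values, the 25% test split, and the `random_state` are illustrative assumptions, not taken from the original answer.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 1 = positive review, 0 = negative review
texts = [
    "I loved this movie, it was fantastic",
    "Great acting and a wonderful story",
    "An excellent film with a brilliant cast",
    "Really enjoyable and fun to watch",
    "I hated this movie, it was terrible",
    "Awful acting and a boring story",
    "A dreadful film with a weak cast",
    "Really dull and painful to watch",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Learn the vocabulary and IDF weights, then build the TF-IDF feature matrix
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# Hold out 25% of the samples for testing, keeping both classes in each split
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=42, stratify=labels
)

# Train a logistic regression classifier on the TF-IDF features
model = LogisticRegression()
model.fit(X_train, y_train)

# Evaluate on the held-out samples
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
```

With a dataset this small the accuracy figure is not meaningful; the sketch only shows how the pieces connect. On real data it is usually safer to split first and call `fit_transform` on the training texts only, then `transform` on the test texts, so the vocabulary and IDF weights are not learned from the test set.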
The code above relies on the following key points:
- TfidfVectorizer() converts text data into a sparse numerical matrix of TF-IDF features, with one row per document and one column per term.
- fit_transform(texts) learns the vocabulary and IDF weights from the texts and transforms them into the feature matrix in a single step.
- LogisticRegression is used to classify the transformed text data.
- accuracy_score() evaluates model performance as the fraction of test samples that were predicted correctly.
Hence, TfidfVectorizer turns raw text into weighted numerical features, which lets traditional machine learning models such as logistic regression perform classification on text data.