
Cosine similarity between three text files


I have three .txt files, each containing the text of a novel. I need to compute the cosine similarity between the three texts and then produce a multi-dimensional graph that places the three texts in relation to each other based on their cosine similarity scores. The final output should be the graph. This is what my text looks like. I got as far as getting the tokens, but not as far as the cosine scores. Once I run

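from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()  # counts how often each term occurs per document

dtm = vectorizer.fit_transform(tokens)  # sparse document-term matrix
vocab = vectorizer.get_feature_names_out()  # one term per column of dtm
matrix = dtm.toarray()  # dense NumPy copy of the sparse matrix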

my kernel restarts and I have to run everything again. I would appreciate a thorough explanation from the beginning, because I suspect I may not have started out correctly.
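A likely cause of the crash, assuming `tokens` is one flat list of words from all three novels (the post does not show how it was built): `fit_transform` treats every element of its input as a separate document, so the matrix gets one row per word instead of one row per novel, and `dtm.toarray()` then tries to allocate a dense array of roughly (number of words × vocabulary size) entries. A minimal sketch under that assumption: read each file into a single string, let the vectorizer do the tokenizing, and pass the sparse matrix straight to `cosine_similarity` without ever calling `.toarray()`. It swaps in `TfidfVectorizer` for `CountVectorizer` (so words shared by all three novels weigh less) and drops English stop words; both are choices rather than requirements, and `CountVectorizer()` fits in the same spot. The helper names `load_documents` and `similarity_matrix` are hypothetical.

```python
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def load_documents(paths):
    """Read each file into ONE string, so each novel becomes one document (one row).

    Assumes the files are UTF-8 encoded; change `encoding` if they are not.
    """
    return [Path(p).read_text(encoding="utf-8") for p in paths]


def similarity_matrix(documents):
    """Return the n x n cosine-similarity matrix for a list of raw-text documents."""
    # The vectorizer tokenizes raw strings itself, so no pre-tokenizing is needed.
    # Passing a flat list of tokens instead would make every token its own row.
    vectorizer = TfidfVectorizer(stop_words="english")  # stop-word removal is optional
    dtm = vectorizer.fit_transform(documents)  # sparse matrix: n_documents x vocabulary
    # cosine_similarity accepts the sparse matrix directly, so no dense copy is built.
    return cosine_similarity(dtm)
```

Called as `similarity_matrix(load_documents(["novel1.txt", "novel2.txt", "novel3.txt"]))` (hypothetical file names), it returns a symmetric 3 × 3 NumPy array with ones on the diagonal, where entry `[i, j]` is the similarity between text `i` and text `j`.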

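For the graph, one option (an assumption about what the "multi-dimensional graph" should be) is to convert similarities to distances, `d = 1 - similarity`, and look for 2-D points whose Euclidean distances approximate them; this is multidimensional scaling (MDS). The sketch below implements classical (Torgerson) MDS with NumPy: double-centre the squared distance matrix and keep its top eigenvectors, scaled by the square roots of their eigenvalues. With only three texts the three pairwise distances describe a triangle, so two dimensions are usually enough; because `1 - cosine` is not guaranteed to satisfy the triangle inequality, negative eigenvalues are clipped to zero and the layout is then only an approximation. `classical_mds` and `plot_layout` are hypothetical names, and `plot_layout` assumes matplotlib is installed. scikit-learn's `sklearn.manifold.MDS` is an iterative alternative.

```python
import numpy as np


def classical_mds(similarity, n_components=2):
    """Place items in `n_components` dimensions so that their Euclidean distances
    approximate the cosine distances 1 - similarity (classical/Torgerson MDS)."""
    sim = np.asarray(similarity, dtype=float)
    dist = 1.0 - sim  # cosine distance
    n = dist.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J, with J = I - 11^T / n
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (dist ** 2) @ j
    # The largest eigenpairs of the symmetric matrix B give the coordinates.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip negative eigenvalues (non-Euclidean distances) to zero before the sqrt.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


def plot_layout(coords, labels):
    """Scatter-plot the 2-D coordinates with one text label per point."""
    import matplotlib.pyplot as plt  # imported here so the MDS part needs only NumPy

    fig, ax = plt.subplots()
    ax.scatter(coords[:, 0], coords[:, 1])
    for (x, y), label in zip(coords, labels):
        ax.annotate(label, (x, y), xytext=(5, 5), textcoords="offset points")
    ax.set_aspect("equal")  # keep distances visually comparable in both directions
    ax.set_title("Texts placed by cosine distance")
    return fig
```

For example, `fig = plot_layout(classical_mds(sim), ["Novel A", "Novel B", "Novel C"])` followed by `fig.savefig("texts.png")` draws the three texts (labels and file name are placeholders). The axes carry no meaning on their own: MDS coordinates are unique only up to rotation and reflection, so only the distances between the points should be read.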

