I have three .txt files, each containing a novel. I need to compute the cosine similarity between the three texts and then produce a multidimensional graph that places the three texts relative to each other based on their cosine similarity scores. The final output should be the graph. This is what my text looks like. I have got as far as tokenizing the texts, but not as far as the cosine scores. Once I run

    from sklearn.feature_extraction.text import CountVectorizer
    vectorizer = CountVectorizer()
    dtm = vectorizer.fit_transform(tokens)
    vocab = vectorizer.get_feature_names_out()
    matrix = dtm.toarray()

my kernel restarts and I have to run everything again. I would appreciate a thorough explanation from the beginning, because I suspect I didn't set things up correctly in the first place.
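A likely cause of the restart, assuming `tokens` is one flat list of words (the post does not show how it was built): `CountVectorizer.fit_transform` expects an iterable of *documents*, so every token is treated as its own document and gets its own row. For a novel, that produces one row per word occurrence and one column per distinct word, and `toarray()` then makes it dense. With hypothetical figures of 300,000 tokens and 20,000 distinct words, the default `int64` counts would need 300,000 × 20,000 × 8 bytes ≈ 48 GB, which is more than enough to kill a kernel. The toy comparison below uses made-up strings to show the shape difference:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical flat token list, like the output of word-tokenizing one text.
tokens = ["the", "cat", "sat", "on", "the", "mat"]
# Each list element is treated as a separate document -> one row per token.
per_token = CountVectorizer().fit_transform(tokens)

# Passing whole texts as strings gives one row per text instead.
texts = ["the cat sat on the mat", "the dog sat on the log"]
per_text = CountVectorizer().fit_transform(texts)

print(per_token.shape)  # (6, 5): 6 "documents" x 5 distinct words
print(per_text.shape)   # (2, 7): 2 documents x 7 distinct words
```

`CountVectorizer` tokenizes and lowercases raw strings itself, so for three novels the per-text form yields a matrix with exactly three rows, which stays small even when made dense.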
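A minimal end-to-end sketch, assuming each .txt file is UTF-8 encoded and that raw word counts are an acceptable representation; the helper names `load_texts`, `cosine_matrix`, `embed_2d` and `plot_texts` are hypothetical. It passes each novel as one string so the document-term matrix has three rows, computes the 3 × 3 similarity matrix with `sklearn.metrics.pairwise.cosine_similarity` (which accepts the sparse matrix directly, so `toarray()` is never needed), turns similarity into a distance `1 - cos`, and uses scikit-learn's metric MDS on that precomputed distance matrix to place the texts in 2-D for plotting.

```python
from pathlib import Path

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.manifold import MDS
from sklearn.metrics.pairwise import cosine_similarity


def load_texts(paths):
    """Read each file into one string: one file = one document (assumes UTF-8)."""
    return [Path(p).read_text(encoding="utf-8") for p in paths]


def cosine_matrix(texts):
    """Return the (n_texts x n_texts) cosine-similarity matrix of raw texts."""
    vectorizer = CountVectorizer()         # tokenizes and lowercases on its own
    dtm = vectorizer.fit_transform(texts)  # sparse, shape (n_texts, vocab_size)
    return cosine_similarity(dtm)          # dense ndarray, 1.0 on the diagonal


def embed_2d(sim, random_state=0):
    """Place texts in 2-D so that distances approximate 1 - cosine similarity."""
    dist = np.clip(1.0 - sim, 0.0, None)   # remove tiny negative float error
    dist = (dist + dist.T) / 2.0           # enforce exact symmetry for MDS
    np.fill_diagonal(dist, 0.0)            # each text is at distance 0 from itself
    mds = MDS(n_components=2, dissimilarity="precomputed",
              n_init=4, random_state=random_state)
    return mds.fit_transform(dist)         # shape (n_texts, 2)


def plot_texts(coords, labels):
    """Scatter the 2-D coordinates and label each point with its text name."""
    import matplotlib.pyplot as plt        # imported here so other helpers work without it
    fig, ax = plt.subplots()
    ax.scatter(coords[:, 0], coords[:, 1])
    for (x, y), label in zip(coords, labels):
        ax.annotate(label, (x, y), textcoords="offset points", xytext=(5, 5))
    ax.set_title("Texts positioned by cosine distance (MDS)")
    ax.set_aspect("equal")                 # keep distances visually comparable
    plt.show()
```

Usage: `sim = cosine_matrix(load_texts(paths))`, then `plot_texts(embed_2d(sim), paths)`, where `paths` is the list of your three file names; printing `sim` also gives the raw scores. An MDS layout is only defined up to rotation and reflection, so read the plot through the relative distances between points, not the axis values. Raw counts are dominated by frequent words such as "the"; passing `stop_words="english"` to `CountVectorizer`, or swapping in `TfidfVectorizer` from the same module, is a common adjustment, though with only three documents the IDF weights are coarse.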