Models
Polar Word Embeddings
The Word2Vec model is trained on 10 million Thomson Reuters news articles (2.5 billion words) covering the period from 1996 to 2017, using the Python Gensim library. The embeddings are further enhanced with a measure of sentiment derived from stock-market reactions, which makes it possible to differentiate between positive and negative words. Details can be found in the research paper (Section 5.2.4, Polar Word Embeddings).
Download the polar Word2Vec model
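For reference, a minimal loading sketch in Python with Gensim, assuming the download is a Gensim-native Word2Vec save (the file name below is hypothetical):

    from gensim.models import Word2Vec

    # Load the downloaded polar embeddings (hypothetical file name).
    model = Word2Vec.load("polar_word2vec.model")

    # Query the polar embedding space.
    vector = model.wv["profit"]                        # embedding vector
    similar = model.wv.most_similar("profit", topn=5)  # nearest neighbours
    print(similar)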
Phrases Model for Bigrams
We trained a Gensim Phrases model on the Thomson Reuters news data from 1996 to 2017. Apply this model to your tokenized text data to identify the same bigrams that appear in the GTM topics, as sketched below.
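A minimal sketch of applying the bigram model in Python, again assuming a Gensim-native save (the file name below is hypothetical):

    from gensim.models.phrases import Phrases

    # Load the trained bigram model (hypothetical file name).
    bigram = Phrases.load("phrases.model")

    # Tokenize your text the same way as the training data, then let the
    # model merge recognised bigrams with an underscore.
    tokens = "the federal reserve raised interest rates".split()
    print(bigram[tokens])
    # output depends on the trained model, e.g.
    # ['the', 'federal_reserve', 'raised', 'interest_rates']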