How to separate the wheat from the chaff?
Imagine you have 5000 kinds of treats every day and you have to give them to more than half a million people, each with their own unique taste. How do you do it?
This is the scope of challenge we face here at daily.dev when trying to bring the most relevant posts to each user. There are a myriad of ways to build recommender systems, from simplistic statistical models to more advanced deep learning algorithms, but they tend to be divided into three distinct categories – collaborative filtering, content-based filtering and hybrid recommendation systems.
Collaborative filtering systems rely on the similarity between user preferences, given by explicit (such as ratings or reviews) or implicit (clicks, upvotes, shares) signals, to recommend new items.
Content-based filtering systems, on the other hand, assume that a user’s preference for a given item means that they will also find other similar items relevant.
Hybrid approaches combine both collaborative and content-based filtering, as well as other approaches, and have been found to perform better than the two previous “pure” approaches.
Why Project Sauron?
To ensure the best user experience here at daily.dev, we need to find the best ways to bring the most relevant content to each user. Over the past year, we relied on two different models working together to bring personalized recommendations to “My Feed”:
- a statistical model that would use the user-selected tags and the post tags, as well as the users’ preferences given by implicit signals – clicks, upvotes, shares, etc – to recommend new items to the users
- a vector-similarity model which would recommend new posts that other similar users interacted with. The similarity between users was calculated in real time using user embeddings based on reading history. These embeddings were initially built based on tag preferences, but were later replaced by aggregating the embeddings of post textual features – post title, summary and tags – from the user history.
While we had success with the vector-similarity model – a mainly collaborative-filtering system – we found the personalized recommender system based on the statistical model to be lacking.
Enter two-tower retrieval models, aka, Project Sauron.
Two-Tower Retrieval Models
Two-tower retrieval models are an evolution of the classic matrix factorization models. Matrix factorization retrieval learns an embedding representation of each query and candidate – user and post, respectively, in daily.dev’s use case – in a shared embedding space. The hypothesis behind the algorithm is that there is a user embedding and post embedding matrix pair whose dot-product results in the user-post interaction matrix – in practice, the algorithm will learn the user and post embedding matrices that allow us to get the best approximation of the interaction matrix.
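The matrix-factorization idea can be sketched with a toy example. Everything here – the interaction matrix, sizes, learning rate – is illustrative, not daily.dev’s actual data; the point is that gradient descent on two small embedding matrices recovers an approximation of the interaction matrix via their dot product:

```python
import numpy as np

# Toy interaction matrix: rows = users, columns = posts, 1 = interacted.
interactions = np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
], dtype=float)

rng = np.random.default_rng(0)
n_users, n_posts = interactions.shape
dim = 3  # embedding dimension

users = rng.normal(scale=0.1, size=(n_users, dim))
posts = rng.normal(scale=0.1, size=(n_posts, dim))

lr = 0.1
for _ in range(5000):
    pred = users @ posts.T    # current approximation of the interactions
    err = pred - interactions # reconstruction error
    users -= lr * err @ posts     # gradient step on user embeddings
    posts -= lr * err.T @ users   # gradient step on post embeddings

mse = np.mean((users @ posts.T - interactions) ** 2)
```

After training, `users @ posts.T` closely matches the interaction matrix, which is exactly the "best approximation" the algorithm is after.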
Two-tower retrieval models use deep learning to learn these embedding representations, using fully connected network layers to process multi-modal features – numerical, textual and even images and audio – from both users and posts, learning more complex relationships between them. The model architecture consists of two separate networks, called towers – the user tower and the post tower. Each tower processes user or post features to create the embedding representation of those features. During training, the model calculates the similarity between the embeddings generated by the user and post towers using their dot product, generating an approximation of the user-post interaction matrix. The difference between the calculated and the actual interaction matrices constitutes the error, or loss, that the user and post towers use to tweak their network layers and ideally improve the embeddings.
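A minimal sketch of this architecture, with each tower as a tiny two-layer network – all feature widths, weights and labels below are made up for illustration, and a real model would learn the weights via backpropagation rather than sample them randomly:

```python
import numpy as np

rng = np.random.default_rng(42)

def tower(features, w1, w2):
    """A tiny two-layer 'tower': features -> hidden layer -> embedding."""
    hidden = np.tanh(features @ w1)
    emb = hidden @ w2
    # L2-normalize so the dot product behaves like cosine similarity
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)

# Hypothetical feature widths; real towers consume richer multi-modal inputs.
user_feats = rng.normal(size=(3, 8))    # 3 users, 8 features each
post_feats = rng.normal(size=(4, 12))   # 4 posts, 12 features each

user_emb = tower(user_feats, rng.normal(size=(8, 16)), rng.normal(size=(16, 5)))
post_emb = tower(post_feats, rng.normal(size=(12, 16)), rng.normal(size=(16, 5)))

scores = user_emb @ post_emb.T          # approximated interaction matrix
actual = np.array([[1., 0., 1., 0.],
                   [0., 1., 0., 1.],
                   [1., 1., 0., 0.]])   # toy interaction labels
loss = np.mean((scores - actual) ** 2)  # the error both towers train against
```

The key structural point is that the two towers never see each other’s inputs; they only meet at the dot product, which is what makes the serving-time tricks below possible.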
Besides the improved potential accuracy and the ability to capture complex relationships between the post and user features, two-tower retrieval models offer interesting performance benefits when serving. Since the user and post towers are separate from each other, we can precompute the embeddings and put them in a low-latency vector similarity index – we use Redis as our vector search database – and use the approximate nearest neighbors (ANN) algorithm to generate thousands of post recommendations for each user. Furthermore, if the user embedding has to be generated in real time – when using features related to the current user session, for example – we can easily deploy the user tower to an inference endpoint.
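The serving-time flow can be sketched as follows. This toy version does an exact brute-force scan over precomputed post embeddings; in production a Redis ANN index replaces the scan with an approximate but much faster lookup, and all sizes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)

# Precomputed post embeddings, as they would sit in the vector index.
post_emb = rng.normal(size=(1000, 64))
post_emb /= np.linalg.norm(post_emb, axis=1, keepdims=True)

# A single user's embedding, precomputed or produced by the user tower.
user_emb = rng.normal(size=(64,))
user_emb /= np.linalg.norm(user_emb)

# Exact nearest-neighbour search: score every post, keep the top k.
scores = post_emb @ user_emb
top_k = np.argsort(scores)[::-1][:10]  # indices of the 10 best-scoring posts
```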
New recommender system: Sauron
In an obvious reference to the Two Towers in Lord of the Rings and the all-seeing eye, Project Sauron started as our attempt at learning daily.dev users’ preferences and recommending likely relevant posts on “My Feed”, using the power of two-tower retrieval models.
Despite the intimidating name, the model uses a very simple feature set while achieving great results. As of this writing, it uses the following post features:
- post title
- summary/tl;dr
- post tags
- post source and author
- post type (article, video, etc)
- content type (news, release, opinion piece, etc)
- post age in days
- post language
as a mixture of both textual and numerical features. Textual features are usually transformed into text embeddings – a vector representation for each word or document – except for the post language, which we found achieves better results when one-hot encoded.
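One-hot encoding the language looks like this; the vocabulary below is a hypothetical subset, not the full set of languages daily.dev supports:

```python
import numpy as np

# Hypothetical language vocabulary for illustration.
languages = ["en", "es", "de", "fr"]
index = {lang: i for i, lang in enumerate(languages)}

def one_hot(lang: str) -> np.ndarray:
    """Encode a post language as a one-hot vector over the vocabulary."""
    vec = np.zeros(len(languages))
    vec[index[lang]] = 1.0
    return vec

de = one_hot("de")  # array([0., 0., 1., 0.])
```

Unlike a learned text embedding, a one-hot vector keeps the languages fully distinct, which is likely why it works better for a small, closed category like this.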
As for user features, we use… the user ID, i.e., an internal hash generated randomly when the user first registers at daily.dev. While other features, such as the users’ country, selected and blocked tags or even seniority level, could be used, we achieved very good accuracy using only the user ID.
The model is implemented using TensorFlow and the retrieval model from the tensorflow-recommenders package. We encourage you to explore these packages and try the tutorials. Training models and seeing the metrics improve as you tweak them can be a very satisfying exercise.
As mentioned before, Sauron’s recommendations are generated using a vector search index in Redis with an ANN algorithm, achieving approximately 100 ms latency for X recommended posts per user.
The impact of Project Sauron couldn’t have left us more satisfied. When A/B tested in “My Feed” against the statistical personalized model, we saw improvements across all our testing metrics – 5% increase in significant users interacting with the feed, 13% uplift in significant events, 11% in unique posts clicked, 5% in upvotes and 44% in unique bookmarked posts. Overall, the Sauron recommender model greatly improved “My Feed” as a tool in service of developers.
With the momentum of these results, we decided to use this model to generate the recommendations for the Digest emails we periodically send to every user that opted to receive notifications. Again, we saw a big improvement in our test metrics, with a 2% increase in clicks for inactive users, and 17% and 11% increase in clicks for old and new registered users, respectively.
What’s next for the Sauron recommender?
We are continuously working on improving the model by adding new features, tweaking the network architecture and finding the best hyperparameters. We are currently testing two-tower models as a solution to the user cold-start issue, where freshly registered users, for whom we have no historical data, start exploring “My Feed”.
Another point of research, and a concern raised by some users and our data scientists, is maintaining feed diversity and avoiding echo chambers. While the social impact of echo chambers at daily.dev is not as big as in common social networks, maintaining feed diversity is a concern of ours, as we believe broader knowledge is not only increasingly useful for a developer to excel in their role, but also increasingly valuable in the job market.
Project Sauron has been a success, not only for the growth of daily.dev, but also by making daily.dev a more useful tool for our users and their own personal growth.