Ask HN: Has anyone built a recommendation engine in-house?

By theknight - 3 days ago

Showing first level comment(s)

I wrote the recommendation system at Netflix (still in use after 5 years). The primary problem was company politics. Many groups were not happy that one person could write a system that performed better in A/B tests, had more uptime, and was cheaper to run. All of it (ML, production, monitoring) was custom code.

sadikkapadia1 - 3 days ago

There are basically 3 types of recommender engines:

Content Based: If you can represent your products as vectors, you can compute a distance between each pair of products, and then you have item-item recommendations. You can use all kinds of embeddings to achieve this; some techniques we tried are word2vec embeddings of user navigation, autoencoding of features with neural networks, and dimensionality reduction with PCA, ALS, etc. There are lots of libs for solving these problems, as it is a well-studied field: usually NumPy, and for finding the neighbors we use approximate nearest neighbors (ANN) from scikit-learn, because if you have millions of items you can't just compute the distance between all the pairs.
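
A minimal sketch of the item-item approach described above, using scikit-learn's brute-force nearest-neighbor search on toy vectors (the vectors and neighbor count are made up for illustration):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy item vectors (rows = items); in practice these would come from
# word2vec, an autoencoder, PCA, ALS, etc.
item_vectors = np.array([
    [1.0, 0.0, 0.1],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.9],
    [0.1, 0.9, 1.0],
])

# Index the vectors; with millions of items you would swap this for an
# approximate index instead of brute force over all pairs.
nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(item_vectors)
dist, idx = nn.kneighbors(item_vectors)

# idx[i, 0] is item i itself; idx[i, 1] is its nearest other item.
for i, row in enumerate(idx):
    print(f"item {i} -> most similar item {row[1]}")
```

The `n_neighbors=2` call returns each item plus its single closest neighbor; a real recommender would ask for a larger k and filter out the item itself.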

Collaborative Filtering: Here you use triples of user behavior, <user, item, rating>. The Surprise lib in Python works well, and you have MLlib from Spark too. These techniques are called matrix factorization techniques, and they also give you an embedding of the item or the user, so you can apply the content-based techniques to find user-user and item-item recommendations alongside the user-item recommendations.

Hybrid Models: These are models that use behavior plus features of the users and items. LightFM is a good lib that works well, but you can model it with other tools like neural networks ( https://ai.google/research/pubs/pub45530 ).
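
The core hybrid trick (as in LightFM) is to represent a user or item as the sum of the embeddings of its features, so cold-start items with no history still get sensible scores from their metadata. A NumPy sketch, with made-up table sizes and feature indices:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 4  # embedding dimension (arbitrary)

# Hypothetical feature embedding tables; in a trained model these would
# be learned. Rows might be user id, age bucket, country / item id,
# category, brand.
user_feat_emb = rng.normal(size=(5, k))
item_feat_emb = rng.normal(size=(6, k))

def score(user_feats, item_feats):
    """Dot product of summed feature embeddings, LightFM-style."""
    u = user_feat_emb[user_feats].sum(axis=0)
    v = item_feat_emb[item_feats].sum(axis=0)
    return float(u @ v)

# A brand-new item (no interactions) still scores via category/brand.
print(score([0, 3], [1, 4]))
```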

The challenges depend on the company: recommending a small number of items to a large number of users is not the same problem as recommending a large number of items to a small number of users.

There is a whole specialization on Coursera that is really good: https://www.coursera.org/specializations/recommender-systems

chudi - 3 days ago

I've been part of two efforts, one at a very large company, one at a startup. From where I stand, your biggest issue is going to be getting a sufficient data set, and having sufficient traffic in whatever you're recommending to be able to actually test your models.

Technical aspects in how you train your models and such are fun, but way, way down the list of things that are likely to matter in the short to medium term. Like, data scientists are nice to have, but you're not really going to be able to fully utilize them until you have the capability to build, deploy, and test a model at scale. If going third party helps you do this, you probably should.

splonk - 3 days ago

I built the one at Theneeds.com and, if you're interested, this is the one at Pinterest [1].

At Theneeds we were recommending news, i.e. fresh content, based on users' interests and other features. Because the content is fresh, you can't easily gather enough data for a proper collaborative system.

Our algo was essentially the Reddit algo, where a piece of content gets a rank based on time and the log of its score. The score in Reddit is upvotes minus downvotes. At Theneeds we had a more complex score including social signals (likes on FB / RTs on TW), so we could compute a meaningful score even without a big community of users. The other difference wrt Reddit was having different scores and different paces (multipliers) based on the category of content, so for example tech and politics news from newspapers updated faster than travel news from magazines. And by normalizing the ranks, you can merge multiple categories into one -- a feature that I think Reddit also added.
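
A rough sketch of that ranking scheme (the pace multipliers, decay constant, and function shape are invented for illustration; this is not Theneeds' actual code):

```python
import math
import time

# Hypothetical pace multipliers per category: tech/politics ranks decay
# faster than travel, so they churn more quickly.
PACE = {"tech": 2.0, "politics": 2.0, "travel": 0.5}

def hot_rank(score, posted_ts, category, now=None):
    """Reddit-style rank: log of score plus a time-decay term.
    `score` could fold in social signals (FB likes, RTs)."""
    now = now if now is not None else time.time()
    age_hours = (now - posted_ts) / 3600.0
    return math.log10(max(score, 1)) - PACE[category] * age_hours / 12.0

# A fresh tech story outranks an older, higher-scored one.
now = time.time()
fresh_tech = hot_rank(100, now - 3600, "tech", now)       # 1h old
old_tech = hot_rank(1000, now - 24 * 3600, "tech", now)   # 24h old
print(fresh_tech > old_tech)  # True
```

Dividing each category's ranks by that category's current maximum (normalizing) then lets you merge several categories into a single feed, as the comment describes.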

As for the code/stack: custom-written in Python. We used Redis to cache user timelines as sorted sets (including for guest users, i.e. the default top news for each category). In Redis you can merge sorted sets, and we used that as an efficient way to create the timeline when a new user signed up.
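
To illustrate the merge without a Redis server, here is a pure-Python stand-in for the sorted-set union (Redis's ZUNIONSTORE sums each member's scores across the input sets, optionally weighted); the category names and scores are made up:

```python
# Dict-based stand-in for Redis sorted sets: member -> score.
def zunion(*sorted_sets, weights=None):
    """Mimics ZUNIONSTORE: weighted sum of scores per member."""
    weights = weights or [1.0] * len(sorted_sets)
    merged = {}
    for w, ss in zip(weights, sorted_sets):
        for member, score in ss.items():
            merged[member] = merged.get(member, 0.0) + w * score
    return merged

# Default per-category timelines (what a guest user would see).
tech = {"story:1": 9.5, "story:2": 7.0}
travel = {"story:3": 8.0, "story:2": 1.0}

# Build a new user's timeline by weighting their declared interests.
timeline = zunion(tech, travel, weights=[1.0, 0.5])
top = sorted(timeline, key=timeline.get, reverse=True)
print(top)  # highest-ranked first
```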

[1] https://medium.com/@Pinterest_Engineering/introducing-pixie-...

Edit: added more details about tech.

ecesena - 3 days ago

I've built a recommender system for my movie database app "Coollector Movie Database". It's based on collaborative filtering and took me 2 years to implement. I built it from scratch, and it's unique in several ways (for example, you can view the reliability of each recommendation). The technical difficulty is crunching a huge quantity of data fast enough; I had to apply every optimization I could think of.

https://www.coollector.com/help.html#recommendations

prades - 3 days ago

LightFM is my go-to for prototyping matrix factorization models. It efficiently handles large data with sparse data structures and is CPU-accelerated, including optimizations like Hogwild!. It also has the WARP loss (a BPR variant), which I have not seen implemented anywhere else.

I can train on multi-GB datasets with only LightFM and multiple CPUs.

Another interesting package is Implicit. Although not as complete as LightFM in algorithms or APIs, it really shines when it comes to optimizations: it includes native CUDA kernels for BPR and ALS, and it also has an important speedup, the Conjugate Gradient method, which makes it faster than Spark in some benchmarks.

But nowadays my work usually requires more customized hybrid models, for which I usually start with a base BPR implementation I have in Keras.
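
For reference, the BPR objective mentioned above can be sketched in plain NumPy (not the commenter's Keras code): for each user, push an observed item's score above a sampled unobserved item's score via the gradient of log-sigmoid. Data, sizes, and hyperparameters are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 4, 6, 3
U = rng.normal(scale=0.1, size=(n_users, k))  # user factors
V = rng.normal(scale=0.1, size=(n_items, k))  # item factors

# Implicit feedback: the set of items each user interacted with.
positives = {0: {0, 1}, 1: {1, 2}, 2: {3}, 3: {4, 5}}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr, reg = 0.05, 0.01
for step in range(5000):
    u = int(rng.integers(n_users))
    i = int(rng.choice(list(positives[u])))   # observed (positive) item
    j = int(rng.integers(n_items))
    while j in positives[u]:                  # sample an unobserved item
        j = int(rng.integers(n_items))
    x_uij = U[u] @ (V[i] - V[j])              # score difference
    g = sigmoid(-x_uij)                       # gradient coefficient
    U[u] += lr * (g * (V[i] - V[j]) - reg * U[u])
    V[i] += lr * (g * U[u] - reg * V[i])
    V[j] += lr * (-g * U[u] - reg * V[j])

# After training, a user's positives should outrank their negatives.
scores = U[0] @ V.T
print(scores[0] > scores[3])
```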

eggie5 - 21 hours ago

Have you checked the Apache Mahout recommendation framework ( https://mahout.apache.org/docs/latest/algorithms/recommender... )? For 'small data' it can be used as a Java library (single-machine algorithms); if you prefer .NET, a C# port also exists: https://github.com/nreco/recommender . If you're new to collaborative filtering, the 'Apache Mahout in Action' book will help a lot.

seektable - 2 days ago

I was in the same situation and learned about recommender systems by implementing a simple movie recommendation system in JavaScript. If you are interested, you can find the source code over here: https://github.com/javascript-machine-learning/movielens-rec...

rwieruch - 2 days ago

Yes, I have implemented a few content-based recommendation engines (referring to chudi's taxonomy). The biggest existential threat is colleagues who question your work for not using the f^xyz method they have heard about. Having a straightforward evaluation framework in place for your results will go a long way towards ensuring the adoption and longevity of what you create.

I write my own analysis code but use search APIs for storage and access (Lucene or Algolia).

itronitron - 2 days ago

I built a document recommendation project as part of a course, written in Python using term frequency-inverse document frequency (TF-IDF). It's actually a pretty straightforward method for recommending similar documents based on content.
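
The TF-IDF approach fits in a few lines of scikit-learn (this is a generic sketch with made-up documents, not the linked project's code):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the cat sat on the mat",
    "a cat and a dog played on the mat",
    "stock markets fell sharply on monday",
]

tfidf = TfidfVectorizer().fit_transform(docs)  # docs -> TF-IDF vectors
sim = cosine_similarity(tfidf)                 # pairwise cosine similarity

# Recommend the most similar other document to doc 0.
best = max(range(1, len(docs)), key=lambda j: sim[0, j])
print(best)  # doc 1, which shares the cat/mat vocabulary
```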

https://github.com/ElementalWarrior/LearningAnalytics

Topgamer7 - 3 days ago