How we use machine learning and computer vision to recommend products

Have you ever stared for hours at a shop’s catalogue so big you couldn’t even reach the end of it? Have you ever thought “who buys this stuff???” when looking at glitter-gold boots?

Have you ever imagined being able to look through only a handful of things that you really like?

It happens to me all the time, with clothes. It happens to our consumers all the time, with bathroom furniture. We want to make their lives easier.

Recommendation systems

Recommendation systems have been used for ages. Think Netflix, think Amazon, think almost every single website that sells things. There is even a scientific conference, RecSys, devoted entirely to papers in the field of recommendation systems. The idea behind recommendation systems is to understand what the user’s preferences are, and recommend something that is relevant. To do that, we need to have information on the user, and information on what the user likes (i.e., what they have bought or rated).

It’s a very difficult problem in general, and it’s very difficult to evaluate a system that tries to solve it, as preferences are extremely personal. A few challenges researchers are very aware of are:

  • If someone hasn’t bought or rated something, that doesn’t mean that they don’t like it;
  • The number of items that have been rated or bought by users is tiny compared with the number of possible user-item pairs in the system. In RecSys terms, “the user-item matrix is sparse”;
  • If a user is new, we don’t have any way of knowing what they like (aka the cold start problem).
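To make the last two challenges concrete, here is a minimal sketch on toy data. The users, items and ratings are invented for illustration, and the helper name `density` is our own; nothing here comes from a real system.

```python
# Hypothetical toy data: each user maps to the items they have rated (1-5 stars).
ratings = {
    "alice": {"item_a": 5},
    "beatrice": {"item_a": 4, "item_c": 5},
    "carol": {},  # a new user with no history: the cold start problem
}
items = ["item_a", "item_b", "item_c", "item_d"]


def density(ratings, items):
    """Fraction of cells in the user-item matrix that hold an observed rating."""
    observed = sum(len(user_ratings) for user_ratings in ratings.values())
    return observed / (len(ratings) * len(items))


matrix_density = density(ratings, items)

# Users for whom we have nothing to base a recommendation on.
cold_start_users = [user for user, rated in ratings.items() if not rated]
```

Even in this tiny example most of the matrix is empty, and an empty cell tells us nothing about whether the user would like the item.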

There are two main families of recommendation systems: content-based and collaborative filtering. Let’s explain how they differ from each other using the good old Netflix five-star rating as an example.

Content-based

Alice rated “Crazy stupid love” with five stars (don’t even get me started). The recommendation system looks at the “content” of the product and recommends products which have similar content. “Crazy stupid love” is categorized as a romantic comedy and Ryan Gosling is starring in it. Alice is recommended “The Notebook”, which is also a romance with Ryan Gosling in it.
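As a rough sketch of the idea, assume each film is described by a set of tags and that similarity is measured with the Jaccard index (shared tags divided by all tags). Both the tags and the choice of Jaccard are assumptions made for this example; real systems use much richer content descriptions.

```python
# Hypothetical content tags for each film (invented for illustration).
catalogue = {
    "Crazy stupid love": {"romance", "comedy", "Ryan Gosling"},
    "The Notebook": {"romance", "Ryan Gosling"},
    "Inception": {"science fiction", "Leonardo DiCaprio"},
}


def jaccard(a, b):
    """Size of the intersection divided by the size of the union of two tag sets."""
    return len(a & b) / len(a | b)


def recommend_similar(liked_title, catalogue):
    """Return the other title whose tags overlap most with the liked one."""
    liked = catalogue[liked_title]
    candidates = [title for title in catalogue if title != liked_title]
    return max(candidates, key=lambda title: jaccard(liked, catalogue[title]))


recommendation = recommend_similar("Crazy stupid love", catalogue)
```

Note that only the content of the films is used here: no other user’s ratings are involved.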

Collaborative filtering

Alice rated “Crazy stupid love” with five stars and “The Notebook”, which she has just watched and loved, with five stars. Beatrice, who is slightly more reasonable than Alice, has rated “Crazy stupid love” with four stars (still too high Beatrice!). She has also rated “The Notebook” with five stars and “Inception” with five stars. The recommendation system looks at Alice’s and Beatrice’s ratings and decides that they are similar. Alice has never watched “Inception” (really?), so the recommendation system recommends it to her. Alice surprisingly finds out that, even though it’s not her usual genre, she loves “Inception” (who doesn’t?).
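Here is a minimal sketch of user-based collaborative filtering on exactly these ratings. The similarity measure (cosine over the films both users have rated) and the similarity-weighted average used for prediction are assumptions picked for simplicity, not a description of what Netflix does.

```python
from math import sqrt

# The star ratings from the example above.
ratings = {
    "Alice": {"Crazy stupid love": 5, "The Notebook": 5},
    "Beatrice": {"Crazy stupid love": 4, "The Notebook": 5, "Inception": 5},
}


def cosine_on_shared(a, b):
    """Cosine similarity between two users, computed over co-rated titles only."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = sqrt(sum(b[t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)


def predict_unseen(user, ratings):
    """Predict ratings for titles the user has not rated, as a
    similarity-weighted average of the other users' ratings."""
    seen = ratings[user]
    weighted, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_on_shared(seen, other_ratings)
        for title, stars in other_ratings.items():
            if title not in seen:
                weighted[title] = weighted.get(title, 0.0) + similarity * stars
                weights[title] = weights.get(title, 0.0) + similarity
    return {t: weighted[t] / weights[t] for t in weighted if weights[t] > 0}


predictions = predict_unseen("Alice", ratings)
```

With only two users the prediction simply mirrors Beatrice’s rating; with many users, the most similar ones would carry the most weight.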

Content-based methods and collaborative filtering are often used in conjunction, in which case we talk about hybrid systems. Content-based methods are useful if we don’t have a lot of information on the users, whereas collaborative filtering is useful to recommend products that are not necessarily similar to what a user has liked in the past, but can still be relevant and offer a feeling of serendipity (like “Inception” for our friend Alice).

Let’s go back to bathrooms now.

When it comes to recommendations for bathroom refurbishment, we have to face one more challenge:

  • Unless she’s an interior designer, a user only refurbishes a bathroom once or twice in her life, likely buying products in different stores. This means that we don’t have a history of what the user likes.

At DigitalBridge, we try our best to recommend relevant products to our users, but if we don’t have any information on what the users like, how can we do this?

These sorts of problems have been studied as well, even though they are less common than recommending movies. There are increasingly complicated things that can be done, but let’s start simple. In our situation, it is quite difficult to use collaborative filtering directly since the history of purchases is limited, so that leaves content-based methods.

What kind of content do we look at if we still don’t have a history of what the user likes? We have to be a little creative here. Well, we can ask. We ask the user what they like. Directly. Specifically, we ask them to give us an image of what they like. And then we use visual search.

How we do it

If a user is looking for a specific bathroom product, it’s likely that they’ve seen it somewhere. It’s also likely that they have a picture of it, if they’re really serious about refurbishing their bathroom. Therefore, we ask the user to upload an image of what they like, and we return items from our catalogue that are visually similar to the image that has been fed to us.

Our system is based on visual similarity, and works as follows:

Offline stage

During our offline stage, we take a deep network that has been trained on a classification task (better if it’s fine-tuned on bathroom images), we remove the head of the network and we compute, for each item in our catalogue, its embedding, i.e., the feature vector produced by the last remaining layer. We then save all the embeddings in a database that we can query easily.
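The offline stage can be sketched roughly as follows. To keep the example self-contained, `embed` is a deliberately trivial stand-in (the mean colour of the image) for the headless deep network, and the SQLite table, the JSON serialisation, the L2 normalisation and the item names are all assumptions made for illustration rather than a description of our production setup.

```python
import json
import sqlite3
from math import sqrt


def embed(pixels):
    """Placeholder for the headless network: returns the mean R, G, B values.
    A real system would return the network's feature vector instead."""
    count = len(pixels)
    return [sum(pixel[channel] for pixel in pixels) / count for channel in range(3)]


def l2_normalise(vector):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector


def build_index(catalogue, connection):
    """Compute one embedding per catalogue item and store it in the database."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS embeddings (item_id TEXT PRIMARY KEY, vector TEXT)"
    )
    rows = [
        (item_id, json.dumps(l2_normalise(embed(pixels))))
        for item_id, pixels in catalogue.items()
    ]
    connection.executemany("INSERT OR REPLACE INTO embeddings VALUES (?, ?)", rows)
    connection.commit()


# Hypothetical catalogue: item id -> image given as a list of (R, G, B) pixels.
catalogue = {
    "white_modern_bath": [(250, 250, 250), (240, 240, 245)],
    "copper_freestanding_bath": [(184, 115, 51), (170, 100, 45)],
}
connection = sqlite3.connect(":memory:")
build_index(catalogue, connection)
stored = connection.execute("SELECT COUNT(*) FROM embeddings").fetchone()[0]
```

Because this runs once per catalogue item and not once per user request, it can be slow without affecting the user; it only needs to be rerun when the catalogue or the network changes.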

Online stage

Every time a new image is uploaded, we feed it to the same deep network used in the offline stage, and we look for the most similar embeddings in our database. We then return the three or four items that correspond to the most similar embeddings, and voilà!
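A minimal sketch of the lookup, assuming that similarity between embeddings is measured with cosine similarity and that the catalogue is small enough for a brute-force scan. The vectors and item names below are made up; real embeddings have hundreds or thousands of dimensions.

```python
import heapq
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, index, k=3):
    """Return the ids of the k catalogue items closest to the query embedding."""
    return heapq.nlargest(
        k, index, key=lambda item_id: cosine_similarity(query, index[item_id])
    )


# Hypothetical precomputed embeddings, as produced by the offline stage.
index = {
    "modern_bath_1": [0.9, 0.1, 0.0, 0.2],
    "modern_bath_2": [0.8, 0.2, 0.1, 0.1],
    "victorian_bath_1": [0.1, 0.9, 0.3, 0.0],
    "basin_1": [0.0, 0.1, 0.9, 0.4],
}

# Hypothetical embedding of the image the user has just uploaded.
query_embedding = [0.85, 0.15, 0.05, 0.15]
results = most_similar(query_embedding, index, k=2)
```

For large catalogues, the linear scan would typically be replaced by an approximate nearest-neighbour index, but the principle stays the same.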

An overview of our system

Here are a couple of examples of what we recommend:

Results for a modern bath

In the first case, a modern bath is fed to our system. Our recommendation engine returns three baths which are very similar to the input image.

However, if a Victorian bath is uploaded, here is what happens:

Results for a Victorian bath

As you can see, the first two baths returned have very similar feet to the original bath. The third bath, even though modern looking, has a shape that is very similar to the input one, as well as a silver tap that has the exact same appearance as the tap in the original image.

Conclusion

In this blog post we have described one of the many ways we use computer vision and machine learning to recommend bathroom products. In our field, recommending products is very challenging because typically we don’t have access to the purchase history of customers to infer their preferences. Therefore, in this preliminary work, we simply ask our users to upload an image of what they like, and we recommend similar items. We are exploring a lot of other interesting methods to improve our recommendations. Stay tuned if you want to know more!
