DP-100 Microsoft Data Science – Recommendation System

  1. What is a Recommendation System?

Hello and welcome. Today we are going to learn about the Matchbox recommender from Azure ML. But before we get into solving recommendation problems with Azure ML, let’s first understand what a recommendation system is. By the end of this lecture, you will be familiar with what a recommendation system is and with the different types of recommendation systems, including collaborative filtering, content-based filtering, and their variations. We will also see how a recommendation system works using some common examples. So let’s get started with the definition of a recommendation system.

A recommendation system (or platform, or engine) tries to predict the rating or preference that a user would give to an item. For example, the products recommended to us when we shop online, the people we may know and the types of advertisements on LinkedIn, or hotels matched to our preferences are all examples of recommendation systems. In fact, we all use recommendation systems in day-to-day life. Surprised? Well, on a daily basis, many people give us recommendations about shopping, which restaurants to go to, which movies to watch, which books to read, where to travel for that next vacation, and so on. We also seek advice from friends and family, colleagues, professors, and many other people we know.

You will notice that we like almost all the items recommended by our near and dear ones. Sometimes we even take blind decisions based on recommendations from certain friends and family members, because we are almost certain that they will almost always be right. So why does that happen? Well, over a period of time you have developed various features of your own, such as likes, dislikes, taste, style and so on. Your friends and family know this history of preferences: which shirt you like the most, which movies you have loved, where you had the best holiday, and so on.

Similarly, every item has its own features. The human recommender among your friends and family actually matches these two sets of features before recommending any product to you. A recommender system works in more or less the same manner. Let’s say you have two completely different individuals, Lisa and John. Lisa likes pizza. Lisa also likes wine and loves pasta.

John, on the other hand, also loves pizza, wine and pasta. Then you discover that Lisa likes a particular type of cold drink. Now, based on the previous history of tastes and preferences, a recommender system will recommend the same drink to John as well. We can be fairly confident that John will like it because of their similar tastes and preferences. All right, I hope that makes it clear how a recommendation works. Let’s now go through the different types of recommendation systems. They are broadly categorized as collaborative filtering, content-based filtering, and hybrid, which is nothing but a combination of collaborative and content-based systems. We also have popularity-based recommendation, which includes most bought items, most watched movies, most downloaded songs, or most heard songs.
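To make the Lisa and John example a little more concrete, here is a minimal Python sketch. It assumes each person’s likes are stored as a plain set and uses Jaccard overlap as the similarity measure. The lecture does not name a measure, so that choice, the 0.5 threshold, and the function names are assumptions made purely for illustration.

```python
# Like-lists from the Lisa/John example; "cold drink" is the newly discovered item.
likes = {
    "Lisa": {"pizza", "wine", "pasta", "cold drink"},
    "John": {"pizza", "wine", "pasta"},
}


def jaccard(a, b):
    """Overlap between two sets of liked items: 0.0 (nothing shared) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, likes, min_similarity=0.5):
    """Suggest items liked by sufficiently similar people that the target has not liked yet."""
    suggestions = set()
    for other, items in likes.items():
        if other != target and jaccard(likes[target], items) >= min_similarity:
            suggestions |= items - likes[target]
    return sorted(suggestions)


similarity = jaccard(likes["Lisa"], likes["John"])  # 3 shared items out of 4 distinct items
john_suggestions = recommend("John", likes)
print(similarity, john_suggestions)
```

With these sets, Lisa and John share three of the four distinct items, so the cold drink is suggested to John.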

And so on. Popularity-based recommendation systems are among the simplest forms of recommendation. All right? So let’s go through them one by one and try to understand what collaborative and content-based filtering are. Well, collaborative filtering analyzes user behavior, user activities and their preferences, and based on a user’s similarity to other users, it recommends products or services. It works on the fundamental principle that people who agreed in the past will agree in the future as well, and that they will like similar kinds of items to those they liked in the past. All right, the example we saw in the previous slides, people who buy X also buy Y, is a type of collaborative filtering. Okay? The second type is content-based filtering. A content-based recommendation system recommends items with similar content to the items the user has purchased or liked in the past. For example, if you have liked three strategy books and a new book on the same topic is launched, a content-based recommendation system will most certainly recommend it to you.

Okay, so let’s try to solve a business problem using both these methods so that it becomes clearer, and we will also see how it gets implemented and the intuition behind it. Let’s say there is a travel booking website that books thousands of hotel rooms every day. With such huge growth, the task at hand is to improve the bookings per customer by showing the best hotels according to the tastes and preferences of each user. So let’s try to understand the users and the hotels we are dealing with here. We have these five hotels, and each of them has different features.

We have selected only two features for simplicity, but in reality you may have room sizes, recreational facilities, location, pet-friendly or not, and many other features that can be associated with a hotel. All right? So based on the information we have, we rank every feature and create an item-feature matrix of hotel, gym and pool. Hotel One has a good gym and a small pool, so we assign a numeric value of 0.8 to gym and 0.2 to pool for Hotel One. We continue the same way and build it for all the hotels. I want you to pause the video and take a look at the various rankings and values we have provided here. All right? On the user side, we have these four users and their preferences. John prefers the gym over the swimming pool, Kevin likes the gym more than the pool, Bill needs a pool with a basic gym, whereas for France a pool is a must compared to a gym. Okay? And if that sounds confusing, imagine the impossible task when we are dealing with millions of users and hundreds of thousands of hotels.

So what we do next is create a user-feature matrix along the same lines as the item-feature matrix that we created for the hotels. We provide a value of 0.9 for gym and 0.1 for pool for John, and so on for the other users. Again, I suggest you pause the video and have a look at the values we have assigned based on the user preferences. All right. I hope you have understood how we created the item-feature and user-feature matrices. The next step is to create a feature vector for every item as well as for every user. So the feature vector h1 for Hotel One will be (0.8, 0.2), and so on for the other hotels. All right. Similarly, we can also create a feature vector for each user, as shown in this table. Pause the video and take a moment to understand how we have created these feature vectors for the hotels and users. Once we have the feature vectors, let’s see how content-based recommendation works.

Let’s say we want to predict which hotels to recommend to user one. In content-based recommendation, we take the product of the item feature vector and the user feature vector, which here means the product of the user’s vector with each of these five hotel vectors. And as we want to answer which hotel to recommend to John, we look for the maximum of these user-hotel products. All right? So what we do is multiply u1 with h1, u1 with h2, and so on, and get the maximum of these multiplications. It will provide us with an array of values. And in case you are wondering how we did the multiplication, well, in vector multiplication you multiply each feature with the corresponding feature of the other vector and sum them all up. I know it’s confusing, so let me explain it again. In our case, for u1 multiplied by h1, we multiplied 0.9 by 0.8. That gave us 0.72.

All right? And then we multiplied 0.1 by 0.2, which gave us 0.02. The sum of these two, that is 0.72 and 0.02, is 0.74. And we went on doing the same for all the other multiplications. So what should we recommend to user one now? The first hotel we should recommend is h2, as it has the highest value of this product. The second would be h3 and the third h1. I’m sure by now you have a fair idea of how a content-based recommendation system works. Let’s now go through collaborative filtering. In collaborative filtering, a user expresses his or her preferences by rating items. Ratings can be viewed as an approximate representation of the user’s interest in the corresponding domain. Then the system matches this user’s ratings against those of other users and finds the people with the most similar tastes.
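Going back to the content-based calculation for a moment, the multiply-and-sum step can be checked with a short Python sketch. Only John’s vector (0.9, 0.1) and Hotel One’s vector (0.8, 0.2) come from the lecture. The other four hotel vectors are hypothetical placeholders for the slide values, chosen so that the ranking matches the one described (h2, then h3, then h1).

```python
# Feature order is (gym, pool).
# H1 and John's vector are from the lecture; H2..H5 are hypothetical stand-ins.
hotels = {
    "H1": (0.8, 0.2),
    "H2": (0.9, 0.4),
    "H3": (0.85, 0.3),
    "H4": (0.3, 0.9),
    "H5": (0.1, 0.8),
}
john = (0.9, 0.1)


def dot(u, v):
    """Multiply matching features and sum the results."""
    return sum(a * b for a, b in zip(u, v))


# One score per hotel: the user vector multiplied with each hotel vector.
scores = {name: dot(john, features) for name, features in hotels.items()}

# Hotels ordered from best match to worst match for John.
ranking = sorted(scores, key=scores.get, reverse=True)

print(round(scores["H1"], 2), ranking)
```

The score for Hotel One works out to 0.72 + 0.02 = 0.74, the same value we computed by hand.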

With these similar users, the system recommends the items that they have rated highly but that have not yet been rated by this particular user. The absence of a rating is usually taken to mean that the user is unfamiliar with the item. I know you must be confused. So what do we mean by all this verbiage? Let’s break it down with an example. Let’s say you have a set of users and hotels, as in the previous example. The user has expressed his or her preferences by giving ratings to the hotels. That’s the first step, and it represents the user’s interest in certain types of hotels. At this stage, we don’t even know what kind of hotel it is. We only know that this user has liked this item.

This matrix might be intimidating at first, so let’s break it down further. Let’s first highlight the user-hotel cells where we are trying to find answers; that is, we are trying to find out which hotels to recommend to these users. Next, let’s try to group those who have liked the same items. As you can see, both John and Kevin have liked Hotel Two and disliked Hotel Five. Similarly, both Bill and France have liked Hotel Five and disliked Hotel One. Now, based on this limited information, we can attempt to predict that Kevin will like Hotel One. That’s because John, who has similar taste, has liked Hotel One. Similarly, we can predict that John will like Hotel Three because Kevin has liked Hotel Three. And we can also predict that France will like Hotel Four. All right, I hope that has made things clear.
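The grouping logic above can be sketched in a few lines of Python. The like (1) and dislike (0) entries below are the ones mentioned in the lecture, plus one assumption: Bill liking Hotel Four, which is implied by the prediction for France but not stated. The agreement score and the weighted vote are one simple way to do this, not the only one.

```python
# 1 = liked, 0 = disliked; a missing hotel means the user has not rated it.
ratings = {
    "John":   {"H1": 1, "H2": 1, "H5": 0},
    "Kevin":  {"H2": 1, "H3": 1, "H5": 0},
    "Bill":   {"H1": 0, "H4": 1, "H5": 1},  # H4 is an assumption (not stated in the lecture)
    "France": {"H1": 0, "H5": 1},
}
hotels = ["H1", "H2", "H3", "H4", "H5"]


def agreement(a, b):
    """Fraction of co-rated hotels on which two users gave the same rating."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return sum(a[h] == b[h] for h in shared) / len(shared)


def predict(user, hotel):
    """Agreement-weighted vote of the other users who rated this hotel."""
    weighted, total = 0.0, 0.0
    for other, other_ratings in ratings.items():
        if other == user or hotel not in other_ratings:
            continue
        weight = agreement(ratings[user], other_ratings)
        weighted += weight * other_ratings[hotel]
        total += weight
    # None means no like-minded user has rated this hotel.
    return weighted / total if total > 0 else None


# Unrated user-hotel cells where the vote leans towards "like".
predicted_likes = sorted(
    (user, hotel)
    for user in ratings
    for hotel in hotels
    if hotel not in ratings[user] and (predict(user, hotel) or 0) >= 0.5
)
print(predicted_likes)
```

On this small matrix the sketch arrives at the same three predictions as the lecture: Kevin and Hotel One, John and Hotel Three, France and Hotel Four.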

That covers what collaborative and content-based filtering are, and it brings us to the end of this lecture on understanding recommendation systems. In this lecture, we covered what a recommendation system is and the different types of recommendation systems, such as collaborative, content-based, popularity-based and hybrid systems. We used an example of hotel recommendations, based first on hotel and user features and then on user likes and dislikes for certain hotels. Those were the two examples that we took. Thank you so much for joining me in this one, and I will see you in the next lecture.

  2. Data Preparation using Recommender Split

Hello and welcome to the Recommendation System section. In this section we are talking about the Matchbox recommender. In the previous lecture we covered what a recommendation system is. We also saw the different types of recommendation systems, including collaborative filtering and content-based filtering. Using an example of hotel recommendations, we saw how a recommendation system works. In this lecture we will cover the various issues associated with recommendation systems and how the recommender split is performed in Azure ML.

So let’s get started. Recommendation systems suffer from two major challenges. One is cold start and the second is same scale judgment. We know that content-based filtering has to construct a user profile before the system can recommend anything, and collaborative filtering needs like-minded users for its recommendations. So what should the recommendation system do if it encounters a completely new user or a completely new product or item? During cold start, the system is not able to recommend items to users. Every recommender system needs to build a user profile by considering the user’s preferences and likes. The user profile is developed from the activities and behaviors the user performs with the system. On the basis of the user’s previous history and activities, the system makes decisions and recommends items. Consequently, the problem arises when a new user or a new item enters the system. For such users or items, the system doesn’t have enough information to make a decision. For example, if a new user has not yet rated, visited or viewed any items, it is difficult for the system to build a model for that user. The cold start problem arises in three different situations.

These are new users, new items, and a new community. That’s where a hybrid approach helps to some extent, because you may not get all new users and all new items at the same time. So, depending upon whether the item is new or the user is new, a hybrid approach can help in giving recommendations. All right. However, during training we may want to ignore some of the cold users or cold items. We will see how to do that later in this section. Another problem that we may need to deal with is same scale judgment.
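A quick way to see the cold start situations is to check each scoring request against what the training data has already seen. This is a minimal Python sketch with hypothetical user and item IDs. It only labels the requests and does not attempt the hybrid fallback itself.

```python
# Hypothetical (user, item, rating) rows that the model was trained on.
training = [
    ("u1", "item_a", 4),
    ("u1", "item_b", 5),
    ("u2", "item_a", 3),
]

# Hypothetical (user, item) pairs we would like a recommendation for.
requests = [("u1", "item_c"), ("u3", "item_a"), ("u2", "item_b")]

known_users = {user for user, _, _ in training}
known_items = {item for _, item, _ in training}


def cold_start_kind(user, item):
    """Label a request by whether its user and item were seen during training."""
    new_user = user not in known_users
    new_item = item not in known_items
    if new_user and new_item:
        return "new user and new item"
    if new_user:
        return "new user"
    if new_item:
        return "new item"
    return "warm"


kinds = {request: cold_start_kind(*request) for request in requests}
for request, kind in kinds.items():
    print(request, "->", kind)
```

Requests labeled "new user" or "new item" are the ones where ratings alone cannot help, which is where user or item features become useful.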

Now, what do we mean by that? It can arise for two reasons: first, a user may rate all items on the same scale, and second, an item may be rated on the same scale by all users, which makes it difficult to draw inferences from the ratings. During training, we may want to remove some of these users or items. All right, let’s now see how the recommender split works. While we may not be able to visualize it immediately on large data sets, I have created an example on a subset of a large data set. In the recommender split, the following parameters need to be considered. The first is Fraction of training-only users. This parameter represents the fraction of users assigned only to the training data set. Their rows would never be used to test the model.

The next one is Fraction of test user ratings for training. It is the proportion of the test users’ ratings that can be used for training. The third one is Fraction of cold users. As we have seen in the previous slides, cold users are users that the system has not previously encountered. Because the system has no information on these users, they are valuable for training, but predictions for them might be less accurate. This parameter helps us get a fraction of those users as part of the training. Similarly, Fraction of cold items covers items that the system has not previously encountered, and they should be treated in the same manner as cold users. Fraction of ignored users and Fraction of ignored items specify the percentage of users and items that should be ignored. All right, then the option Remove occasionally produced cold items is typically set to zero.

This ensures that all entities in the test set are included in the training set. An item is said to be occasionally cold if it is covered only by the test set and it wasn’t explicitly chosen as cold. All right, to help you understand this better, I have done a recommender split on a small sample of data. In this experiment, I first extracted 100 observations and then applied a recommender split. The options I chose were 50% for training-only users and 25% for test users, and I also kept 20% of the users as ignored users.
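To get a feel for what the split parameters do, here is a simplified Python sketch. It is not the Azure ML implementation: it only mimics training-only users, ignored users, and the fraction of test user ratings sent to training, and it leaves out cold users and cold items. The function name, the parameter names, and the restaurant data are all hypothetical.

```python
import random


def recommender_split(ratings, train_only_frac=0.5, ignored_frac=0.2,
                      test_ratings_for_training=0.25, seed=0):
    """Simplified user-level split of (user, item, rating) rows into train and test."""
    rng = random.Random(seed)
    users = sorted({user for user, _, _ in ratings})
    rng.shuffle(users)

    n_train_only = int(len(users) * train_only_frac)
    n_ignored = int(len(users) * ignored_frac)
    train_only = set(users[:n_train_only])
    ignored = set(users[n_train_only:n_train_only + n_ignored])

    train, test = [], []
    for row in ratings:
        user = row[0]
        if user in ignored:
            continue  # ignored users appear in neither output
        # Training-only users always go to train; the remaining (test) users
        # contribute a random fraction of their ratings to train.
        if user in train_only or rng.random() < test_ratings_for_training:
            train.append(row)
        else:
            test.append(row)
    return train, test, train_only, ignored


# 10 hypothetical users, each rating 4 hypothetical restaurants on a 1-5 scale.
ratings = [(f"user{u}", f"rest{r}", (u + r) % 5 + 1)
           for u in range(10) for r in range(4)]

train, test, train_only, ignored = recommender_split(ratings)
print(len(train), len(test), sorted(train_only), sorted(ignored))
```

Whatever the random draw, rows from training-only users never reach the test set, and rows from ignored users are dropped from both outputs.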

And what we saw was that these users were not part of the selection, and some of them had given the same rating to different restaurants. I have attached the Excel sheet as part of the course material, and I suggest you go through the results and analyze them further. It has one set of original records and two sets, split one and split two, specifying which records from the original have gone to the training or the test data set. I hope for now that is sufficient to drive home the point of the recommender split and how it is done. That concludes the lecture on the recommender split. In this lecture we learned about the data issues for recommendation systems and also saw how the recommender split is performed using its parameters. In the next lecture, let’s learn specifically about the Azure ML Matchbox recommender. Thank you so much for joining me in this class and I will see you in the next one. Until then, enjoy your time.

  3. What is Matchbox Recommender and Train Matchbox Recommender

Hello and welcome to the Azure ML course. This is the section on the Azure ML recommendation system. In the previous lectures in this section, we learnt what a recommendation system is and the types of recommendation systems, such as collaborative filtering and content-based filtering. We also learnt how a recommendation system works using the hotel recommendations example. In the subsequent lecture we learnt about recommendation data issues such as cold start and same scale judgment. Finally, we also saw how the recommender split is performed. In this lecture we are going to cover what the Azure ML Matchbox recommendation system is, how to configure the Matchbox recommender in Azure ML, and the parameters of the Azure ML Train Matchbox Recommender module. Well, we could have learned many of these things while doing the experiments, but I thought of including them here first because the topic is slightly different from the typical regression or classification problems, and also because it requires quite a few details and a slightly complex set of inputs and parameters.

We will anyway recap these terms during the experiments when we perform a recommendation on the data set. All right, so what is Azure ML’s Matchbox recommendation system? Well, it’s a system developed by the Microsoft Research team, and it is based on a hybrid approach that combines content-based and collaborative filtering. It is therefore considered a hybrid recommender. When a user is relatively new to the system, predictions are improved by making use of the feature information about the user, thus addressing the typical cold start problem that we discussed. However, once you have collected a sufficient number of ratings from a particular user, it is possible to make fully personalized predictions for them based on their specific ratings rather than on their features alone. Hence, there is a smooth transition from content-based recommendations to recommendations based on collaborative filtering.

And even if user or item features are not available, Matchbox will still work in its collaborative filtering mode. All right, the overall recommendation system in Azure ML requires the Train Matchbox Recommender and Score Matchbox Recommender modules. There is no separate untrained recommender module here. And last but not least, the Train Matchbox Recommender reads a data set of user-item-rating triples and, optionally, some user and item features as well. So let’s see what we mean by that. The Train Matchbox Recommender, as we mentioned earlier, requires user-item-rating data as its first input for training. The columns it expects in this triple are as follows: the first column should be a user identifier, the second column should be an item identifier, and the third column is the rating for the user-item pair. These ratings have to be either numeric or categorical. All right, the second input is the user feature data set, which will have the user IDs and user features.

This is an optional input, but if it is present then the first column should always be the user ID, followed by the user features. The third input is the item feature data set, which will have item IDs and item features. This is also an optional input, similar to the user feature data set, and here too the first column of the data set should be the item ID, with any number of features after that. All right, let’s now look at the parameters required for the Train Matchbox Recommender. The first parameter it needs is Number of traits. This parameter denotes the number of latent traits that should be learned for each user and item. The higher the number of traits, the more accurate our predictions will typically be. However, as we have always seen in such cases, the training will be slower, and you should keep this value between 2 and 20.

All right. Also note that the higher the number of traits, the longer the training time will be. The second parameter is Number of recommendation algorithm iterations. It indicates how many times the algorithm should process the input data. The Matchbox recommender is trained using a message-passing algorithm that can iterate multiple times over the input data. The higher this number is, the more accurate your predictions will typically be, but as usual, the training time will be higher. I suggest keeping this number between five and ten.
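To build some intuition for latent traits and iterations, here is a small Python sketch. Please note that it does not use Matchbox’s message-passing algorithm. It swaps in plain matrix factorization trained with stochastic gradient descent, where each user and item gets `num_traits` numbers and a rating is predicted as their dot product. The data, the learning rate and the regularization value are assumptions.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train(ratings, num_traits=2, iterations=10, lr=0.02, reg=0.02, seed=0):
    """Learn num_traits latent numbers per user and per item with plain SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    user_traits = {u: [rng.uniform(0.1, 0.5) for _ in range(num_traits)] for u in users}
    item_traits = {i: [rng.uniform(0.1, 0.5) for _ in range(num_traits)] for i in items}
    for _ in range(iterations):  # one iteration = one pass over all ratings
        for u, i, r in ratings:
            p, q = user_traits[u], item_traits[i]
            err = r - dot(p, q)
            for k in range(num_traits):
                p_k, q_k = p[k], q[k]
                p[k] += lr * (err * q_k - reg * p_k)
                q[k] += lr * (err * p_k - reg * q_k)
    return user_traits, item_traits


def rmse(ratings, user_traits, item_traits):
    """Root mean squared error of the predicted ratings on the given rows."""
    squared = [(r - dot(user_traits[u], item_traits[i])) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical (user, item, rating) triples on a 1-5 scale.
ratings = [
    ("u1", "a", 5), ("u1", "b", 4), ("u1", "c", 1),
    ("u2", "a", 4), ("u2", "b", 5), ("u2", "d", 1),
    ("u3", "c", 5), ("u3", "d", 4), ("u3", "a", 1),
    ("u4", "c", 4), ("u4", "d", 5), ("u4", "b", 2),
]

rmse_few = rmse(ratings, *train(ratings, iterations=1))
rmse_many = rmse(ratings, *train(ratings, iterations=200))
print(rmse_few, rmse_many)
```

On this toy data the training error after 200 passes should be well below the error after a single pass, which mirrors the accuracy versus training time trade-off described above.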

All right, let’s see the last parameter required by this module: Number of training batches, which is nothing but how many batches the training data should be divided into. By default, the training data will be split into four batches. For now, that concludes this lecture on the Train Matchbox Recommender module. I hope you have now understood what the Train Matchbox Recommender module is and how to configure it. We are anyway going to recap this when we do the experiment. So far in this lecture we saw what the Azure ML Matchbox recommendation system is, how to configure the Matchbox recommender in Azure ML, and the parameters of the Azure ML Train Matchbox Recommender. In the next lecture we will cover how to score the Matchbox recommender before moving on to an experiment in Azure ML Studio. Thank you so much for joining me and I will see you soon in the next one. Until then, have a great time.

  4. How to Score the Matchbox Recommender?

Hello and welcome to the Azure ML course. We are now learning about the Azure ML Matchbox recommender, and in this section so far we have learned what recommendation systems are and what their different types are. We have also seen how the recommender split works, and we saw the Train Matchbox Recommender module. In this lecture we are going to cover what the Score Matchbox Recommender is, what kind of input it requires, and the different prediction types it supports. All right, you can find the Score Matchbox Recommender under Score, or you can search for it. As I told you in the previous sections, we are going to first learn about these modules before we use them in the experiment. It will be easier to understand them once we have gone through them using these notes. Great. So let’s see the inputs required for this module.

Well, it takes five different inputs. The first one is the trained Matchbox recommender, which is the output of the Train Matchbox Recommender module. The second input is the data set to score, that is, the test data set. Then it takes user features, item features and training data. All three of these are optional. All right. Next are the types of recommendation predictions this module supports.

The Azure ML Matchbox recommender modules are very powerful, and they support four different types of predictions. So let’s learn about these four types in detail. The first is rating prediction. For a given user and item, the model calculates how the user will react to that item, given the training data. Therefore, the input data for scoring must provide both a user and the item to rate, and it does not require any more parameters. This is relatively simple, so let’s move on to the next one. Item recommendation is one of the most used and powerful recommender options. When we use this option, the model uses its knowledge about existing items and users to generate a list of items that will most likely appeal to each user.

Naturally, it expects users and items as input. The parameter Recommended item selection is used to indicate whether you are using the scoring module in production or for model evaluation only. If you specify From Rated Items (for model evaluation), you are running it to develop or evaluate a model, whereas other options such as From All Items can be used if you are setting up an experiment to use in a web service or in production. It also needs the Maximum number of items to recommend to a user, which I think is self-explanatory. For Minimum size of the recommendation pool per user, we need to specify a value that indicates how many prior recommendations are required.

A value of two here means the item must have been recommended by at least two other users, and this parameter can only be used in evaluation mode. For related users, the scored data set returned by the Score Matchbox Recommender lists the users who are related to each user in the input data set.
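Going back to item recommendation for a moment, the idea of ranking items by predicted score and capping the list can be sketched as below. The scores, the set of rated items, and the `recommend_items` helper are hypothetical. The optional candidate set only imitates the spirit of choosing from rated items versus all items and is not how the module is implemented.

```python
# Hypothetical predicted scores for one user; a higher score means a better match.
predicted_scores = {"H1": 0.74, "H2": 0.85, "H3": 0.80, "H4": 0.36, "H5": 0.17}

# Hypothetical items this user has rated in the test data set.
rated_in_test = {"H1", "H4", "H5"}


def recommend_items(scores, max_items, candidate_items=None):
    """Return up to max_items item IDs, best score first.

    candidate_items=None ranks every item (production-style use);
    passing a set restricts the ranking to those items (evaluation-style use).
    """
    if candidate_items is None:
        pool = scores
    else:
        pool = {item: s for item, s in scores.items() if item in candidate_items}
    return sorted(pool, key=pool.get, reverse=True)[:max_items]


from_all = recommend_items(predicted_scores, max_items=2)
from_rated = recommend_items(predicted_scores, max_items=2, candidate_items=rated_in_test)
print(from_all, from_rated)
```

Restricting the pool to rated items is what makes evaluation possible, because only for those items do we have a true rating to compare against.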

And by predicting related items, you can generate recommendations for users based on items that have already been rated. In the interest of time, we are going to do the lab for item recommendations only, but I hope the information on the other prediction types is also useful. You can try various combinations of them during your own practice. So in this lecture we learned what the Score Matchbox Recommender is, its various inputs, and its four prediction types. In the next lecture, we are going to build a Matchbox recommender using Azure ML Studio. So log in to your Azure ML account and I will see you in the next lecture. Thank you so much for joining me in this one.