
DP-203 Data Engineering on Microsoft Azure – Design and Develop Data Processing – Azure Event Hubs and Stream Analytics part 4

  1. Lab – Adding multiple outputs

So in the last chapter I had shown you an example of how to use the tumbling windowing function to get summary results. But that was just in the test query; we didn't actually run the job. The reason for this is that I want to go on to my next script. Here I want to show how to add multiple outputs, which is also possible. So this part is the same, I am selecting this, and this is also the same, the tumbling window. At the same time, I can also take all of my data from my Event Hub and place it in my dedicated SQL pool. This is something that we can do. Now, here we do need to define another input that again takes data from the Event Hub. Remember, we can have multiple receivers reading data from our Event Hub. So let me copy this. First I'll go on to the query. And now here, first of all, it should be INTO DBLog.
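
As a rough sketch, a job query with two outputs could look something like the following. This assumes the input aliases dbhub and dball, the output aliases SummaryTable and DBLog, and that the events already expose metricName and total as top-level fields (the actual lab query flattens the metric records first, as shown in the earlier chapters):

    -- Output 1: summary of the metrics over a 5-minute tumbling window
    SELECT
        metricName,
        MAX(total) AS maxTotal
    INTO
        [SummaryTable]
    FROM
        [dbhub]
    GROUP BY
        metricName,
        TumblingWindow(minute, 5)

    -- Output 2: every raw event from the second input, copied into the dedicated SQL pool table
    SELECT
        *
    INTO
        [DBLog]
    FROM
        [dball]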

That's what we've defined as our output when it comes to the database logs. So here, if I go on to SQL Server Management Studio and look at DBLog, I have all of the data. Let's do one thing, let's delete the data. So that's done. Now I will define an input that's DBAll. Let me first save my query. Now here I'll go on to the inputs, I'll define a streaming input, I'll give that as the alias name, and I'll choose DB Hub. It will create a new consumer group. I'll choose the authentication mode as the connection string, leave everything else as is, and click on Save. Once this is done, let me go on to the overview and let's start the job. Again, I'll choose Custom, and here let me choose a previous start time so that we start getting data into our tables faster.

So, let me start the job. So now we are using multiple inputs and also multiple outputs; this is also something that is possible. One thing we are doing is taking all of the summary information about our metrics using the tumbling window function and copying it onto a summary table, and then we are taking all of our metric information and putting it into another table. Let's come back after some time. Now, once our job is running, let me do a SELECT * from this table so I can see the data in place. Let's look at our summary table as well. I'll hit Execute and we can see our data here as well. So in this chapter I wanted to complete what we did in the previous chapter in terms of showing how to use windowing functions, and at the same time show you how you can have multiple inputs and outputs.
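
For reference, the checks in SQL Server Management Studio are just plain SELECT statements against the two sink tables (the table names here are assumptions based on this setup):

    -- Raw metric rows streamed from the Event Hub
    SELECT * FROM [dbo].[DBLog];

    -- Five-minute summary rows produced by the tumbling window
    SELECT * FROM [dbo].[SummaryTable];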

  2. Lab – Reference data

Now in this chapter I want to go through reference data. Sometimes in your Azure Stream Analytics job, your query might need to reference some sort of static data that rarely changes. You can also perform joins in your SQL query. So let's say that you want to perform a join of your incoming stream data onto some static data; you can make use of reference data in Azure Stream Analytics for this. Your reference data can be in a storage account, or it can be in an Azure SQL database. So now the first thing I'm going to do is to upload my static reference data. Here I am in my Data Lake Gen2 storage account. I'll go on to my containers. Now here, let me create a new container. I'll name this as reference and hit on Create.

I'll go on to the container and I'm going to upload a file from my local system. So I'm uploading a tier CSV file. I'll hit on Open and hit on Upload. If I go on to the CSV file and go on to Edit, you can see what I've done: I have the metric name here and I've given some tier information. So I am now trying to assign a tier importance to the metric names that are coming in from Azure Event Hubs. Remember, all of our metric information is coming in from, let's say, DB Hub, and that's based on the metrics of Azure SQL Database. So here I have the metric name, the different metrics that are coming in, and what the tier is. Now I'll go on to our Stream Analytics job, let me go on to Inputs, and here let me add a reference input. So here I'll give a name of tier. Here I'll choose our Data Lake storage account.

Here in the container I'll choose the reference container, and I'll choose the connection string here. For the path pattern I'll give the file name, and in the event serialization format I'll choose CSV (comma separated), and then let me click on Save. Now, in SQL Server Management Studio, first let me delete the data that we have in our summary table; so that's done. Once we have our reference input in place, I'll go on to my query. I've already added my query in place, so this is the query. I again have my staging area, StagingLogs. I am again selecting the metric name and the time using CROSS APPLY. Here I'm again selecting whatever information I want into the summary table based on my staging logs, but here I am performing a join on my tier reference data. This join is based on the metric name that is available in my tier input and the metric name that is available over here.

I only want to take those metrics where the tier is equal to two, and again I have my tumbling window. So I've already gone ahead and put the query here, and I've also saved it. Let me now go on to the overview and let's start our Stream Analytics job again. I'll choose Custom, I'll choose a previous start time, and let me click on Start. Let's come back after a couple of minutes once our job is in the running state. Here, let me do a SELECT * from the summary table, and here I can see my metric names. Now, my metrics should only be the ones where the tier is equal to two. So if I go on to my tier CSV file, here I can see the DTU consumption percent; if I scroll down, it is tier number two. If I look at, let's say, the workers metric, that's also tier two, and the log metric is also tier two. So as far as I can see, everything here is tier two. So I'm only taking the information where the tier is equal to two, and this is based on the reference data.
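
For reference, the full query used in this lab looks roughly like the following. Treat it as a sketch: the record-flattening step, the column names and the tier.csv layout of metric name plus tier are assumptions based on this lab's setup:

    -- Stage: flatten the incoming diagnostic records from the Event Hub input
    WITH StagingLogs AS
    (
        SELECT
            GetRecordPropertyValue(record.ArrayValue, 'metricName') AS metricName,
            GetRecordPropertyValue(record.ArrayValue, 'total') AS total
        FROM
            [dbhub] AS d
        CROSS APPLY GetArrayElements(d.records) AS record
    )
    -- Join the stream against the static reference data (tier.csv: metricName, tier)
    -- and keep only tier-2 metrics, summarised over a 5-minute tumbling window
    SELECT
        s.metricName,
        MAX(s.total) AS maxTotal,
        System.Timestamp() AS windowEnd
    INTO
        [SummaryTable]
    FROM
        StagingLogs AS s
    JOIN [tier] AS t
        ON s.metricName = t.metricName
    WHERE
        t.tier = '2'  -- CSV reference data values arrive as strings
    GROUP BY
        s.metricName,
        TumblingWindow(minute, 5)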

  3. Lab – OVER clause

Now, in this chapter I just want to go through some clauses which are important from an exam perspective. There are a lot of functions and clauses that you can use in Azure Stream Analytics, and we are trying to cover those that are important from an exam perspective. So first is the OVER clause. This clause groups a set of rows before an aggregate function can be used on top of them, a bit like using a windowing function, but slightly different. We are not using any windowing function as such; yes, this is still based on the time aspect, but a windowing function serves a different purpose. This clause works directly on the rows that are given in the query input. It is not affected by predicates in the WHERE clause, the join conditions in the JOIN clause, or grouping conditions in the GROUP BY clause.

The aggregate functions that are currently provided are SUM, AVG, MIN and MAX. Now, as part of the OVER clause, you can also specify the following. First is PARTITION BY: this specifies that only the rows with the same partition key are considered by the aggregate or analytic function. You can also specify LIMIT DURATION: this specifies how much history from the current row is included in the group. We'll actually look at an example of this particular clause. Then we have the LAG operator. This operator helps to look up a previous event in the event stream. It can be useful to see the rate of growth of a particular variable, or whether the variable has crossed a particular threshold. As an example of an expression, let's say you want to look at a particular change over a particular duration of time.

So here, with the LAG operator over a duration of five minutes, you can look at the previous total, then look at the most recent total and see what the change is. There is also another variation of this particular command. Let's say again that you have metrics flowing in and you are looking at the CPU percentage. At different points in time the CPU percentage could be 20%, 25%, 30%, 35%, and so on. Now, let's say that you are looking at the input over a five-minute duration. Here you can take the LAG operator and say: please find the change in the CPU percentage from the beginning to the end of the five-minute window. So you can look at this particular change using the LAG operator.
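
As a small sketch of this, and assuming the input alias dbhub and the fields metricName and total used throughout these labs, the change calculation would look something like:

    -- For each metric, compare the current reading with the previous reading
    -- seen within the last five minutes
    SELECT
        metricName,
        total AS currentValue,
        LAG(total) OVER (PARTITION BY metricName LIMIT DURATION(minute, 5)) AS previousValue,
        total - LAG(total) OVER (PARTITION BY metricName LIMIT DURATION(minute, 5)) AS change
    FROM
        [dbhub]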

You also have the LAST operator. This helps look up the most recent event in the event stream within defined constraints, which is good when you want to find the last known good value in your event stream. Now, I just want to show an example of using LIMIT DURATION and the LAG operator. So here, again, I have the WITH clause to define a stage. This time I'm selecting my total; total is also a property that is coming in from my stream. And I'm taking the metric name. Now, here I am defining another stage, NewLogs, and here I am selecting the data from my StagingLogs. So I'm again breaking the query down into stages to make this much simpler. Here I am only saying: please select only the metrics where the metric name is equal to storage.

See, from my Event Hub I am getting all sorts of metrics: metrics about the CPU percentage, the storage, et cetera. Here I only want to look at the storage aspect, that's all. And I now want to look at these metrics over a duration of five minutes and look at the average of the total. This is what I want to achieve; a rough sketch of this query is shown below. So I'll take all of this and copy it; I've stopped my Stream Analytics job. I'll go on to my query, remove all of this and paste this in. Now I'll again choose DB Hub. Note that over here I am not selecting any output; I'm not going to put this anywhere. I just want to show you how this query actually works, so we'll just test the query, that's it. Here, let me select the time range. Let me select at least an hour so that we have more data.
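
For reference, the staged query being tested here looks roughly like this (a sketch only; the stage names StagingLogs and NewLogs, the flattening step and the column names mirror the earlier labs and are assumptions):

    -- Stage 1: flatten the metric records coming from the Event Hub
    WITH StagingLogs AS
    (
        SELECT
            GetRecordPropertyValue(record.ArrayValue, 'metricName') AS metricName,
            GetRecordPropertyValue(record.ArrayValue, 'total') AS total
        FROM
            [dbhub] AS d
        CROSS APPLY GetArrayElements(d.records) AS record
    ),
    -- Stage 2: keep only the storage metric
    NewLogs AS
    (
        SELECT metricName, total
        FROM StagingLogs
        WHERE metricName = 'storage'
    )
    -- Average of the total over the last five minutes of events
    SELECT
        metricName,
        total,
        AVG(total) OVER (LIMIT DURATION(minute, 5)) AS avgTotal
    FROM
        NewLogs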

I'll click on Sample. You also can't select a time too far in the past, because there's a limit on the amount of sample data that you can have. So, once we have some sample data in place, I'll test my query. Here you can see the average of the total over a duration of five minutes. So here I am again using aggregate functions and looking at the data over a certain duration of time. I'm saying: please limit my rows to the duration of the last five minutes. So here my OVER clause is grouping my input into a batch of five minutes and then I'm performing an aggregation. Now I can also try out the LAG operator as well, so I can just replace it.

Here, let me test my query. So here I'm looking at the changes. This first one is coming as null because there is no value before it. Here you can see the change of this value from the previous value. So here you are looking at all of the changes with the help of the LAG operator. And then finally, we can also use the LAST operator. Here I am looking at whenever the total is greater than zero; remember, in the last query you could see there were some totals that were equal to zero. Here I am selecting the total, and I am also asking: when was the last time that the total was greater than zero? A rough sketch of this is shown below. So again, depending upon your requirements, these are different operators that you can make use of.
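
Continuing from the staged query above, the LAST variation could look something like this (again a sketch; the five-minute window is an assumption):

    -- For each event, also return the most recent total within the last five minutes
    -- that was greater than zero (the last known good value)
    SELECT
        metricName,
        total,
        LAST(total) OVER (LIMIT DURATION(minute, 5) WHEN total > 0) AS lastNonZeroTotal
    FROM
        NewLogs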

  4. Lab – Power BI Output

Now, in this chapter I want to have a demo on how you can use Power BI as an output for your Stream Analytics job. You can stream your data directly into a dataset in Power BI. The first thing I am going to do is go on to Azure Active Directory. Here I'll go on to Users and create a new user. I'll just give the name as powerbi. Here I'll create or give a password. Now, here I need to give a usage location. The reason I'm giving a usage location is because we need to attach a license to this user, and for that we need to define the usage location. I'll hit on Create. This is the location the user is going to log in from. Now that I have this user in place, I'll go again onto my default directory, go on to Licenses, then on to All Products, and here I actually have Power BI free licenses.

Now, I'm just giving a link to a documentation page here. You can go through this documentation page on how you can sign up for Power BI. I remember when I signed up for Power BI I automatically got these free licenses in place; even for me it was actually a surprise. One day I was playing with Power BI for something else, I went on to the All Products page, and there I could see the Power BI free licenses. So what I'll do is go on to the license and assign it to that user. I'll choose Power BI, hit on Select, and go on to Next. That's set, so I'll go on to review and assign the license to the user, and this is done. If I hit on Refresh it will take some time, but we should be getting that user in place; it takes some time for the license to be assigned. Now, for the user, I'll go on to my resource groups.

I'm going to go on to my datagrp resource group, which has all of the resources we've been working with so far. I'm going to go on to Access control, and here, just for the moment, I'll add a role assignment. I'm going to choose the Contributor role, and here let me search for that particular user. Please note that I go through role-based access assignments in detail in the section on security. I'll click on Save. I'm just trying to give this user Contributor access to the resources that are part of my resource group, and this is because when you create a new user, that user does not have any permissions whatsoever on the resources in your subscription.

And we are going to log out and log in as this new user to add the Power BI output onto our Stream Analytics job. That's the reason why I have given this required access. Now I'll go on to powerbi.com. Here I'll just sign in as another user, so I'll sign in with another account. So here it's the powerbi user; I'll go on to Next. Again, if you want to get the full user name, that's very simple: you can go again onto Azure Active Directory, go on to your users, search for the powerbi user, go on to it, and here is the entire user ID. So let me sign in. I will need to change my password; this normally needs to be done for a new user. I'll click on Sign in and Get Started. It's just preparing Power BI, and this should be fine. Now what I'll do is also log in to the Azure portal as this new user.

So here I'll sign in with a different account; I'll choose the powerbi account. Now here I'll go on to my Stream Analytics job and go on to my Outputs. One job is still running, so let me go ahead and just stop the job. Now, what I've done is that I closed my browsers and reopened them, because I was not able to log in to Power BI, and this was purely from the license perspective. So even though I had attached the license to the new user, in order to ensure that the license was refreshed for that user, I closed the browser, started everything again, and now I've logged in to Power BI. I've also logged back into the Azure portal as the powerbi user. Now my Stream Analytics job is also stopped. I'm going to go onto Outputs and here I'm going to add an output.

So I'm going to choose Power BI. Here I'll give a name of powerbi. The workspace is my own workspace, and the authentication mode is the user token. You can just give a dataset name here, and in terms of the table I'll name it as dblog, and I'll click on Authorize. Here, again, we need to log in with our Power BI account. Once this is done, I'll click on Save. So this is in place; I'll just refresh this page so I can see the Power BI output now in place. Now, here is the query, very simple: again, I am taking everything from my DB Hub and placing it into Power BI. So I'll copy this, go on to the query and just replace it here; I need to ensure it's into powerbi from dbhub. Let me save this, go on to the overview and start this. I'll choose a custom start time so that we have some data in place, and let me click on Start.
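
The query itself is just a pass-through; a minimal sketch, assuming the output alias powerbi and the input alias dbhub used in this lab:

    -- Send every event from the Event Hub input straight to the Power BI dataset
    SELECT
        *
    INTO
        [powerbi]
    FROM
        [dbhub]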

Let's wait for a few minutes until our job is in the running state. Now, if you go on to Power BI and go on to My workspace, you can see your dataset in place. You can go onto the context menu and create a report from the dataset. Here you can see all of the fields, so you can start creating your reports. For example, you could create a clustered column chart, and here, if I just minimize the filter pane, I could choose the metric name and the maximum. So here we can see the maximum for the different metrics. So now your data is being continuously streamed on to Power BI with the help of Azure Stream Analytics.