DP-203 Premium Bundle
- Premium File 379 Questions & Answers. Last update: Sep 09, 2024
- Training Course 262 Video Lectures
- Study Guide 1325 Pages
DP-203 Exam - Data Engineering on Microsoft Azure
Design and implement data storage - Azure Synapse Analytics
4. Lab - Let's create an Azure Synapse workspace
Now, in order to start working with Azure Synapse, we first need to create an Azure Synapse resource. In All resources, I'll hit Create and search for Azure Synapse. I'll choose that and hit "Create." I'll choose a resource group, which is my Data GRP resource group. I don't need to add a managed resource group here. We need to give the workspace a unique name. I'll leave the region as it is. Now, the workspace needs to have a Data Lake Storage Gen2 account in place. You can choose the same Data Lake storage account that we already have, or you can create a separate one just for the Synapse workspace. So I'll click "New" and give the account a unique name. This name also seems to have been taken, so I'll change it. I'll hit OK, and we need to create a new file system in that data lake. I'll hit OK. I'll leave the assignment as it is and go on to Security. Now, when you want to log in to the server that is hosting the SQL data warehouse, you'll use the SQL Server admin login and the password that you specify over here. It's much the same as when you create an Azure SQL database.
So I'll use the same password as before. I'll keep the option that allows pipelines to access my SQL pools. I'll move on to Networking and leave everything as it is, do the same for Tags, and go on to Review + create. And let's hit "Create." Now, this might take around five minutes. Let's come back once we have the Synapse workspace in place. Now, once we have the Azure Synapse workspace in place, we just get a link to go to the resource group. So we have our Data GRP resource group. But I can go to All resources again and filter on DP-203. And here I can see my Synapse workspace in place. If I go ahead and scroll down, we have two compute options, SQL pools and Apache Spark pools. In the SQL pools, you get something known as a "built-in serverless pool." You are not charged based on the amount of time this pool runs. Instead, you are charged based on the amount of data that is processed via the serverless SQL pool. You can also create something known as a "dedicated SQL pool." And you can create Apache Spark pools as well. So for now, we have gone ahead and created our Synapse workspace.
5. Azure Synapse - Compute options
Now, when it comes to Azure Synapse, there are different compute options in place. Here I am comparing the serverless SQL pool and the dedicated SQL pool. When you want to have a dedicated data warehouse in place, you'll use the dedicated SQL pool. If you just want to perform quick ad hoc analysis of your data, you can look toward using the serverless SQL pool. Here, you can use T-SQL to work with your data. The same holds true for the dedicated SQL pool. In the serverless SQL pool, you can only create something known as "external tables." If you want to persist the data in actual tables, you have to use the dedicated SQL pool. And when we go into our labs, you will see the clear distinction between a serverless SQL pool and a dedicated SQL pool. When it comes to the serverless SQL pool, you are charged based on how much you use it. That's because there is no underlying infrastructure for you to provision here. You are charged based on the amount of data that you process.
Whereas, when it comes to the dedicated SQL pool, you are charged based on something known as "Data Warehousing Units." The compute, the amount of memory, and the amount of IOPS are all bundled into something known as a "Data Warehousing Unit." Remember, when it came to Azure SQL Database, there was a similar concept known as a DTU, a Database Transaction Unit. So a somewhat similar metric is available over here when it comes to the data warehouse. Now, please note that apart from this, you also have the Spark pool in place. But at this point in time, I first want to concentrate on the serverless SQL pool and the dedicated SQL pool. If I go on to the pricing page for Azure Synapse and scroll down to the dedicated SQL pool, here you can see that the lowest service level available is DW100c, and here the cost is $1.2 per hour. Again, it depends upon the region, but generally, the cost is around the same.
Whereas if you are just doing data exploration, if you scroll down, you will see the serverless option, where you are charged per terabyte of data processed rather than by Data Warehousing Units. So, if you need to perform quick ad hoc analysis, you can use the serverless SQL pool. But if you want to have a dedicated data warehouse in place, then you need to have a dedicated SQL pool in place. So this is just a very quick comparison between these two compute options. We'll look at labs to learn how to use both the serverless SQL pool and the dedicated SQL pool, but the dedicated SQL pool will get more attention. That's because, when it comes to the exam, there is more focus on building a data warehouse. And you can only do that when you have a dedicated SQL pool in place. Bye.
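To make the two billing models concrete, here is a minimal Python sketch. The $1.2 per hour figure for DW100c is the one quoted above and varies by region. The per-terabyte serverless price is a placeholder I have assumed for illustration only, so check the pricing page for the real value.

```python
# Rough cost comparison between a dedicated SQL pool (billed per hour of
# provisioned Data Warehousing Units) and the serverless SQL pool (billed
# per terabyte of data processed).

DEDICATED_DW100C_USD_PER_HOUR = 1.2  # figure quoted in the lecture; region dependent
SERVERLESS_USD_PER_TB = 5.0          # ASSUMPTION: placeholder, not from the lecture


def dedicated_cost(hours_running: float,
                   usd_per_hour: float = DEDICATED_DW100C_USD_PER_HOUR) -> float:
    """Cost grows with the time the pool is running, whatever the queries do."""
    return hours_running * usd_per_hour


def serverless_cost(tb_processed: float,
                    usd_per_tb: float = SERVERLESS_USD_PER_TB) -> float:
    """Cost grows with the data the queries process, not with elapsed time."""
    return tb_processed * usd_per_tb


if __name__ == "__main__":
    # A dedicated pool left running for a 30-day month versus 2 TB of ad hoc queries.
    print(f"dedicated:  ${dedicated_cost(24 * 30):.2f}")
    print(f"serverless: ${serverless_cost(2):.2f}")
```

The point of the sketch is only the shape of the two formulas: one multiplies by hours, the other by terabytes.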
6. Using External tables
Now in this chapter, we are going to look at how to use the serverless pool that is available in Azure Synapse. We are going to look at defining something known as "external tables." This concept of an external table is not only available in Azure Synapse; it is also available when it comes to the SQL Server itself. So if you have a SQL database, you can also define external tables over there as well. You can define external tables both in the serverless pool and in the dedicated SQL pool, which is available for Azure Synapse.
The main purpose behind using external tables is that your data lies in another source, and it is just the definition of the table structure that will be present in Azure Synapse. When you make a query against the table, the data is actually fetched from the external source. An external table can point to data that is located in Hadoop, in Azure Blob storage, or in Azure Data Lake Storage accounts. External tables are accessed by using a feature known as PolyBase. So keep in mind that your data will be in an external source, whereas your table definition will be in Azure Synapse. If you recall, earlier on when we were working with SQL Server in the Azure SQL Database service, both your table definition and the rows in the table were on the SQL server itself.
But as I said, the main purpose of external tables is that your table definition will be in Azure Synapse and your data will be located somewhere else. This is useful if you don't want to actually store the data on the server itself. Assume you want to perform a join between a table whose data is in Azure Synapse and data that lives somewhere else. Instead of bringing that data from the external source onto the server and then performing the join, you can perform joins between external tables and your internal tables. So we're going to make use of the Log.csv file that we have in our existing Azure Data Lake Gen2 storage account. We will then use the serverless pool in Azure Synapse to create that external table.
There are a few important steps that need to be in place in order to access the external data, and we're going to be looking at the commands for each of them. The first thing is authorization: you need to be authorized to use the data that is stored in your Data Lake Gen2 storage account. Then you have to define the format of the external file. Is the external file in CSV format? Is it in Parquet format, and so on? And then you create and access the external table. So let's go on to the next chapter, where we will see a lab on how to create the external table.
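As a rough sketch of what the format and table steps look like, the Python below only builds the T-SQL text; it does not connect to Synapse. The statements follow the serverless SQL pool syntax as I understand it. Every object name (`log_data`, `SasToken`, `TextFileFormat`, `logdata`), the storage account URL, and the column names and types are hypothetical and not taken from the lab script.

```python
# Builds T-SQL text for the data source, file format, and external table.
# All names, the URL, and the column types below are illustrative assumptions.

def data_source_sql(name: str, container_url: str, credential: str) -> str:
    """External data source that points at a Data Lake Gen2 container."""
    return (
        f"CREATE EXTERNAL DATA SOURCE {name}\n"
        "WITH (\n"
        f"    LOCATION = '{container_url}',\n"
        f"    CREDENTIAL = {credential}\n"
        ");"
    )


def file_format_sql(name: str) -> str:
    """Comma-delimited text format; FIRST_ROW = 2 skips the header line."""
    return (
        f"CREATE EXTERNAL FILE FORMAT {name}\n"
        "WITH (\n"
        "    FORMAT_TYPE = DELIMITEDTEXT,\n"
        "    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2)\n"
        ");"
    )


def external_table_sql(table, columns, location, data_source, file_format) -> str:
    """Table definition only; the rows stay in the file at `location`."""
    cols = ",\n".join(f"    [{col}] {sql_type}" for col, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n{cols}\n)\n"
        "WITH (\n"
        f"    LOCATION = '{location}',\n"
        f"    DATA_SOURCE = {data_source},\n"
        f"    FILE_FORMAT = {file_format}\n"
        ");"
    )


if __name__ == "__main__":
    # Hypothetical column subset of the activity log file.
    columns = [("Id", "INT"), ("Operationname", "VARCHAR(200)"), ("Status", "VARCHAR(100)")]
    print(data_source_sql("log_data",
                          "https://<storage-account>.dfs.core.windows.net/data",
                          "SasToken"))
    print(file_format_sql("TextFileFormat"))
    print(external_table_sql("logdata", columns, "/raw/Log.csv",
                             "log_data", "TextFileFormat"))
```

The order matters: the data source references the credential, and the table references both the data source and the file format, so they have to be created in that sequence.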
7. Lab - Using External tables - Part 1
So in this chapter, we are going to see how to work with external tables. In Visual Studio Code, I have the SQL file that I will be working with, and I will execute all of the statements from Synapse Studio. So if we go on to our existing Synapse workspace and scroll down, we can go ahead and open up Synapse Studio. Within Synapse Studio, you can start working with SQL commands against both your serverless pool and your dedicated SQL pool.
You can also create pipelines when it comes to integrating your data. You can also view your data as well. So if I go on to Data over here, currently I don't have any databases in place. From here you can access both your serverless SQL pool and your dedicated SQL pool. Now, most of the work will actually be done from SQL Server Management Studio, and I'll explain one of the reasons why I'll be working in SQL Server Management Studio. But when it comes to the serverless SQL pool, let's go ahead and execute the commands here itself.
So I'll go on to the Develop section in Synapse Studio. As I said, this is kind of like having an integrated development environment to work with Azure Synapse itself. Here in the Develop section, I can create a SQL script, a notebook, or a data flow when it comes to integrating your data in an ETL flow. What I'll do is go ahead and create a SQL script. Let me go ahead and hide this here. I can give a name to my script. We are connected to the built-in serverless pool. I'll just hide this now. I'll copy my entire script and place it over here. The first thing we are going to do is create a database in our serverless pool. Our serverless pool does not come with any user databases in place. So we'll execute the first command to create a database.
So I'll hit "Run," and it will execute the query over here. So that's done. Next, we are creating something known as a master key. This master key will then be used to encrypt something known as a "database scoped credential." The database scoped credential will allow us to authorize ourselves to use the file in our Data Lake Gen2 storage account. So what are we going to do? I said we are going to create an external table that will access our Log.csv file in our Azure Data Lake Gen2 storage account. Our Log.csv file is in the data container, in the raw folder, and this is the file that we want to access. Again, just to reiterate, what is in this particular file?
So it's information about my activity logs. These are all the different columns: I have a unique ID; I have the correlation ID; I have the name of the operation that is performed; I have the status of the operation, the event category, the level, and the time that the operation occurred. If I scroll to the right, I have the subscription, the event initiated by, the resource type, and the resource group, and that's it.
Now, there is one thing that I've actually done in this Log.csv file. If you are generating the activity log yourself, then in the header information there might be a space between "Correlation" and "id," and there might be a space between "Operation" and "name." Now, you can leave this as it is and still access the file as an external table, but some formats might have a problem with spaces in the column names. For example, with the Parquet file format, this could be an issue.
So what I've done is try to keep the same flow for the data across all of the labs. That's why I've ensured that there are no spaces in the column names. I just want to tell you what I've actually done to this particular data, because when it comes to your data, there are a lot of things that you have to consider.
Whether you are parsing your data, analyzing your data, or storing your data, you have to understand whether the destination file format and the destination database engine actually understand how to represent your data. And that's where you come to a phase known as "data cleansing." Your data should be in a format that can be understood by the destination, and in a shape that lets you get what you want out of the data. So as I said, this is something that I've done: I've gone ahead and removed these spaces from the column names. Let's go back.
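The header clean-up described here can be done by hand, but as a small sketch of the idea, the Python below removes whitespace from the column names of a CSV file and leaves the data rows alone. The sample header is hypothetical; the lecture only says that spaces such as the one in "Correlation id" were removed.

```python
import csv
import io


def clean_header(names):
    """Remove all whitespace from each column name."""
    return ["".join(name.split()) for name in names]


def clean_csv(text: str) -> str:
    """Return the CSV text with a cleaned header row; data rows are written back as read."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return text
    rows[0] = clean_header(rows[0])
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    sample = "Id,Correlation id,Operation name,Status\n1,abc-123,Delete resource group,Succeeded\n"
    print(clean_csv(sample))
```

Only the first row is touched, so values such as "Delete resource group" keep their spaces.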
So, going back to our SQL script, I said I needed to create something known as the master key. This will encrypt our database scoped credential, which makes the storage of the credential much more secure. It's encrypted because, in the end, this is like having a password that will connect to your Data Lake Gen2 storage account, and we want to ensure that no one else will be able to get this information.
Now, what I've done is change the name of the script just to indicate that it's for the external table. You can call the script whatever you want. We must first change the context to our new database. So I'll just quickly hide this first; I don't need the properties anymore. In terms of the database in use, it's currently connected to the master database.
But we now have to connect to our appdb database. I'll click Refresh over here, and then change the context to my appdb database. Next, I'll execute the statement for creating the master encryption key in the appdb database. I'll click on "Run." This is also done. Next, we need to create a database scoped credential. For this, we first need to generate a shared access signature. That's why, earlier on, I explained the concept of a shared access signature.
So I'll go on to the Data Lake Gen2 storage account and go to Shared access signature. I'll just hide this. I'll choose the Blob service. I'll choose Service, Container, and Object as the resource types here. I only need the List and Read permissions. I'll generate the shared access signature and copy the SAS token. I'll go back to my workspace.
I'll paste it over here, replacing the placeholder. I need to remove the leading question mark. So now I have the database scoped credential statement in place. I'll run this particular command. This is also done. So let's mark an end to this chapter. We'll go on to the next chapter, where we'll complete this exercise of creating an external table.
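As a sketch of the two statements run in this part of the lab, the Python below builds the T-SQL text for the master key and the database scoped credential, and strips the leading question mark from the SAS token first. The credential name `SasToken`, the password, and the token value are hypothetical placeholders, and the code only produces strings; it does not connect to Synapse.

```python
# Builds the master key and database scoped credential statements.
# The names and secret values used here are illustrative assumptions.

def strip_sas_prefix(token: str) -> str:
    """Drop surrounding whitespace and any leading '?' copied from the portal."""
    return token.strip().lstrip("?")


def _quote(value: str) -> str:
    """Escape single quotes so the value is safe inside a T-SQL string literal."""
    return value.replace("'", "''")


def master_key_sql(password: str) -> str:
    """The master key is what encrypts the scoped credential's secret."""
    return f"CREATE MASTER KEY ENCRYPTION BY PASSWORD = '{_quote(password)}';"


def scoped_credential_sql(name: str, sas_token: str) -> str:
    """Credential that authorizes access to the storage account with a SAS token."""
    secret = _quote(strip_sas_prefix(sas_token))
    return (
        f"CREATE DATABASE SCOPED CREDENTIAL {name}\n"
        "WITH IDENTITY = 'SHARED ACCESS SIGNATURE',\n"
        f"     SECRET = '{secret}';"
    )


if __name__ == "__main__":
    print(master_key_sql("<strong-password>"))
    print(scoped_credential_sql("SasToken", "?sv=<version>&sig=<signature>"))
```

Stripping the question mark in code rather than by hand avoids a credential that is created successfully but fails later when the external table is queried.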