Practice Exams:

DA-100 Microsoft Power BI – Part 2 Level 4 – Transform – Text and Numbers

  1. Transform/Add Column – Text – Format

Activity that we just did. I created this year analysis and it’s looking okay, fairly good. It gives me information. It gives me, for instance, that east of England and southwest keep changing places. But the thing that it doesn’t quite do, if you have a look at the months is have the months in order january, February, May, April, March, June, July. It’s got May, March in the wrong order order. In fact it’s the right order. It just depends on what you are searching. Now, the reason why it’s not treating them in the right order is because it’s treating them descending order of the average price. And you can see that we did a drill through so that we’re looking at the year 2008. If I clear that filter, you can now see that the months go december, November, September, October. It’s largely in descending month of order, but not quite September and October in the wrong place. Now you can see how it is sorted. This little arrow here is showing that by default it is sorted by average price.

 And I got to this drop down by just double clicking on these three dots here. So if I wanted to sort by month, which could be a bit more ordinary and month ascending, you might to be expecting to get January, February, March, but you’re not getting April, August, December, because this is the order in alphabetical order. So what we need to do is convert this so that it actually means something to the computer. We need it to be in a date. Now, at the moment we have got the year and the month in two separate columns and that is quite a problem for the computer to actually understand that it is a date. We’re going to have year, month.

So month is just text, year is a number, neither of them are dates. Now I’m going to have a look instead of going to transform a date column, which you might think we’re doing immediately, we’re not, we’re going to have a look at the text column. Now, I should point out that most of this is replicated elsewhere. So the split column, which is the first thing we’ve already had a look at the split column. So you can split by the limiter, by number of characters, by position, when it goes from lowercase to uppercase, uppercase to lowercase, when it goes from being a digit to not being a digit and so on. So we’ve already had to look at split column, we don’t need to have a look at it again. The rest of it format, merge column, extract and pass is in the transform menu, but is also in the add column menu.

So what’s the difference between transform and add column? Well, the answer is do you want to overwrite the existing column that’s there, in which case use Transform? Or do you want a completely new column to be added, in which case use Add column. Of course, you could always just use add column and then manually delete the existing column if it’s surplus to requirements. So let’s just have a look at what we’ve got here in the text column. First of all, we’ve got the formatting. So if I click on the month, we could format everything as lowercase and you can see it’s using the table transform columns and there’s text lower.

We could transform it into uppercase. No surprise, watch will get there text upper. You could transform it into capitalize, each word also known as proper case and that’s what it uses, text proper. So we would have every word starting with a capital letter. So if I transform region, for instance, then the of in east of England would also be capitalized. You can also trim. So that removes white spaces, in other words, blanks or spaces. From the beginning that’s called leading and at the end that’s called trailing. So there are some languages where you have to do leading trim and trailing trim as two separate functions. But here you don’t need to, you only need to do it as one. So if I were to trim this, there would be nothing happening because there are no blank spaces. But if there were, then they would be removed. So if you’re having a problem with blank spaces, that is very useful. In addition to trimming, you’ve also got something called cleaning. So that is removing characters which don’t print. So you might have some very weird characters and that’s text clean.

Additionally, you can also add a suffix or a prefix. So prefix goes at the beginning, suffix goes at an end. So suppose I wanted a prefix of region, colon and a space. That’s fine, that can be done. And you can see here how the computer has done it. It has gotten ampersand to connect two text together. And the underlying that you can see here, that is your original text of have this column called region and it becomes what each row actually had with prefixed with region. So combine the two and you can see a similar formula for the suffix.

  1. Transform/Add Column – Text – Merge Columns

Now, these next two are related merge columns. These concatenate. These join together two columns into one. Of course, it needed to be two. It could be more than two. But what I’m going to join together is the year and the month in order to create something that’s looking like a date. So if I merge these columns together and I’m using Transform, you’ll notice, and it’s asking me what’s the new column going to become. So even though I’m using Transform, it’s still going to give me a new column. So I’m going to call this date combined and it’s saying, okay, how are you going to combine them? Well, I want a space between them, but equally, I could say I want something custom, so I could have the word and for instance, in between, but I’m going to have a space, so let’s combine those. And you’ll see that the computer has created a new column, date combined, and it’s removed the two existing columns. So it’s not like it’s taken an existing column, made that the combined, and then got rid of the others. It’s actually just merged them all into one big column and a new column at that. Now, if you wanted your existing columns to remain, what do you do?

You don’t use Transform merge columns, you use add column, merge columns. And you can see the formula slightly different. It’s table add column. Whereas previously it was table to transform columns, but it leaves the existing columns in place. So we’ve combined these as text. And how do we actually get them into a date format? Well, you might think we’ll be going over here into the date, but we don’t have to. All we have to do is change the data type. So we can change the data type by here, next to the column, or we could do it in transform any column, or we could do it in home transform data type. So I’m just going to change this to Date. Now, notice the format that the computer currently has. It in 1995 Feb. It doesn’t have it as Feb 1995. Will that make a difference? Let’s change it to Date, see what happens. And the answer is no. The computer has successfully worked out what it is that it’s worked out. That is a year followed by a month. In the English format, we will be looking at examples of how to do non your language date. So in my version, nonenglish date formats later. So now we’ve got this date combined, we can close and apply.

So let’s do that and we get back to our Power Bi desktop. And it’s saying something’s wrong with one or more of the fields average price. And what’s happened is that at some point, I’ve accidentally deleted the transformation into a number. So let’s just turn that back now into a whole number. There we go, close and apply. So if you see this sort of thing, it means something’s wrong with what you’ve just done, please correct. So now we have changed absolutely nothing because we have added a new field date combined. Had we deleted the columns month and year, then a lot of the stuff that we have previously done will no longer work. So rather than doing that, I’m adding extra columns. And then if I want to completely go on to this new date combined, I can do that and then delete these columns afterwards once they’re no longer used. So let’s click on this visualization, and it’s currently going by month. So instead, I’ll drill down into the hierarchy to this version of month. So, just as a reminder, we’re currently sorting by month, but it’s sorted in alphabetical order. Now, if I get rid of this month and drag in the date hierarchy month, you can now see that it is coming in in the proper order that we would expect January, then February, then March, and so forth.

Similarly, I could go through all the other examples that I’ve got, the region analysis, for instance, click on this and change the axis that I’ve currently got for year and month. So delete those drag in year and then month, same functionality, except that when you drill down, you actually get January, February, March, instead of having it in a different order. So I could go through the entirety of this dashboard, change all of my existing individual year and month references to the year and month in the date combined date hierarchy, and then I could edit the query again and delete year and month as separate because they’re no longer in use. Now, just to clarify, let’s just do that just to show that it is actually possible to remove these columns. Date combined relies on the year and month, but I’ve just got rid of the year and month. If you were to try to do this in something like Excel, so here’s an example. In Excel, if I was to then put up a formula date combined and just concatenate them together and then delete the existing columns, we would get an error because I’ve removed the data that I’m relying on.

However, in Power Bi, that’s not the case. It’s going through a series of steps, and it’s at this step here, the insert merge columns, that it’s done all the calculations. I no longer need to rely on these two columns to populate this. It’s something that’s done in the past as opposed to Excel, where it’s something that’s done in the present.

So I can now remove them and date combined will be unaffected. I’m not going to do that, however, because there are some elements in this visualization which actually rely still on the existing year and months, I think it may just be the year now. So I’d have to go through all of them, which is probably a good practice to do, to be honest. Go through all of them and change them to date hierarchy and only then. Now nothing is dependent on this individual column. Can I remove it? So in this video, we had a look at the start of the transform or add column for the date. We’ve had a look at formatting and merging. We’ve already seen that we already are splitting the column previously. And you saw once we have something that looks looks like a date. You can just transform it into a date just by changing the data type. In the next video, we’re going to have a look at how to extract data.

  1. Transform/Add Column – Text – Extract

Now in this video we’re going to have a look at the extract functions. Now, there may be a bit of similarity with split column because we are able to split a column by delimiter. So if we take this region for example and split this column by the delimiter of a spacebar at each occurrence, you can see that we got east of of England for instance in three separate columns. We could also say, well we don’t want to split all three, we just want the first instance. So we want the left and most instance, in other words, the first word east and then of England. Now, what we’re able to do in extract with delimiters is have text before a particular delimiter. So this doesn’t split it into two or more columns. This will in the case of transform overwrite your existing column. In the case of add column, it would create a new column with this information. So suppose I wanted the first word and I didn’t want to have to split by delimiter and then delete this second column, which is of course perfectly possible to do. Instead I could just extract all of the text before the delimiter so I can say I want all the text before the space mark. Now you could say I want to run from the end.

So in other words, instead of looking for the first space, I want to look for the last space. So if I just do this first one first. So you can see now we’ve just got the first word. If I go into the settings I can change that. So we’re still looking for space, but we’re now looking from the end of the input and so that is giving me everything but the last word. So this one here for instance is just one word, England. Everything but the last word is a blank.

Now, I’m personally not too keen on blanks. What I would do then is replace the values, replace a blank with a null or you could say well that’s fine giving me the first word, but what I actually want is the second word. And so we’re going to the advanced options and say we’re going to skip one space. So this will actually give me the first and second words. So Inner London, Northeast East Midlands and then I can have another extract which says I want all the text after the first space. So that would give me then the second word in total. So we have east west London. That was Inner London or out of London and east of England becomes of So what we’ve done is we’ve taken the first two words and then taken the second word of that. Now there is another way of doing that and that is to say I want to extract text between delimiters. So between a space and a space. And again you can say where you want to start and stop. So maybe you don’t want to start at the first space, you want to start at the second space, in which case that would be a number one.

So with this, if I’m looking text from a space to a space, that again gets me the second word. Now, extract can also give you things that you are probably quite familiar with. In Excel we’ve got length, for instance. So that just gives me the length of the text. So text length. Or you could say I want the first few characters. So I could say I want the first three characters. How many starting characters are you going to keep? So east of England becomes eastern becomes in and so forth. And that’s text start. So we’re not talking about left and right and mid as we’re doing in Excel. All of these start with text capital T and then a dot. And here we have start. So you won’t be too surprised to learn what the formula is for the last few characters. It’s text end. There is, however, one potentially surprising thing. Let’s go for extract range and let’s say I want from the third character onwards for four characters. Okay? So I’m going to say the starting index is three and number of characters four. So if you have a look at the top example, East Midlands.

What I want is the third character, which is here. And then I want four characters. So that should get me st space M. Let’s see whether that’s correct. Click OK. And you can see it doesn’t give me st space M, it gives me t space Mi. And the reason for that is because this one we are expecting in Excel that we have 123-45-6789 and so forth. However, that’s not the case in Power bi. The first character is not index number one, because power bi, like other things like C sharp and other programming languages, are what’s called a zero based index. And so the first character is not one but zero. And therefore, if I wanted the third character, I would need to go for index number two. So to get the third character, this text middle three comma four, we need to change that to two comma four. So now we have got starting from the third character and going forward the four characters by using two four. So you don’t have to worry about that with text start or text end.

You just have to worry about that for text middle. So this is a list of commands like len, left, right and mid, which are translated as text length, text start, text end and text middle. And it’s only the text middle that you have to remember to deduct one from your starting character.

  1. Transform/Add Column – Text – Parse

In this video we’re going to have a look at Pass XML Pass, which I just remind you is in the transform and in the add column. Generally you’d probably want to transform them, but first of all, what are XML and JSON? J s or n. Well, they are both things largely used on the internet. So if I start with XML which, which is the older one of the two that sounds for Extendable Markup Language and I’ll show you an example of an XML file that I’ve got in notepad. And this relates to a shopping trip. And I’m not going to go into all of the details of the XML, but suffice to say that it is fairly understandable. Yes, it’s got a very strict syntax, but this course isn’t about building XML or JSON files, it’s about understanding or how to get the data from XML or JSON files. So I had a shopping trip, the shopper name was me, the wife was nice and in fact I had two shopping trips. The first one I got bananas for $5, apples for four and cherries for three. And then the second time I got emeralds, diamonds and furniture.

I do my shopping in alphabetical order, roughly. And you’ll notice there’s no actual cost for any of those, it’s null. So this is a perfectly well formed XML file. Now, you don’t just have to have one XML file. Now, I’m going to happen to use just one example, but we could have example two and example three and so forth further down. So all of this is contained in just one cell in my Excel. Alternatively, you might be importing it from text documents or down from the web directly. There might be lots of things you’ll be doing with these XML files if you have calls to use them. So what I’m going to do now is load that XML file and we’ll see how it’s presented. So click OK and we have a new query. And as I say, all of this is in the one field, one row, one row. So it literally is one piece of information. Now so far it is a text file and it’s very difficult for me to drill down. For instance, I can’t tell you the cost of four Apples or Apples being $4. I simply can’t tell you that.

So what I can do is I can drill down and I do this by going to transform Path XML. And my XML is now changed into a table. Now we’ve previously had a look at how to get tables out. We’ve got to transform and structured column expand, or alternatively we can press this little button which also does the same job. So whichever you do, click on that and our table then will disappear in favor of the actual parts of it. So if we just have a look at our example again, we’ll see that we have a shopping trip. The two shopping trips we’ve got, and we have the shopping name and the weather. So let’s expand, and we can use the original column name as a prefix. I’m not going to, I just want to see the name, shopping trip and so forth. And shopping trip is still a table because you can see there are two shopping trips. So I’ll expand that further. And we can see we got an item and the shopping trip ID. So l one and l two, and then the actual contents of it as well. So let’s look again.

I’m not going to use the original column name as a prefix. So here we have shopping trip l one and shopping trip l two. So already our one roar has expanded to two rows because there are two shopping trips. So just to remind you, we had one row with an item number one. It contained two shopping trips. Now we have two rows, both with an item number one. So now I can expand each of these shopping trips, and in fact, if I wanted to click on the table and temporarily see what’s underneath, I can do. So you can see bananas, apples and cherries. So I’ll just undo that. Or I could just use the arrow keys and you can see it at the bottom. So now if I expand this further, you can see element and attribute. So the element being what it is, I got the bananas apples chair is the attribute being the thing which is within these brackets, within these less than and greater than.

But you can see it’s the cost really, you don’t need to focus too much on what elements are, what attributes are. All you need to know is you can keep expanding and expanding and expanding until you get to the actual data. So now we’ve got six rows of data, which is completely in line with what we’ve got in our XML file. And you can see bananas cost $5, apples cost four, cherries cost three, and emeralds, diamonds and furniture were done in shopping trip l two, with a null cost, an unknown cost. Now it’s at this stage that really all we’ve done is click expand, expand, expand, expand. That’s all we’ve done to get from here, apart from going to transform XML.

And we just click expand and expand and expand and expand. But it says here that you can be grateful that this is a macro, this is a series of instructions to the computer, because when the XML changes, assuming that it’s in the same structure, then you don’t have to do any of this, the computer’s done it for you. So you load the data, you refresh the data, and the analysis changes automatically. So imagine this was not one item. Imagine this was 100 items, 100 rows initially, right at the beginning. Well, it’s exactly the same principle. All I would do is exactly what I’ve done here, I will transform. I would pass the XML. And then I’ll click on the expand, expand, expand, either in the structured column or to be honest, I’ll just click this little icon here. So that is XML.

There is however, another slightly newer version and that’s called JSON. Now JSON stands for JavaScript Object notification and it’s used quite often in web pages. And you can see, I can describe the same sort of thing. It’s a different format. We use these little brackets, these little squiggly brackets, but it’s still all understandable, I think. So if I open up a JSON document, again, the computer has it all in one cell in Excel. So if I go to JSON, they’re all in one cell. And again what I can do is just transform it. I will pass it so that says to the computer, this is JSON, read it as such. So it’s a bit like changing the data type to JSON, except there is no JSON data type that the computer knows. And now you can see it is a record. So it contains the person’s name, in this case my name, because that was the first thing in the JSON. And then it contains another record within it, the shopping trip. So if I expand the record, you can see the shopping trip. There is one shopping trip.

I’ll expand this. Within the shopping trip there are items, each of those items is part of a record within a list. So I need to expand the list and that will expand it onto new rows and then expand the records. So list essentially going down, adding more rows, records, adding more columns. So now you can see just by clicking the same buttons as before, the computer is understood now because I’ve told it, it’s a JSON document that we have bananas at $5, apples at $4, cherries at $3. Now I just want to go back a step here to the list. You notice it said expand to new rows, but the alternative is extract values. So if I extract values with a comma between them, we can see that we get an error because it’s a bit too complicated for the computer to do. But if it wasn’t containing records, it was just containing numbers, then you could see or fruit, it would say apples, comma, bananas, comma, cherries.

But generally what I like to do anyway is expand to new records and then expand further. And then I’ve got everything back where I want. I want each column to represent a different unique thing. So this is the person who went shopping, this is their shopping number ID, this is what they bought, this is how much they cost. Whereas if you were to concatenate them into a single value, into a single piece of information, then I wouldn’t have that. I wouldn’t have apples on one raw column, cherries on another, raw bananas another. I would have apples, comma, bananas, commas, cherries. Not usually what I want. So if you get some JSON or some XML it’s very easy for you to be able to use them in Power query power Bi. All you’ve to do is import it and then pass it with JSON or XML.