Google Professional Data Engineer – Managed Instance Groups and Load Balancing part 2

Forwarding Rules, Target Proxy and URL Maps In the last lecture, we saw a quick overview of all the components that make up an HTTP(S) load balancer. In this lecture and in subsequent lectures, we look at each of these components in some detail. We’ll start off by looking at the global forwarding rule. These are rules that you configure on the Cloud Platform in order to determine where particular traffic will be sent. Every forwarding rule matches a particular IP address and protocol, and optionally, you can choose…
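To make the relationship between these components a little more concrete, here is a purely illustrative Python sketch, not the actual Cloud API, with made-up names and addresses, of how a request flows from a global forwarding rule, through a target proxy, to a URL map that picks a backend service.

    # Illustrative model only: how a forwarding rule, target proxy and URL map
    # cooperate to route a request. All names and addresses are hypothetical.

    # A global forwarding rule matches an external IP address, protocol and port,
    # and hands matching traffic to a target proxy.
    forwarding_rule = {
        "ip_address": "203.0.113.10",
        "protocol": "TCP",
        "port": 443,
        "target_proxy": "https-proxy",
    }

    # The target proxy terminates the connection and consults a URL map.
    url_map = {
        "default_service": "web-backend",
        "path_rules": {
            "/video/": "video-backend",
            "/static/": "static-backend",
        },
    }

    def route(path):
        """Return the backend service the URL map would pick for this path."""
        for prefix, service in url_map["path_rules"].items():
            if path.startswith(prefix):
                return service
        return url_map["default_service"]

    print(route("/video/cats.mp4"))  # -> video-backend
    print(route("/index.html"))      # -> web-backend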

Read More

Google Professional Data Engineer – Managed Instance Groups and Load Balancing part 1

Managed and Unmanaged Instance Groups In this section, we’ll discuss managed instance groups and load balancing. Managed instance groups are basically a group of identical VMs that are generated using something called an instance template, and this group of VMs can be scaled automatically to meet increasing traffic requirements. This is why managed instance groups and load balancing typically go together. Load balancing involves distributing requests across instances so that requests go to those instances which have the capacity to handle them. Managed instance groups are a pool of similar…
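As a toy illustration of that idea, here is a short Python sketch, not the real autoscaler or load balancer and with made-up capacity numbers: requests are sent to whichever instance still has spare capacity, and when every instance is busy a new one is stamped out from the template.

    # Toy sketch of a managed instance group: identical instances are created
    # from a template, and requests go to instances that have capacity.
    from dataclasses import dataclass

    @dataclass
    class Instance:
        name: str
        capacity: int        # max concurrent requests (hypothetical figure)
        in_flight: int = 0

    INSTANCE_TEMPLATE = {"capacity": 3}   # stand-in for a real instance template

    group = [Instance(f"vm-{i}", **INSTANCE_TEMPLATE) for i in range(2)]

    def dispatch(request_id):
        """Send a request to an instance with spare capacity, scaling out if needed."""
        target = next((vm for vm in group if vm.in_flight < vm.capacity), None)
        if target is None:                            # everyone is full: scale out
            target = Instance(f"vm-{len(group)}", **INSTANCE_TEMPLATE)
            group.append(target)
        target.in_flight += 1
        return f"request {request_id} -> {target.name}"

    for r in range(8):
        print(dispatch(r))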

Read More

Google Professional Data Engineer – Appendix: Hadoop Ecosystem part 6

Streams Intro Let’s now introduce Apache Flink, and as always, here is a question which I’d like you to ponder as we discuss this technology. How, if at all, can MapReduce be used to maintain a running summary of the real-time data sent in from sensors? These sensors send in temperature readings every five minutes. You would like to calculate, say, the average of the temperature readings from all these sensors. And that average needs to be updated forever, in perpetuity; your MapReduce job should never stop…
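To see why this sits awkwardly with a batch-oriented MapReduce job, here is a small Python sketch, with made-up readings, of the kind of running summary the question describes: an average that is updated every time a new temperature reading arrives, with no natural end point.

    # Sketch of a never-ending running average over a stream of sensor readings.
    # The readings below are made up; a real pipeline would consume a live stream.
    def running_average(readings):
        count, total = 0, 0.0
        for temp in readings:        # in a true stream, this loop never ends
            count += 1
            total += temp
            yield total / count      # the summary is refreshed on every reading

    sample = [21.5, 22.0, 23.1, 22.7]          # hypothetical 5-minute readings
    for avg in running_average(sample):
        print(f"current average: {avg:.2f}")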

Read More

Google Professional Data Engineer – Appendix: Hadoop Ecosystem part 5

Spark We’ve now finished discussing Hadoop, Hive, and Pig. In the Google Cloud Platform world, Hadoop, Hive, and Pig are all used via Dataproc, which is a managed Hadoop service. It turns out that Pig, Hive, and Spark are services which are available by default on every instance of a Dataproc cluster. So let’s close the loop by now discussing Spark, which is an incredibly popular technology these days. As usual, I have a question that I’d like you to think about as we discuss Spark. And that question is…
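For context, here is roughly what a minimal PySpark job looks like; this is a sketch only, but something of this shape could be submitted to a Dataproc cluster, where Spark is already installed.

    # Minimal PySpark sketch: count words in a small in-memory dataset.
    # On Dataproc this would normally be submitted as a job, not run inline.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("word-count-sketch").getOrCreate()

    lines = spark.sparkContext.parallelize(["spark on dataproc", "hive pig spark"])
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))

    print(counts.collect())
    spark.stop()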

Read More

Google Professional Data Engineer – Appendix: Hadoop Ecosystem part 4

Windowing Hive We’ve explored partitioning and bucketing as ways to divvy up table data into more manageable chunks. We’ve also seen two types of join optimizations: the use of the smaller table in memory and the use of map-only joins. Let’s now turn our attention to the third bit of OLAP functionality offered by Hive, which is windowing functions. Windowing functions can be thought of as syntactic sugar. There is almost nothing that windowing functions can do for you which traditional queries cannot. But the real value and the importance…
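To give a flavor of what a windowing function buys you, here is a small sketch using Spark SQL window functions, which express the same idea as Hive’s OVER clause; the data and column names are hypothetical. A per-sensor running average is computed with one window specification and no self-joins.

    # Windowing sketch with Spark SQL window functions (Hive's OVER (...) clause
    # works the same way). The table contents and column names are made up.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.appName("windowing-sketch").getOrCreate()

    df = spark.createDataFrame(
        [("s1", 1, 21.0), ("s1", 2, 23.0), ("s2", 1, 19.0), ("s2", 2, 20.0)],
        ["sensor", "ts", "temp"],
    )

    # Running average per sensor, ordered by time.
    w = (Window.partitionBy("sensor").orderBy("ts")
               .rowsBetween(Window.unboundedPreceding, Window.currentRow))

    df.withColumn("running_avg", F.avg("temp").over(w)).show()
    spark.stop()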

Read More

Google Professional Data Engineer – Appendix: Hadoop Ecosystem part 3

Hive vs. RDBMS Picking the right technology for the right use case is really important these days. And so I’d like you to think about this question: why would we never use Hive, or BigQuery for that matter, for OLTP applications? OLTP stands for online transaction processing. These are the workloads where traditional databases tend to dominate. So, as we discuss the differences between Hive and a traditional RDBMS, do keep this question in mind. Also, keep in mind the converse of this question: why would we never do the reverse? Why…

Read More

Google Professional Data Engineer – Appendix: Hadoop Ecosystem part 2

MapReduce Let’s now move on to the next part, the next building block of Hadoop, and that is MapReduce, which of course is the parallel programming paradigm that really is at the heart of it all. While understanding MapReduce, I’d like you to try and answer this question: why is there such a strong need for some kind of SQL interface on top of MapReduce? Both Hive and BigQuery are exactly this: they are both SQL interfaces on top of MapReduce-like activities. And the question for you is why…
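To make the paradigm concrete, here is a tiny self-contained Python sketch of the map, shuffle and reduce phases for a word count, the canonical MapReduce example; comparing it with the one-line SQL equivalent (a GROUP BY with a COUNT) hints at why a SQL layer on top is so appealing.

    # Plain-Python sketch of the MapReduce phases for a word count.
    # A real job would run many mappers and reducers in parallel on a cluster;
    # the input documents here are made up.
    from collections import defaultdict

    documents = ["big data on hadoop", "sql on hadoop"]

    # Map phase: turn each input record into (key, value) pairs.
    mapped = [(word, 1) for doc in documents for word in doc.split()]

    # Shuffle phase: group the pairs by key.
    grouped = defaultdict(list)
    for word, count in mapped:
        grouped[word].append(count)

    # Reduce phase: combine each key's values into a final result.
    reduced = {word: sum(counts) for word, counts in grouped.items()}
    print(reduced)   # {'big': 1, 'data': 1, 'on': 2, 'hadoop': 2, 'sql': 1}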

Read More

Google Professional Data Engineer – Appendix: Hadoop Ecosystem part 1

Introducing the Hadoop Ecosystem Hello and welcome to this module on the Hadoop ecosystem. We are going to spend a fair bit of time discussing Hadoop and some of the important components in that ecosystem. And there are two reasons why this is a good use of time. The first reason has to do with the genesis of the Hadoop ecosystem. Recall that Hadoop, MapReduce, and HDFS were actually born out of Google technologies. And so there is a close mapping between the Hadoop ecosystem and the corresponding tools…

Read More

Amazon AWS SysOps – Security and Compliance for SysOps part 5

MFA + IAM Credentials Report So, as we all know, IAM can be integrated with MFA, and MFA is multi-factor authentication. Why would you use MFA? Well, you use it because it adds a level of security. That means that whenever you log in, you’re also prompted for a code and you have to enter that code. And that code must be in your possession. So that just guarantees another level of security: if your password gets compromised, a hacker still cannot get access to your phone or your codes or…
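Since the lecture title also mentions the IAM credentials report, here is a hedged boto3 sketch of how one might pull that report and check a user’s MFA devices; it assumes boto3 is installed, AWS credentials are configured, and the user name is hypothetical.

    # Sketch only: fetch the IAM credentials report and check MFA for one user.
    # Assumes configured AWS credentials; the user name "alice" is made up.
    import boto3

    iam = boto3.client("iam")

    iam.generate_credential_report()            # ask AWS to (re)build the report
    report = iam.get_credential_report()        # may need a short retry while it builds
    print(report["Content"].decode()[:200])     # CSV of users, MFA status, key age, ...

    devices = iam.list_mfa_devices(UserName="alice")
    print("MFA enabled:", len(devices["MFADevices"]) > 0)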

Read More

Amazon AWS SysOps – Security and Compliance for SysOps part 4

KMS Overview + Encryption in Place Okay, so now let’s talk about KMS. And KMS stands for Key Management Service. So anytime you hear encryption in an AWS service, most likely it will involve KMS. And KMS is an easy way to control access to your data. The data is going to be encrypted with keys, and AWS KMS will manage these keys for us. So, KMS is a key store; we have some control over it, but there are some things we cannot do with this store. And so that’s how…
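To ground that a little, here is a hedged boto3 sketch of encrypting and decrypting a small payload with KMS; the key alias is hypothetical, and the call assumes such a key already exists in your account.

    # Sketch only: encrypt and decrypt a small payload (up to 4 KB) with AWS KMS.
    # Assumes configured AWS credentials and an existing key behind the
    # hypothetical alias "alias/my-app-key".
    import boto3

    kms = boto3.client("kms")

    encrypted = kms.encrypt(KeyId="alias/my-app-key", Plaintext=b"super secret")
    blob = encrypted["CiphertextBlob"]            # safe to store next to the data

    decrypted = kms.decrypt(CiphertextBlob=blob)  # KMS decides who is allowed to do this
    print(decrypted["Plaintext"])                 # b'super secret'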

Read More