As a data professional, you have almost certainly set up your own working environment in either R or Python. I am confident that nearly everyone who calls themselves a data analyst, business analyst, or data scientist has had exposure to at least one of them. For me, it was R for all four (4) of my undergraduate years. I was (and still am) more familiar with R syntax and its data processing workflow (i.e., which package to use for what).
And if you have looked into Kaggle to refine your data analytics or data science skills, you have probably noticed…
This post walks through the implementation of K-means clustering on Google Cloud Platform (BigQuery) in detail. Even readers with no previous experience with or exposure to the Google Cloud stack should be able to follow along easily.
In this post, I will explain the design and implementation of an ETL process using AWS services (Glue, S3, Redshift). Even readers with no previous experience with or exposure to AWS Glue or the AWS stack should be able to follow along easily.
Before we dive into the walkthrough, let’s briefly answer three (3) commonly asked questions:
What actually is AWS Glue? What are the features and advantages of using Glue? And what is a real-world scenario for it?
So, what is Glue? AWS Glue is simply a serverless ETL tool. ETL refers to three (3) processes that are commonly needed in most…