In this post, I will explain the design and implementation of an ETL process using AWS services (Glue, S3, Redshift). Readers with no previous experience of AWS Glue or the wider AWS stack should be able to follow along easily.
Before we dive into the walkthrough, let’s briefly answer three commonly asked questions:
What actually is AWS Glue? What are the features and advantages of using Glue? And what does a real-world scenario look like?
So what is Glue? AWS Glue is a serverless ETL tool. ETL refers to the three processes commonly needed in most data analytics and machine learning workflows: extraction, transformation, and loading. That is, extracting data from a source, transforming it into the form applications need, and then loading it into the data warehouse. AWS helps us make the magic happen: the AWS console UI offers straightforward ways to perform the whole task from start to finish. …
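To make the three ETL stages concrete, here is a minimal sketch in plain Python. It is not Glue code and does not use any AWS API: the in-memory CSV string stands in for a file in S3, the SQLite database stands in for Redshift, and the names `RAW_CSV`, `sales`, `extract`, `transform`, `load`, and `run_etl` are all hypothetical, chosen only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample data, standing in for a CSV file stored in S3.
RAW_CSV = """customer_id,amount
1,10.50
2,not_a_number
1,4.50
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast columns to proper types and drop rows that fail."""
    clean = []
    for row in rows:
        try:
            clean.append((int(row["customer_id"]), float(row["amount"])))
        except ValueError:
            # Skip rows whose values cannot be parsed.
            continue
    return clean


def load(records, conn):
    """Load: write the cleaned records into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    """Run the three stages in order: extract, transform, load."""
    load(transform(extract(text)), conn)


# SQLite in memory stands in for the data warehouse (an assumption).
conn = sqlite3.connect(":memory:")
run_etl(RAW_CSV, conn)
row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
total_for_customer_1 = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE customer_id = 1"
).fetchone()[0]
```

Keeping each stage as its own function mirrors how an ETL job is usually reasoned about: the source, the cleaning rules, and the destination can each change independently. In a managed service such as Glue, the same three roles are filled by the service's own connectors and job scripts rather than by hand-written functions like these.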