Azure Crash Course - WebJobs

Azure WebJobs is similar to AWS Lambda, a serverless technology.
You can run programs or scripts in WebJobs in your Azure App Service web app in three ways:
On demand – You trigger it whenever you need it.
Continuously – It keeps running in the background.
Scheduled – You can schedule it to run at a specific date and time.
The following file types are accepted:-
  • .cmd, .bat, .exe (using Windows cmd)
  • .ps1 (using PowerShell)
  • .sh (using Bash)
  • .php (using PHP)
  • .py (using Python)
  • .js (using Node.js)
  • .jar (using Java)
Interestingly, Azure WebJobs supports shell scripts, which AWS Lambda does not.
The WebJobs SDK does not yet support .NET Core.
Typical use cases for Azure WebJobs:-
  • Image processing or other CPU-intensive work.
  • Queue processing.
  • RSS aggregation.
  • File maintenance, such as aggregating or cleaning up log files.
  • Other long-running tasks that you want to run in a background thread, such as sending emails.
  • Any tasks that you want to run on a schedule, such as performing a back-up operation every night.
Do note:
  • Web apps in Free mode can time out after 20 minutes if there are no requests to the scm (deployment) site and the web app’s portal is not open in Azure. Requests to the actual site will not reset this.
  • Code for a continuous job needs to be written to run in an endless loop (see the sketch after this list).
  • Continuous jobs run only while the web app is up. Check what’s a Web App.
  • Basic and Standard modes offer the Always On feature which, when enabled, prevents web apps from becoming idle.
  • You can only debug continuously running WebJobs. Debugging scheduled or on-demand WebJobs is not supported.
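As a rough sketch of that endless-loop pattern, here is what a continuous WebJob written in Python might look like. This is a hypothetical run.py (WebJobs looks for a file named run.* as the job's entry point); the work function and polling interval are placeholders, not a prescribed API:

# run.py - minimal continuous WebJob sketch
import time

def process_pending_work():
    # Placeholder: poll a queue, mailbox, or storage location here
    # and handle whatever items are waiting.
    pass

while True:
    process_pending_work()
    time.sleep(10)  # back off between polls so the loop doesn't peg the CPU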
This article was written to give you a quick snapshot of WebJobs. You can follow this Azure Doc to see how to quickly deploy apps with WebJobs.
The Azure Crash Course series is created to give you a quick snapshot of Azure services. You can check out other services in this series over here.

AWS Crash Course - Redshift

Redshift is a data warehouse service from Amazon: a managed place where you can store and analyze huge amounts of data.
  • Redshift is a fully managed, petabyte-scale system
  • Amazon Redshift is based on PostgreSQL 8.0.2
  • It is optimized for data warehousing
  • Supports integrations and connections with various applications, including Business Intelligence tools
  • Redshift provides custom JDBC and ODBC drivers.
  • Redshift can be integrated with CloudTrail for auditing purposes.
  • You can monitor Redshift performance from CloudWatch (a sketch follows this list).
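As a hedged illustration of that CloudWatch monitoring, the boto3 sketch below pulls a standard Redshift metric. The cluster identifier "testdw" and the region are placeholders; substitute your own.

import boto3
from datetime import datetime, timedelta

# Fetch average CPU utilization for a Redshift cluster over the last hour.
cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')
stats = cloudwatch.get_metric_statistics(
    Namespace='AWS/Redshift',
    MetricName='CPUUtilization',
    Dimensions=[{'Name': 'ClusterIdentifier', 'Value': 'testdw'}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,             # one datapoint per 5 minutes
    Statistics=['Average'],
)
for point in sorted(stats['Datapoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], point['Average'])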
Features of Amazon Redshift
Supports VPC − Users can launch Redshift within a VPC and control access to the cluster through the virtual networking environment.
Encryption − Data stored in Redshift can be encrypted; encryption is configured when you create the cluster.
SSL − SSL encryption is used to encrypt connections between clients and Redshift.
Scalable − With a few simple clicks, you can choose vertical scaling (increasing instance size) or horizontal scaling (increasing compute nodes).
Cost-effective − Amazon Redshift is a cost-effective alternative to traditional data warehousing practices: there are no up-front costs, no long-term commitments, and pricing is on demand.
MPP (Massively Parallel Processing) – Redshift leverages parallel processing, using a large number of processors (or separate computers) to perform coordinated computations in parallel. This reduces computing time and improves query performance.
Columnar Storage – Redshift uses columnar storage, so it stores data tables by column rather than by row. The goal of a columnar database is to efficiently write and read data to and from disk storage, reducing the time it takes to return a query.
Advanced Compression – Compression is a column-level operation that reduces the size of data when it is stored, thus helping save storage space.
Types of nodes in Redshift
Leader Node
Compute Node
What do these nodes do?
Leader Node:- A leader node receives queries from client applications, parses the queries, and develops execution plans, which are an ordered set of steps to process these queries. The leader node then coordinates the parallel execution of these plans with the compute nodes. The good part is that you will not be charged for leader node hours; only compute nodes incur charges. If you run a single-node Redshift cluster, there is no separate leader node.
Compute Node:- Compute nodes execute the steps specified in the execution plans and transmit data among themselves to serve these queries. Intermediate results are sent back to the leader node for aggregation before being returned to the client applications. You can have 1 to 128 compute nodes.
From which sources can you load data into Redshift?
You can do it from multiple sources like:-
  • Amazon S3
  • Amazon DynamoDB
  • Amazon EMR
  • AWS Data Pipeline
  • Any SSH-enabled host on Amazon EC2 or on-premises
Redshift Backups and Restores
  • Redshift can take automatic snapshots of the cluster.
  • You can also take manual snapshots of the cluster (a sketch follows this list).
  • Redshift continuously backs up its data to S3.
  • Redshift attempts to keep at least three copies of the data.
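As a hedged sketch of a manual snapshot, the boto3 call below assumes a cluster named "testdw"; the snapshot identifier is a placeholder you choose yourself.

import boto3

# Take a manual snapshot of the cluster. Both identifiers are placeholders.
redshift = boto3.client('redshift', region_name='us-east-1')
redshift.create_cluster_snapshot(
    SnapshotIdentifier='testdw-manual-snap-1',
    ClusterIdentifier='testdw',
)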
Hope the above snapshot gives you a decent understanding of Redshift. If you want to try some hands-on, check this tutorial.
This series is created to give you a quick snapshot of AWS technologies. You can check out other AWS services in this series over here.

How to load data in Amazon Redshift Cluster and query it?

In the last post we checked How to build a Redshift Cluster. In this post we will see how to load data into that cluster and query it.
Pre-requisites:-
Download SQL Workbench
Download Redshift Driver
Once you have downloaded the above-mentioned pre-requisites, let's move ahead.
First we will obtain the JDBC URL.
  • Login to your AWS Redshift Console.
  • Click on the cluster you have created. If you have followed the last post, it will be "testdw".
  • In the "Configuration" tab look for the JDBC URL.
  • Copy the JDBC URL and save it in Notepad.
Now open SQL Workbench. You just have to click on the 32-bit or 64-bit exe as per your OS version. In my case I am using Windows 10 64-bit, so the exe name is SQLWorkbench64.
  • Click on File > Connect window.
  • In the bottom left of the "Select Connection Profile" window click on "Manage Drivers"
  • In the "Manage Drivers" window click on the folder icon, browse to the location of the Redshift driver you downloaded earlier and select it.
Fill in the other details in the "Manage Drivers" window as below.
Name:- Redshiftdriver JDBC 4.2
Classname:- com.amazon.redshift.jdbc.Driver
  • Click OK
  • Now in the "Select Connection Profile" window
Fill details as below. You can also refer the image below for this.
Driver:- Select the Redshift driver you added.
URL:- Mention the JDBC URL you saved earlier.
Username:- DB username you mentioned during cluster creation.
Password:- Enter password of the DB user .

Check the Autocommit box.
  • Finally click on OK.
  • If everything is configured correctly, you will get connected to the DB.
  • Try executing the query:
select * from information_schema.tables;
  • If your connection is successful you will see results in the window.
  • Now we will load some sample data, provided by AWS and kept on S3. In SQL Workbench, copy/paste the queries below and execute them to create the tables.
create table users(
 userid integer not null distkey sortkey,
 username char(8),
 firstname varchar(30),
 lastname varchar(30),
 city varchar(30),
 state char(2),
 email varchar(100),
 phone char(14),
 likesports boolean,
 liketheatre boolean,
 likeconcerts boolean,
 likejazz boolean,
 likeclassical boolean,
 likeopera boolean,
 likerock boolean,
 likevegas boolean,
 likebroadway boolean,
 likemusicals boolean);

create table venue(
 venueid smallint not null distkey sortkey,
 venuename varchar(100),
 venuecity varchar(30),
 venuestate char(2),
 venueseats integer);

create table category(
 catid smallint not null distkey sortkey,
 catgroup varchar(10),
 catname varchar(10),
 catdesc varchar(50));

create table date(
 dateid smallint not null distkey sortkey,
 caldate date not null,
 day character(3) not null,
 week smallint not null,
 month character(5) not null,
 qtr character(5) not null,
 year smallint not null,
 holiday boolean default('N'));

create table event(
 eventid integer not null distkey,
 venueid smallint not null,
 catid smallint not null,
 dateid smallint not null sortkey,
 eventname varchar(200),
 starttime timestamp);

create table listing(
 listid integer not null distkey,
 sellerid integer not null,
 eventid integer not null,
 dateid smallint not null  sortkey,
 numtickets smallint not null,
 priceperticket decimal(8,2),
 totalprice decimal(8,2),
 listtime timestamp);

create table sales(
 salesid integer not null,
 listid integer not null distkey,
 sellerid integer not null,
 buyerid integer not null,
 eventid integer not null,
 dateid smallint not null sortkey,
 qtysold smallint not null,
 pricepaid decimal(8,2),
 commission decimal(8,2),
 saletime timestamp);
Now load the sample data. In each query below, ensure you replace "<iam-role-arn>" with your IAM role ARN, so that a finished query looks like this:
copy users from 's3://awssampledbuswest2/tickit/allusers_pipe.txt' 
credentials 'aws_iam_role=arn:aws:iam::123456789123:role/redshiftrole' 
delimiter '|' region 'us-west-2';
Run the following COPY commands after substituting your ARN:
copy users from 's3://awssampledbuswest2/tickit/allusers_pipe.txt' 
credentials 'aws_iam_role=<iam-role-arn>' 
delimiter '|' region 'us-west-2';

copy venue from 's3://awssampledbuswest2/tickit/venue_pipe.txt' 
credentials 'aws_iam_role=<iam-role-arn>' 
delimiter '|' region 'us-west-2';

copy category from 's3://awssampledbuswest2/tickit/category_pipe.txt' 
credentials 'aws_iam_role=<iam-role-arn>' 
delimiter '|' region 'us-west-2';

copy date from 's3://awssampledbuswest2/tickit/date2008_pipe.txt' 
credentials 'aws_iam_role=<iam-role-arn>' 
delimiter '|' region 'us-west-2';

copy event from 's3://awssampledbuswest2/tickit/allevents_pipe.txt' 
credentials 'aws_iam_role=<iam-role-arn>' 
delimiter '|' timeformat 'YYYY-MM-DD HH:MI:SS' region 'us-west-2';

copy listing from 's3://awssampledbuswest2/tickit/listings_pipe.txt' 
credentials 'aws_iam_role=<iam-role-arn>' 
delimiter '|' region 'us-west-2';

copy sales from 's3://awssampledbuswest2/tickit/sales_tab.txt'
credentials 'aws_iam_role=<iam-role-arn>'
delimiter '\t' timeformat 'MM/DD/YYYY HH:MI:SS' region 'us-west-2';
Once you have loaded the data, you can run sample queries like the one below in SQL Workbench.
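If you prefer to query programmatically instead of through SQL Workbench, here is a hedged Python sketch using psycopg2. The endpoint, database name, and credentials are placeholders (Redshift listens on port 5439 by default); the query itself finds the top buyers by tickets bought, in the spirit of the AWS sample queries for this dataset.

import psycopg2

# Connect to the cluster and run a sample query against the tickit data.
conn = psycopg2.connect(
    host='testdw.xxxxxxxx.us-east-1.redshift.amazonaws.com',  # placeholder
    port=5439,               # Redshift's default port
    dbname='testdb',
    user='testuser',
    password='<your password>',
)
cur = conn.cursor()
# Top 10 buyers by total quantity of tickets bought.
cur.execute("""
    select buyerid, sum(qtysold) as total_quantity
    from sales
    group by buyerid
    order by total_quantity desc
    limit 10;
""")
for row in cur.fetchall():
    print(row)
conn.close()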
Congrats! You have created the Redshift cluster, loaded data into it, and run queries against it.
Refer to this post if you want to reset the master user password.
Don't forget to clean up the cluster or you will be billed.
  • For deleting the cluster, just click on the cluster (in our case it's testdw) in the AWS console.
  • Click on the "Cluster" drop-down and select Delete.
That will clean up everything.
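Alternatively, here is a rough boto3 sketch of the same cleanup. It skips the final snapshot, which makes the deletion unrecoverable, so use it only when you truly want the data gone.

import boto3

# Delete the "testdw" cluster without taking a final snapshot.
# WARNING: with SkipFinalClusterSnapshot=True the data is unrecoverable.
redshift = boto3.client('redshift', region_name='us-east-1')
redshift.delete_cluster(
    ClusterIdentifier='testdw',
    SkipFinalClusterSnapshot=True,
)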
Hope this guide was helpful to you! Do let me know in the comments section if you have any queries or suggestions.

How to create a multi-node Redshift cluster?

In this tutorial we will see how to create a multi-node Amazon Redshift cluster. If you are new to Redshift, refer to our AWS Crash Course – Redshift.
Pre-requisites:-
You will have to create an IAM role and a security group for Redshift.
Refer to How to create a Security Group and How to create an IAM role.
Once you have created the role and security group, follow the steps below:
  • Login to the Amazon Redshift Console. For this tutorial we will be launching the cluster in the US East region.
  • In the Dashboard click on Launch Cluster.
In the cluster details section fill in the details as below:-
Cluster identifier:- testdw
Database name:- testdb
Master user name:- testuser
Master user password:- <your password>
  • In the node configuration section, select your node type and cluster details.
Note:- We will be creating a 2-node cluster, so select multi-node and set the number of compute nodes to 2.
  • Once done, click Continue.
  • In the additional configuration screen, leave the top section as default. Just change “VPC security groups” to “redshiftsg” and “Available roles” to “redshiftrole”, which we created earlier.
  • Click on Continue and review everything. It will show you the approximate cost of running this cluster, but if you are using the “free tier” you may run it for free, provided you have not exhausted your “free tier” limits.
  • Finally, click on Launch Cluster.
If you have followed everything as in the above tutorial, your cluster will be ready in around 15 minutes.
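If you would rather script the launch than click through the console, here is a hedged boto3 sketch of the equivalent call. The node type, security-group ID, and role ARN are illustrative placeholders; the other values mirror this tutorial.

import boto3

# Launch a 2-node cluster equivalent to the console steps above.
redshift = boto3.client('redshift', region_name='us-east-1')
redshift.create_cluster(
    ClusterIdentifier='testdw',
    DBName='testdb',
    MasterUsername='testuser',
    MasterUserPassword='<your password>',
    NodeType='dc1.large',                  # placeholder node type
    ClusterType='multi-node',
    NumberOfNodes=2,
    VpcSecurityGroupIds=['sg-xxxxxxxx'],   # the "redshiftsg" group's ID
    IamRoles=['arn:aws:iam::<account-id>:role/redshiftrole'],
)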

Hope this tutorial was helpful to you. In the next post we will discuss how to load data into this cluster and query the data.

How to create an IAM role in AWS?

In this tutorial we will be creating an IAM role which we will use to launch an Amazon Redshift cluster.
  • Login to your IAM Console.
  • In the left navigation pane, click on “Roles”.
  • Click on “Create New Role”.
  • In the AWS Service Role list, select “Amazon Redshift”.
  • In the Policy Type, select “AmazonS3FullAccess” and click on “Next Step”.
  • Enter details as below:-
Role Name:- redshiftrole
Role description:- Amazon Redshift Role
  • Finally, click on Create Role.
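For reference, here is a hedged boto3 sketch of the same role creation. The trust policy lets the Redshift service assume the role, and the attached policy is the managed AmazonS3FullAccess policy selected above; the role name mirrors this tutorial.

import boto3
import json

# Create the "redshiftrole" role programmatically.
iam = boto3.client('iam')
trust_policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        'Principal': {'Service': 'redshift.amazonaws.com'},
        'Action': 'sts:AssumeRole',
    }],
}
iam.create_role(
    RoleName='redshiftrole',
    Description='Amazon Redshift Role',
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
# Attach the AWS-managed S3 full access policy, as in the console steps.
iam.attach_role_policy(
    RoleName='redshiftrole',
    PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess',
)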