Cloud Vedas: AWS

Review of acloudguru course AWS Certified Solutions Architect – Associate

In this post we review the acloudguru course AWS Certified Solutions Architect – Associate on Udemy.
When I decided to go for the AWS Certified Solutions Architect – Associate exam, I was confused about how to start preparing. The syllabus of the exam is vast and covers a lot of topics, so I needed someone to guide me.
After going through many forums I came to the conclusion that I had two options.

The first option was to attend the course at one of the training institutes. The official training costs range from $800 to $1500. Even the institutes offering non-official training quoted $400 to $900. This was a lot of money, considering that my last company was not sponsoring the course and I had to pay for it myself.
The second option was to look for online courses. These courses are cheap, ranging from $10 to $100, but I was not sure about the quality of the content or how much I could learn with just online training.

After going through multiple forums and blogs I shortlisted online courses from two vendors: Linux Academy and acloudguru.
Linux Academy has good material and people have written good feedback about it. But the problem is that you have to buy a monthly subscription, with plans starting from $29. I knew that with a full-time job I wouldn’t be able to complete the course properly in 1 month.
So I checked the acloudguru course. Many people have given good feedback for this course too. The course was listed on the acloud.guru website for $29, but I found the same course on Udemy for $10 in a special offer at that time. Udemy also gives you free lifetime access to the course. This suited me best, so I purchased it. Also, once you buy the course on Udemy you can get free access to the same course on the acloud.guru website.
Now coming to the acloudguru AWS Certified Solutions Architect – Associate course itself.
As of Apr-2018 the course has around 22 hours of on-demand video.
The course is very comprehensive and gives you a good understanding of the topics. The instructor (Ryan) is very energetic and tries to teach each topic in a very simple way. So even if you don’t have a background in AWS, you will understand the topics without much effort.
It includes a lot of practical labs which you can follow by creating your own AWS free tier account. It covers almost all topics of the exam, so you don’t have to worry about the syllabus.
It also has quizzes throughout the course for knowledge checks, and full-length mock exams to test your understanding.
One of the recommendations for exam preparation is to go through the whitepapers. Each whitepaper is a big PDF of 50 to 200 pages, and for the associate level you are supposed to read around 6 to 7 of them. This can be overwhelming, but Ryan covers the gist of these whitepapers in his course.
The course is very useful to get you started on the AWS journey. But the course alone is not enough to clear the exam. You need good practical knowledge of AWS to clear the exam, and this course gives you the right direction.
You can get good sample exam questions for practice from another acloudguru course on Udemy, Exam Questions – AWS Solution Architect Associate, or from Jon’s AWS Certified Solutions Architect Associate Practice Exams, for which most students have written great feedback.

How to create an IAM user in AWS

In this post we will see how to create an IAM user which can be used to access S3 using CLI.

Login to AWS IAM console.
In the left pane click on “Users”
Click on “Add user”

a) User name: S3User
b) Access Type: Check Programmatic Access.

Click Next

a) Select Attach existing policies directly.
b) Search for AmazonS3FullAccess and select it.

Click Next.
Review everything and click “Create user”

It will show you the “Access key ID” and “Secret access key”. Save them, as you won’t be able to see the “Secret access key” again once you close this page.
In this tutorial we have selected the existing S3 policy, but you can also attach your own customized policy to make the access more secure.
Congrats, you have created an IAM user successfully. You can use this user to access S3 using the CLI. Check this post for details.
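As a rough illustration of the customized policy idea mentioned in this post, the sketch below builds a policy document that limits the user to a single bucket instead of granting AmazonS3FullAccess. The bucket name my-example-bucket, the function name and the chosen actions are assumptions made for illustration only; adjust them to what your user really needs.

```python
import json


def build_s3_bucket_policy(bucket_name):
    """Return an IAM policy document limited to one S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # Object reads and writes apply to the keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "my-example-bucket" is a hypothetical bucket name.
    print(json.dumps(build_s3_bucket_policy("my-example-bucket"), indent=2))
```

The printed JSON can be pasted into the policy editor of the IAM console when you create your own policy.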

AWS Crash Course - EMR

What is EMR?

AWS EMR (Elastic MapReduce) is a managed Hadoop framework.
It provides an easy, cost-effective and highly scalable way to process large amounts of data.
It can be used for many things such as indexing, log analysis, financial analysis, scientific simulation and machine learning.

Cluster and Nodes

The centerpiece of EMR is the cluster.
A cluster is a collection of EC2 instances, also called nodes.
All nodes of an EMR cluster are launched in the same availability zone.
Each node has a role in the cluster.

Types of EMR Cluster Nodes
Master Node:- It’s the main boss, which manages the cluster by running software components and distributing tasks to the other nodes. The master node monitors task status and the health of the cluster.
Core Node:- It’s a slave node which “runs tasks” and “stores data” in HDFS (Hadoop Distributed File System).
Task Node:- This is also a slave node, but it only “runs tasks”. It doesn’t store any data. It’s an optional node.
Cluster Types
EMR has two types of clusters
1) Transient :- These are clusters which are shut down once the job is done. They are useful when you don’t need the cluster running all day long and can save money by shutting it down.
2) Persistent :- Persistent clusters are those which need to be always available, either to process a continuous stream of jobs or because you want the data to be always available on HDFS.
Different Cluster States
An EMR cluster goes through multiple states as described below:-
STARTING – The cluster provisions, starts, and configures EC2 instances.
BOOTSTRAPPING – Bootstrap actions are being executed on the cluster.
RUNNING – A step for the cluster is currently being run.
WAITING – The cluster is currently active, but has no steps to run.
TERMINATING – The cluster is in the process of shutting down.
TERMINATED – The cluster was shut down without error.
TERMINATED_WITH_ERRORS – The cluster was shut down with errors.
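
If you poll a cluster from a script, it helps to know which of the states above mean "still alive" and which mean "gone". Below is a minimal sketch using only the state names listed in this post. The helper names is_finished and is_idle are made up for illustration and are not part of any AWS SDK.

```python
# Cluster states exactly as listed in this post.
ACTIVE_STATES = {"STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING", "TERMINATING"}
TERMINAL_STATES = {"TERMINATED", "TERMINATED_WITH_ERRORS"}


def is_finished(state):
    """True once the cluster has shut down, with or without errors."""
    if state not in ACTIVE_STATES | TERMINAL_STATES:
        raise ValueError(f"unknown cluster state: {state}")
    return state in TERMINAL_STATES


def is_idle(state):
    """True when the cluster is up but has no steps to run."""
    return state == "WAITING"
```

For example, a persistent cluster that sits in WAITING for a long time is a candidate for shutting down to save money.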

Types of filesystem in EMR
Hadoop Distributed File System (HDFS)
Hadoop Distributed File System (HDFS) is a distributed, scalable file system for Hadoop. HDFS distributes the data it stores across instances in the cluster, storing multiple copies of data on different instances to ensure that no data is lost if an individual instance fails. HDFS is ephemeral storage that is reclaimed when you terminate a cluster.
EMR File System (EMRFS)
Using the EMR File System (EMRFS), Amazon EMR extends Hadoop to add the ability to directly access data stored in Amazon S3 as if it were a file system like HDFS. You can use either HDFS or Amazon S3 as the file system in your cluster. Most often, Amazon S3 is used to store input and output data and intermediate results are stored in HDFS.
Local File System
The local file system refers to a locally connected disk. When you create a Hadoop cluster, each node is created from an Amazon EC2 instance that comes with a preconfigured block of preattached disk storage called an instance store. Data on instance store volumes persists only during the lifecycle of its Amazon EC2 instance.
Programming languages supported by EMR

Perl
Python
Ruby
C++
PHP
R

EMR Security

EMR integrates with IAM to manage permissions.
EMR has Master and Slave security groups for nodes to control the traffic access.
EMR supports S3 server-side and client-side encryption with EMRFS.
You can launch EMR clusters in your VPC to make it more secure.
EMR integrates with CloudTrail, so you will have a log of all activities done on the cluster.
You can login via ssh to EMR cluster nodes using EC2 Key Pairs.

EMR Management Interfaces

Console :- You can manage your EMR clusters from the AWS EMR Console.
AWS CLI :- The command line gives you a rich way of controlling EMR. Refer to the EMR CLI here.
Software Development Kits (SDKs) :- SDKs provide functions that call Amazon EMR to create and manage clusters. They are currently available only for the supported languages mentioned above. You can check some sample code and libraries here.
Web Service API :- You can use this interface to call the web service directly using JSON. You can get more information from the API Reference Guide.

EMR Billing

You pay for the EC2 instances used in the cluster and for EMR itself.
You are charged per instance hour.
EMR supports On-Demand, Spot, and Reserved Instances.
As a cost-saving measure it is recommended that task nodes be Spot instances.
It’s not a good idea to use Spot instances for the master or core nodes. Core nodes store HDFS data, which you will lose once the node is terminated, and losing the master node brings down the whole cluster.
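
To get a feel for the billing points above, here is a small back-of-the-envelope sketch. It assumes every node is billed the EC2 rate plus the EMR rate for each hour it runs. The rates in the example are made-up numbers, not real AWS prices, so look up the current prices for your instance type and region.

```python
def estimate_cluster_cost(node_count, hours, ec2_rate, emr_rate):
    """Estimated cost in dollars: each node pays the EC2 rate plus the EMR rate per hour."""
    return node_count * hours * (ec2_rate + emr_rate)


if __name__ == "__main__":
    # Hypothetical rates: $0.25/hour for EC2 and $0.0625/hour for EMR.
    cost = estimate_cluster_cost(node_count=3, hours=10, ec2_rate=0.25, emr_rate=0.0625)
    print(f"Estimated cost: ${cost:.2f}")
```

Running the same estimate with a lower EC2 rate for the task nodes shows roughly how much Spot instances could save.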

If you want to try some EMR hands-on, refer to this tutorial.

This AWS Crash Course series is created to give you a quick snapshot of AWS technologies. You can check out other AWS services in this series over here.

Solved: How to change hostname in AWS EC2 instance of RHEL 7

In our last post we saw how to change the hostname of a RHEL server.
But if you are using the RHEL 7 AMI provided on the AWS Marketplace, the steps are slightly different.
First log in to your EC2 instance. (Check this post to learn how to log in to an AWS EC2 Linux instance.)
Once you have logged in to your EC2 instance, execute the command below.

 sudo hostnamectl set-hostname --static cloudvedas

(Here “cloudvedas” is the new hostname.)
To make the change persistent across reboots, follow the further steps below.
Using the vi or vim editor, edit the file /etc/cloud/cloud.cfg

sudo vi /etc/cloud/cloud.cfg

At the end of file add the following line and save the file

preserve_hostname: true

Finally reboot the server

sudo reboot

Once the server is up, check the hostname.

ec2-user# hostname

cloudvedas

ec2-user#

It should now show you the new hostname.
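
If you have to do this on many servers, editing the file by hand gets tedious. The sketch below shows one way to make the cloud.cfg change idempotent, so running it twice does not add a duplicate line. It works on the file content as a string and assumes preserve_hostname is a top-level key. The function name is made up for illustration.

```python
def ensure_preserve_hostname(cfg_text):
    """Return cloud.cfg text with 'preserve_hostname: true' set exactly once."""
    result = []
    found = False
    for line in cfg_text.splitlines():
        if line.strip().startswith("preserve_hostname:"):
            if not found:
                # Overwrite an existing setting rather than appending a duplicate.
                result.append("preserve_hostname: true")
                found = True
            # Drop any further copies of the key.
            continue
        result.append(line)
    if not found:
        # Same as the manual step: add the line at the end of the file.
        result.append("preserve_hostname: true")
    return "\n".join(result) + "\n"
```

You would read /etc/cloud/cloud.cfg, pass its content through this function and write the result back with root permissions. Take a backup of the file first.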

Solved: How to change the IAM role in Redshift cluster

In our last post we checked How to create a Redshift Cluster?
In this post we will see how to change the IAM role assigned to a Redshift cluster. (Check the earlier post to know how to create an IAM role.)
Note: Be careful to test the new role in dev first. Don’t try this directly in prod.

First create a role as per your requirement.
Next log in to the Redshift Console. Check that you are in the correct region.
Select the cluster by clicking on its check box.
At the top you will see the option “Manage IAM Roles”. Click on that.
Select the role you created from the drop-down and click “Apply Changes”.
It will take 3-5 minutes for the changes to take effect.

Solved: How to reset the Master User Password of Redshift Cluster

In our last post we checked How to create a Redshift Cluster?
At times you may want to reset the password of the master user in the Redshift DB.
In this post we will see how to reset the password of the master user in a Redshift cluster.
Note:- Ensure that you really want to change the password. Changing the password in production can break apps which have the old password stored somewhere.

Go to the Redshift console and click on the cluster for which you want to reset the password. In our case it was “testdw”.
Once you are inside the cluster configuration, click on the “Cluster” drop-down and select “Modify”.
You will see the field “Master user password”. Enter the new password in it.
Finally click “Modify”.

This will change the master user password. Try connecting to the cluster with the new password.

Solved: How to login to AWS EC2 Linux Instance

In this post we will discuss how you can log in to your AWS EC2 Linux instance using PuTTY.
Pre-requisites :-

Your private key with the .pem extension, which you chose while creating the EC2 instance.
Download PuTTY
Download PuTTYgen

Once you are done with the pre-requisites, let’s move ahead.
Convert .pem key to .ppk

First we will convert the .pem key to a .ppk key.
Open the PuTTYgen you downloaded.
Click on “Load”. Browse and select your private key with the .pem extension.

Now click on “Save private key”.
It will ask whether you want to save the key without a passphrase. A passphrase is like an additional password when you log in. If you want one, enter it in “Key passphrase” before saving.
For this exercise I just clicked on “Yes”.

Save the key with a name you like. Check that the new key file now has the .ppk extension.

Using the Key for Login
Now we will use the .ppk key we just created to log in to our EC2 instance.

Open the PuTTY that we downloaded earlier.
In the left pane click on “Session”.

In the host name field enter your server details, i.e. the user name and IP.
If you are using the Amazon Linux image the default user is ec2-user, so the entry will be like ec2-user@33.44.55.66 with Port 22.

In the left navigation pane click on “Connection” and expand it.

Next expand “SSH” and click on “Auth”.

In the right pane click on Browse and select the .ppk key we created earlier.

Now in the left navigation pane click on “Session” again. In the right pane, under “Saved Sessions”, name the session “test” or whatever you like and click Save. This saves your session so that you don’t have to repeat these steps.
Finally select the session you created and click “Open”. If everything is configured correctly you will now be logged in to your EC2 instance.

Note:- If your SSH session times out after being idle for a few minutes, check this post on how to set the PuTTY keepalive time.

Solved: "Network error: Software caused connection abort"

Sometimes you may have noticed that your PuTTY session gets disconnected with the error “Network error: Software caused connection abort”.
This can happen because of a timeout setting on the server, or sometimes due to a firewall. To resolve this issue you will have to set a keepalive time for the session.
After you set a keepalive time, PuTTY will send a packet every specified number of seconds to keep the session alive.
Generally you can set it to 240 seconds, i.e. 4 minutes. But at times you may have to set it lower. For example, when I connect from my home laptop to my AWS EC2 instance I have to keep it at 2 seconds.
To set it:

Open PuTTY.
Load the session for which you are facing the timeout issue.
Click on “Connection” in the left pane.

Set “Seconds between keepalives” to the value you want. Here we have set it to 2.

Finally click on “Session” in the left pane and save the session.

If you already have other saved sessions in PuTTY, you will have to repeat the above steps for each of them as needed.

Comparing AWS, Azure and Redhat exams

In the last couple of years I’ve taken certification exams from multiple cloud providers. While the AWS and Azure exams are more theoretical and based on multiple choice questions, Redhat Openstack is a practical lab exam. In this article I’ll discuss the pros and cons of the different exam patterns.
AWS
Passed AWS CSA – Associate and Professional
Pros:-

You can go back to any question at any time during the exam.
The exam tests you on a wide range of topics.
Even if you have made mistakes in the beginning you can recover by reviewing the questions later.
No datasheet type questions like “How much RAM does C3.Xlarge offer?”
You get a 1-year free tier, which is great for learning about AWS.

Cons:-

Exams are expensive in comparison to Microsoft Azure (at least in India).
Many questions just test your reading speed.
No versioning, so you don’t know whether you should answer as per a recent announcement or the older method.

Azure
Passed Architecting Microsoft Azure Solutions.
Pros:-

Azure exams are not very expensive in comparison to AWS or Red Hat.
The exam tests you on a wide range of topics.
Even if you have made mistakes in the beginning you can recover by answering correctly in later sections.

Cons:-

No versioning, so you don’t know whether you should answer as per a recent announcement or the older method.
It’s a race against time.
The most insane thing I found in the exam is that each case study comes with 5 to 6 questions, but for about 2 to 3 of those questions you can’t refer back to the case study. I don’t understand why Microsoft expects you to remember the whole 2-page case study.
A 30-day free tier is too short to get to know Azure.

Update:- Azure is now (Oct-17) offering a 12-month free trial.
Redhat Openstack
Passed RHCSA in Redhat Openstack
Pros:-

The difficulty level of the exam is medium.
You generally get questions on tasks which you will be doing in real life.
You get only 15-20 questions (tasks), and even if someone knows those questions beforehand they still have to do the tasks practically to pass the exam. So even if anyone has dumps, they are useless.
It has versioning, so you know whether you have to answer as per Redhat Openstack version 6 or 8.
You can play with Redhat Openstack by installing it on your desktop or laptop. Good learning!

Cons:-

Exams are expensive in comparison to Microsoft Azure.
If you make a mistake at the beginning or in the middle of the exam, chances are you will mess up the whole exam or waste a lot of time correcting it.
If your machine doesn’t work properly you may lose time. But generally examiners take care of this.

In the end, I’d like to say that professional exams should not be like college entrance exams, which mostly test your reading speed and cramming ability. That’s OK for those exams, as undergrads and grads have limited practical experience.
Professional exams should be more practical, so that you can be sure that a person who has cleared the exam definitely knows how to do that stuff in real life.
If you want to know how to prepare for these exams, refer to my posts for AWS, Azure and Redhat Openstack.

AWS Crash Course - Redshift

Redshift is a data warehouse from Amazon. It’s like a virtual place where you store a huge amount of data.

Redshift is a fully managed, petabyte-scale system
Amazon Redshift is based on PostgreSQL 8.0.2
It is optimized for data warehousing
Supports integrations and connections with various applications, including Business Intelligence tools
Redshift provides custom JDBC and ODBC drivers.
Redshift can be integrated with CloudTrail for auditing purposes.
You can monitor Redshift performance from CloudWatch.

Features of Amazon Redshift
Supports VPC − Users can launch Redshift within a VPC and control access to the cluster through the virtual networking environment.
Encryption − Data stored in Redshift can be encrypted. Encryption is configured when you create the cluster.
SSL − SSL encryption is used to encrypt connections between clients and Redshift.
Scalable − With a few simple clicks, you can choose vertical scaling (increasing instance size) or horizontal scaling (increasing the number of compute nodes).
Cost-effective − Amazon Redshift is a cost-effective alternative to traditional data warehousing practices. There are no up-front costs and no long-term commitments, and pricing is on-demand.
MPP (Massively Parallel Processing) – Redshift leverages parallel processing, which improves query performance. Massively parallel refers to the use of a large number of processors (or separate computers) to perform coordinated computations in parallel. This reduces computing time.
Columnar Storage – Redshift uses columnar storage, so it stores data tables by column rather than by row. The goal of a columnar database is to efficiently write and read data to and from hard disk storage in order to speed up the time it takes to return a query.
Advanced Compression – Compression is a column-level operation that reduces the size of data when it is stored, thus helping to save space.
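
To make the columnar storage point a bit more concrete, here is a toy sketch in plain Python. It is only an analogy for the idea of storing data by column, not how Redshift lays out data on disk. The sample rows and the helper names are made up for illustration.

```python
# Made-up sample rows, loosely modeled on the users table used in this series.
ROWS = [
    {"userid": 1, "city": "Pune", "likejazz": True},
    {"userid": 2, "city": "Delhi", "likejazz": False},
    {"userid": 3, "city": "Pune", "likejazz": True},
]


def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def count_true(columns, name):
    """Answer a single-column question by scanning only that column."""
    return sum(1 for value in columns[name] if value)
```

A question like "how many users like jazz" only needs the likejazz list, so the other columns are never touched. Values in one column also tend to look alike, which is why column-level compression works well.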
Type of nodes in Redshift
Leader Node
Compute Node
What do these nodes do?
Leader Node:- A leader node receives queries from client applications, parses the queries and develops execution plans, which are an ordered set of steps to process these queries. The leader node then coordinates the parallel execution of these plans with the compute nodes. The good part is that you are not charged for leader node hours; only compute nodes incur charges. In a single-node Redshift cluster there is no separate leader node; the one node handles both leader and compute functions.
Compute Node:- It executes the steps specified in the execution plans and transmits data among the other nodes to serve queries. The intermediate results are sent back to the leader node for aggregation before being sent back to the client applications. You can have 1 to 128 compute nodes.
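
The leader and compute node roles can be pictured with a small scatter-gather sketch. This is a simplified analogy using Python threads, not Redshift's actual execution engine. The round-robin split and all function names are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_slices(values, node_count):
    """Distribute values round-robin across the compute nodes."""
    return [values[i::node_count] for i in range(node_count)]


def compute_node_sum(data_slice):
    """Work done by one compute node: a partial aggregate over its slice."""
    return sum(data_slice)


def leader_total(values, node_count=2):
    """Leader role: hand out slices, run them in parallel, combine the partial results."""
    slices = split_into_slices(values, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(compute_node_sum, slices))
    return sum(partials)


if __name__ == "__main__":
    print(leader_total(list(range(1, 101)), node_count=4))
```

Each "compute node" only sees its own slice, and the "leader" only combines the small partial results before answering the client.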
From which sources can you load data into Redshift?
You can load it from multiple sources like :-

Amazon S3
Amazon DynamoDB
Amazon EMR
AWS Data Pipeline
Any SSH-enabled host on Amazon EC2 or on-premises

Redshift Backup and Restores

Redshift can take automatic snapshots of the cluster.
You can also take a manual snapshot of the cluster.
Redshift continuously backs up its data to S3.
Redshift attempts to keep at least 3 copies of the data.

Hope the above snapshot gives you a decent understanding of Redshift. If you want to try some hands-on, check this tutorial.
This series is created to give you a quick snapshot of AWS technologies. You can check out other AWS services in this series over here.

How to load data in Amazon Redshift Cluster and query it?

In the last post we checked How to build a Redshift Cluster?. In this post we will see how to load data into that cluster and query it.
Pre-requisites :-
Download SQL Workbench
Download Redshift Driver
Once you have downloaded the above pre-requisites, let's move ahead.
First we will obtain the JDBC URL.

Log in to your AWS Redshift Console.
Click on the cluster you have created. If you followed the last post it will be "testdw".
In the "Configuration" tab look for the JDBC URL.
Copy the JDBC URL and save it in Notepad.
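
If you are curious what the JDBC URL is made of, the sketch below assembles and takes apart a URL of the form jdbc:redshift://endpoint:port/database. The endpoint in the example is a made-up placeholder; the port and database name are the ones used in this series. The function names are hypothetical.

```python
PREFIX = "jdbc:redshift://"


def build_jdbc_url(endpoint, port, database):
    """Assemble a Redshift JDBC URL from its three parts."""
    return f"{PREFIX}{endpoint}:{port}/{database}"


def parse_jdbc_url(url):
    """Split a Redshift JDBC URL back into (endpoint, port, database)."""
    if not url.startswith(PREFIX):
        raise ValueError("not a Redshift JDBC URL")
    host_port, _, database = url[len(PREFIX):].partition("/")
    host, _, port = host_port.partition(":")
    return host, int(port), database


if __name__ == "__main__":
    # Hypothetical endpoint; copy the real one from your console.
    print(build_jdbc_url("testdw.example.us-east-1.redshift.amazonaws.com", 5439, "testdb"))
```

This is handy for a quick sanity check that the URL you copied points at the cluster, port and database you expect.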

Now open the SQL Workbench. You just have to click on 32-bit or 64-bit exe as per your OS version. In my case I am using Windows 10 64-bit so the exe name is SQLWorkbench64 .

Click on File > Connect window.
In the bottom left of the "Select Connection Profile" window click on "Manage Drivers"
In the "Manage Drivers" window click on the folder icon, browse to the location of the Redshift driver you downloaded earlier and select it.

Fill other details in "Manage Drivers" Window as below.

Name:- Redshiftdriver JDBC 4.2
Classname:- com.amazon.redshift.jdbc.Driver

Click OK
Now in the "Select Connection Profile" window

Fill in the details as below.

Driver:- Select the Redshift driver you added.
URL:- Mention the JDBC URL you saved earlier.
Username:- DB username you mentioned during cluster creation.
Password:- Enter password of the DB user .

Check the Autocommit box.

Finally click on OK.
If everything is configured correctly, you will get connected to the DB.
Try executing the query

select * from information_schema.tables;

If your connection is successful you will see results in the window.
Now we will load some sample data which is provided by AWS and kept on S3. In SQL Workbench, copy/paste the queries below and execute them to create the tables.

create table users(
 userid integer not null distkey sortkey,
 username char(8),
 firstname varchar(30),
 lastname varchar(30),
 city varchar(30),
 state char(2),
 email varchar(100),
 phone char(14),
 likesports boolean,
 liketheatre boolean,
 likeconcerts boolean,
 likejazz boolean,
 likeclassical boolean,
 likeopera boolean,
 likerock boolean,
 likevegas boolean,
 likebroadway boolean,
 likemusicals boolean);

create table venue(
 venueid smallint not null distkey sortkey,
 venuename varchar(100),
 venuecity varchar(30),
 venuestate char(2),
 venueseats integer);

create table category(
 catid smallint not null distkey sortkey,
 catgroup varchar(10),
 catname varchar(10),
 catdesc varchar(50));

create table date(
 dateid smallint not null distkey sortkey,
 caldate date not null,
 day character(3) not null,
 week smallint not null,
 month character(5) not null,
 qtr character(5) not null,
 year smallint not null,
 holiday boolean default('N'));

create table event(
 eventid integer not null distkey,
 venueid smallint not null,
 catid smallint not null,
 dateid smallint not null sortkey,
 eventname varchar(200),
 starttime timestamp);

create table listing(
 listid integer not null distkey,
 sellerid integer not null,
 eventid integer not null,
 dateid smallint not null  sortkey,
 numtickets smallint not null,
 priceperticket decimal(8,2),
 totalprice decimal(8,2),
 listtime timestamp);

create table sales(
 salesid integer not null,
 listid integer not null distkey,
 sellerid integer not null,
 buyerid integer not null,
 eventid integer not null,
 dateid smallint not null sortkey,
 qtysold smallint not null,
 pricepaid decimal(8,2),
 commission decimal(8,2),
 saletime timestamp);

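Now load the sample data. In each of the queries below, replace "<iam-role-arn>" with the ARN of your IAM role.
For example, with the ARN filled in, the first query should look like the one below. It is the same query as the users load that follows it, so run it only once.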

copy users from 's3://awssampledbuswest2/tickit/allusers_pipe.txt' 
credentials 'aws_iam_role=arn:aws:iam::123456789123:role/redshiftrole' 
delimiter '|' region 'us-west-2';

copy users from 's3://awssampledbuswest2/tickit/allusers_pipe.txt' 
credentials 'aws_iam_role=<iam-role-arn>' 
delimiter '|' region 'us-west-2';

copy venue from 's3://awssampledbuswest2/tickit/venue_pipe.txt' 
credentials 'aws_iam_role=<iam-role-arn>' 
delimiter '|' region 'us-west-2';

copy category from 's3://awssampledbuswest2/tickit/category_pipe.txt' 
credentials 'aws_iam_role=<iam-role-arn>' 
delimiter '|' region 'us-west-2';

copy date from 's3://awssampledbuswest2/tickit/date2008_pipe.txt' 
credentials 'aws_iam_role=<iam-role-arn>' 
delimiter '|' region 'us-west-2';

copy event from 's3://awssampledbuswest2/tickit/allevents_pipe.txt' 
credentials 'aws_iam_role=<iam-role-arn>' 
delimiter '|' timeformat 'YYYY-MM-DD HH:MI:SS' region 'us-west-2';

copy listing from 's3://awssampledbuswest2/tickit/listings_pipe.txt' 
credentials 'aws_iam_role=<iam-role-arn>' 
delimiter '|' region 'us-west-2';

copy sales from 's3://awssampledbuswest2/tickit/sales_tab.txt'
credentials 'aws_iam_role=<iam-role-arn>'
delimiter '\t' timeformat 'MM/DD/YYYY HH:MI:SS' region 'us-west-2';
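
Replacing the placeholder by hand in seven queries is error-prone. As a convenience, the sketch below generates the same COPY statements as above with your ARN filled in. The table names, file names, delimiters and time formats are taken from the queries in this post. The function name is made up, and the ARN in the example is the sample one used earlier.

```python
COPY_TEMPLATE = (
    "copy {table} from 's3://awssampledbuswest2/tickit/{filename}' "
    "credentials 'aws_iam_role={role_arn}' "
    "delimiter '{delimiter}'{timeformat} region 'us-west-2';"
)

# (table, file, delimiter, timeformat) as in the queries above.
LOADS = [
    ("users", "allusers_pipe.txt", "|", None),
    ("venue", "venue_pipe.txt", "|", None),
    ("category", "category_pipe.txt", "|", None),
    ("date", "date2008_pipe.txt", "|", None),
    ("event", "allevents_pipe.txt", "|", "YYYY-MM-DD HH:MI:SS"),
    ("listing", "listings_pipe.txt", "|", None),
    ("sales", "sales_tab.txt", "\\t", "MM/DD/YYYY HH:MI:SS"),
]


def render_copy_statements(role_arn):
    """Return one COPY statement per sample table, with the role ARN filled in."""
    statements = []
    for table, filename, delimiter, timeformat in LOADS:
        # Only the event and sales loads carry a timeformat clause.
        clause = f" timeformat '{timeformat}'" if timeformat else ""
        statements.append(
            COPY_TEMPLATE.format(
                table=table,
                filename=filename,
                role_arn=role_arn,
                delimiter=delimiter,
                timeformat=clause,
            )
        )
    return statements


if __name__ == "__main__":
    for statement in render_copy_statements("arn:aws:iam::123456789123:role/redshiftrole"):
        print(statement)
```

Paste the printed statements into SQL Workbench and execute them one by one.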

Once you have loaded the data you can run sample queries against these tables in your SQL Workbench.

Congrats! You have finally created the Redshift cluster and run queries on it after loading data.
Refer to this post if you want to reset the master user password.
Don't forget to clean up the cluster or you will be billed.

To delete the cluster, just click on the cluster (in our case it's testdw) in the AWS console.
Click on the "Cluster" drop-down and select Delete.

That will clean up everything.
Hope this guide was helpful to you! Do let me know in the comment section if you have any queries or suggestions.

How to create a multi node redshift cluster?

In this tutorial we will see how to create a multi node Amazon Redshift cluster. If you are new to Redshift refer to our AWS Crash Course – Redshift .
Pre-requisites :-
You will have to create an IAM role and Security Group for Redshift.
Refer How to create a Security Group and How to create an IAM role .
Once you have created the role and security group follow below steps:

Login to Amazon Redshift Console . For this tutorial we will be launching cluster in US East region.
In the Dashboard click on Launch Cluster.

In the cluster details section fill in the details as below:-

Cluster identifier:- testdw
Database name:- testdb
Database port:- 5439
Master user name:- testuser
Master user password:- <your password>

In the node configuration section select the details for your cluster:-

Note:- We will be creating a 2-node cluster, so select multi-node.

Once done click Continue.
In the additional configuration screen leave the top section as default. Just change “VPC security groups” to “redshiftsg” and “Available roles” to “redshiftrole”, which we created earlier.

Click on Continue and review everything. It will show you the approximate cost of running this cluster, but if you are using the “free tier” you may be able to run it for free if you have not exhausted your “free tier” limits.

Finally click on launch cluster.

If you have followed everything as in above tutorial your cluster will be ready in around 15 minutes.

Hope this tutorial was helpful to you. In the next post we will discuss how to load data in this cluster and query the data.

How to create an IAM role in AWS?

In this tutorial we will be creating an IAM role which we will use to launch an Amazon Redshift cluster.

Login to your IAM Console .
In the left Navigation Pane Click on “Roles”

Click on “Create New Role”
In the AWS Service Role Select “Amazon Redshift” .
In the Policy Type Select “AmazonS3FullAccess” and click on “Next Step” .
Enter details as below:-

Role Name:- redshiftrole
Role description:- Amazon Redshift Role

Finally Click on Create Role .

How to create a security group in AWS?

In this tutorial we will create a security group which we will use later while creating our Redshift cluster.

Login to your AWS VPC Console .
In the left Navigation Pane select Security Groups .
Once in Security groups section click on Create Security Group .
Enter details as below:

Name tag: redshiftsg
Group name: redshiftsg
Description: redshift security group
VPC: <Select your VPC or if you are not sure leave it as default>

Once details are filled Click on Yes, Create .
Now select the security group you just created.
In the bottom pane click on inbound rules.

Click Edit
Select Redshift in Type and 0.0.0.0/0 in Source. By this we are allowing inbound traffic on the Redshift port from any IP address. For better security, you can instead mention a specific IP range in Source.

Leave the outbound rule as “All traffic”

Finally click save.

Congrats! You have successfully created a security group.
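
Since 0.0.0.0/0 opens the Redshift port to everyone, it is worth checking what a source range really covers before you save a rule. Below is a small sketch using Python's standard ipaddress module. The function names are made up, and the sample ranges are documentation-only addresses, not real ones.

```python
import ipaddress


def is_open_to_world(cidr):
    """True if the CIDR block covers every IPv4 address (0.0.0.0/0)."""
    return ipaddress.ip_network(cidr) == ipaddress.ip_network("0.0.0.0/0")


def source_allows(cidr, client_ip):
    """True if client_ip falls inside the rule's source range."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(cidr)


if __name__ == "__main__":
    print(is_open_to_world("0.0.0.0/0"))
    print(source_allows("203.0.113.0/24", "203.0.113.7"))
```

If is_open_to_world returns True for a rule outside of a throwaway test setup, consider narrowing the source to your office or home IP range.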

What is MFA and Why it's important? - Security in Cloud

When you talk about cloud you may have heard about MFA.
Why is MFA important? Before I go into that, let me tell you about a couple of true incidents.
You may have heard about a company called Code Spaces. For those who don’t know, Code Spaces “was” a code repository company like GitHub.
One day a hacker got control of the AWS Management Console of Code Spaces.
The hacker sent a mail to the company asking for a huge ransom to be paid within 12 hours, or he would delete all the data in their AWS account. Many of the Code Spaces engineers tried to regain access to their company’s AWS account but failed, as the hacker had created many backdoor users in the account. At the end of the time limit the hacker deleted everything in the company’s AWS account, including all servers, databases and even backups. Within a day the company was almost wiped out. You can read the full story here.
Now you must be thinking that you are just a developer or sysops guy and there is not much of value in your personal AWS account, so you can live without MFA. Unfortunately you are wrong, my friend.
We have seen many incidents where developers put their Access key ID and Secret Access Key in code so that they don’t have to authenticate manually. But these developers may want to work with a friend on the code, so they upload it to GitHub. Now the problem is that there are hackers who just search GitHub for these keys. And once they get the keys they can easily access your AWS account. You may not have anything valuable in your AWS account, but what the hackers do is spin up huge instances in your account. They use this computing power for Bitcoin mining. Once they are done with mining they get the money and you get the bill from AWS. (At times we have seen that AWS may waive your bill in this situation as a one-time exception, but if you are not so lucky you may have to pay a huge bill of hundreds or thousands of dollars.) You can check Joe’s detailed story here.
In both cases the hacking could have been avoided if MFA had been activated on the account.
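
On the second incident, one cheap habit is to scan your code for keys before you push it anywhere. The sketch below uses a common heuristic: access key IDs of IAM users are 20 characters long and start with AKIA. It is only a rough check and will not catch every kind of credential. The function name is made up, and the key in the example is the well-known dummy key from the AWS documentation.

```python
import re

# Heuristic: IAM user access key IDs are 20 characters and begin with "AKIA".
ACCESS_KEY_PATTERN = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_access_key_ids(text):
    """Return every string in the text that looks like an AWS access key ID."""
    return ACCESS_KEY_PATTERN.findall(text)


if __name__ == "__main__":
    sample = 'aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"'
    print(find_access_key_ids(sample))
```

You could run such a check over your files before every commit and refuse to push if it finds anything.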
So what is MFA?
MFA is Multi Factor Authentication. This is like second level of security for your account.

MFA can be activated through multiple ways including SMS(Text message) or an Application like Google Authenticator in you mobile phone. By enabling this you add an additional authentication to your account. So once you enable MFA you will enter an additional changing code with your normal login ID and password.
For my account I use Google Authenticator. It’s a free App available on iOS and Android app stores.
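To give an idea of where the changing code comes from, below is a minimal sketch of the time-based one-time password (TOTP) algorithm from RFC 6238, which authenticator apps of this kind implement. The function name `totp` and the secret are assumptions for illustration only. A real app receives its secret (usually base32-encoded) when you scan the QR code, and AWS performs the matching calculation on its side.

```python
import hashlib
import hmac
import struct
import time


def totp(secret, at_time=None, step=30, digits=6):
    """Compute a time-based one-time password (RFC 6238, HMAC-SHA1)."""
    if at_time is None:
        at_time = time.time()
    # The moving factor is the number of 30-second steps since the Unix epoch.
    counter = int(at_time // step)
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select a 4-byte window.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# Illustrative shared secret (the test secret from RFC 6238), not a real one.
demo_secret = b"12345678901234567890"
print(totp(demo_secret, at_time=59, digits=8))  # 94287082 (RFC 6238 test vector)
print(totp(demo_secret))                        # current 6-digit code
```

Because the code depends on both the shared secret and the current time, a stolen password alone is not enough to log in.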
How can you enable MFA?
It’s simple. Amazon has given clear steps on how to do it, but if you are not sure, follow this post: How to enable MFA in AWS for free?
MFA is the easiest and quickest line of defense against hackers. You should also follow other security hardening measures to keep yourself safe in the cloud. I’ll discuss them in the next post.
For now, we recommend that you enable MFA on your account ASAP. Even if you are using Azure or any other cloud, you should enable MFA. Remember, security in the cloud is a shared responsibility.
Be Safe!

Solved: How to enable MFA in AWS for free?

MFA is extremely important for keeping your account safe.
Below we show you how to enable MFA using Google Authenticator. It’s a free app available on both the iOS and Android app stores.

1. Log in to your IAM console at https://console.aws.amazon.com/iam/
2. In the IAM Dashboard, below Security Status, you will see an option “Activate MFA on your root account”.
3. Click on “Activate MFA on your root account”.
4. Click on “Manage MFA”.
5. In the pop-up, select “A virtual MFA device” and click on Next.
6. Now download “Google Authenticator” on your mobile from the iOS or Android app store.
7. Once the app is installed, open it and click on the + sign.
8. It will give you an option of “Scan a Barcode”. Click on that.
9. Now go back to your AWS console and click Next.
10. You will now see a QR code in the AWS console. Scan it with your mobile using the app you opened in step 8.
11. You will now see a 6-digit code on your phone. Enter the code in AWS in “Authentication Code 1”.
12. After a few seconds you will see a new code on your phone. Enter that new code in AWS in “Authentication Code 2”. Ensure that the two codes are generated consecutively.
13. Click on “Activate Virtual MFA”.
That’s all! The next time you log in to the AWS console you will need the code from “Google Authenticator” along with your user ID and password. This small activity will keep you safe.

If you want to enable MFA for a specific user, check this post: MFA device for a user .
The next step should be to set a billing alert, which will let you know if you are going above your billing limits. Check this post: AWS Billing Alert .

AWS Crash Course - SQS

Today we will discuss an AWS messaging service called SQS.

SQS stands for Simple Queue Service.
It’s a message queue service which acts as a buffer between message-producing and message-receiving components.

Using SQS you can decouple the components of an application.
Messages can contain up to 256 KB of text in any format.
Any component can later retrieve the messages programmatically using the SQS API.
SQS queues are dynamically created and scale automatically so you can build and grow applications quickly – and efficiently.
You can combine SQS with Auto Scaling of EC2 instances, for example scaling the consumers with the queue depth while respecting warm-up and cooldown periods.
Used by companies like Vodafone, BMW, RedBus, Netflix etc.
You can use Amazon SQS to exchange sensitive data between applications using server-side encryption (SSE) to encrypt each message body.
SQS is a pull (or poll) based system, so consumers pull messages from SQS queues.
Multiple copies of every message are stored redundantly across multiple Availability Zones.
Amazon SQS is deeply integrated with other AWS services such as EC2, ECS, RDS, Lambda etc.

Two types of SQS queues:-

Standard Queue
FIFO Queue

Standard Queue :-

Standard Queue is the default type offered by SQS.
Allows nearly unlimited transactions per second.
Guarantees that a message will be delivered at least once.
But it can also deliver the message more than once.
It provides best-effort ordering.
Messages can be kept from 1 minute to 14 days. The default is 4 days.
It has a visibility timeout window. If a message is not processed and deleted within that window, it becomes visible again and can be processed by another reader.
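The visibility timeout is easier to follow with a toy model. The sketch below is an in-memory simulation, not the SQS API: the class `ToyStandardQueue` and its methods are invented for this example, and time is passed in explicitly as `now` (in seconds) so the behaviour is easy to trace.

```python
import itertools


class ToyStandardQueue:
    """In-memory stand-in for a queue with a visibility timeout (not real SQS)."""

    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self._messages = {}            # message id -> [body, visible_at]
        self._ids = itertools.count(1)

    def send(self, body):
        message_id = next(self._ids)
        self._messages[message_id] = [body, 0]   # visible immediately
        return message_id

    def receive(self, now):
        """Return one visible message and hide it for the timeout window."""
        for message_id, entry in self._messages.items():
            body, visible_at = entry
            if visible_at <= now:
                entry[1] = now + self.visibility_timeout
                return message_id, body
        return None

    def delete(self, message_id):
        """A reader deletes the message once processing has succeeded."""
        self._messages.pop(message_id, None)


queue = ToyStandardQueue(visibility_timeout=30)
queue.send("order-1")
print(queue.receive(now=0))    # (1, 'order-1')  first reader gets the message
print(queue.receive(now=10))   # None            hidden while being processed
print(queue.receive(now=30))   # (1, 'order-1')  never deleted, so visible again
queue.delete(1)
print(queue.receive(now=100))  # None            deleted after processing
```

This is also why a standard queue can deliver a message more than once: if the first reader is slow or crashes, a second reader receives the same message.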

FIFO Queue :-

FIFO queue complements the standard queue.
It has a First-In-First-Out delivery mechanism.
Messages are processed exactly once.
Order of the messages is strictly preserved.
Duplicates are not introduced into the queue.
Supports message groups.
Limited to 300 transactions per second per API action (without batching).
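The FIFO properties above (ordering, no duplicates, message groups) can be sketched with another toy model. Again this is an in-memory simulation and not the SQS API. The class `ToyFifoQueue` is invented for illustration, duplicates are remembered forever here (the real service only deduplicates within a limited time window), and receiving by group is a simplification.

```python
from collections import OrderedDict, deque


class ToyFifoQueue:
    """In-memory stand-in for a FIFO queue with message groups (not real SQS)."""

    def __init__(self):
        self._groups = OrderedDict()   # group id -> deque of message bodies
        self._seen_dedup_ids = set()   # simplification: remembered forever

    def send(self, group_id, dedup_id, body):
        """Accept a message unless its deduplication id was already seen."""
        if dedup_id in self._seen_dedup_ids:
            return False               # duplicate is not added to the queue
        self._seen_dedup_ids.add(dedup_id)
        self._groups.setdefault(group_id, deque()).append(body)
        return True

    def receive(self, group_id):
        """Return the oldest message of a group, preserving send order."""
        messages = self._groups.get(group_id)
        return messages.popleft() if messages else None


fifo = ToyFifoQueue()
print(fifo.send("customer-7", "evt-1", "create order"))   # True
print(fifo.send("customer-7", "evt-1", "create order"))   # False, duplicate dropped
print(fifo.send("customer-7", "evt-2", "pay order"))      # True
print(fifo.receive("customer-7"))                         # create order
print(fifo.receive("customer-7"))                         # pay order
```

Ordering is kept per message group, so messages for different customers in this example could still be processed in parallel.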

We hope the above snapshot gives you a decent understanding of SQS. If you want to try some hands-on, check this tutorial .
This series was created to give you a quick snapshot of AWS technologies. You can read about other AWS services in this series over here .

Which AWS certification is suitable for me?

Many people have asked me which AWS certification they should take to help their career in the near future.
AWS provides the following certifications:
Beginner
AWS Certified Cloud Practitioner
Associate level
AWS Certified Developer – Associate
AWS Certified SysOps Administrator – Associate
AWS Certified Solutions Architect – Associate
Professional level
AWS Certified DevOps Engineer – Professional
AWS Certified Solutions Architect – Professional
Specialty Certifications
AWS Certified Big Data – Specialty
AWS Certified Advanced Networking – Specialty
Now coming to the point: which AWS certification is best for you?
If you are a fresher, you should go for either AWS Certified Cloud Practitioner or AWS Certified Developer – Associate . These are the easiest of all AWS exams and will give you a good base.
If you have fewer than 5 to 6 years of experience in the IT industry and you are from a development background, you should go for the AWS Certified Developer – Associate certification.
Similarly, if you have fewer than 5 to 6 years of experience in the IT industry and you are from a system admin background, you should go for the AWS Certified SysOps Administrator – Associate certification.
If you have 8 to 9+ years of experience in the IT industry, you can go for the AWS Certified Solutions Architect – Associate certification. The reason for suggesting more experience for the Architect certification is that you generally can’t put a person with 2-3 years of experience in front of a client as an Architect, because the client may not accept someone so junior being given responsibility for designing their environment. There are always exceptions, but this is the general trend we have seen.
Also, if you are from a pre-sales or sales background, you can go for AWS Certified Cloud Practitioner and later the Architect exam.
If you are from a database admin background, then going for the AWS Certified Big Data – Specialty certification will be a natural progression and a big plus on your resume.
If you are from a networking background, then the AWS Certified Advanced Networking – Specialty certification will be a good choice and a natural progression for you.
Once you have gained approximately 2 years of experience with AWS, you can go for the professional certifications. Developers and SysOps admins can go for AWS Certified DevOps Engineer – Professional, and Architects can naturally move to AWS Certified Solutions Architect – Professional.
Earning all of the above certifications would be a huge advantage, but the above approach should be a good way to start.
We have recently seen a trend where jobs are being posted only for a Redshift Architect or an IAM Architect, so things may change in the future, when employers look for specialists in only one or two AWS services.
If you want a quick snapshot of AWS services, you can refer to our free AWS Crash Course.
Do let me know what you think about this, and if you have any queries or suggestions, ask in the comments section.

AWS Sample Exam Questions - Part 1

Below are some sample exam questions; you can expect similar questions in the exam. You can get more such questions in acloudguru sample exam questions for AWS Certified Developer – Associate and here for AWS Certified Solutions Architect – Associate .
Q1. A 3-tier e-commerce web application is currently deployed on-premises and will be migrated to AWS for greater scalability and elasticity. The web server currently shares read-only data using a network distributed file system. The app server tier uses a clustering mechanism for discovery and shared session state that depends on IP multicast. The database tier uses shared-storage clustering to provide database failover capability, and uses several read slaves for scaling. Data on all servers and the distributed file system directory is backed up weekly to off-site tapes.
Which AWS storage and database architecture meets the requirements of the application?
A. Web servers: store read-only data in S3, and copy from S3 to root volume at boot time. App servers: share state using a combination of DynamoDB and IP unicast. Database: use RDS with Multi-AZ deployment and one or more Read Replicas. Backup: web and app servers backed up weekly via AMIs, database backed up via DB snapshots.
B. Web servers: store read-only data in S3, and copy from S3 to root volume at boot time. App servers: share state using a combination of DynamoDB and IP unicast. Database: use RDS with Multi-AZ deployment and one or more Read Replicas. Backup: web servers, app servers, and database backed up weekly to Glacier using snapshots.
C. Web servers: store read-only data in S3, and copy from S3 to root volume at boot time. App servers: share state using a combination of DynamoDB and IP unicast. Database: use RDS with Multi-AZ deployment. Backup: web and app servers backed up weekly via AMIs, database backed up via DB snapshots.
D. Web servers: store read-only data in an EC2 NFS server, mount to each web server at boot time. App servers: share state using a combination of DynamoDB and IP multicast. Database: use RDS with Multi-AZ deployment and one or more Read Replicas. Backup: web and app servers backed up weekly via AMIs, database backed up via DB snapshots.
Answer:- A
B: You can’t directly back up EBS snapshots to Glacier; they first need to be moved to S3 Standard.
C: Doesn’t have DB read replicas.
D: AWS does not support IP Multicast.
So only A seems to be correct.
Q2. Your customer wishes to deploy an enterprise application to AWS which will consist of several web servers, several application servers and a small (50GB) Oracle database. Information is stored both in the database and the file systems of the various servers. The backup system must support database recovery, whole server and whole disk restores, and individual file restores with a recovery time of no more than two hours. They have chosen to use RDS Oracle as the database.
Which backup architecture will meet these requirements?
A. Backup RDS using automated daily DB backups. Backup the EC2 instances using AMIs and supplement with file-level backup to S3 using traditional enterprise backup software to provide file level restore.
B. Backup RDS using a Multi-AZ Deployment. Backup the EC2 instances using AMIs, and supplement by copying file system data to S3 to provide file level restore.
C. Backup RDS using automated daily DB backups. Backup the EC2 instances using EBS snapshots and supplement with file-level backups to Amazon Glacier using traditional enterprise backup software to provide file level restore.
D. Backup RDS database to S3 using Oracle RMAN. Backup the EC2 instances using AMIs, and supplement with EBS snapshots for individual volume restore.
Answer :- A
B:- Multi-AZ is for high availability and DR; it is not a backup.
C:- Glacier retrieval takes 3-4 hours, so an RTO of 2 hours can’t be met.
D:- RDS doesn’t use RMAN backups.
Q3. Is there a limit to the number of groups you can have?
A. Yes for all users except root
B. No
C. Yes unless special permission granted
D. Yes for all users
Answer :- D
As per this doc, http://docs.aws.amazon.com/IAM/latest/UserGuide/reference_iam-limits.html , the limits apply to all users.
Q4. What is the charge for the data transfer incurred in replicating data between your primary and standby?
A. No charge. It is free.
B. Double the standard data transfer charge
C. Same as the standard data transfer charge
D. Half of the standard data transfer charge
Answer:- A
You are not charged for the data transfer incurred in replicating data between your primary and standby in a Multi-AZ deployment.
Q5. Through which of the following interfaces is AWS Identity and Access Management available?
A) AWS Management Console
B) Command line interface (CLI)
C) IAM Query API
D) Existing libraries
A. Only through Command line interface (CLI)
B. A, B and C
C. A and C
D. All of the above
Answer:- B
IAM can be accessed through the Console, CLI and API.
All the questions posted here are taken from the public internet and the answers are marked as per our understanding. These are not exam dumps.
If you have any concerns or suggestions, please post them in the comments or contact us through the contact page.

AWS Crash Course - RDS

Welcome to AWS Crash Course.
What is RDS?

RDS is Amazon’s Relational Database Service.
It is part of Amazon’s PaaS offering.
A new DB instance can easily be launched from the AWS Management Console.
Complex administration tasks like patching, backups etc. are managed automatically by RDS.
Amazon has its own relational database called Amazon Aurora.
RDS also supports other popular database engines like MySQL, Oracle, SQL Server, PostgreSQL and MariaDB .

RDS supports Multi-AZ (Availability Zone) failover.
What does that mean?
It means that if your primary DB goes down, services will automatically fail over to a secondary DB in another AZ.

Multi-AZ deployments for the MySQL, Oracle and PostgreSQL engines utilize synchronous physical replication to keep data on the standby up to date with the primary.
Multi-AZ deployments for the SQL Server engine use synchronous logical replication to achieve the same result, employing SQL Server native mirroring technology.
Both approaches safeguard your data in the event of a DB instance failure or loss of an AZ.
Backups are taken from the secondary DB, which avoids I/O suspension on the primary.
Restores are taken from the secondary DB, which avoids I/O suspension on the primary.
You can force a failover from one AZ to another by rebooting your DB instance.

But RDS Multi-AZ failover is not a scaling solution.
Read replicas are for scaling.
What are Read Replicas?
As we discussed above, Multi-AZ is synchronous replication of the DB, while read replicas are asynchronous replication of the DB.

You can have up to 5 read replicas for both MySQL and PostgreSQL.
You can have read replicas in different regions, but for MySQL only.
Read replicas can be built off Multi-AZ databases.
You can have read replicas of read replicas, but only for MySQL, and this will further increase latency.
You can use read replicas for generating reports. This way you won’t put load on the primary DB.
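The difference between synchronous (Multi-AZ standby) and asynchronous (read replica) replication can be shown with a small in-memory sketch. This is not how RDS is implemented internally: `ToyPrimary` and its attributes are invented for illustration, and replication lag is modelled simply as changes waiting in a list.

```python
class ToyPrimary:
    """Toy model of a primary DB with a sync standby and an async replica."""

    def __init__(self):
        self.data = {}       # primary copy
        self.standby = {}    # Multi-AZ standby, updated synchronously
        self.replica = {}    # read replica, updated asynchronously
        self._pending = []   # changes not yet applied on the replica

    def write(self, key, value):
        self.data[key] = value
        self.standby[key] = value            # applied before the write returns
        self._pending.append((key, value))   # shipped to the replica later

    def apply_replication(self):
        """Simulate the replica catching up with the primary."""
        for key, value in self._pending:
            self.replica[key] = value
        self._pending.clear()


db = ToyPrimary()
db.write("stock", 5)
print(db.standby.get("stock"))   # 5     standby is already up to date
print(db.replica.get("stock"))   # None  replica lags behind
db.apply_replication()
print(db.replica.get("stock"))   # 5     replica has caught up
```

This lag is why a report run against a read replica may be slightly behind the primary, and why the standby, not a replica, is what failover relies on.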

RDS supports automated backups.
But keep these things in mind.

There is a performance hit if Multi-AZ is not enabled.
If you delete an instance then all automated backups are deleted; however, manual DB snapshots are not deleted.
All snapshots are stored on S3.
When you do a restore, you can change the engine type (e.g. SQL Standard to SQL Enterprise) provided you have enough storage space.

We hope the above snapshot gives you a decent understanding of RDS. If you want to try some hands-on, check this tutorial .
This series was created to give you a quick snapshot of AWS technologies. You can read about other AWS services in this series over here .