How to set up an AWS EC2 instance

René-Jean Corneille
6 min read · May 28, 2018

Recent developments in data storage and processing have been driven by the increasing amount and complexity of data available to individuals and companies. Most of these advances require sophisticated and powerful hardware. Aspiring data scientists must be able to understand and master these new tools.

For instance, given the computing power required to train algorithms, training is usually carried out “in the cloud” (i.e. by remote access to a virtual machine), which avoids having to buy and maintain very expensive hardware, especially if peak usage is only occasional. Cloud computing makes these very intensive tasks possible and must now be part of the toolbox of any data scientist.

One of the most popular service providers is Amazon, which provides IaaS (infrastructure as a service) through AWS. I have to admit that the AWS interface is daunting and might be very off-putting for a beginner. I started using AWS back in 2012 to run Monte Carlo simulations: back then I had a cheap computer, I attended classes on two campuses, and I only went to the one with the powerful computers every other week. I recently stumbled upon my notes while trying to figure out how to launch an AWS instance and realized how much the interface had changed since then. I thus decided to update them and, in the process, make them available to anyone starting out with AWS.

I assume that the reader already has an AWS account and a basic knowledge of Linux terminal commands. If not, here is a quick tutorial. Setting up an EC2 (Elastic Compute Cloud) instance is a matter of 5 steps:

1. generate a key pair

As mentioned above, since the computing resource is in the cloud, we need to access it remotely. To do so securely, we use an SSH connection, which requires a key pair. You can generate one from the AWS EC2 dashboard:

1–1. AWS services dashboard (select EC2)

And then you need to select the Key Pairs tab:

1–2. EC2 Dashboard (select Key Pairs)

You can now create a key pair:

1–3. Key pairs list

You only need to give the key pair a name:

1–4. Create new key pair

This will download the private key of the key pair as a .pem file. It is not possible to connect to the instance without this file, so do not lose it. I repeat: DO. NOT. LOSE. IT. If you do lose it, you can still recover access to your instance; AWS provides some information here. As an additional precaution, make sure that only the file's owner can read it by running this command in the terminal (after a cd into the directory where the file was downloaded):

chmod 400 myec2instance.pem
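For readers who would rather script this step than click through the console, here is a minimal Python sketch. It assumes the boto3 library, that `ec2_client` is an object such as `boto3.client("ec2")`, and that credentials are already configured. The helper names `save_key_pair` and `is_owner_read_only` are hypothetical, chosen for illustration only.

```python
import os
import stat


def save_key_pair(ec2_client, key_name, directory="."):
    """Create a key pair and store its private key as <key_name>.pem."""
    # The private key is returned only once, under "KeyMaterial".
    response = ec2_client.create_key_pair(KeyName=key_name)
    pem_path = os.path.join(directory, key_name + ".pem")
    # O_EXCL makes this fail instead of overwriting an existing key file.
    fd = os.open(pem_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as pem_file:
        pem_file.write(response["KeyMaterial"])
    os.chmod(pem_path, 0o400)  # same effect as `chmod 400`
    return pem_path


def is_owner_read_only(pem_path):
    """Return True when the file mode is exactly 400."""
    return stat.S_IMODE(os.stat(pem_path).st_mode) == 0o400


# Usage (assumption: boto3 is installed and credentials are configured):
#   import boto3
#   save_key_pair(boto3.client("ec2"), "myec2instance")
```

The second helper is handy on its own: it checks a .pem file downloaded from the console in the same way.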

2. create a new user (optional but recommended)

Now that you have your key pair, you can create a new user. This step is not required if only one person works with the instance, but it is necessary if access is shared (to set up access rights). To do so, go to the IAM (Identity and Access Management) section:

2–1. Services (IAM)

Then add a new user:

2–2. Users list

Fill in the new user name and choose the access type (programmatic and/or through the AWS console):

2–3. Add user

Once this is done, choose the user's permissions (this is the most important step). Here, since we have only one user, we grant administrator rights:

2–4. Set permission

Then you can download the user's access credentials:

2–5. New user created

Now that the new user is created, it is possible to access the AWS platform with the link provided in the downloaded CSV file:

2–6. Login

New users do not have root access; only the original account does. In any case, I would advise against using root access for this tutorial: it is simply best practice not to explore the interface as root.
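The same user setup can be sketched in Python. This is a minimal illustration, assuming boto3 and that `iam_client` is an object such as `boto3.client("iam")`. It only covers programmatic access, not console login, and the function name `create_admin_user` is hypothetical.

```python
# ARN of the AWS-managed policy that grants administrator rights.
ADMIN_POLICY_ARN = "arn:aws:iam::aws:policy/AdministratorAccess"


def create_admin_user(iam_client, user_name):
    """Create an IAM user, grant administrator rights, return its access key."""
    iam_client.create_user(UserName=user_name)
    iam_client.attach_user_policy(UserName=user_name, PolicyArn=ADMIN_POLICY_ARN)
    # The secret access key is only returned by this call, so store it safely.
    access_key = iam_client.create_access_key(UserName=user_name)["AccessKey"]
    return access_key["AccessKeyId"], access_key["SecretAccessKey"]


# Usage (assumption: boto3 is installed and credentials are configured):
#   import boto3
#   key_id, secret = create_admin_user(boto3.client("iam"), "tutorial-user")
```

Administrator rights match the single-user setup described above; for a shared account, a narrower policy would be the safer choice.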

3. create billing alert (optional but strongly recommended)

Not only is this best practice, it is also in your best interest to set up a billing alert. This is no joke: I have seen plenty of horror stories on Reddit on this subject. It only takes 5 minutes and can save you some headaches. You first need to go to CloudWatch in the US East (N. Virginia) region, where billing alerts are handled.

3–1. CloudWatch US East (N. Virginia) region

Then type billing in the metric search bar and select Total Estimated Charge:

3–2. Select Metric

Then select the EstimatedCharges metric in the “All metrics” tab:

3–3. Select the metric

And then, in the “Graphed metrics” tab, we can set an alarm:

3–4. Setup alarm

Then it is possible to choose the threshold amount and the email address to contact when the estimated cost reaches this level:

3–5. Set up alarm level and notification list
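For reference, the alarm configured above can be described as a set of parameters for CloudWatch's `put_metric_alarm` call in boto3. The sketch below only builds that parameter dictionary. It assumes the charges are reported in USD and that an SNS topic with an email subscription already exists (the console creates one for you; here its ARN must be supplied). The function name, default alarm name, and six-hour period are illustrative choices, not values from the screenshots.

```python
def billing_alarm_params(threshold_usd, sns_topic_arn, name="billing-alert"):
    """Build keyword arguments for CloudWatch's put_metric_alarm."""
    return {
        "AlarmName": name,
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # six hours, in seconds (illustrative choice)
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # topic notified when the alarm fires
    }


# Usage (assumption: boto3 is installed; billing metrics live in us-east-1):
#   import boto3
#   cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
#   cloudwatch.put_metric_alarm(**billing_alarm_params(10, "<your SNS topic ARN>"))
```

Keeping the parameters in a plain dictionary makes it easy to review the threshold before anything is sent to AWS.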

4. create a new instance

Now that we have created an alert, we can dive in. Virtual machines can be instantiated in the EC2 Dashboard. It can be accessed as follows:

4–1. EC2 service

Back in the EC2 dashboard, go to the “Instances” section and select “Launch Instance”:

4–2. Launch Instance

The OS of the new instance is chosen from the images available on AWS (here I choose Ubuntu Server 16.04 LTS).

4–3. Choose AMI (Ubuntu Server 16.04 LTS)

Following this, the computing power of the instance must be chosen. There are many options. For a test, it is better to choose a small, general purpose configuration:

4–4. Choose Instance Type

Before launching the virtual machine, the last step is to choose the associated key pair.

4–5. Choose Key Pair
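The three choices made in this step (image, instance type, key pair) map directly onto the arguments of the EC2 `run_instances` call in boto3. Here is a minimal sketch, assuming `ec2_client` is an object such as `boto3.client("ec2")`. The AMI ID must be looked up for your region and is deliberately left as a parameter, and `t2.micro` is only an assumed example of a small, general purpose type. Both function names are hypothetical.

```python
def launch_params(image_id, key_name, instance_type="t2.micro"):
    """Build keyword arguments for EC2's run_instances (one instance)."""
    return {
        "ImageId": image_id,            # AMI chosen in the console, e.g. Ubuntu
        "InstanceType": instance_type,  # assumed default, not from the article
        "KeyName": key_name,            # name of the key pair from step 1
        "MinCount": 1,
        "MaxCount": 1,
    }


def launch_instance(ec2_client, image_id, key_name, instance_type="t2.micro"):
    """Launch a single instance and return its instance ID."""
    response = ec2_client.run_instances(
        **launch_params(image_id, key_name, instance_type)
    )
    return response["Instances"][0]["InstanceId"]


# Usage (assumption: boto3 is installed and credentials are configured):
#   import boto3
#   launch_instance(boto3.client("ec2"), "<AMI ID for your region>", "myec2instance")
```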

5. connect remotely to instance

After a few minutes, the new instance should be running. You can check this in the EC2 dashboard:

5–1. EC2 Running instances

By clicking on the 1 Running Instances link, we get:

5–2. Instance Description
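The instance description shown in the console can also be read programmatically. The following is a minimal sketch, assuming boto3 and that `ec2_client` is an object such as `boto3.client("ec2")`. It returns the two fields needed for the next step: the instance state and its public DNS name. The function name `instance_summary` is hypothetical.

```python
def instance_summary(ec2_client, instance_id):
    """Return (state, public DNS name) for a single instance."""
    response = ec2_client.describe_instances(InstanceIds=[instance_id])
    instance = response["Reservations"][0]["Instances"][0]
    # The public DNS name may be empty until the instance is running.
    return instance["State"]["Name"], instance.get("PublicDnsName", "")


# Usage (assumption: boto3 is installed and credentials are configured):
#   import boto3
#   state, dns = instance_summary(boto3.client("ec2"), "<your instance ID>")
```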

Now, to connect to the instance remotely, we use an SSH client; the key pair allows us to connect to the server securely:

ssh -i myec2instance.pem ubuntu@[EC2 Public DNS]
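If you launch instances from a script, the command above can be assembled in Python as well. This is a small sketch with hypothetical helper names. It assumes the default `ubuntu` user of the Ubuntu image chosen earlier, and the host name in the usage comment is a placeholder.

```python
import shlex


def ssh_command(pem_path, public_dns, user="ubuntu"):
    """Build the argument list for connecting to the instance over SSH."""
    return ["ssh", "-i", pem_path, f"{user}@{public_dns}"]


def as_shell_line(argv):
    """Render an argument list as a single, safely quoted shell line."""
    return " ".join(shlex.quote(arg) for arg in argv)


# Usage (placeholder host name; subprocess.run opens the interactive session):
#   import subprocess
#   subprocess.run(ssh_command("myec2instance.pem", "<EC2 Public DNS>"))
```

Passing the arguments as a list avoids quoting problems if the path to the .pem file contains spaces.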

And you are all set. Some of my upcoming posts (Docker, API with Flask, …) will require setting up an EC2 instance, so this tutorial will come in handy if you are still following me then.
