How to create AWS ParallelCluster with Slurm scheduler

From PUBLIC-WIKI
Revision as of 09:09, 4 July 2019

Environment preparation phase within AWS Management Console

  • Log in to the AWS IAM console:
https://console.aws.amazon.com/iam/
  • From the left pane, click on Users -> Add user -> specify the name parallelcluster-user -> Access type: Programmatic access -> click Next: Permissions -> Set permissions -> select a group with the "AdministratorAccess" policy -> click Next: Tags -> click Next: Review -> click Create user -> click Download .csv and keep the file in a secure location -> click Close
  • Follow the instructions below to create a key pair to access the cluster machines:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html#having-ec2-create-your-key-pair
  • Follow the instructions below to create an S3 bucket (with a globally unique name) for importing data to and exporting data from the FSx Lustre storage:
https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html
Note 1: Document the S3 bucket name for use inside the ParallelCluster config file
Note 2: Create a folder named export (in lowercase) inside the S3 bucket
  • In case you wish to create a dedicated VPC and subnet for the HPC cluster, follow the instructions below:
https://docs.aws.amazon.com/directoryservice/latest/admin-guide/gsg_create_vpc.html
  • Log off from the AWS console
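The console steps above can also be scripted with the AWS CLI. A minimal sketch, assuming the CLI is already installed and configured with sufficient permissions; the key name and bucket name below are illustrative placeholders, not values from this guide:

```shell
REGION="eu-west-1"
KEY_NAME="hpc-cluster-key"
BUCKET="my-hpc-fsx-bucket-example"   # replace: S3 bucket names are globally unique

if command -v aws >/dev/null; then   # skips cleanly if the CLI is not installed
    # Create the EC2 key pair and save the private key locally
    aws ec2 create-key-pair --region "$REGION" --key-name "$KEY_NAME" \
        --query 'KeyMaterial' --output text > "$KEY_NAME.pem"
    chmod 400 "$KEY_NAME.pem"

    # Create the S3 bucket and the lowercase "export" folder used by FSx Lustre
    aws s3 mb "s3://$BUCKET" --region "$REGION"
    aws s3api put-object --bucket "$BUCKET" --key export/
fi
```

Note that `aws s3api put-object` with a key ending in `/` creates the empty "folder" object the console would create for you.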

Python installation phase on Linux (Debian / Ubuntu)

  • Log in to a Linux machine using SSH, and follow the instructions below to install Python 3:
https://docs.aws.amazon.com/cli/latest/userguide/install-linux-python.html
Note: In case you already have Python 3 installed, use the command below to upgrade to the latest build:
sudo apt-get upgrade python3
  • To install pip3, run the command below:
sudo apt install python3-pip
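After the two steps above, a quick check confirms both the interpreter and pip are on the PATH (the exact versions printed will differ on your machine):

```shell
# Confirm Python 3 and pip are installed and reachable
python3 --version
python3 -m pip --version
```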

Python installation phase on Windows

  • Log in to a Windows machine using a privileged account, and follow the instructions below to install Python 3 and PIP:
https://docs.aws.amazon.com/cli/latest/userguide/install-windows.html

AWS ParallelCluster installation phase

  • Run the command below for your platform to install AWS ParallelCluster:
  • Linux:
sudo pip install aws-parallelcluster
  • Windows:
pip install aws-parallelcluster
  • Run the command below to verify the installed version:
pcluster version
  • Follow the instructions below to install the AWS CLI:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html
  • Run the command below to configure the AWS CLI:
aws configure
  • AWS Access Key ID – Specify the value from the CSV of the previously created IAM user parallelcluster-user
  • AWS Secret Access Key – Specify the value from the CSV of the previously created IAM user parallelcluster-user
  • Default region name – Specify a region such as eu-west-1
Full list: https://docs.aws.amazon.com/general/latest/gr/rande.html
  • Default output format: Specify json (lowercase)
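The same four values can also be set non-interactively with `aws configure set`, which is convenient for scripted setups; the access-key values below are placeholders, not real credentials:

```shell
# Non-interactive equivalent of the aws configure prompts; values are placeholders
if command -v aws >/dev/null; then
    aws configure set aws_access_key_id     "AKIAEXAMPLEKEYID"
    aws configure set aws_secret_access_key "exampleSecretAccessKey"
    aws configure set region                "eu-west-1"
    aws configure set output                "json"
fi
```

A quick `aws sts get-caller-identity` afterwards confirms the keys actually work.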
  • Run the command below to setup the initial configuration:
pcluster configure
  • Cluster Template: Specify here a custom name for the HPC template (such as HPC Cluster)
  • AWS Region ID: Specify the same region you specified for the aws configure command (such as eu-west-1)
  • VPC Name: Specify the same name as the Cluster Template (such as HPC Cluster)
  • Key Name: Specify the name of the EC2 Key pair previously created
  • VPC ID: Specify the ID of the target VPC to deploy the HPC cluster into
Note: The full list of VPCs can be found within the AWS management console: https://console.aws.amazon.com/vpc
  • Master Subnet ID: Specify the ID of the target subnet to deploy the HPC cluster into
Note: The full list of subnets can be found within the AWS management console: https://console.aws.amazon.com/vpc
  • Edit the ParallelCluster config file:
  • Linux: The file is located inside ~/.parallelcluster/config
  • Windows: The file is located inside %UserProfile%\.parallelcluster\config
  • Add the following parameters to the [cluster] section (for a large cluster):
base_os = centos7
master_instance_type = c5n.xlarge
compute_instance_type = c5n.18xlarge
cluster_type = ondemand
initial_queue_size = 2
scheduler = slurm
placement_group = DYNAMIC
enable_efa = compute
fsx_settings = fs
Note: For a small cluster, add the following parameters to the [cluster] section instead:
base_os = centos7
master_instance_type = m4.large
compute_instance_type = m4.large
cluster_type = ondemand
initial_queue_size = 2
max_queue_size = 3
scheduler = slurm
placement_group = DYNAMIC
fsx_settings = fs
  • Add the following entire section to the config file:
[fsx fs]
shared_dir = /fsx
storage_capacity = 3600
imported_file_chunk_size = 1024
export_path = s3://bucket/export
import_path = s3://bucket
weekly_maintenance_start_time = 1:00:00
Note 1: storage_capacity is the size of the FSx Lustre file system in GiB
Note 2: Replace bucket with the previously created S3 bucket name
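Putting the pieces together, a complete config file for the small-cluster variant might look like the sketch below. The [aws], [global] and [vpc] sections, the key name, and the vpc-xxxxxxxx / subnet-xxxxxxxx / my-hpc-bucket values are illustrative placeholders standing in for the values gathered during the pcluster configure step:

```
[aws]
aws_region_name = eu-west-1

[global]
cluster_template = default
update_check = true
sanity_check = true

[cluster default]
key_name = hpc-cluster-key
base_os = centos7
master_instance_type = m4.large
compute_instance_type = m4.large
cluster_type = ondemand
initial_queue_size = 2
max_queue_size = 3
scheduler = slurm
placement_group = DYNAMIC
fsx_settings = fs
vpc_settings = default

[vpc default]
vpc_id = vpc-xxxxxxxx
master_subnet_id = subnet-xxxxxxxx

[fsx fs]
shared_dir = /fsx
storage_capacity = 3600
imported_file_chunk_size = 1024
export_path = s3://my-hpc-bucket/export
import_path = s3://my-hpc-bucket
weekly_maintenance_start_time = 1:00:00
```

With the file in place, pcluster create <cluster-name> launches the cluster and pcluster status <cluster-name> follows its progress (both standard ParallelCluster v2 commands).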