How to create AWS ParallelCluster with Slurm scheduler

From PUBLIC-WIKI
Revision as of 09:09, 4 July 2019

Environment preparation phase within AWS Management Console

  • Log in to the AWS IAM console:
https://console.aws.amazon.com/iam/
  • From the left pane, click on Users -> Add user -> specify the name parallelcluster-user -> Access type: Programmatic access -> click Next: Permissions -> Set permissions -> select a group with the "AdministratorAccess" policy -> click Next: Tags -> click Next: Review -> click Create user -> click Download .csv and keep the file in a secure location -> click Close
  • Follow the instructions below to create a key pair to access the cluster machines:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html#having-ec2-create-your-key-pair
  • Follow the instructions below to create an S3 bucket (with a globally unique name) for importing data to and exporting data from the FSx Lustre storage:
https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html
Note 1: Document the S3 bucket name for use inside the ParallelCluster config file
Note 2: Create a folder named export (in lowercase) inside the S3 bucket
  • In case you wish to create a dedicated VPC and subnet for the HPC cluster, follow the instructions below:
https://docs.aws.amazon.com/directoryservice/latest/admin-guide/gsg_create_vpc.html
  • Log off from the AWS console
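The console steps above can also be scripted with the AWS CLI. A minimal sketch, assuming the CLI is already installed and configured with sufficient permissions; the key name and bucket name below are illustrative placeholders, not values from this guide:

```shell
REGION="eu-west-1"
KEY_NAME="hpc-cluster-key"
BUCKET="my-hpc-fsx-bucket-example"   # replace: S3 bucket names are globally unique

if command -v aws >/dev/null; then   # skips cleanly if the CLI is not installed
    # Create the EC2 key pair and save the private key locally
    aws ec2 create-key-pair --region "$REGION" --key-name "$KEY_NAME" \
        --query 'KeyMaterial' --output text > "$KEY_NAME.pem"
    chmod 400 "$KEY_NAME.pem"

    # Create the S3 bucket and the lowercase "export" folder used by FSx Lustre
    aws s3 mb "s3://$BUCKET" --region "$REGION"
    aws s3api put-object --bucket "$BUCKET" --key export/
fi
```

Note that `aws s3api put-object` with a key ending in `/` creates the empty "folder" object the console would create for you.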

Python installation phase on Linux (Debian / Ubuntu)

  • Log in to a Linux machine using SSH, and follow the instructions below to install Python 3:
https://docs.aws.amazon.com/cli/latest/userguide/install-linux-python.html
Note: In case you already have Python 3 installed, use the command below to upgrade to the latest build:
sudo apt-get upgrade python3
  • To install pip3, run the command below:
sudo apt install python3-pip
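After the two steps above, a quick check confirms both the interpreter and pip are on the PATH (the exact versions printed will differ on your machine):

```shell
# Confirm Python 3 and pip are installed and reachable
python3 --version
python3 -m pip --version
```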

Python installation phase on Windows

  • Log in to a Windows machine using a privileged account, and follow the instructions below to install Python 3 and PIP:
https://docs.aws.amazon.com/cli/latest/userguide/install-windows.html

AWS ParallelCluster installation phase

  • Run the command below for your platform to install AWS ParallelCluster:
  • Linux:
sudo pip install aws-parallelcluster
  • Windows:
pip install aws-parallelcluster
  • Run the command below to verify the installed version:
pcluster version
  • Follow the instructions below to install the AWS CLI:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html
  • Run the command below to configure the AWS CLI:
aws configure
  • AWS Access Key ID – Specify the value from the CSV of the previously created IAM user parallelcluster-user
  • AWS Secret Access Key – Specify the value from the CSV of the previously created IAM user parallelcluster-user
  • Default region name – Specify a region such as eu-west-1
Full list: https://docs.aws.amazon.com/general/latest/gr/rande.html
  • Default output format: Specify json (lowercase)
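The same four values can also be set non-interactively with `aws configure set`, which is convenient for scripted setups; the access-key values below are placeholders, not real credentials:

```shell
# Non-interactive equivalent of the aws configure prompts; values are placeholders
if command -v aws >/dev/null; then
    aws configure set aws_access_key_id     "AKIAEXAMPLEKEYID"
    aws configure set aws_secret_access_key "exampleSecretAccessKey"
    aws configure set region                "eu-west-1"
    aws configure set output                "json"
fi
```

A quick `aws sts get-caller-identity` afterwards confirms the keys actually work.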
  • Run the command below to setup the initial configuration:
pcluster configure
  • Cluster Template: Specify here a custom name for the HPC template (such as HPC Cluster)
  • AWS Region ID: Specify the same region you specified for the aws configure command (such as eu-west-1)
  • VPC Name: Specify the same name as the Cluster Template (such as HPC Cluster)
  • Key Name: Specify the name of the EC2 Key pair previously created
  • VPC ID: Specify the ID of the target VPC to deploy the HPC cluster into
Note: The full list of VPCs can be found within the AWS management console: https://console.aws.amazon.com/vpc
  • Master Subnet ID: Specify the ID of the target subnet to deploy the HPC cluster into
Note: The full list of subnets can be found within the AWS management console: https://console.aws.amazon.com/vpc
  • Edit the ParallelCluster config file:
  • Linux: The file is located inside ~/.parallelcluster/config
  • Windows: The file is located inside %UserProfile%\.parallelcluster\config
  • Add the following parameters to the [cluster] section (for a large cluster):
base_os = centos7
master_instance_type = c5n.xlarge
compute_instance_type = c5n.18xlarge
cluster_type = ondemand
initial_queue_size = 2
scheduler = slurm
placement_group = DYNAMIC
enable_efa = compute
fsx_settings = fs
Note: For a small cluster, add the following parameters to the [cluster] section instead:
base_os = centos7
master_instance_type = m4.large
compute_instance_type = m4.large
cluster_type = ondemand
initial_queue_size = 2
max_queue_size = 3
scheduler = slurm
placement_group = DYNAMIC
fsx_settings = fs
  • Add the following entire section to the config file:
[fsx fs]
shared_dir = /fsx
storage_capacity = 3600
imported_file_chunk_size = 1024
export_path = s3://bucket/export
import_path = s3://bucket
weekly_maintenance_start_time = 1:00:00
Note 1: storage_capacity is the size of the FSx Lustre file system in GiB
Note 2: Replace bucket with the previously created S3 bucket name
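Putting the pieces together, a complete config file for the small-cluster variant might look like the sketch below. The [aws], [global] and [vpc] sections, the key name, and the vpc-xxxxxxxx / subnet-xxxxxxxx / my-hpc-bucket values are illustrative placeholders standing in for the values gathered during the pcluster configure step:

```
[aws]
aws_region_name = eu-west-1

[global]
cluster_template = default
update_check = true
sanity_check = true

[cluster default]
key_name = hpc-cluster-key
base_os = centos7
master_instance_type = m4.large
compute_instance_type = m4.large
cluster_type = ondemand
initial_queue_size = 2
max_queue_size = 3
scheduler = slurm
placement_group = DYNAMIC
fsx_settings = fs
vpc_settings = default

[vpc default]
vpc_id = vpc-xxxxxxxx
master_subnet_id = subnet-xxxxxxxx

[fsx fs]
shared_dir = /fsx
storage_capacity = 3600
imported_file_chunk_size = 1024
export_path = s3://my-hpc-bucket/export
import_path = s3://my-hpc-bucket
weekly_maintenance_start_time = 1:00:00
```

With the file in place, pcluster create <cluster-name> launches the cluster and pcluster status <cluster-name> follows its progress (both standard ParallelCluster v2 commands).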