How to create HPC Cluster based on Azure CycleCloud
Installing Azure CLI
- Log in to the machine using a privileged account.
- Download the latest build of Azure CLI:
- Windows download instruction and location:
- Linux download instruction and location:
Login to the Azure subscription
- Run the command below to log in to the Azure subscription:
- az login
- List available subscriptions:
- az account list --output table
- Change the context to a specific Azure subscription:
- az account set --subscription "My Subscription"
- Note: Replace "My Subscription" with the relevant subscription name
- Run the command below to verify the currently selected Azure subscription:
- az account show
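The subscription steps above can be combined into one small check. This is a minimal sketch, not an official procedure: `select_subscription` is a hypothetical helper name, and it assumes you pass the subscription name (not its ID) and have already run `az login`.

```shell
#!/bin/sh
# Sketch: switch the Azure CLI to one subscription and confirm the switch.
# select_subscription is a hypothetical helper, not part of the Azure CLI.
select_subscription() {
    target="$1"
    # Change the active subscription (same command as in the steps above).
    az account set --subscription "$target" || return 1
    # Read back the name of the subscription that is now active.
    active=$(az account show --query name --output tsv) || return 1
    if [ "$active" != "$target" ]; then
        echo "Expected '$target' but '$active' is active" >&2
        return 1
    fi
    echo "Active subscription: $active"
}

# Usage (requires a prior 'az login'):
#   select_subscription "My Subscription"
```

Comparing the name read back by `az account show` guards against running the later steps in the wrong subscription.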
Configure CycleCloud pre-requirements
- From a command prompt, run the command below to configure a service principal:
- az ad sp create-for-rbac --name CycleCloudApp-MySubscription --years 1
- Note: Replace CycleCloudApp-MySubscription with a unique name, relevant to the target Azure subscription
- Document the command output (appId, displayName, name, password, tenant)
- From a Linux command prompt, run the command below to generate an SSH key pair:
- ssh-keygen -f ~/.ssh/id_rsa -m pem -t rsa -N "" -b 4096
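The two prerequisite steps above can be wrapped so they are safer to repeat. The sketch below is illustrative only: `ensure_ssh_key` and `save_service_principal` are hypothetical helper names, and the output file path is whatever you choose. It skips key generation when the key file already exists, and stores the service principal output in a file that only the current user can read, since that output contains the password.

```shell
#!/bin/sh
# Sketch: generate the SSH key pair only if it does not exist yet.
# ensure_ssh_key is a hypothetical helper name.
ensure_ssh_key() {
    key_path="$1"
    if [ -f "$key_path" ]; then
        echo "Key already exists: $key_path"
        return 0
    fi
    # Same options as the step above: 4096-bit RSA, PEM format, empty passphrase.
    ssh-keygen -f "$key_path" -m pem -t rsa -N "" -b 4096
}

# Sketch: create the service principal and keep its JSON output in a file.
# save_service_principal is a hypothetical helper name.
save_service_principal() {
    sp_name="$1"
    out_file="$2"
    (
        # Restrictive umask in a subshell: a newly created file gets mode 600
        # and the caller's umask is left unchanged.
        umask 077
        az ad sp create-for-rbac --name "$sp_name" --years 1 --output json > "$out_file"
    )
}

# Usage:
#   ensure_ssh_key "$HOME/.ssh/id_rsa"
#   save_service_principal CycleCloudApp-MySubscription "$HOME/cyclecloud-sp.json"
```

The saved file holds the appId, password, and tenant values that the CycleCloud account setup asks for later.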
Deploy the Azure CycleCloud
- Login to the Azure Portal:
- From the left pane click on Subscriptions -> from the main pane, click on “global subscriptions filter” -> make sure the target Azure subscription is the only subscription selected
- Inside the top search pane, write “Marketplace”
- Inside the “Search the Marketplace” field, write “Azure CycleCloud”
- Click on Create:
- Subscription: Select the target subscription name
- Resource group: Select “Create new”
- Virtual machine name: Specify here the Azure CycleCloud VM name
- Region: Specify a region close to your location (such as West Europe)
- Note: For the full list of regions, see:
- https://azure.microsoft.com/en-us/global-infrastructure/locations/
- Availability options: Leave the default settings
- Image: Leave the default “Azure CycleCloud”
- Size: For a small HPC cluster, you may select Standard D2s v3 (2 vCPU, 8GB); otherwise, leave the default Standard D4s v3 (4 vCPU, 16GB)
- Authentication type: SSH public key
- Username: Specify here a username for the cluster admin account (for SSH login to the machine)
- Note: Remember to document this detail for future use
- SSH public key: Paste here the content of the SSH public key (id_rsa.pub) previously created
- Click Next: Disks
- OS disk type: For testing or a small-scale cluster, select “Standard SSD”
- For production or a large-scale cluster, select “Premium SSD”
- Click Next: Networking
- Virtual network: Leave default settings
- Subnet: Leave default settings
- Configure network security group: Click on “Create new” -> leave the default settings -> click OK
- Click Next: Management
- Boot diagnostics: Select Off
- Click Next: Advanced -> click Next: Tags
- Name: Specify here Project
- Value: Specify here Azure subscription name
- Click Next: Review + create
- Click on Create
- Wait for the deployment process to complete
- When the deployment process completes, click on Go to resource
- From the newly created VM “Overview” page, locate the Public IP address and document it.
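The public IP address can also be read from the command line instead of the portal. This is a minimal sketch assuming the Azure CLI is logged in to the target subscription. `get_vm_public_ip` is a hypothetical helper name, and the resource group and VM names in the usage line are placeholders for the values chosen during deployment.

```shell
#!/bin/sh
# Sketch: print the public IP address of the CycleCloud VM.
# get_vm_public_ip is a hypothetical helper name.
get_vm_public_ip() {
    resource_group="$1"
    vm_name="$2"
    # --show-details adds runtime details such as publicIps to the VM record.
    az vm show --show-details \
        --resource-group "$resource_group" \
        --name "$vm_name" \
        --query publicIps \
        --output tsv
}

# Usage (placeholder names):
#   get_vm_public_ip MyResourceGroup MyCycleCloudVM
```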
Azure CycleCloud Initial Setup
- Open a browser and go to the URL below:
- https://My_CycleCloud_IP
- Note 1: Replace My_CycleCloud_IP with the public IP address of the CycleCloud VM
- Note 2: Ignore the SSL certificate warning (by default CycleCloud is deployed with self-signed certificate)
- Site name: Specify a unique name for the cluster
- Click Next
- Select “I agree” and click Next
- User ID: Specify an admin account for managing the CycleCloud
- Name: Specify a full name for the CycleCloud admin account
- Password: Specify a complex password for the CycleCloud admin account and confirm the password
- SSH public key: Paste here the content of the SSH public key (id_rsa.pub)
- Click on Done
- On the “No Account Configured” page, click on “Click here”
- Account Name: Specify here the service principal name previously created
- Tenant ID: Specify here the “tenant” information of the service principal
- Application ID: Specify here appId of the service principal
- Application Secret: Specify here the service principal password
- Click on “Validate Credentials”
- Default Location: Select the same region where the CycleCloud was deployed into (such as West Europe)
- Resource Group: Select the resource group where the CycleCloud was deployed into
- Storage Account: Specify a name for a storage account (up to 24 characters, lowercase and numbers)
- Click on Save
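The storage account name rule mentioned above can be checked before typing the name into the form. The sketch below is illustrative: `valid_storage_account_name` is a hypothetical helper name. The guide states the 24-character limit and the lowercase-and-numbers rule, and the 3-character minimum is Azure's general naming rule for storage accounts. The check does not tell you whether the name is already taken.

```shell
#!/bin/sh
# Sketch: validate a candidate storage account name locally.
# valid_storage_account_name is a hypothetical helper name.
valid_storage_account_name() {
    name="$1"
    length=${#name}
    # Length must be between 3 and 24 characters.
    [ "$length" -ge 3 ] && [ "$length" -le 24 ] || return 1
    # Reject any character that is not a lowercase letter or a digit.
    case "$name" in
        *[!abcdefghijklmnopqrstuvwxyz0123456789]*) return 1 ;;
    esac
    return 0
}

# Usage:
#   valid_storage_account_name "cyclecloudstore01" && echo "name looks valid"
```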
Azure CycleCloud configuration and cluster installation phase
- Log in using SSH to the CycleCloud server (using the public IP address from the Azure portal, and the SSH key previously created)
- Copy the SSH key pair generated at the beginning of this guide to the CycleCloud server, into the folder /home/cycleadmin/.ssh
- Note: Replace cycleadmin with the CycleCloud admin account (to login using SSH)
- Run the commands below to change the ownership of the SSH key pair:
- cd ~
- chown cycleadmin:cycleadmin ~/.ssh/id_rsa ~/.ssh/id_rsa.pub
- Note: Replace cycleadmin with the CycleCloud admin account (to login using SSH)
- Run the commands below to change the permissions of the SSH key pair:
- chmod 600 ~/.ssh/id_rsa
- chmod 644 ~/.ssh/id_rsa.pub
- Duplicate the CycleCloud Key:
- cp ~/.ssh/id_rsa ~/.ssh/cyclecloud.pem
- Run the command below to initialize the CycleCloud CLI and connect it to the CycleCloud server:
- cyclecloud initialize
- CycleServer URL: Change the value to https://localhost
- Detected untrusted certificate: Yes
- CycleServer username: Specify here the previously created CycleCloud admin account
- CycleServer password: Specify here the password for the CycleCloud admin account
- Note: The configuration will be saved into ~/.cycle/config.ini
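Before going further, it can help to confirm that the key files ended up with the permissions set above. This is a minimal sketch: `check_key_permissions` is a hypothetical helper name, and it compares the permission string printed by `ls -ld` instead of relying on a particular `stat` implementation.

```shell
#!/bin/sh
# Sketch: verify that id_rsa is mode 600 and id_rsa.pub is mode 644.
# check_key_permissions is a hypothetical helper name.
check_key_permissions() {
    ssh_dir="$1"
    result=0
    # The first 10 characters of 'ls -ld' output are the type and permission bits.
    private_mode=$(ls -ld "$ssh_dir/id_rsa" 2>/dev/null | cut -c1-10)
    public_mode=$(ls -ld "$ssh_dir/id_rsa.pub" 2>/dev/null | cut -c1-10)
    if [ "$private_mode" != "-rw-------" ]; then
        echo "id_rsa should be mode 600, found '$private_mode'" >&2
        result=1
    fi
    if [ "$public_mode" != "-rw-r--r--" ]; then
        echo "id_rsa.pub should be mode 644, found '$public_mode'" >&2
        result=1
    fi
    return "$result"
}

# Usage on the CycleCloud server:
#   check_key_permissions "$HOME/.ssh" && echo "SSH key permissions look correct"
```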
Lustre deployment
- Log in to the CycleCloud server using SSH
- Run the command below to install Git:
- sudo yum install git
- Run the commands below to clone the CycleCloud Lustre Git repository:
- cd ~
- git clone https://github.com/hmeiland/cyclecloud-lustre.git
- cd cyclecloud-lustre
- Run the command below to list the CycleCloud lockers:
- cyclecloud locker list
- Note: Document the output of the above command
- Run the command below to upload a CycleCloud project:
- cyclecloud project upload <locker-from-previous-step>
- Note: Replace <locker-from-previous-step> with the value of the list command
- Run the command below to import the Lustre template:
- cyclecloud import_template -f templates/lustre.txt
- Open a browser and log in to the CycleCloud web console:
- https://My_CycleCloud_IP
- Note 1: Replace My_CycleCloud_IP with the public IP address of the CycleCloud VM
- Note 2: Ignore the SSL certificate warning (by default CycleCloud is deployed with self-signed certificate)
- From the left pane, click on Clusters -> from the main pane, click on Lustre icon
- Cluster name: Specify here cyclecloud-lustre
- Click on Next -> Networking -> On the region, select the target region where the CycleCloud is deployed (such as West Europe) -> Subnet ID -> select the subnet where the CycleCloud server is deployed -> click Next -> Advanced Settings -> leave the default settings -> click Next -> Lustre Settings:
- Blob Account: Specify the name of the Azure storage account previously created (from within the Azure Portal -> Storage accounts)
- Blob Key: Specify the first storage account access key (from within the Azure Portal -> Storage accounts -> Access keys)
- Blob container: Specify the name of the Azure blob container (from within the Azure Portal -> Storage accounts -> Blobs)
- Click on Save
- From the cyclecloud-lustre page, click on Start -> click OK
- Wait for the cluster deployment to complete (2 cluster nodes will be created and the status will appear in green, without errors)
- Log off the CycleCloud web console
- Return to the CycleCloud server SSH console
- Run the command below to view the status of the Lustre cluster:
- cyclecloud show_cluster cyclecloud-lustre
- Note: cyclecloud-lustre is the Lustre cluster name we have previously created
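The locker lookup, project upload, and template import steps above could be scripted roughly as follows. This is a sketch under a stated assumption: it assumes `cyclecloud locker list` prints lines of the form `name (az://...)`, which may differ between CycleCloud versions, so check the real output first. `first_locker_name` and `upload_and_import` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: pick the first locker and use it for the upload and import steps.
# Assumption: 'cyclecloud locker list' prints "name (az://...)" per locker.
first_locker_name() {
    # Keep only the text before " (" on the first line of the listing.
    cyclecloud locker list | sed -n '1s/ (.*$//p'
}

# upload_and_import is a hypothetical helper; run it from the project folder.
upload_and_import() {
    template="$1"
    locker=$(first_locker_name)
    if [ -z "$locker" ]; then
        echo "No locker found; check 'cyclecloud locker list' manually" >&2
        return 1
    fi
    # Same two commands as in the steps above.
    cyclecloud project upload "$locker" && cyclecloud import_template -f "$template"
}

# Usage:
#   cd ~/cyclecloud-lustre && upload_and_import templates/lustre.txt
```

If more than one locker is listed, the sketch picks the first one, which may not be the one you want. In that case, pass the locker name by hand as the guide describes.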
Slurm cluster deployment
- Log in to the CycleCloud server using SSH
- Run the commands below to clone the Slurm project Git repository:
- cd ~
- git clone https://github.com/andreas-wilm/cyclecloud-dingo-compute
- cd cyclecloud-dingo-compute
- Run the command below to list the CycleCloud lockers:
- cyclecloud locker list
- Note: Document the output of the above command
- Run the command below to upload a CycleCloud project:
- cyclecloud project upload <locker-from-previous-step>
- Note: Replace <locker-from-previous-step> with the value of the list command
- Run the command below to import the Slurm template:
- cyclecloud import_template -f templates/dingo-compute.txt
- Open a browser and log in to the CycleCloud web console:
- https://My_CycleCloud_IP
- Note 1: Replace My_CycleCloud_IP with the public IP address of the CycleCloud VM
- Note 2: Ignore the SSL certificate warning (by default CycleCloud is deployed with self-signed certificate)
- From the left pane, click on Clusters -> from the lower pane, click on + sign (Add) -> from the main pane, click on the “dingo-compute” icon
- Cluster name: Slurm
- Region: Select the target region where the CycleCloud is deployed into (such as West Europe)
- Subnet ID -> select the subnet where the CycleCloud server is deployed into
- Click Next -> Advanced Settings -> leave the default settings -> click Next -> Virtual Machines:
- Master VM Type: For a small HPC cluster, you may select Standard D2s v3 (2 vCPU, 8GB); otherwise, leave the default Standard D4s v3 (4 vCPU, 16GB)
- Execute VM Type: For a small cluster, select Standard D4s v3 (4 vCPU, 16GB); for a high-performance cluster, select a VM type from the list:
- https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-hpc
- Note: If you require InfiniBand, select a VM type from the above list that explicitly states it supports InfiniBand
- Max Cores: Specify the maximum number of cores the cluster may use once auto-scaling is enabled
- Low Priority: Select this option to reduce cost, provided that your workload tolerates a VM being reclaimed by Azure and your code can resume running on a new VM
- Click Next
- Lustre Cluster: Select the name of the Lustre cluster
- Slurm Version: Specify the build of Slurm (such as 17.11.12-1)
- Click Save
- From the Slurm page, click on Start -> click OK
- Wait for the cluster deployment to complete (a cluster node will be created and the status will appear in green, without errors)
- Log off the CycleCloud web console
- Return to the CycleCloud server SSH console
- Run the command below to view the status of the Slurm cluster:
- cyclecloud show_cluster Slurm
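When choosing the Max Cores value above, it may help to translate it into a node count. The sketch below assumes that Max Cores caps the total number of vCPUs across all execute nodes, which is one reading of the setting and not something this guide confirms. `max_execute_nodes` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: how many execute VMs fit under a given Max Cores value.
# Assumption: Max Cores limits the total vCPU count of the execute nodes.
max_execute_nodes() {
    max_cores="$1"
    vcpus_per_vm="$2"
    if [ "$vcpus_per_vm" -le 0 ]; then
        echo "vCPUs per VM must be positive" >&2
        return 1
    fi
    # Integer division: a partially filled VM does not count.
    echo $(( max_cores / vcpus_per_vm ))
}

# Usage: Standard D4s v3 has 4 vCPUs, so Max Cores = 64 allows up to 16 nodes.
#   max_execute_nodes 64 4
```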
Check HPC cluster status
- To make sure the Slurm cluster was successfully created and the Lustre file system was mounted, log in to the Master node VM using SSH
- Note: The Master node public IP address can be found inside the Azure Portal -> Virtual Machines or from the CycleCloud web console -> Clusters -> Slurm
- Run the commands below to check the Slurm cluster status:
- sinfo
- squeue
- Run the command below to make sure the Lustre file system was automatically mounted inside the Master node:
- mount | grep lustre
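The checks above can be run together as one pass/fail script on the Master node. This is a minimal sketch: `lustre_mounted` and `check_cluster` are hypothetical helper names, and the mount test uses the same `mount | grep lustre` match as the step above, so it reports success for any mount line containing "lustre".

```shell
#!/bin/sh
# Sketch: run the Slurm and Lustre checks in one go.
# lustre_mounted and check_cluster are hypothetical helper names.
lustre_mounted() {
    # Succeeds if any mount entry mentions lustre.
    mount | grep -q lustre
}

check_cluster() {
    # Show partition/node state and the job queue; stop if Slurm does not answer.
    sinfo || return 1
    squeue || return 1
    if lustre_mounted; then
        echo "Lustre file system is mounted"
    else
        echo "Lustre file system is NOT mounted" >&2
        return 1
    fi
}

# Usage on the Master node:
#   check_cluster
```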
Terminate the entire Azure CycleCloud environment
- Follow the instructions in the link below:
- https://docs.microsoft.com/en-us/azure/cyclecloud/end-cluster
- Note: If you used an existing Azure resource group, you have to manually select the CycleCloud resources and delete them