by iterative.ai
Get started
Edit on GitHub

Command Reference: runner

cml runner launch [options]

Starts a runner (either via any supported cloud compute provider or locally on-premise).

Options

Any generic option in addition to:

  • --labels=<...>: One or more (comma-delimited) labels for this runner [default: cml].
  • --name=<...>: Runner name displayed in the CI [default: cml-{ID}].
  • --idle-timeout=<seconds>: Seconds to wait for jobs before terminating. Set to -1 to disable timeout [default: 300].
  • --no-retry: Don't restart the workflow when terminated due to instance disposal or GitHub Actions timeout.
  • --single: Terminate runner after one workflow run.
  • --reuse: Don't launch a new runner if an existing one has the same name or overlapping labels. If an existing matching (same name or overlapping labels) instance is busy, it'll still be reused.
  • --reuse-idle: Creates a new runner only if the matching labels don't exist or are already busy.
  • --cloud={aws,azure,gcp,kubernetes}: Cloud compute provider to host the runner.
  • --cloud-type={m,l,xl,m+k80,m+v100,...}: Instance type. Also accepts native types such as t2.micro.
  • --cloud-gpu={nogpu,k80,v100,tesla}: GPU type.
  • --cloud-hdd-size=<...>: Disk storage in GB.
  • --cloud-spot: Request a preemptible spot instance.
  • --cloud-spot-price=<...>: Maximum spot instance USD bidding price, [default: current price].
  • --cloud-region={us-west,us-east,eu-west,eu-north,...}: Region where the instance is deployed. Also accepts native AWS/Azure region or GCP zone [default: us-west].
  • --cloud-permission-set=<...>: AWS instance profile or GCP instance service account. More details below.
  • --cloud-metadata=<...>: key=value pair to associate with cloud runner instances. May be specified multiple times.
  • --cloud-startup-script=<...>: Run the provided Base64-encoded Linux shell script during the instance initialization. More details below.
  • --cloud-ssh-private=<key>: Private SSH RSA key [default: auto-generate throwaway key]. Only supported on AWS and Azure; intended for debugging purposes. More details below.
  • --cloud-aws-security-group=<...>: AWS security group identifier.
  • --cloud-aws-subnet=<...>: AWS subnet identifier.
  • --cloud-kubernetes-node-selector=<...>: key=value pair to specify the Kubernetes node selector. May be specified multiple times. More details below. [default: accelerator=infer]
  • --docker-volumes=<...>: Volume mount to pass to Docker, e.g. /var/run/docker.sock:/var/run/docker.sock for Docker-in-Docker support. May be specified multiple times. Only supported by GitLab.

FAQs and Known Issues

GitHub

  • GitHub Actions timeout after a few hours.

    You can request up to 35 days via timeout-minutes: 50400. CML will helpfully restart GitHub Actions workflows approaching 35 days (you'd need to write your code to save intermediate results to take advantage of this).

Examples

Using --cloud-permission-set

Format

The associated cloud credentials must grant access to resources needed for managing compute instances.

An AWS ARN to an instance-profile:

  • arn:aws:iam::1234567890:instance-profile/dvc-s3-access
$ cml runner launch \
  --cloud-permission-set=arn:aws:iam::1234567890:instance-profile/dvc-s3-access \
  ...

A GCP service account email & list of scopes:

  • my-sa@myproject.iam.gserviceaccount.com,scopes=storage-rw,datastore
  • my-sa@myproject.iam.gserviceaccount.com,scopes=storage-rw
$ cml runner launch \
  --cloud-permission-set=my-sa@myproject.iam.gserviceaccount.com,scopes=storage-rw,datastore \
  ...

Common Permissions

It's recommended to use provider-managed policies/roles and then explicitly limit the permissions further if possible.

For example this could potentially be further limited to:

ec2:CreateSecurityGroup -- (Firewall and SSH Access Management)
ec2:AuthorizeSecurityGroupEgress
ec2:AuthorizeSecurityGroupIngress
ec2:DescribeSecurityGroups
ec2:DescribeSubnets
ec2:DescribeVpcs
ec2:DescribeInstanceTypeOfferings
ec2:ImportKeyPair
ec2:DeleteKeyPair
ec2:CreateTags -- (General Resource Management)
ec2:RunInstances -- (EC2 Instance Management)
ec2:DescribeImages
ec2:DescribeInstances
ec2:TerminateInstances
ec2:DescribeSpotInstanceRequests -- (Optionally needed for Spot Access)
ec2:RequestSpotInstances
ec2:CancelSpotInstanceRequests

GCP Managed Roles:

  • roles/compute.admin
  • roles/iam.serviceAccountUser

For example this could potentially be further limited to:

compute.diskTypes.get
compute.disks.create
compute.firewalls.create
compute.firewalls.delete
compute.globalOperations.get
compute.instances.create
compute.instances.delete
compute.instances.get
compute.instances.list
compute.instances.setMetadata
compute.instances.setServiceAccount
compute.instances.setTags
compute.machineTypes.get
compute.networks.create
compute.networks.get
compute.networks.updatePolicy
compute.subnetworks.use
compute.subnetworks.useExternalIp
compute.zoneOperations.get
compute.zones.get
compute.zones.list
iam.serviceAccounts.actAs

You may also require additional permissions specific to your application (for example: object storage, private docker registries, and other cloud services). These additional permissions should be managed separately, and exposed either as independent credentials or via --cloud-permission-set

Currently this feature is only available on AWS & GCP clouds.

A set of permissions for a cml runner instance can be predefined a via an AWS role or a GCP service account.

This could, for example, enable credential-free access to AWS s3 & GCP gs DVC remotes, or grant access to AWS' Elastic Container Registry & GCP's Artifact Registry (to push and pull custom docker images).

Other AWS examples include accessing data in:

  • Secrets Manager
  • DynamoDB
  • Redshfit

Example "Permission Sets"

Policy example:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DVCAccess",
      "Action": "s3:*",
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::mydvcbucket/*"
    }
  ]
}

Trust relationships:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CMLRunnerInstance",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Using --cloud-permission-set will likely require:

  • an additional role be added to your cml runner credentials roles/ServiceAccountUser,
  • ensuring the invoker has the permission iam.serviceAccount.actAs on the targeted Service Account.

Using --cloud-startup-script

A base64-encoded script to execute during cloud instance provisioning (after cml runner does its initial setup but before the runner becomes available to the CI/CD provider).

This script counts towards the total provisioning time. The total exceeding 10 minutes is considered a failure, resulting in cml runner terminating the instance and exiting with an error.

For example:

$ cml runner launch \
  --cloud-startup-script=IyEvYmluL2Jhc2gKCmVjaG8gImhlbGxvIHdvcmxkIgo= \
  ...

where echo IyEvYmluL2Jhc2gKCmVjaG8gImhlbGxvIHdvcmxkIgo= | base64 -d is:

#!/bin/bash

echo "hello world"

This can be used for debugging, for example allowing SSH access for a GitHub user:

$ cml runner launch \
  --cloud-startup-script=$(echo 'curl https://github.com/${{ github.actor }}.keys >> /home/ubuntu/.ssh/authorized_keys' | base64 -w 0) \
  ...

GitHub Actions will replace ${{ github.actor }} with the username of the person who triggered the workflow. Conveniently, GitHub (and GitLab) provide a URL to access a user's public SSH keys. In effect the above command runs:

$ curl https://github.com/YOUR_USERNAME.keys >> ~/.ssh/authorized_keys

in the cloud instance.

This enables easy SSH access into the runner for debugging as well as experimentation.

By comparison, --cloud-ssh-private relies on a local user-generated private key and is only supported on AWS and Azure.

Using --cloud-ssh-private

  1. Generate a new RSA PEM private key for debugging purposes:

    $ ssh-keygen -t rsa -m pem -b 4096 -f key.pem
  2. Pass the contents of the generated private key file when invoking the cml runner command:

    $ cml runner launch --cloud=... --cloud-ssh-private="$(cat key.pem)"
  3. Access the instance from your local system by using the generated key as an identity file:

    $ ssh -i key.pem ubuntu@IP_ADDRESS

    replacing the IP_ADDRESS placeholder with the instance address returned by cml runner (search the output logs for instanceIp).

Using --cloud-kubernetes-node-selector

Set the Kubernetes node selector.

For example:

$ cml runner launch \
  --cloud-kubernetes-node-selector="disktype=ssd" \
  ...

will select the node labeled with disktype=ssd.

If not provided, a default accelerator=infer key pair will be used.

Node selector on multiple labels

You can set multiple labels for a node selector.

For example:

$ cml runner launch \
  --cloud-kubernetes-node-selector="disktype=ssd" \
  --cloud-kubernetes-node-selector="ram=huge" \
  ...

will select the node labeled with disktype=ssd and ram=huge.

If you specify the same key multiple times, the last one will be used.

Infer the value from the GPU configuration

If you set the key value to infer, it will infer the GPU type from the GPU configuration on the key you have set.

For example:

$ cml runner launch \
  --cloud-kubernetes-node-selector="gpu=infer" \
  ...

will select the node labeled gpu with the value inferred from the GPU configuration if available, e.g. k80.

Debugging

See the section self-hosted runners

Content

🐛 Found an issue? Let us know! Or fix it:

Edit on GitHub

Have a question? Join our chat, we will help you:

Discord Chat