Ray on Golem cluster yaml

Ray uses a configuration yaml file to define the characteristics of the cluster. There are provider-specific pieces of information (node definitions, payment platforms, etc.) and cluster-wide ones (number of nodes, autoscaling parameters, cluster name, etc.).

This article explains the options available when setting up a Ray on Golem cluster.

For more details check out the Ray Cluster YAML Configuration Options reference.

Example Ray on Golem cluster configuration

The basic golem-cluster.yaml is available on GitHub.

It allows you to start a cluster on our testnet with one head node and three worker nodes. It will scale up to 10 nodes when the need arises. Check out the setup tutorial for more detailed guidance.

Details of all the properties that Ray generally supports can be found on the Ray docs site.

Ray on Golem strives to support all the configuration options available to general Ray cluster managers. If you find a property we don't support yet, please let us know (on the #Ray on Golem Discord channel).

Most important properties

Let's have a look at the most important properties, including the ones specific to Ray on Golem clusters.

Max workers

The "Max workers" setting defines the size of the cluster that the Ray autoscaler may scale up to, including the head node. There is a limited number of providers on the testnet, so when you need more workers you should consider moving to the mainnet.

# The maximum number of workers the cluster will have at any given time
max_workers: 10

Min workers

The minimum number of workers can be specified per node type, and influences how many such nodes are started with ray up. Additionally, this is the lowest number of nodes that Ray will automatically scale down to.

     min_workers: 3
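
As a rough sketch of where this setting sits, the fragment below nests min_workers under a node type inside Ray's available_node_types section. The node type name ray.worker.default is an assumption borrowed from Ray's example configurations; use the name found in your own golem-cluster.yaml.

# Sketch only: the node type name below is an assumption
available_node_types:
  ray.worker.default:
    # Started by `ray up`, and never scaled down below this number
    min_workers: 3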

Idle timeout

Idle timeout controls how fast Ray decommissions the nodes that are not busy. It will always leave at least min_workers workers, though.

# The number of minutes that need to pass before an idle worker node 
# is removed by the Autoscaler
idle_timeout_minutes: 5

Initialization commands

You can use initialization commands to properly set up your nodes - e.g. install all the pip dependencies.

# List of commands that will be run to initialize the nodes (before `setup_commands`)
#initialization_commands: [
#   "pip install endplay",
#]
initialization_commands: []
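
For illustration, the same property with one command enabled might look like the fragment below. It uses YAML's block list syntax, which is equivalent to the bracketed form. The endplay package is simply the example dependency from the commented-out lines above.

# Sketch: install a pip dependency on the nodes (before `setup_commands`)
initialization_commands:
  - "pip install endplay"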

Provider section

The whole "provider" section describes quite a lot of Golem node provider internals. Some of these properties interact with how Ray on Golem works in general, so be careful with those that you're unfamiliar with as changing them may render your cluster unusable.

Webserver port

Ray on Golem uses the ray-on-golem server to control Golem nodes, payments, etc. This property is used when the server is starting and when the autoscaler on the Ray head node wants to add nodes to, or remove nodes from, your cluster.
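
A minimal sketch of how this property could appear inside the provider section is shown below. Both the property name webserver_port and the port value are assumptions made for illustration, so check the example golem-cluster.yaml for the actual name and default.

# Sketch only: property name and port number are assumptions
webserver_port: 4578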


Network

Ray on Golem uses the GLM token on the Ethereum blockchain to pay for the hosting of Ray nodes. Currently, while we support only the Golem testnet, payments use the goerli test blockchain. When you move to the mainnet, the network property needs to be changed to polygon.

# Blockchain used for payments. 
# Goerli means running free nodes on testnet, 
# Polygon is for mainnet operations.
network: "goerli"
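
For a mainnet cluster, the same property would be set as in the sketch below. It assumes the value is spelled in lowercase, like the testnet one.

# Sketch: switch payments to Polygon for mainnet operations
network: "polygon"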

Image tag and image hash

Image tag and image hash properties refer to the virtual machine images that Golem provider nodes will start to host the Ray on Golem software.

By default, Ray on Golem uses prepackaged VM images including relatively fresh Python and Ray versions.

However, you can use these properties to override the default image selection and request a specific image. Supported tags are available on the Golem registry.
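
As a hedged sketch, requesting a specific image could look like the fragment below. The property names image_tag and image_hash are inferred from the section title, the values are placeholders rather than real tags or hashes, and the exact position of these properties inside the provider section should be taken from the example golem-cluster.yaml.

# Sketch only: property names are assumptions, values are placeholders
image_tag: "<tag-from-the-golem-registry>"
# Alternatively, assuming an image can be requested by its hash instead:
# image_hash: "<image-hash>"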

Please let us know (on the #Ray on Golem Discord channel) if you need an image with any specific content. We will be happy to help you.
