Built-in Plugins

The docker_run Task

Girder Worker provides a built-in task that can be used to run docker containers. Girder Worker makes it easy to work on data held in Girder from within a docker container.

Container arguments

The docker_run task exposes a container_args parameter which can be used to pass arguments to the container entrypoint.

Bind-mounted volumes

The volumes to be bind mounted into a container can be passed to the docker_run task in one of two ways.

Using docker-py syntax

In this case the value of the volumes parameter is a dict conforming to the specification defined by docker-py, and it is passed directly to docker-py. For example:

from girder_worker.docker.tasks import docker_run

volumes = {
    '/home/docker/data': {
        'bind': '/mnt/docker/',
        'mode': 'rw'
    }
}
docker_run.delay('my/image', pull_image=True, volumes=volumes)

Using the BindMountVolume class

Girder Worker provides a utility class, girder_worker.docker.transforms.BindMountVolume, that can be used to define volumes that should be mounted into a container. Instances of this class can also be used with other parts of the Girder Worker docker infrastructure, for example to provide a location where a file should be downloaded to (see Downloading files from Girder). When using the girder_worker.docker.transforms.BindMountVolume class, you provide a list of instances as the value of the volumes parameter, and Girder Worker takes care of ensuring that these volumes are mounted. In the example below we create a girder_worker.docker.transforms.BindMountVolume instance and also pass it as a container argument, which provides the mounted location to the container. Girder Worker takes care of transforming the instance into the appropriate path inside the container.

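from girder_worker.docker.tasks import docker_run
from girder_worker.docker.transforms import BindMountVolume
vol = BindMountVolume('/home/docker/data', '/mnt/docker/')
docker_run.delay('my/image', pull_image=True, volumes=[vol], container_args=[vol])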
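Inside the container, the BindMountVolume passed in container_args arrives as a plain string: the container-side path (/mnt/docker/ in the example above). The following is a minimal sketch of a Python entrypoint that consumes that path. It assumes the path is the first argument, and the main function and its behaviour of listing the regular files in the mounted directory are purely illustrative.

```python
import os
import sys


def main(argv):
    # argv[1] is the container-side mount point, e.g. '/mnt/docker/'.
    mount_point = argv[1]
    # Collect the regular files the host placed in the bind-mounted directory.
    names = sorted(
        name for name in os.listdir(mount_point)
        if os.path.isfile(os.path.join(mount_point, name))
    )
    for name in names:
        print(name)
    return names


# Only run when invoked with the path argument, as the container would be.
if __name__ == '__main__' and len(sys.argv) > 1:
    main(sys.argv)
```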

Temporary Volume

A girder_worker.docker.transforms.TemporaryVolume class is provided to represent a temporary directory on the host machine that is mounted into the container. girder_worker.docker.transforms.TemporaryVolume.default holds a default instance that is used as the default location by many other parts of the Girder Worker docker infrastructure, for example when downloading a file (see Downloading files from Girder). However, it can also be used explicitly. Here, for example, it is passed as a container argument for use within the container. Again, Girder Worker takes care of transforming the girder_worker.docker.transforms.TemporaryVolume instance into the appropriate path inside the container, so the container entrypoint simply receives a path.

from girder_worker.docker.tasks import docker_run
from girder_worker.docker.transforms import TemporaryVolume

docker_run.delay('my/image', pull_image=True, container_args=[TemporaryVolume.default])

Note that because we are using the default path, we don’t have to add the instance to the volumes parameter as it is automatically added to the list of volumes to mount.
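Conceptually, a temporary volume combines two things: a fresh directory on the host, and a docker-py volume entry (in the dict format used with the volumes parameter) that binds that directory into the container. The sketch below illustrates the idea with the standard library only. It is not taken from Girder Worker's source. Both the function make_temporary_volume and the container path /mnt/girder_worker/temp are hypothetical.

```python
import tempfile

# Hypothetical container-side mount point; Girder Worker chooses its own.
CONTAINER_TEMP_PATH = '/mnt/girder_worker/temp'


def make_temporary_volume(container_path=CONTAINER_TEMP_PATH):
    """Create a host temp dir and a docker-py volume spec that mounts it."""
    host_path = tempfile.mkdtemp()
    # docker-py style mapping: host path -> {'bind': container path, 'mode': ...}
    volumes = {host_path: {'bind': container_path, 'mode': 'rw'}}
    # The container only ever sees container_path; that is the value a
    # transform would substitute into container_args in place of the volume.
    return host_path, volumes, container_path
```

The host path is what Girder Worker would read from or write to, while the container path is what the entrypoint receives. Keeping the two separate is why the same instance can serve as both a volume and a container argument.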

Downloading files from Girder

Accessing files held in Girder from within a container is straightforward using the girder_worker.docker.transforms.girder.GirderFileIdToVolume utility class. Simply provide the file id as an argument to the constructor and pass the instance as a container argument.

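from girder_worker.docker.tasks import docker_run
from girder_worker.docker.transforms.girder import GirderFileIdToVolume
docker_run.delay('my/image', pull_image=True,
    container_args=[GirderFileIdToVolume(file_id)])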

The girder_worker.docker.transforms.girder.GirderFileIdToVolume instance will take care of downloading the file from Girder and passing the path it was downloaded to into the docker container’s entrypoint as an argument.
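From the container's point of view, the downloaded file is just a path in its arguments. The following is a minimal sketch of an entrypoint that computes a SHA-256 checksum of that file, assuming the path is the first argument. The checksum is only a stand-in for real processing, and the main function is illustrative.

```python
import hashlib
import sys


def main(argv):
    # argv[1] is the path the file was downloaded to inside the container.
    digest = hashlib.sha256()
    with open(argv[1], 'rb') as fp:
        # Read in 1 MiB chunks so large files are never held in memory at once.
        for chunk in iter(lambda: fp.read(1 << 20), b''):
            digest.update(chunk)
    print(digest.hexdigest())
    return digest.hexdigest()


# Only run when invoked with the path argument, as the container would be.
if __name__ == '__main__' and len(sys.argv) > 1:
    main(sys.argv)
```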

If no volume parameter is specified, the file will be downloaded to the task's temporary volume. The file can also be downloaded to a specific girder_worker.docker.transforms.BindMountVolume by specifying a volume parameter, as follows:

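from girder_worker.docker.tasks import docker_run
from girder_worker.docker.transforms import BindMountVolume
from girder_worker.docker.transforms.girder import GirderFileIdToVolume
vol = BindMountVolume(host_path, container_path)
docker_run.delay('my/image', pull_image=True,
    container_args=[GirderFileIdToVolume(file_id, volume=vol)])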

If the file being downloaded is particularly large you may want to consider streaming it into the container using a named pipe. See Streaming Girder files into a container for more details.

Uploading files to Girder items

Utility classes are also provided to simplify uploading files generated by a docker container. The girder_worker.docker.transforms.girder.GirderUploadVolumePathToItem class provides the functionality to upload a file to an item. In the example below, we use the girder_worker.docker.transforms.VolumePath utility class to define a file path that we then pass to the docker container, which can write data to that path. As well as passing the girder_worker.docker.transforms.VolumePath instance as a container argument, we also pass it to a girder_worker.docker.transforms.girder.GirderUploadVolumePathToItem instance, which is added to girder_result_hooks. This tells Girder Worker to upload the file at that path to the given item id once the docker container has finished running.

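from girder_worker.docker.tasks import docker_run
from girder_worker.docker.transforms import VolumePath
from girder_worker.docker.transforms.girder import GirderUploadVolumePathToItem
volumepath = VolumePath('write_data_to_be_uploaded.txt')
docker_run.delay('my/image', pull_image=True, container_args=[volumepath],
    girder_result_hooks=[GirderUploadVolumePathToItem(volumepath, item_id)])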
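On the container side, the VolumePath arrives as a file path that the entrypoint is expected to write to. The following is a minimal sketch, assuming the path is the first argument. The main function and the two result lines it writes are hypothetical placeholders for real output.

```python
import sys


def main(argv):
    # argv[1] is the container-side path defined by the VolumePath.
    output_path = argv[1]
    with open(output_path, 'w') as fp:
        fp.write('result line 1\n')
        fp.write('result line 2\n')
    # After the container exits, the result hook uploads this file to the item.
    return output_path


# Only run when invoked with the path argument, as the container would be.
if __name__ == '__main__' and len(sys.argv) > 1:
    main(sys.argv)
```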

Using named pipes to stream data in and out of containers

Girder Worker uses named pipes as a language-agnostic way of streaming data in and out of docker containers. A named pipe is created at a path that is mounted into the container. The container can then open that pipe for reading or writing, and the Girder Worker infrastructure can open the other end of the pipe on the host. This allows data to be written to and read from the container.
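The underlying mechanism is plain POSIX. The sketch below uses only the Python standard library to create a FIFO with os.mkfifo and stream bytes through it from a writer thread to a reader, which stand in for the two sides of the host/container boundary. It illustrates the idea rather than Girder Worker's own code. The function demo_named_pipe is hypothetical, and the sketch runs on Linux and macOS but not Windows.

```python
import os
import tempfile
import threading


def demo_named_pipe(payload=b'streamed through a FIFO'):
    """Stream bytes through a named pipe between two parties (POSIX only)."""
    pipe_path = os.path.join(tempfile.mkdtemp(), 'demo_pipe')
    os.mkfifo(pipe_path)

    def writer():
        # Opening a FIFO for writing blocks until a reader opens the other end.
        with open(pipe_path, 'wb') as fp:
            fp.write(payload)

    thread = threading.Thread(target=writer)
    thread.start()
    # Opening for reading likewise blocks until the writer has opened its end.
    with open(pipe_path, 'rb') as fp:
        received = fp.read()  # Returns once the writer closes the pipe (EOF).
    thread.join()
    os.remove(pipe_path)
    return received
```

Nothing is stored on disk: the FIFO is only a rendezvous point, and the bytes pass through a kernel buffer. This is why the same approach works for files too large to download locally.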

There are two utility classes used to represent a named pipe: girder_worker.docker.transforms.NamedOutputPipe and girder_worker.docker.transforms.NamedInputPipe.

NamedOutputPipe

This represents a named pipe that can be opened in a docker container for writing, allowing data to be streamed out of a container.

NamedInputPipe

This represents a named pipe that can be opened in a docker container for reading, allowing data to be streamed into a container.

These pipes can be connected together using the girder_worker.docker.transforms.Connect utility class.

Streaming Girder files into a container

One common use of a named pipe is to stream a potentially large file into a container. This approach allows the task to start processing immediately rather than waiting for the entire file to download, and it also removes the requirement that the file be stored on the local filesystem. In the example below we create an instance of girder_worker.docker.transforms.girder.GirderFileIdToStream, which provides the ability to download a file in chunks. We also create a named pipe called read_in_container. Because no volume argument is provided, this pipe will be created on the temporary volume automatically mounted by Girder Worker. Finally, we use the girder_worker.docker.transforms.Connect class to “connect” the stream to the pipe and pass the resulting instance as a container argument. Girder Worker takes care of the select logic needed to stream the file into the pipe.

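from girder_worker.docker.tasks import docker_run
from girder_worker.docker.transforms import Connect, NamedInputPipe
from girder_worker.docker.transforms.girder import GirderFileIdToStream
stream = GirderFileIdToStream(file_id)
pipe = NamedInputPipe('read_in_container')
docker_run.delay('my/image', pull_image=True, container_args=[Connect(stream, pipe)])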

All the container has to do is open the path passed into the container entrypoint and start reading. Below is an example Python entrypoint:

import sys

# Simply open the path passed into the container.
with open(sys.argv[1], 'rb') as fp:
    data = fp.read()  # Reads the file's contents from the pipe until the stream ends.
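Because fp.read() without an argument only returns once the writer closes the pipe, it holds the whole file in memory before any processing starts. The variant below processes the input in fixed-size chunks as they arrive, assuming the pipe path is the first argument. The consume name and the 64 KiB chunk size are arbitrary choices for illustration.

```python
import sys


def consume(path, chunk_size=64 * 1024):
    """Read a streamed input incrementally and return the total bytes seen."""
    total = 0
    with open(path, 'rb') as fp:
        while True:
            chunk = fp.read(chunk_size)
            if not chunk:  # Empty bytes means the writer closed the pipe.
                break
            total += len(chunk)  # Process each chunk here as it arrives.
    return total


# Only run when invoked with the path argument, as the container would be.
if __name__ == '__main__' and len(sys.argv) > 1:
    print(consume(sys.argv[1]))
```

The same function works unchanged on a regular file, which makes it easy to test an entrypoint locally before switching to a named pipe.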

Streaming progress reporting from Docker tasks to Girder jobs

The girder_worker.docker.transforms.girder.ProgressPipe class can be used to facilitate streaming real-time progress reporting from a docker task to its associated Girder job. It uses a named pipe to provide a simple interface within the container that is usable from any runtime environment.

The following example code shows the Girder-side task invocation for using ProgressPipe:

from girder_worker.docker.tasks import docker_run
from girder_worker.docker.transforms.girder import ProgressPipe

docker_run.delay('my_docker_image:latest', container_args=[ProgressPipe()])

The corresponding example code running in the container entrypoint writes progress events to the pipe at regular intervals, and these are automatically reflected in the job progress on the Girder server. This code is shown in Python, but the idea is the same regardless of language.

import json
import sys
import time

with open(sys.argv[1], 'w') as pipe:
    for i in range(10):
        pipe.write(json.dumps({
            'message': 'Step %d of 10' % (i + 1),
            'total': 10,
            'current': i + 1
        }) + '\n')  # One JSON message per line.
        pipe.flush()
        time.sleep(1)

The messages written to the pipe must be one per line, and each message must be a JSON object containing optional message, current, and total values. Because writes to the pipe are buffered by the file object, you must call flush() on the file handle explicitly after each message so that it is delivered immediately.
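The one-message-per-line rule matters because the reading side needs a delimiter to tell where one JSON document ends and the next begins. The hypothetical reader below is not Girder Worker's implementation. It splits the stream on newlines and decodes each line independently, so two messages written without a separating newline end up on a single line that json.loads rejects.

```python
import io
import json


def read_progress(stream):
    """Yield one decoded progress message per non-blank line of a text stream."""
    for line in stream:
        line = line.strip()
        if line:  # Skip blank lines between messages.
            yield json.loads(line)


# Two well-formed messages, each terminated by a newline.
sample = io.StringIO('{"message": "Step 1 of 2", "current": 1, "total": 2}\n'
                     '{"current": 2, "total": 2}\n')
updates = list(read_progress(sample))
```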

Attaching intermediate / optional artifacts to Girder jobs

It’s often useful for debugging/tracing or algorithm analysis to be able to inspect intermediate outputs or other artifacts produced during execution of a task, even (perhaps especially) if the task fails. These artifacts differ from normal output transforms that upload files to Girder in two ways. Firstly, they are optional; if the specified file or directory does not exist, it does not cause any errors. This allows docker image authors to choose either at build time or runtime whether or not to create and upload artifacts. Secondly, the artifact files are attached to the job document itself, rather than placed within the Girder data hierarchy. This facilitates inspection of job artifacts inline with things like the log and status fields.

The following example shows Girder-side usage of the girder_worker.docker.transforms.girder.GirderUploadVolumePathJobArtifact transform to upload job artifacts from your docker task.

from girder_worker.docker.tasks import docker_run
from girder_worker.docker.transforms import VolumePath
from girder_worker.docker.transforms.girder import GirderUploadVolumePathJobArtifact

artifacts = VolumePath('job_artifacts')
docker_run.delay(
    'my_docker_image:latest', container_args=[
        artifacts
    ],
    girder_result_hooks=[
        GirderUploadVolumePathJobArtifact(artifacts)
    ])

Note that you can write to this path inside your container and make it either a directory or a single file. If it’s a directory, all files within the directory will be uploaded and attached to the job as artifacts. This operation is not recursive, i.e. it will not upload anything under subdirectories of the top level directory.

It’s often useful to upload any artifact files even if the docker_run task failed. For that behavior, simply pass an additional argument to the transform:

GirderUploadVolumePathJobArtifact(artifacts, upload_on_exception=True)
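upload_on_exception only helps if the container actually wrote something before the task failed. The following is a minimal container-side sketch, assuming the artifact path is passed as the first argument and treated as a directory. It writes a small summary file in a finally block, so the file exists whether the work succeeds or raises. The run function, the summary.json name, and its contents are hypothetical.

```python
import json
import os
import sys


def run(artifact_dir, process):
    """Run process() and record a summary artifact whether or not it fails."""
    os.makedirs(artifact_dir, exist_ok=True)
    status = 'failed'
    try:
        result = process()
        status = 'succeeded'
        return result
    finally:
        # Keep artifacts at the top level of the directory: the upload is
        # not recursive, so files in subdirectories would be skipped.
        with open(os.path.join(artifact_dir, 'summary.json'), 'w') as fp:
            json.dump({'status': status}, fp)


# Only run when invoked with the path argument, as the container would be.
if __name__ == '__main__' and len(sys.argv) > 1:
    run(sys.argv[1], lambda: None)  # Replace the lambda with the real work.
```

If process() raises, the exception still propagates after the summary is written, so the task is reported as failed while the artifact remains available for upload.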

macOS volume mounting issue workaround

Due to the way Docker Engine on macOS handles symlinks, it may be necessary to apply a workaround when running Girder Worker. If your TMPDIR environment variable points to a location under the /var directory and you see MountsDenied errors from Docker, try running Girder Worker with TMPDIR set to the corresponding location under /private/var instead. The two locations are equivalent, since /var is a symlink to /private/var.
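Rather than rewriting the path by hand, you can compute the fully resolved location with os.path.realpath, which follows the /var symlink on macOS. The helper below is a small sketch and its name, resolve_tmpdir, is hypothetical. The printed export line is meant to be run in the shell that starts the worker.

```python
import os
import tempfile


def resolve_tmpdir(path=None):
    """Return path (default: the current temp dir) with symlinks resolved."""
    return os.path.realpath(path or tempfile.gettempdir())


if __name__ == '__main__':
    # On macOS this typically prints a location under /private/var/folders/.
    print('export TMPDIR=%s' % resolve_tmpdir())
```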