docker_run

Girder Worker provides a built-in task that can be used to run docker containers. It makes it easy to work on data held in Girder from within a docker container.

Container arguments

The docker_run task exposes a container_args parameter which can be used to pass arguments to the container entrypoint.
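
On the container side, the values supplied through container_args arrive as ordinary command-line arguments. The following is a minimal sketch of an entrypoint that parses them with argparse. The --threshold option and the input_path argument are hypothetical names chosen for illustration; Girder Worker does not define them.

```python
import argparse


def parse_args(argv):
    """Parse the arguments forwarded to the container entrypoint."""
    parser = argparse.ArgumentParser(description='Example container entrypoint')
    # Hypothetical option and positional argument, for illustration only.
    parser.add_argument('--threshold', type=float, default=0.5)
    parser.add_argument('input_path')
    return parser.parse_args(argv)


# In a real entrypoint this list would be sys.argv[1:].
args = parse_args(['--threshold', '0.8', '/mnt/docker/input.txt'])
print(args.threshold, args.input_path)
```

Under this assumption, a task submitted with container_args=['--threshold', '0.8', '/mnt/docker/input.txt'] would hand the same three strings to the entrypoint.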

BindMountVolumes

The volumes to be bind mounted into a container can be passed to the docker_run task in one of two ways.

Using docker-py syntax

In this case the value of the volumes parameter is a dict conforming to the specification defined by docker-py, and it is passed directly to docker-py. For example:

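from girder_worker.docker.tasks import docker_run

# Map the host path to its mount point inside the container (docker-py format).
volumes = {
    '/home/docker/data': {
        'bind': '/mnt/docker/',
        'mode': 'rw'
    }
}
docker_run.delay('my/image', pull_image=True, volumes=volumes)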
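
When several directories need to be mounted, writing the nested dict by hand becomes repetitive. The sketch below builds a dict of the same shape from a simple host-to-container mapping. The bind_volumes helper is hypothetical and not part of Girder Worker or docker-py, and it restricts mode to 'rw' or 'ro' as a simplification.

```python
def bind_volumes(mounts, mode='rw'):
    """Build a docker-py style volumes dict from {host_path: container_path}."""
    if mode not in ('rw', 'ro'):
        raise ValueError("mode must be 'rw' or 'ro'")
    return {
        host_path: {'bind': container_path, 'mode': mode}
        for host_path, container_path in mounts.items()
    }


volumes = bind_volumes({'/home/docker/data': '/mnt/docker/'})
print(volumes)
```

The resulting dict has the same structure as the hand-written example and could be passed as the volumes parameter in the same way.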

Using the BindMountVolume class

Girder Worker provides a utility class, girder_worker.docker.transforms.BindMountVolume, that can be used to define volumes that should be mounted into a container. Instances of this class can also be used in conjunction with other parts of the girder_worker docker infrastructure, for example to provide a location where a file should be downloaded to. See Downloading files from Girder. When using the girder_worker.docker.transforms.BindMountVolume class, a list of instances is provided as the value for the volumes parameter, and Girder Worker will take care of ensuring that these volumes are mounted. In the example below we create a girder_worker.docker.transforms.BindMountVolume instance and also pass it as a container argument to provide the mounted location to the container. Girder Worker will take care of transforming the instance into the appropriate path inside the container.

vol = BindMountVolume('/home/docker/data', '/mnt/docker/')
docker_run.delay('my/image', pull_image=True, volumes=[vol], container_args=[vol])

Temporary Volume

A girder_worker.docker.transforms.TemporaryVolume class is provided, representing a temporary directory on the host machine that is mounted into the container. The attribute girder_worker.docker.transforms.TemporaryVolume.default holds a default instance that is used as the default location for many other parts of the Girder Worker docker infrastructure, for example when downloading a file. See Downloading files from Girder. However, it can also be used explicitly. In the example below it is passed as a container argument for use within a container. Again, Girder Worker will take care of transforming the girder_worker.docker.transforms.TemporaryVolume instance into the appropriate path inside the container, so the container entrypoint will simply receive a path.

docker_run.delay('my/image', pull_image=True, container_args=[TemporaryVolume.default])

Note that because we are using the default path, we don’t have to add the instance to the volumes parameter as it is automatically added to the list of volumes to mount.

Downloading files from Girder

Accessing files held in Girder from within a container is straightforward using the girder_worker.docker.transforms.girder.GirderFileIdToVolume utility class. One simply provides the file id as an argument to the constructor and passes the instance as a container argument.

docker_run.delay('my/image', pull_image=True,
    container_args=[GirderFileIdToVolume(file_id)])

The girder_worker.docker.transforms.girder.GirderFileIdToVolume instance will take care of downloading the file from Girder and passing the path it was downloaded to into the docker container’s entrypoint as an argument.

If no volume parameter is specified then the file will be downloaded to the task's temporary volume. The file can also be downloaded to a specific girder_worker.docker.transforms.BindMountVolume by specifying a volume parameter, as follows:

vol = BindMountVolume(host_path, container_path)
docker_run.delay('my/image', pull_image=True,
    container_args=[GirderFileIdToVolume(file_id, volume=vol)])

If the file being downloaded is particularly large, you may want to consider streaming it into the container using a named pipe. See Streaming Girder files into a container for more details.

Uploading files to Girder items

Utility classes are also provided to simplify uploading files generated by a docker container. The girder_worker.docker.transforms.girder.GirderUploadVolumePathToItem class provides the functionality to upload a file to an item. In the example below, we use the girder_worker.docker.transforms.VolumePath utility class to define a file path that we then pass to the docker container. The docker container can write data to this file path. As well as passing the girder_worker.docker.transforms.VolumePath instance as a container argument, we also pass it to girder_worker.docker.transforms.girder.GirderUploadVolumePathToItem. The resulting girder_worker.docker.transforms.girder.GirderUploadVolumePathToItem instance is added to girder_result_hooks. This tells Girder Worker to upload the file at that path to the item with the provided id once the docker container has finished running.

volumepath = VolumePath('write_data_to_be_uploaded.txt')
docker_run.delay('my/image', pull_image=True, container_args=[volumepath],
    girder_result_hooks=[GirderUploadVolumePathToItem(volumepath, item_id)])
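
Inside the container, the entrypoint only needs to write its output to the path it receives as an argument. The sketch below shows one way the container side might look. The write_result function is a hypothetical name, and a local temporary directory stands in for the mounted volume so that the example can run outside docker.

```python
import os
import tempfile


def write_result(output_path, lines):
    """Write one line of text per entry to the path passed to the container."""
    with open(output_path, 'w') as fp:
        for line in lines:
            fp.write(line + '\n')
    return output_path


# Local stand-in for the mounted path; in the container this would be sys.argv[1].
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'write_data_to_be_uploaded.txt')
    write_result(path, ['first result', 'second result'])
    with open(path) as fp:
        written = fp.read()

print(written)
```

Once the container exits, the file written at that path is what the result hook would upload to the item.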

Using named pipes to stream data in and out of containers

Girder Worker uses named pipes as a language-agnostic way of streaming data in and out of docker containers. A named pipe is created at a path that is mounted into the container. This allows the container to open that pipe for read or write. The Girder Worker infrastructure can likewise open the pipe on the host, allowing data to be written to and read from the container.

There are two utility classes used to represent a named pipe, girder_worker.docker.transforms.NamedOutputPipe and girder_worker.docker.transforms.NamedInputPipe.

NamedOutputPipe

This represents a named pipe that can be opened in a docker container for write, allowing data to be streamed out of a container.

NamedInputPipe

This represents a named pipe that can be opened in a docker container for read, allowing data to be streamed into a container.

These pipes can be connected together using the girder_worker.docker.transforms.Connect utility class.
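
To make the underlying mechanism concrete, the following standard-library sketch creates a named pipe, writes to it from one thread, and reads from it in another. It illustrates how named pipes behave on POSIX systems in general; it is not Girder Worker's implementation, and the stream_through_fifo function is a hypothetical name.

```python
import os
import tempfile
import threading


def stream_through_fifo(chunks):
    """Send byte chunks through a named pipe and return what the reader saw."""
    with tempfile.TemporaryDirectory() as tmp:
        fifo_path = os.path.join(tmp, 'example_pipe')
        os.mkfifo(fifo_path)  # POSIX only

        def writer():
            # Opening for write blocks until a reader opens the other end.
            with open(fifo_path, 'wb') as fp:
                for chunk in chunks:
                    fp.write(chunk)

        thread = threading.Thread(target=writer)
        thread.start()
        # The reader plays the role of the process on the other side of the mount.
        with open(fifo_path, 'rb') as fp:
            received = fp.read()  # Returns once the writer closes the pipe
        thread.join()
        return received


print(stream_through_fifo([b'hello ', b'world']))
```

Because the pipe is just a path on a mounted volume, the writer and reader can live on opposite sides of the container boundary and be written in any language.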

Streaming Girder files into a container

One common example of using a named pipe is to stream a potentially large file into a container. This approach allows the task to start processing immediately rather than having to wait for the entire file to download. It also removes the requirement that the file is held on the local filesystem. In the example below we create an instance of girder_worker.docker.transforms.girder.GirderFileIdToStream, which provides the ability to download a file in chunks. We also create a named pipe called read_in_container. As no volume argument is provided, this pipe will be created on the temporary volume automatically mounted by Girder Worker. Finally, we use the girder_worker.docker.transforms.Connect class to “connect” the stream to the pipe, and we pass the resulting instance as a container argument. Girder Worker will take care of the select logic to stream the file into the pipe.

stream = GirderFileIdToStream(file_id)
pipe = NamedInputPipe('read_in_container')
docker_run.delay('my/image', pull_image=True, container_args=[Connect(stream, pipe)])

All the container has to do is open the path passed into the container entrypoint and start reading. Below is an example Python entrypoint:

import sys

# Simply open the path passed into the container.
with open(sys.argv[1]) as fp:
    fp.read()  # This reads the file's contents
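
For a large file, calling read() with no size loads everything into memory, which defeats much of the benefit of streaming. A minimal sketch of chunked processing is shown below, using a SHA-256 digest as a stand-in for real work. The sha256_of_stream function and the 64 KiB chunk size are illustrative assumptions, and a temporary file stands in for the pipe path so the example can run on its own.

```python
import hashlib
import os
import tempfile


def sha256_of_stream(path, chunk_size=64 * 1024):
    """Hash the data at path without holding more than one chunk in memory."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fp:
        # iter() with a sentinel calls fp.read(chunk_size) until it returns b''.
        for chunk in iter(lambda: fp.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


# Local stand-in for the pipe path; in the container this would be sys.argv[1].
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'x' * 200000)
try:
    checksum = sha256_of_stream(tmp.name)
finally:
    os.remove(tmp.name)

print(checksum)
```

The same loop works whether the path refers to a regular file or a named pipe, since both are read until end of file.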

macOS volume mounting issue workaround

Due to some odd symlinking behavior in Docker engine on macOS, it may be necessary to add a workaround when running girder_worker. If your TMPDIR environment variable points to a location underneath the /var directory and you see MountsDenied errors from Docker, try running Girder Worker with TMPDIR set underneath /private/var instead of /var. The two locations should be equivalent, since /var is a symlink to /private/var.
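
The path rewrite described above can be sketched as a small helper. The private_var_tmpdir function is a hypothetical name and not part of Girder Worker; it only computes the adjusted value, which you would then export as TMPDIR before starting the worker.

```python
import os


def private_var_tmpdir(tmpdir):
    """Return tmpdir rewritten under /private/var if it lives under /var."""
    if tmpdir == '/var' or tmpdir.startswith('/var/'):
        return '/private' + tmpdir
    return tmpdir


# Show the value that TMPDIR would be changed to, if it is set at all.
current = os.environ.get('TMPDIR', '')
print(private_var_tmpdir(current))
```

Paths that are already under /private/var, or that are not under /var at all, are returned unchanged.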