Types and formats¶
In Girder Worker, analysis inputs and outputs may contain type and format annotations. These
annotations only have an effect if the types
plugin is enabled on the worker,
and the specific behaviors of validation and conversion of data formats are controlled
by flags to the run
task called validate
and auto_convert
respectively, which
default to being enabled if they are not explicitly passed.
A type in Girder Worker is a high-level description of a data structure useful for intuitive
workflows. It is not tied to a particular representation.
For example, the table type may be defined as a list of rows with ordered,
named column fields. This description does not specify any central representation
since the information may be stored in a variety of ways.
A type is specified by a string unique to your Girder Worker environment, such
as "table"
for the table type.
An explicit representation of data is called a format in Girder Worker. A format
is a low-level description of data layout. For example, the table type may have
formats for CSV, database table, R data frame, or JSON. The format may be text,
serialized binary, or even in-memory data layouts. Just like types, a format is
specified by a string unique to your Girder Worker environment, such as "csv"
for the CSV format. Formats under the same type should be convertible
between each other.
Notice that the above uses the phrases such as “may be defined” and “may have formats”.
This is because at its core Girder Worker does not contain types or formats.
The girder_worker.run()
function will attempt to match given input bindings
to analysis inputs, validating data and performing conversions as needed.
To make Girder Worker aware of certain types and formats, you must define validation and
conversion routines. These routines are themselves Girder Worker algorithms of a
particular form, loaded with
girder_worker.plugins.types.format.import_converters()
. See that function’s documentation
for how to define validators and converters.
The following are the types available in Girder Worker core. Application plugins may add their
own types and formats using the girder_worker.plugins.types.format.import_converters
function. See
the Application Plugins section for details on plugin-specific types and formats.
"boolean"
type¶
A true or false value. Formats:
"boolean" : | An in-memory Python bool . |
---|---|
"json" : | A JSON string representing a single boolean ("true" or "false" ). |
"integer"
type¶
An integer. Formats:
"integer" : | An in-memory Python int . |
---|---|
"json" : | A JSON string representing a single integer. |
"number"
type¶
A numeric value (integer or real). Formats:
"number" : | An in-memory Python int or float . |
---|---|
"json" : | A JSON string representing a single number. |
"string"
type¶
A sequence of characters.
"text" : | A raw string of characters (str in Python). |
---|---|
"json" : | A JSON string representing a single string. This is a quoted string with certain characters escaped. |
"integer_list"
type¶
A list of integers. Formats:
"integer_list" : | An in-memory list of Python int . |
---|---|
"json" : | A JSON string representing a list of integers. |
"number_list"
type¶
A list of numbers (integer or real). Formats:
"number_list" : | An in-memory list of Python int or float . |
---|---|
"json" : | A JSON string representing a list of numbers. |
"string_list"
type¶
A list of strings. Formats:
"string_list" : | An in-memory list of Python str . |
---|---|
"json" : | A JSON string representing a list of strings. |
"table"
type¶
A list of rows with ordered, named column attributes. Formats:
"rows" : | A Python dictionary containing keys {
"fields": ["one", "two"],
"rows": [{"one": 1, "two": 2}, {"one": 3, "two": 4}]
}
|
---|---|
"rows.json" : | The equivalent JSON representation of the |
"objectlist" : | A Python list of dictionaries of the form [{"one": 1, "two": 2}, {"one": 3, "two": 4}]
This is identical to the |
"objectlist.json" : | |
The equivalent JSON representation of the
|
|
"objectlist.bson" : | |
The equivalent BSON representation of the
|
|
"csv" : | A string containing the contents of a comma-separated CSV file. The first line of the file is assumed to contain column headers. |
"tsv" : | A string containing the contents of a tab-separated TSV file.
Column headers are detected the same as for the |
"tree"
type¶
A hierarchy of nodes with node and/or link attributes. Formats:
"nested" : | A nested Python dictionary representing the tree.
All nodes may contain a {
"edge_fields": ["weight"],
"node_fields": ["node name", "node weight"],
"node_data": {"node name": "", "node weight": 0.0},
"children": [
{
"node_data": {"node name": "", "node weight": 2.0},
"edge_data": {"weight": 2.0},
"children": [
{
"node_data": {"node name": "ahli", "node weight": 2.0},
"edge_data": {"weight": 0.0}
},
{
"node_data": {"node name": "allogus", "node weight": 3.0},
"edge_data": {"weight": 1.0}
}
]
},
{
"node_data": {"node name": "rubribarbus", "node weight": 3.0},
"edge_data": {"weight": 3.0}
}
]
}
|
---|---|
"nested.json" : | The equivalent JSON representation of the |
"newick" : | A tree in Newick format. |
"nexus" : | A tree in Nexus format. |
"phyloxml" : | A phylogenetic tree in PhyloXML format. |
"graph"
type¶
A collection of nodes and edges with optional attributes. Formats:
"networkx" : | An in-memory representation of a graph using an object of type nx.Graph (or any of its subclasses). |
---|---|
"networkx.json" : | |
A JSON representation of a NetworkX graph. | |
"clique.json" : | A JSON representation of a Clique graph. |
"graphml" : | An XML String representing a valid GraphML representation. |
"adjacencylist" : | |
A string representing a very simple adjacency list which does not preserve node or edge attributes. |
"image"
type¶
A 2D matrix of uniformly-typed numbers. Formats:
"png" : | An image in PNG format. |
---|---|
"png.base64" : | A Base-64 encoded PNG image. |
"pil" : | An image as a PIL.Image from the Python Imaging Library. |