Tutorial: write a data plugin

In this tutorial, we describe step by step how to implement a plugin to manage data with bioimageit_core. By default, the bioimageit_core library provides one data plugin:

  • LOCAL: a data manager based on the local file system. Each experiment is a local directory, and the metadata of each data are stored in JSON files.

Data plugins must contain 3 things:

  1. plugin_info: a dictionary that contains the metadata of the plugin
  2. ServiceBuilder: a class that instantiates the plugin service
  3. DataService: a class that implements the plugin service interface

A plugin should be stored in an independent Git repository with a name starting with bioimageit_ so that the BioImageIT plugin engine can find it. As an example, we can refer to the repository of the bioimageit-omero plugin.

Plugin info

A BioImageIT data plugin must contain a dictionary called plugin_info with the plugin metadata. A BioImageIT plugin has 3 metadata entries:

  1. The plugin name
  2. The plugin type. The type should be data for a data plugin, runner for a runner plugin or tools for a tools manager plugin.
  3. The name of the plugin service builder

plugin_info = {
    'name': 'OMERO',
    'type': 'data',
    'builder': 'OmeroMetadataServiceBuilder'
}

Data service builder

The service builder is a class that instantiates and initializes a single instance of the data plugin. The code below shows the LocalMetadataServiceBuilder, the builder of the LOCAL plugin, as an example.

class LocalMetadataServiceBuilder:
    """Service builder for the metadata service"""

    def __init__(self):
        self._instance = None

    def __call__(self, **_ignored):
        if not self._instance:
            self._instance = LocalMetadataService()
        return self._instance

The constructor initializes the instance to None, and the __call__ method instantiates a new LocalMetadataService only if no instance exists yet. Thus, whenever the LocalMetadataServiceBuilder is called, it is always the same instance of the LocalMetadataService that is returned.
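
The sketch below is a quick, hypothetical usage example (not part of the plugin code) that illustrates this behaviour: two calls to the builder return the same service object.

# Hypothetical usage sketch: the builder always hands back the same instance
builder = LocalMetadataServiceBuilder()
service_a = builder()
service_b = builder()
assert service_a is service_b  # both names point to the same LocalMetadataService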

Data service

The data service is the class that implements the data management functionalities. Each method of the data service class is dedicated to a single data manipulation. The code below lists the methods to implement; the docstrings describe the inputs and outputs of each method.

def create_experiment(self, name, author, date='now', keys=None,
                      destination=''):
    """Create a new experiment

    Parameters
    ----------
    name: str
        Name of the experiment
    author: str
        username of the experiment author
    date: str
        Creation date of the experiment
    keys: list
        List of keys used for the experiment vocabulary
    destination: str
        Destination where the experiment is created. For the local use case,
        it is the path of the directory where the experiment will be created

    Returns
    -------
    Experiment container with the experiment metadata

    """
def get_workspace_experiments(self, workspace_uri):
    """Read the experiments in the user workspace

    Parameters
    ----------
    workspace_uri: str
        URI of the workspace

    Returns
    -------
    list of experiment containers

    """
def get_experiment(self, md_uri):
    """Read an experiment from the database

    Parameters
    ----------
    md_uri: str
        URI of the experiment. For the local use case, the URI is either the
        path of the experiment directory, or the path of the
        experiment.md.json file

    Returns
    -------
    Experiment container with the experiment metadata

    """
def update_experiment(self, experiment):
    """Write an experiment to the database

    Parameters
    ----------
    experiment: Experiment
        Container of the experiment metadata

    """
def import_data(self, experiment, data_path, name, author, format_,
                date='now', key_value_pairs=None):
    """import one data to the experiment

    The data is imported to the raw dataset

    Parameters
    ----------
    experiment: Experiment
        Container of the experiment metadata
    data_path: str
        Path of the accessible data on your local computer
    name: str
        Name of the data
    author: str
        Person who created the data
    format_: str
        Format of the data (ex: tif)
    date: str
        Date when the data were created
    key_value_pairs: dict
        Dictionary {key:value, key:value} to annotate files

    Returns
    -------
    class RawData containing the metadata

    """
def import_dir(self, experiment, dir_uri, filter_, author, format_, date,
               directory_tag_key='', observers=None):
    """Import data from a directory to the experiment

    This method imports, with or without copying, the data contained
    in a local folder into an experiment. Imported data are
    considered as RawData of the experiment

    Parameters
    ----------
    experiment: Experiment
        Container of the experiment metadata
    dir_uri: str
        URI of the directory containing the data to be imported
    filter_: str
        Regular expression to filter which files in the folder
        to import
    author: str
        Name of the person who created the data
    format_: str
        Format of the image (ex: tif)
    date: str
        Date when the data were created
    directory_tag_key: str
        If the string directory_tag_key is not empty, a new tag entry is added
        to each imported data with the key={directory_tag_key} and
        the value={the directory name}.
    observers: list
        List of observers to notify the progress

    """
def get_raw_data(self, md_uri):
    """Read a raw data from the database

    Parameters
    ----------
    md_uri: str
        URI of the raw data

    Returns
    -------
    RawData object containing the raw data metadata

    """
def update_raw_data(self, raw_data):
    """Read a raw data from the database

    Parameters
    ----------
    raw_data: RawData
        Container with the raw data metadata

    """
def get_processed_data(self, md_uri):
    """Read a processed data from the database

    Parameters
    ----------
    md_uri: str
        URI of the processed data

    Returns
    -------
    ProcessedData object containing the processed data metadata

    """
def update_processed_data(self, processed_data):
    """Read a processed data from the database

    Parameters
    ----------
    processed_data: ProcessedData
        Container with the processed data metadata

    """
def get_dataset(self, md_uri):
    """Read a dataset from the database using it URI

    Parameters
    ----------
    md_uri: str
        URI of the dataset

    Returns
    -------
    Dataset object containing the dataset metadata

    """
def update_dataset(self, dataset):
    """Read a processed data from the database

    Parameters
    ----------
    dataset: Dataset
        Container with the dataset metadata

    """
def create_dataset(self, experiment, dataset_name):
    """Create a processed dataset in an experiment

    Parameters
    ----------
    experiment: Experiment
        Object containing the experiment metadata
    dataset_name: str
        Name of the dataset

    Returns
    -------
    Dataset object containing the new dataset metadata

    """
def create_run(self, dataset, run_info):
    """Create a new run metadata

    Parameters
    ----------
    dataset: Dataset
        Object of the dataset metadata
    run_info: Run
        Object containing the metadata of the run. md_uri is ignored and
        created automatically by this method

    Returns
    -------
    Run object with the metadata and the new created md_uri

    """
def get_run(self, md_uri):
    """Read a run metadata from the data base

    Parameters
    ----------
    md_uri: str
        URI of the run entry in the database

    Returns
    -------
    Run object containing the run metadata

    """
def create_data(self, dataset, run, processed_data):
    """Create a new processed data for a given dataset

    Parameters
    ----------
    dataset: Dataset
        Object of the dataset metadata
    run: Run
        Metadata of the run
    processed_data: ProcessedData
        Object containing the new processed data. md_uri is ignored and
        created automatically by this method

    Returns
    -------
    ProcessedData object with the metadata and the new created md_uri

    """

Register the service

The last step is to register the metadata service with the bioimageit_core data services factory. Open the file bioimageit_core/plugins/data_factory.py, and add a line at the end to register the service:

metadataServices.register_builder('LOCAL', LocalMetadataServiceBuilder())

In the example above, the string 'LOCAL' is the name of the metadata service. Then, if we want to use this service, we need to specify it in the config file:

...
"metadata": {
    "service": "LOCAL",
...
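
For the OMERO plugin example used earlier, the registration would follow the same pattern; the import path below is hypothetical and depends on how the plugin package is organized:

# Hypothetical import path: adjust it to where the plugin's builder actually lives
from bioimageit_omero import OmeroMetadataServiceBuilder

metadataServices.register_builder('OMERO', OmeroMetadataServiceBuilder())

The config file would then use "OMERO" as the value of the metadata service entry.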

Summary

To summarize, in order to create a new metadata plugin we need to follow these steps:

  • create a Python file in bioimageit_core/plugins/

  • implement a DataServiceBuilder class.

  • implement a MetadataService class.

  • register the service at bioimageit_core/plugins/data_factory.py

  • set up the config.json file with the new plugin to be able to use it