Tutorial: write a data plugin

In this tutorial, we describe step by step how to implement a plugin to manage data with bioimageit_core. By default, the bioimageit_core library provides one data plugin:

  • LOCAL: a data manager based on the local file system. Each experiment is a local directory, and the metadata of each data are stored in JSON files.

Data plugins must contain 3 things:

  1. plugin_info: a dictionary that contains the metadata of the plugin
  2. ServiceBuilder: a class that instantiates the plugin service
  3. DataService: a class that implements the plugin service interface

A plugin should be stored in an independent Git repository with a name starting with bioimageit_ so that the BioImageIT plugin engine can find it. As an example, we can refer to the repository of the bioimageit-omero plugin.

Plugin info

A BioImageIT data plugin must contain a dictionary called plugin_info with the plugin metadata. A BioImageIT plugin has 3 metadata entries:

  1. The plugin name
  2. The plugin type. The type should be data for a data plugin, runner for a runner plugin or tools for a tools manager plugin.
  3. The name of the plugin service builder

plugin_info = {
    'name': 'OMERO',
    'type': 'data',
    'builder': 'OmeroMetadataServiceBuilder'
}

Data service builder

The service builder is a class that instantiates and initializes a single instance of the data plugin. The code below shows the LocalMetadataServiceBuilder, the builder of the LOCAL plugin, as an example.

class LocalMetadataServiceBuilder:
    """Service builder for the metadata service"""

    def __init__(self):
        self._instance = None

    def __call__(self, **_ignored):
        if not self._instance:
            self._instance = LocalMetadataService()
        return self._instance

The constructor initializes the instance to None, and the __call__ method instantiates a new LocalMetadataService only if no instance exists yet. Thus, whenever the LocalMetadataServiceBuilder is called, it is always the same instance of the LocalMetadataService that is returned.
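
The sketch below is a quick, hypothetical usage example (not part of the plugin code) that illustrates this behaviour: two calls to the builder return the same service object.

# Hypothetical usage sketch: the builder always hands back the same instance
builder = LocalMetadataServiceBuilder()
service_a = builder()
service_b = builder()
assert service_a is service_b  # both names point to the same LocalMetadataService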

Data service

The data service is the class that implements the data management functionalities. Each method of the data service class is dedicated to a single data manipulation. The code below lists the methods to implement; the docstrings describe the inputs and outputs of each method.

def create_experiment(self, name, author, date='now', keys=None,
                      destination=''):
    """Create a new experiment

    Parameters
    ----------
    name: str
        Name of the experiment
    author: str
        username of the experiment author
    date: str
        Creation date of the experiment
    keys: list
        List of keys used for the experiment vocabulary
    destination: str
        Destination where the experiment is created. For the local use case,
        it is the path of the directory where the experiment will be created

    Returns
    -------
    Experiment container with the experiment metadata

    """
def get_workspace_experiments(self, workspace_uri):
    """Read the experiments in the user workspace

    Parameters
    ----------
    workspace_uri: str
        URI of the workspace

    Returns
    -------
    list of experiment containers

    """
def get_experiment(self, md_uri):
    """Read an experiment from the database

    Parameters
    ----------
    md_uri: str
        URI of the experiment. For the local use case, the URI is either the
        path of the experiment directory, or the path of the
        experiment.md.json file

    Returns
    -------
    Experiment container with the experiment metadata

    """
def update_experiment(self, experiment):
    """Write an experiment to the database

    Parameters
    ----------
    experiment: Experiment
        Container of the experiment metadata

    """
def import_data(self, experiment, data_path, name, author, format_,
                date='now', key_value_pairs=None):
    """import one data to the experiment

    The data is imported to the raw dataset

    Parameters
    ----------
    experiment: Experiment
        Container of the experiment metadata
    data_path: str
        Path of the accessible data on your local computer
    name: str
        Name of the data
    author: str
        Person who created the data
    format_: str
        Format of the data (ex: tif)
    date: str
        Date when the data were created
    key_value_pairs: dict
        Dictionary {key:value, key:value} to annotate files

    Returns
    -------
    class RawData containing the metadata

    """
def import_dir(self, experiment, dir_uri, filter_, author, format_, date,
               directory_tag_key='', observers=None):
    """Import data from a directory to the experiment

    This method imports, with or without copying, the data contained
    in a local folder into an experiment. Imported data are
    considered as RawData of the experiment

    Parameters
    ----------
    experiment: Experiment
        Container of the experiment metadata
    dir_uri: str
        URI of the directory containing the data to be imported
    filter_: str
        Regular expression to filter which files in the folder
        to import
    author: str
        Name of the person who created the data
    format_: str
        Format of the image (ex: tif)
    date: str
        Date when the data were created
    directory_tag_key: str
        If the string directory_tag_key is not empty, a new tag entry is added
        to each imported data with the key={directory_tag_key} and
        the value={the directory name}.
    observers: list
        List of observers to notify the progress

    """
def get_raw_data(self, md_uri):
    """Read a raw data from the database

    Parameters
    ----------
    md_uri: str
        URI of the raw data

    Returns
    -------
    RawData object containing the raw data metadata

    """
def update_raw_data(self, raw_data):
    """Read a raw data from the database

    Parameters
    ----------
    raw_data: RawData
        Container with the raw data metadata

    """
def get_processed_data(self, md_uri):
    """Read a processed data from the database

    Parameters
    ----------
    md_uri: str
        URI of the processed data

    Returns
    -------
    ProcessedData object containing the processed data metadata

    """
def update_processed_data(self, processed_data):
    """Read a processed data from the database

    Parameters
    ----------
    processed_data: ProcessedData
        Container with the processed data metadata

    """
def get_dataset(self, md_uri):
    """Read a dataset from the database using it URI

    Parameters
    ----------
    md_uri: str
        URI of the dataset

    Returns
    -------
    Dataset object containing the dataset metadata

    """
def update_dataset(self, dataset):
    """Read a processed data from the database

    Parameters
    ----------
    dataset: Dataset
        Container with the dataset metadata

    """
def create_dataset(self, experiment, dataset_name):
    """Create a processed dataset in an experiment

    Parameters
    ----------
    experiment: Experiment
        Object containing the experiment metadata
    dataset_name: str
        Name of the dataset

    Returns
    -------
    Dataset object containing the new dataset metadata

    """
def create_run(self, dataset, run_info):
    """Create a new run metadata

    Parameters
    ----------
    dataset: Dataset
        Object of the dataset metadata
    run_info: Run
        Object containing the metadata of the run. md_uri is ignored and
        created automatically by this method

    Returns
    -------
    Run object with the metadata and the new created md_uri

    """
def get_run(self, md_uri):
    """Read a run metadata from the data base

    Parameters
    ----------
    md_uri: str
        URI of the run entry in the database

    Returns
    -------
    Run object containing the run metadata

    """
def create_data(self, dataset, run, processed_data):
    """Create a new processed data for a given dataset

    Parameters
    ----------
    dataset: Dataset
        Object of the dataset metadata
    run: Run
        Metadata of the run
    processed_data: ProcessedData
        Object containing the new processed data. md_uri is ignored and
        created automatically by this method

    Returns
    -------
    ProcessedData object with the metadata and the new created md_uri

    """

Register the service

The last step is to register the metadata service with the bioimageit_core data services factory. Open the file bioimageit_core/plugins/data_factory.py, and add a line at the end to register the service:

metadataServices.register_builder('LOCAL', LocalMetadataServiceBuilder())

In the example above, the string 'LOCAL' is the name of the metadata service. Then, if we want to use this service, we need to specify it in the config file:

...
"metadata": {
    "service": "LOCAL",
...
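
For the OMERO plugin example used earlier, the registration would follow the same pattern; the import path below is hypothetical and depends on how the plugin package is organized:

# Hypothetical import path: adjust it to where the plugin's builder actually lives
from bioimageit_omero import OmeroMetadataServiceBuilder

metadataServices.register_builder('OMERO', OmeroMetadataServiceBuilder())

The config file would then use "OMERO" as the value of the metadata service entry.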

Summary

To summarize, in order to create a new metadata plugin we need to follow these steps:

  • create a Python file in bioimageit_core/plugins/

  • implement a DataServiceBuilder class.

  • implement a MetadataService class.

  • register the service at bioimageit_core/plugins/data_factory.py

  • set up the config.json file with the new plugin to be able to use it