Guide
bioimageit_core is a Python 3 library that implements the main API of the BioImageIT middleware. The scheme below shows the position of bioimageit_core in the BioImageIT ecosystem: bioimageit_core is the API that connects low-level image processing and data management to high-level end-user applications.

For data management, bioimageit_core implements a set of functions to manage and annotate data at the Experiment (i.e. project) level. For image processing tools, bioimageit_core implements a set of functions to query a tool database and to run tools on data.
For data management and tool management, bioimageit_core defines an API that can be implemented with plugins. By default, data is managed locally using JSON files. If we want to use the bioimageit_core API to manage data in a SQL database, for example, we can implement a data management plugin that links the bioimageit_core API with the SQL database.
For tool execution, bioimageit_core by default runs tools using Conda packages on the local machine. If we want to run a processing tool with a job scheduler, for example, we can write a tool runner plugin that links the bioimageit_core API to the job scheduler.
The advantage of this bioimageit_core architecture is that high-level Python code to manage and annotate data can be written once and deployed on different hardware or network architectures without changing the high-level code: only plugins have to be added.
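For instance, switching the same script from a local workstation to a cluster could in principle only require passing a different configuration file. The Request class used here is introduced in the next section, and the configuration file names are hypothetical placeholders:
import bioimageit_core.api as iit

# default deployment: local JSON metadata and Conda tool execution
req = iit.Request('./config_local.json')
req.connect()

# hypothetical deployment where the configuration file selects a SQL data
# management plugin and a job scheduler runner; the calling code is unchanged
req = iit.Request('./config_cluster.json')
req.connect()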
Data Management
In the BioImageIT project, we propose to manage data using a three-layer representation:
Experiment: an experiment is a project that contains one dataset of raw data named “data” and a list of processed datasets. Each processed dataset contains the outputs of a processing tool.
DataSet: a dataset contains a list of data that can be raw or processed
Data: a data item contains a single piece of data and the associated metadata. For a RawData, the metadata are a set of key:value pairs used to identify the data and a generic dictionary for any specific metadata (like image resolution…). For a ProcessedData, the metadata are a link to the origin data and the run information.
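To make this hierarchy concrete, the sketch below walks down the three layers using the Request functions detailed in the rest of this guide (the empty query is assumed to return all the data of the dataset):
# Experiment -> DataSet: access the dataset of raw data, named "data"
raw_dataset = req.get_dataset(experiment, name='data')

# DataSet -> Data: retrieve the data contained in the dataset
raw_data = req.get_data(raw_dataset, query='')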
In this section we show the main functions implemented in the bioimageit_core library to handle Data, DataSet and Experiment. Please refer to the docstring documentation for more advanced features.
The BioImageIT API is accessible through a single class named Request.
import bioimageit_core.api as iit
req = iit.Request('./config_sample.json')
req.connect()
To create an experiment, bioimageit_core has a dedicated function with the following syntax:
experiment = req.create_experiment(name='myexperiment',
                                   author='Sylvain Prigent',
                                   date='now',
                                   destination="./")
This creates an empty project with the basic metadata of the experiment. Then we can import a single data file:
req.import_data(experiment,
                data_path='data_uri',
                name='mydata',
                author='Sylvain Prigent',
                format_='imagetiff',
                date='now',
                key_value_pairs={"key": "value"})
or multiple data files from a directory:
req.import_dir(experiment=experiment,
               dir_uri='./tests/test_images/data',
               filter_=r'\.tif$',
               author='Sylvain Prigent',
               format_='imagetiff',
               date='now')
The next step is then to annotate the data. This can be done in batch with functions like:
req.annotate_from_name(experiment, 'Population', ['population1', 'population2'])
or:
req.annotate_using_separator(experiment, 'ID', '_', 1)
that will create key-value pairs for each data item by extracting information from the data file names. The first call searches for the words population1 and population2 in each data file name and, if one of them is found, associates the matching word with the key Population. The second call shows how to extract information from the data file name using a separator: here we extract the sub-string of the file name located after the first _ and associate the extracted value with the key ID. We can also manually annotate a single data item by retrieving it and adding a key-value pair:
data = req.query(experiment, dataset_name='data', query='name=population1_001.tif')
data[0].set_key_value_pair("population", "Population1")
data[0].set_key_value_pair("ID", "001")
req.update_raw_data(data[0])
The bioimageit_core library also allows accessing a DataSet directly:
raw_dataset = req.get_dataset(experiment, name='data')
and interacting with the data in the DataSet:
data = req.get_data(raw_dataset, query='Population=population1')
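The returned data can then be inspected or annotated one by one. As a sketch, assuming get_data returns a list of data containers (as req.query does) and that each container exposes a name attribute matching the name= queries used above:
for raw_data in data:
    print(raw_data.name)
    # add an extra annotation and save the updated metadata
    raw_data.set_key_value_pair('checked', 'yes')
    req.update_raw_data(raw_data)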
Process Running
In the BioImageIT project, processing tools are external packages (like Conda packages or Docker containers) represented with XML wrappers, similarly to the Galaxy Project.
The bioimageit_core library implements functionalities to manipulate and run packaged tools.
A Tool is a Python class in bioimageit_core that identifies a processing tool. It loads the tool XML file and allows printing and accessing the tool information.
tool = req.get_tool('spitfiredeconv2d_v0.1.2')
tool.man()
We can see that we do not need to instantiate a Tool directly since bioimageit_core manages a tool database: a Tool is accessed simply using the tool name and version.
A tool can be run on data files directly with the exec command:
req.exec(tool,
         i='tests/test_images/data/population1_001.tif',
         o='population1_001_deconv.tif',
         sigma=4,
         regularization=12,
         weighting=0.1,
         method='SV',
         padding=True)
Nevertheless, this is not the recommended method since exec does not generate any metadata. We prefer using the run method, which runs a Job on an Experiment and keeps track of the whole job history (input data, output data, parameters).
from bioimageit_core.containers.runners_containers import Job
job = Job()
job.set_experiment(experiment)
job.set_tool(req.get_tool('spitfiredeconv2d_v0.1.2'))
job.set_input(name='i', dataset='data', query='')
job.set_param('sigma', '4')
job.set_param('regularization', '12')
job.set_param('weighting', '0.1')
job.set_param('method', 'SV')
job.set_param('padding', 'True')
job.set_output_dataset_name('deconv')
experiment = req.run(job)
Then, all the output data and the run metadata are stored in a new dataset of the Experiment. In the example above, the new dataset is called deconv.
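The processed data can then be retrieved like any other dataset, for example (a sketch reusing the data access functions shown above; the empty query is assumed to return all the data of the dataset):
deconv_dataset = req.get_dataset(experiment, name='deconv')
deconv_data = req.get_data(deconv_dataset, query='')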
Further reading
In this short introduction guide we have shown the basic information needed to use the bioimageit_core library.
For more advanced use, we recommend reading the following tutorials.