Domain-Specific Language (DSL) Manual for VizSciFlow and its derivatives

VizSciFlow is a workflow management system with a domain-specific language (DSL).

The DSL has simple syntax and a minimal set of keywords. The syntax has similarities to Python’s syntax and indenting.

Scientists can write a workflow script using the visual elements offered in the VizSciFlow web interface.

Keywords:

The keywords of VizSciFlow are listed below:
Each construct is listed with its keywords, followed by an example.

Conditional: if

for f in GetFiles('/public/MiSeq_SOP'):
    if GetDataType(f) == 'fastq':
        fq_html = fastqc.CheckQuality(f)
Iteration: for ... in

for f in GetFiles('/public/MiSeq_SOP'):
    if GetDataType(f) == 'fastq':
        fq_html = fastqc.CheckQuality(f)
Parallelization: parallel ... with

datas = ['/public/MiSeq_SOP/F3D6_S194_L001_R1_001.fastq', '/public/MiSeq_SOP/F3D6_S194_L001_R2_001.fastq']
parallel:
    fq_html = fastqc.CheckQuality(datas[0])
with:
    fq_html = fastqc.CheckQuality(datas[1])
Parallelization: parfor

datas = ['/public/MiSeq_SOP/F3D6_S194_L001_R1_001.fastq', '/public/MiSeq_SOP/F3D6_S194_L001_R2_001.fastq']
parfor data in datas:
    fq_html = fastqc.CheckQuality(data)

Subworkflow: task. A task is similar to a function in a general-purpose programming language.

task AlignSequences(ref, data, data2):
    CheckQuality(data)
    CheckQuality(data2)

    data = pear.Merge(data, data2)
    data = bwa.Align(ref, data)
    data = SamToBam(data)
    return data

AlignSequences('/public/genomes/Chr1.cdna', '/public/MiSeq_SOP/F3D6_S194_L001_R1_001.fastq', '/public/MiSeq_SOP/F3D6_S194_L001_R2_001.fastq')
Workflow: Run a pre-existing workflow by its ID, passing arguments.

Workflow(id=<workflow_id>, **kwargs)
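
For readers who know Python, the effect of parfor in the examples above can be approximated with a thread pool. The sketch below is only an analogy, not a description of how VizSciFlow schedules work; check_quality is a hypothetical stand-in for fastqc.CheckQuality.

```python
from concurrent.futures import ThreadPoolExecutor

def check_quality(fastq_path):
    # Hypothetical stand-in for fastqc.CheckQuality: it only builds the
    # report name that would be expected for this input file.
    return fastq_path.rsplit(".", 1)[0] + "_fastqc.html"

datas = [
    "/public/MiSeq_SOP/F3D6_S194_L001_R1_001.fastq",
    "/public/MiSeq_SOP/F3D6_S194_L001_R2_001.fastq",
]

# Roughly what "parfor data in datas:" expresses: one independent call per item.
with ThreadPoolExecutor() as pool:
    reports = list(pool.map(check_quality, datas))  # results keep the input order
```

Each call is independent of the others, which is the property that makes the loop body safe to run in parallel.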

	  

Syntax:

VizSciFlow uses simple Pythonic syntax and indentation. Advanced Python features such as lambdas, classes, iterators, generators, and annotations are not allowed.

Like Python, VizSciFlow is structured by indentation: the statements within a block line up vertically. A block ends at the first line that is less indented, or at the end of the file. A more deeply nested block is simply indented further to the right.

Here is a VizSciFlow indenting example:

task AlignSequences(ref, data, data2):
    CheckQuality(data)
    CheckQuality(data2)

    data = pear.Merge(data, data2)
    data = bwa.Align(ref, data)
    data = SamToBam(data)
    return data

AlignSequences('/public/genomes/Chr1.cdna', '/public/MiSeq_SOP/F3D6_S194_L001_R1_001.fastq', '/public/MiSeq_SOP/F3D6_S194_L001_R2_001.fastq')

VizSciFlow is a dynamically typed language. The type of a literal is inferred from its value. For example:

Integer: i = 10
Float: f = 10.0f
Text: s = "Hello World!"
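
Python infers the type of a literal in the same way, which makes it convenient for illustrating the idea. The sketch below uses Python's ast.literal_eval; it is an analogy, not VizSciFlow's parser, and because Python has no f suffix the float is written as 10.0 here.

```python
import ast

def literal_type(source):
    # Parse a literal from its source text and report the inferred type name.
    return type(ast.literal_eval(source)).__name__

inferred = {text: literal_type(text) for text in ("10", "10.0", '"Hello World!"')}
```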

The VizSciFlow web interface

Domain experts can use the VizSciFlow web interface to create a workflow quickly. The interface is organized into the panels described below.

Data Sources Panel

This is the top-left panel of the user interface. It lists all the data items available from the different file systems (posix, galaxy, hdfs). The public folder of each file system is available to all users for read access. A second folder, named after the user's username, is available for read-write access. Files generated during workflow execution are usually found in this folder.

To insert a data item into the code editor, first remove the argument name from the service call, then right-click the item and select "To Editor".

Data Source Toolbar

There is a small toolbar at the top of the Data Sources panel.

Sometimes you may need to insert the selected data item into the arguments box. Right-click the selected item and choose the "To args" context menu item.

Services Panel

The services/tools/modules for workflows are listed in the top-right corner of the UI. A service has one of 3 access modes. If you don't see your tool, click the "All" radio button.

  1. Public: Accessible by all users.
  2. Shared: Accessible only by users who have been granted access.
  3. Private: Accessible only by the logged-in user.

Add New Service

Experienced users can extend the capabilities of the system by adding new services to it. It is important to decide on the input and output of the tool. The tool may use multiple threads, but it must have a single synchronous exit point. Different types of tools can be integrated into the VizSciFlow system.

  1. Python code: You can wrap Python code in the code editor as a tool. If you need a module from PyPI, add the module to the pip install text box.
  2. Python module: You can attach your Python module with the Add File button. You need to use the module in your adapter code. If it is a zip/tar/bz file, it will be extracted where the adapter resides.
  3. External standalone tool: You can add the package with Add File and then use it from the adapter. If it is a zip/tar/bz file, it will be extracted where the adapter resides. You need to know the extracted folder to use its executable.
  4. External installer: Tools of this type can't be installed directly. They need interactive installation or compilation, which only administrators can do. More about this in the limitations below.

To add an external tool, you must know how to call the tool from Python.
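
Calling a tool from Python usually means starting it as a subprocess, waiting for it to finish, and checking its exit status. The sketch below uses only the standard library; run_tool is a hypothetical helper, and the Python interpreter itself stands in for the external executable so that the example is self-contained. Inside an adapter, the context helper object would normally be used to launch the tool instead.

```python
import subprocess
import sys

def run_tool(executable, *cmdargs):
    # Run the tool, block until it exits, and raise CalledProcessError
    # if it finishes with a non-zero exit code.
    completed = subprocess.run(
        [executable, *cmdargs],
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout

# The interpreter plays the role of the external tool here.
output = run_tool(sys.executable, "-c", "print('tool finished')")
```

Because subprocess.run blocks until the process exits, the helper has the single synchronous exit point that a service needs.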

In the Services panel, the "+" button opens the Service Mapping dialog. It has two tabs:

  1. Adapter
  2. Mapper

Here is an adapter for a simple demo service that takes an argument and returns it:

def demo_service(context, *args, **kwargs):
	# Return the first positional argument as text, or 0 if none was given.
	return str(args[0]) if args else 0
	
context is a helper object that connects the internal system to this adapter. VizSciFlow has a virtual file system (VFS) concept. Many of context's functions convert a file or folder path from its VFS (denormalized) form to its physical file system (normalized) form. Inputs and outputs of a module/service must be VFS (denormalized) paths. The adapters in this manual use the following functions of context: parse_args, createoutdir, exec_run, gettempdir, and pyvenv_run.
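
The relationship between the two kinds of path can be pictured as adding or removing a storage root. The functions below are a hypothetical illustration written for this manual, not part of the context API, and /srv/vizsciflow/storage is an invented root directory.

```python
from pathlib import PurePosixPath

STORAGE_ROOT = PurePosixPath("/srv/vizsciflow/storage")  # invented for illustration

def normalize(vfs_path):
    # VFS (denormalized) path -> physical (normalized) path under the storage root.
    return str(STORAGE_ROOT / PurePosixPath(vfs_path).relative_to("/"))

def denormalize(physical_path):
    # Physical (normalized) path -> VFS (denormalized) path as written in a script.
    return "/" + str(PurePosixPath(physical_path).relative_to(STORAGE_ROOT))

physical = normalize("/public/MiSeq_SOP/F3D6_S194_L001_R1_001.fastq")
virtual = denormalize(physical)
```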

Here is a JSON mapper for the demo service above. Give the service a meaningful name by changing "name". <package>.<name> (or only <name> if the package is empty) must be unique within the DSL vocabulary.


	{
		"name": "DemoService",
		"params": [
			{
				"name": "data",
				"type": "int"
			}
		],
		"returns": {
			"name": "data",
			"type": "int"
		}
	}

Call this service in the DSL editor as follows:
	
print(DemoService(10))
	
If you run this, 10 is printed in the log.
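
Because an adapter is an ordinary Python function, it can be exercised outside VizSciFlow before it is uploaded. The sketch below repeats the demo adapter and calls it directly; passing None for context works only because this particular adapter never touches context.

```python
def demo_service(context, *args, **kwargs):
    # Same logic as the demo adapter above.
    return str(args[0]) if args else 0

with_argument = demo_service(None, 10)  # the adapter converts its argument to text
without_argument = demo_service(None)   # falls back to 0 when no argument is given
```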

pip install:

If you need to install a PyPI package, select a virtual environment for it. If you select the system's virtual environment (.venv), you can use the package directly in your adapter in the usual way. You can also install a list of packages by supplying a requirements file. If you select another Python environment (e.g. .venvpy2), you have to create a bash script to run your Python 2 code; an example is given below. You can also create a separate virtual environment for yourself. In that case, first type the environment name in the text box beside "New venv". Once you have typed it, the "New venv" button becomes active. Click it to create the environment. On success, the new environment appears in the virtual environment dropdown.
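
What the "New venv" button does internally is not documented here. In plain Python, an isolated environment can be created with the standard venv module, as sketched below. The directory name is a placeholder, and with_pip=False is used only to keep the example quick and offline; an environment meant for installing packages would normally enable pip.

```python
import tempfile
import venv
from pathlib import Path

# Create the environment in a temporary directory; "myenv" is a placeholder name.
env_dir = Path(tempfile.mkdtemp()) / "myenv"
venv.create(env_dir, with_pip=False)

# Every environment created by venv has a pyvenv.cfg file at its root.
created = (env_dir / "pyvenv.cfg").is_file()
```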

Share:

If you check the "Public" box, the tool is added as public and is usable by all. You can also share it with specific users by selecting them from the "Share with" dropdown.

Select the "All" radio button on the Services/Tools panel if you don't find your tool.

The following is a complete example of adding the FastQC tool to the system.

  1. Download the FastQC tool from the Internet.
  2. Add it in the "Add Service" dialog using "Choose File...".
  3. Write an adapter to call this fastqc tool. It should look similar to the code below:

from os import path
from pathlib import Path

# Path to the fastqc executable, extracted next to this adapter.
fastqc = path.join(path.abspath(path.dirname(__file__)), 'bin', 'fastqc')

def demo_service(context, *args, **kwargs):
	# Collect the call arguments by parameter name (here: "data").
	arguments = context.parse_args('CheckQuality', 'fastqc', *args, **kwargs)
	outdir = context.createoutdir()
	cmdargs = [arguments["data"], "--outdir=" + outdir]
	context.exec_run(fastqc, *cmdargs)
	# FastQC names its reports after the input file.
	outname = Path(arguments["data"]).stem
	return path.join(outdir, outname + "_fastqc.html"), path.join(outdir, outname + "_fastqc.zip")

Here is the JSON mapper of the above service. You need to add one parameter and two return values, all of type file.

	{
		"package": "",
		"name": "FastQService",
		"params": [
			{
				"name": "data",
				"type": "file"
			}
		],
		"returns": [
			{
				"name": "html",
				"type": "file"
			},
			{
				"name": "zip",
				"type": "file"
			}
		]
	}
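
A mapper has to be well-formed JSON, so it is worth checking the text before pasting it into the dialog. The sketch below assumes the mapper is read by a strict JSON parser, as Python's json module is; check_mapper is a hypothetical helper, and the required keys are inferred from the examples in this manual rather than from a published schema.

```python
import json

def check_mapper(text):
    # Return a list of problems found in a mapper; an empty list means none were found.
    try:
        mapper = json.loads(text)
    except json.JSONDecodeError as error:
        return ["not valid JSON: " + error.msg]
    return ["missing key: " + key for key in ("name", "params", "returns") if key not in mapper]

good = (
    '{"package": "", "name": "FastQService",'
    ' "params": [{"name": "data", "type": "file"}],'
    ' "returns": [{"name": "html", "type": "file"}]}'
)
# A trailing comma after the last member of an object is not allowed in JSON.
bad = '{"name": "FastQService", "params": [{"name": "data", "type": "file",}]}'

good_problems = check_mapper(good)
bad_problems = check_mapper(bad)
```

A trailing comma is an easy mistake to make when copying parameters between mappers, and a strict parser rejects the whole document because of it.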

The new service is private by default. Check the "Public" checkbox to make it public, or share it with specific users by selecting them from the "Share with" dropdown. Click the "Add" button. If there is no error, FastQService should appear in the Services panel. Click the "All" radio button and then "Reload" if you don't see it. If you double-click or drag the service, the following code should appear in the code editor:
	
html,zip = FastQService(data)
	

Here is another example, this time using matplotlib in VizSciFlow. Since VizSciFlow executes workflows unattended (in contrast to interactive execution), matplotlib.pyplot.show will not work. The output must be saved to a file using savefig.

Here is an adapter for the service:

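from os import path

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: unattended runs have no display
import matplotlib.pyplot as plt

def demo_service(context, *args, **kwargs):
	# args[0] holds the x values (months), args[1] the y values (books read).
	plt.figure()
	plt.plot(args[0], args[1])
	plt.xlabel('Months')
	plt.ylabel('Books Read')

	output = path.join(context.gettempdir(), 'books_read.png')
	plt.savefig(output)
	plt.close()  # release the figure so repeated calls start from a clean plot
	return output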

Here is the JSON mapper of the above service. In the JSON mapper tab, you have to remove the default parameter.

	{
		"package":"",
		"name":"BookChartV3",
		"params":[
		   {
			  "name":"data",
			  "type":"int[]",
			  "desc":""
		   },
		   {
			  "name":"data2",
			  "type":"int[]",
			  "desc":""
		   }
		],
		"returns":[
		   {
			  "name":"data",
			  "type":"file"
		   }
		]
	 }

matplotlib may not be installed by default on some VizSciFlow systems. To have it installed, type matplotlib in the "pip install" text box. Click the "Add" button. If there is no error, BookChartV3 should appear in the Services panel. Click the "All" radio button and then "Reload" if you don't see it. If you double-click or drag the service, the following code should appear in the code editor:

data = BookChartV3(data, data2)
	
Here is another adapter, this time for running a Python script (fastqe) that is installed with pip install "fastqe". Because it is installed as a module, it has to be run through Python; context.pyvenv_run does this.

from os import path
from pathlib import Path

# Folder that contains this adapter.
thispath = path.dirname(__file__)

def demo_service(context, *args, **kwargs):
	arguments = context.parse_args('FastQE', 'fastqc', *args, **kwargs)
	outdir = context.createoutdir()
	# Name the report after the input file.
	output = path.join(outdir, Path(arguments['data']).stem + "_fastqe.html")
	# Run the pip-installed fastqe module on the input file.
	context.pyvenv_run(thispath, 'fastqe', arguments['data'] + ' --min --max --output=' + output)
	return output
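
How context.pyvenv_run starts the module is not shown in this manual. In plain Python, a pip-installed module is commonly launched through the interpreter of the environment that holds it, using the -m switch. The sketch below illustrates that idea under stated assumptions: run_module is a hypothetical helper, and the standard-library module json.tool stands in for fastqe so that the example runs without extra packages.

```python
import subprocess
import sys

def run_module(python_executable, module, *module_args, stdin_text=None):
    # Equivalent to typing: <python> -m <module> <args...>
    completed = subprocess.run(
        [python_executable, "-m", module, *module_args],
        input=stdin_text,
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout

# json.tool reads JSON from standard input and pretty-prints it.
pretty = run_module(sys.executable, "json.tool", stdin_text='{"reads": 2}')
```

Pointing python_executable at the interpreter inside a specific virtual environment is what makes the packages installed there visible to the module.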

Advanced: Limitations of tool integration: Users face some limitations when integrating tools into VizSciFlow. For example, tools that need interactive installation or compilation can only be installed by administrators.

Workflows Panel

The bottom-right panel lists the saved workflows of the system. Workflows can be public, private, or shared. New workflows can be saved by clicking the "+" button.

Job Histories Panel

The bottom-left panel lists all the workflow instances run by this user. You can stop workflows that are currently running.

Newer jobs are shown at the top. Jobs that were modified in the last 5 minutes are shown in red.

Double-click an item to display the status information of the workflow execution.

If you check a single running job, the incremental execution status of each step is shown in the report viewer.