Domain-Specific Language (DSL) Manual for VizSciFlow and its derivatives

VizSciFlow is a workflow management system with a domain-specific language (DSL).

The DSL has simple syntax and a minimal set of keywords. The syntax has similarities to Python’s syntax and indenting.

Scientists can write a workflow script using the visual elements offered in VizSciFlow web interface.

Keywords:

The keywords of VizSciFlow are listed below:

Construct	Keywords	Example
Conditional	if	`for f in GetFiles('/public/MiSeq_SOP'): if GetDataType(f) == 'fastq': fq_html = fastqc.CheckQuality(f)`
Iteration	for ... in	`for f in GetFiles('/public/MiSeq_SOP'): if GetDataType(f) == 'fastq': fq_html = fastqc.CheckQuality(f)`
Parallelization	parallel ... with	`datas = ['/public/MiSeq_SOP/F3D6_S194_L001_R1_001.fastq', '/public/MiSeq_SOP/F3D6_S194_L001_R2_001.fastq'] parallel: fq_html = fastqc.CheckQuality(datas[0]) with: fq_html = fastqc.CheckQuality(datas[1])`
Parallelization	parfor	`datas = ['/public/MiSeq_SOP/F3D6_S194_L001_R1_001.fastq', '/public/MiSeq_SOP/F3D6_S194_L001_R2_001.fastq'] parfor data in datas: fq_html = fastqc.CheckQuality(data)`
Subworkflow	task	Task is similar to functions in General-Purpose Programming Language. `task AlignSequences(ref, data, data2): CheckQuality(data) CheckQuality(data2) data = pear.Merge(data, data2) data = bwa.Align(ref, data) data = SamToBam(data) return data AlignSequences('/public/genomes/Chr1.cdna', '/public/MiSeq_SOP/F3D6_S194_L001_R1_001.fastq', '/public/MiSeq_SOP/F3D6_S194_L001_R2_001.fastq')`
Subworkflow	Workflow	Run a pre-existing workflow with its ID and passing arguments. `Workflow(id=<workflow_id>, **kwargs)`

Syntax:

VizSciFlow uses simple pythonic syntax and indenting. Advanced syntaxes like lambda, class, iterators, generators, annotation are not allowed. Here is a complete example of a VizSciFlow script:

Like Python, VizSciFlow is structured with indentation i.e. the statements within a block line up vertically. The block ends at a line less indented or the end of the file. If a block has to be more deeply nested, it is simply indented further to the right.

Here is a VizSciFlow indenting example:


task AlignSequences(ref, data, data2):
    CheckQuality(data)
    CheckQuality(data2)

    data = pear.Merge(data, data2)
    data = bwa.Align(ref, data)
    data = SamToBam(data)
    return data

AlignSequences('/public/genomes/Chr1.cdna', '/public/MiSeq_SOP/F3D6_S194_L001_R1_001.fastq', '/public/MiSeq_SOP/F3D6_S194_L001_R2_001.fastq')

VizSciFlow is a dynamic language. The type of a literal is inferred from the value. For example:

Integer	i = 10
Float	f = 10.0f
Text	s = "Hello World!"

The VizSciFlow web interface

The domain experts can use VizSciFlow web interface to quickly create a workflow. The IDE looks as below: A typical set of events are:

Search the desired service from the Services panel to the right.
Drag the service to insert into the code editor.
Alternatively, you can also start typing the service name on code editor and press Ctrl+Space for automatic code completion.
Search the data item from the DataSources panel to the left.
Replace the data argument on the code editor by dragging the desired data source and dropping on this panel.
Run the workflow from command panel.
An item is added to the left-bottom panel for the execution job. Check it for automatic status update of each step or double click for instant load.

Data Sources Panel

This is the left-top panel of the user interface. It lists all the data items available from different file systems (posix, galaxy, hdfs). The public folder of each file system is available to all users for read access. Another folder named as username is available for read-write accesses. The generated files during workflow execution are usually found in this folder.

To insert a data item into the code editor, first remove the argument name from the service call and then click right button on the item and select "To Editor".

Data Source Toolbar

There is a small toolbar on top of data source panel.

Add: Adds a folder at the selected item.
Remove: Removes the selected item.
Rename: Renames the selected item.
Upload: Uploads an item from the client machine to the selected folder.
Download: Downloads the selected file.
Reload: Reloads the tree from the file system.
About: Opens the help page.

Sometimes, you may need to insert the selected data item in the arguments box. Right click on the selected item and press "To args" context menu item.

Services Panel

The services/tools/modules for the workflows are listed on the right-top corner of the UI. There are 3 different access modes for the service. If you don't see your tool, click the "All" radio button.

Public: Accessible by all users.
Shared: Accessible only by those users who are given accesses.
Private: Accessible only by the logged on user himself.

Add New Service

Experienced users can extend the capabilities of the system by adding new services to it. It is important to decide the input and output of the tool. You may have multiple threads in the tool, but it must have a single synchrounous exit point. Different types of tools can be integrated into VizSciFlow system.

Python code: You can wrap python code in the code editor as a tool. If you need a module from PyPi, add the module to pip install textbox.
Python module: You can attach your python module with Add file button. You need to use the module in your adapter code. If it is a zip/tar/bz file, it will be extracted where the adapter resides.
External standalone tool: You can add the package with Add File and then use it from the adapter. If it is a zip/tar/bz file, it will be extracted where the adapter resides. You need to know the extracted folder to use its executable.
External installer: This type of tools can't be installed directly. They need interactive installation or compilation which only administrators can do. More about it in the limitations below.

To add an external tool, you must know how to call the tool from python.

The location of the tool.
The arguments it accepts. Some of these arguments might be your (formatted) input parameters.
The result it returns. Sometimes, stdout and stderr may be necessary. Your output of the tool will be selected/calculated from these values.
VizSciFlow works in batch mode. So any interactive display will not work. Output must be saved to a file and returned from adapter.

In the service pane, the "+" button opens the dialog for Service Mapping. There are two tabs:

Code:It containes the adapter code to connect VizSciFlow DSL with external module or executable. The VizSciFlow scheduler triggers this code during execution of the tool.
JSON:It binds the service parameters with the inputs and outputs of workflow step.

Here is a simple service code which returns the first argument:

Here is an adapter for a simple demo service which takes an argument and returns it.


def demo_service(context, *args, **kwargs):
	return str(args[0]) if args else 0

context is a helper object to connect the internal system to this adapter. VizSciFlow has a virtual file system (VFS) concept. Many of context's functions converts file/folder path from VFS (denormalized) to physical file system (normalized). Inputs and outputs of a module/service must be in VFS (denormalized) path. Here are some functions of context:

gettempdir(): path to user's temp directory in normalized and resolved form.
getpublicdir(): path to user's public directory in normalized and resolved form.
createoutdir(outname = None): Creates a unique output directory.
getnormdir(__file__): When used in an adapter file, path to this adapter's directory. Uploaded files are stored in this directory. .zip/.tar/.tar.gz files are uncompressed in this directory.
normalize(path): Normalizes a path from VizSciFlow VFS to physical path. Your input is usually converted into abstract form automatically.
denormalize(path): Denormalizes a path from physical path to VizSciFlow VFS. Your abstract path is automatically converted into output in VFS form.
moveto(src, dest, ignores=[]): moves contents of src directory to dest
copyto(src, dest, ignores=[]): copies contents of src directory to dest
exec_run(executable, *args): Runs an executable file. You can also pass values for cwd=path and env=path for current working directory and environment path respectively. It returns stdout and stderr.
bash_run(script, *args): Runs a bash script. You can also pass values for cwd=path and env=path if needed. It returns stdout and stderr.
bash_run_out_err_exit(script, *args): Runs a bash script and returns stdout, stderr and exit code.
pyvenv_run(tooldir, script, *args): Runs a python script. tooldir=os.path.dirname(__file__). It returns stdout and stderr.
parse_args(funcname, package, *args, **kwargs): Use this method to parse the arguments. See src/plugins/modules/blast/adapter.py for examples.
save_stdout_stderr(out, err): Saves stdout and stderr of a task in the database.

Here is a json mapper of the above service. Give a meaningful name to the service by changing the "Name". <package>.<name> or (only <name> if package is empty) must be unique to the DSL vocabulary.


	{
		"name": "DemoService",
		"params": [
			{
				"name": "data",
				"type": "int"
			}
		],
		"returns": {
			"name": "data",
			"type": "int"
		}
	}

Call this function in DSL editor like below:

	
print(DemoService(10))

If you run this 10 will be printed in log.

pip install:

If you need to install a PyPi package, select a virtual environment for it. If you select the system's virtual environment (.venv), you can use it directly in your adapter in usual way. You can also install a list of packages by giving a requirements file. If you select another python environment (e.g. .venvpy2), you have to create a bash script to run your python 2 code. We have example of it below. You can also create a separate virtual environment for you. In that case, first type your environment name in the text box beside "New venv". Once you type it, "New venv" button we active. Click it to create the environment. On success, it will appear in virtual environment dropdown.

If you check the "Public" box, the tool will be added as public and usable by all. You can also share it with specific users by selecting them from "Share with" dropdown.

Select the "All" redio button on Services/Tools panel if you don't find your tool.

We have shown below a complete example of adding FastQC tool to the system.

Download the FastQC tool from the Internet.
Add it in "Add Service" dialog using "Choose File...".
Write an adapter to call this fastqc tool. It should look similar to below code:


from os import path
from pathlib import Path

fastqc = path.join(path.abspath(path.dirname(__file__)), path.join('bin', 'fastqc'))

def demo_service(context, *args, **kwargs):
	arguments = context.parse_args('CheckQuality', 'fastqc', *args, **kwargs)
	outdir = context.createoutdir()
	cmdargs = [arguments["data"], "--outdir=" + outdir]
	context.exec_run(fastqc, *cmdargs)
	outname = Path(arguments["data"]).stem
	return path.join(outdir, outname + "_fastqc.html"), path.join(outdir, outname + "_fastqc.zip")

Here is the json mapper of the above service. You need to add one parameter and two return values, all file types.


	{
		"package":"",
		"name":"FastQService",
		"params":[
		   {
			  "name":"data",
			  "type":"file",
		   }
		],
		"returns":[
		   {
			  "name":"html",
			  "type":"file"
		   },
		   {
			  "name":"zip",
			  "type":"file"
		   }
		]
	 }

The new service is by default private. Check the "Public" checkbox to make it public or It can be shared to specific users by selecting target users from "Share with" dropdown. Click the "Add" button. If there is no error, the FastQService should appear in Services panel. Click "All" radio button and then "Reload" if you don't see it. If you double-click/drag the service, following code should appear in code editor:

	
html,zip = FastQService(data)

Here is another example of using matplotlib in VizSciFlow. Since the executation model of VizSciFlow is unattended execution (in contrast to interactive execution), matplotlib.show will not work. The output must be saved as a file using savefig.

Here is an adapter for the service:


from os import path
import matplotlib.pyplot as plt

def demo_service(context, *args, **kwargs):
	plt.plot(args[0], args[1])
	plt.xlabel('Months')
	plt.ylabel('Books Read')

	output = path.join(context.gettempdir(), 'books_read.png')
	plt.savefig(output)
	return output

Here is the json mapper of the above service. From JSON mapper tab, you have to remove the default parameter.


	{
		"package":"",
		"name":"BookChartV3",
		"params":[
		   {
			  "name":"data",
			  "type":"int[]",
			  "desc":""
		   },
		   {
			  "name":"data2",
			  "type":"int[]",
			  "desc":""
		   }
		],
		"returns":[
		   {
			  "name":"data",
			  "type":"file"
		   }
		]
	 }

matplotlib may not be installed in some VizSciFlow systems by default. You can specify to install it in "pip install" text box by typing matplotlib. Click the "Add" button. If there is no error, the BookReadChart should appear in Services panel. Click "All" radio button and then "Reload" if you don't see it. If you double-click/drag the service, following code should appear in code editor:

	
BookReadChart()

Here is another adapter for running a python script (fastqe) which is installed using pip install "fastqe". As it is installed as a module, we need to use python shell to run it. context.pyvenv_run can do it.


from os import path
from pathlib import Path

thispath = path.dirname(__file__)
def demo_service(context, *args, **kwargs):
	arguments = context.parse_args('FastQE', 'fastqc', *args, **kwargs)
	outdir = context.createoutdir()
	output = path.join(outdir, Path(arguments['data']).stem + "_fastqe.html")
	context.pyvenv_run(thispath, 'fastqe', arguments['data'] + ' --min --max --output=' + output)
	return output

Advanced:

Older python environment:VizSciFlow is developed using Python 3.10. If a tool is developed using older version of Python (say 2.7), users need a workaround to integrate it. We have provided a Python 2.7 virtual environment for it. Users need to create a .sh executable with the following code and run it using context.pyvenv_run. /home/venvs/<venvname>>/bin/activate
```
	#!/bin/bash
	source /home/venvs/.venvpy2/bin/activate
	python2 $1 $2 $3 $4 $5 $6 $7 $8 # $1 is a python script/program. $2-$8 are the arguments passed to $1
	
	
```


		out, err = context.pyvenv_run(thispath, "python2", blastz, *cmdargs) # python2 is the filename with extension .sh

Incompatible python library dependency:For unmaintaned libraries, it may happen that a tool needs a library which is incompatible to other tools of the python environment. Users need to create a .sh executable with the following code and run it using context.run_pyvenv.
```
		#!/bin/bash
		source /home/venvs/.venvpycoqc/bin/activate
		python2 $1 $2 $3 $4 $5 $6 $7 $8
		
		
```
Similarly you can use your own virtual environment creating a shell script activating the environment and calling it from your adapter.

Limitations of tool integration: There are some limitations to the tool integration by users in VizSciFlow. We have listed them below:

Admin access:If a tool needs admin or sudo access, it can't be integrated by the normal users. If a tools needs a different version of system library (e.g. libc) or needs manual configuration, system admin must do it.
Needs compilation:Tools which need to be compiled and installed from source, cannot be integrated by users. The admin must do it.

Workflows Panel

The right-bottom panel lists the saved workflows of the system. Workflows can be public, private or shared. New workflows can be saved by clicking the "+" button.

Job Histories Panel

The left-bottom panel lists all the workflow instances run by this user. You can stop currently running workflows.

Newer jobs are shown on the top. Jobs which were modified in the last 5 minutes are shown red colored.

Double click an item to display the status information of workflow execution.

If you check a single running job, the incremental execution status of each step is shown on the report viewer.