Debug a workflow on the Terra platform
Some biomedical researchers use the Terra platform to run data analysis jobs on Google Cloud. When we run into errors, it can be daunting to figure out where the errors are coming from and how to fix them. In this tutorial, we walk through each step of viewing an error, understanding it, and fixing it in an actual workflow. I hope this illustrates the general strategy for how we can solve any workflow issue on Terra.
Terra is a user interface for Google Cloud#
Basically, we can think of Terra as a user interface that is supposed to make Google Cloud easier to use.
(All Terra users that I know have accounts at the Broad Institute. Since the institute uses Google to provide services like email, they will provide you with a Google account that you can use with Google Cloud.)
Workspaces, Workflows, and Jobs#
The Terra platform has a few technical terms that we need to learn:
Workspace - A space on Terra that provides access to authorized users, and is connected to a single Google Cloud bucket.
Workflow - A text file with the .wdl extension, written in the Workflow Description Language (WDL), that specifies how to execute other software packages.
Job - An instance of a workflow that was launched on Terra, with a dedicated URL for finding the input files, output files, run time status, logs, and error messages.
Please see the official documentation for more details.
Viewing an error in a workflow#
Suppose we launched a workflow called cellranger_workflow that invokes the Cell Ranger software by 10X Genomics. This is something we might need to do each time we collect raw sequencing data from a single-cell RNA-seq experiment.
The workflow has a few key steps:
- Convert raw BCL data to FASTQ
- Generate a count matrix with one row for each gene and one column for each cell
After we launch a workflow, we need to find the job submission and click on it:
Then, click on the icon to go to the Job Manager for this job:
We might have run this workflow successfully many times in the past, but today we got a new error message that we've never seen before:
Wait, where's the error?
In Terra, there are three ways to view the error messages:
Hover the mouse cursor over the red warning icon to reveal a tiny popup window with a truncated error message.
Click the ERRORS tab to view the same truncated error message.
Click the green and blue document-with-a-cloud icon to view a log with messages that were printed as the job was running.
Let's go ahead and hover our mouse cursor over that tiny icon:
Here's the text of the truncated error message:
Failed to evaluate 'if_condition' (reason 1 of 1): Evaluating (generate_count_config.link_arc_ids != "") failed: Bad array access generate_count_config.link_arc_ids: Array size 0 does not have an index value '0'
If your error is present in the green and blue document-with-a-cloud log file, then you should be able to see the full error message that applies to your situation.
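When a message is this dense, it can help to split it into its parts. Here is a minimal Python sketch, assuming the message follows the exact wording shown above. The function name parse_bad_array_access and the regular expression are illustrative and are not part of Terra.

```python
import re

# Pattern modeled on the error text shown above. Other Terra errors may be
# worded differently, so this pattern is an assumption, not a general parser.
ERROR_PATTERN = re.compile(
    r"Evaluating \((?P<expression>.+?)\) failed: "
    r"Bad array access (?P<array>[\w.]+): "
    r"Array size (?P<size>\d+) does not have an index value '(?P<index>\d+)'"
)


def parse_bad_array_access(message):
    """Return details for every 'Bad array access' error found in message."""
    return [
        {
            "expression": match.group("expression"),
            "array": match.group("array"),
            "size": int(match.group("size")),
            "index": int(match.group("index")),
        }
        for match in ERROR_PATTERN.finditer(message)
    ]


# The truncated error message copied from the Terra user interface.
message = (
    "Failed to evaluate 'if_condition' (reason 1 of 1): "
    "Evaluating (generate_count_config.link_arc_ids != \"\") failed: "
    "Bad array access generate_count_config.link_arc_ids: "
    "Array size 0 does not have an index value '0'"
)

for error in parse_bad_array_access(message):
    print(error["array"], "has size", error["size"],
          "but index", error["index"], "was requested")
```

The three pieces that matter are the name of the array, its size, and the index that was requested.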
What does the error mean?#
If we are familiar with computer programming, then we might understand that generate_count_config.link_arc_ids is an array with zero elements, so we can't access its 0th element (index 0), and we get an error.
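The same kind of failure is easy to reproduce outside of Terra. The sketch below is only a Python analogy, not WDL: the list link_arc_ids stands in for the empty array, and the helper first_id_is_set is a hypothetical name that mimics a check on the first element.

```python
def first_id_is_set(ids):
    """Mimic the failing check: compare the first element to an empty string."""
    return ids[0] != ""


link_arc_ids = []  # stand-in for the empty array in the workflow

try:
    first_id_is_set(link_arc_ids)
    outcome = "evaluated without error"
except IndexError as error:
    # Python refuses to read element 0 of an empty list.
    outcome = f"failed: {error}"

print(outcome)  # prints "failed: list index out of range"
```

The comparison never runs, because reading the first element fails before there is anything to compare.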
But what line of code threw this error? Was it the Cell Ranger code? Or was it the workflow?
As a general rule, this seems to be true:
Error messages from downstream software executed by the workflow (e.g. Cell Ranger) appear in the document-with-a-cloud log file.
- For these errors, we should find the source code for the downstream software.
Error messages from the workflow code appear only on the Terra user interface next to the red icon, and we will not find these errors in any log files.
- For workflow errors, we should find the source code for the workflow.
Download the source code for the workflow#
Let's find the source code for the cellranger_workflow workflow, and maybe we can figure out what is going on.
In Terra, click WORKFLOWS to get to this page and click on the relevant workflow:
Then click on the link to the workflow's source code. We'll be redirected to a site called portal.firecloud.org that hosts the source code for the workflow:
Click Download WDL to get the code.
Modify the workflow code#
With the source code in hand, we can find that line 365 threw the error:
At this point, we should try to study the code for a few minutes to figure out if we might have provided invalid input for the workflow:
If we conclude that our input must have been invalid when we launched the workflow, then we should change the input and try again.
In this case, it seems that our input was fine. The error was caused by the code in the WDL file, so we should change it.
If we read the specification for arrays in the WDL language, we might arrive at the idea to change the code to look like this instead:
Instead of assuming that the array has elements and testing the first element like array[0] != '', let's use length() to check if the length of the array is greater than zero!
Finally, it is worthwhile to double-check if there are any other instances of a similar error in the code that we might be able to fix. It is better to measure twice and cut once, because it is laborious to update the WDL code.
Failed to evaluate 'if_condition' (reason 1 of 1): Evaluating (generate_count_config.link_arc_ids != "") failed: Bad array access generate_count_config.link_arc_ids: Array size 0 does not have an index value '0'
Failed to evaluate 'if_condition' (reason 1 of 1): Evaluating (generate_count_config.sample_vdj_ids != "") failed: Bad array access generate_count_config.sample_vdj_ids: Array size 0 does not have an index value '0'
Failed to evaluate 'if_condition' (reason 1 of 1): Evaluating (generate_count_config.link_fbc_ids != "") failed: Bad array access generate_count_config.link_fbc_ids: Array size 0 does not have an index value '0'
Failed to evaluate 'if_condition' (reason 1 of 1): Evaluating (generate_count_config.sample_atac_ids != "") failed: Bad array access generate_count_config.sample_atac_ids: Array size 0 does not have an index value '0'
Failed to evaluate 'if_condition' (reason 1 of 1): Evaluating (generate_count_config.sample_feature_ids != "") failed: Bad array access generate_count_config.sample_feature_ids: Array size 0 does not have an index value '0'
Failed to evaluate 'if_condition' (reason 1 of 1): Evaluating (generate_count_config.link_multi_ids != "") failed: Bad array access generate_count_config.link_multi_ids: Array size 0 does not have an index value '0'
So, we should go ahead and change all seven of those lines to use length(array) > 0 instead of array[0] != ''. That should do it!
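Editing several near-identical lines by hand invites typos, so a small script can do the rewrite for us. This is a minimal Python sketch under one loud assumption: that each failing condition in the WDL file indexes the first element and compares it to an empty string, in the form name[0] != ''. The sample WDL lines below are hypothetical and only reuse array names from the error messages. The real file may be formatted differently, so we should review the result before uploading it.

```python
import re

# Matches conditions of the assumed form  <name>[0] != ''  or  <name>[0] != ""
FIRST_ELEMENT_CHECK = re.compile(r"([\w.]+)\[0\]\s*!=\s*(?:''|\"\")")


def use_length_check(wdl_text):
    """Replace first-element checks with length() checks.

    Returns a tuple: (rewritten text, number of replacements made).
    """
    return FIRST_ELEMENT_CHECK.subn(r"length(\1) > 0", wdl_text)


# Hypothetical WDL lines for illustration, not copied from the real workflow.
old_wdl = (
    "if (generate_count_config.link_arc_ids[0] != '') {\n"
    "if (generate_count_config.sample_vdj_ids[0] != '') {\n"
)

new_wdl, count = use_length_check(old_wdl)
print(f"Rewrote {count} condition(s):")
print(new_wdl)
```

The count tells us whether the script found as many conditions as we expected. Note that the two checks are not strictly equivalent: an array that holds a single empty string passes the length check, so it is worth confirming that this case cannot occur with our input.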
Upload the new workflow code#
We don't have permission to modify the workflow WDL file, because we are not among the owners listed on the workflow page.
So, let's go ahead and click “Clone…” to make a copy that we can modify:
Then, follow these steps:
- Paste the new code into the text box.
- Set Namespace to be your username.
- Set Name to something memorable.
- Click Create New Method.
Click Export to Workspace...:
Click Use Blank Configuration:
Next, choose the appropriate workspace where you launch your workflows:
And finally click Export to Workspace:
Whew! That is a lot of clicking just to update a bit of code.
Run the new workflow#
Our new workflow file should now be available in our Terra workspace. Let's re-launch the same job as before, but this time use the new workflow instead of the old one.
As luck would have it, this fix actually worked. We can see the happy little green checkmarks and no red warning icons:
Contribute the fix to the owners of the workflow#
The cellranger_workflow mentioned in this tutorial is part of a collection of WDL files called cumulus, hosted on GitHub.
Here's the file:
After we have confirmed that our new WDL code is working correctly, we might consider sharing the fixed code with the developers.
Good luck with your workflows!