Workflow Tip: Validating Bioinformatics Workflows on Terra

July 27, 2023

Validating workflow outputs can be a challenging and time-consuming task for many public health laboratories. With large data sets especially, inconsistencies or differences are easy to miss. TheiaValidate is one solution designed to bring clarity and simplicity to your data comparison needs.

TheiaValidate performs basic comparisons between user-selected columns in two separate tables. This workflow is helpful when you need to identify differences between the outputs of two workflows or two version releases. We use it to ensure that no unintended alterations or discrepancies have been introduced during updates.

TheiaValidate generates a comprehensive PDF report summarizing its findings. It also produces an Excel spreadsheet listing every non-matching value for a more granular view. This makes it possible to both get an overview of the comparisons and dive into the specifics when needed.

To keep TheiaValidate as flexible as possible, we designed it so that users list which columns should be compared between the two tables. This way, it can adapt to different workflow series and can be customized to suit your validation requirements. Please keep in mind that the two tables must have exactly the same number of rows and matching sample names.
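To illustrate the idea of comparing user-selected columns across two tables with matching sample names, here is a minimal Python sketch. It is not TheiaValidate's actual code: the function name `find_mismatches`, the table layout (a dictionary keyed by sample name) and the example column names are all assumptions made for illustration, and only exact matching is shown.

```python
def find_mismatches(table1, table2, columns):
    """Compare selected columns of two tables and list non-matching values.

    Hypothetical layout: each table maps sample name -> {column: value}.
    """
    # Both tables must describe exactly the same samples.
    if set(table1) != set(table2):
        unmatched = sorted(set(table1) ^ set(table2))
        raise ValueError(f"Sample names differ between tables: {unmatched}")

    mismatches = []
    for sample in sorted(table1):
        for column in columns:
            value1 = table1[sample].get(column)
            value2 = table2[sample].get(column)
            if value1 != value2:  # exact match only in this sketch
                mismatches.append((sample, column, value1, value2))
    return mismatches


# Illustrative data only; the column names are made up for this example.
old_release = {
    "sample01": {"lineage": "BA.2", "percent_coverage": "98.5"},
    "sample02": {"lineage": "XBB.1.5", "percent_coverage": "97.1"},
}
new_release = {
    "sample01": {"lineage": "BA.2", "percent_coverage": "98.5"},
    "sample02": {"lineage": "XBB.1.5", "percent_coverage": "97.3"},
}

differences = find_mismatches(old_release, new_release, ["lineage", "percent_coverage"])
```

In this example, `differences` holds a single entry for `sample02`, whose `percent_coverage` value changed between the two tables. Only the listed columns are checked, so unrelated columns never produce noise.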

By default, TheiaValidate performs exact string matches. In some cases, however, it is more useful to check whether two values have the same contents even when those contents appear in a different order. In addition, stochasticity in some bioinformatics algorithms can cause exact-match failures, so it is often more useful to know whether two numerical values are within a certain percentage threshold of each other. To support these cases, the user can provide a file that indicates which comparison should be performed on each column. Current options are “SET”, “EXACT”, or a decimal value that is used to calculate the maximum allowable percent difference.
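The three comparison modes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not TheiaValidate's implementation: the function name `values_match` is hypothetical, set-like values are assumed to be comma-separated, a decimal such as 0.05 is assumed to mean 5%, and the percent difference is assumed to be measured relative to the mean of the two values. The workflow itself may define any of these differently.

```python
def values_match(left, right, mode="EXACT", sep=","):
    """Return True if two cell values match under the given comparison mode."""
    mode = mode.strip()

    if mode.upper() == "EXACT":
        # Plain string equality.
        return left == right

    if mode.upper() == "SET":
        # Order-insensitive: split on the separator (assumed to be a comma)
        # and compare the resulting sets of items.
        def as_set(text):
            return {item.strip() for item in text.split(sep) if item.strip()}
        return as_set(left) == as_set(right)

    # Otherwise treat the mode as a decimal threshold, e.g. "0.05" for 5%
    # (an assumption about how the decimal is interpreted).
    threshold = float(mode)
    a, b = float(left), float(right)
    if a == b:
        return True
    # Percent difference relative to the mean magnitude of the two values
    # (one common definition, assumed here).
    mean_magnitude = (abs(a) + abs(b)) / 2
    return abs(a - b) / mean_magnitude <= threshold


same_order_required = values_match("geneA,geneB", "geneB,geneA", "EXACT")  # False
same_contents = values_match("geneA,geneB", "geneB,geneA", "SET")          # True
within_five_percent = values_match("100", "104", "0.05")                   # True
outside_five_percent = values_match("100", "110", "0.05")                  # False
```

The gene names and numbers are made up for the example. The point is that the same pair of values can fail an exact match yet pass a set comparison or a thresholded numerical comparison, which is why the comparison mode is chosen per column.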

TheiaValidate can expedite the way you compare data between tables, facilitating a more accurate and efficient validation process. Whether you are managing version releases or comparing results from different workflows, TheiaValidate is here to make your job easier and your results more reliable.

You can find more information about this workflow on our Theiagen Public Health Resources Page.