
Kirill Batmanov¹, Jan Delabie², and Junbai Wang¹
¹Department of Pathology, Norwegian Radium Hospital, PO Box 4953 Nydalen, 0424 Oslo, Norway
²Department of Pathology, University Health Network, Toronto, Ontario, Canada
BayesPI-BAR2 [4] is a package for predicting how non-coding somatic mutations in cancer samples affect protein-DNA binding at the mutated sites. Changes in the binding of transcription factors to mutated regulatory sequences can disrupt gene regulation, which may promote tumorigenesis. BayesPI-BAR2 accounts for the possibility that several nearby mutations affect the binding of the same protein. The predicted effects are tested for significance in the given patient cohort, and only those that appear in patient samples more frequently than expected by chance are reported.
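This section does not say how BayesPI-BAR2 decides which mutations count as "nearby". As an illustration only, one simple way to form such groups is to sort mutations by position and merge those on the same chromosome that lie within a fixed distance of each other. The function group_nearby and its max_gap threshold are hypothetical and are not part of the package.

```python
def group_nearby(mutations, max_gap):
    """Group (chromosome, position) pairs into blocks of nearby mutations.

    A mutation joins the current block when it lies on the same chromosome
    and at most max_gap base pairs after the previous mutation in that block.
    max_gap is a hypothetical parameter, not a BayesPI-BAR2 option.
    """
    blocks = []
    for chrom, pos in sorted(mutations):
        if blocks:
            last_chrom, last_pos = blocks[-1][-1]
            if chrom == last_chrom and pos - last_pos <= max_gap:
                # Close enough to the previous mutation: extend the current block.
                blocks[-1].append((chrom, pos))
                continue
        # Different chromosome or too far away: start a new block.
        blocks.append([(chrom, pos)])
    return blocks


if __name__ == "__main__":
    example = [("chr5", 1022), ("chr5", 1000), ("chr1", 500), ("chr5", 2000)]
    for block in group_nearby(example, max_gap=30):
        print(block)
```

With max_gap=30, the two chr5 mutations 22 bp apart end up in one block, while the others form blocks of their own.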
BayesPI-BAR2 is written in Python 2. It includes our BayesPI2 [1][2] software in binary form, which is available for Linux and OS X operating systems. Here is the full list of dependencies:
Install the Python libraries with the command pip install scipy matplotlib. The bedtools and samtools programs are available in the package repositories of many Linux distributions.
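Before running the demos, it may help to confirm that the dependencies named above can be found. The sketch below checks only scipy, matplotlib, bedtools and samtools, not the full dependency list. It uses importlib.util.find_spec and shutil.which, so it needs Python 3.4 or later, separate from the Python 2 interpreter that runs BayesPI-BAR2 itself; note that it therefore checks whether scipy and matplotlib are installed for the Python 3 interpreter running this check, not for the Python 2 interpreter BayesPI-BAR2 uses. The helper name missing_dependencies is hypothetical.

```python
import importlib.util
import shutil


def missing_dependencies(modules, programs):
    """Return the names of Python modules and command-line programs that cannot be found."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    for name in programs:
        # shutil.which returns None when the program is not on the PATH.
        if shutil.which(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_dependencies(["scipy", "matplotlib"], ["bedtools", "samtools"])
    if absent:
        print("Missing dependencies: " + ", ".join(absent))
    else:
        print("All checked dependencies were found.")
```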
The BayesPI-BAR2 package can be downloaded here. Note that, as of 2024, the package is outdated.
To test the basic functionality, go to the demo/melanoma_small folder and run the command python melanoma_small_pipeline.py. After the reference human genome has been downloaded, the test pipeline should complete without errors in a few minutes and produce the result file data/skin_cancer_small/out/foreground/block_0_5_1295228_1295253/result.tsv, which should mention several ETS factors.
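To check the test result quickly, you can scan result.tsv for ETS family names. The exact column layout of result.tsv is not described here, so this sketch simply searches every line for a hypothetical, incomplete pattern of ETS gene names (ETS1/2, ETV, ELK, ELF, GABPA, FLI1); the names used in your motif collection may differ. The path is the one given above and is assumed to be relative to the package root.

```python
import re

# Hypothetical, incomplete pattern of ETS family gene names; adjust it to the
# factor names used in your motif collection.
ETS_PATTERN = re.compile(r"ETS\d|ETV\d|ELK\d|ELF\d|GABPA|FLI1")


def ets_lines(path):
    """Return the lines of a text file that mention an ETS family factor."""
    hits = []
    with open(path) as handle:
        for line in handle:
            if ETS_PATTERN.search(line):
                hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    result = ("data/skin_cancer_small/out/foreground/"
              "block_0_5_1295228_1295253/result.tsv")
    for line in ets_lines(result):
        print(line)
```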
The package has four subfolders:
- bin: the BayesPI2 binaries
- demo: the two example pipelines, melanoma_small for a quick test and melanoma_full for a complete application
- data: the folder from which the demos read their input data and to which they write their output
- python: the package's Python source code

The main package is a set of command-line tools residing in the python folder. Run python <tool_name.py> --help to see the full usage information for a particular tool. A detailed description of every tool is available here.
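To browse the usage information of all tools at once, you could loop over the scripts in the python folder and call each one with --help. This is a hypothetical convenience script, assumed to be run from the package root. Some .py files may be helper modules rather than command-line tools, and the interpreter argument should point at the Python 2 interpreter you use for BayesPI-BAR2 (for example "python2" on some systems).

```python
import glob
import os
import subprocess


def help_commands(python_dir, interpreter="python"):
    """Build an '<interpreter> <tool>.py --help' command for every .py file in python_dir."""
    tools = sorted(glob.glob(os.path.join(python_dir, "*.py")))
    return [[interpreter, tool, "--help"] for tool in tools]


if __name__ == "__main__":
    # Assumes this script is run from the package root, next to the python folder.
    for command in help_commands("python"):
        print("=== " + os.path.basename(command[1]) + " ===")
        # call() only returns the exit status, so one failing file does not stop the loop.
        subprocess.call(command)
```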
The package includes an example analysis pipeline that reproduces a known result: mutations in the TERT gene promoter create binding sites for ETS family transcription factors. The pipeline calls the main package tools in the appropriate order and reports the progress of the computation.
To run the pipeline, go to the demo/melanoma_full folder and run the following commands:
1. python get_and_preprocess_data.py downloads the input and reference data and preprocesses them into the required format.
2. python bayespi_bar2_pipeline.py runs the main pipeline. This takes about one full day of computation on a multi-core machine. The computation can be sped up considerably by running the pipeline on a cluster that uses the SLURM queue manager. Edit the parallel_options.txt file in the same folder to specify the desired parallelization configuration, and see the help of bayespi_bar.py from the main package for the available parallelization options.
3. python make_plots.py creates heatmaps of the significantly affected transcription factors in the foreground blocks.

The main pipeline script, bayespi_bar2_pipeline.py, is designed to be robust to interruptions. If its execution is interrupted at any point, simply run the script again and it will resume from where it stopped. The log file, whose location is printed on the screen when the pipeline starts, shows the progress of the computation as well as the main pipeline parameters.
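The documentation does not describe how bayespi_bar2_pipeline.py records its progress internally. A common general pattern for making a multi-step workflow resumable is to write a marker file after each step finishes successfully and to skip any step whose marker already exists. The sketch below illustrates that pattern only; run_step and the ".done" marker files are hypothetical names, not part of BayesPI-BAR2.

```python
import os
import subprocess
import sys
import tempfile


def run_step(name, command, done_dir):
    """Run command unless the step already completed; return True if it ran.

    A '<name>.done' marker is written to done_dir only after the command exits
    successfully, so an interrupted or failed step is rerun next time.
    """
    marker = os.path.join(done_dir, name + ".done")
    if os.path.exists(marker):
        return False  # finished in an earlier run; skip it
    subprocess.check_call(command)  # raises CalledProcessError on a non-zero exit
    with open(marker, "w") as handle:
        handle.write("done\n")
    return True


if __name__ == "__main__":
    state = tempfile.mkdtemp()
    step = [sys.executable, "-c", "print('working')"]
    print(run_step("demo", step, state))  # True: the step runs
    print(run_step("demo", step, state))  # False: the marker makes it skip
```

The same helper could chain the three demo commands, e.g. run_step("preprocess", ["python", "get_and_preprocess_data.py"], "."), so that a rerun skips steps that already finished. Since bayespi_bar2_pipeline.py already resumes on its own, such a wrapper would only add step-level skipping on top of it.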
The get_and_preprocess_data.py script downloads about 2 GB of data needed by the pipeline. Here is the full list of additional files that will be downloaded:
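Because the download is about 2 GB, you may want to check the free disk space first. This is a minimal sketch; the 1.5× safety factor (to leave room for preprocessed files) is an assumption, not a documented requirement, and shutil.disk_usage needs Python 3.3 or later. The helper name has_free_space is hypothetical.

```python
import shutil

# About 2 GB according to the documentation; the safety factor is an assumption.
DOWNLOAD_BYTES = 2 * 1024 ** 3
SAFETY_FACTOR = 1.5


def has_free_space(path, needed_bytes):
    """Return True if the file system holding path has at least needed_bytes free."""
    return shutil.disk_usage(path).free >= needed_bytes


if __name__ == "__main__":
    needed = int(DOWNLOAD_BYTES * SAFETY_FACTOR)
    if has_free_space(".", needed):
        print("Enough free space for the download.")
    else:
        print("Warning: less than %.1f GB free here." % (needed / 1024.0 ** 3))
```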
The bayespi_bar2_pipeline.py script is the starting point for users wishing to use BayesPI-BAR2 to process their own datasets. The instructions for customizing the default pipeline can be found here.
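This section does not state which input format the pipeline expects for your own mutations; the customization instructions are the place to check that. If you do prepare mutation intervals for bedtools, keep in mind that BED intervals are 0-based and half-open, whereas VCF positions are 1-based. The hypothetical helper below shows that conversion for a single variant; the "REF>ALT" name column is an illustrative choice.

```python
def vcf_to_bed_line(chrom, pos, ref, alt):
    """Convert one VCF-style variant (1-based POS) to a tab-separated BED line.

    BED intervals are 0-based and half-open, so a variant that starts at
    1-based position pos and spans len(ref) bases becomes
    [pos - 1, pos - 1 + len(ref)). The 'REF>ALT' name column is hypothetical.
    """
    start = pos - 1
    end = start + len(ref)
    return "\t".join([chrom, str(start), str(end), ref + ">" + alt])


if __name__ == "__main__":
    print(vcf_to_bed_line("chr1", 1000, "C", "T"))  # chr1  999  1000  C>T (tab-separated)
```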