Hi-C File Converter

Hi-C data can come in many formats, including .hic, .h5, and .cool. We are building a library of code to convert these formats to a .hic file compatible with the 4D-Genome Browser. The git repository, hic-converter, is a collection of these tools and useful commands. This repository also holds Hi-C data used in preprocessing and for setup of the 4D-Genome Browser Project. The following sections describe how to set up a Python computing environment for running these scripts and walk through example usage of the conversion tools.

Installation and setup

Start by cloning the hic-converter git repository to a local directory. For example:

## Clone the hic-converter git repo
git clone git@github.com:4DGB/hic-converter.git

## Make scripts within the tools directory of the hic-converter repo executable
chmod +x ./hic-converter/tools/*.sh ./hic-converter/tools/*.py

Generate a conda environment with an installation of HiCExplorer. The scripts also use the following Python libraries (os, argparse, gzip, and subprocess are part of the Python standard library):

  • os

  • argparse

  • numpy

  • pandas

  • gzip

  • subprocess

Below is an example conda create command for installing and setting up the needed Python environment:

conda create -n hicexplorerenv hicexplorer -c bioconda -c conda-forge
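
Once the environment is active, a quick import check can confirm that the libraries listed above are available. The snippet below is a minimal sketch; the helper name `missing_modules` is introduced here for illustration and is not part of hic-converter.

```python
import importlib.util

# Libraries listed above. os, argparse, gzip, and subprocess ship with Python;
# numpy and pandas are third-party packages.
REQUIRED = ["os", "argparse", "numpy", "pandas", "gzip", "subprocess"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Run it with the environment activated (conda activate hicexplorerenv) so that the check reflects the interpreter the conversion scripts will use.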

Dependencies

The two largest dependencies for these conversion scripts are Python and Juicer Tools. The scripts here were developed using Juicer version 1.22.01. The jar file of these tools is also stored in this repository.
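
Because the Juicer tools are distributed as a jar file, a Java runtime has to be available to run them. The following is a small sketch for checking this from Python; the function name `java_on_path` is hypothetical, and the check only looks for a `java` executable on PATH without verifying its version.

```python
import shutil


def java_on_path():
    """Return the path to the `java` executable, or None if it is not on PATH."""
    return shutil.which("java")


if __name__ == "__main__":
    location = java_on_path()
    if location is None:
        print("java not found on PATH")
    else:
        print("java found at", location)
```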

Example usage

Converting contact matrix from HiCExplorer as an .h5 file to a .hic contact matrix

## Make a project directory (optional) and cd into it
mkdir myproject
cd myproject

## Clone the hic-converter git repo
git clone git@github.com:4DGB/hic-converter.git
## This will generate a folder named "hic-converter"

## Make scripts within the tools directory of the hic-converter repo executable
chmod +x ./hic-converter/tools/*.sh ./hic-converter/tools/*.py

## Activate hic-explorer environment
conda activate hicexplorerenv

## Change directory to tools
cd ./hic-converter/tools

## Convert the .h5 file to a .hic-formatted file for Mus musculus chromosome 13
./h5.to.hic.sh -m ../data/h5/SRR1956527_chr13.h5 -g ../data/sizes/mm10.chr13.size.bed -o ../data/hic/SRR1956527_chr13.200kb.hic

Generating a .hic file from a .summary.txt.gz file

## Change directory to tools
cd ./hic-converter/tools

## Convert a summary.txt.gz file to an .hic file for a single chromosome
./summary.to.chrom.hic.py -i ../data/summary/GSM2667262_WT1.HiC.rep1.mus.chr13.summary.txt.gz -g mm9 -c chr13 -O ../data/hic/GSM2667262_WT1.HiC.rep1.mus.chr13.200kb.hic

Converting a Juicer merged_nodups (long format) file for chromosome 22 to .hic

## Activate hic-explorer environment
conda activate hicexplorerenv

## Change directory to tools
cd ./hic-converter/tools

## Call our "long to chrom" .hic conversion script
./long.to.chrom.hic.py -i ../data/long/merged_nodups.chr22.subsampled.txt.gz -g ../data/sizes/GRCh38.chr22.size.bed -c chr22 -O ../data/hic/chr22.10kb.hic -R 10000
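
To get a feel for what the long-format input holds before converting it, the contacts can be binned directly in Python. This is a sketch under stated assumptions, not the method used by long.to.chrom.hic.py: it assumes Juicer's long merged_nodups layout (whitespace-separated fields with chr1 and pos1 in columns 2 and 3, and chr2 and pos2 in columns 6 and 7), and it assumes that the value passed to -R is a bin size in base pairs, as the 10kb output name suggests. The function name `bin_contacts` is hypothetical.

```python
import gzip
from collections import Counter


def bin_contacts(path, chrom, resolution=10000):
    """Count contacts within `chrom` per pair of bins of `resolution` bp.

    Assumes chr1/pos1 are fields 2 and 3 and chr2/pos2 are fields 6 and 7
    (1-based) of each whitespace-separated line.
    """
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 7:
                continue  # skip blank or truncated lines
            chr1, pos1 = fields[1], int(fields[2])
            chr2, pos2 = fields[5], int(fields[6])
            if chr1 != chrom or chr2 != chrom:
                continue  # keep only contacts within the target chromosome
            # Integer division maps each position to the start of its bin.
            bin1 = (pos1 // resolution) * resolution
            bin2 = (pos2 // resolution) * resolution
            counts[tuple(sorted((bin1, bin2)))] += 1
    return counts


if __name__ == "__main__":
    contacts = bin_contacts(
        "../data/long/merged_nodups.chr22.subsampled.txt.gz", "chr22", 10000
    )
    print(len(contacts), "bin pairs with at least one contact")
```

Whether chromosome names carry a "chr" prefix depends on the reference used for alignment, so the `chrom` argument may need adjusting for other data sets.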

A note on size files

The file ‘GRCh38.chr22.size.bed’ used above is a simple, two-column, tab-delimited text file with no header or column names. It contains the sequence name of the chromosome targeted for conversion (here, chr22) and the length of that chromosome in base pairs (bp), for example 50,818,468. Sizes for human chromosomes and for chromosomes of other species can be found on the UCSC Genome Browser or ENCODE project websites. Users should expect to make their own size files for the conversion tools.
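
A size file in the layout described above can be written with a few lines of Python. The sketch below is illustrative only: the helper names `write_size_file` and `read_size_file` are not part of hic-converter, and it assumes the length is stored as a plain integer without thousands separators.

```python
import csv


def write_size_file(path, sizes):
    """Write (sequence name, length in bp) pairs as headerless, tab-delimited rows."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for name, length in sizes:
            writer.writerow([name, int(length)])


def read_size_file(path):
    """Read a size file back into a {sequence name: length} dictionary."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        return {row[0]: int(row[1]) for row in reader if row}


if __name__ == "__main__":
    # chr22 length taken from the example in the text above.
    write_size_file("GRCh38.chr22.size.bed", [("chr22", 50818468)])
    print(read_size_file("GRCh38.chr22.size.bed"))
```

Reading the file back is a cheap way to confirm that it has exactly two columns and that the length parses as an integer before passing it to the conversion scripts.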