A549 Cells
==========

The data used as an example here and throughout the 4D-Genome Browser project were generated for the `ENCODE project <https://www.encodeproject.org/>`_ by the Reddy lab at Duke University.
They include paired-end Hi-C sequencing libraries from A549 cells, a lung cancer cell line, treated with 100 nM dexamethasone.
Specifically, two libraries were used to generate Hi-C contact maps via HiCExplorer: the first isogenic replicate at the initial time point (ENCLB571GEP), with paired files ENCFF039FYU and ENCFF479RSE, and the first isogenic replicate at the twelve-hour time point, treated with 100 nM dexamethasone (ENCLB870JCZ), with paired files ENCFF668EDF and ENCFF398SQH.
As an example, the instructions for constructing an ``.h5`` file of the twelve-hour time point for chromosome 22 are included below.

File gathering and preprocessing
--------------------------------

.. code-block::

   ## Set the working directory
   cd ./hic-converter/data

   ## Activate hic-explorer environment
   conda activate hicexplorerenv

   ## Make directories for analysis (h5 and hic hold the outputs written in later steps).
   mkdir -p GRCh38 fastq bam bwa h5 hic

   ## Download the two paired fastq.gz files for the first replicate.
   wget https://www.encodeproject.org/files/ENCFF668EDF/@@download/ENCFF668EDF.fastq.gz -O ./fastq/ENCFF668EDF_R1_001.fastq.gz
   wget https://www.encodeproject.org/files/ENCFF398SQH/@@download/ENCFF398SQH.fastq.gz -O ./fastq/ENCFF398SQH_R2_001.fastq.gz

   ## Download the human reference genome from the ENCODE project.
   wget https://www.encodeproject.org/files/GRCh38_no_alt_analysis_set_GCA_000001405.15/@@download/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz -O ./GRCh38/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz

   ## Unzip the genome.
   gunzip ./GRCh38/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz

Restriction site marking and indexing
-------------------------------------

Prior to analysis, we need to index the reference genome downloaded above and then search it for MboI restriction cut sites.
For this we use ``bwa index`` from the ``bwa`` suite of tools and ``hicFindRestSite`` from HiCExplorer.

.. code-block::

   ## Index the human genome (this can take a while, but it only needs to be done once).
   bwa index -p bwa/GRCh38_index ./GRCh38/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta

   ## Find the MboI restriction sites (GATC) in the GRCh38 human reference genome.
   hicFindRestSite --fasta ./GRCh38/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta -o ./GRCh38/GRCh38_MboI_cut_sites.bed --searchPattern GATC

Building a contact map for chromosome 22
----------------------------------------

Below are the example ``bwa mem`` and ``hicBuildMatrix`` commands used to align the sequenced reads and build a Hi-C contact map in ``.h5`` format.
Each mate file is aligned separately, and the two resulting ``.bam`` files are passed together to ``hicBuildMatrix``.

.. code-block::

   ## Align data to the human reference genome (-h keeps the header alongside the alignments).
   bwa mem -A 1 -B 4 -E 50 -L 0 -t 14 bwa/GRCh38_index ./fastq/ENCFF668EDF_R1_001.fastq.gz | samtools view -Shb - > ./bam/ENCFF668EDF_R1_001.bam
   bwa mem -A 1 -B 4 -E 50 -L 0 -t 14 bwa/GRCh38_index ./fastq/ENCFF398SQH_R2_001.fastq.gz | samtools view -Shb - > ./bam/ENCFF398SQH_R2_001.bam

   ## Build .h5 matrix for chromosome 22.
   hicBuildMatrix --samFiles ./bam/ENCFF668EDF_R1_001.bam ./bam/ENCFF398SQH_R2_001.bam \
                  --region chr22 --binSize 200000 \
                  --threads 4 --inputBufferSize 200000 \
                  --restrictionSequence GATC --danglingSequence GATC --restrictionCutFile ./GRCh38/GRCh38_MboI_cut_sites.bed \
                  --QCfolder ENCLB870JCZ.chr22.12hr --outFileName ./h5/ENCLB870JCZ.chr22.200kb.12hr.h5

Converting the ``.h5`` contact map to ``.hic``
----------------------------------------------

Currently the 4D-Genome Workflow is only compatible with ``.hic`` files.
A ``.h5`` file generated as shown above therefore needs to be converted before use.
An example of this file conversion is shown below.
The scripts required for this conversion, including ``h5.to.hic.sh``, are in the ``tools`` directory of ``hic-converter``.

.. code-block::

   ## Activate hic-explorer environment
   conda activate hicexplorerenv

   ## Change directory to tools
   cd ./hic-converter/tools

   ## Convert the .h5 file to an .hic file for a single chromosome
   ./h5.to.hic.sh -m ../data/h5/ENCLB870JCZ.chr22.200kb.12hr.h5 -g ../data/sizes/GRCh38.chr22.size.bed -o ../data/hic/ENCLB870JCZ.chr22.200kb.12hr.hic
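
As a quick sanity check on the chromosome 22 contact map, the number of bins expected at a given ``--binSize`` can be worked out from the chromosome length.
The sketch below is not part of the original workflow: ``expected_bins`` is a hypothetical helper, and the chr22 length of 50,818,468 bp is the GRCh38 value, which should be confirmed against your own reference (for example, in its FASTA index) before relying on it.

.. code-block::

   ## Hypothetical helper: expected number of bins for one chromosome.
   ## Usage: expected_bins <chromosome_length_bp> <bin_size_bp>
   expected_bins() {
       local length=$1 bin_size=$2
       ## Integer ceiling division: the final, partial bin still counts.
       echo $(( (length + bin_size - 1) / bin_size ))
   }

   ## Assumed chr22 length in GRCh38; 200000 matches --binSize used above.
   expected_bins 50818468 200000

For these values the function prints ``255``, so a 200 kb matrix restricted to chr22 would be expected to have about that many rows and columns, assuming the tools bin the chromosome from position zero in fixed-width steps.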