Generate, bin, filter, and correct Hic objects

After we have obtained a filtered ReadPairs object, we can easily convert it into a Hic matrix.

hic_folder = mkdir(os.path.join(output_folder, 'hic'))
hic_file = os.path.join(hic_folder, 'example.hic')
hic = pairs.to_hic(file_name=hic_file)

Note that this only uses valid pairs for Hi-C object creation and creates fragment-level Hi-C matrix, using the same fragment definitions as in the original ReadPairs object.


If you have multiple Hi-C objects for the same sample, for example from different replicates or sequencing runs, we recommend merging them at this stage:

from fanc.hic import Hic
merged_hic = Hic.merge([hic_rep1, hic_rep2]])

The fragment-level Hi-C objects can very easily be binned:

binned_hic = hic.bin(1000000,
                     file_name=os.path.join(hic_folder, 'binned_1mb.hic'),

Just like ReadPairs, Hic can also be filtered. Currently, the two available filters are LowCoverageFilter and DiagonalFilter:

from fanc.hic import LowCoverageFilter
lc_filter = LowCoverageFilter(binned_hic, rel_cutoff=0.2)
  • LowCoverageFilter removes all Hi-C “edges” (pixels) in regions that have low coverage. The cutoff can either be expressed in absolute numbers of pairs (cutoff) or as a fraction of the median coverage of all regions (rel_cutoff). We highly recommend filtering for low coverage before matrix balancing
  • DiagonalFilter removes all edges at the diagonal. Some researcher have achieved better normalisation results by setting the matrix diagonal to 0 before normalisation.

Finally, we can normalise the matrix using matrix balancing. You have the choice of either Knight-Ruiz matrix balancing (KR, kr_balancing) or iterative correction (ICE, ice_balancing). KR balancing is typically faster, but also consumes a lot more memory, especially for high matrix resolutions. ICE is slow, and might not always converge to a solution, but is much more memory efficient in its implementation.

from fanc.hic import kr_balancing
kr_balancing(binned_hic, whole_matrix=False,

With KR balancing, you can choose to restore the original valid pairs count using restore_coverage=True. If this is set to False, edge weights represent ligation probabilities rather than counts. If you set whole_matrix=True, the balancing will be done on the whole genome matrix. The default is to do it on a per-chromosome basis.

The bias vector will be stored in the matrix, and is automatically applied when querying the object.