Generate, bin, filter, and correct Hic objects¶
hic_folder = mkdir(os.path.join(output_folder, 'hic')) hic_file = os.path.join(hic_folder, 'example.hic') hic = pairs.to_hic(file_name=hic_file)
Note that this only uses valid pairs for Hi-C object creation and creates fragment-level
Hi-C matrix, using the same fragment definitions as in the original
If you have multiple Hi-C objects for the same sample, for example from different replicates or sequencing runs, we recommend merging them at this stage:
from fanc.hic import Hic merged_hic = Hic.merge([hic_rep1, hic_rep2]])
The fragment-level Hi-C objects can very easily be binned:
binned_hic = hic.bin(1000000, file_name=os.path.join(hic_folder, 'binned_1mb.hic'), threads=4)
from fanc.hic import LowCoverageFilter lc_filter = LowCoverageFilter(binned_hic, rel_cutoff=0.2) binned_hic.filter(lc_filter) binned_hic.run_queued_filters()
LowCoverageFilterremoves all Hi-C “edges” (pixels) in regions that have low coverage. The cutoff can either be expressed in absolute numbers of pairs (
cutoff) or as a fraction of the median coverage of all regions (
rel_cutoff). We highly recommend filtering for low coverage before matrix balancing
DiagonalFilterremoves all edges at the diagonal. Some researcher have achieved better normalisation results by setting the matrix diagonal to 0 before normalisation.
Finally, we can normalise the matrix using matrix balancing. You have the choice of either
Knight-Ruiz matrix balancing (KR,
kr_balancing) or iterative correction (ICE,
ice_balancing). KR balancing is typically faster, but also consumes a lot
more memory, especially for high matrix resolutions. ICE is slow, and might not always converge
to a solution, but is much more memory efficient in its implementation.
from fanc.hic import kr_balancing kr_balancing(binned_hic, whole_matrix=False, restore_coverage=False)
With KR balancing, you can choose to restore the original valid pairs count using
restore_coverage=True. If this is set to
False, edge weights represent ligation
probabilities rather than counts. If you set
whole_matrix=True, the balancing will be done
on the whole genome matrix. The default is to do it on a per-chromosome basis.
The bias vector will be stored in the matrix, and is automatically applied when querying the object.