Expected and O/E calculations
The following steps assume that you ran the
fanc auto command in Example analysis.
Additionally, we set up the Python session like this:
import fanc
import matplotlib.pyplot as plt
import fanc.plotting as fancplot

hic_500kb = fanc.load("output/hic/binned/fanc_example_500kb.hic")
If you want to try the tutorial with an equivalent Cooler file, load the Hi-C file like this instead:
hic_500kb = fanc.load("architecture/other-hic/fanc_example.mcool@500kb")
or like this if you want to work with a Juicer file built from the same data:
hic_500kb = fanc.load("architecture/other-hic/fanc_example.juicer.hic@500kb")
Note that there may be minor differences in the results due to the “zooming” and balancing applied by the different tools.
RegionMatrixContainer objects have a built-in method, expected_values(),
that calculates expected values from existing matrix data. It returns three
values: the intra-chromosomal expected values, the intra-chromosomal expected
values per chromosome, and the inter-chromosomal expected value:
intra_expected, intra_expected_chromosome, inter_expected = hic_500kb.expected_values()
intra_expected is a list of average (expected) contact values, where the index of
each value corresponds to the separation between genomic regions, measured in bins.
intra_expected_chromosome is a dictionary with chromosome names as keys; each value
is an expected value list calculated for that chromosome only.
inter_expected is a single value, the average
inter-chromosomal contact value.
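To make the "index equals bin separation" convention concrete, here is a minimal sketch that averages the diagonals of a plain NumPy matrix. It is not FAN-C's implementation, which may handle unmappable bins and normalisation differently; the function name expected_by_distance and the toy matrix are hypothetical and only serve as illustration.

```python
import numpy as np


def expected_by_distance(matrix):
    """Average each diagonal of a square contact matrix.

    Entry d of the result is the mean contact value of all bin pairs
    separated by d bins (d = 0 is the main diagonal).
    """
    matrix = np.asarray(matrix, dtype=float)
    n = matrix.shape[0]
    return [float(np.diagonal(matrix, offset=d).mean()) for d in range(n)]


# Toy symmetric 4x4 "chromosome" matrix with made-up values
toy = np.array([
    [10, 4, 2, 1],
    [4, 10, 4, 2],
    [2, 4, 10, 4],
    [1, 2, 4, 10],
])

# toy_expected[d] is the average contact value at a separation of d bins
toy_expected = expected_by_distance(toy)
```

In this toy case every diagonal holds a single repeated value, so the expected list simply reads off the first row of the matrix.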
The expected values are typically plotted on a log-log scale, as illustrated here using chromosome 19:
# obtain bin distances
bin_size = hic_500kb.bin_size
distance = list(range(0, bin_size * len(intra_expected_chromosome['chr19']), bin_size))

# plot expected values
fig, ax = plt.subplots()
ax.plot(distance, intra_expected_chromosome['chr19'])
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel("Distance")
ax.set_ylabel("Average contacts")
plt.show()
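The distance axis above starts at 0 (the matrix diagonal), which a logarithmic axis cannot represent. The following sketch shows one way to build the distances with NumPy and keep only strictly positive ones before plotting. The helper bin_distances and the stand-in values are hypothetical; with real data you would pass one of the lists returned by expected_values() and the object's bin_size.

```python
import numpy as np


def bin_distances(n_values, bin_size):
    """Genomic distance (in bp) for each index of an expected-value list."""
    return np.arange(n_values) * bin_size


# Hypothetical stand-in for one chromosome's expected value list
toy_expected = [8.0, 4.0, 2.0, 1.0]
toy_distance = bin_distances(len(toy_expected), 500_000)

# Keep only strictly positive distances for a log-log plot
positive = toy_distance > 0
log_x = toy_distance[positive]
log_y = np.asarray(toy_expected)[positive]
```

The arrays log_x and log_y can then be passed to ax.plot in place of distance and the expected value list.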
FAN-C also has a built-in function for plotting the expected values,
distance_decay_plot(). Additional named arguments
are passed on to
ax.plot, for example to change the line color.
The function returns a
matplotlib axes object, which can then be further customised:
ax = fancplot.distance_decay_plot(hic_500kb, chromosome='chr18', color='mediumturquoise')
To compare the expected values of multiple samples, just provide multiple Hic objects:
lowc_hindiii = fanc.load("architecture/other-hic-update/lowc_hindiii_100k_1mb.hic")
lowc_mboi = fanc.load("architecture/other-hic-update/lowc_mboi_100k_1mb.hic")
ax = fancplot.distance_decay_plot(lowc_hindiii, lowc_mboi, chromosome='chr1',
                                  labels=['HindIII', 'MboI'])
Note: as Hi-C matrices are normalised on a per-chromosome basis in FAN-C by default, it would be misleading to plot the overall normalised intra-chromosomal expected values, or to use them for downstream analysis. We can, however, also calculate the unnormalised expected values easily enough.
intra_expected_nonorm, intra_expected_chromosome_nonorm, inter_expected_nonorm = hic_500kb.expected_values(norm=False)

# obtain bin distances
bin_size = hic_500kb.bin_size
distance = list(range(0, bin_size * len(intra_expected_nonorm), bin_size))

# plot expected values
fig, ax = plt.subplots()
ax.plot(distance, intra_expected_nonorm)
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel("Distance")
ax.set_ylabel("Average contacts (not normalised)")
plt.show()
If you are only interested in plotting the unnormalised values, you can use:
ax = fancplot.distance_decay_plot(hic_500kb, norm=False)
You rarely need to calculate expected values explicitly: FAN-C analysis functions calculate (or retrieve) them on demand whenever necessary. To obtain observed/expected matrices, for example, please refer to RegionMatrixContainer.
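For intuition about what an observed/expected (O/E) matrix contains, the sketch below divides each entry of a toy intra-chromosomal matrix by the expected value at the corresponding bin separation. This is a plain NumPy illustration under simplifying assumptions, not FAN-C's API or implementation; the function observed_over_expected and the toy values are hypothetical.

```python
import numpy as np


def observed_over_expected(matrix, expected):
    """Divide each entry (i, j) by the expected value at distance |i - j|."""
    matrix = np.asarray(matrix, dtype=float)
    expected = np.asarray(expected, dtype=float)
    idx = np.arange(matrix.shape[0])

    # separation[i, j] is the distance between bins i and j, in bins
    separation = np.abs(idx[:, None] - idx[None, :])
    expected_matrix = expected[separation]

    # Entries with an expected value of 0 are left as NaN
    oe = np.full_like(matrix, np.nan)
    np.divide(matrix, expected_matrix, out=oe, where=expected_matrix > 0)
    return oe


# Toy symmetric matrix with made-up values and its per-distance averages
toy = np.array([
    [10, 6, 2],
    [6, 10, 2],
    [2, 2, 10],
])
toy_expected = [10.0, 4.0, 2.0]

toy_oe = observed_over_expected(toy, toy_expected)
```

Values above 1 in toy_oe mark bin pairs with more contacts than average for their separation, values below 1 mark pairs with fewer.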