To follow this tutorial, download the FAN-C example data, for example through our
Keeper library. Then run the
fanc auto command in Example analysis to generate all the
necessary files, and set up your Python session like this:
import fanc import fanc.plotting as fancplot import logging logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s") hic_100kb = fanc.load("output/hic/binned/fanc_example_100kb.hic") hic_rao = fanc.load("architecture/loops/rao2014.chr11_77400000_78600000.hic") boundaries = fanc.load("architecture/domains/fanc_example_100kb.insulation_boundaries_score_0.7_1mb.bed") tads = fanc.load("architecture/domains/gm12878_tads.bed") loops = fanc.load("architecture/loops/rao2014.chr11_77400000_78600000.loops_no_singlets.bedpe")
If you want to try the tutorial with an equivalent Cooler file, load the Hi-C file like this instead:
hic_100kb = fanc.load("architecture/other-hic/fanc_example.mcool@100kb")
or like this if you want to work with a Juicer file built from the same data:
hic_100kb = fanc.load("architecture/other-hic/fanc_example.juicer.hic@100kb")
Note that there may be minor differences in the results due to the “zooming” and balancing applied by the different tools.
Also have a look at the command line documentation at Hi-C aggregate analysis, which explains the different types of aggregate analyses with helpful illustrations.
Aggregate analyses can provide a general overview of the 3D genomic structure around a set of interesting genomic regions. This is achieved by aggregating, i.e. averaging, the matrices around each region to obtain a single, aggregated view of all regions at the same time.
FAN-C distinguished three kinds of aggregate analyses:
- Fixed-size regions along the matrix diagonal, for example a window of 500kb around each active TSS in the genome
- Variable-size regions along the matrix diagonal, for example TADs in the genome or genes from 5’ to 3’
- Region pairs, denoting a part of the matrix off the diagonal, for example loops in the Hi-C matrix
All of these can be computed using the
class, and dedicated functions for each analysis type.
Fixed-size analyses are the simplest aggregate analyses. All you need is a list of regions
of interest, for example insulating boundaries, and they can be computed using
from_center() like this:
fixed_am = fanc.AggregateMatrix.from_center(hic_100kb, boundaries.regions, window=5000000)
You can optionally supply a
file_name to save the aggregate matrix and all its components
window parameter is crucial here and sets the window around the center of each region
that should be extracted. If the window lies outside the chromosome boundaries, the region is
declared as “invalid” and not used for the aggregate matrix. Despite being named
it is possible to use another point in each region as relative window center with
You can set this to any of
three_prime and the window
will be centred on that part of each region. For example, use
five_prime to aggregate the
region around the TSS’s in a list of genes.
You can plot the aggregate matrix using the covenience function
ax = fancplot.aggregate_plot(fixed_am, vmin=-1, vmax=1, labels=['-2.5Mb', 'boundary', '+2.5Mb'])
In this case you can nicely see the boundary in the centre of the plot. By default, an aggregate
analysis always uses O/E transformed matrices, as otherwise the distance decay might bias the result
too much, particular for a chromosome-based normalisation strategy, as is the default in FAN-C.
You can switch this off using
The command for variable-size regions such as TADs is highly similar:
variable_am = fanc.AggregateMatrix.from_regions(hic_100kb, tads.regions)
By default, each region is extended by each own length on each side (
which is a useful preset for TADs so that the regions are clearly visible in the centre.
We can visualise the region using
ax = fancplot.aggregate_plot(variable_am, vmin=-0.5, vmax=0.5, relative_label_locations=[0.33, 0.66])
To generate an aggregate matrix from variable-size regions, the extracted matrices first have to be
extrapolated to the same size. This is done internally using
skimage.transform module. You can determine the type of interpolation using the
interpolation argument, which is an integer: 0: Nearest-neighbor (default), 1: Bi-linear,
2: Bi-quadratic, 3: Bi-cubic, 4: Bi-quartic, 5: Bi-quintic. You can also turn off
in case you feel that this is a source of bias.
For TADs or other regions along the diagonal specifically, you can also
rescale the aggregate
matrix, which applies an artificial exponential decay to the matrix, making it resemble a Hi-C
rescale_am = fanc.AggregateMatrix.from_regions(hic_100kb, tads.regions, rescale=True, log=False)
ax = fancplot.aggregate_plot(rescale_am, vmax=0.045, relative_label_locations=[0.33, 0.66], colormap='germany')
So far, we have only aggregated regions along the diagonal.
from_region_pairs() allows us to aggregate
arbitrary region pairs, albeit without extrapolation of differently-sized regions - only fixed
window sizes are supported. You can either supply a
window, as with fixed-size regions
along the diagonal, or a number of
pixels/bins, which is set to
16 by default.
Here is an example for loops:
loops_am = fanc.AggregateMatrix.from_center_pairs(hic_rao, loops.region_pairs())
ax = fancplot.aggregate_plot(loops_am, vmin=-0.5, vmax=0.5, relative_label_locations=[0.5], labels=['loop anchor'])
If you don’t just want to plot the aggregate matrix, but work with its values, you can access
it as a
numpy array via
m = fixed_am.matrix() type(m) # numpy.ndarray
All aggregate functions discussed above have a
keep_components parameter, which is
by default. This way, all the matrix that have been extracted from regions or region pairs are
stored inside the aggregate matrix object. You can access them via
for component_matrix in fixed_am.components(): shape = component_matrix.shape # 51, 51 # ...
These individual matrices can be very useful to debug a troublesome aggregate plot, because often
it is unusual outlier matrices that have a large effect on the plot. They are simply
arrays, which you can plot with
matplotlib, for example.