TADs and TAD boundaries¶
To follow this tutorial, download the FAN-C example data, for example through our
Keeper library. Then run the
fanc auto command in Example analysis to generate all the
necessary files, and set up your Python session like this:
import fanc import fanc.plotting as fancplot import logging logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s") hic_100kb = fanc.load("output/hic/binned/fanc_example_100kb.hic")
If you want to try the tutorial with an equivalent Cooler file, load the Hi-C file like this instead:
hic_100kb = fanc.load("architecture/other-hic/fanc_example.mcool@100kb")
or like this if you want to work with a Juicer file built from the same data:
hic_100kb = fanc.load("architecture/other-hic/fanc_example.juicer.hic@100kb")
Note that there may be minor differences in the results due to the “zooming” and balancing applied by the different tools.
Also have a look at the command line documentation at Hi-C domain analysis, which explains a lot of the principles behind insulation scores and directionality indexes with a couple of helpful illustrations.
insulation = fanc.InsulationScores.from_hic(hic_100kb, [5000000, 1000000, 1500000, 2000000, 2500000, 3000000, 3500000, 4000000], file_name="architecture/domains/fanc_example_100kb.insulation")
The file created in
file_name can later be easily loaded with
The resulting object stores all insulation score tracks for the
provided. It is derived from the
RegionsTable class, which is
RegionBased. There is a lot of convenient functionality
bundled in these objects, which you can read about in RegionBased.
You can access the calculated scores by using the
scores() function, which requires
the window size of the scores you want to retrieve, and returns a list of scores,
one for each genomic region in the object:
scores = insulation.scores(1000000)
However, a more flexible way to retrieve scores, which allows things like subsetting and
retrieving region information at the same time, is the
iterator. Scores are stored as a region attribute
for region in insulation.regions('chr18'): score = region.insulation_1000000 # do something with score # ...
If you prefer to have a dedicated object for a particular window size, you can use
score_regions(). The score is then
accessible via the
insulation_1mb = insulation.score_regions(1000000) for r in insulation_1mb.regions: score = r.score
Insulation calculation parameters¶
from_hic has several parameters that
allow you to customise the insulation score calculation. You can read everything them in the
linked API reference, and we will highlight the most important ones here.
impute_missing allows you to replace missing or masked pixels with their expected values
prior to the insulation score calculation. While this avoids dealing with missing pixels,
this can be misleading in case larger regions are unmappable or have been filtered out for
some other reason, so enable this setting with caution. The related parameter
na_threshold instead controls how much of a sliding window can be covered by missing
pixels before assigning it NaN by default. This is set to
0.5 by default - by increasing
this value you will obtain more robust scores, but also the number of NaNs will increase in
your results. Whatever value you choose, it is a tradeoff between information content and
accuracy. In most cases you should be okay with the default setting.
normalise, which is enabled by default, divides insulation scores by their chromosome
mean, so every score is expressed relative to the expected score on the same chromosome. By
disabling this option, you will get raw sums of weights in the sliding window. Normalisation
is stringly recommended, however. You can also normalise to a more local region instead of the
whole chromosome, which may account for local variations in contact intensity other than
insulation. For this purpose, set
normalisation_window to a number of bins, for example
300, which is then the number of scores surrounding each region used for normalisation by
After their computation and normalisation, insulation scores are log2-transformed. This
makes scores (roughly) symmetrical around 0. To disable this, for example if you have also
The original insulation score definition uses the arithmetic mean to normalise the scores.
If you expect insulation score outliers in your data, which might affect the normalisation by
the mean, you can use a trimmed arithmetic mean instead by setting
to a value between 0 and 1. The top fraction of insulation scores is then discarded for
calculating the mean.
If you intend to compare scores from different samples (and have the
log options enabled), a mathematically more accurate normalisation would be to use the
geometric mean instead. You can enable this by setting
Insulation score export¶
Insulation score plots¶
You can plot all calculated insulation scores using
p = fancplot.GenomicVectorArrayPlot(insulation, colormap='RdBu_r', vmin=-1, vmax=1, genomic_format=True) p.plot('chr18:18mb-28mb')
If you only want to plot a single score track, use
p = fancplot.LinePlot(insulation, ylim=(-1, 1), colors=['darkturquoise'], style="mid", attribute="insulation_1000000") p.plot('chr18:18mb-28mb')
You can read more about different plot types in Plotting (API).
boundaries = fanc.Boundaries.from_insulation_score(insulation, window_size=1000000)
This finds all minima in the insulation score vector and returns the corresponding regions
RegionsTable object, which is RegionBased. The
boundary strength is stored in the
score attribute of each region:
for boundary in boundaries.regions: score = boundary.score
By default, all minima are reported, including very weak and likely false-positive boundaries. We recommend plotting the boundaries and their strength alongside the Hi-C matrix and then decide on a score cutoff to only select a robust set of boundaries!
ph = fancplot.TriangularMatrixPlot(hic_100kb, max_dist=5000000, vmin=0, vmax=0.05) pb = fancplot.BarPlot(boundaries) f = fancplot.GenomicFigure([ph, pb]) fig, axes = f.plot('chr18:18mb-28mb')
You can filter out false-positives manually like this
min_score = 1.0 robust_boundaries =  for boundary in boundaries.regions: score = boundary.score if score >= min_score: robust_boundaries.append(boundary)
or by simply using the
robust_boundaries = fanc.Boundaries.from_insulation_score(insulation, window_size=1000000, min_score=1.0)
For a short explanation of the directionality index, please read Directionality Index.
The principle to compute the directionality index is highly similar to the insulation score.
The dedicated class
DirectionalityIndexes handles all
relevant actions. Use
compute it for multiple window sizes:
directionality = fanc.DirectionalityIndexes.from_hic(hic_100kb, [1000000, 1500000, 2000000, 2500000, 3000000, 3500000, 4000000], file_name="architecture/domains/fanc_example_100kb.di")
Plotting also works the same way, for all indexes:
p = fancplot.GenomicVectorArrayPlot(directionality, colormap='RdBu_r', vmin=-0.1, vmax=0.1, genomic_format=True) p.plot('chr18:18mb-28mb')
And for single a single directionality index:
p = fancplot.LinePlot(directionality, ylim=(-0.1, 0.1), colors=['darkturquoise'], style="mid", attribute="directionality_4000000") p.plot('chr18:18mb-28mb')