Peaks module¶

class
fanc.peaks.
DistancePeakFilter
(cutoff=1, mask=None)¶ Bases:
fanc.peaks.PeakFilter
Filter for peaks where regions are closer than this cutoff in bins.

valid
(row)¶ Map valid_peak to MaskFilter.valid(self, row).
Parameters: row – A pytables Table row. Returns: The boolean value returned by valid_edge.

valid_peak
(peak)¶ Evaluate whether a peak passes FDR cutoffs set in __init__ :param peak: An
Edge
object :return: True if peak passes internal FDR cutoffs, False otherwise


class
fanc.peaks.
EnrichmentPeakFilter
(enrichment_cutoff=None, enrichment_ll_cutoff=None, enrichment_h_cutoff=None, enrichment_v_cutoff=None, enrichment_d_cutoff=None, mask=None)¶ Bases:
fanc.peaks.PeakFilter
Filter peaks that do not have a sufficiently strong observed/expected ratio.

valid
(row)¶ Map valid_peak to MaskFilter.valid(self, row).
Parameters: row – A pytables Table row. Returns: The boolean value returned by valid_edge.

valid_peak
(peak)¶ Determine if a
RaoPeak
object is valid or should be filtered.When implementing custom PeakFilter this method must be overridden. It should return False for
RaoPeak
objects that are to be fitered and True otherwise.Internally, the
RaoPeakInfo
object will iterate over all RaoPeak instances to determine their validity on an individual basis.Parameters: peak – A RaoPeak
objectReturns: True if PeakFilter
is valid, False otherwise


class
fanc.peaks.
FdrPeakFilter
(mask=None, fdr_cutoff=None, fdr_ll_cutoff=None, fdr_v_cutoff=None, fdr_h_cutoff=None, fdr_d_cutoff=None)¶ Bases:
fanc.peaks.PeakFilter
Filter for peaks that do not pass a certain FDR cutoff.

valid
(row)¶ Map valid_peak to MaskFilter.valid(self, row).
Parameters: row – A pytables Table row. Returns: The boolean value returned by valid_edge.

valid_peak
(peak)¶ Evaluate whether a peak passes FDR cutoffs set in __init__ :param peak: An
Edge
object :return: True if peak passes interal FDR cutoffs, False otherwise


class
fanc.peaks.
FdrSumFilter
(cutoff=1.0, mask=None)¶ Bases:
fanc.peaks.PeakFilter
Remove peaks that have a qvalue sum > cutoff.

valid
(row)¶ Map valid_peak to MaskFilter.valid(self, row).
Parameters: row – A pytables Table row. Returns: The boolean value returned by valid_edge.

valid_peak
(peak)¶ Determine if a
RaoPeak
object is valid or should be filtered.When implementing custom PeakFilter this method must be overridden. It should return False for
RaoPeak
objects that are to be fitered and True otherwise.Internally, the
RaoPeakInfo
object will iterate over all RaoPeak instances to determine their validity on an individual basis.Parameters: peak – A RaoPeak
objectReturns: True if PeakFilter
is valid, False otherwise


class
fanc.peaks.
LazyPeak
(row, nodes_table, bin_size=1)¶ Bases:
fanc.matrix.LazyEdge
Container for a Peak/enriched contact in a HiC matrix.
This class implements
LazyPeak
, which provides lazy loading of attributes from a PyTables table row.

class
fanc.peaks.
MappabilityPeakFilter
(mask=None, mappability_cutoff=None, mappability_ll_cutoff=None, mappability_v_cutoff=None, mappability_h_cutoff=None, mappability_d_cutoff=None)¶ Bases:
fanc.peaks.PeakFilter
Filter for peaks that do not pass a certain FDR cutoff.

valid
(row)¶ Map valid_peak to MaskFilter.valid(self, row).
Parameters: row – A pytables Table row. Returns: The boolean value returned by valid_edge.

valid_peak
(peak)¶ Evaluate whether a peak passes FDR cutoffs set in __init__ :param peak: An
Edge
object :return: True if peak passes interal FDR cutoffs, False otherwise


class
fanc.peaks.
ObservedPeakFilter
(cutoff=1, mask=None)¶ Bases:
fanc.peaks.PeakFilter
Filter for peaks that do not pass a certain FDR cutoff.

valid
(row)¶ Map valid_peak to MaskFilter.valid(self, row).
Parameters: row – A pytables Table row. Returns: The boolean value returned by valid_edge.

valid_peak
(peak)¶ Evaluate whether a peak passes FDR cutoffs set in __init__ :param peak: An
Edge
object :return: True if peak passes interal FDR cutoffs, False otherwise


class
fanc.peaks.
Peak
(source, sink, *args, **kwargs)¶ Bases:
fanc.matrix.Edge
Container for a Peak/enriched contact in a HiC matrix.

class
fanc.peaks.
PeakFilter
(mask=None)¶ Bases:
fanc.general.MaskFilter
Abstract class that provides filtering functionality for the peaks in a
RaoPeakInfo
object.Extends MaskFilter and overrides valid(self, row) to make
RaoPeakInfo
filtering more “natural”.To create custom filters for the
RapPeakInfo
object, extend this class and override the valid_peak(self, peak) method. valid_peak should return False for a specificEdge
object if the object is supposed to be filtered/masked and True otherwise. SeeDiagonalFilter
for an example.Pass a custom filter to the
filter()
method inHic
to apply it.
valid
(row)¶ Map valid_peak to MaskFilter.valid(self, row).
Parameters: row – A pytables Table row. Returns: The boolean value returned by valid_edge.

valid_peak
(peak)¶ Determine if a
RaoPeak
object is valid or should be filtered.When implementing custom PeakFilter this method must be overridden. It should return False for
RaoPeak
objects that are to be fitered and True otherwise.Internally, the
RaoPeakInfo
object will iterate over all RaoPeak instances to determine their validity on an individual basis.Parameters: peak – A RaoPeak
objectReturns: True if PeakFilter
is valid, False otherwise


class
fanc.peaks.
PeakInfo
(file_name=None, mode='a', tmpdir=None, _table_name_regions='regions', _table_name_peaks='edges')¶ Bases:
fanc.matrix.RegionMatrixTable
Generalpurpose class for recording peaks in Hic (and similar) data.
A peak has the following information: source, sink: coordinates of the highest peak pixel in the HiC matrix observed: observed value of the peak in the HiC matrix, generally uncorrected expected: expected value of the peak at this position in the HiC matrix p_value: a Pvalue that reflects how likely it is to observe a peak with these properties at random x, y: coordinates of the peak centroid, if it is larger than one pixel radius: radius of the peak, expressed in bins (can be converted to base pairs)

class
ChromosomeDescription
¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.

class
MaskDescription
¶ Bases:
tables.description.IsDescription

class
RegionDescription
¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table

add_contact
(contact, *args, **kwargs)¶ Alias for
add_edge()
Parameters:  contact –
Edge
 args – Positional arguments passed to
_add_edge()
 kwargs – Keyword arguments passed to
_add_edge()
 contact –

add_contacts
(contacts, *args, **kwargs)¶ Alias for
add_edges()

add_edge
(edge, check_nodes_exist=True, *args, **kwargs)¶ Add an edge / contact between two regions to this object.
Parameters:  edge –
Edge
, dict with at least the attributes source and sink, optionally weight, or a list of length 2 (source, sink) or 3 (source, sink, weight).  check_nodes_exist – Make sure that there are nodes that match source and sink indexes
 args – Positional arguments passed to
_add_edge()
 kwargs – Keyword arguments passed to
_add_edge()
 edge –

add_edge_from_dict
(edge, *args, **kwargs)¶ Direct method to add an edge from dict input.
Parameters: edge – dict with at least the keys “source” and “sink”. Additional keys will be loaded as edge attributes

add_edge_from_edge
(edge, *args, **kwargs)¶ Direct method to add an edge from
Edge
input.Parameters: edge – Edge

add_edge_from_list
(edge, *args, **kwargs)¶ Direct method to add an edge from list or tuple input.
Parameters: edge – List or tuple. Should be of length 2 (source, sink) or 3 (source, sink, weight)

add_edge_simple
(source, sink, weight=None, *args, **kwargs)¶ Direct method to add an edge from
Edge
input.Parameters:  source – Source region index
 sink – Sink region index
 weight – Weight of the edge

add_edges
(edges, flush=True, *args, **kwargs)¶ Bulkadd edges from a list.
List items can be any of the supported edge types, list, tuple, dict, or
Edge
. Repeatedly callsadd_edge()
, so may be inefficient for large amounts of data.Parameters: edges – List (or iterator) of edges. See add_edge()
for details

add_mask_description
(name, description)¶ Add a mask description to the _mask table and return its ID.
Parameters:  name (str) – name of the mask
 description (str) – description of the mask
Returns: id of the mask
Return type: int

add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
Parameters: region – Can be a GenomicRegion
, a str in the form ‘<chromosome>:<start><end>[:<strand>], a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).

add_regions
(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
Parameters: regions – List (or any iterator) with objects that describe a genomic region. See add_region
for options.

static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
Parameters:  intervals – iterator of tuples (start, end, score)
 bins – Number of bins to divide the region into
 interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
Returns: iterator of tuples: (start, end, score)

static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
Parameters:  intervals – iterator of tuples (start, end, score)
 bin_size – Size of each bin in base pairs
 interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
Returns: iterator of tuples: (start, end, score)

bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
Returns: int

binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.Parameters:  region – String or class:~GenomicRegion object denoting the region to be binned
 bins – Number of bins to divide the region into
 bin_size – Size of each bin (alternative to bins argument)
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
 args – Arguments passed to _region_intervals
 kwargs – Keyword arguments passed to _region_intervals
Returns: iterator of
GenomicRegion
objects

bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
Parameters: bins – float, fraction of bins Returns: int, base pairs

chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
Returned list is rangecompatible, i.e. chromosome bins [0,5] cover chromosomes 1, 2, 3, and 4, not 5.

chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.

chromosomes
()¶ List all chromosomes in this regions table. :return: list of chromosome names.

close
(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in readonly mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
Parameters:  copy_tmp – If False, does not overwrite original with modified file.
 remove_tmp – If False, does not delete temporary copy of file.

distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
Parameters: distance – distance in base pairs Returns: float, distance as fraction of bin size

downsample
(n, file_name=None)¶ Sample edges from this object.
Sampling is always done on uncorrected HiC matrices.
Parameters:  n – Sample size or reference object. If n < 1 will be interpreted as a fraction of total reads in this object.
 file_name – Output file name for downsampled object.
Returns: RegionPairsTable

edge_data
(attribute, *args, **kwargs)¶ Iterate over specific edge attribute.
Parameters:  attribute – Name of the attribute, e.g. “weight”
 args – Positional arguments passed to
edges()
 kwargs – Keyword arguments passed to
edges()
Returns: iterator over edge attribute

edge_subset
(key=None, *args, **kwargs)¶ Get a subset of edges.
This is an alias for
edges()
.Returns: generator ( Edge
)

edges
¶ Iterate over contacts / edges.
edges()
is the central function ofRegionPairsContainer
. Here, we will use theHic
implementation for demonstration purposes, but the usage is exactly the same for all compatible objects implementingRegionPairsContainer
, includingJuicerHic
andCoolerHic
.import fanc # file from FANC examples hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")
We can easily find the number of edges in the sample
Hic
object:len(hic.edges) # 8695
When used in an iterator context,
edges()
iterates over all edges in theRegionPairsContainer
:for edge in hic.edges: # do something with edge print(edge) # 4242; bias: 5.797788472650082e05; sink_node: chr18:4200000143000000; source_node: chr18:4200000143000000; weight: 0.12291311562018173 # 2428; bias: 6.496381719803623e05; sink_node: chr18:2800000129000000; source_node: chr18:2400000125000000; weight: 0.025205961072838057 # 576; bias: 0.00010230955745211447; sink_node: chr18:7600000177000000; source_node: chr18:50000016000000; weight: 0.00961709840049876 # 6668; bias: 8.248432587969082e05; sink_node: chr18:6800000169000000; source_node: chr18:6600000167000000; weight: 0.03876763316345468 # ...
Calling
edges()
as a method has the same effect:# note the '()' for edge in hic.edges(): # do something with edge print(edge) # 4242; bias: 5.797788472650082e05; sink_node: chr18:4200000143000000; source_node: chr18:4200000143000000; weight: 0.12291311562018173 # 2428; bias: 6.496381719803623e05; sink_node: chr18:2800000129000000; source_node: chr18:2400000125000000; weight: 0.025205961072838057 # 576; bias: 0.00010230955745211447; sink_node: chr18:7600000177000000; source_node: chr18:50000016000000; weight: 0.00961709840049876 # 6668; bias: 8.248432587969082e05; sink_node: chr18:6800000169000000; source_node: chr18:6600000167000000; weight: 0.03876763316345468 # ...
Rather than iterate over all edges in the object, we can select only a subset. If the key is a string or a
GenomicRegion
, all nonzero edges connecting the region described by the key to any other region are returned. If the key is a tuple of strings orGenomicRegion
, only edges between the two regions are returned.# select all edges between chromosome 19 # and any other region: for edge in hic.edges("chr19"): print(edge) # 49106; bias: 0.00026372303696871666; sink_node: chr19:2700000128000000; source_node: chr18:4900000150000000; weight: 0.003692122517562033 # 682; bias: 0.00021923129703834945; sink_node: chr19:30000014000000; source_node: chr18:60000017000000; weight: 0.0008769251881533978 # 47107; bias: 0.00012820949175399097; sink_node: chr19:2800000129000000; source_node: chr18:4700000148000000; weight: 0.0015385139010478917 # 38112; bias: 0.0001493344481069762; sink_node: chr19:3300000134000000; source_node: chr18:3800000139000000; weight: 0.0005973377924279048 # ... # select all edges that are only on # chromosome 19 for edge in hic.edges(('chr19', 'chr19')): print(edge) # 90116; bias: 0.00021173151730025176; sink_node: chr19:3700000138000000; source_node: chr19:1100000112000000; weight: 0.009104455243910825 # 135135; bias: 0.00018003890596887822; sink_node: chr19:5600000157000000; source_node: chr19:5600000157000000; weight: 0.10028167062466517 # 123123; bias: 0.00011063368998965993; sink_node: chr19:4400000145000000; source_node: chr19:4400000145000000; weight: 0.1386240135570439 # 9293; bias: 0.00040851066434864896; sink_node: chr19:1400000115000000; source_node: chr19:1300000114000000; weight: 0.10090213409411629 # ... # select interchromosomal edges # between chromosomes 18 and 19 for edge in hic.edges(('chr18', 'chr19')): print(edge) # 49106; bias: 0.00026372303696871666; sink_node: chr19:2700000128000000; source_node: chr18:4900000150000000; weight: 0.003692122517562033 # 682; bias: 0.00021923129703834945; sink_node: chr19:30000014000000; source_node: chr18:60000017000000; weight: 0.0008769251881533978 # 47107; bias: 0.00012820949175399097; sink_node: chr19:2800000129000000; source_node: chr18:4700000148000000; weight: 0.0015385139010478917 # 38112; bias: 0.0001493344481069762; sink_node: chr19:3300000134000000; source_node: chr18:3800000139000000; weight: 0.0005973377924279048 # ...
By default,
edges()
will retrieve all edge attributes, which can be slow when iterating over a lot of edges. This is why all filebased FANCRegionPairsContainer
objects support lazy loading, where attributes are only read on demand.for edge in hic.edges('chr18', lazy=True): print(edge.source, edge.sink, edge.weight, edge) # 42 42 0.12291311562018173 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #0> # 24 28 0.025205961072838057 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #1> # 5 76 0.00961709840049876 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #2> # 66 68 0.03876763316345468 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #3> # ...
Warning
The lazy iterator reuses the
LazyEdge
object in every iteration, and overwrites theLazyEdge
attributes. Therefore do not use lazy iterators if you need to store edge objects for later access. For example, the following code works as expectedlist(hic.edges())
, with allEdge
objects stored in the list, while this codelist(hic.edges(lazy=True))
will result in a list of identicalLazyEdge
objects. Always ensure you do all edge processing in the loop when working with lazy iterators!When working with normalised contact frequencies, such as obtained through matrix balancing in the example above,
edges()
automatically returns normalised edge weights. In addition, thebias
attribute will (typically) have a value different from 1.When you are interested in the raw contact frequency, use the
norm=False
parameter:for edge in hic.edges('chr18', lazy=True, norm=False): print(edge.source, edge.sink, edge.weight) # 42 42 2120.0 # 24 28 388.0 # 5 76 94.0 # 66 68 470.0 # ...
You can also choose to omit all intra or interchromosomal edges using
intra_chromosomal=False
orinter_chromosomal=False
, respectively.Returns: Iterator over Edge
or equivalent.

edges_dict
(*args, **kwargs)¶ Edges iterator with access by bracket notation.
This iterator always returns unnormalised edges.
Returns: dict or dictlike iterator

expected_values
(selected_chromosome=None, norm=True, *args, **kwargs)¶ Calculate the expected values for genomic contacts at all distances.
This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.
It will return a tuple with three values: a list of genomewide intrachromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intrachromosomal expected values specific to each chromosome, and a float for interchromosomal expected value.
Parameters:  selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
 norm – If False, will calculate the expected values on the unnormalised matrix.
 args – Not used in this context
 kwargs – Not used in this context
Returns: list of intrachromosomal expected values, dict of intrachromosomal expected values by chromosome, interchromosomal expected value

expected_values_and_marginals
(selected_chromosome=None, norm=True, force=False, *args, **kwargs)¶ Calculate the expected values for genomic contacts at all distances and the whole matrix marginals.
This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.
It will return a tuple with three values: a list of genomewide intrachromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intrachromosomal expected values specific to each chromosome, and a float for interchromosomal expected value.
Parameters:  selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
 norm – If False, will calculate the expected values on the unnormalised matrix.
 args – Not used in this context
 kwargs – Not used in this context
Returns: list of intrachromosomal expected values, dict of intrachromosomal expected values by chromosome, interchromosomal expected value

filter
(edge_filter, queue=False, log_progress=True)¶ Filter edges in this object by using a
MaskFilter
.Parameters:  edge_filter – Class implementing
MaskFilter
.  queue – If True, filter will be queued and can be executed
along with other queued filters using
run_queued_filters()
 log_progress – If true, process iterating through all edges will be continuously reported.
 edge_filter – Class implementing

filter_rao
(queue=False)¶ Convenience function that applies a
RaoMergedPeakFilter
.Parameters: queue – If True, filter will be queued and can be executed along with other queued filters using run_queued_filters

find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
Parameters: query_regions – Region selector string, :class:~GenomicRegion, or list of the former Returns: index (or list of indexes) of the region at the center of the query region

flush
(silent=False, update_mappability=True)¶ Write data to file and flush buffers.
Parameters:  silent – do not print flush progress
 update_mappability – After writing data, update mappability and expected values

get_mask
(key)¶ Search _mask table for key and return Mask.
Parameters:  key (int) – search by mask name
 key – search by mask ID
Returns: Mask

get_masks
(ix)¶ Extract mask IDs encoded in parameter and return masks.
IDs are powers of 2, so a single int field in the table can hold multiple masks by simply adding up the IDs. Similar principle to UNIX chmod (although that uses base 8)
Parameters: ix (int) – integer that is the sum of powers of 2. Note that this value is not necessarily itself a power of 2. Returns: list of Masks extracted from ix Return type: list (Mask)

intervals
(*args, **kwargs)¶ Alias for region_intervals.

mappable
(region=None)¶ Get the mappability of regions in this object.
A “mappable” region has at least one contact to another region in the genome.
Returns: array
where True means mappable and False unmappable

marginals
(masked=True, *args, **kwargs)¶ Get the marginals vector of this Hic matrix.
Sums up all contacts for each bin of the HiC matrix. Unmappable regoins will be masked in the returned vector unless the
masked
parameter is set toFalse
.By default, corrected matrix entries are summed up. To get uncorrected matrix marginals use
norm=False
. Generally, all parameters accepted byedges()
are supported.Parameters:  masked – Use a numpy masked array to mask entries corresponding to unmappable regions
 kwargs – Keyword arguments passed to
edges()

matrix
(key=None, log=False, default_value=None, mask=True, log_base=2, *args, **kwargs)¶ Assemble a
RegionMatrix
from region pairs.Parameters:  key – Matrix selector. See
edges()
for all supported key types  log – If True, logtransform the matrix entries. Also see log_base
 log_base – Base of the log transformation. Default: 2; only used when log=True
 default_value – (optional) set the default value of matrix entries that have no associated edge/contact
 mask – If False, do not mask unmappable regions
 args – Positional arguments passed to
regions_and_matrix_entries()
 kwargs – Keyword arguments passed to
regions_and_matrix_entries()
Returns:  key – Matrix selector. See

classmethod
merge
(matrices, *args, **kwargs)¶ Merge multiple
RegionMatrixContainer
objects.Merging is done by adding the weight of edges in each object.
Parameters: matrices – list of RegionMatrixContainer
Returns: merged RegionMatrixContainer

possible_contacts
()¶ Calculate the possible number of contacts in the genome.
This calculates the number of potential region pairs in a genome for any possible separation distance, taking into account the existence of unmappable regions.
It will calculate one number for interchromosomal pairs, return a list with the number of possible pairs where the list index corresponds to the number of bins separating two regions, and a dictionary of lists for each chromosome.
Returns: possible intrachromosomal pairs, possible intrachromosomal pairs by chromosome, possible interchromosomal pairs

region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
Parameters: args – provide a GenomicRegion
here to get the slice of start and end bins of onlythis region. To get the slice over all regions leave this blank.Returns:

region_data
(key, value=None)¶ Retrieve or add vectordata to this object. If there is existing data in this object with the same name, it will be replaced
Parameters:  key – Name of the data column
 value – vector with regionbased data (one entry per region)

region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equallysized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
Parameters:  region – String or class:~GenomicRegion object denoting the region to be binned
 bins – Number of bins to divide the region into
 bin_size – Size of each bin (alternative to bins argument)
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
 args – Arguments passed to _region_intervals
 kwargs – Keyword arguments passed to _region_intervals
Returns: iterator of tuples: (start, end, score)

region_subset
(region, *args, **kwargs)¶ Takes a class:~GenomicRegion and returns all regions that overlap with the supplied region.
Parameters: region – String or class:~GenomicRegion object for which covered bins will be returned.

regions
¶ Iterate over genomic regions in this object.
Will return a
GenomicRegion
object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.Returns: RegionIter

regions_and_edges
(key, *args, **kwargs)¶ Convenient access to regions and edges selected by key.
Parameters:  key – Edge selector, see
edges()
 args – Positional arguments passed to
edges()
 kwargs – Keyword arguments passed to
edges()
Returns: list of row regions, list of col regions, iterator over edges
 key – Edge selector, see

regions_and_matrix_entries
(key=None, score_field=None, *args, **kwargs)¶ Convenient access to nonzero matrix entries and associated regions.
Parameters:  key – Edge key, see
edges()
 oe – If True, will divide observed values by their expected value at the given distance. False by default
 oe_per_chromosome – If True (default), will do a perchromosome O/E calculation rather than using the whole matrix to obtain expected values
 score_field – (optional) any edge attribute that returns a number
can be specified here for filling the matrix. Usually
this is defined by the
_default_score_field
attribute of the matrix class.  args – Positional arguments passed to
edges()
 kwargs – Keyword arguments passed to
edges()
Returns: list of row regions, list of col regions, iterator over (i, j, weight) tuples
 key – Edge key, see

regions_dict
¶ Return a dictionary with region index as keys and regions as values.
Returns: dict {region.ix: region, …}

static
regions_identical
(pairs)¶ Check if the regions in all objects in the list are identical.
Parameters: pairs – list
ofRegionBased
objectsReturns: True if chromosome, start, and end are identical between all regions in the same list positions.

run_queued_filters
(log_progress=True)¶ Run queued filters.
Parameters: log_progress – If true, process iterating through all edges will be continuously reported.

scaling_factor
(matrix, weight_column=None)¶ Compute the scaling factor to another matrix.
Calculates the ratio between the number of contacts in this Hic object to the number of contacts in another Hic object.
Parameters:  matrix – A
Hic
object  weight_column – Name of the column to calculate the scaling factor on
Returns: float
 matrix – A

subset
(*regions, **kwargs)¶ Subset a Hic object by specifying one or more subset regions.
Parameters:  regions – string or GenomicRegion object(s)
 kwargs – Supports
file_name: destination file name of subset Hic object;
tmpdir: if True works in tmp until object is closed
additional parameters are passed to
edges()
Returns: Hic

to_bed
(file_name, subset=None, **kwargs)¶ Export regions as BED file
Parameters:  file_name – Path of file to write regions to
 subset – optional
GenomicRegion
or str to write only regions overlapping this region  kwargs – Passed to
write_bed()

to_bigwig
(file_name, subset=None, **kwargs)¶ Export regions as BigWig file.
Parameters:  file_name – Path of file to write regions to
 subset – optional
GenomicRegion
or str to write only regions overlapping this region  kwargs – Passed to
write_bigwig()

to_gff
(file_name, subset=None, **kwargs)¶ Export regions as GFF file
Parameters:  file_name – Path of file to write regions to
 subset – optional
GenomicRegion
or str to write only regions overlapping this region  kwargs – Passed to
write_gff()

class

class
fanc.peaks.
RaoMergedPeakFilter
(cutoff=0.02, mask=None)¶ Bases:
fanc.peaks.PeakFilter
Filter merged peaks exactly the same way that Rao et al. (2014) do.
It removes peaks that are singlets and have a qvalue sum >.02.

valid
(row)¶ Map valid_peak to MaskFilter.valid(self, row).
Parameters: row – A pytables Table row. Returns: The boolean value returned by valid_edge.

valid_peak
(peak)¶ Determine if a
RaoPeak
object is valid or should be filtered.When implementing custom PeakFilter this method must be overridden. It should return False for
RaoPeak
objects that are to be fitered and True otherwise.Internally, the
RaoPeakInfo
object will iterate over all RaoPeak instances to determine their validity on an individual basis.Parameters: peak – A RaoPeak
objectReturns: True if PeakFilter
is valid, False otherwise


class
fanc.peaks.
RaoPeakCaller
(p=None, w_init=None, min_locus_dist=None, max_w=20, min_ll_reads=16, process_inter=False, correct_inter='fdr', n_processes=4, slice_size=2000, min_mappable_fraction=0.7, cluster=False)¶ Bases:
object
Class that calls peaks the same way Rao et al. (2014) propose.
Every pixel in a HiC matrix is evaluated based on its local “neighborhood”, i.e. the pixel’s observed value is compared to the expected value calculated from all pixels in its close surroundings.
If a pixel is significantly enriched with respect to all investigated neighborhoods, it is assumed to be a “peak”.
Four neighborhood types are calculated:
 “donut”: Pixels surrounding the investigated pixel in a certain distance range
 “lowerleft” Pixels to the “lowerleft” of a given pixel
 “horizontal”: Pixels left and right of a given pixel
 “vertical”: Pixels above and below a given pixel
While the first neighborhood type most generally calculates enrichment of local background, the other types of neighborhoods serve mostly to exclude falsepositive results from nonpeak structures, such as TAD boundaries.
The
RaoPeakCaller
is initialized with the peak calling parameters and run usingcall_peaks()
.FDRs for intrachromosomal peaks are automatically corrected for multiple testing using the “lamdachunking” methodology introduced in Rao et al. 2014. FDRs for interchromosomal peaks are corrected by default using the Benjamini Hochberg falsediscovery rate correction (but ‘bonferroni’ is also an option)

call_peaks
(hic, chromosome_pairs=None, file_name=None, intra_expected=None, inter_expected=None)¶ Call peaks in HiC matrix.
This method will determine each pixel’s likelihood to be a “true” peak. By default, only pixels with nonzero count and an observed/expected ratio > 1.0 for each neighborhood will be reported, because these can by definition not be true peaks.
The peak calling behavior can be influenced by modifying the object attributes set when initializing
RaoPeakCaller
.Parameters:  hic – A
Hic
object  chromosome_pairs – If None, all chromosome pairs will be investigated for peaks. Otherwise specify a list of chromosome name tuples (e.g. [(‘chr1’, ‘chr1’), (‘chr1’, ‘chr3’), …])
 file_name – An optional filename that backs the returned
RaoPeakInfo
object  intra_expected – A dict of the form <chromosome>:<list of expected values> to override expected value calculation
 inter_expected – A float describing the expected value for interchromosomal contact matrix entries
Returns: RaoPeakInfo
object hic – A

static
e_d
(m, i, j, e, w=1, p=0)¶ Compute the average value of pixels in the “donut” neighborhood of a pixel.

static
e_h
(m, i, j, e, w=1, p=0)¶ Compute the average value of pixels in the horizontal neighborhood of a pixel.

static
e_ll
(m, i, j, e, w=1, p=0)¶ Compute the average value of pixels in the lowerleft neighborhood of a pixel.

static
e_v
(m, i, j, e, w=1, p=0)¶ Compute the average value of pixels in the vertical neighborhood of a pixel.

static
find_chunk
(value, chunk_func=<function RaoPeakCaller.<lambda>>)¶ Use bisection to find a matching lambda chunk for a given expected value.

static
ll_sum
(m, i, j, w=1, p=0)¶ Compute the sum of pixels in the lowerleft neighborhood of a pixel.

class
fanc.peaks.
RaoPeakFilter
(mask=None)¶ Bases:
fanc.peaks.PeakFilter
Filter peaks exactly the same way that Rao et al. (2014) do.
It only retains peaks that
 are at least 2fold enriched over either the donut or lowerleft neighborhood
 are at least 1.5fold enriched over the horizontal and vertical neighborhoods
 are at least 1.75fold enriched over both the donut and lowerleft neighborhood
 have an FDR <= 0.1 in every neighborhood

valid
(row)¶ Map valid_peak to MaskFilter.valid(self, row).
Parameters: row – A pytables Table row. Returns: The boolean value returned by valid_edge.

valid_peak
(peak)¶ Determine if a
RaoPeak
object is valid or should be filtered.When implementing custom PeakFilter this method must be overridden. It should return False for
RaoPeak
objects that are to be fitered and True otherwise.Internally, the
RaoPeakInfo
object will iterate over all RaoPeak instances to determine their validity on an individual basis.Parameters: peak – A RaoPeak
objectReturns: True if PeakFilter
is valid, False otherwise

class
fanc.peaks.
RaoPeakInfo
(file_name=None, mode='a', tmpdir=None, _table_name_regions='regions', _table_name_peaks='edges')¶ Bases:
fanc.matrix.RegionMatrixTable
Information about peaks called by
RaoPeakCaller
.A peak has the following information:
source, sink: coordinates of the highest peak pixel in the HiC matrix observed: observed value of the peak in the HiC matrix, generally uncorrected e_ll: expected value of the peak given its lowerleft neighborhood e_h: expected value of the peak given its horizontal neighborhood e_v: expected value of the peak given its vertical neighborhood e_d: expected value of the peak given its surrounding (donut) neighborhood e_ll_chunk: “lambdachunk” this peak falls into given its ‘ll’ neighborhood e_h_chunk: “lambdachunk” this peak falls into given its ‘h’ neighborhood e_v_chunk: “lambdachunk” this peak falls into given its ‘v’ neighborhood e_d_chunk: “lambdachunk” this peak falls into given its ‘d’ neighborhood fdr_ll: FDR of the peak given its lowerleft neighborhood fdr_h: FDR of the peak given its horizontal neighborhood fdr_v: FDR of the peak given its vertical neighborhood fdr_d: FDR of the peak given its surrounding (donut) neighborhood
For more information about neighborhoods and peak information, see
RaoPeakCaller
.
class
ChromosomeDescription
¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.

class
MaskDescription
¶ Bases:
tables.description.IsDescription

class
RegionDescription
¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table

add_contact
(contact, *args, **kwargs)¶ Alias for
add_edge()
Parameters:  contact –
Edge
 args – Positional arguments passed to
_add_edge()
 kwargs – Keyword arguments passed to
_add_edge()
 contact –

add_contacts
(contacts, *args, **kwargs)¶ Alias for
add_edges()

add_edge
(edge, check_nodes_exist=True, *args, **kwargs)¶ Add an edge / contact between two regions to this object.
Parameters:  edge –
Edge
, dict with at least the attributes source and sink, optionally weight, or a list of length 2 (source, sink) or 3 (source, sink, weight).  check_nodes_exist – Make sure that there are nodes that match source and sink indexes
 args – Positional arguments passed to
_add_edge()
 kwargs – Keyword arguments passed to
_add_edge()
 edge –

add_edge_from_dict
(edge, *args, **kwargs)¶ Direct method to add an edge from dict input.
Parameters: edge – dict with at least the keys “source” and “sink”. Additional keys will be loaded as edge attributes

add_edge_from_edge
(edge, *args, **kwargs)¶ Direct method to add an edge from
Edge
input.Parameters: edge – Edge

add_edge_from_list
(edge, *args, **kwargs)¶ Direct method to add an edge from list or tuple input.
Parameters: edge – List or tuple. Should be of length 2 (source, sink) or 3 (source, sink, weight)

add_edge_simple
(source, sink, weight=None, *args, **kwargs)¶ Direct method to add an edge from
Edge
input.Parameters:  source – Source region index
 sink – Sink region index
 weight – Weight of the edge

add_edges
(edges, flush=True, *args, **kwargs)¶ Bulkadd edges from a list.
List items can be any of the supported edge types, list, tuple, dict, or
Edge
. Repeatedly callsadd_edge()
, so may be inefficient for large amounts of data.Parameters: edges – List (or iterator) of edges. See add_edge()
for details

add_mask_description
(name, description)¶ Add a mask description to the _mask table and return its ID.
Parameters:  name (str) – name of the mask
 description (str) – description of the mask
Returns: id of the mask
Return type: int

add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
Parameters: region – Can be a GenomicRegion
, a str in the form ‘<chromosome>:<start><end>[:<strand>], a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).

add_regions
(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
Parameters: regions – List (or any iterator) with objects that describe a genomic region. See add_region
for options.

static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
Parameters:  intervals – iterator of tuples (start, end, score)
 bins – Number of bins to divide the region into
 interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
Returns: iterator of tuples: (start, end, score)

static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
Parameters:  intervals – iterator of tuples (start, end, score)
 bin_size – Size of each bin in base pairs
 interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
Returns: iterator of tuples: (start, end, score)

bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
Returns: int

binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.Parameters:  region – String or class:~GenomicRegion object denoting the region to be binned
 bins – Number of bins to divide the region into
 bin_size – Size of each bin (alternative to bins argument)
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
 args – Arguments passed to _region_intervals
 kwargs – Keyword arguments passed to _region_intervals
Returns: iterator of
GenomicRegion
objects

bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
Parameters: bins – float, fraction of bins Returns: int, base pairs

chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
Returned list is rangecompatible, i.e. chromosome bins [0,5] cover chromosomes 1, 2, 3, and 4, not 5.

chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.

chromosomes
()¶ List all chromosomes in this regions table. :return: list of chromosome names.

close
(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in readonly mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
Parameters:  copy_tmp – If False, does not overwrite original with modified file.
 remove_tmp – If False, does not delete temporary copy of file.

distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
Parameters: distance – distance in base pairs Returns: float, distance as fraction of bin size

downsample
(n, file_name=None)¶ Sample edges from this object.
Sampling is always done on uncorrected HiC matrices.
Parameters:  n – Sample size or reference object. If n < 1 will be interpreted as a fraction of total reads in this object.
 file_name – Output file name for downsampled object.
Returns: RegionPairsTable

edge_data
(attribute, *args, **kwargs)¶ Iterate over specific edge attribute.
Parameters:  attribute – Name of the attribute, e.g. “weight”
 args – Positional arguments passed to
edges()
 kwargs – Keyword arguments passed to
edges()
Returns: iterator over edge attribute

edge_subset
(key=None, *args, **kwargs)¶ Get a subset of edges.
This is an alias for
edges()
.Returns: generator ( Edge
)

edges
¶ Iterate over contacts / edges.
edges()
is the central function ofRegionPairsContainer
. Here, we will use theHic
implementation for demonstration purposes, but the usage is exactly the same for all compatible objects implementingRegionPairsContainer
, includingJuicerHic
andCoolerHic
.import fanc # file from FANC examples hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")
We can easily find the number of edges in the sample
Hic
object:len(hic.edges) # 8695
When used in an iterator context,
edges()
iterates over all edges in theRegionPairsContainer
:for edge in hic.edges: # do something with edge print(edge) # 4242; bias: 5.797788472650082e05; sink_node: chr18:4200000143000000; source_node: chr18:4200000143000000; weight: 0.12291311562018173 # 2428; bias: 6.496381719803623e05; sink_node: chr18:2800000129000000; source_node: chr18:2400000125000000; weight: 0.025205961072838057 # 576; bias: 0.00010230955745211447; sink_node: chr18:7600000177000000; source_node: chr18:50000016000000; weight: 0.00961709840049876 # 6668; bias: 8.248432587969082e05; sink_node: chr18:6800000169000000; source_node: chr18:6600000167000000; weight: 0.03876763316345468 # ...
Calling
edges()
as a method has the same effect:# note the '()' for edge in hic.edges(): # do something with edge print(edge) # 4242; bias: 5.797788472650082e05; sink_node: chr18:4200000143000000; source_node: chr18:4200000143000000; weight: 0.12291311562018173 # 2428; bias: 6.496381719803623e05; sink_node: chr18:2800000129000000; source_node: chr18:2400000125000000; weight: 0.025205961072838057 # 576; bias: 0.00010230955745211447; sink_node: chr18:7600000177000000; source_node: chr18:50000016000000; weight: 0.00961709840049876 # 6668; bias: 8.248432587969082e05; sink_node: chr18:6800000169000000; source_node: chr18:6600000167000000; weight: 0.03876763316345468 # ...
Rather than iterate over all edges in the object, we can select only a subset. If the key is a string or a
GenomicRegion
, all nonzero edges connecting the region described by the key to any other region are returned. If the key is a tuple of strings orGenomicRegion
, only edges between the two regions are returned.# select all edges between chromosome 19 # and any other region: for edge in hic.edges("chr19"): print(edge) # 49106; bias: 0.00026372303696871666; sink_node: chr19:2700000128000000; source_node: chr18:4900000150000000; weight: 0.003692122517562033 # 682; bias: 0.00021923129703834945; sink_node: chr19:30000014000000; source_node: chr18:60000017000000; weight: 0.0008769251881533978 # 47107; bias: 0.00012820949175399097; sink_node: chr19:2800000129000000; source_node: chr18:4700000148000000; weight: 0.0015385139010478917 # 38112; bias: 0.0001493344481069762; sink_node: chr19:3300000134000000; source_node: chr18:3800000139000000; weight: 0.0005973377924279048 # ... # select all edges that are only on # chromosome 19 for edge in hic.edges(('chr19', 'chr19')): print(edge) # 90116; bias: 0.00021173151730025176; sink_node: chr19:3700000138000000; source_node: chr19:1100000112000000; weight: 0.009104455243910825 # 135135; bias: 0.00018003890596887822; sink_node: chr19:5600000157000000; source_node: chr19:5600000157000000; weight: 0.10028167062466517 # 123123; bias: 0.00011063368998965993; sink_node: chr19:4400000145000000; source_node: chr19:4400000145000000; weight: 0.1386240135570439 # 9293; bias: 0.00040851066434864896; sink_node: chr19:1400000115000000; source_node: chr19:1300000114000000; weight: 0.10090213409411629 # ... # select interchromosomal edges # between chromosomes 18 and 19 for edge in hic.edges(('chr18', 'chr19')): print(edge) # 49106; bias: 0.00026372303696871666; sink_node: chr19:2700000128000000; source_node: chr18:4900000150000000; weight: 0.003692122517562033 # 682; bias: 0.00021923129703834945; sink_node: chr19:30000014000000; source_node: chr18:60000017000000; weight: 0.0008769251881533978 # 47107; bias: 0.00012820949175399097; sink_node: chr19:2800000129000000; source_node: chr18:4700000148000000; weight: 0.0015385139010478917 # 38112; bias: 0.0001493344481069762; sink_node: chr19:3300000134000000; source_node: chr18:3800000139000000; weight: 0.0005973377924279048 # ...
By default,
edges()
will retrieve all edge attributes, which can be slow when iterating over a lot of edges. This is why all filebased FANCRegionPairsContainer
objects support lazy loading, where attributes are only read on demand.for edge in hic.edges('chr18', lazy=True): print(edge.source, edge.sink, edge.weight, edge) # 42 42 0.12291311562018173 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #0> # 24 28 0.025205961072838057 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #1> # 5 76 0.00961709840049876 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #2> # 66 68 0.03876763316345468 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #3> # ...
Warning
The lazy iterator reuses the
LazyEdge
object in every iteration, and overwrites theLazyEdge
attributes. Therefore do not use lazy iterators if you need to store edge objects for later access. For example, the following code works as expectedlist(hic.edges())
, with allEdge
objects stored in the list, while this codelist(hic.edges(lazy=True))
will result in a list of identicalLazyEdge
objects. Always ensure you do all edge processing in the loop when working with lazy iterators!When working with normalised contact frequencies, such as obtained through matrix balancing in the example above,
edges()
automatically returns normalised edge weights. In addition, thebias
attribute will (typically) have a value different from 1.When you are interested in the raw contact frequency, use the
norm=False
parameter:for edge in hic.edges('chr18', lazy=True, norm=False): print(edge.source, edge.sink, edge.weight) # 42 42 2120.0 # 24 28 388.0 # 5 76 94.0 # 66 68 470.0 # ...
You can also choose to omit all intra or interchromosomal edges using
intra_chromosomal=False
orinter_chromosomal=False
, respectively.Returns: Iterator over Edge
or equivalent.

edges_dict
(*args, **kwargs)¶ Edges iterator with access by bracket notation.
This iterator always returns unnormalised edges.
Returns: dict or dictlike iterator

expected_values
(selected_chromosome=None, norm=True, *args, **kwargs)¶ Calculate the expected values for genomic contacts at all distances.
This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.
It will return a tuple with three values: a list of genomewide intrachromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intrachromosomal expected values specific to each chromosome, and a float for interchromosomal expected value.
Parameters:  selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
 norm – If False, will calculate the expected values on the unnormalised matrix.
 args – Not used in this context
 kwargs – Not used in this context
Returns: list of intrachromosomal expected values, dict of intrachromosomal expected values by chromosome, interchromosomal expected value

expected_values_and_marginals
(selected_chromosome=None, norm=True, force=False, *args, **kwargs)¶ Calculate the expected values for genomic contacts at all distances and the whole matrix marginals.
This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.
It will return a tuple with three values: a list of genomewide intrachromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intrachromosomal expected values specific to each chromosome, and a float for interchromosomal expected value.
Parameters:  selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
 norm – If False, will calculate the expected values on the unnormalised matrix.
 args – Not used in this context
 kwargs – Not used in this context
Returns: list of intrachromosomal expected values, dict of intrachromosomal expected values by chromosome, interchromosomal expected value

filter
(edge_filter, queue=False, log_progress=True)¶ Filter edges in this object by using a
MaskFilter
.Parameters:  edge_filter – Class implementing
MaskFilter
.  queue – If True, filter will be queued and can be executed
along with other queued filters using
run_queued_filters()
 log_progress – If true, process iterating through all edges will be continuously reported.
 edge_filter – Class implementing

filter_enrichment
(enrichment_ll_cutoff=1.0, enrichment_h_cutoff=1.0, enrichment_v_cutoff=1.0, enrichment_d_cutoff=1.0, queue=False)¶ Convenience function that applies a
ObservedExpectedRatioPeakFilter
. The actual algorithm and rationale used for filtering will depend on the internal _mapper attribute.Parameters: queue – If True, filter will be queued and can be executed along with other queued filters using run_queued_filters

filter_fdr
(fdr_cutoff, queue=False)¶ Convenience function that applies a
FdrPeakFilter
. The actual algorithm and rationale used for filtering will depend on the internal _mapper attribute.Parameters:  fdr_cutoff – The falsediscovery rate of every neighborhood enrichment must be lower or equal to this threshold
 queue – If True, filter will be queued and can be executed along with other queued filters using run_queued_filters

filter_mappability
(cutoff, queue=False)¶ Convenience function that applies a
MappabilityPeakFilter
. The actual algorithm and rationale used for filtering will depend on the internal _mapper attribute.Parameters:  cutoff – Minimum mappability (fraction of 1)
 queue – If True, filter will be queued and can be executed along with other queued filters using run_queued_filters

filter_observed
(cutoff, queue=False)¶ Convenience function that applies a
ObservedPeakFilter
. The actual algorithm and rationale used for filtering will depend on the internal _mapper attribute.Parameters:  cutoff – Minimum observed value
 queue – If True, filter will be queued and can be executed along with other queued filters using run_queued_filters

filter_rao
(queue=False)¶ Convenience function that applies all filters Rao et al. (2014) do.
Parameters: queue – If True, filter will be queued and can be executed along with other queued filters using run_queued_filters

find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
Parameters: query_regions – Region selector string, :class:~GenomicRegion, or list of the former Returns: index (or list of indexes) of the region at the center of the query region

flush
(silent=False, update_mappability=True)¶ Write data to file and flush buffers.
Parameters:  silent – do not print flush progress
 update_mappability – After writing data, update mappability and expected values

get_mask
(key)¶ Search _mask table for key and return Mask.
Parameters:  key (int) – search by mask name
 key – search by mask ID
Returns: Mask

get_masks
(ix)¶ Extract mask IDs encoded in parameter and return masks.
IDs are powers of 2, so a single int field in the table can hold multiple masks by simply adding up the IDs. Similar principle to UNIX chmod (although that uses base 8)
Parameters: ix (int) – integer that is the sum of powers of 2. Note that this value is not necessarily itself a power of 2. Returns: list of Masks extracted from ix Return type: list (Mask)

intervals
(*args, **kwargs)¶ Alias for region_intervals.

mappable
(region=None)¶ Get the mappability of regions in this object.
A “mappable” region has at least one contact to another region in the genome.
Returns: array
where True means mappable and False unmappable

marginals
(masked=True, *args, **kwargs)¶ Get the marginals vector of this Hic matrix.
Sums up all contacts for each bin of the HiC matrix. Unmappable regoins will be masked in the returned vector unless the
masked
parameter is set toFalse
.By default, corrected matrix entries are summed up. To get uncorrected matrix marginals use
norm=False
. Generally, all parameters accepted byedges()
are supported.Parameters:  masked – Use a numpy masked array to mask entries corresponding to unmappable regions
 kwargs – Keyword arguments passed to
edges()

matrix
(key=None, log=False, default_value=None, mask=True, log_base=2, *args, **kwargs)¶ Assemble a
RegionMatrix
from region pairs.Parameters:  key – Matrix selector. See
edges()
for all supported key types  log – If True, logtransform the matrix entries. Also see log_base
 log_base – Base of the log transformation. Default: 2; only used when log=True
 default_value – (optional) set the default value of matrix entries that have no associated edge/contact
 mask – If False, do not mask unmappable regions
 args – Positional arguments passed to
regions_and_matrix_entries()
 kwargs – Keyword arguments passed to
regions_and_matrix_entries()
Returns:  key – Matrix selector. See

classmethod
merge
(matrices, *args, **kwargs)¶ Merge multiple
RegionMatrixContainer
objects.Merging is done by adding the weight of edges in each object.
Parameters: matrices – list of RegionMatrixContainer
Returns: merged RegionMatrixContainer

merged_peaks
(file_name=None, euclidian_distance=20000)¶ Merge spatially proximal peaks.
Parameters:  file_name – Optional file to save merged peak info to.
 euclidian_distance – Maximal distance in base pairs to still consider two peaks to be the same
Returns:

possible_contacts
()¶ Calculate the possible number of contacts in the genome.
This calculates the number of potential region pairs in a genome for any possible separation distance, taking into account the existence of unmappable regions.
It will calculate one number for interchromosomal pairs, return a list with the number of possible pairs where the list index corresponds to the number of bins separating two regions, and a dictionary of lists for each chromosome.
Returns: possible intrachromosomal pairs, possible intrachromosomal pairs by chromosome, possible interchromosomal pairs

region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
Parameters: args – provide a GenomicRegion
here to get the slice of start and end bins of onlythis region. To get the slice over all regions leave this blank.Returns:

region_data
(key, value=None)¶ Retrieve or add vectordata to this object. If there is existing data in this object with the same name, it will be replaced
Parameters:  key – Name of the data column
 value – vector with regionbased data (one entry per region)

region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equallysized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
Parameters:  region – String or class:~GenomicRegion object denoting the region to be binned
 bins – Number of bins to divide the region into
 bin_size – Size of each bin (alternative to bins argument)
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
 args – Arguments passed to _region_intervals
 kwargs – Keyword arguments passed to _region_intervals
Returns: iterator of tuples: (start, end, score)

region_subset
(region, *args, **kwargs)¶ Takes a class:~GenomicRegion and returns all regions that overlap with the supplied region.
Parameters: region – String or class:~GenomicRegion object for which covered bins will be returned.

regions
¶ Iterate over genomic regions in this object.
Will return a
GenomicRegion
object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.Returns: RegionIter

regions_and_edges
(key, *args, **kwargs)¶ Convenient access to regions and edges selected by key.
Parameters:  key – Edge selector, see
edges()
 args – Positional arguments passed to
edges()
 kwargs – Keyword arguments passed to
edges()
Returns: list of row regions, list of col regions, iterator over edges
 key – Edge selector, see

regions_and_matrix_entries
(key=None, score_field=None, *args, **kwargs)¶ Convenient access to nonzero matrix entries and associated regions.
Parameters:  key – Edge key, see
edges()
 oe – If True, will divide observed values by their expected value at the given distance. False by default
 oe_per_chromosome – If True (default), will do a perchromosome O/E calculation rather than using the whole matrix to obtain expected values
 score_field – (optional) any edge attribute that returns a number
can be specified here for filling the matrix. Usually
this is defined by the
_default_score_field
attribute of the matrix class.  args – Positional arguments passed to
edges()
 kwargs – Keyword arguments passed to
edges()
Returns: list of row regions, list of col regions, iterator over (i, j, weight) tuples
 key – Edge key, see

regions_dict
¶ Return a dictionary with region index as keys and regions as values.
Returns: dict {region.ix: region, …}

static
regions_identical
(pairs)¶ Check if the regions in all objects in the list are identical.
Parameters: pairs – list
ofRegionBased
objectsReturns: True if chromosome, start, and end are identical between all regions in the same list positions.

run_queued_filters
(log_progress=True)¶ Run queued filters.
Parameters: log_progress – If true, process iterating through all edges will be continuously reported.

scaling_factor
(matrix, weight_column=None)¶ Compute the scaling factor to another matrix.
Calculates the ratio between the number of contacts in this Hic object to the number of contacts in another Hic object.
Parameters:  matrix – A
Hic
object  weight_column – Name of the column to calculate the scaling factor on
Returns: float
 matrix – A

subset
(*regions, **kwargs)¶ Subset a Hic object by specifying one or more subset regions.
Parameters:  regions – string or GenomicRegion object(s)
 kwargs – Supports
file_name: destination file name of subset Hic object;
tmpdir: if True works in tmp until object is closed
additional parameters are passed to
edges()
Returns: Hic

to_bed
(file_name, subset=None, **kwargs)¶ Export regions as BED file
Parameters:  file_name – Path of file to write regions to
 subset – optional
GenomicRegion
or str to write only regions overlapping this region  kwargs – Passed to
write_bed()

to_bigwig
(file_name, subset=None, **kwargs)¶ Export regions as BigWig file.
Parameters:  file_name – Path of file to write regions to
 subset – optional
GenomicRegion
or str to write only regions overlapping this region  kwargs – Passed to
write_bigwig()

to_gff
(file_name, subset=None, **kwargs)¶ Export regions as GFF file
Parameters:  file_name – Path of file to write regions to
 subset – optional
GenomicRegion
or str to write only regions overlapping this region  kwargs – Passed to
write_gff()

class

fanc.peaks.
overlap_peaks
(peaks, max_distance=6000)¶ Calculate overlap between different peak calls.
Useful for comparing peak calls across different samples or conditions.
Parameters:  peaks (dict) – Peaks to overlap. Dictionary of
fanc.data.network.PeakInfo
, keys are dataset names.  max_distance (int) – Maximum distance between peaks for overlap
Returns: DataFrame of overlap statistics and dictionary containing overlapping peaks. Keys are sets of dataset names.
Return type: (pandas.DataFrame, fanc.data.network.PeakInfo)
 peaks (dict) – Peaks to overlap. Dictionary of