Compartments module

class fanc.architecture.compartments.ABCompartmentMatrix(file_name=None, mode='r', tmpdir=None)

Bases: fanc.matrix.RegionMatrixTable

Class representing an observed/expected (O/E) correlation matrix used to derive A/B compartments.

You can generate an ABCompartmentMatrix from a Hic object using the from_hic() class method:

import fanc
from fanc.architecture.compartments import ABCompartmentMatrix

hic = fanc.load("path/to/file.hic")
ab = ABCompartmentMatrix.from_hic(hic)

The ab object can then be used to calculate compartmentalisation eigenvectors and A/B compartment assignments:

ev = ab.eigenvector()
domains = ab.domains()
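Conceptually, an eigenvector like this comes from three steps: divide each contact by the expected contact strength at that distance (O/E), correlate the rows of the O/E matrix, and take the leading eigenvector of the correlation matrix. The following is a minimal pure-Python sketch of that general technique on a small dense matrix. The function names are hypothetical, and the sketch ignores unmappable bins, per-chromosome handling and the other details of FAN-C's own implementation.

```python
import math


def oe_matrix(m):
    """Divide each entry of a symmetric contact matrix by the mean of its diagonal."""
    n = len(m)
    # expected[d] is the average contact strength of bins separated by d bins
    expected = []
    for d in range(n):
        diagonal = [m[i][i + d] for i in range(n - d)]
        expected.append(sum(diagonal) / len(diagonal))
    return [[m[i][j] / expected[abs(i - j)] if expected[abs(i - j)] else 0.0
             for j in range(n)] for i in range(n)]


def pearson_matrix(m):
    """Pearson correlation coefficient between every pair of rows."""
    centred = []
    for row in m:
        mean = sum(row) / len(row)
        centred.append([x - mean for x in row])
    norms = [math.sqrt(sum(x * x for x in row)) for row in centred]
    n = len(m)
    return [[sum(a * b for a, b in zip(centred[i], centred[j])) / (norms[i] * norms[j])
             if norms[i] and norms[j] else 0.0
             for j in range(n)] for i in range(n)]


def leading_eigenvector(m, iterations=100):
    """Power iteration: converges to the eigenvector with the largest |eigenvalue|."""
    n = len(m)
    # non-uniform start vector, so we are unlikely to start orthogonal to the answer
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def compartment_eigenvector(m):
    """O/E -> row correlation -> leading eigenvector."""
    return leading_eigenvector(pearson_matrix(oe_matrix(m)))
```

Power iteration is used here only because it needs no third-party library; a correlation matrix is positive semi-definite, so the eigenvalue with the largest magnitude is also the largest one.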

For more robust A and B calls, you can use a genome file (FASTA) to orient the eigenvector so that regions with higher GC content on average get assigned positive EV values:

ev = ab.eigenvector(genome="path/to/genome.fa")
domains = ab.domains(genome="path/to/genome.fa")
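Because the sign of an eigenvector is arbitrary, orienting it only decides whether to flip all signs. Below is a minimal sketch of one way to do this, assuming per-bin GC content has already been computed from the genome sequence: flip the EV if the bins with negative values have the higher mean GC content. The helper names are hypothetical, and this is not necessarily how FAN-C performs the orientation internally.

```python
import math


def gc_content(sequence):
    """Fraction of G/C among unambiguous A/C/G/T bases (NaN if there are none)."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return math.nan
    return (seq.count("G") + seq.count("C")) / acgt


def orient_by_gc(ev, gc):
    """Flip EV signs so that positive values have the higher mean GC content."""
    positive = [g for e, g in zip(ev, gc) if e > 0 and not math.isnan(g)]
    negative = [g for e, g in zip(ev, gc) if e < 0 and not math.isnan(g)]
    if positive and negative:
        if sum(negative) / len(negative) > sum(positive) / len(positive):
            return [-e for e in ev]
    return list(ev)
```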

Finally, you can calculate an A/B compartment enrichment profile using:

profile, ev_cutoffs = ab.enrichment_profile(hic)

# or with genome to orient the EV
profile, ev_cutoffs = ab.enrichment_profile(hic, genome="path/to/genome.fa")
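The idea behind an enrichment profile is to group regions by EV percentile and average the O/E contact strength between every pair of groups. The sketch below illustrates that idea in pure Python under simplifying assumptions: a dense O/E matrix as input, linear-interpolation percentiles and plain means. The function names are hypothetical, and the exact averaging FAN-C uses (for example any log transformation) may differ.

```python
import math


def percentile(values, p):
    """Linear-interpolation percentile, p in [0, 100]."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def enrichment_profile(oe, ev, percentiles=(20.0, 40.0, 60.0, 80.0, 100.0)):
    """Return (profile, cutoffs): mean O/E between EV percentile groups."""
    cutoffs = [percentile(ev, p) for p in percentiles]
    # assign each region to the first group whose upper cutoff it does not exceed;
    # regions above the highest cutoff are left out
    assigned = []
    for i, value in enumerate(ev):
        for g, cutoff in enumerate(cutoffs):
            if value <= cutoff:
                assigned.append((i, g))
                break
    k = len(cutoffs)
    sums = [[0.0] * k for _ in range(k)]
    counts = [[0] * k for _ in range(k)]
    for i, gi in assigned:
        for j, gj in assigned:
            sums[gi][gj] += oe[i][j]
            counts[gi][gj] += 1
    profile = [[sums[a][b] / counts[a][b] if counts[a][b] else math.nan
                for b in range(k)] for a in range(k)]
    return profile, cutoffs
```

With compartmentalised data, the corners of the profile where both groups have EVs of the same sign show higher values than the off-diagonal corners, which is the pattern a saddle plot visualises.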
class ChromosomeDescription

Bases: tables.description.IsDescription

Description of the chromosomes in this object.

class MaskDescription

Bases: tables.description.IsDescription

class RegionDescription

Bases: tables.description.IsDescription

Description of a genomic region for a PyTables Table.

add_contact(contact, *args, **kwargs)

Alias for add_edge()

Parameters:
  • contact – Edge
  • args – Positional arguments passed to _add_edge()
  • kwargs – Keyword arguments passed to _add_edge()
add_contacts(contacts, *args, **kwargs)

Alias for add_edges()

add_edge(edge, check_nodes_exist=True, *args, **kwargs)

Add an edge / contact between two regions to this object.

Parameters:
  • edge – Edge, dict with at least the attributes source and sink (optionally weight), or a list of length 2 (source, sink) or 3 (source, sink, weight).
  • check_nodes_exist – Make sure that there are nodes that match source and sink indexes
  • args – Positional arguments passed to _add_edge()
  • kwargs – Keyword arguments passed to _add_edge()
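add_edge() accepts several input types and dispatches to the add_edge_from_*() methods documented below. As an illustration of that kind of dispatch, here is a self-contained sketch that normalises the supported input forms into one record type. SimpleEdge and to_edge() are hypothetical stand-ins, not FAN-C's Edge class or its internal code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimpleEdge:
    """Hypothetical stand-in for an edge: two region indexes and an optional weight."""
    source: int
    sink: int
    weight: Optional[float] = None


def to_edge(edge, n_regions=None):
    """Convert a SimpleEdge, dict, or list/tuple of length 2 or 3 into a SimpleEdge."""
    if isinstance(edge, SimpleEdge):
        result = edge
    elif isinstance(edge, dict):
        result = SimpleEdge(edge["source"], edge["sink"], edge.get("weight"))
    elif isinstance(edge, (list, tuple)) and len(edge) in (2, 3):
        result = SimpleEdge(*edge)
    else:
        raise ValueError(f"Unsupported edge specification: {edge!r}")
    # optional analogue of check_nodes_exist: both indexes must refer to known regions
    if n_regions is not None:
        for ix in (result.source, result.sink):
            if not 0 <= ix < n_regions:
                raise ValueError(f"No region with index {ix}")
    return result
```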
add_edge_from_dict(edge, *args, **kwargs)

Direct method to add an edge from dict input.

Parameters: edge – dict with at least the keys “source” and “sink”. Additional keys will be loaded as edge attributes.
add_edge_from_edge(edge, *args, **kwargs)

Direct method to add an edge from Edge input.

Parameters: edge – Edge
add_edge_from_list(edge, *args, **kwargs)

Direct method to add an edge from list or tuple input.

Parameters: edge – List or tuple. Should be of length 2 (source, sink) or 3 (source, sink, weight).
add_edge_simple(source, sink, weight=None, *args, **kwargs)

Direct method to add an edge from source and sink region indexes and an optional weight.

Parameters:
  • source – Source region index
  • sink – Sink region index
  • weight – Weight of the edge
add_edges(edges, flush=True, *args, **kwargs)

Bulk-add edges from a list.

List items can be any of the supported edge types: list, tuple, dict, or Edge. This method repeatedly calls add_edge(), so it may be inefficient for large amounts of data.

Parameters: edges – List (or iterator) of edges. See add_edge() for details.
add_mask_description(name, description)

Add a mask description to the _mask table and return its ID.

Parameters:
  • name (str) – name of the mask
  • description (str) – description of the mask
Returns:

id of the mask

Return type:

int

add_region(region, *args, **kwargs)

Add a genomic region to this object.

This method offers some flexibility in the types of objects that can be loaded. See parameters for details.

Parameters: region – Can be a GenomicRegion, a str in the form ‘<chromosome>:<start>-<end>[:<strand>]’, a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
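For illustration, a string in the ‘<chromosome>:<start>-<end>[:<strand>]’ form can be parsed with a regular expression. parse_region() is a hypothetical helper, not FAN-C's parser. It additionally accepts a bare chromosome name (as used in keys such as "chr19" in the edges() examples), and it does not attempt to cover any other string variants FAN-C may support.

```python
import re

# chromosome, then optionally ":start-end", then optionally ":+" or ":-"
REGION_PATTERN = re.compile(r"^([^:]+)(?::(\d+)-(\d+)(?::([+-]))?)?$")


def parse_region(text):
    """Return (chromosome, start, end, strand); missing parts are None."""
    match = REGION_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"Cannot parse region: {text!r}")
    chromosome, start, end, strand = match.groups()
    if start is not None:
        start, end = int(start), int(end)
        if end < start:
            raise ValueError(f"End before start in region: {text!r}")
    return chromosome, start, end, strand
```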
add_regions(regions, *args, **kwargs)

Bulk insert multiple genomic regions.

Parameters: regions – List (or any iterator) with objects that describe a genomic region. See add_region() for options.
static bin_intervals(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into a fixed number of bins.

Parameters:
  • intervals – iterator of tuples (start, end, score)
  • bins – Number of bins to divide the region into
  • interval_range – Optional. Tuple (start, end) in base pairs of the range to be binned. Useful if the intervals argument does not cover the exact genomic range to be binned.
  • smoothing_window – Size of window (in bins) to smooth scores over
  • nan_replacement – NaN values in the scores will be replaced with this value
  • zero_to_nan – If True, will convert bins with score 0 to NaN
Returns:

iterator of tuples: (start, end, score)
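A minimal sketch of fixed-number binning, assuming each bin's score is the overlap-weighted mean of the interval scores it covers and that intervals are half-open [start, end). FAN-C's exact aggregation may differ, and smoothing_window is omitted; the function is a hypothetical illustration, not the library method.

```python
import math


def bin_intervals(intervals, bins, interval_range=None, nan_replacement=None, zero_to_nan=False):
    """Average (start, end, score) intervals into `bins` equal-width bins."""
    intervals = list(intervals)
    if interval_range is None:
        start = min(s for s, _, _ in intervals)
        end = max(e for _, e, _ in intervals)
    else:
        start, end = interval_range
    width = (end - start) / bins
    result = []
    for b in range(bins):
        bin_start = start + b * width
        bin_end = bin_start + width
        weighted_sum, covered = 0.0, 0.0
        for s, e, score in intervals:
            overlap = min(e, bin_end) - max(s, bin_start)
            if overlap > 0 and not math.isnan(score):
                weighted_sum += score * overlap
                covered += overlap
        score = weighted_sum / covered if covered else math.nan
        if zero_to_nan and score == 0:
            score = math.nan
        if nan_replacement is not None and math.isnan(score):
            score = nan_replacement
        result.append((bin_start, bin_end, score))
    return result
```

The equidistant variant below differs only in deriving the number of bins from a fixed bin_size instead of taking it as an argument.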

static bin_intervals_equidistant(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into bins with a fixed size.

Parameters:
  • intervals – iterator of tuples (start, end, score)
  • bin_size – Size of each bin in base pairs
  • interval_range – Optional. Tuple (start, end) in base pairs of the range to be binned. Useful if the intervals argument does not cover the exact genomic range to be binned.
  • smoothing_window – Size of window (in bins) to smooth scores over
  • nan_replacement – NaN values in the scores will be replaced with this value
  • zero_to_nan – If True, will convert bins with score 0 to NaN
Returns:

iterator of tuples: (start, end, score)

bin_size

Return the length of the first region in the dataset.

Assumes all bins have equal size.

Returns: int
binned_regions(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)

Same as region_intervals(), but returns GenomicRegion objects instead of tuples.

Parameters:
  • region – String or GenomicRegion object denoting the region to be binned
  • bins – Number of bins to divide the region into
  • bin_size – Size of each bin (alternative to bins argument)
  • smoothing_window – Size of window (in bins) to smooth scores over
  • nan_replacement – NaN values in the scores will be replaced with this value
  • zero_to_nan – If True, will convert bins with score 0 to NaN
  • args – Arguments passed to _region_intervals
  • kwargs – Keyword arguments passed to _region_intervals
Returns:

iterator of GenomicRegion objects

bins_to_distance(bins)

Convert a fraction of bins to base pairs.

Parameters: bins – float, fraction of bins
Returns: int, base pairs
chromosome_bins

Returns a dictionary of chromosomes and the start and end index of the bins they cover.

The returned start and end indexes are range-compatible, i.e. chromosome bins [0, 5] cover bins 0, 1, 2, 3, and 4, but not bin 5.
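A sketch of how such range-compatible bins can be derived, assuming the regions are sorted so that all bins of a chromosome are contiguous. chromosome_bins() here is a hypothetical function operating on a plain list of per-bin chromosome names, not the FAN-C property.

```python
def chromosome_bins(region_chromosomes):
    """Map chromosome -> [first bin, last bin + 1] for a sorted list of per-bin chromosome names."""
    bins = {}
    for ix, chromosome in enumerate(region_chromosomes):
        if chromosome in bins:
            bins[chromosome][1] = ix + 1  # extend the end (exclusive) past this bin
        else:
            bins[chromosome] = [ix, ix + 1]
    return bins
```

Because the end is exclusive, range(*bins[chromosome]) yields exactly the bin indexes of that chromosome.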

chromosome_lengths

Returns a dictionary of chromosomes and their length in bp.

chromosomes()

List all chromosomes in this regions table.

Returns: list of chromosome names

close(copy_tmp=True, remove_tmp=True)

Close this HDF5 file and run exit operations.

If the file was opened with tmpdir in read-only mode: close the file and delete the temporary copy.

If the file was opened with tmpdir in write or append mode: replace the original file with the copy and delete the copy.

Parameters:
  • copy_tmp – If False, does not overwrite original with modified file.
  • remove_tmp – If False, does not delete temporary copy of file.
distance_to_bins(distance)

Convert base pairs to fraction of bins.

Parameters: distance – distance in base pairs
Returns: float, distance as fraction of bin size
domains(*args, **kwargs)

Get the AB domain regions of the compartment matrix.

This returns a RegionWrapper object, whose domains you can iterate over using

domains = ab.domains()
for region in domains.regions:
    print(region.name)  # A or B

Returns:

A RegionWrapper object

downsample(n, file_name=None)

Sample edges from this object.

Sampling is always done on uncorrected Hi-C matrices.

Parameters:
  • n – Sample size or reference object. If n < 1 will be interpreted as a fraction of total reads in this object.
  • file_name – Output file name for down-sampled object.
Returns:

RegionPairsTable
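A minimal sketch of what down-sampling raw contacts can look like, assuming reads are drawn uniformly without replacement from integer raw counts and that n < 1 means a fraction of all reads. downsample() here is a hypothetical function on a plain edge list; it does not cover passing a reference object as n, and FAN-C's sampling procedure may differ.

```python
import bisect
import itertools
import random


def downsample(edges, n, seed=None):
    """Sample n reads without replacement from (source, sink, raw_count) edges.

    If n < 1 it is treated as a fraction of the total number of reads.
    """
    counts = [count for _, _, count in edges]
    total = sum(counts)
    if n < 1:
        n = int(total * n)
    rng = random.Random(seed)
    # cumulative[k] is the number of reads in edges[0..k]
    cumulative = list(itertools.accumulate(counts))
    sampled = [0] * len(edges)
    for read in rng.sample(range(total), n):
        # read number r belongs to the first edge whose cumulative count exceeds r
        sampled[bisect.bisect_right(cumulative, read)] += 1
    return [(source, sink, c) for (source, sink, _), c in zip(edges, sampled) if c > 0]
```

Sampling read numbers from a range and mapping them back with bisect avoids expanding every contact into a separate list entry.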

edge_data(attribute, *args, **kwargs)

Iterate over specific edge attribute.

Parameters:
  • attribute – Name of the attribute, e.g. “weight”
  • args – Positional arguments passed to edges()
  • kwargs – Keyword arguments passed to edges()
Returns:

iterator over edge attribute

edge_subset(key=None, *args, **kwargs)

Get a subset of edges.

This is an alias for edges().

Returns: generator (Edge)
edges

Iterate over contacts / edges.

edges() is the central function of RegionPairsContainer. Here, we will use the Hic implementation for demonstration purposes, but the usage is exactly the same for all compatible objects implementing RegionPairsContainer, including JuicerHic and CoolerHic.

import fanc

# file from FAN-C examples
hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")

We can easily find the number of edges in the sample Hic object:

len(hic.edges)  # 8695

When used in an iterator context, edges() iterates over all edges in the RegionPairsContainer:

for edge in hic.edges:
    # do something with edge
    print(edge)
    # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
    # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
    # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
    # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
    # ...

Calling edges() as a method has the same effect:

# note the '()'
for edge in hic.edges():
    # do something with edge
    print(edge)
    # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
    # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
    # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
    # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
    # ...

Rather than iterate over all edges in the object, we can select only a subset. If the key is a string or a GenomicRegion, all non-zero edges connecting the region described by the key to any other region are returned. If the key is a tuple of strings or GenomicRegion, only edges between the two regions are returned.

# select all edges between chromosome 19
# and any other region:
for edge in hic.edges("chr19"):
    print(edge)
    # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
    # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
    # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
    # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
    # ...

# select all edges that are only on
# chromosome 19
for edge in hic.edges(('chr19', 'chr19')):
    print(edge)
    # 90--116; bias: 0.00021173151730025176; sink_node: chr19:37000001-38000000; source_node: chr19:11000001-12000000; weight: 0.009104455243910825
    # 135--135; bias: 0.00018003890596887822; sink_node: chr19:56000001-57000000; source_node: chr19:56000001-57000000; weight: 0.10028167062466517
    # 123--123; bias: 0.00011063368998965993; sink_node: chr19:44000001-45000000; source_node: chr19:44000001-45000000; weight: 0.1386240135570439
    # 92--93; bias: 0.00040851066434864896; sink_node: chr19:14000001-15000000; source_node: chr19:13000001-14000000; weight: 0.10090213409411629
    # ...

# select inter-chromosomal edges
# between chromosomes 18 and 19
for edge in hic.edges(('chr18', 'chr19')):
    print(edge)
    # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
    # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
    # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
    # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
    # ...

By default, edges() will retrieve all edge attributes, which can be slow when iterating over a lot of edges. This is why all file-based FAN-C RegionPairsContainer objects support lazy loading, where attributes are only read on demand.

for edge in hic.edges('chr18', lazy=True):
    print(edge.source, edge.sink, edge.weight, edge)
    # 42 42 0.12291311562018173 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #0>
    # 24 28 0.025205961072838057 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #1>
    # 5 76 0.00961709840049876 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #2>
    # 66 68 0.03876763316345468 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #3>
    # ...

Warning

The lazy iterator reuses the same LazyEdge object in every iteration and overwrites its attributes. Therefore, do not use lazy iterators if you need to store edge objects for later access. For example, list(hic.edges()) works as expected and stores all Edge objects in the list, whereas list(hic.edges(lazy=True)) produces a list of identical LazyEdge objects. Always do all edge processing inside the loop when working with lazy iterators!
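The pitfall in this warning is a general property of generators that reuse one mutable object. The following self-contained sketch reproduces it without FAN-C; LazyRecord and lazy_rows() are hypothetical stand-ins for LazyEdge and a lazy edges() iterator.

```python
class LazyRecord:
    """Hypothetical stand-in for a lazily loaded edge that is reused between rows."""

    def __init__(self):
        self.source = None
        self.sink = None


def lazy_rows(rows):
    record = LazyRecord()  # one object, overwritten in every iteration
    for source, sink in rows:
        record.source, record.sink = source, sink
        yield record


rows = [(0, 1), (2, 3)]
stored = list(lazy_rows(rows))  # pitfall: two references to the same object
extracted = [(r.source, r.sink) for r in lazy_rows(rows)]  # safe: values copied inside the loop
```

Every entry of stored points to the same object and therefore shows the values of the last row, while extracted keeps the values of each row.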

When working with normalised contact frequencies, such as obtained through matrix balancing in the example above, edges() automatically returns normalised edge weights. In addition, the bias attribute will (typically) have a value different from 1.

When you are interested in the raw contact frequency, use the norm=False parameter:

for edge in hic.edges('chr18', lazy=True, norm=False):
    print(edge.source, edge.sink, edge.weight)
    # 42 42 2120.0
    # 24 28 388.0
    # 5 76 94.0
    # 66 68 470.0
    # ...

You can also choose to omit all intra- or inter-chromosomal edges using intra_chromosomal=False or inter_chromosomal=False, respectively.

Returns: Iterator over Edge or equivalent.
edges_dict(*args, **kwargs)

Edges iterator with access by bracket notation.

This iterator always returns unnormalised edges.

Returns: dict or dict-like iterator
eigenvector(sub_region=None, genome=None, eigenvector=0, per_chromosome=None, oe_per_chromosome=None, exclude_chromosomes=None, force=False)

Calculate the eigenvector (EV) of this AB matrix.

Parameters:
  • sub_region – Optional region string to only output the EV of that region.
  • genome – A Genome object or path to a FASTA file. Used to orient EV value signs so that the “A” compartment corresponds to the regions with higher GC content. It is recommended to make use of this, as otherwise the sign of the EV is arbitrary and will not allow for between-sample comparisons.
  • eigenvector – Index of the eigenvector to calculate. This parameter is 0-based! Always try “0” first, and if that EV does not seem to reflect A/B compartments, try increasing that value.
  • per_chromosome – Calculate the eigenvector on a per-chromosome basis (True by default). If your matrix is whole-genome normalised and you know what you are doing, set this to False to calculate the EV on the whole matrix.
  • oe_per_chromosome – Use the expected value vector matching each chromosome. Do not modify this unless you know what you are doing.
  • exclude_chromosomes – List of chromosome names to exclude from the EV calculation. Can sometimes be useful if certain chromosomes do not produce reasonable compartment profiles.
  • force – Force EV recalculation, even if the EV has already been previously calculated with the same parameters and is stored in the object.
Returns:

array of eigenvector values

enrichment_profile(hic, percentiles=(20.0, 40.0, 60.0, 80.0, 100.0), only_gc=False, symmetric_at=None, exclude_chromosomes=(), intra_chromosomal=True, inter_chromosomal=False, eigenvector=None, collapse_identical_breakpoints=False, *args, **kwargs)

Generate a compartment enrichment profile for the compartment matrix.

This returns an ndarray with the enrichment profile matrix and a list of cutoffs used to bin regions according to the eigenvector (EV) values. These cutoffs are determined by the percentiles argument.

The returned objects can be used to generate a saddle plot, for example using saddle_plot().

Parameters:
  • hic – A Hi-C matrix
  • percentiles – The percentiles at which to split the EV, and bin genomic regions accordingly into ranges of EV values.
  • only_gc – If True, use only the region’s GC content, and not the EV, to calculate the enrichment profile.
  • symmetric_at – If set to a float, splits the genomic regions into two groups with EV below and above this value. Percentiles are then calculated on each group separately, and it is ensured that the symmetric_at breakpoint is in the centre of the enrichment profile. Note that this doubles the number of bins, and that the number of regions to the left and right of the breakpoint are likely not the same.
  • exclude_chromosomes – List of chromosome names to exclude from the profile calculation.
  • intra_chromosomal – If True (default), include intra-chromosomal contacts in the calculation
  • inter_chromosomal – If True, include inter-chromosomal contacts in the calculation. This is disabled by default, due to the way matrices are typically normalised (per chromosome)
  • eigenvector – Optional. A custom eigenvector of the same length as genomic regions in the Hi-C matrix. This will skip the eigenvector calculation and just use the values in this vector instead. In principle, you could even use this to supply a completely different type of data, such as expression values, for the enrichment analysis.
  • collapse_identical_breakpoints – (experimental) If True, will merge all breakpoints with the same values (such as multiple bins with EV=0) into one. This can make the saddle plot look cleaner.
  • args – Positional arguments for eigenvector()
  • kwargs – Keyword arguments for eigenvector()
Returns:

an ndarray with the enrichment profile matrix and a list of cutoffs

expected_values(selected_chromosome=None, norm=True, *args, **kwargs)

Calculate the expected values for genomic contacts at all distances.

This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.

It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for inter-chromosomal expected value.

Parameters:
  • selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
  • norm – If False, will calculate the expected values on the unnormalised matrix.
  • args – Not used in this context
  • kwargs – Not used in this context
Returns:

list of intra-chromosomal expected values, dict of intra-chromosomal expected values by chromosome, inter-chromosomal expected value
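The averaging described above can be sketched as follows: for each bin distance, sum the weights of intra-chromosomal edges between mappable bins and divide by the number of mappable bin pairs at that distance; inter-chromosomal pairs give a single value. This hypothetical function computes only the genome-wide intra-chromosomal list and the inter-chromosomal value (a per-chromosome list would come from running it on one chromosome at a time). It is an illustration of the technique, not FAN-C's implementation.

```python
def expected_values(chromosomes, mappable, edges):
    """Genome-wide intra-chromosomal expected values by bin distance, plus one inter value.

    chromosomes[i] is the chromosome of bin i, mappable[i] is True for mappable bins,
    and edges holds (i, j, weight) for each non-zero pair, stored once per pair.
    """
    n = len(chromosomes)
    possible = [0] * n  # possible[d]: mappable intra-chromosomal pairs d bins apart
    observed = [0.0] * n
    inter_possible, inter_observed = 0, 0.0
    for i in range(n):
        for j in range(i, n):
            if not (mappable[i] and mappable[j]):
                continue
            if chromosomes[i] == chromosomes[j]:
                possible[j - i] += 1
            else:
                inter_possible += 1
    for i, j, weight in edges:
        i, j = min(i, j), max(i, j)
        if not (mappable[i] and mappable[j]):
            continue
        if chromosomes[i] == chromosomes[j]:
            observed[j - i] += weight
        else:
            inter_observed += weight
    intra = [o / p if p else 0.0 for o, p in zip(observed, possible)]
    inter = inter_observed / inter_possible if inter_possible else 0.0
    return intra, inter
```

Counting possible pairs separately, instead of only the pairs that have edges, is what makes missing (zero) contacts count towards the average while unmappable bins are excluded.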

expected_values_and_marginals(selected_chromosome=None, norm=True, force=False, *args, **kwargs)

Calculate the expected values for genomic contacts at all distances and the whole matrix marginals.

This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.

It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for inter-chromosomal expected value.

Parameters:
  • selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
  • norm – If False, will calculate the expected values on the unnormalised matrix.
  • args – Not used in this context
  • kwargs – Not used in this context
Returns:

list of intra-chromosomal expected values, dict of intra-chromosomal expected values by chromosome, inter-chromosomal expected value

filter(edge_filter, queue=False, log_progress=True)

Filter edges in this object by using a MaskFilter.

Parameters:
  • edge_filter – Class implementing MaskFilter.
  • queue – If True, filter will be queued and can be executed along with other queued filters using run_queued_filters()
  • log_progress – If True, progress while iterating through all edges will be continuously reported.
find_region(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)

Find the region that is at the center of a region.

Parameters: query_regions – Region selector string, GenomicRegion, or list of the former
Returns: index (or list of indexes) of the region at the center of the query region
flush(silent=False, update_mappability=True)

Write data to file and flush buffers.

Parameters:
  • silent – do not print flush progress
  • update_mappability – After writing data, update mappability and expected values
classmethod from_hic(hic, file_name=None, tmpdir=None, per_chromosome=True, oe_per_chromosome=None, exclude_chromosomes=None)

Generate an AB compartment matrix from a Hi-C object.

Parameters:
  • hic – Hi-C object (FAN-C, Juicer, Cooler)
  • file_name – Path to output file. If not specified, creates file in memory.
  • tmpdir – Optional. Work in temporary directory until file is closed.
  • per_chromosome – If True (default) calculate compartment profile on a per-chromosome basis (recommended). Otherwise calculates profile on the whole matrix - make sure your normalisation is suitable for this (i.e. whole matrix!)
  • oe_per_chromosome – Use the expected value vector matching each chromosome. Do not modify this unless you know what you are doing.
  • exclude_chromosomes – Exclude these chromosomes from compartment calculations.
Returns:

ABCompartmentMatrix object

get_mask(key)

Search _mask table for key and return Mask.

Parameters:
  • key (str) – search by mask name
  • key (int) – search by mask ID
Returns:

Mask

get_masks(ix)

Extract mask IDs encoded in parameter and return masks.

IDs are powers of 2, so a single int field in the table can hold multiple masks by simply adding up the IDs. This is similar in principle to UNIX chmod (although that uses base 8).

Parameters: ix (int) – integer that is the sum of powers of 2. Note that this value is not necessarily itself a power of 2.
Returns: list of Masks extracted from ix
Return type: list (Mask)
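Decoding such a combined value amounts to testing each bit. mask_ids() is a hypothetical helper that returns the power-of-2 IDs contained in an integer; looking up the corresponding Mask objects is left out.

```python
def mask_ids(ix):
    """Split an integer into the powers of 2 (mask IDs) whose sum it is."""
    ids = []
    bit = 1
    while bit <= ix:
        if ix & bit:  # this power of 2 is part of the sum
            ids.append(bit)
        bit <<= 1
    return ids


# a row flagged by the masks with IDs 1, 4 and 8 stores 1 | 4 | 8 == 13
combined = 1 | 4 | 8
```

Since every ID is a distinct power of 2, adding IDs and OR-ing them give the same result, and the decomposition back into IDs is unique.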
intervals(*args, **kwargs)

Alias for region_intervals().

mappable(region=None)

Get the mappability of regions in this object.

A “mappable” region has at least one contact to another region in the genome.

Returns: array where True means mappable and False unmappable
marginals(masked=True, *args, **kwargs)

Get the marginals vector of this Hi-C matrix.

Sums up all contacts for each bin of the Hi-C matrix. Unmappable regions will be masked in the returned vector unless the masked parameter is set to False.

By default, corrected matrix entries are summed up. To get uncorrected matrix marginals use norm=False. Generally, all parameters accepted by edges() are supported.

Parameters:
  • masked – Use a numpy masked array to mask entries corresponding to unmappable regions
  • kwargs – Keyword arguments passed to edges()
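The marginals are the row sums of the matrix, and a bin that takes part in no contact at all has a marginal of zero. A sketch of both ideas on a plain edge list, assuming each symmetric pair is stored once and all weights are non-negative. The function names mirror the methods above but are hypothetical; they return plain lists instead of (masked) numpy arrays, and this mappability test also counts self-contacts, which is a simplification.

```python
def marginals(n_regions, edges):
    """Row sums of a symmetric matrix given as (i, j, weight) edges stored once per pair."""
    sums = [0.0] * n_regions
    for i, j, weight in edges:
        sums[i] += weight
        if i != j:
            sums[j] += weight  # the mirrored entry (j, i) contributes to row j
    return sums


def mappable(n_regions, edges):
    """Approximate mappability: a bin is mappable if it takes part in any non-zero contact."""
    return [total > 0 for total in marginals(n_regions, edges)]
```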
matrix(key=None, log=False, default_value=None, mask=True, log_base=2, *args, **kwargs)

Assemble a RegionMatrix from region pairs.

Parameters:
  • key – Matrix selector. See edges() for all supported key types
  • log – If True, log-transform the matrix entries. Also see log_base
  • log_base – Base of the log transformation. Default: 2; only used when log=True
  • default_value – (optional) set the default value of matrix entries that have no associated edge/contact
  • mask – If False, do not mask unmappable regions
  • args – Positional arguments passed to regions_and_matrix_entries()
  • kwargs – Keyword arguments passed to regions_and_matrix_entries()
Returns:

RegionMatrix
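Assembling a matrix from region pairs essentially means filling both symmetric positions for every (i, j, weight) entry and using a default value elsewhere. dense_matrix() is a hypothetical pure-Python sketch of that step, including an optional log transformation; how FAN-C treats zero entries under log=True is not specified here, so this sketch simply turns non-positive values into NaN.

```python
import math


def dense_matrix(n, entries, default_value=0.0, log=False, log_base=2):
    """Fill an n x n symmetric matrix from (i, j, weight) entries stored once per pair."""
    m = [[default_value] * n for _ in range(n)]
    for i, j, weight in entries:
        m[i][j] = weight
        m[j][i] = weight
    if log:
        # in this sketch, non-positive entries (including default zeros) become NaN
        m = [[math.log(x, log_base) if x > 0 else math.nan for x in row] for row in m]
    return m
```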

classmethod merge(matrices, *args, **kwargs)

Merge multiple RegionMatrixContainer objects.

Merging is done by adding the weight of edges in each object.

Parameters: matrices – list of RegionMatrixContainer
Returns: merged RegionMatrixContainer
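Merging by adding edge weights can be sketched with a dictionary keyed by region pair, assuming all inputs share identical regions (compare regions_identical() below). merge_edges() is a hypothetical helper operating on plain (source, sink, weight) lists, not the FAN-C class method.

```python
from collections import defaultdict


def merge_edges(*edge_lists):
    """Sum weights of (source, sink, weight) edges over several matrices with identical regions."""
    merged = defaultdict(float)
    for edges in edge_lists:
        for source, sink, weight in edges:
            # store each pair once, with the lower index first
            merged[(min(source, sink), max(source, sink))] += weight
    return dict(merged)
```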
possible_contacts()

Calculate the possible number of contacts in the genome.

This calculates the number of potential region pairs in a genome for any possible separation distance, taking into account the existence of unmappable regions.

It will calculate one number for inter-chromosomal pairs, return a list with the number of possible pairs where the list index corresponds to the number of bins separating two regions, and a dictionary of lists for each chromosome.

Returns: possible intra-chromosomal pairs, possible intra-chromosomal pairs by chromosome, possible inter-chromosomal pairs
region_bins(*args, **kwargs)

Return slice of start and end indices spanned by a region.

Parameters: args – provide a GenomicRegion here to get the slice of start and end bins of only this region. To get the slice over all regions, leave this blank.
Returns: slice
region_data(key, value=None)

Retrieve or add vector data to this object. If there is existing data in this object with the same name, it will be replaced.

Parameters:
  • key – Name of the data column
  • value – vector with region-based data (one entry per region)
region_intervals(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)

Return equally-sized genomic intervals and associated scores.

Use either bins or bin_size argument to control binning.

Parameters:
  • region – String or GenomicRegion object denoting the region to be binned
  • bins – Number of bins to divide the region into
  • bin_size – Size of each bin (alternative to bins argument)
  • smoothing_window – Size of window (in bins) to smooth scores over
  • nan_replacement – NaN values in the scores will be replaced with this value
  • zero_to_nan – If True, will convert bins with score 0 to NaN
  • args – Arguments passed to _region_intervals
  • kwargs – Keyword arguments passed to _region_intervals
Returns:

iterator of tuples: (start, end, score)

region_subset(region, *args, **kwargs)

Takes a GenomicRegion and returns all regions that overlap with the supplied region.

Parameters: region – String or GenomicRegion object for which covered bins will be returned.
regions

Iterate over genomic regions in this object.

Will return a GenomicRegion object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.

Returns: RegionIter
regions_and_edges(key, *args, **kwargs)

Convenient access to regions and edges selected by key.

Parameters:
  • key – Edge selector, see edges()
  • args – Positional arguments passed to edges()
  • kwargs – Keyword arguments passed to edges()
Returns:

list of row regions, list of col regions, iterator over edges

regions_and_matrix_entries(key=None, score_field=None, *args, **kwargs)

Convenient access to non-zero matrix entries and associated regions.

Parameters:
  • key – Edge key, see edges()
  • oe – If True, will divide observed values by their expected value at the given distance. False by default
  • oe_per_chromosome – If True (default), will do a per-chromosome O/E calculation rather than using the whole matrix to obtain expected values
  • score_field – (optional) any edge attribute that returns a number can be specified here for filling the matrix. Usually this is defined by the _default_score_field attribute of the matrix class.
  • args – Positional arguments passed to edges()
  • kwargs – Keyword arguments passed to edges()
Returns:

list of row regions, list of col regions, iterator over (i, j, weight) tuples

regions_dict

Return a dictionary with region index as keys and regions as values.

Returns: dict {region.ix: region, …}
static regions_identical(pairs)

Check if the regions in all objects in the list are identical.

Parameters: pairs – list of RegionBased objects
Returns: True if chromosome, start, and end are identical between all regions in the same list positions.
run_queued_filters(log_progress=True)

Run queued filters.

Parameters: log_progress – If True, progress while iterating through all edges will be continuously reported.
scaling_factor(matrix, weight_column=None)

Compute the scaling factor to another matrix.

Calculates the ratio of the number of contacts in this Hic object to the number of contacts in another Hic object.

Parameters:
  • matrix – A Hic object
  • weight_column – Name of the column to calculate the scaling factor on
Returns:

float

subset(*regions, **kwargs)

Subset a Hic object by specifying one or more subset regions.

Parameters:
  • regions – string or GenomicRegion object(s)
  • kwargs – Supports file_name: destination file name of the subset Hic object; tmpdir: if True, works in a temporary directory until the object is closed. Additional parameters are passed to edges()
Returns:

Hic

to_bed(file_name, subset=None, **kwargs)

Export regions as a BED file.

Parameters:
  • file_name – Path of file to write regions to
  • subset – optional GenomicRegion or str to write only regions overlapping this region
  • kwargs – Passed to write_bed()
to_bigwig(file_name, subset=None, **kwargs)

Export regions as BigWig file.

Parameters:
  • file_name – Path of file to write regions to
  • subset – optional GenomicRegion or str to write only regions overlapping this region
  • kwargs – Passed to write_bigwig()
to_gff(file_name, subset=None, **kwargs)

Export regions as a GFF file.

Parameters:
  • file_name – Path of file to write regions to
  • subset – optional GenomicRegion or str to write only regions overlapping this region
  • kwargs – Passed to write_gff()