Hic module¶
-
class
fanc.hic.DiagonalFilter(hic, distance=0, mask=None)¶ Bases:
fanc.hic.HicEdgeFilter
Filter contacts in the diagonal of a Hic matrix.
-
set_hic_object(hic_object)¶ Set the Hic instance to be filtered by this HicEdgeFilter.
Used internally by the Hic instance.
Parameters: hic_object – Hic object
-
valid(row)¶ Map valid_edge to MaskFilter.valid(self, row).
Parameters: row – A pytables Table row. Returns: The boolean value returned by valid_edge.
-
-
class
fanc.hic.Hic(file_name=None, mode='a', tmpdir=None, partition_strategy='auto', additional_region_fields=None, additional_edge_fields=None, _table_name_regions='regions', _table_name_edges='edges', _edge_buffer_size='3G')¶ Bases:
fanc.matrix.RegionMatrixTable
Central class for working with Hi-C data.
This class adds functions for matrix binning and filtering to the base class RegionMatrixTable.
-
class
ChromosomeDescription¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.
-
class
MaskDescription¶ Bases:
tables.description.IsDescription
-
class
RegionDescription¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table
-
add_contact(contact, *args, **kwargs)¶ Alias for add_edge().
Parameters: - contact – Edge
- args – Positional arguments passed to _add_edge()
- kwargs – Keyword arguments passed to _add_edge()
-
add_contacts(contacts, *args, **kwargs)¶ Alias for add_edges().
-
add_edge(edge, check_nodes_exist=True, *args, **kwargs)¶ Add an edge / contact between two regions to this object.
Parameters: - edge – Edge, dict with at least the attributes source and sink, optionally weight, or a list of length 2 (source, sink) or 3 (source, sink, weight).
- check_nodes_exist – Make sure that there are nodes that match source and sink indexes
- args – Positional arguments passed to _add_edge()
- kwargs – Keyword arguments passed to _add_edge()
-
add_edge_from_dict(edge, *args, **kwargs)¶ Direct method to add an edge from dict input.
Parameters: edge – dict with at least the keys “source” and “sink”. Additional keys will be loaded as edge attributes
-
add_edge_from_edge(edge, *args, **kwargs)¶ Direct method to add an edge from Edge input.
Parameters: edge – Edge
-
add_edge_from_list(edge, *args, **kwargs)¶ Direct method to add an edge from list or tuple input.
Parameters: edge – List or tuple. Should be of length 2 (source, sink) or 3 (source, sink, weight)
-
add_edge_simple(source, sink, weight=None, *args, **kwargs)¶ Direct method to add an edge from source and sink region indexes and an optional weight.
Parameters: - source – Source region index
- sink – Sink region index
- weight – Weight of the edge
-
add_edges(edges, flush=True, *args, **kwargs)¶ Bulk-add edges from a list.
List items can be any of the supported edge types: list, tuple, dict, or Edge. Repeatedly calls add_edge(), so may be inefficient for large amounts of data.
Parameters: edges – List (or iterator) of edges. See add_edge() for details
-
add_mask_description(name, description)¶ Add a mask description to the _mask table and return its ID.
Parameters: - name (str) – name of the mask
- description (str) – description of the mask
Returns: id of the mask
Return type: int
-
add_region(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
Parameters: region – Can be a GenomicRegion, a str in the form ‘<chromosome>:<start>-<end>[:<strand>]’, a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
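To make the string form above concrete, here is a minimal sketch of how such a region string could be split into its fields. The function parse_region and the restriction of the strand to + or - are assumptions made for illustration; this is not the FAN-C parser.

```python
import re

# Hypothetical pattern for '<chromosome>:<start>-<end>[:<strand>]'.
# Assumption: the optional strand is written as '+' or '-'.
_REGION_RE = re.compile(
    r"^(?P<chromosome>[^:]+):(?P<start>\d+)-(?P<end>\d+)(?::(?P<strand>[+-]))?$"
)


def parse_region(text):
    """Split a region string into a dict of its fields."""
    match = _REGION_RE.match(text)
    if match is None:
        raise ValueError(f"not a region string: {text!r}")
    return {
        "chromosome": match.group("chromosome"),
        "start": int(match.group("start")),
        "end": int(match.group("end")),
        "strand": match.group("strand"),  # None when the strand is omitted
    }


region = parse_region("chr18:42000001-43000000")
stranded = parse_region("chr1:100-200:-")
```

The resulting dict has the same keys ('chromosome', 'start', 'end') that the dict form of the region parameter requires.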
-
add_regions(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
Parameters: regions – List (or any iterator) with objects that describe a genomic region. See add_region for options.
-
bias_vector(vector=None)¶ Get or set the vector of region biases in this object.
This internally sets the “bias” attribute of each region in the object.
Parameters: vector – a numpy array with bias values Returns: a numpy array with bias values
-
bin(bin_size, threads=1, chromosomes=None, *args, **kwargs)¶ Map edges in this object to equidistant bins.
Parameters: - bin_size – Bin size in base pairs
- threads – Number of threads used for binning
Returns: Hic object
-
static
bin_intervals(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
Parameters: - intervals – iterator of tuples (start, end, score)
- bins – Number of bins to divide the region into
- interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
- smoothing_window – Size of window (in bins) to smooth scores over
- nan_replacement – NaN values in the scores will be replaced with this value
- zero_to_nan – If True, will convert bins with score 0 to NaN
Returns: iterator of tuples: (start, end, score)
-
static
bin_intervals_equidistant(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
Parameters: - intervals – iterator of tuples (start, end, score)
- bin_size – Size of each bin in base pairs
- interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
- smoothing_window – Size of window (in bins) to smooth scores over
- nan_replacement – NaN values in the scores will be replaced with this value
- zero_to_nan – If True, will convert bins with score 0 to NaN
Returns: iterator of tuples: (start, end, score)
-
bin_size¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
Returns: int
-
binned_regions(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns GenomicRegion objects instead of tuples.
Parameters: - region – String or GenomicRegion object denoting the region to be binned
- bins – Number of bins to divide the region into
- bin_size – Size of each bin (alternative to bins argument)
- smoothing_window – Size of window (in bins) to smooth scores over
- nan_replacement – NaN values in the scores will be replaced with this value
- zero_to_nan – If True, will convert bins with score 0 to NaN
- args – Arguments passed to _region_intervals
- kwargs – Keyword arguments passed to _region_intervals
Returns: iterator of GenomicRegion objects
-
bins_to_distance(bins)¶ Convert fraction of bins to base pairs
Parameters: bins – float, fraction of bins Returns: int, base pairs
-
chromosome_bins¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
The returned start and end indexes are range-compatible, i.e. chromosome bins [0,5] cover bins 0, 1, 2, 3, and 4, not 5.
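As a small illustration of what "range-compatible" means here, the following sketch uses a hand-written dictionary in the shape described above. The chromosome names and bin counts are made up for the example.

```python
# Hypothetical layout: 5 bins on chr18 followed by 3 bins on chr19.
chromosome_bins = {"chr18": [0, 5], "chr19": [5, 8]}

# The end index is exclusive, so it can be passed straight to range().
start, end = chromosome_bins["chr18"]
chr18_bin_indexes = list(range(start, end))

start, end = chromosome_bins["chr19"]
chr19_bin_indexes = list(range(start, end))
```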
-
chromosome_lengths¶ Returns a dictionary of chromosomes and their length in bp.
-
chromosomes()¶ List all chromosomes in this regions table.
Returns: list of chromosome names.
-
close(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in read-only mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
Parameters: - copy_tmp – If False, does not overwrite original with modified file.
- remove_tmp – If False, does not delete temporary copy of file.
-
distance_to_bins(distance)¶ Convert base pairs to fraction of bins.
Parameters: distance – distance in base pairs Returns: float, distance as fraction of bin size
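The two conversions, bins_to_distance and distance_to_bins, can be pictured as simple arithmetic on the bin size. The sketch below is an assumption about that arithmetic (in particular, truncating to int when converting back to base pairs) and takes bin_size as an explicit argument rather than reading it from a Hic object.

```python
def distance_to_bins(distance, bin_size):
    """Convert base pairs to a (possibly fractional) number of bins."""
    return distance / bin_size


def bins_to_distance(bins, bin_size):
    """Convert a (possibly fractional) number of bins to base pairs.

    Assumption: the result is truncated to an int.
    """
    return int(bins * bin_size)


bin_size = 1_000_000  # 1 Mb bins, as in the FAN-C example file name
as_bins = distance_to_bins(2_500_000, bin_size)
as_base_pairs = bins_to_distance(2.5, bin_size)
```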
-
downsample(n, file_name=None)¶ Sample edges from this object.
Sampling is always done on uncorrected Hi-C matrices.
Parameters: - n – Sample size or reference object. If n < 1 will be interpreted as a fraction of total reads in this object.
- file_name – Output file name for down-sampled object.
Returns: RegionPairsTable
-
edge_data(attribute, *args, **kwargs)¶ Iterate over specific edge attribute.
Parameters: - attribute – Name of the attribute, e.g. “weight”
- args – Positional arguments passed to edges()
- kwargs – Keyword arguments passed to edges()
Returns: iterator over edge attribute
-
edge_subset(key=None, *args, **kwargs)¶ Get a subset of edges.
This is an alias for edges().
Returns: generator (Edge)
-
edges¶ Iterate over contacts / edges.
edges() is the central function of RegionPairsContainer. Here, we will use the Hic implementation for demonstration purposes, but the usage is exactly the same for all compatible objects implementing RegionPairsContainer, including JuicerHic and CoolerHic.
    import fanc
    # file from FAN-C examples
    hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")
We can easily find the number of edges in the sample Hic object:
    len(hic.edges)  # 8695
When used in an iterator context, edges() iterates over all edges in the RegionPairsContainer:
    for edge in hic.edges:
        # do something with edge
        print(edge)
        # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
        # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
        # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
        # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
        # ...
Calling edges() as a method has the same effect:
    # note the '()'
    for edge in hic.edges():
        # do something with edge
        print(edge)
        # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
        # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
        # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
        # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
        # ...
Rather than iterate over all edges in the object, we can select only a subset. If the key is a string or a GenomicRegion, all non-zero edges connecting the region described by the key to any other region are returned. If the key is a tuple of strings or GenomicRegion, only edges between the two regions are returned.
    # select all edges between chromosome 19
    # and any other region:
    for edge in hic.edges("chr19"):
        print(edge)
        # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
        # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
        # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
        # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
        # ...

    # select all edges that are only on
    # chromosome 19
    for edge in hic.edges(('chr19', 'chr19')):
        print(edge)
        # 90--116; bias: 0.00021173151730025176; sink_node: chr19:37000001-38000000; source_node: chr19:11000001-12000000; weight: 0.009104455243910825
        # 135--135; bias: 0.00018003890596887822; sink_node: chr19:56000001-57000000; source_node: chr19:56000001-57000000; weight: 0.10028167062466517
        # 123--123; bias: 0.00011063368998965993; sink_node: chr19:44000001-45000000; source_node: chr19:44000001-45000000; weight: 0.1386240135570439
        # 92--93; bias: 0.00040851066434864896; sink_node: chr19:14000001-15000000; source_node: chr19:13000001-14000000; weight: 0.10090213409411629
        # ...

    # select inter-chromosomal edges
    # between chromosomes 18 and 19
    for edge in hic.edges(('chr18', 'chr19')):
        print(edge)
        # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
        # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
        # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
        # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
        # ...
By default, edges() will retrieve all edge attributes, which can be slow when iterating over a lot of edges. This is why all file-based FAN-C RegionPairsContainer objects support lazy loading, where attributes are only read on demand.
    for edge in hic.edges('chr18', lazy=True):
        print(edge.source, edge.sink, edge.weight, edge)
        # 42 42 0.12291311562018173 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #0>
        # 24 28 0.025205961072838057 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #1>
        # 5 76 0.00961709840049876 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #2>
        # 66 68 0.03876763316345468 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #3>
        # ...
Warning
The lazy iterator reuses the LazyEdge object in every iteration, and overwrites the LazyEdge attributes. Therefore do not use lazy iterators if you need to store edge objects for later access. For example, the following code works as expected
    list(hic.edges())
with all Edge objects stored in the list, while this code
    list(hic.edges(lazy=True))
will result in a list of identical LazyEdge objects. Always ensure you do all edge processing in the loop when working with lazy iterators!
When working with normalised contact frequencies, such as obtained through matrix balancing in the example above, edges() automatically returns normalised edge weights. In addition, the bias attribute will (typically) have a value different from 1.
When you are interested in the raw contact frequency, use the norm=False parameter:
    for edge in hic.edges('chr18', lazy=True, norm=False):
        print(edge.source, edge.sink, edge.weight)
        # 42 42 2120.0
        # 24 28 388.0
        # 5 76 94.0
        # 66 68 470.0
        # ...
You can also choose to omit all intra- or inter-chromosomal edges using intra_chromosomal=False or inter_chromosomal=False, respectively.
Returns: Iterator over Edge or equivalent.
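The lazy-iterator warning for edges() can be reproduced without FAN-C. The sketch below uses a stand-in class, ReusedEdge, and a generator, lazy_edges, both invented for this illustration; they only mimic the reuse of one object per iteration and are not the LazyEdge implementation.

```python
class ReusedEdge:
    """Stand-in for a lazily loaded edge; NOT the FAN-C LazyEdge class."""

    def __init__(self):
        self.source = None
        self.sink = None
        self.weight = None


def lazy_edges(rows):
    """Yield the same edge object repeatedly, overwriting its attributes."""
    edge = ReusedEdge()  # created once, reused for every row
    for source, sink, weight in rows:
        edge.source, edge.sink, edge.weight = source, sink, weight
        yield edge


rows = [(42, 42, 2120.0), (24, 28, 388.0), (5, 76, 94.0)]

# Pitfall: the list holds three references to one object,
# which carries the values of the last row only.
stored = list(lazy_edges(rows))

# Safe: the needed values are copied out inside the loop.
weights = [edge.weight for edge in lazy_edges(rows)]
```

Extracting plain values (or building new objects) inside the loop is the pattern the warning recommends.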
-
edges_dict(*args, **kwargs)¶ Edges iterator with access by bracket notation.
This iterator always returns unnormalised edges.
Returns: dict or dict-like iterator
-
expected_values(selected_chromosome=None, norm=True, *args, **kwargs)¶ Calculate the expected values for genomic contacts at all distances.
This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.
It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for inter-chromosomal expected value.
Parameters: - selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
- norm – If False, will calculate the expected values on the unnormalised matrix.
- args – Not used in this context
- kwargs – Not used in this context
Returns: list of intra-chromosomal expected values, dict of intra-chromosomal expected values by chromosome, inter-chromosomal expected value
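The description of expected_values (average weight of region pairs at the same genomic separation, excluding unmappable regions) can be sketched for a single chromosome as follows. The function expected_by_distance and its inputs are hypothetical; the treatment of unmappable regions shown here is one plausible reading of the text, not the FAN-C implementation.

```python
def expected_by_distance(edges, mappable):
    """Average edge weight per bin separation on one chromosome.

    edges:    dict mapping (source, sink) bin indexes to a weight
    mappable: list of booleans, one per bin
    """
    n = len(mappable)
    sums = [0.0] * n
    possible = [0] * n

    # Count possible region pairs per separation, skipping unmappable bins.
    for i in range(n):
        for j in range(i, n):
            if mappable[i] and mappable[j]:
                possible[j - i] += 1

    # Sum observed weights per separation.
    for (source, sink), weight in edges.items():
        if mappable[source] and mappable[sink]:
            sums[abs(sink - source)] += weight

    # Pairs without an edge count as zero weight in the average.
    return [s / p if p > 0 else 0.0 for s, p in zip(sums, possible)]


edges = {(0, 0): 4.0, (1, 1): 6.0, (3, 3): 2.0, (0, 1): 3.0, (0, 3): 1.0}
mappable = [True, True, False, True]  # bin 2 has no contacts
expected = expected_by_distance(edges, mappable)
```

The list index of the result corresponds to the number of separating bins, matching the first element of the returned tuple described above.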
-
expected_values_and_marginals(selected_chromosome=None, norm=True, force=False, *args, **kwargs)¶ Calculate the expected values for genomic contacts at all distances and the whole matrix marginals.
This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.
It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for inter-chromosomal expected value.
Parameters: - selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
- norm – If False, will calculate the expected values on the unnormalised matrix.
- args – Not used in this context
- kwargs – Not used in this context
Returns: list of intra-chromosomal expected values, dict of intra-chromosomal expected values by chromosome, inter-chromosomal expected value
-
filter(edge_filter, queue=False, log_progress=True)¶ Filter edges in this object by using a MaskFilter.
Parameters: - edge_filter – Class implementing MaskFilter.
- queue – If True, filter will be queued and can be executed along with other queued filters using run_queued_filters()
- log_progress – If True, the progress of iterating through all edges will be continuously reported.
-
filter_diagonal(distance=0, queue=False)¶ Convenience function that applies a DiagonalFilter.
Parameters: - distance – Distance from the diagonal up to which matrix entries will be filtered/removed. The default, 0, filters only the diagonal itself.
- queue – If True, filter will be queued and can be executed along with other queued filters using run_queued_filters()
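The effect of the distance parameter can be sketched with plain bin indexes. The function below is a hypothetical stand-in for the validity check of a diagonal filter and assumes that distance is measured in bins; it does not use the FAN-C classes.

```python
def valid_edge(source, sink, distance=0):
    """Return False for edges within `distance` bins of the diagonal.

    Assumption: distance is counted in bins, and an edge is filtered
    when |sink - source| <= distance.
    """
    return abs(sink - source) > distance


edges = [(42, 42), (24, 28), (66, 67), (5, 76)]

# Default: only the diagonal entry (42, 42) is removed.
kept_default = [edge for edge in edges if valid_edge(*edge)]

# distance=1: the first off-diagonal (66, 67) is removed as well.
kept_distance_1 = [edge for edge in edges if valid_edge(*edge, distance=1)]
```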
-
filter_low_coverage_regions(rel_cutoff=None, cutoff=None, queue=False)¶ Convenience function that applies a LowCoverageFilter.
The cutoff can be provided in two ways:
1. As an absolute threshold. Regions with a contact count below this absolute threshold are filtered.
2. As a fraction relative to the median contact count of all regions.
If both are supplied, whichever threshold is lower will be selected.
If no parameter is supplied, rel_cutoff will be chosen as 0.1.
Parameters: - rel_cutoff – A cutoff as a fraction (0-1) of the median contact count of all regions.
- cutoff – A cutoff in absolute contact counts (can be float) below which regions are considered “low coverage”
- queue – If True, filter will be queued and can be executed along with other queued filters using run_queued_filters
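The rules for choosing the low-coverage threshold can be written down as a short sketch. The helper coverage_cutoff is hypothetical, and it assumes the median is taken over all regions as given; the actual LowCoverageFilter may treat empty regions differently.

```python
from statistics import median


def coverage_cutoff(coverage, rel_cutoff=None, cutoff=None):
    """Pick the effective threshold from absolute and relative cutoffs."""
    if rel_cutoff is None and cutoff is None:
        rel_cutoff = 0.1  # documented default when nothing is supplied

    candidates = []
    if cutoff is not None:
        candidates.append(cutoff)
    if rel_cutoff is not None:
        candidates.append(rel_cutoff * median(coverage))

    # If both are supplied, the lower threshold wins.
    return min(candidates)


coverage = [100.0, 120.0, 5.0, 80.0, 110.0]  # contact count per region
threshold = coverage_cutoff(coverage)
low_coverage = [ix for ix, count in enumerate(coverage) if count < threshold]
both = coverage_cutoff(coverage, rel_cutoff=0.5, cutoff=20)
```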
-
find_region(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a query region.
Parameters: query_regions – Region selector string, GenomicRegion, or list of the former
Returns: index (or list of indexes) of the region at the center of the query region
-
flush(silent=False, update_mappability=True)¶ Write data to file and flush buffers.
Parameters: - silent – do not print flush progress
- update_mappability – After writing data, update mappability and expected values
-
get_mask(key)¶ Search _mask table for key and return Mask.
Parameters: key (str or int) – search by mask name (str) or by mask ID (int)
Returns: Mask
-
get_masks(ix)¶ Extract mask IDs encoded in parameter and return masks.
IDs are powers of 2, so a single int field in the table can hold multiple masks by simply adding up the IDs. Similar principle to UNIX chmod (although that uses base 8)
Parameters: ix (int) – integer that is the sum of powers of 2. Note that this value is not necessarily itself a power of 2. Returns: list of Masks extracted from ix Return type: list (Mask)
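The power-of-2 encoding described for get_masks can be decoded with bit operations. The function decode_mask_ids below is a hypothetical helper that only returns the individual IDs; looking up the corresponding Mask entries in the _mask table is not shown.

```python
def decode_mask_ids(ix):
    """Split an integer into the powers of 2 that sum to it."""
    ids = []
    bit = 1
    while bit <= ix:
        if ix & bit:  # this power of 2 is part of the sum
            ids.append(bit)
        bit <<= 1
    return ids


combined = 1 + 4 + 8  # three masks stored in a single int field
mask_ids = decode_mask_ids(combined)
```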
-
intervals(*args, **kwargs)¶ Alias for region_intervals.
-
load_from_hic(hic, threads=1, chromosomes=None, _edges_by_overlap_method=<function _edge_overlap_split_rao>, _regions_soft_max=50000)¶ Load data from another Hic object.
If this object has no associated regions, the regions and contacts of the provided object will simply be copied.
If regions are already present, the contacts of the provided matrix will be binned into the regions of this object using the overlap method provided.
Parameters: - hic – Another Hic object
- threads – Number of parallel processing threads. More threads also means higher memory usage.
- _edges_by_overlap_method – A function that maps reads from one genomic region to others using a supplied overlap map. By default it uses the Rao et al. (2014) method. See _edge_overlap_split_rao()
- _regions_soft_max – Maximum dimension of each processed submatrix per thread. This is a soft maximum, which may be increased as required for very large chromosomes or small bin sizes
-
mappable(region=None)¶ Get the mappability of regions in this object.
A “mappable” region has at least one contact to another region in the genome.
Returns: array where True means mappable and False unmappable
-
marginals(masked=True, *args, **kwargs)¶ Get the marginals vector of this Hic matrix.
Sums up all contacts for each bin of the Hi-C matrix. Unmappable regions will be masked in the returned vector unless the masked parameter is set to False.
By default, corrected matrix entries are summed up. To get uncorrected matrix marginals use norm=False. Generally, all parameters accepted by edges() are supported.
Parameters: - masked – Use a numpy masked array to mask entries corresponding to unmappable regions
- kwargs – Keyword arguments passed to edges()
-
matrix(key=None, log=False, default_value=None, mask=True, log_base=2, *args, **kwargs)¶ Assemble a RegionMatrix from region pairs.
Parameters: - key – Matrix selector. See edges() for all supported key types
- log – If True, log-transform the matrix entries. Also see log_base
- log_base – Base of the log transformation. Default: 2; only used when log=True
- default_value – (optional) set the default value of matrix entries that have no associated edge/contact
- mask – If False, do not mask unmappable regions
- args – Positional arguments passed to regions_and_matrix_entries()
- kwargs – Keyword arguments passed to regions_and_matrix_entries()
Returns: RegionMatrix
-
classmethod
merge(matrices, *args, **kwargs)¶ Merge multiple RegionMatrixContainer objects.
Merging is done by adding the weight of edges in each object.
Parameters: matrices – list of RegionMatrixContainer
Returns: merged RegionMatrixContainer
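Merging by adding edge weights can be pictured with plain dictionaries keyed by (source, sink). The helper merge_edge_weights is hypothetical and assumes all inputs share identical regions, so that bin indexes refer to the same genomic intervals.

```python
from collections import defaultdict


def merge_edge_weights(*edge_dicts):
    """Add up the weights of edges that connect the same pair of bins."""
    merged = defaultdict(float)
    for edges in edge_dicts:
        for pair, weight in edges.items():
            merged[pair] += weight
    return dict(merged)


replicate_1 = {(0, 0): 10.0, (0, 1): 3.0}
replicate_2 = {(0, 1): 2.0, (1, 1): 7.0}
merged = merge_edge_weights(replicate_1, replicate_2)
```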
-
possible_contacts()¶ Calculate the possible number of contacts in the genome.
This calculates the number of potential region pairs in a genome for any possible separation distance, taking into account the existence of unmappable regions.
It will calculate one number for inter-chromosomal pairs, return a list with the number of possible pairs where the list index corresponds to the number of bins separating two regions, and a dictionary of lists for each chromosome.
Returns: possible intra-chromosomal pairs, possible intra-chromosomal pairs by chromosome, possible inter-chromosomal pairs
-
region_bins(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
Parameters: args – provide a GenomicRegion here to get the slice of start and end bins of only this region. To get the slice over all regions leave this blank.
Returns: slice of start and end bin indices
-
region_data(key, value=None)¶ Retrieve or add vector-data to this object. If there is existing data in this object with the same name, it will be replaced
Parameters: - key – Name of the data column
- value – vector with region-based data (one entry per region)
-
region_intervals(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
Parameters: - region – String or GenomicRegion object denoting the region to be binned
- bins – Number of bins to divide the region into
- bin_size – Size of each bin (alternative to bins argument)
- smoothing_window – Size of window (in bins) to smooth scores over
- nan_replacement – NaN values in the scores will be replaced with this value
- zero_to_nan – If True, will convert bins with score 0 to NaN
- args – Arguments passed to _region_intervals
- kwargs – Keyword arguments passed to _region_intervals
Returns: iterator of tuples: (start, end, score)
-
region_subset(region, *args, **kwargs)¶ Takes a GenomicRegion and returns all regions that overlap with the supplied region.
Parameters: region – String or GenomicRegion object for which covered bins will be returned.
-
regions¶ Iterate over genomic regions in this object.
Will return a GenomicRegion object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.
Returns: RegionIter
-
regions_and_edges(key, *args, **kwargs)¶ Convenient access to regions and edges selected by key.
Parameters: - key – Edge selector, see edges()
- args – Positional arguments passed to edges()
- kwargs – Keyword arguments passed to edges()
Returns: list of row regions, list of col regions, iterator over edges
-
regions_and_matrix_entries(key=None, score_field=None, *args, **kwargs)¶ Convenient access to non-zero matrix entries and associated regions.
Parameters: - key – Edge key, see edges()
- oe – If True, will divide observed values by their expected value at the given distance. False by default
- oe_per_chromosome – If True (default), will do a per-chromosome O/E calculation rather than using the whole matrix to obtain expected values
- score_field – (optional) any edge attribute that returns a number can be specified here for filling the matrix. Usually this is defined by the _default_score_field attribute of the matrix class.
- args – Positional arguments passed to edges()
- kwargs – Keyword arguments passed to edges()
Returns: list of row regions, list of col regions, iterator over (i, j, weight) tuples
-
regions_dict¶ Return a dictionary with region index as keys and regions as values.
Returns: dict {region.ix: region, …}
-
static
regions_identical(pairs)¶ Check if the regions in all objects in the list are identical.
Parameters: pairs – list of RegionBased objects
Returns: True if chromosome, start, and end are identical between all regions in the same list positions.
-
run_queued_filters(log_progress=True)¶ Run queued filters.
Parameters: log_progress – If true, process iterating through all edges will be continuously reported.
-
scaling_factor(matrix, weight_column=None)¶ Compute the scaling factor to another matrix.
Calculates the ratio between the number of contacts in this Hic object to the number of contacts in another Hic object.
Parameters: - matrix – A Hic object
- weight_column – Name of the column to calculate the scaling factor on
Returns: float
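The ratio described for scaling_factor can be sketched on plain lists of edge weights. The function below is a hypothetical stand-in; it assumes the factor is the total weight of this object divided by the total weight of the other one, which follows the wording above but is not confirmed against the implementation.

```python
def scaling_factor(this_weights, other_weights):
    """Ratio of total contacts in 'this' object to those in 'other'."""
    return sum(this_weights) / sum(other_weights)


this_weights = [2120.0, 388.0, 94.0, 470.0]    # raw weights from the edges() example
other_weights = [1060.0, 194.0, 47.0, 235.0]   # made-up second sample, half the depth
factor = scaling_factor(this_weights, other_weights)
```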
-
subset(*regions, **kwargs)¶ Subset a Hic object by specifying one or more subset regions.
Parameters: - regions – string or GenomicRegion object(s)
- kwargs – Supports file_name: destination file name of subset Hic object; tmpdir: if True, works in tmp until object is closed. Additional parameters are passed to edges()
Returns: Hic
-
to_bed(file_name, subset=None, **kwargs)¶ Export regions as BED file.
Parameters: - file_name – Path of file to write regions to
- subset – optional GenomicRegion or str to write only regions overlapping this region
- kwargs – Passed to write_bed()
-
to_bigwig(file_name, subset=None, **kwargs)¶ Export regions as BigWig file.
Parameters: - file_name – Path of file to write regions to
- subset – optional GenomicRegion or str to write only regions overlapping this region
- kwargs – Passed to write_bigwig()
-
to_gff(file_name, subset=None, **kwargs)¶ Export regions as GFF file.
Parameters: - file_name – Path of file to write regions to
- subset – optional GenomicRegion or str to write only regions overlapping this region
- kwargs – Passed to write_gff()
-
class
fanc.hic.HicEdgeFilter(hic=None, mask=None)¶ Bases:
fanc.general.MaskFilter
Abstract class that provides filtering functionality for the edges/contacts in a Hic object.
Extends MaskFilter and overrides valid(self, row) to make HicEdge filtering more “natural”.
To create custom filters for the Hic object, extend this class and override the valid_edge(self, edge) method. valid_edge should return False for a specific HicEdge object if the object is supposed to be filtered/masked and True otherwise. See DiagonalFilter for an example.
Pass a custom filter to the filter() method in Hic to apply it.
-
set_hic_object(hic_object)¶ Set the Hic instance to be filtered by this HicEdgeFilter.
Used internally by the Hic instance.
Parameters: hic_object – Hic object
-
valid(row)¶ Map valid_edge to MaskFilter.valid(self, row).
Parameters: row – A pytables Table row. Returns: The boolean value returned by valid_edge.
-
valid_edge(edge)¶ Determine if a HicEdge object is valid or should be filtered.
When implementing a custom HicEdgeFilter, this method must be overridden. It should return False for HicEdge objects that are to be filtered and True otherwise.
Internally, the Hic object will iterate over all HicEdge instances to determine their validity on an individual basis.
Parameters: edge – A HicEdge object
Returns: True if HicEdge is valid, False otherwise
-
-
class
fanc.hic.LegacyHic(file_name=None, mode='a', tmpdir=None, partition_strategy='chromosome', additional_region_fields=None, additional_edge_fields=None, _table_name_regions='nodes', _table_name_edges='edges', _edge_buffer_size='3G')¶ Bases:
fanc.hic.Hic
-
class
ChromosomeDescription¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.
-
class
MaskDescription¶ Bases:
tables.description.IsDescription
-
class
RegionDescription¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table
-
add_contact(contact, *args, **kwargs)¶ Alias for add_edge().
Parameters: - contact – Edge
- args – Positional arguments passed to _add_edge()
- kwargs – Keyword arguments passed to _add_edge()
-
add_contacts(contacts, *args, **kwargs)¶ Alias for add_edges().
-
add_edge(edge, check_nodes_exist=True, *args, **kwargs)¶ Add an edge / contact between two regions to this object.
Parameters: - edge – Edge, dict with at least the attributes source and sink, optionally weight, or a list of length 2 (source, sink) or 3 (source, sink, weight).
- check_nodes_exist – Make sure that there are nodes that match source and sink indexes
- args – Positional arguments passed to _add_edge()
- kwargs – Keyword arguments passed to _add_edge()
-
add_edge_from_dict(edge, *args, **kwargs)¶ Direct method to add an edge from dict input.
Parameters: edge – dict with at least the keys “source” and “sink”. Additional keys will be loaded as edge attributes
-
add_edge_from_edge(edge, *args, **kwargs)¶ Direct method to add an edge from Edge input.
Parameters: edge – Edge
-
add_edge_from_list(edge, *args, **kwargs)¶ Direct method to add an edge from list or tuple input.
Parameters: edge – List or tuple. Should be of length 2 (source, sink) or 3 (source, sink, weight)
-
add_edge_simple(source, sink, weight=None, *args, **kwargs)¶ Direct method to add an edge from source and sink region indexes and an optional weight.
Parameters: - source – Source region index
- sink – Sink region index
- weight – Weight of the edge
-
add_edges(edges, flush=True, *args, **kwargs)¶ Bulk-add edges from a list.
List items can be any of the supported edge types: list, tuple, dict, or Edge. Repeatedly calls add_edge(), so may be inefficient for large amounts of data.
Parameters: edges – List (or iterator) of edges. See add_edge() for details
-
add_mask_description(name, description)¶ Add a mask description to the _mask table and return its ID.
Parameters: - name (str) – name of the mask
- description (str) – description of the mask
Returns: id of the mask
Return type: int
-
add_region(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
Parameters: region – Can be a GenomicRegion, a str in the form ‘<chromosome>:<start>-<end>[:<strand>]’, a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
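As a rough illustration of the string form, the following sketch parses a region string into its fields. The function parse_region and the pattern are hypothetical and not taken from fanc; in particular, restricting the strand to "+" or "-" is an assumption.

```python
import re

# <chromosome>:<start>-<end>[:<strand>]; the strand symbols are an assumption
REGION_PATTERN = re.compile(
    r"^(?P<chromosome>[^:]+):(?P<start>\d+)-(?P<end>\d+)(?::(?P<strand>[+-]))?$"
)


def parse_region(region_string):
    """Split a region string into chromosome, start, end, and optional strand."""
    match = REGION_PATTERN.match(region_string)
    if match is None:
        raise ValueError(f"not a valid region string: {region_string!r}")
    fields = match.groupdict()
    return {
        "chromosome": fields["chromosome"],
        "start": int(fields["start"]),
        "end": int(fields["end"]),
        "strand": fields["strand"],  # None when no strand was given
    }


region = parse_region("chr18:42000001-43000000")
stranded = parse_region("chr1:1-1000:+")
```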
-
add_regions(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
Parameters: regions – List (or any iterator) with objects that describe a genomic region. See add_region() for options.
-
bias_vector()¶ Get or set the vector of region biases in this object.
This internally sets the “bias” attribute of each region in the object.
Parameters: vector – a numpy array with bias values Returns: a numpy array with bias values
-
bin(bin_size, threads=1, chromosomes=None, *args, **kwargs)¶ Map edges in this object to equidistant bins.
Parameters: - bin_size – Bin size in base pairs
- threads – Number of threads used for binning
Returns: Hic object
-
static
bin_intervals(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
Parameters: - intervals – iterator of tuples (start, end, score)
- bins – Number of bins to divide the region into
- interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if the intervals argument does not cover the exact genomic range to be binned.
- smoothing_window – Size of window (in bins) to smooth scores over
- nan_replacement – NaN values in the scores will be replaced with this value
- zero_to_nan – If True, will convert bins with score 0 to NaN
Returns: iterator of tuples: (start, end, score)
-
static
bin_intervals_equidistant(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
Parameters: - intervals – iterator of tuples (start, end, score)
- bin_size – Size of each bin in base pairs
- interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if the intervals argument does not cover the exact genomic range to be binned.
- smoothing_window – Size of window (in bins) to smooth scores over
- nan_replacement – NaN values in the scores will be replaced with this value
- zero_to_nan – If True, will convert bins with score 0 to NaN
Returns: iterator of tuples: (start, end, score)
-
bin_size¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
Returns: int
-
binned_regions(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns GenomicRegion objects instead of tuples.
Parameters: - region – String or GenomicRegion object denoting the region to be binned
- bins – Number of bins to divide the region into
- bin_size – Size of each bin (alternative to bins argument)
- smoothing_window – Size of window (in bins) to smooth scores over
- nan_replacement – NaN values in the scores will be replaced with this value
- zero_to_nan – If True, will convert bins with score 0 to NaN
- args – Arguments passed to _region_intervals
- kwargs – Keyword arguments passed to _region_intervals
Returns: iterator of GenomicRegion objects
-
bins_to_distance(bins)¶ Convert fraction of bins to base pairs
Parameters: bins – float, fraction of bins Returns: int, base pairs
-
chromosome_bins¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
The returned index pairs are range-compatible, i.e. chromosome bins [0,5] cover bins 0, 1, 2, 3, and 4, not 5.
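To make the range-compatible convention concrete, here is a minimal sketch that derives such a dictionary from an ordered list of per-bin chromosome names. The function below is a stand-in written for this illustration, not the fanc implementation, and it assumes that the bins of each chromosome are contiguous.

```python
def chromosome_bins(region_chromosomes):
    """Map each chromosome to a range-compatible [start, end] pair of bin indexes."""
    bins = {}
    for ix, chromosome in enumerate(region_chromosomes):
        if chromosome not in bins:
            # first bin of this chromosome; end is exclusive
            bins[chromosome] = [ix, ix + 1]
        else:
            bins[chromosome][1] = ix + 1
    return bins


# five 1 Mb bins on chr18 followed by three on chr19 (made-up example data)
regions = ["chr18"] * 5 + ["chr19"] * 3
cb = chromosome_bins(regions)
chr18_bin_indexes = list(range(*cb["chr18"]))
```

Because the end index is exclusive, the pair can be passed directly to range() or used as slice bounds.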
-
chromosome_lengths¶ Returns a dictionary of chromosomes and their length in bp.
-
chromosomes()¶ List all chromosomes in this regions table.
Returns: list of chromosome names.
-
close(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in read-only mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
Parameters: - copy_tmp – If False, does not overwrite original with modified file.
- remove_tmp – If False, does not delete temporary copy of file.
-
distance_to_bins(distance)¶ Convert base pairs to fraction of bins.
Parameters: distance – distance in base pairs Returns: float, distance as fraction of bin size
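The relationship between distance_to_bins() and bins_to_distance() can be sketched as a pair of plain functions. These stand-ins take the bin size as an explicit argument and truncate to an integer when converting back to base pairs; both choices are assumptions made for illustration.

```python
def distance_to_bins(distance, bin_size):
    """Convert a distance in base pairs to a (possibly fractional) number of bins."""
    return distance / bin_size


def bins_to_distance(bins, bin_size):
    """Convert a (possibly fractional) number of bins to base pairs."""
    return int(bins * bin_size)


bin_size = 1_000_000
n_bins = distance_to_bins(2_500_000, bin_size)
n_bp = bins_to_distance(n_bins, bin_size)
```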
-
downsample(n, file_name=None)¶ Sample edges from this object.
Sampling is always done on uncorrected Hi-C matrices.
Parameters: - n – Sample size or reference object. If n < 1 will be interpreted as a fraction of total reads in this object.
- file_name – Output file name for down-sampled object.
Returns: RegionPairsTable
-
edge_data(attribute, *args, **kwargs)¶ Iterate over specific edge attribute.
Parameters: - attribute – Name of the attribute, e.g. “weight”
- args – Positional arguments passed to edges()
- kwargs – Keyword arguments passed to edges()
Returns: iterator over edge attribute
-
edge_subset(key=None, *args, **kwargs)¶ Get a subset of edges.
This is an alias for edges().
Returns: generator (Edge)
-
edges¶ Iterate over contacts / edges.
edges() is the central function of RegionPairsContainer. Here, we will use the Hic implementation for demonstration purposes, but the usage is exactly the same for all compatible objects implementing RegionPairsContainer, including JuicerHic and CoolerHic.
    import fanc
    # file from FAN-C examples
    hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")
We can easily find the number of edges in the sample Hic object:
    len(hic.edges)
    # 8695
When used in an iterator context, edges() iterates over all edges in the RegionPairsContainer:
    for edge in hic.edges:
        # do something with edge
        print(edge)
        # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
        # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
        # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
        # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
        # ...
Calling edges() as a method has the same effect:
    # note the '()'
    for edge in hic.edges():
        # do something with edge
        print(edge)
        # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
        # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
        # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
        # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
        # ...
Rather than iterate over all edges in the object, we can select only a subset. If the key is a string or a GenomicRegion, all non-zero edges connecting the region described by the key to any other region are returned. If the key is a tuple of strings or GenomicRegion, only edges between the two regions are returned.
    # select all edges between chromosome 19
    # and any other region:
    for edge in hic.edges("chr19"):
        print(edge)
        # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
        # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
        # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
        # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
        # ...

    # select all edges that are only on
    # chromosome 19
    for edge in hic.edges(('chr19', 'chr19')):
        print(edge)
        # 90--116; bias: 0.00021173151730025176; sink_node: chr19:37000001-38000000; source_node: chr19:11000001-12000000; weight: 0.009104455243910825
        # 135--135; bias: 0.00018003890596887822; sink_node: chr19:56000001-57000000; source_node: chr19:56000001-57000000; weight: 0.10028167062466517
        # 123--123; bias: 0.00011063368998965993; sink_node: chr19:44000001-45000000; source_node: chr19:44000001-45000000; weight: 0.1386240135570439
        # 92--93; bias: 0.00040851066434864896; sink_node: chr19:14000001-15000000; source_node: chr19:13000001-14000000; weight: 0.10090213409411629
        # ...

    # select inter-chromosomal edges
    # between chromosomes 18 and 19
    for edge in hic.edges(('chr18', 'chr19')):
        print(edge)
        # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
        # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
        # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
        # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
        # ...
By default, edges() will retrieve all edge attributes, which can be slow when iterating over a lot of edges. This is why all file-based FAN-C RegionPairsContainer objects support lazy loading, where attributes are only read on demand.
    for edge in hic.edges('chr18', lazy=True):
        print(edge.source, edge.sink, edge.weight, edge)
        # 42 42 0.12291311562018173 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #0>
        # 24 28 0.025205961072838057 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #1>
        # 5 76 0.00961709840049876 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #2>
        # 66 68 0.03876763316345468 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #3>
        # ...
Warning
The lazy iterator reuses the LazyEdge object in every iteration, and overwrites the LazyEdge attributes. Therefore do not use lazy iterators if you need to store edge objects for later access. For example, the following code works as expected: list(hic.edges()), with all Edge objects stored in the list, while this code: list(hic.edges(lazy=True)) will result in a list of identical LazyEdge objects. Always ensure you do all edge processing in the loop when working with lazy iterators!
When working with normalised contact frequencies, such as obtained through matrix balancing in the example above, edges() automatically returns normalised edge weights. In addition, the bias attribute will (typically) have a value different from 1.
When you are interested in the raw contact frequency, use the norm=False parameter:
    for edge in hic.edges('chr18', lazy=True, norm=False):
        print(edge.source, edge.sink, edge.weight)
        # 42 42 2120.0
        # 24 28 388.0
        # 5 76 94.0
        # 66 68 470.0
        # ...
You can also choose to omit all intra- or inter-chromosomal edges using intra_chromosomal=False or inter_chromosomal=False, respectively.
Returns: Iterator over Edge or equivalent.
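The pitfall described in the warning about lazy iterators can be reproduced without any Hi-C file. The sketch below uses a hypothetical ReusedEdge class and lazy_edges generator (neither is part of fanc) that mimic an iterator which overwrites one object on every iteration.

```python
class ReusedEdge:
    """Stand-in for a lazily loaded edge whose attributes get overwritten."""

    def __init__(self):
        self.source = None
        self.sink = None


def lazy_edges(pairs):
    edge = ReusedEdge()  # a single object, reused for every pair
    for source, sink in pairs:
        edge.source, edge.sink = source, sink
        yield edge


pairs = [(42, 42), (24, 28), (5, 76)]

# storing the yielded objects keeps three references to the same object,
# so every entry shows the values of the last iteration
stored = list(lazy_edges(pairs))
stale = [(e.source, e.sink) for e in stored]

# processing inside the loop reads the values while they are current
in_loop = [(e.source, e.sink) for e in lazy_edges(pairs)]
```

Here stale contains the last pair three times, whereas in_loop matches the input. Copying the needed attributes inside the loop is therefore the safe pattern with lazy iteration.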
-
edges_dict(*args, **kwargs)¶ Edges iterator with access by bracket notation.
This iterator always returns unnormalised edges.
Returns: dict or dict-like iterator
-
expected_values(selected_chromosome=None, norm=True, force=False, *args, **kwargs)¶ Calculate the expected values for genomic contacts at all distances.
This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.
It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for inter-chromosomal expected value.
Parameters: - selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
- norm – If False, will calculate the expected values on the unnormalised matrix.
- args – Not used in this context
- kwargs – Not used in this context
Returns: list of intra-chromosomal expected values, dict of intra-chromosomal expected values by chromosome, inter-chromosomal expected value
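The idea of averaging edge weights per genomic separation while accounting for unmappable regions can be sketched for a single chromosome as follows. This is a simplified illustration under stated assumptions, not the fanc implementation: edges are (source, sink, weight) tuples with bin indexes, mappability is a list of booleans, and pairs without an edge count as zero in the average.

```python
def expected_by_distance(edges, mappable):
    """Average weight per bin separation, counting only mappable region pairs."""
    n = len(mappable)

    # sum of observed weights at each separation
    sums = [0.0] * n
    for source, sink, weight in edges:
        sums[abs(sink - source)] += weight

    # number of possible pairs at each separation, skipping unmappable bins
    possible = [0] * n
    for i in range(n):
        if not mappable[i]:
            continue
        for j in range(i, n):
            if mappable[j]:
                possible[j - i] += 1

    return [s / p if p > 0 else 0.0 for s, p in zip(sums, possible)]


mappable = [True, True, False, True]
edges = [
    (0, 0, 4.0), (1, 1, 2.0), (3, 3, 3.0),  # diagonal
    (0, 1, 2.0), (1, 3, 1.0), (0, 3, 0.5),  # off-diagonal
]
expected = expected_by_distance(edges, mappable)
```

The list index corresponds to the number of separating bins, matching the description of the first return value. Bin 2 is unmappable, so only one pair each remains at separations 1, 2, and 3.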
-
expected_values_and_marginals(selected_chromosome=None, norm=True, force=False, *args, **kwargs)¶ Calculate the expected values for genomic contacts at all distances and the whole matrix marginals.
This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.
It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for inter-chromosomal expected value.
Parameters: - selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
- norm – If False, will calculate the expected values on the unnormalised matrix.
- args – Not used in this context
- kwargs – Not used in this context
Returns: list of intra-chromosomal expected values, dict of intra-chromosomal expected values by chromosome, inter-chromosomal expected value
-
filter(edge_filter, queue=False, log_progress=True)¶ Filter edges in this object by using a MaskFilter.
Parameters: - edge_filter – Class implementing MaskFilter.
- queue – If True, filter will be queued and can be executed along with other queued filters using run_queued_filters()
- log_progress – If True, process iterating through all edges will be continuously reported.
-
filter_diagonal(distance=0, queue=False)¶ Convenience function that applies a DiagonalFilter.
Parameters: - distance – Distance from the diagonal up to which matrix entries will be filtered/removed. The default, 0, filters only the diagonal itself.
- queue – If True, filter will be queued and can be executed along with other queued filters using run_queued_filters
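A minimal sketch of the diagonal criterion, assuming the distance is measured in bins between the source and sink region indexes. The function diagonal_valid is hypothetical and only mirrors the described behaviour; it is not the DiagonalFilter implementation.

```python
def diagonal_valid(source, sink, distance=0):
    """Return True if an edge lies further from the diagonal than `distance` bins."""
    return abs(sink - source) > distance


edges = [(42, 42), (24, 28), (92, 93), (5, 76)]

# distance=0 removes only the diagonal itself
kept_default = [e for e in edges if diagonal_valid(*e)]

# distance=1 additionally removes contacts between neighbouring bins
kept_distance_1 = [e for e in edges if diagonal_valid(*e, distance=1)]
```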
-
filter_low_coverage_regions(rel_cutoff=None, cutoff=None, queue=False)¶ Convenience function that applies a LowCoverageFilter.
The cutoff can be provided in two ways:
1. As an absolute threshold. Regions with a contact count below this absolute threshold are filtered.
2. As a fraction relative to the median contact count of all regions.
If both are supplied, whichever threshold is lower will be selected.
If no parameter is supplied, rel_cutoff will be chosen as 0.1.
Parameters: - rel_cutoff – A cutoff as a fraction (0-1) of the median contact count of all regions.
- cutoff – A cutoff in absolute contact counts (can be float) below which regions are considered “low coverage”
- queue – If True, filter will be queued and can be executed along with other queued filters using run_queued_filters
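The cutoff selection rules above can be written down as a short sketch. The function low_coverage_cutoff is hypothetical, and taking the median over all supplied region marginals (including empty regions) is an assumption; the actual filter may compute the median differently.

```python
from statistics import median


def low_coverage_cutoff(marginals, rel_cutoff=None, cutoff=None):
    """Pick the effective cutoff from an absolute and/or relative threshold."""
    if rel_cutoff is None and cutoff is None:
        rel_cutoff = 0.1  # documented default when nothing is supplied

    candidates = []
    if cutoff is not None:
        candidates.append(cutoff)
    if rel_cutoff is not None:
        candidates.append(rel_cutoff * median(marginals))

    # if both are supplied, the lower threshold is used
    return min(candidates)


marginals = [100, 200, 300, 400, 0]  # made-up contact counts per region
default_cutoff = low_coverage_cutoff(marginals)
both_cutoff = low_coverage_cutoff(marginals, rel_cutoff=0.1, cutoff=10)
low_coverage = [ix for ix, m in enumerate(marginals) if m < default_cutoff]
```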
-
find_region(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a query region.
Parameters: query_regions – Region selector string, GenomicRegion, or list of the former Returns: index (or list of indexes) of the region at the center of the query region
-
flush(silent=False, update_mappability=True)¶ Write data to file and flush buffers.
Parameters: - silent – do not print flush progress
- update_mappability – After writing data, update mappability and expected values
-
get_mask(key)¶ Search _mask table for key and return Mask.
Parameters: - key (str) – search by mask name
- key (int) – search by mask ID
Returns: Mask
-
get_masks(ix)¶ Extract mask IDs encoded in parameter and return masks.
IDs are powers of 2, so a single int field in the table can hold multiple masks by simply adding up the IDs. Similar principle to UNIX chmod (although that uses base 8)
Parameters: ix (int) – integer that is the sum of powers of 2. Note that this value is not necessarily itself a power of 2. Returns: list of Masks extracted from ix Return type: list (Mask)
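The power-of-2 encoding can be illustrated with plain integers. The function decode_mask_ids below is a hypothetical stand-in that returns the individual IDs rather than Mask objects.

```python
def decode_mask_ids(ix):
    """Split an integer into the powers of 2 that add up to it."""
    if ix < 0:
        raise ValueError("mask index must be non-negative")
    ids = []
    bit = 0
    while ix >> bit:
        if ix & (1 << bit):
            ids.append(1 << bit)
        bit += 1
    return ids


# 13 = 1 + 4 + 8, so three masks are encoded in this single value
combined = decode_mask_ids(13)
single = decode_mask_ids(8)
none = decode_mask_ids(0)
```

Since every ID occupies its own bit, adding IDs never produces a collision, and a bitwise AND is enough to test for a specific mask.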
-
intervals(*args, **kwargs)¶ Alias for region_intervals.
-
load_from_hic(hic, threads=1, chromosomes=None, _edges_by_overlap_method=<function _edge_overlap_split_rao>, _regions_soft_max=50000)¶ Load data from another Hic object.
If this object has no associated regions, the regions and contacts of the provided object will simply be copied.
If regions are already present, the contacts of the provided matrix will be binned into the regions of this object using the overlap method provided.
Parameters: - hic – Another Hic object
- threads – Number of parallel processing threads. More threads also means higher memory usage.
- _edges_by_overlap_method – A function that maps reads from one genomic region to others using a supplied overlap map. By default it uses the Rao et al. (2014) method. See _edge_overlap_split_rao()
- _regions_soft_max – Maximum dimension of each processed submatrix per thread. This is a soft maximum, which may be increased as required for very large chromosomes or small bin sizes
-
mappable()¶ Get the mappability of regions in this object.
A “mappable” region has at least one contact to another region in the genome.
Returns: array where True means mappable and False unmappable
-
marginals(masked=True, *args, **kwargs)¶ Get the marginals vector of this Hic matrix.
Sums up all contacts for each bin of the Hi-C matrix. Unmappable regions will be masked in the returned vector unless the masked parameter is set to False.
By default, corrected matrix entries are summed up. To get uncorrected matrix marginals use norm=False. Generally, all parameters accepted by edges() are supported.
Parameters: - masked – Use a numpy masked array to mask entries corresponding to unmappable regions
- kwargs – Keyword arguments passed to edges()
-
matrix(key=None, log=False, default_value=None, mask=True, log_base=2, *args, **kwargs)¶ Assemble a RegionMatrix from region pairs.
Parameters: - key – Matrix selector. See edges() for all supported key types
- log – If True, log-transform the matrix entries. Also see log_base
- log_base – Base of the log transformation. Default: 2; only used when log=True
- default_value – (optional) set the default value of matrix entries that have no associated edge/contact
- mask – If False, do not mask unmappable regions
- args – Positional arguments passed to regions_and_matrix_entries()
- kwargs – Keyword arguments passed to regions_and_matrix_entries()
Returns: RegionMatrix
-
classmethod
merge(matrices, *args, **kwargs)¶ Merge multiple RegionMatrixContainer objects.
Merging is done by adding the weight of edges in each object.
Parameters: matrices – list of RegionMatrixContainer Returns: merged RegionMatrixContainer
-
possible_contacts()¶ Calculate the possible number of contacts in the genome.
This calculates the number of potential region pairs in a genome for any possible separation distance, taking into account the existence of unmappable regions.
It will calculate one number for inter-chromosomal pairs, return a list with the number of possible pairs where the list index corresponds to the number of bins separating two regions, and a dictionary of lists for each chromosome.
Returns: possible intra-chromosomal pairs, possible intra-chromosomal pairs by chromosome, possible inter-chromosomal pairs
-
region_bins(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
Parameters: args – provide a GenomicRegion here to get the slice of start and end bins of only this region. To get the slice over all regions leave this blank. Returns: slice of start and end bin indices
-
region_data(key, value=None)¶ Retrieve or add vector-data to this object. If there is existing data in this object with the same name, it will be replaced.
Parameters: - key – Name of the data column
- value – vector with region-based data (one entry per region)
-
region_intervals(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
Parameters: - region – String or GenomicRegion object denoting the region to be binned
- bins – Number of bins to divide the region into
- bin_size – Size of each bin (alternative to bins argument)
- smoothing_window – Size of window (in bins) to smooth scores over
- nan_replacement – NaN values in the scores will be replaced with this value
- zero_to_nan – If True, will convert bins with score 0 to NaN
- args – Arguments passed to _region_intervals
- kwargs – Keyword arguments passed to _region_intervals
Returns: iterator of tuples: (start, end, score)
-
region_subset(region, *args, **kwargs)¶ Takes a GenomicRegion and returns all regions that overlap with the supplied region.
Parameters: region – String or GenomicRegion object for which covered bins will be returned.
-
regions¶ Iterate over genomic regions in this object.
Will return a GenomicRegion object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.
Returns: RegionIter
-
regions_and_edges(key, *args, **kwargs)¶ Convenient access to regions and edges selected by key.
Parameters: - key – Edge selector, see edges()
- args – Positional arguments passed to edges()
- kwargs – Keyword arguments passed to edges()
Returns: list of row regions, list of col regions, iterator over edges
-
regions_and_matrix_entries(key=None, score_field=None, *args, **kwargs)¶ Convenient access to non-zero matrix entries and associated regions.
Parameters: - key – Edge key, see edges()
- oe – If True, will divide observed values by their expected value at the given distance. False by default
- oe_per_chromosome – If True (default), will do a per-chromosome O/E calculation rather than using the whole matrix to obtain expected values
- score_field – (optional) any edge attribute that returns a number can be specified here for filling the matrix. Usually this is defined by the _default_score_field attribute of the matrix class.
- args – Positional arguments passed to edges()
- kwargs – Keyword arguments passed to edges()
Returns: list of row regions, list of col regions, iterator over (i, j, weight) tuples
-
regions_dict¶ Return a dictionary with region index as keys and regions as values.
Returns: dict {region.ix: region, …}
-
static
regions_identical(pairs)¶ Check if the regions in all objects in the list are identical.
Parameters: pairs – list of RegionBased objects Returns: True if chromosome, start, and end are identical between all regions in the same list positions.
-
run_queued_filters(log_progress=True)¶ Run queued filters.
Parameters: log_progress – If true, process iterating through all edges will be continuously reported.
-
scaling_factor(matrix, weight_column=None)¶ Compute the scaling factor to another matrix.
Calculates the ratio between the number of contacts in this Hic object to the number of contacts in another Hic object.
Parameters: - matrix – A Hic object
- weight_column – Name of the column to calculate the scaling factor on
Returns: float
-
subset(*regions, **kwargs)¶ Subset a Hic object by specifying one or more subset regions.
Parameters: - regions – string or GenomicRegion object(s)
- kwargs – Supports file_name: destination file name of subset Hic object; tmpdir: if True works in tmp until object is closed; additional parameters are passed to edges()
Returns: Hic
-
to_bed(file_name, subset=None, **kwargs)¶ Export regions as BED file
Parameters: - file_name – Path of file to write regions to
- subset – optional GenomicRegion or str to write only regions overlapping this region
- kwargs – Passed to write_bed()
-
to_bigwig(file_name, subset=None, **kwargs)¶ Export regions as BigWig file.
Parameters: - file_name – Path of file to write regions to
- subset – optional GenomicRegion or str to write only regions overlapping this region
- kwargs – Passed to write_bigwig()
-
to_gff(file_name, subset=None, **kwargs)¶ Export regions as GFF file
Parameters: - file_name – Path of file to write regions to
- subset – optional GenomicRegion or str to write only regions overlapping this region
- kwargs – Passed to write_gff()
-
class
fanc.hic.LowCoverageFilter(hic_object, cutoff=None, rel_cutoff=None, mask=None)¶ Bases: fanc.hic.HicEdgeFilter
Filter a HicEdge if it connects a region that does not have a contact count larger than a specified cutoff.
If the cutoff is not provided, it is automatically chosen at 10% of the mean contact count of all regions.
-
set_hic_object(hic_object)¶ Set the Hic instance to be filtered by this HicEdgeFilter.
Used internally by Hic instance.
Parameters: hic_object – Hic object
-
valid(row)¶ Map valid_edge to MaskFilter.valid(self, row).
Parameters: row – A pytables Table row. Returns: The boolean value returned by valid_edge.
-
valid_edge(edge)¶ Check if an edge falls into a low-coverage region.
-
-
fanc.hic.ice_balancing(hic, tolerance=0.01, max_iterations=500, whole_matrix=True, inter_chromosomal=True, intra_chromosomal=True, restore_coverage=False, sqrt=True)¶ Apply ICE balancing to Hi-C matrices.
Iteratively calculates and divides by the matrix margins.
Parameters: - hic – Hi-C object
- tolerance – Error tolerance (marginal error)
- max_iterations – Maximum number of iterations to perform to achieve error tolerance
- whole_matrix – Correct the whole matrix at once. If False, correct each chromosome individually.
- inter_chromosomal – Include inter-chromosomal contacts in balancing (only whole matrix)
- intra_chromosomal – Include intra-chromosomal contacts in balancing (only whole matrix)
- restore_coverage – Restore the matrix to its original coverage after balancing, i.e. the sum of contacts in the matrix after balancing remains (roughly) the same
Returns: bias vector
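The phrase "iteratively calculates and divides by the matrix margins" can be illustrated with a generic iterative correction on a small dense matrix. This is a textbook-style sketch in pure Python, not the fanc implementation: the function ice_sketch, its error measure (largest relative deviation from the mean marginal), and the dense list-of-lists input are assumptions, and options such as restore_coverage or sqrt are not modelled.

```python
def ice_sketch(matrix, tolerance=0.01, max_iterations=500):
    """Balance a small symmetric matrix; return (balanced matrix, bias vector)."""
    n = len(matrix)
    balanced = [list(row) for row in matrix]
    bias = [1.0] * n
    for _ in range(max_iterations):
        marginals = [sum(row) for row in balanced]
        nonzero = [m for m in marginals if m > 0]
        if not nonzero:
            break  # empty matrix, nothing to balance
        mean = sum(nonzero) / len(nonzero)
        # stop once all non-empty rows are close enough to the mean marginal
        if max(abs(m - mean) / mean for m in nonzero) < tolerance:
            break
        # rows without contacts keep a neutral factor of 1
        factors = [m / mean if m > 0 else 1.0 for m in marginals]
        for i in range(n):
            for j in range(n):
                balanced[i][j] /= factors[i] * factors[j]
            bias[i] *= factors[i]
    return balanced, bias


raw = [[10.0, 4.0, 2.0], [4.0, 8.0, 6.0], [2.0, 6.0, 20.0]]
balanced, bias = ice_sketch(raw)
row_sums = [sum(row) for row in balanced]
```

After balancing, the row sums agree to within the tolerance, and the raw matrix can be recovered from balanced[i][j] * bias[i] * bias[j]. Setting max_iterations=1 in this sketch performs a single division by the margins, which is the idea behind vanilla coverage normalisation.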
-
fanc.hic.sqrt_vanilla_coverage_norm(*args, **kwargs)¶ Apply vanilla coverage normalisation to Hi-C matrices with sqrt bias vectors.
Identical to ice_balancing with max_iterations set to 1.
Parameters: - args – see ice_balancing
- kwargs – see ice_balancing
Returns: bias vector (numpy)
-
fanc.hic.vanilla_coverage_norm(*args, **kwargs)¶ Apply vanilla coverage normalisation to Hi-C matrices.
Identical to ice_balancing with max_iterations set to 1 and sqrt to False.
Parameters: - args – see ice_balancing
- kwargs – see ice_balancing
Returns: bias vector (numpy)