Aggregate module
class fanc.architecture.aggregate.AggregateMatrix(file_name=None, mode='r', tmpdir=None, x=None, y=None)
Bases: fanc.general.FileGroup
Construct and store aggregate matrices from matrix-based objects.
Methods in this class can be used to generate various kinds of aggregate matrices, constructed by averaging the signal from different regions of a Hi-C (or similar) matrix. Aggregate matrices built from observed/expected data are particularly useful.
Class methods control how exactly an aggregate matrix is constructed:
- AggregateMatrix.from_center() will aggregate Hi-C matrix regions along the diagonal in a fixed window around the region center. This is useful, for example, to observe the signal around TAD boundaries or other local features, such as the start of genes, enhancer locations, …
- AggregateMatrix.from_regions() will extract submatrices using regions of variable size, such as TADs, and interpolate them to the same number of pixels before aggregating them.
- AggregateMatrix.from_center_pairs() will extract arbitrary Hi-C submatrices from a list of region pairs (representing the row and column of the matrix). Each submatrix is centered on its region pair, and a fixed number of pixels around the center is extracted. This is used, for example, to plot aggregate matrices around loops, using the loop anchors as input.
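All three constructors end the same way: many equally sized submatrices are averaged pixel by pixel. The sketch below illustrates only that final step in NumPy. The helper name `aggregate` is hypothetical, and ignoring NaN/inf pixels (for example O/E pixels with zero expected signal) alongside masked ones is an assumption for this illustration, not a description of FAN-C internals.

```python
import numpy as np


def aggregate(components):
    """Pixel-wise mean of equally sized (masked) submatrices.

    Hypothetical helper: masked pixels and NaN/inf values are ignored, so
    each output pixel is the mean over the components where it is defined.
    """
    # Stack the 2D components along a new first axis, keeping any masks
    stack = np.ma.stack([np.ma.asarray(c, dtype=float) for c in components])
    # Treat undefined values as masked too (existing masks are preserved)
    stack = np.ma.masked_invalid(stack)
    # Average over the component axis; masked entries do not contribute
    return stack.mean(axis=0)


a = np.array([[1.0, 2.0], [3.0, np.nan]])
b = np.ma.masked_array([[3.0, 4.0], [5.0, 6.0]],
                       mask=[[False, True], [False, False]])
result = aggregate([a, b])
print(result)  # pixel values [[2, 2], [4, 6]]
```

Using masked arrays rather than plain means keeps a single missing pixel in one component from turning the whole aggregate pixel into NaN.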

close(copy_tmp=True, remove_tmp=True)
Close this HDF5 file and run exit operations.
If the file was opened with tmpdir in read-only mode: close the file and delete the temporary copy.
If the file was opened with tmpdir in write or append mode: replace the original file with the copy and delete the copy.
Parameters:
- copy_tmp – If False, does not overwrite the original with the modified file.
- remove_tmp – If False, does not delete the temporary copy of the file.
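The tmpdir clean-up rules above can be expressed as a small file-handling routine. This is a minimal sketch under stated assumptions: `close_tmp_copy` is a hypothetical function (not part of FAN-C), the HDF5 handle itself is left out, and any mode other than 'r' is treated as write/append.

```python
import os
import shutil
import tempfile


def close_tmp_copy(original_path, tmp_path, mode,
                   copy_tmp=True, remove_tmp=True):
    """Hypothetical sketch of the tmpdir clean-up that close() describes.

    mode 'r' is read-only; any other mode is treated as write/append.
    """
    if mode != 'r' and copy_tmp:
        # Write/append: the (possibly modified) temporary copy replaces the original
        shutil.copyfile(tmp_path, original_path)
    if remove_tmp and os.path.exists(tmp_path):
        # In every mode, the temporary copy is deleted unless remove_tmp=False
        os.remove(tmp_path)


# Demonstration with two throwaway files
workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "aggregate.h5")
working_copy = os.path.join(workdir, "aggregate_tmp.h5")
for path, text in ((original, "old"), (working_copy, "new")):
    with open(path, "w") as f:
        f.write(text)

close_tmp_copy(original, working_copy, mode='a')
with open(original) as f:
    print(f.read())  # new
print(os.path.exists(working_copy))  # False
```

Copying and then deleting, rather than moving, keeps the two flags independent: with remove_tmp=False the original is still updated but the working copy survives.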

components(components=None)
Retrieve or store each individual submatrix composing the aggregate matrix.
Parameters: components – List of (masked) numpy arrays
Returns: List of (masked) numpy arrays

classmethod from_center(matrix, regions, window=200000, rescale=False, scaling_exponent=-0.25, keep_components=True, file_name=None, tmpdir=None, region_viewpoint='center', **kwargs)
Construct an aggregate matrix from square regions along the diagonal with a fixed window size.
By default, the submatrix that is extracted from matrix is centred on the region centre and has a window size specified by window. You can change where the window is centred using region_viewpoint, which can be any of “center”, “start”, “end”, “five_prime”, or “three_prime”. The latter two may be particularly useful for genomic features such as genes.
Example for TAD boundaries:
    import fanc

    hic = fanc.load("/path/to/matrix.hic")
    tad_boundaries = fanc.load("/path/to/tad_boundaries.bed")

    # run aggregate analysis
    am = fanc.AggregateMatrix.from_center(hic, tad_boundaries.regions, window=500000)

    # extract matrix when done
    m = am.matrix()
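How region_viewpoint could map a region to the position the window is centred on is sketched below. The function `viewpoint_position` is hypothetical, and it assumes that “start” and “end” refer to genome coordinates regardless of strand, that “five_prime” and “three_prime” follow the strand, and that unstranded regions behave like forward-strand ones.

```python
def viewpoint_position(start, end, strand="+", viewpoint="center"):
    """Base-pair position that a window would be centred on (sketch).

    Assumptions: start <= end in genome coordinates, and regions without a
    strand ("." or None) are treated as forward-strand.
    """
    forward = strand not in ("-", -1)
    if viewpoint == "center":
        return (start + end) // 2
    if viewpoint == "start":
        return start
    if viewpoint == "end":
        return end
    if viewpoint == "five_prime":
        return start if forward else end
    if viewpoint == "three_prime":
        return end if forward else start
    raise ValueError(f"unknown viewpoint: {viewpoint!r}")


# A gene on the reverse strand: its 5' end is the larger coordinate
print(viewpoint_position(1000, 2000, strand="-", viewpoint="five_prime"))  # 2000
```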
Parameters:
- matrix – An object of type RegionMatrixContainer, such as a Hic matrix
- regions – A list of GenomicRegion objects
- window – A window size in base pairs
- rescale – If True, will use scaling_exponent to artificially rescale the aggregate matrix values using a power law
- scaling_exponent – The power law exponent used if rescale is True
- keep_components – If True (default), will store each submatrix used to generate the aggregate matrix in the AggregateMatrix object, which can be retrieved using AggregateMatrix.components()
- file_name – If provided, stores the aggregate matrix object at this location.
- tmpdir – If True, will work in a temporary directory until the object is closed
- region_viewpoint – Point on which the window is centred. Any of “center”, “start”, “end”, “five_prime”, or “three_prime”
- kwargs – Keyword arguments passed to extract_submatrices()
Returns: aggregate matrix
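The fixed-window extraction can be illustrated on a dense NumPy matrix. This is a minimal sketch, assuming uniform bins (bin i covers [i * bin_size, (i + 1) * bin_size)) and viewpoints already converted to base-pair positions. `center_windows` is a hypothetical helper; FAN-C's own extract_submatrices() additionally deals with region lookups and masking, which are omitted here. In this sketch a window of 500000 bp at a 25 kb bin size would give 500000 // 25000 // 2 = 10 bins on each side of the centre bin, i.e. 21 × 21 pixels.

```python
import numpy as np


def center_windows(matrix, positions, bin_size, window):
    """Average square diagonal windows of `window` bp centred on `positions`.

    Windows that would run off the matrix edge are skipped.
    """
    half = window // bin_size // 2  # bins on each side of the centre bin
    n = matrix.shape[0]
    components = []
    for pos in positions:
        c = pos // bin_size  # bin containing the viewpoint
        lo, hi = c - half, c + half + 1
        if lo < 0 or hi > n:
            continue  # window would leave the matrix
        components.append(matrix[lo:hi, lo:hi])
    if not components:
        raise ValueError("no window fits inside the matrix")
    return np.mean(components, axis=0), components


hic = np.arange(100, dtype=float).reshape(10, 10)
agg, comps = center_windows(hic, positions=[25, 55, 5], bin_size=10, window=40)
print(len(comps), agg.shape)  # 2 (5, 5): the window around position 5 is skipped
```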

classmethod from_center_pairs(hic, pair_regions, window=None, pixels=16, keep_components=True, file_name=None, tmpdir=None, region_viewpoint='center', **kwargs)
Construct an aggregate matrix from pairs of regions.
Parameters:
- hic – A compatible Hi-C matrix
- pair_regions – A list of region pairs
- window – A window size in base pairs
- pixels – The dimension (in pixels) of the output matrix
- keep_components – Keep all submatrices that make up the aggregate matrix
- file_name – Optional path to an output file
- tmpdir – Optional. If True, will work in a temporary directory until the file is closed
- region_viewpoint – Location in each region that is used as the anchor for the extracted matrix. ‘center’ by default; also valid are ‘start’, ‘end’, ‘five_prime’, and ‘three_prime’
- kwargs – Keyword arguments passed on to extract_submatrices()
Returns: aggregate matrix
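Loops sit off the diagonal, so each pair supplies a row anchor and a column anchor. The sketch below assumes both anchors have already been converted to bin indices of a dense matrix. `pair_windows` is a hypothetical helper, and its conventions (the centre pixel at index pixels // 2 for even `pixels`, windows that leave the matrix skipped) are choices made for this illustration rather than documented FAN-C behaviour.

```python
import numpy as np


def pair_windows(matrix, pair_bins, pixels=16):
    """Average pixels x pixels submatrices centred on (row_bin, col_bin) pairs.

    Sketch conventions: the centre pixel sits at index pixels // 2 of each
    window, and pairs whose window leaves the matrix are skipped.
    """
    half = pixels // 2
    n_rows, n_cols = matrix.shape
    components = []
    for row_bin, col_bin in pair_bins:
        r0, c0 = row_bin - half, col_bin - half
        if r0 < 0 or c0 < 0 or r0 + pixels > n_rows or c0 + pixels > n_cols:
            continue
        components.append(matrix[r0:r0 + pixels, c0:c0 + pixels])
    if not components:
        raise ValueError("no pair window fits inside the matrix")
    return np.mean(components, axis=0)


# Off-diagonal "loops" between anchor bins (10, 25) and (12, 30)
hic = np.zeros((40, 40))
hic[10, 25] = hic[12, 30] = 1.0
agg = pair_windows(hic, [(10, 25), (12, 30)], pixels=8)
print(agg[4, 4])  # 1.0: both loops land on the window centre
```

Because every window has the same shape regardless of how far apart the anchors are, no interpolation is needed here, unlike the variable-size regions handled by from_regions().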

classmethod from_regions(hic, tad_regions, pixels=90, rescale=False, scaling_exponent=-0.25, interpolation=0, boundary_mode='reflect', keep_mask=True, absolute_extension=0, relative_extension=1.0, keep_components=True, anti_aliasing=True, file_name=None, tmpdir=None, **kwargs)
Construct an aggregate matrix from variable regions along the diagonal.
For each region in tad_regions, a submatrix is extracted and interpolated so that it is exactly pixels x pixels big. You can expand each region by a relative amount using relative_extension.
Example for aggregate TADs:
    import fanc

    hic = fanc.load("/path/to/matrix.hic")
    tads = fanc.load("/path/to/tads.bed")

    # run aggregate analysis
    am = fanc.AggregateMatrix.from_regions(hic, tads.regions, relative_extension=3.)

    # extract matrix when done
    m = am.matrix()  # 90x90 matrix with aggregate TAD in the centre
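The extend-then-resize idea can be sketched on a dense matrix with regions given as [start_bin, end_bin) bin ranges. Two assumptions are made here and flagged in the code: the extended region is relative_extension times as long as the original and centred on it (so 3.0 puts the TAD in the middle third, consistent with the example above), and resampling uses a simple floor-based nearest-neighbour scheme standing in for interpolation=0. The helpers `extend_bins`, `resize_nearest` and `aggregate_regions` are hypothetical; the higher interpolation orders (bilinear through biquintic) are not implemented.

```python
import numpy as np


def extend_bins(start_bin, end_bin, relative_extension=1.0):
    """Scale a [start_bin, end_bin) region around its centre.

    Assumption: the extended region is relative_extension times as long as
    the original, so 3.0 adds one region length on each side.
    """
    extra = (relative_extension - 1.0) * (end_bin - start_bin) / 2
    return int(round(start_bin - extra)), int(round(end_bin + extra))


def resize_nearest(m, pixels):
    """Order-0 style resampling of a square matrix to pixels x pixels."""
    # Output pixel k samples input row/column floor(k * n / pixels)
    idx = (np.arange(pixels) * m.shape[0]) // pixels
    return m[np.ix_(idx, idx)]


def aggregate_regions(matrix, region_bins, pixels=90, relative_extension=1.0):
    """Average variable-size diagonal submatrices after resizing them."""
    n = matrix.shape[0]
    components = []
    for start_bin, end_bin in region_bins:
        lo, hi = extend_bins(start_bin, end_bin, relative_extension)
        if lo < 0 or hi > n or hi - lo < 1:
            continue  # extended region leaves the matrix (or is empty)
        components.append(resize_nearest(matrix[lo:hi, lo:hi], pixels))
    if not components:
        raise ValueError("no region fits inside the matrix")
    return np.mean(components, axis=0)


# Two "TADs" of different size, each a block of ones on the diagonal
hic = np.zeros((60, 60))
hic[10:20, 10:20] = 1.0  # 10 bins
hic[36:48, 36:48] = 1.0  # 12 bins
agg = aggregate_regions(hic, [(10, 20), (36, 48)], pixels=9,
                        relative_extension=3.0)
print(agg.astype(int))  # ones only in the central 3 x 3 block
```

Both TADs, despite their different sizes, end up occupying the same central pixels, which is what makes averaging them meaningful.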
Parameters:
- hic – An object of type RegionMatrixContainer, such as a Hic matrix
- tad_regions – A list of GenomicRegion objects
- pixels – Number of pixels along each dimension of the aggregate matrix
- rescale – If True, will use scaling_exponent to artificially rescale the aggregate matrix values using a power law
- scaling_exponent – The power law exponent used if rescale is True
- interpolation – Type of interpolation used on each submatrix, in the range 0-5. 0: nearest-neighbor (default), 1: bilinear, 2: biquadratic, 3: bicubic, 4: biquartic, 5: biquintic
- boundary_mode – Points outside the boundaries of the input are filled according to the given mode. Options are constant, edge, symmetric, reflect, and wrap. Affects submatrix interpolation.
- keep_mask – If True (default), masked Hi-C regions will also be interpolated.
- absolute_extension – Absolute number of base pairs by which to expand each region
- relative_extension – Amount by which to expand each region, relative to the size of that region. Values smaller than 1 shrink the region
- keep_components – If True (default), will store each submatrix used to generate the aggregate matrix in the AggregateMatrix object, which can be retrieved using AggregateMatrix.components()
- file_name – If provided, stores the aggregate matrix object at this location.
- tmpdir – If True, will work in a temporary directory until the object is closed
- kwargs – Keyword arguments passed to extract_submatrices()
Returns: aggregate matrix
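The boundary_mode names match the padding modes of numpy.pad, and scikit-image documents its resampling modes as matching numpy.pad; that FAN-C hands boundary_mode to scikit-image is an assumption here. A one-dimensional illustration of how each mode fills two points outside a three-value input:

```python
import numpy as np

row = np.array([1, 2, 3])
# Pad two values on each side to show how each mode fills points outside the input
padded = {mode: np.pad(row, 2, mode=mode)
          for mode in ("constant", "edge", "symmetric", "reflect", "wrap")}
for mode, values in padded.items():
    print(f"{mode:>9}: {values.tolist()}")
# constant: [0, 0, 1, 2, 3, 0, 0]
#     edge: [1, 1, 1, 2, 3, 3, 3]
# symmetric: [2, 1, 1, 2, 3, 3, 2]
#  reflect: [3, 2, 1, 2, 3, 2, 1]
#     wrap: [2, 3, 1, 2, 3, 1, 2]
```

The difference between reflect and symmetric is whether the edge value itself is repeated; for Hi-C submatrices this only affects pixels near the submatrix border during interpolation.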

matrix(m=None)
Retrieve or set the aggregate matrix in this object.
Parameters: m – Numpy matrix
Returns: aggregate matrix

region_pairs(pairs=None)
Retrieve or set the regions used to generate the aggregate matrix.
Parameters: pairs – Iterable of region tuples of the form [(region1, region2), (region3, region4), …]. If None, simply return the region pairs in this object.
Returns: List of region pairs [(region1, region2), (region3, region4), …].