Matrix module

class fanc.matrix.Edge(source, sink, _weight_field='weight', **kwargs)
Bases: object
A contact (an edge) between two genomic regions.

source
The index of the “source” genomic region. By convention, source <= sink.

sink
The index of the “sink” genomic region.

bias
Bias factor obtained via normalisation of the Hi-C matrix.

source_node
The first GenomicRegion in this contact.

sink_node
The second GenomicRegion in this contact.


class fanc.matrix.LazyEdge(row, regions_table=None, _weight_field='weight')
Bases: object
An Edge equivalent supporting lazy loading.

source
The index of the “source” genomic region. By convention, source <= sink.

sink
The index of the “sink” genomic region.

bias
Bias factor obtained via normalisation of the Hi-C matrix.

source_node
The first GenomicRegion in this contact.

sink_node
The second GenomicRegion in this contact.


class fanc.matrix.MutableLazyEdge(row, regions_table=None, _weight_field='weight')
Bases: fanc.matrix.LazyEdge

update()
Write changes made to the PyTables row to file.


class fanc.matrix.RegionMatrix
Bases: numpy.ma.core.MaskedArray
Subclass of masked_array with genomic region support. Objects of this type are returned by matrix(). RegionMatrix supports subsetting by GenomicRegion and by region strings of the form <chromosome>[:<start>-<end>].

    import fanc
    hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")

    m = hic.matrix(('chr18', 'chr18'))
    type(m)  # fanc.matrix.RegionMatrix

    m_sub = m['chr18:1-5mb', 'chr18:1-10mb']
    type(m_sub)  # fanc.matrix.RegionMatrix
    m_sub.shape  # (5, 10)
    m_sub.row_regions  # [chr18:1-1000000, chr18:1000001-2000000,
                       #  chr18:2000001-3000000, chr18:3000001-4000000,
                       #  chr18:4000001-5000000]
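To make the region-string form concrete, here is a minimal sketch of how a string such as 'chr18:1-5mb' could be split into its parts. The helper name parse_region_string and the accepted unit suffixes ("kb", "mb") are assumptions made for illustration; this is not FAN-C's own parser.

```python
import re

# Assumed unit suffixes; only "mb" appears in the example above.
_UNITS = {None: 1, "kb": 1_000, "mb": 1_000_000}
_SPAN = re.compile(r"(\d+)(kb|mb)?-(\d+)(kb|mb)?")


def parse_region_string(region):
    """Hypothetical helper: split '<chromosome>[:<start>-<end>]' into parts."""
    chromosome, _, span = region.partition(":")
    if not span:
        # No coordinates given: the string refers to the whole chromosome.
        return chromosome, None, None
    match = _SPAN.fullmatch(span.lower())
    if match is None:
        raise ValueError(f"cannot parse region string: {region!r}")
    start, start_unit, end, end_unit = match.groups()
    # Each coordinate is scaled by its own (optional) unit suffix.
    return chromosome, int(start) * _UNITS[start_unit], int(end) * _UNITS[end_unit]


print(parse_region_string("chr18:1-5mb"))
print(parse_region_string("chr18"))
```

In this sketch a suffix applies only to the coordinate it is attached to, so '1-5mb' spans positions 1 to 5000000, which matches the row regions listed in the example.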
If an associated row or column region has its valid attribute set to False, the corresponding rows/columns of the RegionMatrix will be masked.
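The effect of invalid regions can be pictured with plain NumPy. The following is only a sketch under stated assumptions: row_valid and col_valid are hypothetical stand-ins for the valid attributes of the row and column regions, and the code does not reproduce how RegionMatrix builds its mask internally.

```python
import numpy as np

# Hypothetical validity flags for 3 row regions and 4 column regions.
row_valid = np.array([True, False, True])
col_valid = np.array([True, True, False, True])

data = np.arange(12, dtype=float).reshape(3, 4)

# An entry is masked if its row region OR its column region is invalid.
mask = ~row_valid[:, None] | ~col_valid[None, :]
m = np.ma.masked_array(data, mask=mask)

print(m)
print(m.count())  # number of entries that remain unmasked
```

Reductions such as sum() or mean() then skip the masked rows and columns automatically.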
row_regions
A list of regions matching the first matrix dimension.

col_regions
A list of regions matching the second matrix dimension.

all(axis=None, out=None, keepdims=<no value>)
Returns True if all elements evaluate to True.
The output array is masked where all the values along the given axis are masked: if the output would have been a scalar and all the values are masked, then the output is masked.
Refer to numpy.all for full documentation.
See also
numpy.ndarray.all()
 corresponding function for ndarrays
numpy.all()
 equivalent function
Examples
>>> np.ma.array([1,2,3]).all()
True
>>> a = np.ma.array([1,2,3], mask=True)
>>> (a.all() is np.ma.masked)
True

anom(axis=None, dtype=None)
Compute the anomalies (deviations from the arithmetic mean) along the given axis.
Returns an array of anomalies, with the same shape as the input and where the arithmetic mean is computed along the given axis.
Parameters:
 axis (int, optional) – Axis over which the anomalies are taken. The default is to use the mean of the flattened array as reference.
 dtype (dtype, optional) – Type to use in computing the variance. For arrays of integer type the default is float32; for arrays of float types it is the same as the array type.
See also
mean()
 Compute the mean of the array.
Examples
>>> a = np.ma.array([1,2,3])
>>> a.anom()
masked_array(data=[-1., 0., 1.], mask=False, fill_value=1e+20)

any(axis=None, out=None, keepdims=<no value>)
Returns True if any of the elements of a evaluate to True.
Masked values are considered as False during computation.
Refer to numpy.any for full documentation.
See also
numpy.ndarray.any()
 corresponding function for ndarrays
numpy.any()
 equivalent function
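As a small illustration of "masked values are considered as False", the sketch below compares a masked and an unmasked array that hold the same data. The variable names are arbitrary.

```python
import numpy as np

# The only truthy element is masked, so it does not count.
partly_masked = np.ma.array([0, 1, 0], mask=[False, True, False])
print(bool(partly_masked.any()))

# Without a mask the same data contains a truthy element.
unmasked = np.ma.array([0, 1, 0])
print(bool(unmasked.any()))
```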

argmax(axis=None, fill_value=None, out=None)
Returns array of indices of the maximum values along the given axis. Masked values are treated as if they had the value fill_value.
Parameters:
 axis ({None, integer}) – If None, the index is into the flattened array, otherwise along the specified axis.
 fill_value (scalar or None, optional) – Value used to fill in the masked values. If None, the output of maximum_fill_value(self._data) is used instead.
 out ({None, array}, optional) – Array into which the result can be placed. Its type is preserved and it must be of the right shape to hold the output.
Returns: index_array
Return type: {integer_array}
Examples
>>> a = np.arange(6).reshape(2,3)
>>> a.argmax()
5
>>> a.argmax(0)
array([1, 1, 1])
>>> a.argmax(1)
array([2, 2])
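The example above uses a plain ndarray, so it does not show the role of fill_value. A minimal sketch with an actual mask, using made-up values, might look like this:

```python
import numpy as np

a = np.ma.array([1, 9, 3], mask=[False, True, False])

# By default the masked 9 is filled with the smallest representable
# value, so it can never be the maximum.
default_idx = a.argmax()

# With an explicit fill_value the masked slot is treated as 100.
forced_idx = a.argmax(fill_value=100)

print(default_idx, forced_idx)
```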

argmin(axis=None, fill_value=None, out=None)
Return array of indices to the minimum values along the given axis.
Parameters:
 axis ({None, integer}) – If None, the index is into the flattened array, otherwise along the specified axis.
 fill_value (scalar or None, optional) – Value used to fill in the masked values. If None, the output of minimum_fill_value(self._data) is used instead.
 out ({None, array}, optional) – Array into which the result can be placed. Its type is preserved and it must be of the right shape to hold the output.
Returns: For multi-dimensional input, a new ndarray of indices to the minimum values along the given axis. Otherwise, a scalar index to the minimum value along the given axis.
Return type: ndarray or scalar
Examples
>>> x = np.ma.array(np.arange(4), mask=[1,1,0,0])
>>> x.shape = (2,2)
>>> x
masked_array(
  data=[[--, --],
        [2, 3]],
  mask=[[ True,  True],
        [False, False]],
  fill_value=999999)
>>> x.argmin(axis=0, fill_value=-1)
array([0, 0])
>>> x.argmin(axis=0, fill_value=9)
array([1, 1])

argpartition(kth, axis=-1, kind='introselect', order=None)
Returns the indices that would partition this array.
Refer to numpy.argpartition for full documentation.
New in version 1.8.0.
See also
numpy.argpartition()
 equivalent function

argsort(axis=<no value>, kind=None, order=None, endwith=True, fill_value=None)
Return an ndarray of indices that sort the array along the specified axis. Masked values are filled beforehand to fill_value.
Parameters:
 axis (int, optional) – Axis along which to sort. If None, the default, the flattened array is used.
 Changed in version 1.13.0: Previously, the default was documented to be -1, but that was in error. At some future date, the default will change to -1, as originally intended. Until then, the axis should be given explicitly when arr.ndim > 1, to avoid a FutureWarning.
 kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, optional) – The sorting algorithm used.
 order (list, optional) – When a is an array with fields defined, this argument specifies which fields to compare first, second, etc. Not all fields need be specified.
 endwith ({True, False}, optional) – Whether missing values (if any) should be treated as the largest values (True) or the smallest values (False). When the array contains unmasked values at the same extremes of the datatype, the ordering of these values and the masked values is undefined.
 fill_value (scalar or None, optional) – Value used internally for the masked values. If fill_value is not None, it supersedes endwith.
Returns: index_array – Array of indices that sort a along the specified axis. In other words, a[index_array] yields a sorted a.
Return type: ndarray, int
See also
ma.MaskedArray.sort()
 Describes sorting algorithms used.
lexsort()
 Indirect stable sort with multiple keys.
numpy.ndarray.sort()
 In-place sort.
Notes
See sort for notes on the different sorting algorithms.
Examples
>>> a = np.ma.array([3,2,1], mask=[False, False, True])
>>> a
masked_array(data=[3, 2, --], mask=[False, False, True], fill_value=999999)
>>> a.argsort()
array([1, 0, 2])
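The endwith parameter decides where masked entries land in the ordering. The sketch below reuses the array from the example above and contrasts the default with endwith=False.

```python
import numpy as np

a = np.ma.array([3, 2, 1], mask=[False, False, True])

# Default (endwith=True): the masked entry (index 2) sorts last.
masked_last = a.argsort()

# endwith=False: the masked entry is treated as the smallest value.
masked_first = a.argsort(endwith=False)

print(masked_last, masked_first)
```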

astype(dtype, order='K', casting='unsafe', subok=True, copy=True)
Copy of the array, cast to a specified type.
Parameters:
 dtype (str or dtype) – Typecode or data-type to which the array is cast.
 order ({'C', 'F', 'A', 'K'}, optional) – Controls the memory layout order of the result. ‘C’ means C order, ‘F’ means Fortran order, ‘A’ means ‘F’ order if all the arrays are Fortran contiguous, ‘C’ order otherwise, and ‘K’ means as close to the order the array elements appear in memory as possible. Default is ‘K’.
 casting ({'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional) – Controls what kind of data casting may occur. Defaults to ‘unsafe’ for backwards compatibility.
  ‘no’ means the data types should not be cast at all.
  ‘equiv’ means only byte-order changes are allowed.
  ‘safe’ means only casts which can preserve values are allowed.
  ‘same_kind’ means only safe casts or casts within a kind, like float64 to float32, are allowed.
  ‘unsafe’ means any data conversions may be done.
 subok (bool, optional) – If True, then sub-classes will be passed-through (default), otherwise the returned array will be forced to be a base-class array.
 copy (bool, optional) – By default, astype always returns a newly allocated array. If this is set to false, and the dtype, order, and subok requirements are satisfied, the input array is returned instead of a copy.
Returns: arr_t – Unless copy is False and the other conditions for returning the input array are satisfied (see description for copy input parameter), arr_t is a new array of the same shape as the input array, with dtype, order given by dtype, order.
Return type: ndarray
Notes
Changed in version 1.17.0: Casting between a simple data type and a structured one is possible only for “unsafe” casting. Casting to multiple fields is allowed, but casting from multiple fields is not.
Changed in version 1.9.0: Casting from numeric to string types in ‘safe’ casting mode requires that the string dtype length is long enough to store the max integer/float value converted.
Raises: ComplexWarning – When casting from complex to float or int. To avoid this, one should use a.real.astype(t).
Examples
>>> x = np.array([1, 2, 2.5])
>>> x
array([1. , 2. , 2.5])
>>> x.astype(int)
array([1, 2, 2])
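The example above uses a plain ndarray. For a masked array the cast applies to the data while the mask is carried along, as this short sketch with made-up values suggests:

```python
import numpy as np

x = np.ma.array([1.7, 2.5, 3.2], mask=[False, True, False])

# Cast the data to integers; the masked slot stays masked.
y = x.astype(int)

print(y)
print(y.mask)
```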

base
Base object if memory is from some other object.
Examples
The base of an array that owns its memory is None:
>>> x = np.array([1,2,3,4])
>>> x.base is None
True
Slicing creates a view, whose memory is shared with x:
>>> y = x[2:]
>>> y.base is x
True

baseclass
Class of the underlying data (read-only).

byteswap(inplace=False)
Swap the bytes of the array elements.
Toggle between low-endian and big-endian data representation by returning a byteswapped array, optionally swapped in-place. Arrays of byte-strings are not swapped. The real and imaginary parts of a complex number are swapped individually.
Parameters: inplace (bool, optional) – If True, swap bytes in-place, default is False.
Returns: out – The byteswapped array. If inplace is True, this is a view to self.
Return type: ndarray
Examples
>>> A = np.array([1, 256, 8755], dtype=np.int16)
>>> list(map(hex, A))
['0x1', '0x100', '0x2233']
>>> A.byteswap(inplace=True)
array([  256,     1, 13090], dtype=int16)
>>> list(map(hex, A))
['0x100', '0x1', '0x3322']
Arrays of byte-strings are not swapped:
>>> A = np.array([b'ceg', b'fac'])
>>> A.byteswap()
array([b'ceg', b'fac'], dtype='|S3')
A.newbyteorder().byteswap() produces an array with the same values but different representation in memory:
>>> A = np.array([1, 2, 3])
>>> A.view(np.uint8)
array([1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0,
       0, 0], dtype=uint8)
>>> A.newbyteorder().byteswap(inplace=True)
array([1, 2, 3])
>>> A.view(np.uint8)
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0,
       0, 3], dtype=uint8)

choose(choices, out=None, mode='raise')
Use an index array to construct a new array from a set of choices.
Refer to numpy.choose for full documentation.
See also
numpy.choose()
 equivalent function

clip(min=None, max=None, out=None, **kwargs)
Return an array whose values are limited to [min, max]. One of max or min must be given.
Refer to numpy.clip for full documentation.
See also
numpy.clip()
 equivalent function
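A brief sketch of clip on a masked array, with arbitrary values: unmasked entries are limited to the given interval and the mask itself is left as it was.

```python
import numpy as np

m = np.ma.array([0.5, 40.0, 7.0, 120.0], mask=[False, False, True, False])

# Limit unmasked values to the interval [1, 100].
clipped = m.clip(min=1.0, max=100.0)

print(clipped)
```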

compress(condition, axis=None, out=None)
Return a where condition is True.
If condition is a MaskedArray, missing values are considered as False.
Parameters:
 condition (var) – Boolean 1-d array selecting which entries to return. If len(condition) is less than the size of a along the axis, then output is truncated to length of condition array.
 axis ({None, int}, optional) – Axis along which the operation must be performed.
 out ({None, ndarray}, optional) – Alternative output array in which to place the result. It must have the same shape as the expected output but the type will be cast if necessary.
Returns: result – A MaskedArray object.
Return type: MaskedArray
Notes
Please note the difference with compressed()! The output of compress() has a mask, the output of compressed() does not.
Examples
>>> x = np.ma.array([[1,2,3],[4,5,6],[7,8,9]], mask=[0] + [1,0]*4)
>>> x
masked_array(
  data=[[1, --, 3],
        [--, 5, --],
        [7, --, 9]],
  mask=[[False,  True, False],
        [ True, False,  True],
        [False,  True, False]],
  fill_value=999999)
>>> x.compress([1, 0, 1])
masked_array(data=[1, 3], mask=[False, False], fill_value=999999)
>>> x.compress([1, 0, 1], axis=1)
masked_array(
  data=[[1, 3],
        [--, --],
        [7, 9]],
  mask=[[False, False],
        [ True,  True],
        [False, False]],
  fill_value=999999)

compressed()
Return all the non-masked data as a 1-D array.
Returns: data – A new ndarray holding the non-masked data is returned.
Return type: ndarray
Notes
The result is not a MaskedArray!
Examples
>>> x = np.ma.array(np.arange(5), mask=[0]*2 + [1]*3)
>>> x.compressed()
array([0, 1])
>>> type(x.compressed())
<class 'numpy.ndarray'>

conj()
Complex-conjugate all elements.
Refer to numpy.conjugate for full documentation.
See also
numpy.conjugate()
 equivalent function

conjugate()
Return the complex conjugate, element-wise.
Refer to numpy.conjugate for full documentation.
See also
numpy.conjugate()
 equivalent function

copy(order='C')
Return a copy of the array.
Parameters: order ({'C', 'F', 'A', 'K'}, optional) – Controls the memory layout of the copy. ‘C’ means C-order, ‘F’ means F-order, ‘A’ means ‘F’ if a is Fortran contiguous, ‘C’ otherwise. ‘K’ means match the layout of a as closely as possible. (Note that this function and numpy.copy() are very similar but have different default values for their order= arguments, and this function always passes sub-classes through.)
See also
numpy.copy()
 Similar function with different default behavior
numpy.copyto()
Notes
This function is the preferred method for creating an array copy. The function numpy.copy() is similar, but it defaults to using order ‘K’, and will not pass sub-classes through by default.
Examples
>>> x = np.array([[1,2,3],[4,5,6]], order='F')
>>> y = x.copy()
>>> x.fill(0)
>>> x
array([[0, 0, 0],
       [0, 0, 0]])
>>> y
array([[1, 2, 3],
       [4, 5, 6]])
>>> y.flags['C_CONTIGUOUS']
True

count(axis=None, keepdims=<no value>)
Count the non-masked elements of the array along the given axis.
Parameters:
 axis (None or int or tuple of ints, optional) – Axis or axes along which the count is performed. The default, None, performs the count over all the dimensions of the input array. axis may be negative, in which case it counts from the last to the first axis.
 New in version 1.10.0.
 If this is a tuple of ints, the count is performed on multiple axes, instead of a single axis or all the axes as before.
 keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the array.
Returns: result – An array with the same shape as the input array, with the specified axis removed. If the array is a 0-d array, or if axis is None, a scalar is returned.
Return type: ndarray or scalar
See also
ma.count_masked()
 Count masked elements in array or along a given axis.
Examples
>>> import numpy.ma as ma
>>> a = ma.arange(6).reshape((2, 3))
>>> a[1, :] = ma.masked
>>> a
masked_array(
  data=[[0, 1, 2],
        [--, --, --]],
  mask=[[False, False, False],
        [ True,  True,  True]],
  fill_value=999999)
>>> a.count()
3
When the axis keyword is specified an array of appropriate size is returned.
>>> a.count(axis=0)
array([1, 1, 1])
>>> a.count(axis=1)
array([3, 0])

ctypes
An object to simplify the interaction of the array with the ctypes module.
This attribute creates an object that makes it easier to use arrays when calling shared libraries with the ctypes module. The returned object has, among others, data, shape, and strides attributes (see Notes below) which themselves return ctypes objects that can be used as arguments to a shared library.
Parameters: None
Returns: c – Possessing attributes data, shape, strides, etc.
Return type: Python object
See also
numpy.ctypeslib
Notes
Below are the public attributes of this object which were documented in “Guide to NumPy” (we have omitted undocumented public attributes, as well as documented private attributes):

_ctypes.data
A pointer to the memory area of the array as a Python integer. This memory area may contain data that is not aligned, or not in correct byte-order. The memory area may not even be writeable. The array flags and data-type of this array should be respected when passing this attribute to arbitrary C-code to avoid trouble that can include Python crashing. User Beware! The value of this attribute is exactly the same as self.__array_interface__['data'][0].
Note that unlike data_as, a reference will not be kept to the array: code like ctypes.c_void_p((a + b).ctypes.data) will result in a pointer to a deallocated array, and should be spelt (a + b).ctypes.data_as(ctypes.c_void_p).

_ctypes.shape
A ctypes array of length self.ndim where the base-type is the C-integer corresponding to dtype('p') on this platform. This base-type could be ctypes.c_int, ctypes.c_long, or ctypes.c_longlong depending on the platform. The c_intp type is defined accordingly in numpy.ctypeslib. The ctypes array contains the shape of the underlying array.
Type: (c_intp*self.ndim)

_ctypes.strides
A ctypes array of length self.ndim where the base-type is the same as for the shape attribute. This ctypes array contains the strides information from the underlying array. This strides information is important for showing how many bytes must be jumped to get to the next element in the array.
Type: (c_intp*self.ndim)

_ctypes.data_as(obj)
Return the data pointer cast to a particular ctypes object. For example, calling self._as_parameter_ is equivalent to self.data_as(ctypes.c_void_p). Perhaps you want to use the data as a pointer to a ctypes array of floating-point data: self.data_as(ctypes.POINTER(ctypes.c_double)).
The returned pointer will keep a reference to the array.

_ctypes.shape_as(obj)
Return the shape tuple as an array of some other ctypes type. For example: self.shape_as(ctypes.c_short).

_ctypes.strides_as(obj)
Return the strides tuple as an array of some other ctypes type. For example: self.strides_as(ctypes.c_longlong).

If the ctypes module is not available, then the ctypes attribute of array objects still returns something useful, but ctypes objects are not returned and errors may be raised instead. In particular, the object will still have the as_parameter attribute which will return an integer equal to the data attribute.
Examples
>>> import ctypes
>>> x = np.array([[0, 1], [2, 3]], dtype=np.int32)
>>> x
array([[0, 1],
       [2, 3]], dtype=int32)
>>> x.ctypes.data
31962608 # may vary
>>> x.ctypes.data_as(ctypes.POINTER(ctypes.c_uint32))
<__main__.LP_c_uint object at 0x7ff2fc1fc200> # may vary
>>> x.ctypes.data_as(ctypes.POINTER(ctypes.c_uint32)).contents
c_uint(0)
>>> x.ctypes.data_as(ctypes.POINTER(ctypes.c_uint64)).contents
c_ulong(4294967296)
>>> x.ctypes.shape
<numpy.core._internal.c_long_Array_2 object at 0x7ff2fc1fce60> # may vary
>>> x.ctypes.strides
<numpy.core._internal.c_long_Array_2 object at 0x7ff2fc1ff320> # may vary


cumprod(axis=None, dtype=None, out=None)
Return the cumulative product of the array elements over the given axis.
Masked values are set to 1 internally during the computation. However, their position is saved, and the result will be masked at the same locations.
Refer to numpy.cumprod for full documentation.
Notes
The mask is lost if out is not a valid MaskedArray!
Arithmetic is modular when using integer types, and no error is raised on overflow.
See also
numpy.ndarray.cumprod()
 corresponding function for ndarrays
numpy.cumprod()
 equivalent function
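Since no example accompanies cumprod, here is a minimal sketch with made-up values. The masked entry contributes a factor of 1 to the running product and stays masked in the result.

```python
import numpy as np

marr = np.ma.array([2, 3, 4, 5], mask=[False, True, False, False])

# Running product over [2, (masked -> 1), 4, 5].
result = marr.cumprod()

print(result)
```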

cumsum(axis=None, dtype=None, out=None)
Return the cumulative sum of the array elements over the given axis.
Masked values are set to 0 internally during the computation. However, their position is saved, and the result will be masked at the same locations.
Refer to numpy.cumsum for full documentation.
Notes
The mask is lost if out is not a valid ma.MaskedArray!
Arithmetic is modular when using integer types, and no error is raised on overflow.
See also
numpy.ndarray.cumsum()
 corresponding function for ndarrays
numpy.cumsum()
 equivalent function
Examples
>>> marr = np.ma.array(np.arange(10), mask=[0,0,0,1,1,1,0,0,0,0])
>>> marr.cumsum()
masked_array(data=[0, 1, 3, --, --, --, 9, 16, 24, 33],
             mask=[False, False, False,  True,  True,  True, False, False,
                   False, False],
       fill_value=999999)

data
Returns the underlying data, as a view of the masked array.
If the underlying data is a subclass of numpy.ndarray, it is returned as such.
>>> x = np.ma.array(np.matrix([[1, 2], [3, 4]]), mask=[[0, 1], [1, 0]])
>>> x.data
matrix([[1, 2],
        [3, 4]])
The type of the data can be accessed through the baseclass attribute.

diagonal(offset=0, axis1=0, axis2=1)
Return specified diagonals. In NumPy 1.9 the returned array is a read-only view instead of a copy as in previous NumPy versions. In a future version the read-only restriction will be removed.
Refer to numpy.diagonal() for full documentation.
See also
numpy.diagonal()
 equivalent function

dot(b, out=None)
Masked dot product of two arrays. Note that out and strict are located in different positions than in ma.dot. In order to maintain compatibility with the functional version, it is recommended that the optional arguments be treated as keyword only. At some point that may be mandatory.
New in version 1.10.0.
Parameters:
 b (masked_array_like) – Inputs array.
 out (masked_array, optional) – Output argument. This must have the exact kind that would be returned if it was not used. In particular, it must have the right type, must be C-contiguous, and its dtype must be the dtype that would be returned for ma.dot(a,b). This is a performance feature. Therefore, if these conditions are not met, an exception is raised, instead of attempting to be flexible.
 strict (bool, optional) – Whether masked data are propagated (True) or set to 0 (False) for the computation. Default is False. Propagating the mask means that if a masked value appears in a row or column, the whole row or column is considered masked.
 New in version 1.10.2.
See also
numpy.ma.dot()
 equivalent function
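The difference between the two strict settings is easiest to see side by side. The sketch below uses small made-up matrices and passes strict as a keyword argument, as the description recommends.

```python
import numpy as np

a = np.ma.array([[1, 2, 3], [4, 5, 6]], mask=[[1, 0, 0], [0, 0, 0]])
b = np.ma.array([[1, 2], [3, 4], [5, 6]], mask=[[1, 0], [0, 0], [0, 0]])

# strict=False (default): masked entries count as 0 in the products.
loose = a.dot(b)

# strict=True: a masked value masks its whole row (in a) or column (in b).
strict_result = a.dot(b, strict=True)

print(loose)
print(strict_result)
```

With strict=True only the bottom-right entry survives here, because the first row of a and the first column of b each contain a masked value.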

dtype
Data-type of the array’s elements.
Parameters: None
Returns: d
Return type: numpy dtype object
See also
numpy.dtype
Examples
>>> x
array([[0, 1],
       [2, 3]])
>>> x.dtype
dtype('int32')
>>> type(x.dtype)
<type 'numpy.dtype'>

dump(file)
Dump a pickle of the array to the specified file. The array can be read back with pickle.load or numpy.load.
Parameters: file (str or Path) – A string naming the dump file.
Changed in version 1.17.0: pathlib.Path objects are now accepted.

dumps()
Returns the pickle of the array as a string. pickle.loads or numpy.loads will convert the string back to an array.
Parameters: None

fill(value)
Fill the array with a scalar value.
Parameters: value (scalar) – All elements of a will be assigned this value.
Examples
>>> a = np.array([1, 2])
>>> a.fill(0)
>>> a
array([0, 0])
>>> a = np.empty(2)
>>> a.fill(1)
>>> a
array([1., 1.])

fill_value
The filling value of the masked array is a scalar. When setting, None will set to a default based on the data type.
Examples
>>> for dt in [np.int32, np.int64, np.float64, np.complex128]:
...     np.ma.array([0, 1], dtype=dt).get_fill_value()
...
999999
999999
1e+20
(1e+20+0j)
>>> x = np.ma.array([0, 1.], fill_value=-np.inf)
>>> x.fill_value
-inf
>>> x.fill_value = np.pi
>>> x.fill_value
3.1415926535897931 # may vary
Reset to default:
>>> x.fill_value = None
>>> x.fill_value
1e+20

filled(fill_value=None)
Return a copy of self, with masked values filled with a given value. However, if there are no masked values to fill, self will be returned instead as an ndarray.
Parameters: fill_value (array_like, optional) – The value to use for invalid entries. Can be scalar or non-scalar. If non-scalar, the resulting ndarray must be broadcastable over input array. Default is None, in which case, the fill_value attribute of the array is used instead.
Returns: filled_array – A copy of self with invalid entries replaced by fill_value (be it the function argument or the attribute of self), or self itself as an ndarray if there are no invalid entries to be replaced.
Return type: ndarray
Notes
The result is not a MaskedArray!
Examples
>>> x = np.ma.array([1,2,3,4,5], mask=[0,0,1,0,1], fill_value=-999)
>>> x.filled()
array([   1,    2, -999,    4, -999])
>>> x.filled(fill_value=1000)
array([   1,    2, 1000,    4, 1000])
>>> type(x.filled())
<class 'numpy.ndarray'>
Subclassing is preserved. This means that if, e.g., the data part of the masked array is a recarray, filled returns a recarray:
>>> x = np.array([(-1, 2), (-3, 4)], dtype='i8,i8').view(np.recarray)
>>> m = np.ma.array(x, mask=[(True, False), (False, True)])
>>> m.filled()
rec.array([(999999,      2), (    -3, 999999)],
          dtype=[('f0', '<i8'), ('f1', '<i8')])

flags
Information about the memory layout of the array.

C_CONTIGUOUS (C)
The data is in a single, C-style contiguous segment.

F_CONTIGUOUS (F)
The data is in a single, Fortran-style contiguous segment.

OWNDATA (O)
The array owns the memory it uses or borrows it from another object.

WRITEABLE (W)
The data area can be written to. Setting this to False locks the data, making it read-only. A view (slice, etc.) inherits WRITEABLE from its base array at creation time, but a view of a writeable array may be subsequently locked while the base array remains writeable. (The opposite is not true, in that a view of a locked array may not be made writeable. However, currently, locking a base object does not lock any views that already reference it, so under that circumstance it is possible to alter the contents of a locked array via a previously created writeable view onto it.) Attempting to change a non-writeable array raises a RuntimeError exception.

ALIGNED (A)
The data and all elements are aligned appropriately for the hardware.

WRITEBACKIFCOPY (X)
This array is a copy of some other array. The C-API function PyArray_ResolveWritebackIfCopy must be called before deallocation so that the base array is updated with the contents of this array.

UPDATEIFCOPY (U)
(Deprecated, use WRITEBACKIFCOPY) This array is a copy of some other array. When this array is deallocated, the base array will be updated with the contents of this array.

FNC
F_CONTIGUOUS and not C_CONTIGUOUS.

FORC
F_CONTIGUOUS or C_CONTIGUOUS (one-segment test).

BEHAVED (B)
ALIGNED and WRITEABLE.

CARRAY (CA)
BEHAVED and C_CONTIGUOUS.

FARRAY (FA)
BEHAVED and F_CONTIGUOUS and not C_CONTIGUOUS.
Notes
The flags object can be accessed dictionary-like (as in a.flags['WRITEABLE']), or by using lowercased attribute names (as in a.flags.writeable). Short flag names are only supported in dictionary access.
Only the WRITEBACKIFCOPY, UPDATEIFCOPY, WRITEABLE, and ALIGNED flags can be changed by the user, via direct assignment to the attribute or dictionary entry, or by calling ndarray.setflags.
The array flags cannot be set arbitrarily:
 UPDATEIFCOPY can only be set False.
 WRITEBACKIFCOPY can only be set False.
 ALIGNED can only be set True if the data is truly aligned.
 WRITEABLE can only be set True if the array owns its own memory or the ultimate owner of the memory exposes a writeable buffer interface or is a string.
Arrays can be both C-style and Fortran-style contiguous simultaneously. This is clear for 1-dimensional arrays, but can also be true for higher dimensional arrays.
Even for contiguous arrays a stride for a given dimension arr.strides[dim] may be arbitrary if arr.shape[dim] == 1 or the array has no elements. It does not generally hold that self.strides[-1] == self.itemsize for C-style contiguous arrays or that self.strides[0] == self.itemsize for Fortran-style contiguous arrays.
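The contiguity notes can be checked on a few small arrays. This is a sketch with arbitrary shapes; it uses both the dictionary-style access with short names and the lowercased attribute names described above.

```python
import numpy as np

one_d = np.arange(4)
column = np.ones((3, 1))
table = np.arange(6).reshape(2, 3)

# A 1-dimensional array is both C- and Fortran-contiguous.
both_1d = one_d.flags.c_contiguous and one_d.flags.f_contiguous

# So is a 2-D array with a length-1 dimension (short names need dict access).
both_column = column.flags["C"] and column.flags["F"]

# A general 2-D array in C order is only C-contiguous ...
c_only = table.flags.c_contiguous and not table.flags.f_contiguous

# ... and its transpose is only Fortran-contiguous (the FNC flag).
f_only = table.T.flags.fnc

print(both_1d, both_column, c_only, f_only)
```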

flat
Return a flat iterator, or set a flattened version of self to value.

flatten(order='C')
Return a copy of the array collapsed into one dimension.
Parameters: order ({'C', 'F', 'A', 'K'}, optional) – ‘C’ means to flatten in row-major (C-style) order. ‘F’ means to flatten in column-major (Fortran-style) order. ‘A’ means to flatten in column-major order if a is Fortran contiguous in memory, row-major order otherwise. ‘K’ means to flatten a in the order the elements occur in memory. The default is ‘C’.
Returns: y – A copy of the input array, flattened to one dimension.
Return type: ndarray
Examples
>>> a = np.array([[1,2], [3,4]])
>>> a.flatten()
array([1, 2, 3, 4])
>>> a.flatten('F')
array([1, 3, 2, 4])

get_fill_value()
The filling value of the masked array is a scalar. When setting, None will set to a default based on the data type.
Examples
>>> for dt in [np.int32, np.int64, np.float64, np.complex128]:
...     np.ma.array([0, 1], dtype=dt).get_fill_value()
...
999999
999999
1e+20
(1e+20+0j)
>>> x = np.ma.array([0, 1.], fill_value=-np.inf)
>>> x.fill_value
-inf
>>> x.fill_value = np.pi
>>> x.fill_value
3.1415926535897931 # may vary
Reset to default:
>>> x.fill_value = None
>>> x.fill_value
1e+20

get_imag()
The imaginary part of the masked array.
This property is a view on the imaginary part of this MaskedArray.
See also
Examples
>>> x = np.ma.array([1+1.j, -2j, 3.45+1.6j], mask=[False, True, False])
>>> x.imag
masked_array(data=[1.0, --, 1.6], mask=[False, True, False], fill_value=1e+20)

get_real()
The real part of the masked array.
This property is a view on the real part of this MaskedArray.
See also
Examples
>>> x = np.ma.array([1+1.j, -2j, 3.45+1.6j], mask=[False, True, False])
>>> x.real
masked_array(data=[1.0, --, 3.45], mask=[False, True, False], fill_value=1e+20)

getfield(dtype, offset=0)
Returns a field of the given array as a certain type.
A field is a view of the array data with a given data-type. The values in the view are determined by the given type and the offset into the current array in bytes. The offset needs to be such that the view dtype fits in the array dtype; for example an array of dtype complex128 has 16-byte elements. If taking a view with a 32-bit integer (4 bytes), the offset needs to be between 0 and 12 bytes.
Parameters:
 dtype (str or dtype) – The data type of the view. The dtype size of the view can not be larger than that of the array itself.
 offset (int) – Number of bytes to skip before beginning the element view.
Examples
>>> x = np.diag([1.+1.j]*2)
>>> x[1, 1] = 2 + 4.j
>>> x
array([[1.+1.j, 0.+0.j],
       [0.+0.j, 2.+4.j]])
>>> x.getfield(np.float64)
array([[1., 0.],
       [0., 2.]])
By choosing an offset of 8 bytes we can select the imaginary part of the array for our view:
>>> x.getfield(np.float64, offset=8)
array([[1., 0.],
       [0., 4.]])

harden_mask()
Force the mask to hard.
Whether the mask of a masked array is hard or soft is determined by its hardmask property. harden_mask sets hardmask to True.
See also
ma.MaskedArray.hardmask()
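What a hard mask means in practice is that assignment cannot unmask an entry. The sketch below, with made-up values, contrasts this with a soft mask; soften_mask() is the standard MaskedArray counterpart of harden_mask(), although it is not listed in this excerpt.

```python
import numpy as np

x = np.ma.array([1, 2, 3], mask=[False, True, False])

# Hard mask: assigning to a masked entry does not unmask it.
x.harden_mask()
x[1] = 99
mask_while_hard = x.mask.tolist()

# Soft mask: the same assignment now unmasks the entry.
x.soften_mask()
x[1] = 99
mask_while_soft = x.mask.tolist()

print(mask_while_hard, mask_while_soft)
```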

hardmask
Hardness of the mask.

ids()
Return the addresses of the data and mask areas.
Parameters: None
Examples
>>> x = np.ma.array([1, 2, 3], mask=[0, 1, 1])
>>> x.ids()
(166670640, 166659832) # may vary
If the array has no mask, the address of nomask is returned. This address is typically not close to the data in memory:
>>> x = np.ma.array([1, 2, 3])
>>> x.ids()
(166691080, 3083169284) # may vary

imag
The imaginary part of the masked array.
This property is a view on the imaginary part of this MaskedArray.
See also
Examples
>>> x = np.ma.array([1+1.j, -2j, 3.45+1.6j], mask=[False, True, False])
>>> x.imag
masked_array(data=[1.0, --, 1.6], mask=[False, True, False], fill_value=1e+20)

iscontiguous
()¶ Return a boolean indicating whether the data is contiguous.
Parameters: None
Examples
>>> x = np.ma.array([1, 2, 3])
>>> x.iscontiguous()
True
iscontiguous returns one of the flags of the masked array:
>>> x.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

item
(*args)¶ Copy an element of an array to a standard Python scalar and return it.
Parameters: *args (Arguments (variable number and type)) –
 none: in this case, the method only works for arrays with one element (a.size == 1), which element is copied into a standard Python scalar object and returned.
 int_type: this argument is interpreted as a flat index into the array, specifying which element to copy and return.
 tuple of int_types: functions as does a single int_type argument, except that the argument is interpreted as an nd-index into the array.
Returns: z – A copy of the specified element of the array as a suitable Python scalar.
Return type: Standard Python scalar object
Notes
When the data type of a is longdouble or clongdouble, item() returns a scalar array object because there is no available Python scalar that would not lose information. Void arrays return a buffer object for item(), unless fields are defined, in which case a tuple is returned.
item is very similar to a[args], except, instead of an array scalar, a standard Python scalar is returned. This can be useful for speeding up access to elements of the array and doing arithmetic on elements of the array using Python’s optimized math.
Examples
>>> np.random.seed(123) >>> x = np.random.randint(9, size=(3, 3)) >>> x array([[2, 2, 6], [1, 3, 6], [1, 0, 1]]) >>> x.item(3) 1 >>> x.item(7) 0 >>> x.item((0, 1)) 2 >>> x.item((2, 2)) 1
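The difference between indexing and item can be checked directly. The following is a small sketch on a plain ndarray (array contents and names are illustrative only): indexing yields a NumPy scalar, whereas item yields a built-in Python object.

```python
import numpy as np

a = np.arange(6, dtype=np.int64).reshape(2, 3)

by_index = a[1, 2]         # NumPy scalar (np.int64)
by_item = a.item((1, 2))   # built-in Python int
by_flat = a.item(4)        # flat index 4 is row 1, column 1
```

A built-in scalar is convenient when the value is passed on to code that does not expect NumPy types, for example JSON serialisation.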

itemset
(*args)¶ Insert scalar into an array (scalar is cast to array’s dtype, if possible)
There must be at least 1 argument, and define the last argument as item. Then, a.itemset(*args) is equivalent to but faster than a[args] = item. The item should be a scalar value and args must select a single item in the array a.
Parameters: *args (Arguments) – If one argument: a scalar, only used in case a is of size 1. If two arguments: the last argument is the value to be set and must be a scalar, the first argument specifies a single array element location. It is either an int or a tuple.
Notes
Compared to indexing syntax, itemset provides some speed increase for placing a scalar into a particular location in an ndarray, if you must do this. However, generally this is discouraged: among other problems, it complicates the appearance of the code. Also, when using itemset (and item) inside a loop, be sure to assign the methods to a local variable to avoid the attribute lookup at each loop iteration.
Examples
>>> np.random.seed(123) >>> x = np.random.randint(9, size=(3, 3)) >>> x array([[2, 2, 6], [1, 3, 6], [1, 0, 1]]) >>> x.itemset(4, 0) >>> x.itemset((2, 2), 9) >>> x array([[2, 2, 6], [1, 0, 6], [1, 0, 9]])

itemsize
¶ Length of one array element in bytes.
Examples
>>> x = np.array([1,2,3], dtype=np.float64) >>> x.itemsize 8 >>> x = np.array([1,2,3], dtype=np.complex128) >>> x.itemsize 16

mask
¶ Current mask.

max
(axis=None, out=None, fill_value=None, keepdims=<no value>)¶ Return the maximum along a given axis.
Parameters:  axis ({None, int}, optional) – Axis along which to operate. By default, axis is None and the flattened input is used.
 out (array_like, optional) – Alternative output array in which to place the result. Must be of the same shape and buffer length as the expected output.
 fill_value (scalar or None, optional) – Value used to fill in the masked values. If None, use the output of maximum_fill_value().
 keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the array.
Returns: amax – New array holding the result. If out was specified, out is returned.
Return type: array_like
See also
ma.maximum_fill_value()
 Returns the maximum filling value for a given datatype.
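As a brief illustration of how masked entries are skipped, here is a sketch on a small hand-made masked array (the values are made up and not taken from any Hi-C data set):

```python
import numpy as np

x = np.ma.array([[1, 9], [5, 2]], mask=[[False, True], [False, False]])

overall = x.max()                     # the masked 9 is skipped
per_row = x.max(axis=1)               # maximum of each row
per_col = x.max(axis=0)               # maximum of each column
kept = x.max(axis=1, keepdims=True)   # reduced axis kept with size one
```

The keepdims=True result has shape (2, 1), so it broadcasts against x, for instance to scale each row by its maximum.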

mean
(axis=None, dtype=None, out=None, keepdims=<no value>)¶ Returns the average of the array elements along given axis.
Masked entries are ignored, and result elements which are not finite will be masked.
Refer to numpy.mean for full documentation.
See also
numpy.ndarray.mean()
 corresponding function for ndarrays
numpy.mean()
 Equivalent function
numpy.ma.average()
 Weighted average.
Examples
>>> a = np.ma.array([1,2,3], mask=[False, False, True])
>>> a
masked_array(data=[1, 2, --],
             mask=[False, False, True],
       fill_value=999999)
>>> a.mean()
1.5

min
(axis=None, out=None, fill_value=None, keepdims=<no value>)¶ Return the minimum along a given axis.
Parameters:  axis ({None, int}, optional) – Axis along which to operate. By default, axis is None and the flattened input is used.
 out (array_like, optional) – Alternative output array in which to place the result. Must be of the same shape and buffer length as the expected output.
 fill_value (scalar or None, optional) – Value used to fill in the masked values. If None, use the output of minimum_fill_value.
 keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the array.
Returns: amin – New array holding the result. If out was specified, out is returned.
Return type: array_like
See also
ma.minimum_fill_value()
 Returns the minimum filling value for a given datatype.

mini
(axis=None)¶ Return the array minimum along the specified axis.
Deprecated since version 1.13.0: This function is identical to both:
 self.min(keepdims=True, axis=axis).squeeze(axis=axis)
 np.ma.minimum.reduce(self, axis=axis)
Typically though, self.min(axis=axis) is sufficient.
Parameters: axis (int, optional) – The axis along which to find the minima. Default is None, in which case the minimum value in the whole array is returned.
Returns: min – If axis is None, the result is a scalar. Otherwise, if axis is given and the array is at least 2-D, the result is a masked array with dimension one smaller than the array on which mini is called.
Return type: scalar or MaskedArray
Examples
>>> x = np.ma.array(np.arange(6), mask=[0 ,1, 0, 0, 0 ,1]).reshape(3, 2)
>>> x
masked_array(
  data=[[0, --],
        [2, 3],
        [4, --]],
  mask=[[False, True],
        [False, False],
        [False, True]],
  fill_value=999999)
>>> x.mini()
masked_array(data=0,
             mask=False,
       fill_value=999999)
>>> x.mini(axis=0)
masked_array(data=[0, 3],
             mask=[False, False],
       fill_value=999999)
>>> x.mini(axis=1)
masked_array(data=[0, 2, 4],
             mask=[False, False, False],
       fill_value=999999)
There is a small difference between mini and min:
>>> x[:,1].mini(axis=0) masked_array(data=3, mask=False, fill_value=999999) >>> x[:,1].min(axis=0) 3

nbytes
¶ Total bytes consumed by the elements of the array.
Notes
Does not include memory consumed by non-element attributes of the array object.
Examples
>>> x = np.zeros((3,5,2), dtype=np.complex128) >>> x.nbytes 480 >>> np.prod(x.shape) * x.itemsize 480

ndim
¶ Number of array dimensions.
Examples
>>> x = np.array([1, 2, 3]) >>> x.ndim 1 >>> y = np.zeros((2, 3, 4)) >>> y.ndim 3

newbyteorder
(new_order='S', /)¶ Return the array with the same data viewed with a different byte order.
Equivalent to:
arr.view(arr.dtype.newbyteorder(new_order))
Changes are also made in all fields and subarrays of the array data type.
Parameters: new_order (string, optional) – Byte order to force; a value from the byte order specifications below. new_order codes can be any of:
 'S' – swap dtype from current to opposite endian
 {'<', 'little'} – little endian
 {'>', 'big'} – big endian
 '=' – native order, equivalent to sys.byteorder
 {'|', 'I'} – ignore (no change to byte order)
The default value (‘S’) results in swapping the current byte order.
Returns: new_arr – New array object with the dtype reflecting given change to the byte order. Return type: array
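The "Equivalent to" form given above can be tried out directly. The sketch below relies only on dtype.newbyteorder and ndarray.view, so it does not depend on the array method itself being available in the installed NumPy version; the sample values are illustrative.

```python
import numpy as np

a = np.array([1, 256], dtype=np.int16)

# Reinterpret the same bytes with the opposite byte order ('S' = swap).
swapped = a.view(a.dtype.newbyteorder("S"))

# The underlying bytes are untouched; only their interpretation changes.
same_bytes = a.tobytes() == swapped.tobytes()
```

Because 1 is 0x0001 and 256 is 0x0100, swapping the byte order exchanges the two values when they are read back.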

nonzero
()¶ Return the indices of unmasked elements that are not zero.
Returns a tuple of arrays, one for each dimension, containing the indices of the non-zero elements in that dimension. The corresponding non-zero values can be obtained with:
a[a.nonzero()]
To group the indices by element, rather than dimension, use instead:
np.transpose(a.nonzero())
The result of this is always a 2-D array, with a row for each non-zero element.
Parameters: None
Returns: tuple_of_arrays – Indices of elements that are non-zero.
Return type: tuple
See also
numpy.nonzero()
 Function operating on ndarrays.
flatnonzero()
 Return indices that are nonzero in the flattened version of the input array.
numpy.ndarray.nonzero()
 Equivalent ndarray method.
count_nonzero()
 Counts the number of nonzero elements in the input array.
Examples
>>> import numpy.ma as ma >>> x = ma.array(np.eye(3)) >>> x masked_array( data=[[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]], mask=False, fill_value=1e+20) >>> x.nonzero() (array([0, 1, 2]), array([0, 1, 2]))
Masked elements are ignored.
>>> x[1, 1] = ma.masked
>>> x
masked_array(
  data=[[1.0, 0.0, 0.0],
        [0.0, --, 0.0],
        [0.0, 0.0, 1.0]],
  mask=[[False, False, False],
        [False, True, False],
        [False, False, False]],
  fill_value=1e+20)
>>> x.nonzero()
(array([0, 2]), array([0, 2]))
Indices can also be grouped by element.
>>> np.transpose(x.nonzero()) array([[0, 0], [2, 2]])
A common use for nonzero is to find the indices of an array where a condition is True. Given an array a, the condition a > 3 is a boolean array and, since False is interpreted as 0, ma.nonzero(a > 3) yields the indices of a where the condition is true.
>>> a = ma.array([[1,2,3],[4,5,6],[7,8,9]])
>>> a > 3
masked_array(
  data=[[False, False, False],
        [ True, True, True],
        [ True, True, True]],
  mask=False,
  fill_value=True)
>>> ma.nonzero(a > 3)
(array([1, 1, 1, 2, 2, 2]), array([0, 1, 2, 0, 1, 2]))
The nonzero method of the condition array can also be called.
>>> (a > 3).nonzero()
(array([1, 1, 1, 2, 2, 2]), array([0, 1, 2, 0, 1, 2]))

partition
(kth, axis=-1, kind='introselect', order=None)¶ Rearranges the elements in the array in such a way that the value of the element in kth position is in the position it would be in a sorted array. All elements smaller than the kth element are moved before this element and all equal or greater are moved behind it. The ordering of the elements in the two partitions is undefined.
New in version 1.8.0.
Parameters:  kth (int or sequence of ints) – Element index to partition by. The kth element value will be in its final sorted position and all smaller elements will be moved before it and all equal or greater elements behind it. The order of all elements in the partitions is undefined. If provided with a sequence of kth it will partition all elements indexed by kth of them into their sorted position at once.
 axis (int, optional) – Axis along which to sort. Default is -1, which means sort along the last axis.
 kind ({'introselect'}, optional) – Selection algorithm. Default is ‘introselect’.
 order (str or list of str, optional) – When a is an array with fields defined, this argument specifies which fields to compare first, second, etc. A single field can be specified as a string, and not all fields need to be specified, but unspecified fields will still be used, in the order in which they come up in the dtype, to break ties.
See also
numpy.partition()
 Return a partitioned copy of an array.
argpartition()
 Indirect partition.
sort()
 Full sort.
Notes
See np.partition for notes on the different algorithms.
Examples
>>> a = np.array([3, 4, 2, 1]) >>> a.partition(3) >>> a array([2, 1, 3, 4])
>>> a.partition((1, 3)) >>> a array([1, 2, 3, 4])
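The guarantee described above (the kth element ends up in its sorted position, with smaller values before it and equal or greater values after it) can be verified with a short sketch. It uses a plain ndarray with made-up values; the order inside each partition is deliberately not checked because it is undefined.

```python
import numpy as np

a = np.array([7, 1, 5, 3, 9, 2])
k = 2
a.partition(k)   # in-place; only position k is guaranteed to be "sorted"

kth_value = int(a[k])
left_ok = bool((a[:k] <= a[k]).all())       # everything before is <= a[k]
right_ok = bool((a[k + 1:] >= a[k]).all())  # everything after is >= a[k]
```

This is typically cheaper than a full sort when only the k smallest values or a single order statistic are needed.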

prod
(axis=None, dtype=None, out=None, keepdims=<no value>)¶ Return the product of the array elements over the given axis.
Masked elements are set to 1 internally for computation.
Refer to numpy.prod for full documentation.
Notes
Arithmetic is modular when using integer types, and no error is raised on overflow.
See also
numpy.ndarray.prod()
 corresponding function for ndarrays
numpy.prod()
 equivalent function

product
(axis=None, dtype=None, out=None, keepdims=<no value>)¶ Return the product of the array elements over the given axis.
Masked elements are set to 1 internally for computation.
Refer to numpy.prod for full documentation.
Notes
Arithmetic is modular when using integer types, and no error is raised on overflow.
See also
numpy.ndarray.prod()
 corresponding function for ndarrays
numpy.prod()
 equivalent function

ptp
(axis=None, out=None, fill_value=None, keepdims=False)¶ Return (maximum - minimum) along the given dimension (i.e. peak-to-peak value).
Warning
ptp preserves the data type of the array. This means the return value for an input of signed integers with n bits (e.g. np.int8, np.int16, etc) is also a signed integer with n bits. In that case, peak-to-peak values greater than 2**(n-1)-1 will be returned as negative values. An example with a workaround is shown below.
Parameters:  axis ({None, int}, optional) – Axis along which to find the peaks. If None (default) the flattened array is used.
 out ({None, array_like}, optional) – Alternative output array in which to place the result. It must have the same shape and buffer length as the expected output but the type will be cast if necessary.
 fill_value (scalar or None, optional) – Value used to fill in the masked values.
 keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the array.
Returns: ptp – A new array holding the result, unless out was specified, in which case a reference to out is returned.
Return type: ndarray
Examples
>>> x = np.ma.MaskedArray([[4, 9, 2, 10], ... [6, 9, 7, 12]])
>>> x.ptp(axis=1) masked_array(data=[8, 6], mask=False, fill_value=999999)
>>> x.ptp(axis=0) masked_array(data=[2, 0, 5, 2], mask=False, fill_value=999999)
>>> x.ptp() 10
This example shows that a negative value can be returned when the input is an array of signed integers.
>>> y = np.ma.MaskedArray([[1, 127],
...                        [0, 127],
...                        [-1, 127],
...                        [-2, 127]], dtype=np.int8)
>>> y.ptp(axis=1)
masked_array(data=[ 126, 127, -128, -127],
             mask=False,
       fill_value=999999,
            dtype=int8)
A workaround is to use the view() method to view the result as unsigned integers with the same bit width:
>>> y.ptp(axis=1).view(np.uint8) masked_array(data=[126, 127, 128, 129], mask=False, fill_value=999999, dtype=uint8)
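An alternative to reinterpreting the result is to widen the data type before calling ptp, so that the subtraction cannot overflow in the first place. This is a sketch of that idea, not something prescribed by the documentation above; the choice of np.int16 as the wider type is an assumption that fits this int8 example.

```python
import numpy as np

y = np.ma.array([[1, 127],
                 [0, 127],
                 [-1, 127],
                 [-2, 127]], dtype=np.int8)

# Convert to a wider signed type first; 127 - (-2) = 129 fits in int16.
wide = y.astype(np.int16).ptp(axis=1)
```

Widening costs a temporary copy of the data, whereas the view-based workaround reuses the existing result buffer.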

put
(indices, values, mode='raise')¶ Set storage-indexed locations to corresponding values.
Sets self._data.flat[n] = values[n] for each n in indices. If values is shorter than indices then it will repeat. If values has some masked values, the initial mask is updated in consequence, else the corresponding values are unmasked.
Parameters:  indices (1D array_like) – Target indices, interpreted as integers.
 values (array_like) – Values to place in self._data copy at target indices.
 mode ({'raise', 'wrap', 'clip'}, optional) – Specifies how out-of-bounds indices will behave. ‘raise’ : raise an error. ‘wrap’ : wrap around. ‘clip’ : clip to the range.
Notes
values can be a scalar or length 1 array.
Examples
>>> x = np.ma.array([[1,2,3],[4,5,6],[7,8,9]], mask=[0] + [1,0]*4)
>>> x
masked_array(
  data=[[1, --, 3],
        [--, 5, --],
        [7, --, 9]],
  mask=[[False, True, False],
        [ True, False, True],
        [False, True, False]],
  fill_value=999999)
>>> x.put([0,4,8],[10,20,30])
>>> x
masked_array(
  data=[[10, --, 3],
        [--, 20, --],
        [7, --, 30]],
  mask=[[False, True, False],
        [ True, False, True],
        [False, True, False]],
  fill_value=999999)
>>> x.put(4,999)
>>> x
masked_array(
  data=[[10, --, 3],
        [--, 999, --],
        [7, --, 30]],
  mask=[[False, True, False],
        [ True, False, True],
        [False, True, False]],
  fill_value=999999)

ravel
(order='C')¶ Returns a 1D version of self, as a view.
Parameters: order ({'C', 'F', 'A', 'K'}, optional) – The elements of a are read using this index order. ‘C’ means to index the elements in C-like order, with the last axis index changing fastest, back to the first axis index changing slowest. ‘F’ means to index the elements in Fortran-like index order, with the first index changing fastest, and the last index changing slowest. Note that the ‘C’ and ‘F’ options take no account of the memory layout of the underlying array, and only refer to the order of axis indexing. ‘A’ means to read the elements in Fortran-like index order if a is Fortran contiguous in memory, C-like order otherwise. ‘K’ means to read the elements in the order they occur in memory, except for reversing the data when strides are negative. By default, ‘C’ index order is used.
Returns: Output view is of shape (self.size,) (or (np.ma.product(self.shape),)).
Return type: MaskedArray
Examples
>>> x = np.ma.array([[1,2,3],[4,5,6],[7,8,9]], mask=[0] + [1,0]*4)
>>> x
masked_array(
  data=[[1, --, 3],
        [--, 5, --],
        [7, --, 9]],
  mask=[[False, True, False],
        [ True, False, True],
        [False, True, False]],
  fill_value=999999)
>>> x.ravel()
masked_array(data=[1, --, 3, --, 5, --, 7, --, 9],
             mask=[False, True, False, True, False, True, False, True, False],
       fill_value=999999)

real
¶ The real part of the masked array.
This property is a view on the real part of this MaskedArray.
See also
Examples
>>> x = np.ma.array([1+1.j, 2j, 3.45+1.6j], mask=[False, True, False]) >>> x.real masked_array(data=[1.0, , 3.45], mask=[False, True, False], fill_value=1e+20)

recordmask
¶ Get or set the mask of the array if it has no named fields. For structured arrays, returns a ndarray of booleans where entries are True if all the fields are masked, False otherwise:
>>> x = np.ma.array([(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)],
...                 mask=[(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)],
...                 dtype=[('a', int), ('b', int)])
>>> x.recordmask
array([False, False, True, False, False])

repeat
(repeats, axis=None)¶ Repeat elements of an array.
Refer to numpy.repeat for full documentation.
See also
numpy.repeat()
 equivalent function

reshape
(*s, **kwargs)¶ Give a new shape to the array without changing its data.
Returns a masked array containing the same data, but with a new shape. The result is a view on the original array; if this is not possible, a ValueError is raised.
Parameters:  shape (int or tuple of ints) – The new shape should be compatible with the original shape. If an integer is supplied, then the result will be a 1D array of that length.
 order ({'C', 'F'}, optional) – Determines whether the array data should be viewed as in C (row-major) or FORTRAN (column-major) order.
Returns: reshaped_array – A new view on the array.
Return type: array
See also
reshape()
 Equivalent function in the masked array module.
numpy.ndarray.reshape()
 Equivalent method on ndarray object.
numpy.reshape()
 Equivalent function in the NumPy module.
Notes
The reshaping operation cannot guarantee that a copy will not be made; to modify the shape in place, use
a.shape = s
Examples
>>> x = np.ma.array([[1,2],[3,4]], mask=[1,0,0,1])
>>> x
masked_array(
  data=[[--, 2],
        [3, --]],
  mask=[[ True, False],
        [False, True]],
  fill_value=999999)
>>> x = x.reshape((4,1))
>>> x
masked_array(
  data=[[--],
        [2],
        [3],
        [--]],
  mask=[[ True],
        [False],
        [False],
        [ True]],
  fill_value=999999)

resize
(newshape, refcheck=True, order=False)¶ Warning
This method does nothing, except raise a ValueError exception. A masked array does not own its data and therefore cannot safely be resized in place. Use the numpy.ma.resize function instead.
This method is difficult to implement safely and may be deprecated in future releases of NumPy.
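The recommended replacement can be sketched as follows, on a small masked array with illustrative values: the method raises ValueError, while the module-level numpy.ma.resize function returns a new array in which data and mask are repeated to fill the requested shape.

```python
import numpy as np

x = np.ma.array([1, 2, 3], mask=[False, True, False])

try:
    x.resize((2, 3))
    raised = False
except ValueError:
    raised = True                    # the method refuses to resize in place

bigger = np.ma.resize(x, (2, 3))     # new array; data and mask are repeated
```

The original array x is left unchanged by np.ma.resize.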

round
(decimals=0, out=None)¶ Return each element rounded to the given number of decimals.
Refer to numpy.around for full documentation.
See also
numpy.ndarray.round()
 corresponding function for ndarrays
numpy.around()
 equivalent function

searchsorted
(v, side='left', sorter=None)¶ Find indices where elements of v should be inserted in a to maintain order.
For full documentation, see numpy.searchsorted
See also
numpy.searchsorted()
 equivalent function

setfield
(val, dtype, offset=0)¶ Put a value into a specified place in a field defined by a datatype.
Place val into a’s field defined by dtype and beginning offset bytes into the field.
Parameters:  val (object) – Value to be placed in field.
 dtype (dtype object) – Datatype of the field in which to place val.
 offset (int, optional) – The number of bytes into the field at which to place val.
Returns: None
See also
getfield
Examples
>>> x = np.eye(3)
>>> x.getfield(np.float64)
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
>>> x.setfield(3, np.int32)
>>> x.getfield(np.int32)
array([[3, 3, 3],
       [3, 3, 3],
       [3, 3, 3]], dtype=int32)
>>> x
array([[1.0e+000, 1.5e-323, 1.5e-323],
       [1.5e-323, 1.0e+000, 1.5e-323],
       [1.5e-323, 1.5e-323, 1.0e+000]])
>>> x.setfield(np.eye(3), np.int32)
>>> x
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

setflags
(write=None, align=None, uic=None)¶ Set array flags WRITEABLE, ALIGNED, (WRITEBACKIFCOPY and UPDATEIFCOPY), respectively.
These Boolean-valued flags affect how numpy interprets the memory area used by a (see Notes below). The ALIGNED flag can only be set to True if the data is actually aligned according to the type. The WRITEBACKIFCOPY and (deprecated) UPDATEIFCOPY flags can never be set to True. The flag WRITEABLE can only be set to True if the array owns its own memory, or the ultimate owner of the memory exposes a writeable buffer interface, or is a string. (The exception for string is made so that unpickling can be done without copying memory.)
Parameters:  write (bool, optional) – Describes whether or not a can be written to.
 align (bool, optional) – Describes whether or not a is aligned properly for its type.
 uic (bool, optional) – Describes whether or not a is a copy of another “base” array.
Notes
Array flags provide information about how the memory area used for the array is to be interpreted. There are 7 Boolean flags in use, only four of which can be changed by the user: WRITEBACKIFCOPY, UPDATEIFCOPY, WRITEABLE, and ALIGNED.
WRITEABLE (W) the data area can be written to;
ALIGNED (A) the data and strides are aligned appropriately for the hardware (as determined by the compiler);
UPDATEIFCOPY (U) (deprecated), replaced by WRITEBACKIFCOPY;
WRITEBACKIFCOPY (X) this array is a copy of some other array (referenced by .base). When the C-API function PyArray_ResolveWritebackIfCopy is called, the base array will be updated with the contents of this array.
All flags can be accessed using the single (upper case) letter as well as the full name.
Examples
>>> y = np.array([[3, 1, 7], ... [2, 0, 0], ... [8, 5, 9]]) >>> y array([[3, 1, 7], [2, 0, 0], [8, 5, 9]]) >>> y.flags C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : True WRITEABLE : True ALIGNED : True WRITEBACKIFCOPY : False UPDATEIFCOPY : False >>> y.setflags(write=0, align=0) >>> y.flags C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : True WRITEABLE : False ALIGNED : False WRITEBACKIFCOPY : False UPDATEIFCOPY : False >>> y.setflags(uic=1) Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: cannot set WRITEBACKIFCOPY flag to True

shape
¶ Tuple of array dimensions.
The shape property is usually used to get the current shape of an array, but may also be used to reshape the array in-place by assigning a tuple of array dimensions to it. As with numpy.reshape, one of the new shape dimensions can be -1, in which case its value is inferred from the size of the array and the remaining dimensions. Reshaping an array in-place will fail if a copy is required.
Examples
>>> x = np.array([1, 2, 3, 4])
>>> x.shape
(4,)
>>> y = np.zeros((2, 3, 4))
>>> y.shape
(2, 3, 4)
>>> y.shape = (3, 8)
>>> y
array([[ 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0.]])
>>> y.shape = (3, 6)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: total size of new array must be unchanged
>>> np.zeros((4,2))[::2].shape = (-1,)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: Incompatible shape for in-place modification. Use `.reshape()` to make a copy with the desired shape.
See also
numpy.reshape
 similar function
ndarray.reshape
 similar method

sharedmask
¶ Share status of the mask (read-only).

shrink_mask
()¶ Reduce a mask to nomask when possible.
Parameters: None
Returns: None
Examples
>>> x = np.ma.array([[1,2 ], [3, 4]], mask=[0]*4)
>>> x.mask
array([[False, False],
       [False, False]])
>>> x.shrink_mask()
masked_array(
  data=[[1, 2],
        [3, 4]],
  mask=False,
  fill_value=999999)
>>> x.mask
False

size
¶ Number of elements in the array.
Equal to np.prod(a.shape), i.e., the product of the array’s dimensions.
Notes
a.size returns a standard arbitrary precision Python integer. This may not be the case with other methods of obtaining the same value (like the suggested np.prod(a.shape), which returns an instance of np.int_), and may be relevant if the value is used further in calculations that may overflow a fixed size integer type.
Examples
>>> x = np.zeros((3, 5, 2), dtype=np.complex128) >>> x.size 30 >>> np.prod(x.shape) 30

soften_mask
()¶ Force the mask to soft.
Whether the mask of a masked array is hard or soft is determined by its hardmask property. soften_mask sets hardmask to False.
See also
ma.MaskedArray.hardmask()

sort
(axis=-1, kind=None, order=None, endwith=True, fill_value=None)¶ Sort the array, in-place
Parameters:  a (array_like) – Array to be sorted.
 axis (int, optional) – Axis along which to sort. If None, the array is flattened before sorting. The default is -1, which sorts along the last axis.
 kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, optional) – The sorting algorithm used.
 order (list, optional) – When a is a structured array, this argument specifies which fields to compare first, second, and so on. This list does not need to include all of the fields.
 endwith ({True, False}, optional) – Whether missing values (if any) should be treated as the largest values (True) or the smallest values (False) When the array contains unmasked values sorting at the same extremes of the datatype, the ordering of these values and the masked values is undefined.
 fill_value (scalar or None, optional) – Value used internally for the masked values.
If fill_value is not None, it supersedes endwith.
Returns: sorted_array – Array of the same type and shape as a.
Return type: ndarray
See also
numpy.ndarray.sort()
 Method to sort an array in-place.
argsort()
 Indirect sort.
lexsort()
 Indirect stable sort on multiple keys.
searchsorted()
 Find elements in a sorted array.
Notes
See sort for notes on the different sorting algorithms.
Examples
>>> a = np.ma.array([1, 2, 5, 4, 3],mask=[0, 1, 0, 1, 0])
>>> # Default
>>> a.sort()
>>> a
masked_array(data=[1, 3, 5, --, --],
             mask=[False, False, False, True, True],
       fill_value=999999)
>>> a = np.ma.array([1, 2, 5, 4, 3],mask=[0, 1, 0, 1, 0])
>>> # Put missing values in the front
>>> a.sort(endwith=False)
>>> a
masked_array(data=[--, --, 1, 3, 5],
             mask=[ True, True, False, False, False],
       fill_value=999999)
>>> a = np.ma.array([1, 2, 5, 4, 3],mask=[0, 1, 0, 1, 0])
>>> # fill_value takes over endwith
>>> a.sort(endwith=False, fill_value=3)
>>> a
masked_array(data=[1, --, --, 3, 5],
             mask=[False, True, True, False, False],
       fill_value=999999)

squeeze
(axis=None)¶ Remove axes of length one from a.
Refer to numpy.squeeze for full documentation.
See also
numpy.squeeze()
 equivalent function

std
(axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>)¶ Returns the standard deviation of the array elements along given axis.
Masked entries are ignored.
Refer to numpy.std for full documentation.
See also
numpy.ndarray.std()
 corresponding function for ndarrays
numpy.std()
 Equivalent function

strides
¶ Tuple of bytes to step in each dimension when traversing an array.
The byte offset of element (i[0], i[1], ..., i[n]) in an array a is:
offset = sum(np.array(i) * a.strides)
A more detailed explanation of strides can be found in the “ndarray.rst” file in the NumPy reference guide.
Notes
Imagine an array of 32-bit integers (each 4 bytes):
x = np.array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]], dtype=np.int32)
This array is stored in memory as 40 bytes, one after the other (known as a contiguous block of memory). The strides of an array tell us how many bytes we have to skip in memory to move to the next position along a certain axis. For example, we have to skip 4 bytes (1 value) to move to the next column, but 20 bytes (5 values) to get to the same position in the next row. As such, the strides for the array x will be (20, 4).
See also
numpy.lib.stride_tricks.as_strided
Examples
>>> y = np.reshape(np.arange(2*3*4), (2,3,4)) >>> y array([[[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]], [[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]]) >>> y.strides (48, 16, 4) >>> y[1,1,1] 17 >>> offset=sum(y.strides * np.array((1,1,1))) >>> offset/y.itemsize 17
>>> x = np.reshape(np.arange(5*6*7*8), (5,6,7,8)).transpose(2,3,1,0) >>> x.strides (32, 4, 224, 1344) >>> i = np.array([3,5,2,2]) >>> offset = sum(i * x.strides) >>> x[3,5,2,2] 813 >>> offset / x.itemsize 813

sum
(axis=None, dtype=None, out=None, keepdims=<no value>)¶ Return the sum of the array elements over the given axis.
Masked elements are set to 0 internally.
Refer to numpy.sum for full documentation.
See also
numpy.ndarray.sum()
 corresponding function for ndarrays
numpy.sum()
 equivalent function
Examples
>>> x = np.ma.array([[1,2,3],[4,5,6],[7,8,9]], mask=[0] + [1,0]*4)
>>> x
masked_array(
  data=[[1, --, 3],
        [--, 5, --],
        [7, --, 9]],
  mask=[[False, True, False],
        [ True, False, True],
        [False, True, False]],
  fill_value=999999)
>>> x.sum()
25
>>> x.sum(axis=1)
masked_array(data=[4, 5, 16],
             mask=[False, False, False],
       fill_value=999999)
>>> x.sum(axis=0)
masked_array(data=[8, 5, 12],
             mask=[False, False, False],
       fill_value=999999)
>>> print(type(x.sum(axis=0, dtype=np.int64)[0]))
<class 'numpy.int64'>

swapaxes
(axis1, axis2)¶ Return a view of the array with axis1 and axis2 interchanged.
Refer to numpy.swapaxes for full documentation.
See also
numpy.swapaxes()
 equivalent function

take
(indices, axis=None, out=None, mode='raise')¶
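The entry above lists only the signature. As a hedged illustration, assuming the method behaves like numpy.take while carrying the mask along with the selected elements, the following sketch picks columns and flat positions from a small masked array (values are made up):

```python
import numpy as np

x = np.ma.array([[1, 2, 3], [4, 5, 6]],
                mask=[[False, True, False], [False, False, True]])

cols = x.take([0, 2], axis=1)   # columns 0 and 2 of every row
flat = x.take([1, 4])           # axis=None: indices into the flattened array
```

Masked positions stay masked in the result: the 6 in cols and the 2 in flat are both masked.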

tobytes
(fill_value=None, order='C')¶ Return the array data as a string containing the raw bytes in the array.
The array is filled with a fill value before the string conversion.
New in version 1.9.0.
Parameters:  fill_value (scalar, optional) – Value used to fill in the masked values. Default is None, in which case MaskedArray.fill_value is used.
 order ({'C','F','A'}, optional) – Order of the data item in the copy. Default is ‘C’.
  ‘C’ – C order (row major).
  ‘F’ – Fortran order (column major).
  ‘A’ – Any, current order of array.
  None – Same as ‘A’.
Notes
As for ndarray.tobytes, information about the shape, dtype, etc., but also about fill_value, will be lost.
Examples
>>> x = np.ma.array(np.array([[1, 2], [3, 4]]), mask=[[0, 1], [1, 0]]) >>> x.tobytes() b'\x01\x00\x00\x00\x00\x00\x00\x00?B\x0f\x00\x00\x00\x00\x00?B\x0f\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00'
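Since shape, dtype and mask are all lost, reading the bytes back requires supplying them again. A minimal round-trip sketch, assuming an explicit int64 dtype and a fill value of 0 chosen purely for illustration:

```python
import numpy as np

x = np.ma.array(np.array([[1, 2], [3, 4]], dtype=np.int64),
                mask=[[False, True], [True, False]])

raw = x.tobytes(fill_value=0)    # masked entries are written as 0

# dtype and shape must be supplied again; the mask cannot be recovered.
restored = np.frombuffer(raw, dtype=x.dtype).reshape(x.shape)
```

If the mask needs to survive serialisation, it has to be stored separately, for example via x.mask.tobytes().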

tofile
(fid, sep='', format='%s')¶ Save a masked array to a file in binary format.
Warning
This function is not implemented yet.
Raises: NotImplementedError – When tofile is called.

toflex
()¶ Transforms a masked array into a flexible-type array.
The flexible type array that is returned will have two fields:
 the _data field stores the _data part of the array.
 the _mask field stores the _mask part of the array.
Parameters: None
Returns: record – A new flexible-type ndarray with two fields: the first element containing a value, the second element containing the corresponding mask boolean. The returned record shape matches self.shape.
Return type: ndarray
Notes
A side-effect of transforming a masked array into a flexible ndarray is that meta information (fill_value, …) will be lost.
Examples
>>> x = np.ma.array([[1,2,3],[4,5,6],[7,8,9]], mask=[0] + [1,0]*4)
>>> x
masked_array(
  data=[[1, --, 3],
        [--, 5, --],
        [7, --, 9]],
  mask=[[False, True, False],
        [ True, False, True],
        [False, True, False]],
  fill_value=999999)
>>> x.toflex()
array([[(1, False), (2, True), (3, False)],
       [(4, True), (5, False), (6, True)],
       [(7, False), (8, True), (9, False)]],
      dtype=[('_data', '<i8'), ('_mask', '?')])
 the

tolist
(fill_value=None)¶ Return the data portion of the masked array as a hierarchical Python list.
Data items are converted to the nearest compatible Python type. Masked values are converted to fill_value. If fill_value is None, the corresponding entries in the output list will be None.
Parameters: fill_value (scalar, optional) – The value to use for invalid entries. Default is None.
Returns: result – The Python list representation of the masked array.
Return type: list
Examples
>>> x = np.ma.array([[1,2,3], [4,5,6], [7,8,9]], mask=[0] + [1,0]*4)
>>> x.tolist()
[[1, None, 3], [None, 5, None], [7, None, 9]]
>>> x.tolist(-999)
[[1, -999, 3], [-999, 5, -999], [7, -999, 9]]

torecords
()¶ Transforms a masked array into a flexible-type array.
The flexible-type array that is returned will have two fields:
 the _data field stores the _data part of the array.
 the _mask field stores the _mask part of the array.
Parameters: None
Returns: record – A new flexible-type ndarray with two fields: the first element containing a value, the second element containing the corresponding mask boolean. The returned record shape matches self.shape.
Return type: ndarray
Notes
A side-effect of transforming a masked array into a flexible ndarray is that meta information (fill_value, …) will be lost.
Examples
>>> x = np.ma.array([[1,2,3],[4,5,6],[7,8,9]], mask=[0] + [1,0]*4)
>>> x
masked_array(
  data=[[1, --, 3],
        [--, 5, --],
        [7, --, 9]],
  mask=[[False,  True, False],
        [ True, False,  True],
        [False,  True, False]],
  fill_value=999999)
>>> x.toflex()
array([[(1, False), (2,  True), (3, False)],
       [(4,  True), (5, False), (6,  True)],
       [(7, False), (8,  True), (9, False)]],
      dtype=[('_data', '<i8'), ('_mask', '?')])

tostring
(fill_value=None, order='C')¶ A compatibility alias for tobytes, with exactly the same behavior.
Despite its name, it returns bytes not strs.
Deprecated since version 1.19.0.

trace
(offset=0, axis1=0, axis2=1, dtype=None, out=None)¶ Return the sum along diagonals of the array.
Refer to numpy.trace for full documentation.
See also
numpy.trace()
 equivalent function

transpose
(*axes)¶ Returns a view of the array with axes transposed.
For a 1-D array this has no effect, as a transposed vector is simply the same vector. To convert a 1-D array into a 2-D column vector, an additional dimension must be added. np.atleast_2d(a).T achieves this, as does a[:, np.newaxis]. For a 2-D array, this is a standard matrix transpose. For an n-D array, if axes are given, their order indicates how the axes are permuted (see Examples). If axes are not provided and a.shape = (i[0], i[1], ... i[n-2], i[n-1]), then a.transpose().shape = (i[n-1], i[n-2], ... i[1], i[0]).
Parameters: axes (None, tuple of ints, or n ints) –
 None or no argument: reverses the order of the axes.
 tuple of ints: i in the j-th place in the tuple means a’s i-th axis becomes a.transpose()’s j-th axis.
 n ints: same as an n-tuple of the same ints (this form is intended simply as a “convenience” alternative to the tuple form)
Returns: out – View of a, with axes suitably permuted.
Return type: ndarray
See also
transpose()
 Equivalent function
ndarray.T
 Array property returning the array transposed.
ndarray.reshape()
 Give a new shape to an array without changing its data.
Examples
>>> a = np.array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
       [3, 4]])
>>> a.transpose()
array([[1, 3],
       [2, 4]])
>>> a.transpose((1, 0))
array([[1, 3],
       [2, 4]])
>>> a.transpose(1, 0)
array([[1, 3],
       [2, 4]])

unshare_mask
()¶ Copy the mask and set the sharedmask flag to False.
Whether the mask is shared between masked arrays can be seen from the sharedmask property. unshare_mask ensures the mask is not shared. A copy of the mask is only made if it was shared.
See also
sharedmask

var
(axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>)¶ Compute the variance along the specified axis.
Returns the variance of the array elements, a measure of the spread of a distribution. The variance is computed for the flattened array by default, otherwise over the specified axis.
Parameters:  a (array_like) – Array containing numbers whose variance is desired. If a is not an array, a conversion is attempted.
 axis (None or int or tuple of ints, optional) –
Axis or axes along which the variance is computed. The default is to compute the variance of the flattened array.
New in version 1.7.0.
If this is a tuple of ints, a variance is performed over multiple axes, instead of a single axis or all the axes as before.
 dtype (data-type, optional) – Type to use in computing the variance. For arrays of integer type the default is float64; for arrays of float types it is the same as the array type.
 out (ndarray, optional) – Alternate output array in which to place the result. It must have the same shape as the expected output, but the type is cast if necessary.
 ddof (int, optional) – “Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default ddof is zero.
 keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
If the default value is passed, then keepdims will not be passed through to the var method of subclasses of ndarray, however any non-default value will be. If the subclass’ method does not implement keepdims any exceptions will be raised.
 where (array_like of bool, optional) – Elements to include in the variance. See numpy.ufunc.reduce for details.
New in version 1.20.0.
Returns: variance – If out=None, returns a new array containing the variance; otherwise, a reference to the output array is returned.
Return type: ndarray, see dtype parameter above
Notes
The variance is the average of the squared deviations from the mean, i.e., var = mean(x), where x = abs(a - a.mean())**2.
The mean is typically calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables.
Note that for complex numbers, the absolute value is taken before squaring, so that the result is always real and non-negative.
For floating-point input, the variance is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-accuracy accumulator using the dtype keyword can alleviate this issue.
Examples
>>> a = np.array([[1, 2], [3, 4]])
>>> np.var(a)
1.25
>>> np.var(a, axis=0)
array([1., 1.])
>>> np.var(a, axis=1)
array([0.25, 0.25])
In single precision, var() can be inaccurate:
>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> np.var(a)
0.20250003
Computing the variance in float64 is more accurate:
>>> np.var(a, dtype=np.float64)
0.20249999932944759 # may vary
>>> ((1-0.55)**2 + (0.1-0.55)**2)/2
0.2025
Specifying a where argument:
>>> a = np.array([[14, 8, 11, 10], [7, 9, 10, 11], [10, 15, 5, 10]])
>>> np.var(a)
6.833333333333333 # may vary
>>> np.var(a, where=[[True], [True], [False]])
4.0
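The examples above use plain ndarray input. When var() is called on a masked array, masked entries are left out of both the mean and the divisor. A minimal sketch with made-up values (the array is illustrative and not taken from a FAN-C file):

```python
import numpy as np

# Four values, the last one masked (for instance an unmappable bin).
x = np.ma.array([1.0, 2.0, 3.0, 4.0], mask=[0, 0, 0, 1])

# Population variance (ddof=0) over the three unmasked values 1, 2, 3.
var_population = x.var()

# Sample variance (ddof=1) divides by N - 1 = 2 instead of N = 3,
# where N counts unmasked entries only.
var_sample = x.var(ddof=1)

print(var_population, var_sample)
```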

view
(dtype=None, type=None, fill_value=None)¶ Return a view of the MaskedArray data.
Parameters:  dtype (data-type or ndarray subclass, optional) – Data-type descriptor of the returned view, e.g., float32 or int16. The default, None, results in the view having the same data-type as a. As with ndarray.view, dtype can also be specified as an ndarray subclass, which then specifies the type of the returned object (this is equivalent to setting the type parameter).
 type (Python type, optional) – Type of the returned view, either ndarray or a subclass. The default None results in type preservation.
 fill_value (scalar, optional) – The value to use for invalid entries (None by default). If None, then this argument is inferred from the passed dtype, or in its absence the original array, as discussed in the notes below.
See also
numpy.ndarray.view()
 Equivalent method on ndarray object.
Notes
a.view() is used two different ways:
a.view(some_dtype) or a.view(dtype=some_dtype) constructs a view of the array’s memory with a different data-type. This can cause a reinterpretation of the bytes of memory.
a.view(ndarray_subclass) or a.view(type=ndarray_subclass) just returns an instance of ndarray_subclass that looks at the same array (same shape, dtype, etc.) This does not cause a reinterpretation of the memory.
If fill_value is not specified, but dtype is specified (and is not an ndarray subclass), the fill_value of the MaskedArray will be reset. If neither fill_value nor dtype are specified (or if dtype is an ndarray subclass), then the fill value is preserved. Finally, if fill_value is specified, but dtype is not, the fill value is set to the specified value.
For a.view(some_dtype), if some_dtype has a different number of bytes per entry than the previous dtype (for example, converting a regular array to a structured array), then the behavior of the view cannot be predicted just from the superficial appearance of a (shown by print(a)). It also depends on exactly how a is stored in memory. Therefore if a is C-ordered versus fortran-ordered, versus defined as a slice or transpose, etc., the view may give different results.
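The fill-value rules described in the notes can be checked on a small array. The following is a sketch with made-up values; each fill value is read immediately after the view is created so the three cases stay independent of one another.

```python
import numpy as np

x = np.ma.array([1, 2, 3], mask=[0, 1, 0], dtype=np.int64, fill_value=-1)

# Neither dtype nor fill_value given: the fill value is preserved.
preserved = x.view().fill_value

# dtype given (and not an ndarray subclass), fill_value not given:
# the fill value is reset to the default for that dtype.
reset = x.view(np.int64).fill_value

# fill_value given, dtype not given: the fill value is set explicitly.
custom = x.view(fill_value=7).fill_value

# ndarray subclass given as dtype: same memory, viewed as a plain
# ndarray, so the mask is no longer applied.
plain = x.view(np.ndarray)

print(preserved, reset, custom, type(plain).__name__)
```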


class
fanc.matrix.
RegionMatrixContainer
¶ Bases:
fanc.matrix.RegionPairsContainer, fanc.regions.RegionBasedWithBins
Class representing matrices where pixels correspond to genomic region pairs.
This is the common interface for all matrix-based classes, such as Hic or FoldChangeMatrix. It provides access to specialised matrix methods, most importantly matrix(), which assembles numpy arrays from the list of pairwise contacts stored in each object.
It inherits all region methods from RegionBased, and all edge/contact methods from RegionPairsContainer. You can use the same type of keys for matrix() that you would use for edges(), and additionally have the option to retrieve the observed/expected matrix.
import fanc
import numpy as np
hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")
# get the whole-genome matrix
m = hic.matrix()
type(m)  # fanc.matrix.RegionMatrix
isinstance(m, np.ndarray)  # True
m.shape  # 139, 139
# get just the chromosome 18 intra-chromosomal matrix
m = hic.matrix(('chr18', 'chr18'))
m.shape  # 79, 79
# get all rows of the whole-genome matrix
# corresponding to chromosome 18
m = hic.matrix('chr18')
m.shape  # 79, 139
# get unnormalised chromosome 18 matrix
m = hic.matrix(('chr18', 'chr18'), norm=False)
# get chromosome 18 O/E matrix
m = hic.matrix(('chr18', 'chr18'), oe=True)
# get log2-transformed chromosome 18 O/E matrix
m = hic.matrix(('chr18', 'chr18'), oe=True, log=True)

add_contact
(contact, *args, **kwargs)¶ Alias for add_edge()
Parameters:  contact – Edge
 args – Positional arguments passed to _add_edge()
 kwargs – Keyword arguments passed to _add_edge()

add_contacts
(contacts, *args, **kwargs)¶ Alias for add_edges()

add_edge
(edge, check_nodes_exist=True, *args, **kwargs)¶ Add an edge / contact between two regions to this object.
Parameters:  edge – Edge, dict with at least the attributes source and sink, optionally weight, or a list of length 2 (source, sink) or 3 (source, sink, weight).
 check_nodes_exist – Make sure that there are nodes that match source and sink indexes
 args – Positional arguments passed to _add_edge()
 kwargs – Keyword arguments passed to _add_edge()

add_edge_from_dict
(edge, *args, **kwargs)¶ Direct method to add an edge from dict input.
Parameters: edge – dict with at least the keys “source” and “sink”. Additional keys will be loaded as edge attributes

add_edge_from_edge
(edge, *args, **kwargs)¶ Direct method to add an edge from Edge input.
Parameters: edge – Edge

add_edge_from_list
(edge, *args, **kwargs)¶ Direct method to add an edge from list or tuple input.
Parameters: edge – List or tuple. Should be of length 2 (source, sink) or 3 (source, sink, weight)

add_edge_simple
(source, sink, weight=None, *args, **kwargs)¶ Direct method to add an edge from source and sink indexes, with an optional weight.
Parameters:  source – Source region index
 sink – Sink region index
 weight – Weight of the edge

add_edges
(edges, *args, **kwargs)¶ Bulk-add edges from a list.
List items can be any of the supported edge types, list, tuple, dict, or Edge. Repeatedly calls add_edge(), so may be inefficient for large amounts of data.
Parameters: edges – List (or iterator) of edges. See add_edge() for details

add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
Parameters: region – Can be a GenomicRegion, a str in the form ‘<chromosome>:<start>-<end>[:<strand>]’, a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
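To make the string form concrete, here is a sketch of how a region string with chromosome, start, end, and optional strand could be split into the fields listed above. parse_region is a hypothetical helper written for this illustration; it is not part of FAN-C and covers only the simplest case, so FAN-C's own parser may accept more variants.

```python
import re

# Hypothetical helper, not a FAN-C function: handles only
# '<chromosome>:<start>-<end>[:<strand>]' with plain integer coordinates.
_REGION_RE = re.compile(
    r"^(?P<chromosome>[^:]+):(?P<start>\d+)-(?P<end>\d+)(?::(?P<strand>[+-]))?$"
)


def parse_region(region_string):
    match = _REGION_RE.match(region_string)
    if match is None:
        raise ValueError("Not a valid region string: {!r}".format(region_string))
    fields = match.groupdict()
    return {
        "chromosome": fields["chromosome"],
        "start": int(fields["start"]),
        "end": int(fields["end"]),
        "strand": fields["strand"],  # None if no strand was given
    }


print(parse_region("chr18:1000001-2000000"))
print(parse_region("chr18:1000001-2000000:+"))
```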

static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
Parameters:  intervals – iterator of tuples (start, end, score)
 bins – Number of bins to divide the region into
 interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
Returns: iterator of tuples: (start, end, score)

static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
Parameters:  intervals – iterator of tuples (start, end, score)
 bin_size – Size of each bin in base pairs
 interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
Returns: iterator of tuples: (start, end, score)

bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
Returns: int

binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns GenomicRegion objects instead of tuples.
Parameters:  region – String or GenomicRegion object denoting the region to be binned
 bins – Number of bins to divide the region into
 bin_size – Size of each bin (alternative to bins argument)
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
 args – Arguments passed to _region_intervals
 kwargs – Keyword arguments passed to _region_intervals
Returns: iterator of GenomicRegion objects

bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
Parameters: bins – float, fraction of bins
Returns: int, base pairs

chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
The returned bin ranges are range-compatible, i.e. chromosome bins [0, 5] cover the bins 0, 1, 2, 3, and 4, not 5.
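The end-exclusive convention means the start and end values can be passed straight to range(). The dictionary below is a hypothetical return value, built from the bin counts quoted for the example file on this page (79 bins on chr18, 139 bins in total); it is not actual output.

```python
# Hypothetical chromosome_bins value for the 1mb example file.
chromosome_bins = {"chr18": [0, 79], "chr19": [79, 139]}

start, end = chromosome_bins["chr18"]

# The end index is exclusive, exactly as in range(): bins 0..78 belong to
# chr18, and bin 79 is the first bin of chr19.
chr18_indexes = list(range(start, end))

print(len(chr18_indexes), chr18_indexes[0], chr18_indexes[-1])
```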

chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.

chromosomes
()¶ Get a list of chromosome names.

distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
Parameters: distance – distance in base pairs
Returns: float, distance as fraction of bin size
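Both conversions amount to scaling by the bin size. The standalone functions below are a sketch of that arithmetic, not the FAN-C methods themselves: they take bin_size as an explicit argument, and truncating to int in bins_to_distance is an assumption, since the rounding behaviour is not documented here.

```python
def distance_to_bins(distance, bin_size):
    """Base pairs -> (fractional) number of bins."""
    return distance / bin_size


def bins_to_distance(bins, bin_size):
    """(Fractional) number of bins -> base pairs, truncated to int (assumption)."""
    return int(bins * bin_size)


bin_size = 1_000_000  # 1mb bins, as in the example file used on this page

print(distance_to_bins(2_500_000, bin_size))
print(bins_to_distance(2.5, bin_size))
```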

edge_data
(attribute, *args, **kwargs)¶ Iterate over specific edge attribute.
Returns: iterator over edge attribute

edge_subset
(key=None, *args, **kwargs)¶ Get a subset of edges.
This is an alias for edges().
Returns: generator (Edge)

edges
¶ Iterate over contacts / edges.
edges() is the central function of RegionPairsContainer. Here, we will use the Hic implementation for demonstration purposes, but the usage is exactly the same for all compatible objects implementing RegionPairsContainer, including JuicerHic and CoolerHic.
import fanc
# file from FANC examples
hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")
We can easily find the number of edges in the sample Hic object:
len(hic.edges)  # 8695
When used in an iterator context, edges() iterates over all edges in the RegionPairsContainer:
for edge in hic.edges:
    # do something with edge
    print(edge)
    # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
    # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
    # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
    # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
    # ...
Calling edges() as a method has the same effect:
# note the '()'
for edge in hic.edges():
    # do something with edge
    print(edge)
    # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
    # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
    # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
    # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
    # ...
Rather than iterate over all edges in the object, we can select only a subset. If the key is a string or a GenomicRegion, all non-zero edges connecting the region described by the key to any other region are returned. If the key is a tuple of strings or GenomicRegion, only edges between the two regions are returned.
# select all edges between chromosome 19
# and any other region:
for edge in hic.edges("chr19"):
    print(edge)
    # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
    # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
    # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
    # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
    # ...
# select all edges that are only on
# chromosome 19
for edge in hic.edges(('chr19', 'chr19')):
    print(edge)
    # 90--116; bias: 0.00021173151730025176; sink_node: chr19:37000001-38000000; source_node: chr19:11000001-12000000; weight: 0.009104455243910825
    # 135--135; bias: 0.00018003890596887822; sink_node: chr19:56000001-57000000; source_node: chr19:56000001-57000000; weight: 0.10028167062466517
    # 123--123; bias: 0.00011063368998965993; sink_node: chr19:44000001-45000000; source_node: chr19:44000001-45000000; weight: 0.1386240135570439
    # 92--93; bias: 0.00040851066434864896; sink_node: chr19:14000001-15000000; source_node: chr19:13000001-14000000; weight: 0.10090213409411629
    # ...
# select inter-chromosomal edges
# between chromosomes 18 and 19
for edge in hic.edges(('chr18', 'chr19')):
    print(edge)
    # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
    # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
    # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
    # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
    # ...
By default, edges() will retrieve all edge attributes, which can be slow when iterating over a lot of edges. This is why all file-based FAN-C RegionPairsContainer objects support lazy loading, where attributes are only read on demand.
for edge in hic.edges('chr18', lazy=True):
    print(edge.source, edge.sink, edge.weight, edge)
    # 42 42 0.12291311562018173 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #0>
    # 24 28 0.025205961072838057 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #1>
    # 5 76 0.00961709840049876 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #2>
    # 66 68 0.03876763316345468 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #3>
    # ...
Warning
The lazy iterator reuses the LazyEdge object in every iteration, and overwrites the LazyEdge attributes. Therefore do not use lazy iterators if you need to store edge objects for later access. For example, the code list(hic.edges()) works as expected, with all Edge objects stored in the list, while the code list(hic.edges(lazy=True)) will result in a list of identical LazyEdge objects. Always ensure you do all edge processing in the loop when working with lazy iterators!
When working with normalised contact frequencies, such as obtained through matrix balancing in the example above, edges() automatically returns normalised edge weights. In addition, the bias attribute will (typically) have a value different from 1.
When you are interested in the raw contact frequency, use the norm=False parameter:
for edge in hic.edges('chr18', lazy=True, norm=False):
    print(edge.source, edge.sink, edge.weight)
    # 42 42 2120.0
    # 24 28 388.0
    # 5 76 94.0
    # 66 68 470.0
    # ...
You can also choose to omit all intra- or inter-chromosomal edges using intra_chromosomal=False or inter_chromosomal=False, respectively.
Returns: Iterator over Edge or equivalent.
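The pitfall described in the warning can be reproduced without a Hi-C file. The sketch below uses a hypothetical ReusedEdge class and lazy_edges generator as stand-ins; they are not FAN-C classes and only mimic an iterator that yields the same object on every iteration.

```python
class ReusedEdge:
    """Hypothetical stand-in for a lazily loaded edge (not fanc.matrix.LazyEdge)."""

    def __init__(self):
        self.source = None
        self.sink = None


def lazy_edges(pairs):
    """Yield the SAME object on every iteration, overwriting its attributes."""
    edge = ReusedEdge()
    for source, sink in pairs:
        edge.source, edge.sink = source, sink
        yield edge


pairs = [(42, 42), (24, 28), (5, 76)]

# Storing the yielded objects keeps three references to one object,
# which ends up holding the values of the last iteration.
stored = list(lazy_edges(pairs))
stored_pairs = [(e.source, e.sink) for e in stored]

# Extracting the values inside the loop works as intended.
extracted = [(e.source, e.sink) for e in lazy_edges(pairs)]

print(stored_pairs)
print(extracted)
```

Copying out the attributes that are needed (here, source and sink) during iteration keeps the speed benefit of lazy loading while avoiding the aliasing problem.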

edges_dict
(*args, **kwargs)¶ Edges iterator with access by bracket notation.
This iterator always returns unnormalised edges.
Returns: dict or dictlike iterator

expected_values
(selected_chromosome=None, norm=True, *args, **kwargs)¶ Calculate the expected values for genomic contacts at all distances.
This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.
It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for inter-chromosomal expected value.
Parameters:  selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
 norm – If False, will calculate the expected values on the unnormalised matrix.
 args – Not used in this context
 kwargs – Not used in this context
Returns: list of intra-chromosomal expected values, dict of intra-chromosomal expected values by chromosome, inter-chromosomal expected value
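The idea of a distance-based expected value can be sketched on a small dense matrix: the expected value at separation d is the mean of the d-th diagonal. This is a simplified illustration with made-up numbers. It treats every bin as mappable and works on a dense array, whereas the method described above operates on stored edges and accounts for unmappable regions.

```python
import numpy as np

# Toy symmetric 4x4 intra-chromosomal matrix (made-up values).
m = np.array([[8.0, 4.0, 2.0, 1.0],
              [4.0, 8.0, 4.0, 2.0],
              [2.0, 4.0, 8.0, 4.0],
              [1.0, 2.0, 4.0, 8.0]])

# expected[d] = mean weight of all bin pairs separated by d bins,
# i.e. the mean of the d-th diagonal.
expected = [float(np.diagonal(m, offset=d).mean()) for d in range(m.shape[0])]

print(expected)
```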

expected_values_and_marginals
(selected_chromosome=None, norm=True, *args, **kwargs)¶ Calculate the expected values for genomic contacts at all distances and the whole matrix marginals.
This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.
It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for inter-chromosomal expected value.
Parameters:  selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
 norm – If False, will calculate the expected values on the unnormalised matrix.
 args – Not used in this context
 kwargs – Not used in this context
Returns: list of intra-chromosomal expected values, dict of intra-chromosomal expected values by chromosome, inter-chromosomal expected value

find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
Parameters: query_regions – Region selector string, GenomicRegion, or list of the former
Returns: index (or list of indexes) of the region at the center of the query region

intervals
(*args, **kwargs)¶ Alias for region_intervals.

mappable
(region=None)¶ Get the mappability of regions in this object.
A “mappable” region has at least one contact to another region in the genome.
Returns: array where True means mappable and False unmappable

marginals
(masked=True, *args, **kwargs)¶ Get the marginals vector of this Hi-C matrix.
Sums up all contacts for each bin of the Hi-C matrix. Unmappable regions will be masked in the returned vector unless the masked parameter is set to False.
By default, corrected matrix entries are summed up. To get uncorrected matrix marginals use norm=False. Generally, all parameters accepted by edges() are supported.
Parameters:  masked – Use a numpy masked array to mask entries corresponding to unmappable regions
 kwargs – Keyword arguments passed to edges()
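As an illustration of what a masked marginals vector looks like, the sketch below sums a small dense matrix per bin and masks bins without any contacts. The matrix values are made up, and equating "unmappable" with "no contacts at all" is a simplifying assumption of this sketch.

```python
import numpy as np

# Toy symmetric contact matrix; bin 2 has no contacts.
m = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 0.0, 0.0]])

# Per-bin sum of contacts.
sums = m.sum(axis=0)

# Mask bins whose marginal is zero (treated as unmappable in this sketch).
marginals = np.ma.masked_array(sums, mask=(sums == 0))

print(marginals)
```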

matrix
(key=None, log=False, default_value=None, mask=True, log_base=2, *args, **kwargs)¶ Assemble a RegionMatrix from region pairs.
Parameters:  key – Matrix selector. See edges() for all supported key types
 log – If True, log-transform the matrix entries. Also see log_base
 log_base – Base of the log transformation. Default: 2; only used when log=True
 default_value – (optional) set the default value of matrix entries that have no associated edge/contact
 mask – If False, do not mask unmappable regions
 args – Positional arguments passed to regions_and_matrix_entries()
 kwargs – Keyword arguments passed to regions_and_matrix_entries()
Returns: RegionMatrix
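Conceptually, assembling a matrix means filling a dense array from (i, j, weight) entries, mirroring each entry because only pairs with source <= sink are stored, and then masking unmappable bins. The sketch below shows this with plain numpy on made-up entries. It illustrates the idea only and is not FAN-C's implementation, and it returns a plain masked array rather than a RegionMatrix.

```python
import numpy as np

n_bins = 4
# Made-up (i, j, weight) entries with i <= j.
entries = [(0, 0, 5.0), (0, 1, 2.0), (1, 1, 4.0), (1, 3, 1.0), (3, 3, 6.0)]

# Pairs without an entry keep the default value (0 here).
dense = np.zeros((n_bins, n_bins))
for i, j, weight in entries:
    dense[i, j] = weight
    dense[j, i] = weight  # mirror into the lower triangle

# Bins without any contact (bin 2 here) are masked in both dimensions.
unmappable = dense.sum(axis=0) == 0
mask = unmappable[:, None] | unmappable[None, :]
matrix = np.ma.masked_array(dense, mask=mask)

print(matrix)
```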

classmethod
merge
(pairs, *args, **kwargs)¶ Merge two or more RegionPairsContainer objects.
Parameters:  pairs – list of RegionPairsContainer
 args – Positional arguments passed to constructor of this class
 kwargs – Keyword arguments passed to constructor of this class

possible_contacts
()¶ Calculate the possible number of contacts in the genome.
This calculates the number of potential region pairs in a genome for any possible separation distance, taking into account the existence of unmappable regions.
It will calculate one number for inter-chromosomal pairs, return a list with the number of possible pairs where the list index corresponds to the number of bins separating two regions, and a dictionary of lists for each chromosome.
Returns: possible intra-chromosomal pairs, possible intra-chromosomal pairs by chromosome, possible inter-chromosomal pairs
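The counting logic can be sketched for the simplest case in which every bin is mappable: a chromosome with n bins has n - d possible pairs at a separation of d bins, and two different chromosomes contribute the product of their bin counts. The chromosome names and sizes below are hypothetical, and the handling of unmappable regions mentioned above is deliberately left out.

```python
# Hypothetical chromosome sizes in bins (all bins assumed mappable).
chromosome_sizes = {"chrA": 4, "chrB": 3}

# Intra-chromosomal: n - d possible pairs at a distance of d bins.
intra_by_chromosome = {
    name: [n - d for d in range(n)] for name, n in chromosome_sizes.items()
}

# Genome-wide intra-chromosomal counts: sum over chromosomes per distance.
max_size = max(chromosome_sizes.values())
intra_total = [
    sum(counts[d] for counts in intra_by_chromosome.values() if d < len(counts))
    for d in range(max_size)
]

# Inter-chromosomal: every bin of one chromosome paired with every bin
# of each other chromosome, counting each chromosome pair once.
names = list(chromosome_sizes)
inter_total = sum(
    chromosome_sizes[a] * chromosome_sizes[b]
    for k, a in enumerate(names)
    for b in names[k + 1:]
)

print(intra_total, intra_by_chromosome, inter_total)
```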

region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
Parameters: args – provide a GenomicRegion here to get the slice of start and end bins of only this region. To get the slice over all regions leave this blank.
Returns: slice

region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
Parameters:  region – String or GenomicRegion object denoting the region to be binned
 bins – Number of bins to divide the region into
 bin_size – Size of each bin (alternative to bins argument)
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
 args – Arguments passed to _region_intervals
 kwargs – Keyword arguments passed to _region_intervals
Returns: iterator of tuples: (start, end, score)

region_subset
(region, *args, **kwargs)¶ Takes a GenomicRegion and returns all regions that overlap with the supplied region.
Parameters: region – String or GenomicRegion object for which covered bins will be returned.

regions
¶ Iterate over genomic regions in this object.
Will return a GenomicRegion object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.
Returns: RegionIter

regions_and_edges
(key, *args, **kwargs)¶ Convenient access to regions and edges selected by key.
Returns: list of row regions, list of col regions, iterator over edges

regions_and_matrix_entries
(key=None, score_field=None, *args, **kwargs)¶ Convenient access to non-zero matrix entries and associated regions.
Parameters:  key – Edge key, see edges()
 oe – If True, will divide observed values by their expected value at the given distance. False by default
 oe_per_chromosome – If True (default), will do a per-chromosome O/E calculation rather than using the whole matrix to obtain expected values
 score_field – (optional) any edge attribute that returns a number can be specified here for filling the matrix. Usually this is defined by the _default_score_field attribute of the matrix class.
 args – Positional arguments passed to edges()
 kwargs – Keyword arguments passed to edges()
Returns: list of row regions, list of col regions, iterator over (i, j, weight) tuples
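The effect of the oe option can be sketched on a dense toy matrix: each observed value is divided by the expected value for its bin separation. The numbers are made up, and the sketch covers a single chromosome with all bins mappable. It illustrates the observed/expected idea rather than FAN-C's edge-based implementation.

```python
import numpy as np

# Toy symmetric intra-chromosomal matrix (made-up values).
observed = np.array([[8.0, 2.0, 1.0],
                     [2.0, 4.0, 4.0],
                     [1.0, 4.0, 6.0]])

n = observed.shape[0]
# Expected value per distance: mean of each diagonal.
expected = np.array([np.diagonal(observed, offset=d).mean() for d in range(n)])

# Distance (in bins) between every pair of bins.
i, j = np.indices(observed.shape)
distance = np.abs(i - j)

# Observed/expected, and its log2 transformation.
oe = observed / expected[distance]
log2_oe = np.log2(oe)

print(oe)
```

By construction, the O/E values along each diagonal average to 1, so values above 1 mark pairs that interact more often than expected at that distance.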

regions_dict
¶ Return a dictionary with region index as keys and regions as values.
Returns: dict {region.ix: region, …}

static
regions_identical
(pairs)¶ Check if the regions in all objects in the list are identical.
Parameters: pairs – list of RegionBased objects
Returns: True if chromosome, start, and end are identical between all regions in the same list positions.

scaling_factor
(matrix, weight_column=None)¶ Compute the scaling factor to another matrix.
Calculates the ratio between the number of contacts in this Hic object to the number of contacts in another Hic object.
Parameters:  matrix – A Hic object
 weight_column – Name of the column to calculate the scaling factor on
Returns: float
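As described above, the scaling factor is a ratio of total contacts. The sketch below shows that ratio for two made-up weight vectors. Treating "number of contacts" as the sum of edge weights is an assumption of this illustration.

```python
import numpy as np

# Made-up edge weights of this object and of the other matrix.
this_weights = np.array([10.0, 4.0, 6.0])
other_weights = np.array([5.0, 2.0, 3.0])

# Ratio of contacts in this object to contacts in the other object.
scaling_factor = this_weights.sum() / other_weights.sum()

print(scaling_factor)
```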

to_bed
(file_name, subset=None, **kwargs)¶ Export regions as BED file
Parameters:  file_name – Path of file to write regions to
 subset – optional GenomicRegion or str to write only regions overlapping this region
 kwargs – Passed to write_bed()

to_bigwig
(file_name, subset=None, **kwargs)¶ Export regions as BigWig file.
Parameters:  file_name – Path of file to write regions to
 subset – optional GenomicRegion or str to write only regions overlapping this region
 kwargs – Passed to write_bigwig()

to_gff
(file_name, subset=None, **kwargs)¶ Export regions as GFF file
Parameters:  file_name – Path of file to write regions to
 subset – optional
GenomicRegion
or str to write only regions overlapping this region  kwargs – Passed to
write_gff()


class
fanc.matrix.
RegionMatrixTable
(file_name=None, mode='a', tmpdir=None, partition_strategy='auto', additional_region_fields=None, additional_edge_fields=None, default_score_field='weight', default_value=0.0, _table_name_regions='regions', _table_name_edges='edges', _table_name_expected_values='expected_values', _edge_buffer_size='3G')¶ Bases:
fanc.matrix.RegionMatrixContainer
,fanc.matrix.RegionPairsTable
HDF5 implementation of the
RegionMatrixContainer
interface.
class
ChromosomeDescription
¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.

class
MaskDescription
¶ Bases:
tables.description.IsDescription

class
RegionDescription
¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table

add_contact
(contact, *args, **kwargs)¶ Alias for
add_edge()
Parameters:  contact –
Edge
 args – Positional arguments passed to
_add_edge()
 kwargs – Keyword arguments passed to
_add_edge()

add_contacts
(contacts, *args, **kwargs)¶ Alias for
add_edges()

add_edge
(edge, check_nodes_exist=True, *args, **kwargs)¶ Add an edge / contact between two regions to this object.
Parameters:  edge –
Edge
, dict with at least the attributes source and sink, optionally weight, or a list of length 2 (source, sink) or 3 (source, sink, weight).  check_nodes_exist – Make sure that there are nodes that match source and sink indexes
 args – Positional arguments passed to
_add_edge()
 kwargs – Keyword arguments passed to
_add_edge()
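The accepted input types (dict, or list of length 2 or 3) can be pictured as being normalised to a common (source, sink, weight) form. The sketch below is an assumption about how such a normalisation might look. `normalise_edge_input` is a hypothetical helper that is not part of FANC, and using `None` for a missing weight is also an assumption.

```python
def normalise_edge_input(edge):
    """Convert dict or list/tuple edge input to a (source, sink, weight) tuple.

    Hypothetical helper for illustration only; a missing weight becomes None.
    """
    if isinstance(edge, dict):
        # dicts need at least the keys "source" and "sink"
        return edge["source"], edge["sink"], edge.get("weight")
    if isinstance(edge, (list, tuple)):
        if len(edge) == 2:
            return edge[0], edge[1], None
        if len(edge) == 3:
            return tuple(edge)
        raise ValueError("List input must have length 2 or 3")
    raise TypeError("Unsupported edge type: {}".format(type(edge).__name__))


print(normalise_edge_input({"source": 1, "sink": 2, "weight": 0.5}))
print(normalise_edge_input([3, 4]))
```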

add_edge_from_dict
(edge, *args, **kwargs)¶ Direct method to add an edge from dict input.
Parameters: edge – dict with at least the keys “source” and “sink”. Additional keys will be loaded as edge attributes

add_edge_from_edge
(edge, *args, **kwargs)¶ Direct method to add an edge from
Edge
input.
Parameters: edge – Edge

add_edge_from_list
(edge, *args, **kwargs)¶ Direct method to add an edge from list or tuple input.
Parameters: edge – List or tuple. Should be of length 2 (source, sink) or 3 (source, sink, weight)

add_edge_simple
(source, sink, weight=None, *args, **kwargs)¶ Direct method to add an edge from source and sink indexes and an optional weight.
Parameters:  source – Source region index
 sink – Sink region index
 weight – Weight of the edge

add_edges
(edges, flush=True, *args, **kwargs)¶ Bulk-add edges from a list.
List items can be any of the supported edge types: list, tuple, dict, or Edge. Repeatedly calls add_edge(), so may be inefficient for large amounts of data.
Parameters: edges – List (or iterator) of edges. See add_edge() for details

add_mask_description
(name, description)¶ Add a mask description to the _mask table and return its ID.
Parameters:  name (str) – name of the mask
 description (str) – description of the mask
Returns: id of the mask
Return type: int

add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
Parameters: region – Can be a GenomicRegion
, a str in the form ‘<chromosome>:<start>-<end>[:<strand>]’, a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).

add_regions
(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
Parameters: regions – List (or any iterator) with objects that describe a genomic region. See add_region
for options.

static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
Parameters:  intervals – iterator of tuples (start, end, score)
 bins – Number of bins to divide the region into
 interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
Returns: iterator of tuples: (start, end, score)
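To make the idea of binning scored intervals into a fixed number of bins concrete, here is a minimal sketch. It is not the FANC implementation: `bin_intervals_sketch` is a hypothetical name, coordinates are treated as half-open continuous values, each bin score is assumed to be the overlap-weighted mean of the interval scores, and smoothing and NaN replacement are left out.

```python
def bin_intervals_sketch(intervals, bins, interval_range=None):
    """Bin (start, end, score) tuples into a fixed number of bins.

    Hypothetical sketch: the bin score is the overlap-weighted mean score.
    """
    intervals = list(intervals)
    if interval_range is None:
        interval_range = (min(i[0] for i in intervals),
                          max(i[1] for i in intervals))
    start, end = interval_range
    bin_size = (end - start) / bins

    result = []
    for b in range(bins):
        bin_start = start + b * bin_size
        bin_end = bin_start + bin_size
        weighted_sum, covered = 0.0, 0.0
        for i_start, i_end, score in intervals:
            overlap = min(bin_end, i_end) - max(bin_start, i_start)
            if overlap > 0:
                weighted_sum += overlap * score
                covered += overlap
        # bins without any overlapping interval get NaN
        score = weighted_sum / covered if covered > 0 else float("nan")
        result.append((bin_start, bin_end, score))
    return result


binned = bin_intervals_sketch([(0, 10, 1.0), (10, 20, 3.0)], bins=1)
print(binned)  # [(0.0, 20.0, 2.0)]
```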

static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
Parameters:  intervals – iterator of tuples (start, end, score)
 bin_size – Size of each bin in base pairs
 interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
Returns: iterator of tuples: (start, end, score)

bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
Returns: int

binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.
Parameters:  region – String or GenomicRegion object denoting the region to be binned
 bins – Number of bins to divide the region into
 bin_size – Size of each bin (alternative to bins argument)
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
 args – Arguments passed to _region_intervals
 kwargs – Keyword arguments passed to _region_intervals
Returns: iterator of
GenomicRegion
objects

bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
Parameters: bins – float, fraction of bins
Returns: int, base pairs

chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
The returned start and end indexes are range-compatible, i.e. chromosome bins [0,5] cover bins 0, 1, 2, 3, and 4, not 5.
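The range-compatible convention means the end index can be passed directly to `range()` or used in a slice. The dictionary below is made-up example data in the shape the description suggests, not output from a real FANC object.

```python
# Hypothetical chromosome_bins-style dictionary: chromosome -> [start, end)
chromosome_bins = {"chr1": [0, 5], "chr2": [5, 8]}

start, end = chromosome_bins["chr1"]
# the end index is exclusive, exactly as in range()
chr1_bin_indexes = list(range(start, end))
print(chr1_bin_indexes)  # [0, 1, 2, 3, 4]
```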

chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.

chromosomes
()¶ List all chromosomes in this regions table.
Returns: list of chromosome names

close
(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in read-only mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
Parameters:  copy_tmp – If False, does not overwrite original with modified file.
 remove_tmp – If False, does not delete temporary copy of file.

distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
Parameters: distance – distance in base pairs
Returns: float, distance as fraction of bin size
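The conversion between base pairs and (fractional) bins is simple arithmetic on the bin size. The two functions below are hypothetical stand-ins that take the bin size as an explicit argument; they only illustrate the relationship and are not the FANC methods.

```python
def distance_to_bins_sketch(distance, bin_size):
    """Base pairs -> fraction of bins (hypothetical helper)."""
    return distance / bin_size


def bins_to_distance_sketch(bins, bin_size):
    """Fraction of bins -> base pairs as int (hypothetical helper)."""
    return int(bins * bin_size)


print(distance_to_bins_sketch(2500000, 1000000))  # 2.5
print(bins_to_distance_sketch(2.5, 1000000))      # 2500000
```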

downsample
(n, file_name=None)¶ Sample edges from this object.
Sampling is always done on uncorrected Hi-C matrices.
Parameters:  n – Sample size or reference object. If n < 1 will be interpreted as a fraction of total reads in this object.
 file_name – Output file name for downsampled object.
Returns:

edge_data
(attribute, *args, **kwargs)¶ Iterate over specific edge attribute.
Parameters: Returns: iterator over edge attribute

edge_subset
(key=None, *args, **kwargs)¶ Get a subset of edges.
This is an alias for edges().
Returns: generator (Edge)

edges
¶ Iterate over contacts / edges.
edges() is the central function of RegionPairsContainer. Here, we will use the Hic implementation for demonstration purposes, but the usage is exactly the same for all compatible objects implementing RegionPairsContainer, including JuicerHic and CoolerHic.

    import fanc
    # file from FANC examples
    hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")

We can easily find the number of edges in the sample Hic object:

    len(hic.edges)  # 8695

When used in an iterator context, edges() iterates over all edges in the RegionPairsContainer:

    for edge in hic.edges:
        # do something with edge
        print(edge)
        # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
        # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
        # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
        # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
        # ...

Calling edges() as a method has the same effect:

    # note the '()'
    for edge in hic.edges():
        # do something with edge
        print(edge)
        # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
        # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
        # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
        # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
        # ...

Rather than iterate over all edges in the object, we can select only a subset. If the key is a string or a GenomicRegion, all non-zero edges connecting the region described by the key to any other region are returned. If the key is a tuple of strings or GenomicRegion, only edges between the two regions are returned.

    # select all edges between chromosome 19
    # and any other region:
    for edge in hic.edges("chr19"):
        print(edge)
        # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
        # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
        # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
        # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
        # ...

    # select all edges that are only on
    # chromosome 19
    for edge in hic.edges(('chr19', 'chr19')):
        print(edge)
        # 90--116; bias: 0.00021173151730025176; sink_node: chr19:37000001-38000000; source_node: chr19:11000001-12000000; weight: 0.009104455243910825
        # 135--135; bias: 0.00018003890596887822; sink_node: chr19:56000001-57000000; source_node: chr19:56000001-57000000; weight: 0.10028167062466517
        # 123--123; bias: 0.00011063368998965993; sink_node: chr19:44000001-45000000; source_node: chr19:44000001-45000000; weight: 0.1386240135570439
        # 92--93; bias: 0.00040851066434864896; sink_node: chr19:14000001-15000000; source_node: chr19:13000001-14000000; weight: 0.10090213409411629
        # ...

    # select inter-chromosomal edges
    # between chromosomes 18 and 19
    for edge in hic.edges(('chr18', 'chr19')):
        print(edge)
        # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
        # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
        # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
        # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
        # ...

By default, edges() will retrieve all edge attributes, which can be slow when iterating over a lot of edges. This is why all file-based FANC RegionPairsContainer objects support lazy loading, where attributes are only read on demand.

    for edge in hic.edges('chr18', lazy=True):
        print(edge.source, edge.sink, edge.weight, edge)
        # 42 42 0.12291311562018173 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #0>
        # 24 28 0.025205961072838057 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #1>
        # 5 76 0.00961709840049876 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #2>
        # 66 68 0.03876763316345468 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #3>
        # ...

Warning
The lazy iterator reuses the LazyEdge object in every iteration, and overwrites the LazyEdge attributes. Therefore do not use lazy iterators if you need to store edge objects for later access. For example, the code list(hic.edges()) works as expected, with all Edge objects stored in the list, while the code list(hic.edges(lazy=True)) will result in a list of identical LazyEdge objects. Always ensure you do all edge processing in the loop when working with lazy iterators!

When working with normalised contact frequencies, such as obtained through matrix balancing in the example above, edges() automatically returns normalised edge weights. In addition, the bias attribute will (typically) have a value different from 1.

When you are interested in the raw contact frequency, use the norm=False parameter:

    for edge in hic.edges('chr18', lazy=True, norm=False):
        print(edge.source, edge.sink, edge.weight)
        # 42 42 2120.0
        # 24 28 388.0
        # 5 76 94.0
        # 66 68 470.0
        # ...

You can also choose to omit all intra- or inter-chromosomal edges using intra_chromosomal=False or inter_chromosomal=False, respectively.
Returns: Iterator over Edge or equivalent.
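The pitfall described in the warning about lazy iterators can be reproduced without FANC. The sketch below uses a hypothetical `ReusedEdge` class and `lazy_edges` generator that mimic an iterator reusing one object; they are stand-ins for illustration, not the LazyEdge implementation.

```python
class ReusedEdge:
    """Hypothetical stand-in for an edge object that is reused."""
    def __init__(self):
        self.source = None
        self.sink = None


def lazy_edges(pairs):
    # a single object is created once and overwritten in every iteration
    edge = ReusedEdge()
    for source, sink in pairs:
        edge.source, edge.sink = source, sink
        yield edge


pairs = [(42, 42), (24, 28), (5, 76)]

# storing the yielded objects keeps three references to the same object
stored = list(lazy_edges(pairs))
print([(e.source, e.sink) for e in stored])  # [(5, 76), (5, 76), (5, 76)]

# processing inside the loop reads the values before they are overwritten
processed = [(e.source, e.sink) for e in lazy_edges(pairs)]
print(processed)  # [(42, 42), (24, 28), (5, 76)]
```

Extracting the needed values inside the loop, as in the second case, is the safe pattern.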

edges_dict
(*args, **kwargs)¶ Edges iterator with access by bracket notation.
This iterator always returns unnormalised edges.
Returns: dict or dict-like iterator

expected_values
(selected_chromosome=None, norm=True, *args, **kwargs)¶ Calculate the expected values for genomic contacts at all distances.
This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.
It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for the inter-chromosomal expected value.
Parameters:  selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
 norm – If False, will calculate the expected values on the unnormalised matrix.
 args – Not used in this context
 kwargs – Not used in this context
Returns: list of intrachromosomal expected values, dict of intrachromosomal expected values by chromosome, interchromosomal expected value
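As a simplified picture of the distance-based expected value, the sketch below averages edge weights per bin separation on a single chromosome. It is an assumption-laden illustration: `expected_by_distance` is a hypothetical function, all bins are treated as mappable, and region pairs without an edge count as zero.

```python
from collections import defaultdict


def expected_by_distance(edges, n_bins):
    """Average weight per bin separation on one chromosome.

    Hypothetical sketch: edges are (source, sink, weight) tuples and every
    bin is assumed mappable, so distance d has (n_bins - d) possible pairs.
    """
    sums = defaultdict(float)
    for source, sink, weight in edges:
        sums[abs(sink - source)] += weight
    return [sums[d] / (n_bins - d) for d in range(n_bins)]


edges = [(0, 0, 4.0), (1, 1, 2.0), (2, 2, 6.0),
         (0, 1, 3.0), (1, 2, 1.0), (0, 2, 0.5)]
expected = expected_by_distance(edges, n_bins=3)
print(expected)  # [4.0, 2.0, 0.5]
```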

expected_values_and_marginals
(selected_chromosome=None, norm=True, force=False, *args, **kwargs)¶ Calculate the expected values for genomic contacts at all distances and the whole matrix marginals.
This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.
It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for the inter-chromosomal expected value.
Parameters:  selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
 norm – If False, will calculate the expected values on the unnormalised matrix.
 args – Not used in this context
 kwargs – Not used in this context
Returns: list of intrachromosomal expected values, dict of intrachromosomal expected values by chromosome, interchromosomal expected value

filter
(edge_filter, queue=False, log_progress=True)¶ Filter edges in this object by using a MaskFilter.
Parameters:  edge_filter – Class implementing MaskFilter.
 queue – If True, filter will be queued and can be executed along with other queued filters using run_queued_filters()
 log_progress – If True, progress of iterating through all edges will be continuously reported.

find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
Parameters: query_regions – Region selector string, GenomicRegion, or list of the former
Returns: index (or list of indexes) of the region at the center of the query region

flush
(silent=False, update_mappability=True)¶ Write data to file and flush buffers.
Parameters:  silent – do not print flush progress
 update_mappability – After writing data, update mappability and expected values

get_mask
(key)¶ Search _mask table for key and return Mask.
Parameters:  key (str) – search by mask name
 key (int) – search by mask ID
Returns: Mask

get_masks
(ix)¶ Extract mask IDs encoded in parameter and return masks.
IDs are powers of 2, so a single int field in the table can hold multiple masks by simply adding up the IDs. Similar principle to UNIX chmod (although that uses base 8)
Parameters: ix (int) – integer that is the sum of powers of 2. Note that this value is not necessarily itself a power of 2.
Returns: list of Masks extracted from ix
Return type: list (Mask)
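The power-of-2 encoding described for mask IDs can be decoded with bitwise operations. The function below is a minimal sketch with a hypothetical name, `decode_mask_ids`; it returns the individual IDs rather than Mask objects, and it assumes IDs start at 1 (2 to the power of 0).

```python
def decode_mask_ids(ix):
    """Split an integer into the powers of 2 it is the sum of.

    Hypothetical helper: returns mask IDs, not FANC Mask objects.
    """
    ids = []
    bit = 1
    while bit <= ix:
        if ix & bit:
            ids.append(bit)
        bit <<= 1
    return ids


# 5 = 1 + 4, so masks with IDs 1 and 4 apply
print(decode_mask_ids(5))  # [1, 4]
```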

intervals
(*args, **kwargs)¶ Alias for region_intervals.

mappable
(region=None)¶ Get the mappability of regions in this object.
A “mappable” region has at least one contact to another region in the genome.
Returns: array
where True means mappable and False unmappable

marginals
(masked=True, *args, **kwargs)¶ Get the marginals vector of this Hic matrix.
Sums up all contacts for each bin of the Hi-C matrix. Unmappable regions will be masked in the returned vector unless the masked parameter is set to False.
By default, corrected matrix entries are summed up. To get uncorrected matrix marginals use norm=False. Generally, all parameters accepted by edges() are supported.
Parameters:  masked – Use a numpy masked array to mask entries corresponding to unmappable regions
 kwargs – Keyword arguments passed to edges()
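To illustrate what summing up all contacts per bin means for a symmetric matrix stored as edges with source <= sink, consider the sketch below. `marginals_sketch` is a hypothetical function operating on plain tuples; masking of unmappable regions is omitted.

```python
def marginals_sketch(edges, n_bins):
    """Sum edge weights per bin for a symmetric contact matrix.

    Hypothetical sketch: edges are (source, sink, weight) with source <= sink,
    so off-diagonal edges contribute to both of their bins.
    """
    totals = [0.0] * n_bins
    for source, sink, weight in edges:
        totals[source] += weight
        if source != sink:
            totals[sink] += weight
    return totals


edges = [(0, 0, 2.0), (0, 1, 3.0), (1, 2, 1.0)]
print(marginals_sketch(edges, n_bins=3))  # [5.0, 4.0, 1.0]
```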

matrix
(key=None, log=False, default_value=None, mask=True, log_base=2, *args, **kwargs)¶ Assemble a RegionMatrix from region pairs.
Parameters:  key – Matrix selector. See edges() for all supported key types
 log – If True, log-transform the matrix entries. Also see log_base
 log_base – Base of the log transformation. Default: 2; only used when log=True
 default_value – (optional) set the default value of matrix entries that have no associated edge/contact
 mask – If False, do not mask unmappable regions
 args – Positional arguments passed to regions_and_matrix_entries()
 kwargs – Keyword arguments passed to regions_and_matrix_entries()
Returns: RegionMatrix

classmethod
merge
(matrices, *args, **kwargs)¶ Merge multiple RegionMatrixContainer objects.
Merging is done by adding the weight of edges in each object.
Parameters: matrices – list of RegionMatrixContainer
Returns: merged RegionMatrixContainer
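Merging by adding edge weights can be pictured with dictionaries keyed by (source, sink). This sketch assumes all inputs share identical regions, so that the same index pair refers to the same region pair; `merge_edge_weights` is a hypothetical helper, not the FANC method.

```python
from collections import Counter


def merge_edge_weights(*edge_dicts):
    """Add up weights of edges with the same (source, sink) key.

    Hypothetical sketch; assumes all inputs use identical region indexes.
    """
    merged = Counter()
    for edges in edge_dicts:
        for pair, weight in edges.items():
            merged[pair] += weight
    return dict(merged)


a = {(0, 1): 2.0, (1, 1): 1.0}
b = {(0, 1): 3.0, (2, 2): 4.0}
print(merge_edge_weights(a, b))  # {(0, 1): 5.0, (1, 1): 1.0, (2, 2): 4.0}
```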

possible_contacts
()¶ Calculate the possible number of contacts in the genome.
This calculates the number of potential region pairs in a genome for any possible separation distance, taking into account the existence of unmappable regions.
It returns a list with the number of possible intra-chromosomal pairs, where the list index corresponds to the number of bins separating two regions, a dictionary of such lists for each chromosome, and a single number for inter-chromosomal pairs.
Returns: possible intrachromosomal pairs, possible intrachromosomal pairs by chromosome, possible interchromosomal pairs
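Counting possible pairs while skipping unmappable regions can be sketched as follows. Both functions are hypothetical and operate on plain lists of booleans (True meaning mappable); they show the counting idea only, not how FANC computes it.

```python
def possible_intra_pairs(mappable):
    """Possible region pairs per bin separation on one chromosome.

    Hypothetical sketch: mappable is a list of booleans, one per bin.
    """
    n = len(mappable)
    counts = [0] * n
    for i in range(n):
        if not mappable[i]:
            continue
        for j in range(i, n):
            if mappable[j]:
                counts[j - i] += 1
    return counts


def possible_inter_pairs(mappable_a, mappable_b):
    """Possible pairs between two different chromosomes (hypothetical)."""
    return sum(mappable_a) * sum(mappable_b)


# bin 2 is unmappable, so no pair may involve it
print(possible_intra_pairs([True, True, False, True]))          # [3, 1, 1, 1]
print(possible_inter_pairs([True, True, False], [True, False]))  # 2
```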

region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
Parameters: args – provide a GenomicRegion here to get the slice of start and end bins of only this region. To get the slice over all regions leave this blank.
Returns:

region_data
(key, value=None)¶ Retrieve or add vector-data to this object. If there is existing data in this object with the same name, it will be replaced.
Parameters:  key – Name of the data column
 value – vector with region-based data (one entry per region)

region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
Parameters:  region – String or GenomicRegion object denoting the region to be binned
 bins – Number of bins to divide the region into
 bin_size – Size of each bin (alternative to bins argument)
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
 args – Arguments passed to _region_intervals
 kwargs – Keyword arguments passed to _region_intervals
Returns: iterator of tuples: (start, end, score)

region_subset
(region, *args, **kwargs)¶ Takes a GenomicRegion and returns all regions that overlap with the supplied region.
Parameters: region – String or GenomicRegion object for which covered bins will be returned.

regions
¶ Iterate over genomic regions in this object.
Will return a
GenomicRegion
object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.
Returns: RegionIter

regions_and_edges
(key, *args, **kwargs)¶ Convenient access to regions and edges selected by key.
Parameters: Returns: list of row regions, list of col regions, iterator over edges

regions_and_matrix_entries
(key=None, score_field=None, *args, **kwargs)¶ Convenient access to nonzero matrix entries and associated regions.
Parameters:  key – Edge key, see
edges()
 oe – If True, will divide observed values by their expected value at the given distance. False by default
 oe_per_chromosome – If True (default), will do a per-chromosome O/E calculation rather than using the whole matrix to obtain expected values
 score_field – (optional) any edge attribute that returns a number
can be specified here for filling the matrix. Usually
this is defined by the
_default_score_field
attribute of the matrix class.  args – Positional arguments passed to
edges()
 kwargs – Keyword arguments passed to
edges()
Returns: list of row regions, list of col regions, iterator over (i, j, weight) tuples

regions_dict
¶ Return a dictionary with region index as keys and regions as values.
Returns: dict {region.ix: region, …}

static
regions_identical
(pairs)¶ Check if the regions in all objects in the list are identical.
Parameters: pairs – list of RegionBased objects
Returns: True if chromosome, start, and end are identical between all regions in the same list positions.

run_queued_filters
(log_progress=True)¶ Run queued filters.
Parameters: log_progress – If true, process iterating through all edges will be continuously reported.

scaling_factor
(matrix, weight_column=None)¶ Compute the scaling factor to another matrix.
Calculates the ratio of the number of contacts in this Hic object to the number of contacts in another Hic object.
Parameters:  matrix – A
Hic
object  weight_column – Name of the column to calculate the scaling factor on
Returns: float

subset
(*regions, **kwargs)¶ Subset a Hic object by specifying one or more subset regions.
Parameters:  regions – string or GenomicRegion object(s)
 kwargs – Supports
file_name: destination file name of subset Hic object;
tmpdir: if True works in tmp until object is closed
additional parameters are passed to
edges()
Returns: Hic

to_bed
(file_name, subset=None, **kwargs)¶ Export regions as BED file
Parameters:  file_name – Path of file to write regions to
 subset – optional
GenomicRegion
or str to write only regions overlapping this region  kwargs – Passed to
write_bed()

to_bigwig
(file_name, subset=None, **kwargs)¶ Export regions as BigWig file.
Parameters:  file_name – Path of file to write regions to
 subset – optional
GenomicRegion
or str to write only regions overlapping this region  kwargs – Passed to
write_bigwig()

to_gff
(file_name, subset=None, **kwargs)¶ Export regions as GFF file
Parameters:  file_name – Path of file to write regions to
 subset – optional
GenomicRegion
or str to write only regions overlapping this region  kwargs – Passed to
write_gff()

class
fanc.matrix.
RegionPairsContainer
¶ Bases:
genomic_regions.regions.RegionBased
Class representing pairs of genomic regions.
This is the basic interface for all pair and matrix classes in this module. It inherits all methods from RegionBased, and is therefore based on a list of genomic regions (GenomicRegion) representing the underlying genome. You can use the regions() method to access genomic regions in an intuitive fashion, for example:

    for region in rpc.regions('chr1'):
        # do something with region
        print(region)

For more details on region access, see the genomic_regions documentation, on which this module is built.

RegionPairsContainer adds methods for pairs of genomic regions on top of the RegionBased methods for individual regions. In the nomenclature of this module, which borrows from network analysis terminology, a pair of regions is represented by an Edge.

    # iterate over all region pairs / edges in chr1
    for edge in rpc.edges(("chr1", "chr1")):
        # do something with edge / region pair
        region1 = edge.source_region
        region2 = edge.sink_region

For more details see the edges() method help.

This class itself is only an interface and cannot actually be used to add regions and region pairs. Implementations of this interface, i.e. subclasses such as RegionPairsTable, must override various hidden methods to give them full functionality.

_add_edge() is used to save region pairs / edges to the object. It receives a single Edge as input and should return the index of the added edge.
_edges_iter() is required by edges(). It is used to iterate over all edges in the object in no particular order. It should return a generator of Edge objects representing all region pairs in the object.
_edges_subset() is also used by edges(). It is used to iterate over a subset of edges in this object. It receives as input a key representing the requested subset (further described in edges()), and two lists of GenomicRegion objects, row_regions and col_regions, representing the two dimensions of regions selected by key. It should return an iterator over Edge objects.
_edges_getitem() is used by edges() for retrieval of edges by bracket notation. For integer input, it should return a single Edge, for slice input a list of Edge objects.

The above methods cover all the basic RegionPairsContainer functionality, but for speed improvements you may also want to override the following method, which by default iterates over all edges:

_edges_length() which returns the total number of edges in the object
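The contract of the hidden methods listed above can be sketched with a toy in-memory container. The classes `ToyEdge` and `ToyPairsContainer` are hypothetical and deliberately do not subclass anything from FANC; they only mirror the described inputs and return values. `_edges_subset()` is left out because it depends on region handling.

```python
class ToyEdge:
    """Hypothetical minimal edge with source, sink, and weight."""
    def __init__(self, source, sink, weight=1.0):
        self.source, self.sink, self.weight = source, sink, weight


class ToyPairsContainer:
    """Toy stand-in mirroring the hidden-method contract described above."""
    def __init__(self):
        self._edges = []

    def _add_edge(self, edge):
        # store the edge and return its index
        self._edges.append(edge)
        return len(self._edges) - 1

    def _edges_iter(self):
        # generator over all edges, in no particular order
        return (edge for edge in self._edges)

    def _edges_getitem(self, item):
        # int -> single edge, slice -> list of edges
        return self._edges[item]

    def _edges_length(self):
        # cheap alternative to counting while iterating
        return len(self._edges)


container = ToyPairsContainer()
first_ix = container._add_edge(ToyEdge(0, 1, 2.0))
second_ix = container._add_edge(ToyEdge(1, 2, 3.0))
print(first_ix, second_ix, container._edges_length())  # 0 1 2
```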

add_contact
(contact, *args, **kwargs)¶ Alias for
add_edge()
Parameters:  contact –
Edge
 args – Positional arguments passed to
_add_edge()
 kwargs – Keyword arguments passed to
_add_edge()

add_contacts
(contacts, *args, **kwargs)¶ Alias for
add_edges()

add_edge
(edge, check_nodes_exist=True, *args, **kwargs)¶ Add an edge / contact between two regions to this object.
Parameters:  edge –
Edge
, dict with at least the attributes source and sink, optionally weight, or a list of length 2 (source, sink) or 3 (source, sink, weight).  check_nodes_exist – Make sure that there are nodes that match source and sink indexes
 args – Positional arguments passed to
_add_edge()
 kwargs – Keyword arguments passed to
_add_edge()

add_edge_from_dict
(edge, *args, **kwargs)¶ Direct method to add an edge from dict input.
Parameters: edge – dict with at least the keys “source” and “sink”. Additional keys will be loaded as edge attributes

add_edge_from_edge
(edge, *args, **kwargs)¶ Direct method to add an edge from
Edge
input.
Parameters: edge – Edge

add_edge_from_list
(edge, *args, **kwargs)¶ Direct method to add an edge from list or tuple input.
Parameters: edge – List or tuple. Should be of length 2 (source, sink) or 3 (source, sink, weight)

add_edge_simple
(source, sink, weight=None, *args, **kwargs)¶ Direct method to add an edge from source and sink indexes and an optional weight.
Parameters:  source – Source region index
 sink – Sink region index
 weight – Weight of the edge

add_edges
(edges, *args, **kwargs)¶ Bulk-add edges from a list.
List items can be any of the supported edge types: list, tuple, dict, or Edge. Repeatedly calls add_edge(), so may be inefficient for large amounts of data.
Parameters: edges – List (or iterator) of edges. See add_edge() for details

add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
Parameters: region – Can be a GenomicRegion
, a str in the form ‘<chromosome>:<start>-<end>[:<strand>]’, a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).

static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
Parameters:  intervals – iterator of tuples (start, end, score)
 bins – Number of bins to divide the region into
 interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
Returns: iterator of tuples: (start, end, score)

static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
Parameters:  intervals – iterator of tuples (start, end, score)
 bin_size – Size of each bin in base pairs
 interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
Returns: iterator of tuples: (start, end, score)

binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.
Parameters:  region – String or GenomicRegion object denoting the region to be binned
 bins – Number of bins to divide the region into
 bin_size – Size of each bin (alternative to bins argument)
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
 args – Arguments passed to _region_intervals
 kwargs – Keyword arguments passed to _region_intervals
Returns: iterator of
GenomicRegion
objects

chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.

chromosomes
()¶ Get a list of chromosome names.

edge_data
(attribute, *args, **kwargs)¶ Iterate over specific edge attribute.
Parameters: Returns: iterator over edge attribute

edge_subset
(key=None, *args, **kwargs)¶ Get a subset of edges.
This is an alias for edges().
Returns: generator (Edge)

edges
¶ Iterate over contacts / edges.
edges() is the central function of RegionPairsContainer. Here, we will use the Hic implementation for demonstration purposes, but the usage is exactly the same for all compatible objects implementing RegionPairsContainer, including JuicerHic and CoolerHic.
    import fanc
    # file from FANC examples
    hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")
We can easily find the number of edges in the sample Hic object:
    len(hic.edges)  # 8695
When used in an iterator context, edges() iterates over all edges in the RegionPairsContainer:
    for edge in hic.edges:
        # do something with edge
        print(edge)
        # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
        # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
        # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
        # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
        # ...
Calling edges() as a method has the same effect:
    # note the '()'
    for edge in hic.edges():
        # do something with edge
        print(edge)
        # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
        # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
        # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
        # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
        # ...
Rather than iterate over all edges in the object, we can select only a subset. If the key is a string or a GenomicRegion, all non-zero edges connecting the region described by the key to any other region are returned. If the key is a tuple of strings or GenomicRegion, only edges between the two regions are returned.
    # select all edges between chromosome 19
    # and any other region:
    for edge in hic.edges("chr19"):
        print(edge)
        # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
        # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
        # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
        # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
        # ...

    # select all edges that are only on
    # chromosome 19
    for edge in hic.edges(('chr19', 'chr19')):
        print(edge)
        # 90--116; bias: 0.00021173151730025176; sink_node: chr19:37000001-38000000; source_node: chr19:11000001-12000000; weight: 0.009104455243910825
        # 135--135; bias: 0.00018003890596887822; sink_node: chr19:56000001-57000000; source_node: chr19:56000001-57000000; weight: 0.10028167062466517
        # 123--123; bias: 0.00011063368998965993; sink_node: chr19:44000001-45000000; source_node: chr19:44000001-45000000; weight: 0.1386240135570439
        # 92--93; bias: 0.00040851066434864896; sink_node: chr19:14000001-15000000; source_node: chr19:13000001-14000000; weight: 0.10090213409411629
        # ...

    # select inter-chromosomal edges
    # between chromosomes 18 and 19
    for edge in hic.edges(('chr18', 'chr19')):
        print(edge)
        # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
        # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
        # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
        # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
        # ...
By default, edges() will retrieve all edge attributes, which can be slow when iterating over a lot of edges. This is why all file-based FANC RegionPairsContainer objects support lazy loading, where attributes are only read on demand.
    for edge in hic.edges('chr18', lazy=True):
        print(edge.source, edge.sink, edge.weight, edge)
        # 42 42 0.12291311562018173 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #0>
        # 24 28 0.025205961072838057 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #1>
        # 5 76 0.00961709840049876 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #2>
        # 66 68 0.03876763316345468 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #3>
        # ...
Warning
The lazy iterator reuses the LazyEdge object in every iteration, and overwrites the LazyEdge attributes. Therefore do not use lazy iterators if you need to store edge objects for later access. For example, the following code works as expected: list(hic.edges()), with all Edge objects stored in the list, while this code: list(hic.edges(lazy=True)) will result in a list of identical LazyEdge objects. Always ensure you do all edge processing in the loop when working with lazy iterators!
When working with normalised contact frequencies, such as obtained through matrix balancing in the example above, edges() automatically returns normalised edge weights. In addition, the bias attribute will (typically) have a value different from 1.
When you are interested in the raw contact frequency, use the norm=False parameter:
    for edge in hic.edges('chr18', lazy=True, norm=False):
        print(edge.source, edge.sink, edge.weight)
        # 42 42 2120.0
        # 24 28 388.0
        # 5 76 94.0
        # 66 68 470.0
        # ...
You can also choose to omit all intra- or inter-chromosomal edges using intra_chromosomal=False or inter_chromosomal=False, respectively.
Returns: Iterator over Edge or equivalent.
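The warning about storing objects from a lazy iterator can be reproduced without FANC. The sketch below uses a hypothetical ReusedEdge class and lazy_edges generator (neither is part of FANC) to mimic an iterator that yields the same object on every iteration and only overwrites its attributes.

```python
class ReusedEdge:
    """Hypothetical stand-in for an edge whose fields are overwritten in place."""
    def __init__(self):
        self.source = None
        self.sink = None

def lazy_edges(pairs):
    edge = ReusedEdge()                        # created once
    for source, sink in pairs:
        edge.source, edge.sink = source, sink  # overwritten on every iteration
        yield edge                             # always the same object

pairs = [(42, 42), (24, 28), (5, 76)]

# Storing the yielded objects keeps three references to ONE object,
# which ends up holding the values of the last iteration.
stored = list(lazy_edges(pairs))
stored_pairs = [(e.source, e.sink) for e in stored]

# Extracting the values inside the loop gives the expected result.
processed = [(e.source, e.sink) for e in lazy_edges(pairs)]
```

Copying the needed values out of the edge inside the loop, as in the second variant, is the safe pattern.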

edges_dict
(*args, **kwargs)¶ Edges iterator with access by bracket notation.
This iterator always returns unnormalised edges.
Returns: dict or dictlike iterator

find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
Parameters: query_regions – Region selector string, GenomicRegion, or list of the former
Returns: index (or list of indexes) of the region at the center of the query region

intervals
(*args, **kwargs)¶ Alias for region_intervals.

mappable
(region=None)¶ Get the mappability of regions in this object.
A “mappable” region has at least one contact to another region in the genome.
Returns: array
where True means mappable and False unmappable
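The idea of a mappability array can be sketched in a few lines. This is an assumption-laden illustration, not FANC's code: the helper name, the (source, sink, weight) tuples, and the choice to count any non-zero edge (including a region's contact with itself) are all made up for the example.

```python
def mappable(n_regions, edges):
    """Return one bool per region: True if it appears in any edge with weight > 0.

    Hypothetical helper; edges are (source, sink, weight) tuples.
    """
    flags = [False] * n_regions
    for source, sink, weight in edges:
        if weight > 0:
            flags[source] = True
            flags[sink] = True
    return flags

edges = [(0, 0, 3.0), (0, 2, 1.0), (3, 3, 0.0)]
flags = mappable(5, edges)
```

Region 3 only appears in a zero-weight edge, and regions 1 and 4 appear in none, so all three are reported as unmappable.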

classmethod
merge
(pairs, *args, **kwargs)¶ Merge two or more RegionPairsContainer objects.
Parameters:  pairs – list of RegionPairsContainer
 args – Positional arguments passed to constructor of this class
 kwargs – Keyword arguments passed to constructor of this class

region_bins
(region)¶ Takes a genomic region and returns a slice of the bin indices that are covered by the region.
Parameters: region – String or GenomicRegion object for which covered bins will be returned.
Returns: slice
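To make the returned slice concrete, here is a minimal sketch of mapping a genomic region to the bin indices it overlaps. The function below is a hypothetical stand-in that takes plain (chromosome, start, end) tuples; it assumes regions are sorted and contiguous within each chromosome, which is what makes a single slice sufficient.

```python
def region_bins(regions, chromosome, start, end):
    """Return a slice over the indices of regions overlapping chromosome:start-end.

    Hypothetical helper; regions are (chromosome, start, end) tuples with
    1-based inclusive coordinates.
    """
    hits = [ix for ix, (chrom, r_start, r_end) in enumerate(regions)
            if chrom == chromosome and r_start <= end and r_end >= start]
    if not hits:
        return slice(0, 0)
    return slice(hits[0], hits[-1] + 1)

# five 1 Mb bins on chr18 followed by three on chr19
regions = [("chr18", i * 1_000_000 + 1, (i + 1) * 1_000_000) for i in range(5)]
regions += [("chr19", i * 1_000_000 + 1, (i + 1) * 1_000_000) for i in range(3)]

covered = region_bins(regions, "chr18", 1_500_000, 3_200_000)
```

Because the result is an ordinary slice, it can be used directly to index a list of regions or a matrix dimension.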

region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
Parameters:  region – String or GenomicRegion object denoting the region to be binned
 bins – Number of bins to divide the region into
 bin_size – Size of each bin (alternative to bins argument)
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
 args – Arguments passed to _region_intervals
 kwargs – Keyword arguments passed to _region_intervals
Returns: iterator of tuples: (start, end, score)

region_subset
(region, *args, **kwargs)¶ Takes a GenomicRegion and returns all regions that overlap with the supplied region.
Parameters: region – String or GenomicRegion object for which covered bins will be returned.

regions
¶ Iterate over genomic regions in this object.
Will return a GenomicRegion object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.
Returns: RegionIter

regions_and_edges
(key, *args, **kwargs)¶ Convenient access to regions and edges selected by key.
Returns: list of row regions, list of col regions, iterator over edges

regions_dict
¶ Return a dictionary with region index as keys and regions as values.
Returns: dict {region.ix: region, …}

static
regions_identical
(pairs)¶ Check if the regions in all objects in the list are identical.
Parameters: pairs – list of RegionBased objects
Returns: True if chromosome, start, and end are identical between all regions in the same list positions.

to_bed
(file_name, subset=None, **kwargs)¶ Export regions as BED file
Parameters:  file_name – Path of file to write regions to
 subset – optional GenomicRegion or str to write only regions overlapping this region
 kwargs – Passed to write_bed()

to_bigwig
(file_name, subset=None, **kwargs)¶ Export regions as BigWig file.
Parameters:  file_name – Path of file to write regions to
 subset – optional GenomicRegion or str to write only regions overlapping this region
 kwargs – Passed to write_bigwig()

to_gff
(file_name, subset=None, **kwargs)¶ Export regions as GFF file
Parameters:  file_name – Path of file to write regions to
 subset – optional GenomicRegion or str to write only regions overlapping this region
 kwargs – Passed to write_gff()

class
fanc.matrix.
RegionPairsTable
(file_name=None, mode='a', tmpdir=None, additional_region_fields=None, additional_edge_fields=None, partition_strategy='auto', _table_name_regions='regions', _table_name_edges='edges', _edge_buffer_size='3G', _edge_table_prefix='chrpair_')¶ Bases:
fanc.matrix.RegionPairsContainer, fanc.general.Maskable, fanc.regions.RegionsTable
HDF5 implementation of the RegionPairsContainer interface.
class
ChromosomeDescription
¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.

class
MaskDescription
¶ Bases:
tables.description.IsDescription

class
RegionDescription
¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table

add_contact
(contact, *args, **kwargs)¶ Alias for add_edge().
Parameters:  contact – Edge
 args – Positional arguments passed to _add_edge()
 kwargs – Keyword arguments passed to _add_edge()

add_contacts
(contacts, *args, **kwargs)¶ Alias for
add_edges()

add_edge
(edge, check_nodes_exist=True, *args, **kwargs)¶ Add an edge / contact between two regions to this object.
Parameters:  edge – Edge, dict with at least the attributes source and sink, optionally weight, or a list of length 2 (source, sink) or 3 (source, sink, weight).
 check_nodes_exist – Make sure that there are nodes that match source and sink indexes
 args – Positional arguments passed to _add_edge()
 kwargs – Keyword arguments passed to _add_edge()
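The accepted input types can be pictured as a small dispatch step that turns every variant into the same (source, sink, weight) triple. The sketch below is hypothetical and independent of FANC: the function name as_edge_tuple and the default weight of 1.0 are assumptions, while the source <= sink ordering follows the convention stated for Edge.

```python
from types import SimpleNamespace

def as_edge_tuple(edge, default_weight=1.0):
    """Convert a dict, list/tuple, or attribute-style edge to (source, sink, weight).

    Hypothetical helper; default_weight is an assumption for this sketch.
    """
    if isinstance(edge, dict):
        source, sink = edge["source"], edge["sink"]
        weight = edge.get("weight", default_weight)
    elif isinstance(edge, (list, tuple)):
        if len(edge) == 2:
            source, sink = edge
            weight = default_weight
        elif len(edge) == 3:
            source, sink, weight = edge
        else:
            raise ValueError("expected (source, sink) or (source, sink, weight)")
    else:
        # anything exposing .source and .sink attributes
        source, sink = edge.source, edge.sink
        weight = getattr(edge, "weight", default_weight)
    # keep the documented convention source <= sink
    if source > sink:
        source, sink = sink, source
    return source, sink, weight

from_dict = as_edge_tuple({"source": 3, "sink": 1})
from_list = as_edge_tuple([2, 5, 0.5])
from_pair = as_edge_tuple((7, 4))
from_obj = as_edge_tuple(SimpleNamespace(source=0, sink=9, weight=2.0))
```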

add_edge_from_dict
(edge, *args, **kwargs)¶ Direct method to add an edge from dict input.
Parameters: edge – dict with at least the keys “source” and “sink”. Additional keys will be loaded as edge attributes

add_edge_from_edge
(edge, *args, **kwargs)¶ Direct method to add an edge from Edge input.
Parameters: edge – Edge

add_edge_from_list
(edge, *args, **kwargs)¶ Direct method to add an edge from list or tuple input.
Parameters: edge – List or tuple. Should be of length 2 (source, sink) or 3 (source, sink, weight)

add_edge_simple
(source, sink, weight=None, *args, **kwargs)¶ Direct method to add an edge from source and sink indexes and an optional weight.
Parameters:  source – Source region index
 sink – Sink region index
 weight – Weight of the edge

add_edges
(edges, flush=True, *args, **kwargs)¶ Bulk-add edges from a list.
List items can be any of the supported edge types, list, tuple, dict, or Edge. Repeatedly calls add_edge(), so may be inefficient for large amounts of data.
Parameters: edges – List (or iterator) of edges. See add_edge() for details

add_mask_description
(name, description)¶ Add a mask description to the _mask table and return its ID.
Parameters:  name (str) – name of the mask
 description (str) – description of the mask
Returns: id of the mask
Return type: int

add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
Parameters: region – Can be a GenomicRegion, a str in the form ‘<chromosome>:<start>-<end>[:<strand>]’, a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
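For the string form, a regular expression is one way to picture the parsing step. The sketch below is not FANC's parser: the pattern, the function name parse_region, and the assumption that the strand is written as + or - are illustrative only, and shorthand such as unit suffixes is not handled.

```python
import re

# chromosome:start-end with an optional :strand suffix (strand format assumed)
REGION_PATTERN = re.compile(
    r"^(?P<chromosome>[^:]+):(?P<start>\d+)-(?P<end>\d+)(?::(?P<strand>[+-]))?$"
)

def parse_region(text):
    """Parse 'chromosome:start-end[:strand]' into a dict (hypothetical helper)."""
    match = REGION_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a region string: {text!r}")
    return {
        "chromosome": match["chromosome"],
        "start": int(match["start"]),
        "end": int(match["end"]),
        "strand": match["strand"],  # None when no strand is given
    }

plain = parse_region("chr18:1000001-2000000")
stranded = parse_region("chr19:5-10:-")
```

The resulting dict has the 'chromosome', 'start', and 'end' fields that the dict input form requires.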

add_regions
(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
Parameters: regions – List (or any iterator) with objects that describe a genomic region. See add_region
for options.

static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
Parameters:  intervals – iterator of tuples (start, end, score)
 bins – Number of bins to divide the region into
 interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
Returns: iterator of tuples: (start, end, score)

static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
Parameters:  intervals – iterator of tuples (start, end, score)
 bin_size – Size of each bin in base pairs
 interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
Returns: iterator of tuples: (start, end, score)

bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
Returns: int

binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns GenomicRegion objects instead of tuples.
Parameters:  region – String or GenomicRegion object denoting the region to be binned
 bins – Number of bins to divide the region into
 bin_size – Size of each bin (alternative to bins argument)
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
 args – Arguments passed to _region_intervals
 kwargs – Keyword arguments passed to _region_intervals
Returns: iterator of GenomicRegion objects

bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
Parameters: bins – float, fraction of bins
Returns: int, base pairs

chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
The returned indices are range-compatible, i.e. chromosome bins [0, 5] cover bins 0, 1, 2, 3, and 4, not 5.
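The half-open convention is easiest to see on a toy region list. The sketch below is a hypothetical re-implementation over plain (chromosome, start, end) tuples, assuming regions are grouped by chromosome; it is meant only to show why the end index can be passed straight to range().

```python
def chromosome_bins(regions):
    """Map each chromosome to [first_index, last_index + 1] of its bins.

    Hypothetical helper; regions are (chromosome, start, end) tuples
    grouped by chromosome.
    """
    bins = {}
    for ix, (chromosome, _start, _end) in enumerate(regions):
        if chromosome not in bins:
            bins[chromosome] = [ix, ix + 1]
        else:
            bins[chromosome][1] = ix + 1
    return bins

regions = [("chr18", 1, 100), ("chr18", 101, 200), ("chr18", 201, 300),
           ("chr19", 1, 100), ("chr19", 101, 200)]
cb = chromosome_bins(regions)
chr18_indices = list(range(*cb["chr18"]))
```

The end index of one chromosome equals the start index of the next, so consecutive ranges never overlap.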

chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.

chromosomes
()¶ List all chromosomes in this regions table.
Returns: list of chromosome names.

close
(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in read-only mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
Parameters:  copy_tmp – If False, does not overwrite original with modified file.
 remove_tmp – If False, does not delete temporary copy of file.

distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
Parameters: distance – distance in base pairs
Returns: float, distance as fraction of bin size
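The two conversions are inverses of each other once a bin size is fixed. A minimal sketch, with the bin size passed explicitly rather than read from an object, and with truncation to int chosen as an assumption for the bins-to-distance direction:

```python
def distance_to_bins(distance, bin_size):
    """Base pairs to (fractional) number of bins."""
    return distance / bin_size

def bins_to_distance(bins, bin_size):
    """(Fractional) number of bins to base pairs; truncation is an assumption."""
    return int(bins * bin_size)

bin_size = 1_000_000            # 1 Mb bins, as in the example file name
n_bins = distance_to_bins(2_500_000, bin_size)
n_bp = bins_to_distance(n_bins, bin_size)
```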

downsample
(n, file_name=None)¶ Sample edges from this object.
Sampling is always done on uncorrected Hi-C matrices.
Parameters:  n – Sample size or reference object. If n < 1 will be interpreted as a fraction of total reads in this object.
 file_name – Output file name for downsampled object.
Returns:
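The following sketch illustrates one way to draw n reads from raw edge counts, including the rule that n < 1 is read as a fraction of the total. It is not FANC's algorithm: sampling without replacement, the seed parameter, and the per-read list (which would be far too memory-hungry for real data) are assumptions chosen for clarity, and the "reference object" form of n is not covered.

```python
import random

def downsample(edges, n, seed=None):
    """Sample n reads without replacement from raw integer edge counts.

    Hypothetical helper; edges are (source, sink, count) tuples.
    """
    total = sum(count for _, _, count in edges)
    if n < 1:
        n = int(total * n)      # interpret as fraction of total reads
    if n > total:
        raise ValueError("cannot sample more reads than are present")
    rng = random.Random(seed)
    # one entry per read, labelled with the index of the edge it belongs to
    reads = [ix for ix, (_, _, count) in enumerate(edges) for _ in range(count)]
    sampled_counts = [0] * len(edges)
    for ix in rng.sample(reads, n):
        sampled_counts[ix] += 1
    return [(source, sink, c)
            for (source, sink, _), c in zip(edges, sampled_counts) if c > 0]

edges = [(0, 0, 50), (0, 1, 30), (1, 1, 20)]
half = downsample(edges, 0.5, seed=1)   # 50% of 100 reads
ten = downsample(edges, 10, seed=1)     # exactly 10 reads
```

Working on raw counts is what makes "a read" a meaningful sampling unit, which matches the note that sampling is done on uncorrected matrices.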

edge_data
(attribute, *args, **kwargs)¶ Iterate over specific edge attribute.
Returns: iterator over edge attribute

edge_subset
(key=None, *args, **kwargs)¶ Get a subset of edges.
This is an alias for edges().
Returns: generator (Edge)

edges
¶ Iterate over contacts / edges.
edges() is the central function of RegionPairsContainer. Here, we will use the Hic implementation for demonstration purposes, but the usage is exactly the same for all compatible objects implementing RegionPairsContainer, including JuicerHic and CoolerHic.
    import fanc
    # file from FANC examples
    hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")
We can easily find the number of edges in the sample Hic object:
    len(hic.edges)  # 8695
When used in an iterator context, edges() iterates over all edges in the RegionPairsContainer:
    for edge in hic.edges:
        # do something with edge
        print(edge)
        # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
        # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
        # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
        # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
        # ...
Calling edges() as a method has the same effect:
    # note the '()'
    for edge in hic.edges():
        # do something with edge
        print(edge)
        # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
        # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
        # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
        # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
        # ...
Rather than iterate over all edges in the object, we can select only a subset. If the key is a string or a GenomicRegion, all non-zero edges connecting the region described by the key to any other region are returned. If the key is a tuple of strings or GenomicRegion, only edges between the two regions are returned.
    # select all edges between chromosome 19
    # and any other region:
    for edge in hic.edges("chr19"):
        print(edge)
        # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
        # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
        # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
        # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
        # ...

    # select all edges that are only on
    # chromosome 19
    for edge in hic.edges(('chr19', 'chr19')):
        print(edge)
        # 90--116; bias: 0.00021173151730025176; sink_node: chr19:37000001-38000000; source_node: chr19:11000001-12000000; weight: 0.009104455243910825
        # 135--135; bias: 0.00018003890596887822; sink_node: chr19:56000001-57000000; source_node: chr19:56000001-57000000; weight: 0.10028167062466517
        # 123--123; bias: 0.00011063368998965993; sink_node: chr19:44000001-45000000; source_node: chr19:44000001-45000000; weight: 0.1386240135570439
        # 92--93; bias: 0.00040851066434864896; sink_node: chr19:14000001-15000000; source_node: chr19:13000001-14000000; weight: 0.10090213409411629
        # ...

    # select inter-chromosomal edges
    # between chromosomes 18 and 19
    for edge in hic.edges(('chr18', 'chr19')):
        print(edge)
        # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
        # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
        # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
        # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
        # ...
By default, edges() will retrieve all edge attributes, which can be slow when iterating over a lot of edges. This is why all file-based FANC RegionPairsContainer objects support lazy loading, where attributes are only read on demand.
    for edge in hic.edges('chr18', lazy=True):
        print(edge.source, edge.sink, edge.weight, edge)
        # 42 42 0.12291311562018173 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #0>
        # 24 28 0.025205961072838057 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #1>
        # 5 76 0.00961709840049876 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #2>
        # 66 68 0.03876763316345468 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #3>
        # ...
Warning
The lazy iterator reuses the LazyEdge object in every iteration, and overwrites the LazyEdge attributes. Therefore do not use lazy iterators if you need to store edge objects for later access. For example, the following code works as expected: list(hic.edges()), with all Edge objects stored in the list, while this code: list(hic.edges(lazy=True)) will result in a list of identical LazyEdge objects. Always ensure you do all edge processing in the loop when working with lazy iterators!
When working with normalised contact frequencies, such as obtained through matrix balancing in the example above, edges() automatically returns normalised edge weights. In addition, the bias attribute will (typically) have a value different from 1.
When you are interested in the raw contact frequency, use the norm=False parameter:
    for edge in hic.edges('chr18', lazy=True, norm=False):
        print(edge.source, edge.sink, edge.weight)
        # 42 42 2120.0
        # 24 28 388.0
        # 5 76 94.0
        # 66 68 470.0
        # ...
You can also choose to omit all intra- or inter-chromosomal edges using intra_chromosomal=False or inter_chromosomal=False, respectively.
Returns: Iterator over Edge or equivalent.
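The relationship between raw counts, bias, and normalised weights can be checked with the numbers from the chr18 example output. In those four edges, the normalised weight equals the raw count multiplied by the edge's bias value; the short check below only verifies this arithmetic for the printed values and makes no claim about how the bias itself is computed.

```python
import math

# (raw count with norm=False, bias, normalised weight) for four chr18 edges
examples = [
    (2120.0, 5.797788472650082e-05, 0.12291311562018173),   # source 42, sink 42
    (388.0, 6.496381719803623e-05, 0.025205961072838057),   # source 24, sink 28
    (94.0, 0.00010230955745211447, 0.00961709840049876),    # source 5, sink 76
    (470.0, 8.248432587969082e-05, 0.03876763316345468),    # source 66, sink 68
]

# raw * bias should reproduce the normalised weight up to float rounding
consistent = all(
    math.isclose(raw * bias, weight, rel_tol=1e-9)
    for raw, bias, weight in examples
)
```

This is a convenient sanity check when switching between norm=True and norm=False in an analysis.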

edges_dict
(*args, **kwargs)¶ Edges iterator with access by bracket notation.
This iterator always returns unnormalised edges.
Returns: dict or dictlike iterator

filter
(edge_filter, queue=False, log_progress=True)¶ Filter edges in this object by using a MaskFilter.
Parameters:  edge_filter – Class implementing MaskFilter.
 queue – If True, filter will be queued and can be executed along with other queued filters using run_queued_filters()
 log_progress – If true, process iterating through all edges will be continuously reported.

find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
Parameters: query_regions – Region selector string, GenomicRegion, or list of the former
Returns: index (or list of indexes) of the region at the center of the query region

flush
(silent=False, update_mappability=True)¶ Write data to file and flush buffers.
Parameters:  silent – do not print flush progress
 update_mappability – After writing data, update mappability and expected values

get_mask
(key)¶ Search _mask table for key and return Mask.
Parameters:  key (str) – search by mask name
 key (int) – search by mask ID
Returns: Mask

get_masks
(ix)¶ Extract mask IDs encoded in parameter and return masks.
IDs are powers of 2, so a single int field in the table can hold multiple masks by simply adding up the IDs. Similar principle to UNIX chmod (although that uses base 8)
Parameters: ix (int) – integer that is the sum of powers of 2. Note that this value is not necessarily itself a power of 2.
Returns: list of Masks extracted from ix
Return type: list (Mask)
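The power-of-2 encoding described above can be decoded with simple bit operations. The function below is a standalone sketch (the name decode_mask_ids is hypothetical) that returns the individual IDs contained in a combined value; looking up the corresponding Mask objects is left out.

```python
def decode_mask_ids(ix):
    """Split a non-negative integer into the powers of 2 that sum to it."""
    ids = []
    bit = 1
    while bit <= ix:
        if ix & bit:        # this power of 2 is part of the sum
            ids.append(bit)
        bit <<= 1
    return ids

combined = 1 + 4 + 8        # three masks stored in a single int field
ids = decode_mask_ids(combined)
```

Because every ID occupies its own bit, the decomposition is unique and adding the returned IDs gives back the original value.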

intervals
(*args, **kwargs)¶ Alias for region_intervals.

mappable
(region=None)¶ Get the mappability of regions in this object.
A “mappable” region has at least one contact to another region in the genome.
Returns: array
where True means mappable and False unmappable

classmethod
merge
(pairs, *args, **kwargs)¶ Merge two or more RegionPairsTable objects.
Parameters: pairs – list of RegionPairsTable
Returns: merged RegionPairsTable

region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
Parameters: args – provide a GenomicRegion here to get the slice of start and end bins of only this region. To get the slice over all regions, leave this blank.
Returns:

region_data
(key, value=None)¶ Retrieve or add vector data to this object. If there is existing data in this object with the same name, it will be replaced.
Parameters:  key – Name of the data column
 value – vector with region-based data (one entry per region)

region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
Parameters:  region – String or GenomicRegion object denoting the region to be binned
 bins – Number of bins to divide the region into
 bin_size – Size of each bin (alternative to bins argument)
 smoothing_window – Size of window (in bins) to smooth scores over
 nan_replacement – NaN values in the scores will be replaced with this value
 zero_to_nan – If True, will convert bins with score 0 to NaN
 args – Arguments passed to _region_intervals
 kwargs – Keyword arguments passed to _region_intervals
Returns: iterator of tuples: (start, end, score)

region_subset
(region, *args, **kwargs)¶ Takes a GenomicRegion and returns all regions that overlap with the supplied region.
Parameters: region – String or GenomicRegion object for which covered bins will be returned.

regions
¶ Iterate over genomic regions in this object.
Will return a GenomicRegion object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.
Returns: RegionIter

regions_and_edges
(key, *args, **kwargs)¶ Convenient access to regions and edges selected by key.
Returns: list of row regions, list of col regions, iterator over edges

regions_dict
¶ Return a dictionary with region index as keys and regions as values.
Returns: dict {region.ix: region, …}

static
regions_identical
(pairs)¶ Check if the regions in all objects in the list are identical.
Parameters: pairs – list of RegionBased objects
Returns: True if chromosome, start, and end are identical between all regions in the same list positions.

run_queued_filters
(log_progress=True)¶ Run queued filters.
Parameters: log_progress – If true, process iterating through all edges will be continuously reported.

subset
(*regions, **kwargs)¶ Subset a Hic object by specifying one or more subset regions.
Parameters:  regions – string or GenomicRegion object(s)
 kwargs – Supports file_name: destination file name of subset Hic object; tmpdir: if True, works in tmp until object is closed. Additional parameters are passed to edges().
Returns: Hic

to_bed
(file_name, subset=None, **kwargs)¶ Export regions as BED file
Parameters:  file_name – Path of file to write regions to
 subset – optional GenomicRegion or str to write only regions overlapping this region
 kwargs – Passed to write_bed()

to_bigwig
(file_name, subset=None, **kwargs)¶ Export regions as BigWig file.
Parameters:  file_name – Path of file to write regions to
 subset – optional GenomicRegion or str to write only regions overlapping this region
 kwargs – Passed to write_bigwig()

to_gff
(file_name, subset=None, **kwargs)¶ Export regions as GFF file
Parameters:  file_name – Path of file to write regions to
 subset – optional GenomicRegion or str to write only regions overlapping this region
 kwargs – Passed to write_gff()

class