Matrix module

class fanc.matrix.Edge(source, sink, _weight_field='weight', **kwargs)

Bases: object

A contact / an Edge between two genomic regions.

source

The index of the “source” genomic region. By convention, source <= sink.

sink

The index of the “sink” genomic region.

bias

Bias factor obtained via normalisation of the Hi-C matrix

source_node

The first GenomicRegion in this contact

sink_node

The second GenomicRegion in this contact

class fanc.matrix.LazyEdge(row, regions_table=None, _weight_field='weight')

Bases: object

An Edge equivalent supporting lazy loading.

source

The index of the “source” genomic region. By convention, source <= sink.

sink

The index of the “sink” genomic region.

bias

Bias factor obtained via normalisation of the Hi-C matrix

source_node

The first GenomicRegion in this contact

sink_node

The second GenomicRegion in this contact

class fanc.matrix.MutableLazyEdge(row, regions_table=None, _weight_field='weight')

Bases: fanc.matrix.LazyEdge

update()

Write changes to PyTables row to file.

class fanc.matrix.RegionMatrix

Bases: numpy.ma.core.MaskedArray

Subclass of masked_array with genomic region support.

Objects of this type are returned by matrix() calls, such as hic.matrix(...) in the example below. RegionMatrix supports subsetting by GenomicRegion and region strings of the form <chromosome>[:<start>-<end>].

import fanc
hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")

m = hic.matrix(('chr18', 'chr18'))
type(m)  # fanc.matrix.RegionMatrix

m_sub = m['chr18:1-5mb', 'chr18:1-10mb']
type(m_sub)  # fanc.matrix.RegionMatrix
m_sub.shape  # (5, 10)
m_sub.row_regions  # [chr18:1-1000000, chr18:1000001-2000000,
                   #  chr18:2000001-3000000, chr18:3000001-4000000,
                   #  chr18:4000001-5000000]

If the associated row or column regions have a valid attribute set to False, the corresponding rows/columns of the RegionMatrix will be masked.
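The masking rule can be sketched with plain NumPy. This is only an illustration: the `row_valid` and `col_valid` arrays are hypothetical stand-ins for the valid attribute of the row and column regions, and the code does not use FAN-C itself.

```python
import numpy as np

# Hypothetical validity flags, standing in for region.valid of each
# row/column region (not read from a FAN-C object).
row_valid = np.array([True, False, True])
col_valid = np.array([True, True, False, True])

data = np.arange(12, dtype=float).reshape(3, 4)

# An entry is masked if its row region or its column region is invalid.
mask = ~row_valid[:, None] | ~col_valid[None, :]
m = np.ma.MaskedArray(data, mask=mask)

# Reductions skip the masked entries; a fully masked row gives a masked result.
row_sums = m.sum(axis=1)
```

Here row 1 and column 2 are masked in full, so `row_sums` holds 4.0 and 28.0 for rows 0 and 2 and a masked value for row 1.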

row_regions

A list of regions matching the first matrix dimension

col_regions

A list of regions matching the second matrix dimension

all(axis=None, out=None, keepdims=<no value>)

Returns True if all elements evaluate to True.

The output array is masked where all the values along the given axis are masked: if the output would have been a scalar and all the values are masked, then the output is masked.

Refer to numpy.all for full documentation.

See also

numpy.ndarray.all()
corresponding function for ndarrays
numpy.all()
equivalent function

Examples

>>> np.ma.array([1,2,3]).all()
True
>>> a = np.ma.array([1,2,3], mask=True)
>>> (a.all() is np.ma.masked)
True
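A small sketch of the per-axis behaviour described above, using a hand-made array whose values were chosen only for illustration. Masked entries are ignored, and a column in which every entry is masked yields a masked result.

```python
import numpy as np

x = np.ma.array([[1, 0, 3],
                 [0, 5, 6]],
                mask=[[0, 1, 1],
                      [0, 0, 1]])

# Column 0 contains an unmasked zero, column 1 has its zero masked,
# column 2 is masked entirely.
col_all = x.all(axis=0)
```

`col_all` evaluates to False for column 0, True for column 1, and is masked for column 2.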
anom(axis=None, dtype=None)

Compute the anomalies (deviations from the arithmetic mean) along the given axis.

Returns an array of anomalies, with the same shape as the input and where the arithmetic mean is computed along the given axis.

Parameters:
  • axis (int, optional) – Axis over which the anomalies are taken. The default is to use the mean of the flattened array as reference.
  • dtype (dtype, optional) – Type to use in computing the anomalies. For arrays of integer type the default is float32; for arrays of float types it is the same as the array type.

See also

mean()
Compute the mean of the array.

Examples

>>> a = np.ma.array([1,2,3])
>>> a.anom()
masked_array(data=[-1.,  0.,  1.],
             mask=False,
       fill_value=1e+20)
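The example above uses the flattened array. As a sketch of the axis argument, the following computes anomalies per row of a small array with one masked entry (values are illustrative only).

```python
import numpy as np

a = np.ma.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]],
                mask=[[0, 0, 1],
                      [0, 0, 0]])

# Each row is centred on its own mean; the masked entry is excluded
# from the mean of row 0 and stays masked in the result.
row_anom = a.anom(axis=1)
```

Row 0 has mean 1.5 over its two unmasked values, so its anomalies are -0.5 and 0.5; row 1 has mean 5.0.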
any(axis=None, out=None, keepdims=<no value>)

Returns True if any of the elements of a evaluate to True.

Masked values are considered as False during computation.

Refer to numpy.any for full documentation.

See also

numpy.ndarray.any()
corresponding function for ndarrays
numpy.any()
equivalent function
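A brief sketch of how masked values count as False, using illustrative values: the only non-zero element is hidden behind the mask in one array and visible in the other.

```python
import numpy as np

# The 7 is masked, so only the two zeros take part in the test.
hidden = np.ma.array([0, 7, 0], mask=[0, 1, 0]).any()

# Same data without a mask: the 7 is seen.
visible = np.ma.array([0, 7, 0]).any()
```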
argmax(axis=None, fill_value=None, out=None)

Returns array of indices of the maximum values along the given axis. Masked values are treated as if they had the value fill_value.

Parameters:
  • axis ({None, integer}) – If None, the index is into the flattened array, otherwise along the specified axis
  • fill_value ({var}, optional) – Value used to fill in the masked values. If None, the output of maximum_fill_value(self._data) is used instead.
  • out ({None, array}, optional) – Array into which the result can be placed. Its type is preserved and it must be of the right shape to hold the output.
Returns:

index_array

Return type:

{integer_array}

Examples

>>> a = np.arange(6).reshape(2,3)
>>> a.argmax()
5
>>> a.argmax(0)
array([1, 1, 1])
>>> a.argmax(1)
array([2, 2])
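The examples above use an unmasked ndarray. As a sketch of the fill_value behaviour on a masked array (illustrative values), the largest element is masked below, so the default fill keeps it from being selected, while an explicit large fill_value makes it win.

```python
import numpy as np

x = np.ma.array([3, 9, 5], mask=[0, 1, 0])

# Default: the masked 9 is replaced by the smallest representable value.
default_idx = x.argmax()

# Explicit fill: the masked slot is treated as 100 and becomes the maximum.
filled_idx = x.argmax(fill_value=100)
```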
argmin(axis=None, fill_value=None, out=None)

Return array of indices to the minimum values along the given axis.

Parameters:
  • axis ({None, integer}) – If None, the index is into the flattened array, otherwise along the specified axis
  • fill_value ({var}, optional) – Value used to fill in the masked values. If None, the output of minimum_fill_value(self._data) is used instead.
  • out ({None, array}, optional) – Array into which the result can be placed. Its type is preserved and it must be of the right shape to hold the output.
Returns:

If multi-dimension input, returns a new ndarray of indices to the minimum values along the given axis. Otherwise, returns a scalar of index to the minimum values along the given axis.

Return type:

ndarray or scalar

Examples

>>> x = np.ma.array(np.arange(4), mask=[1,1,0,0])
>>> x.shape = (2,2)
>>> x
masked_array(
  data=[[--, --],
        [2, 3]],
  mask=[[ True,  True],
        [False, False]],
  fill_value=999999)
>>> x.argmin(axis=0, fill_value=-1)
array([0, 0])
>>> x.argmin(axis=0, fill_value=9)
array([1, 1])
argpartition(kth, axis=-1, kind='introselect', order=None)

Returns the indices that would partition this array.

Refer to numpy.argpartition for full documentation.

New in version 1.8.0.

See also

numpy.argpartition()
equivalent function
argsort(axis=<no value>, kind=None, order=None, endwith=True, fill_value=None)

Return an ndarray of indices that sort the array along the specified axis. Masked values are filled beforehand to fill_value.

Parameters:
  • axis (int, optional) –

    Axis along which to sort. If None, the default, the flattened array is used.

    Changed in version 1.13.0: Previously, the default was documented to be -1, but that was in error. At some future date, the default will change to -1, as originally intended. Until then, the axis should be given explicitly when arr.ndim > 1, to avoid a FutureWarning.

  • kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, optional) – The sorting algorithm used.
  • order (list, optional) – When a is an array with fields defined, this argument specifies which fields to compare first, second, etc. Not all fields need be specified.
  • endwith ({True, False}, optional) – Whether missing values (if any) should be treated as the largest values (True) or the smallest values (False). When the array contains unmasked values at the same extremes of the datatype, the ordering of these values and the masked values is undefined.
  • fill_value ({var}, optional) – Value used internally for the masked values. If fill_value is not None, it supersedes endwith.
Returns:

index_array – Array of indices that sort a along the specified axis. In other words, a[index_array] yields a sorted a.

Return type:

ndarray, int

See also

MaskedArray.sort()
Describes sorting algorithms used.
lexsort()
Indirect stable sort with multiple keys.
numpy.ndarray.sort()
Inplace sort.

Notes

See sort for notes on the different sorting algorithms.

Examples

>>> a = np.ma.array([3,2,1], mask=[False, False, True])
>>> a
masked_array(data=[3, 2, --],
             mask=[False, False,  True],
       fill_value=999999)
>>> a.argsort()
array([1, 0, 2])
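To illustrate the endwith parameter, here is a sketch reusing the array from the example above. With endwith=False the masked element is treated as the smallest value and therefore sorts first.

```python
import numpy as np

a = np.ma.array([3, 2, 1], mask=[False, False, True])

# Masked value last (default) versus masked value first.
last = a.argsort()
first = a.argsort(endwith=False)
```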
astype(dtype, order='K', casting='unsafe', subok=True, copy=True)

Copy of the array, cast to a specified type.

Parameters:
  • dtype (str or dtype) – Typecode or data-type to which the array is cast.
  • order ({'C', 'F', 'A', 'K'}, optional) – Controls the memory layout order of the result. ‘C’ means C order, ‘F’ means Fortran order, ‘A’ means ‘F’ order if all the arrays are Fortran contiguous, ‘C’ order otherwise, and ‘K’ means as close to the order the array elements appear in memory as possible. Default is ‘K’.
  • casting ({'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional) –

    Controls what kind of data casting may occur. Defaults to ‘unsafe’ for backwards compatibility.

    • ’no’ means the data types should not be cast at all.
    • ’equiv’ means only byte-order changes are allowed.
    • ’safe’ means only casts which can preserve values are allowed.
    • ’same_kind’ means only safe casts or casts within a kind, like float64 to float32, are allowed.
    • ’unsafe’ means any data conversions may be done.
  • subok (bool, optional) – If True, then sub-classes will be passed-through (default), otherwise the returned array will be forced to be a base-class array.
  • copy (bool, optional) – By default, astype always returns a newly allocated array. If this is set to false, and the dtype, order, and subok requirements are satisfied, the input array is returned instead of a copy.
Returns:

arr_t – Unless copy is False and the other conditions for returning the input array are satisfied (see description for copy input parameter), arr_t is a new array of the same shape as the input array, with dtype, order given by dtype, order.

Return type:

ndarray

Notes

Changed in version 1.17.0: Casting between a simple data type and a structured one is possible only for “unsafe” casting. Casting to multiple fields is allowed, but casting from multiple fields is not.

Changed in version 1.9.0: Casting from numeric to string types in ‘safe’ casting mode requires that the string dtype length is long enough to store the max integer/float value converted.

Raises:ComplexWarning – When casting from complex to float or int. To avoid this, one should use a.real.astype(t).

Examples

>>> x = np.array([1, 2, 2.5])
>>> x
array([1. ,  2. ,  2.5])
>>> x.astype(int)
array([1, 2, 2])
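The casting and copy parameters can be sketched as follows. This assumes plain ndarrays, as in the example above; a float-to-integer cast is rejected under 'safe' casting, and copy=False returns the input itself when no conversion is needed.

```python
import numpy as np

x = np.array([1, 2, 2.5])

# float64 -> int64 can lose information, so 'safe' casting refuses it.
try:
    x.astype(np.int64, casting='safe')
    safe_ok = True
except TypeError:
    safe_ok = False

# int32 -> int64 preserves every value and is allowed.
widened = np.array([1, 2], dtype=np.int32).astype(np.int64, casting='safe')

# Same dtype and copy=False: the input array is returned, not a copy.
same_object = x.astype(np.float64, copy=False) is x
```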
base

Base object if memory is from some other object.

Examples

The base of an array that owns its memory is None:

>>> x = np.array([1,2,3,4])
>>> x.base is None
True

Slicing creates a view, whose memory is shared with x:

>>> y = x[2:]
>>> y.base is x
True
baseclass

Class of the underlying data (read-only).

byteswap(inplace=False)

Swap the bytes of the array elements

Toggle between little-endian and big-endian data representation by returning a byteswapped array, optionally swapped in-place. Arrays of byte-strings are not swapped. The real and imaginary parts of a complex number are swapped individually.

Parameters:inplace (bool, optional) – If True, swap bytes in-place, default is False.
Returns:out – The byteswapped array. If inplace is True, this is a view to self.
Return type:ndarray

Examples

>>> A = np.array([1, 256, 8755], dtype=np.int16)
>>> list(map(hex, A))
['0x1', '0x100', '0x2233']
>>> A.byteswap(inplace=True)
array([  256,     1, 13090], dtype=int16)
>>> list(map(hex, A))
['0x100', '0x1', '0x3322']

Arrays of byte-strings are not swapped

>>> A = np.array([b'ceg', b'fac'])
>>> A.byteswap()
array([b'ceg', b'fac'], dtype='|S3')

A.newbyteorder().byteswap() produces an array with the same values but a different representation in memory:

>>> A = np.array([1, 2, 3])
>>> A.view(np.uint8)
array([1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0,
       0, 0], dtype=uint8)
>>> A.newbyteorder().byteswap(inplace=True)
array([1, 2, 3])
>>> A.view(np.uint8)
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0,
       0, 3], dtype=uint8)
choose(choices, out=None, mode='raise')

Use an index array to construct a new array from a set of choices.

Refer to numpy.choose for full documentation.

See also

numpy.choose()
equivalent function
clip(min=None, max=None, out=None, **kwargs)

Return an array whose values are limited to [min, max]. One of max or min must be given.

Refer to numpy.clip for full documentation.

See also

numpy.clip()
equivalent function
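A minimal sketch of clip on a masked array, with illustrative values. The unmasked entries are limited to the given range and the mask is carried over unchanged.

```python
import numpy as np

x = np.ma.array([-5, 0, 5, 10], mask=[0, 0, 0, 1])

# Limit values to the interval [0, 6]; the last entry stays masked.
c = x.clip(min=0, max=6)
```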
compress(condition, axis=None, out=None)

Return a where condition is True.

If condition is a MaskedArray, missing values are considered as False.

Parameters:
  • condition (var) – Boolean 1-d array selecting which entries to return. If len(condition) is less than the size of a along the axis, then output is truncated to length of condition array.
  • axis ({None, int}, optional) – Axis along which the operation must be performed.
  • out ({None, ndarray}, optional) – Alternative output array in which to place the result. It must have the same shape as the expected output but the type will be cast if necessary.
Returns:

result – A MaskedArray object.

Return type:

MaskedArray

Notes

Note the difference from compressed(): the output of compress() has a mask, whereas the output of compressed() does not.

Examples

>>> x = np.ma.array([[1,2,3],[4,5,6],[7,8,9]], mask=[0] + [1,0]*4)
>>> x
masked_array(
  data=[[1, --, 3],
        [--, 5, --],
        [7, --, 9]],
  mask=[[False,  True, False],
        [ True, False,  True],
        [False,  True, False]],
  fill_value=999999)
>>> x.compress([1, 0, 1])
masked_array(data=[1, 3],
             mask=[False, False],
       fill_value=999999)
>>> x.compress([1, 0, 1], axis=1)
masked_array(
  data=[[1, 3],
        [--, --],
        [7, 9]],
  mask=[[False, False],
        [ True,  True],
        [False, False]],
  fill_value=999999)
compressed()

Return all the non-masked data as a 1-D array.

Returns:data – A new ndarray holding the non-masked data is returned.
Return type:ndarray

Notes

The result is not a MaskedArray!

Examples

>>> x = np.ma.array(np.arange(5), mask=[0]*2 + [1]*3)
>>> x.compressed()
array([0, 1])
>>> type(x.compressed())
<class 'numpy.ndarray'>
conj()

Complex-conjugate all elements.

Refer to numpy.conjugate for full documentation.

See also

numpy.conjugate()
equivalent function
conjugate()

Return the complex conjugate, element-wise.

Refer to numpy.conjugate for full documentation.

See also

numpy.conjugate()
equivalent function
copy(order='C')

Return a copy of the array.

Parameters:order ({'C', 'F', 'A', 'K'}, optional) – Controls the memory layout of the copy. ‘C’ means C-order, ‘F’ means F-order, ‘A’ means ‘F’ if a is Fortran contiguous, ‘C’ otherwise. ‘K’ means match the layout of a as closely as possible. (Note that this function and numpy.copy() are very similar, but have different default values for their order= arguments.)

See also

numpy.copy(), numpy.copyto()

Examples

>>> x = np.array([[1,2,3],[4,5,6]], order='F')
>>> y = x.copy()
>>> x.fill(0)
>>> x
array([[0, 0, 0],
       [0, 0, 0]])
>>> y
array([[1, 2, 3],
       [4, 5, 6]])
>>> y.flags['C_CONTIGUOUS']
True
count(axis=None, keepdims=<no value>)

Count the non-masked elements of the array along the given axis.

Parameters:
  • axis (None or int or tuple of ints, optional) –

    Axis or axes along which the count is performed. The default, None, performs the count over all the dimensions of the input array. axis may be negative, in which case it counts from the last to the first axis.

    New in version 1.10.0.

    If this is a tuple of ints, the count is performed on multiple axes, instead of a single axis or all the axes as before.

  • keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the array.
Returns:

result – An array with the same shape as the input array, with the specified axis removed. If the array is a 0-d array, or if axis is None, a scalar is returned.

Return type:

ndarray or scalar

See also

count_masked()
Count masked elements in array or along a given axis.

Examples

>>> import numpy.ma as ma
>>> a = ma.arange(6).reshape((2, 3))
>>> a[1, :] = ma.masked
>>> a
masked_array(
  data=[[0, 1, 2],
        [--, --, --]],
  mask=[[False, False, False],
        [ True,  True,  True]],
  fill_value=999999)
>>> a.count()
3

When the axis keyword is specified an array of appropriate size is returned.

>>> a.count(axis=0)
array([1, 1, 1])
>>> a.count(axis=1)
array([3, 0])
ctypes

An object to simplify the interaction of the array with the ctypes module.

This attribute creates an object that makes it easier to use arrays when calling shared libraries with the ctypes module. The returned object has, among others, data, shape, and strides attributes (see Notes below) which themselves return ctypes objects that can be used as arguments to a shared library.

Parameters:None
Returns:c – Possessing attributes data, shape, strides, etc.
Return type:Python object

See also

numpy.ctypeslib

Notes

Below are the public attributes of this object which were documented in “Guide to NumPy” (we have omitted undocumented public attributes, as well as documented private attributes):

_ctypes.data

A pointer to the memory area of the array as a Python integer. This memory area may contain data that is not aligned, or not in correct byte-order. The memory area may not even be writeable. The array flags and data-type of this array should be respected when passing this attribute to arbitrary C-code to avoid trouble that can include Python crashing. User Beware! The value of this attribute is exactly the same as self.__array_interface__['data'][0].

Note that unlike data_as, a reference will not be kept to the array: code like ctypes.c_void_p((a + b).ctypes.data) will result in a pointer to a deallocated array, and should be spelt (a + b).ctypes.data_as(ctypes.c_void_p)

_ctypes.shape

A ctypes array of length self.ndim where the basetype is the C-integer corresponding to dtype('p') on this platform. This base-type could be ctypes.c_int, ctypes.c_long, or ctypes.c_longlong depending on the platform. The c_intp type is defined accordingly in numpy.ctypeslib. The ctypes array contains the shape of the underlying array.

Type:(c_intp*self.ndim)
_ctypes.strides

A ctypes array of length self.ndim where the basetype is the same as for the shape attribute. This ctypes array contains the strides information from the underlying array. This strides information is important for showing how many bytes must be jumped to get to the next element in the array.

Type:(c_intp*self.ndim)
_ctypes.data_as(obj)

Return the data pointer cast to a particular c-types object. For example, calling self._as_parameter_ is equivalent to self.data_as(ctypes.c_void_p). Perhaps you want to use the data as a pointer to a ctypes array of floating-point data: self.data_as(ctypes.POINTER(ctypes.c_double)).

The returned pointer will keep a reference to the array.

_ctypes.shape_as(obj)

Return the shape tuple as an array of some other c-types type. For example: self.shape_as(ctypes.c_short).

_ctypes.strides_as(obj)

Return the strides tuple as an array of some other c-types type. For example: self.strides_as(ctypes.c_longlong).

If the ctypes module is not available, then the ctypes attribute of array objects still returns something useful, but ctypes objects are not returned and errors may be raised instead. In particular, the object will still have the _as_parameter_ attribute which will return an integer equal to the data attribute.

Examples

>>> import ctypes
>>> x = np.array([[0, 1], [2, 3]], dtype=np.int32)
>>> x
array([[0, 1],
       [2, 3]], dtype=int32)
>>> x.ctypes.data
31962608 # may vary
>>> x.ctypes.data_as(ctypes.POINTER(ctypes.c_uint32))
<__main__.LP_c_uint object at 0x7ff2fc1fc200> # may vary
>>> x.ctypes.data_as(ctypes.POINTER(ctypes.c_uint32)).contents
c_uint(0)
>>> x.ctypes.data_as(ctypes.POINTER(ctypes.c_uint64)).contents
c_ulong(4294967296)
>>> x.ctypes.shape
<numpy.core._internal.c_long_Array_2 object at 0x7ff2fc1fce60> # may vary
>>> x.ctypes.strides
<numpy.core._internal.c_long_Array_2 object at 0x7ff2fc1ff320> # may vary
cumprod(axis=None, dtype=None, out=None)

Return the cumulative product of the array elements over the given axis.

Masked values are set to 1 internally during the computation. However, their position is saved, and the result will be masked at the same locations.

Refer to numpy.cumprod for full documentation.

Notes

The mask is lost if out is not a valid MaskedArray!

Arithmetic is modular when using integer types, and no error is raised on overflow.

See also

numpy.ndarray.cumprod()
corresponding function for ndarrays
numpy.cumprod()
equivalent function
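A short sketch of the behaviour described above, with illustrative values: the masked element contributes a factor of 1 to the running product, and its position remains masked in the result.

```python
import numpy as np

marr = np.ma.array([1, 2, 3, 4], mask=[0, 1, 0, 0])

# Running product over 1, (masked), 3, 4.
cp = marr.cumprod()
```

The unmasked results are 1, 3 and 12; the second position stays masked.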
cumsum(axis=None, dtype=None, out=None)

Return the cumulative sum of the array elements over the given axis.

Masked values are set to 0 internally during the computation. However, their position is saved, and the result will be masked at the same locations.

Refer to numpy.cumsum for full documentation.

Notes

The mask is lost if out is not a valid MaskedArray!

Arithmetic is modular when using integer types, and no error is raised on overflow.

See also

numpy.ndarray.cumsum()
corresponding function for ndarrays
numpy.cumsum()
equivalent function

Examples

>>> marr = np.ma.array(np.arange(10), mask=[0,0,0,1,1,1,0,0,0,0])
>>> marr.cumsum()
masked_array(data=[0, 1, 3, --, --, --, 9, 16, 24, 33],
             mask=[False, False, False,  True,  True,  True, False, False,
                   False, False],
       fill_value=999999)
data

Returns the underlying data, as a view of the masked array.

If the underlying data is a subclass of numpy.ndarray, it is returned as such.

>>> x = np.ma.array(np.matrix([[1, 2], [3, 4]]), mask=[[0, 1], [1, 0]])
>>> x.data
matrix([[1, 2],
        [3, 4]])

The type of the data can be accessed through the baseclass attribute.

diagonal(offset=0, axis1=0, axis2=1)

Return specified diagonals. In NumPy 1.9 the returned array is a read-only view instead of a copy as in previous NumPy versions. In a future version the read-only restriction will be removed.

Refer to numpy.diagonal() for full documentation.

See also

numpy.diagonal()
equivalent function
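As an illustration, the sketch below extracts the main diagonal and the first off-diagonal of a small masked array (values chosen only for demonstration). The mask follows the selected elements.

```python
import numpy as np

m = np.ma.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]],
                mask=[[0, 0, 0],
                      [0, 1, 0],
                      [0, 0, 0]])

main = m.diagonal()            # elements (0,0), (1,1), (2,2)
upper = m.diagonal(offset=1)   # elements (0,1), (1,2)
```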
dot(b, out=None, strict=False)

Masked dot product of two arrays. Note that out and strict are located in different positions than in ma.dot. In order to maintain compatibility with the functional version, it is recommended that the optional arguments be treated as keyword only. At some point that may be mandatory.

New in version 1.10.0.

Parameters:
  • b (masked_array_like) – Input array.
  • out (masked_array, optional) – Output argument. This must have the exact kind that would be returned if it was not used. In particular, it must have the right type, must be C-contiguous, and its dtype must be the dtype that would be returned for ma.dot(a,b). This is a performance feature. Therefore, if these conditions are not met, an exception is raised, instead of attempting to be flexible.
  • strict (bool, optional) –

    Whether masked data are propagated (True) or set to 0 (False) for the computation. Default is False. Propagating the mask means that if a masked value appears in a row or column, the whole row or column is considered masked.

    New in version 1.10.2.

See also

numpy.ma.dot()
equivalent function
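The effect of strict can be sketched with two small arrays, each with one masked entry (illustrative values). Following the recommendation above, strict is passed as a keyword argument.

```python
import numpy as np

a = np.ma.array([[1, 2, 3], [4, 5, 6]],
                mask=[[1, 0, 0], [0, 0, 0]])
b = np.ma.array([[1, 2], [3, 4], [5, 6]],
                mask=[[1, 0], [0, 0], [0, 0]])

# Default: masked entries are treated as 0 in the product.
loose = a.dot(b)

# strict=True: a masked entry masks its whole row (in a) or column (in b).
strict = a.dot(b, strict=True)
```

With strict=True only the element at position (1, 1) remains unmasked, because row 0 of `a` and column 0 of `b` each contain a masked value.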
dtype

Data-type of the array’s elements.

Parameters:None
Returns:d
Return type:numpy dtype object

See also

numpy.dtype

Examples

>>> x
array([[0, 1],
       [2, 3]])
>>> x.dtype
dtype('int32')
>>> type(x.dtype)
<class 'numpy.dtype'>
dump(file)

Dump a pickle of the array to the specified file. The array can be read back with pickle.load or numpy.load.

Parameters:file (str or Path) –

A string naming the dump file.

Changed in version 1.17.0: pathlib.Path objects are now accepted.

dumps()

Returns the pickle of the array as a string. pickle.loads or numpy.loads will convert the string back to an array.

Parameters:None
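A minimal round-trip sketch. It assumes the standard library pickle module is called directly rather than through dump() or dumps(); under Python 3 the serialised form is a bytes object. The values are illustrative.

```python
import pickle

import numpy as np

x = np.ma.array([1, 2, 3], mask=[0, 1, 0])

# Serialise to bytes and restore; data and mask both survive.
payload = pickle.dumps(x)
restored = pickle.loads(payload)
```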
fill(value)

Fill the array with a scalar value.

Parameters:value (scalar) – All elements of a will be assigned this value.

Examples

>>> a = np.array([1, 2])
>>> a.fill(0)
>>> a
array([0, 0])
>>> a = np.empty(2)
>>> a.fill(1)
>>> a
array([1.,  1.])
fill_value

The filling value of the masked array is a scalar. When setting, None will set to a default based on the data type.

Examples

>>> for dt in [np.int32, np.int64, np.float64, np.complex128]:
...     np.ma.array([0, 1], dtype=dt).get_fill_value()
...
999999
999999
1e+20
(1e+20+0j)
>>> x = np.ma.array([0, 1.], fill_value=-np.inf)
>>> x.fill_value
-inf
>>> x.fill_value = np.pi
>>> x.fill_value
3.1415926535897931 # may vary

Reset to default:

>>> x.fill_value = None
>>> x.fill_value
1e+20
filled(fill_value=None)

Return a copy of self, with masked values filled with a given value. However, if there are no masked values to fill, self will be returned instead as an ndarray.

Parameters:fill_value (array_like, optional) – The value to use for invalid entries. Can be scalar or non-scalar. If non-scalar, the resulting ndarray must be broadcastable over input array. Default is None, in which case, the fill_value attribute of the array is used instead.
Returns:filled_array – A copy of self with invalid entries replaced by fill_value (be it the function argument or the attribute of self), or self itself as an ndarray if there are no invalid entries to be replaced.
Return type:ndarray

Notes

The result is not a MaskedArray!

Examples

>>> x = np.ma.array([1,2,3,4,5], mask=[0,0,1,0,1], fill_value=-999)
>>> x.filled()
array([   1,    2, -999,    4, -999])
>>> x.filled(fill_value=1000)
array([   1,    2, 1000,    4, 1000])
>>> type(x.filled())
<class 'numpy.ndarray'>

Subclassing is preserved. This means that if, e.g., the data part of the masked array is a recarray, filled returns a recarray:

>>> x = np.array([(-1, 2), (-3, 4)], dtype='i8,i8').view(np.recarray)
>>> m = np.ma.array(x, mask=[(True, False), (False, True)])
>>> m.filled()
rec.array([(999999,      2), (    -3, 999999)],
          dtype=[('f0', '<i8'), ('f1', '<i8')])
flags

Information about the memory layout of the array.

C_CONTIGUOUS(C)

The data is in a single, C-style contiguous segment.

F_CONTIGUOUS(F)

The data is in a single, Fortran-style contiguous segment.

OWNDATA(O)

The array owns the memory it uses or borrows it from another object.

WRITEABLE(W)

The data area can be written to. Setting this to False locks the data, making it read-only. A view (slice, etc.) inherits WRITEABLE from its base array at creation time, but a view of a writeable array may be subsequently locked while the base array remains writeable. (The opposite is not true, in that a view of a locked array may not be made writeable. However, currently, locking a base object does not lock any views that already reference it, so under that circumstance it is possible to alter the contents of a locked array via a previously created writeable view onto it.) Attempting to change a non-writeable array raises a RuntimeError exception.

ALIGNED(A)

The data and all elements are aligned appropriately for the hardware.

WRITEBACKIFCOPY(X)

This array is a copy of some other array. The C-API function PyArray_ResolveWritebackIfCopy must be called before deallocating so that the base array is updated with the contents of this array.

UPDATEIFCOPY(U)

(Deprecated, use WRITEBACKIFCOPY) This array is a copy of some other array. When this array is deallocated, the base array will be updated with the contents of this array.

FNC

F_CONTIGUOUS and not C_CONTIGUOUS.

FORC

F_CONTIGUOUS or C_CONTIGUOUS (one-segment test).

BEHAVED(B)

ALIGNED and WRITEABLE.

CARRAY(CA)

BEHAVED and C_CONTIGUOUS.

FARRAY(FA)

BEHAVED and F_CONTIGUOUS and not C_CONTIGUOUS.

Notes

The flags object can be accessed dictionary-like (as in a.flags['WRITEABLE']), or by using lowercased attribute names (as in a.flags.writeable). Short flag names are only supported in dictionary access.

Only the WRITEBACKIFCOPY, UPDATEIFCOPY, WRITEABLE, and ALIGNED flags can be changed by the user, via direct assignment to the attribute or dictionary entry, or by calling ndarray.setflags.

The array flags cannot be set arbitrarily:

  • UPDATEIFCOPY can only be set False.
  • WRITEBACKIFCOPY can only be set False.
  • ALIGNED can only be set True if the data is truly aligned.
  • WRITEABLE can only be set True if the array owns its own memory or the ultimate owner of the memory exposes a writeable buffer interface or is a string.

Arrays can be both C-style and Fortran-style contiguous simultaneously. This is clear for 1-dimensional arrays, but can also be true for higher dimensional arrays.

Even for contiguous arrays, the stride for a given dimension arr.strides[dim] may be arbitrary if arr.shape[dim] == 1 or the array has no elements. It does not generally hold that self.strides[-1] == self.itemsize for C-style contiguous arrays or that self.strides[0] == self.itemsize for Fortran-style contiguous arrays.
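The access styles from the notes above can be sketched on plain ndarrays (illustrative arrays only): dictionary access with long or short names, lowercase attribute access, and the case of a 1-dimensional array that is both C- and Fortran-contiguous.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

by_key = a.flags['C_CONTIGUOUS']   # dictionary-style, long name
by_short = a.flags['C']            # short names work only with dictionary access
by_attr = a.flags.c_contiguous     # lowercase attribute

# A 1-D array satisfies both contiguity flags at once.
v = np.arange(4)
both = v.flags.c_contiguous and v.flags.f_contiguous

# The transpose of a C-ordered 2-D array is Fortran- but not C-contiguous.
fnc = a.T.flags['FNC']
```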

flat

Return a flat iterator, or set a flattened version of self to value.

flatten(order='C')

Return a copy of the array collapsed into one dimension.

Parameters:order ({'C', 'F', 'A', 'K'}, optional) – ‘C’ means to flatten in row-major (C-style) order. ‘F’ means to flatten in column-major (Fortran-style) order. ‘A’ means to flatten in column-major order if a is Fortran contiguous in memory, row-major order otherwise. ‘K’ means to flatten a in the order the elements occur in memory. The default is ‘C’.
Returns:y – A copy of the input array, flattened to one dimension.
Return type:ndarray

See also

ravel()
Return a flattened array.
flat()
A 1-D flat iterator over the array.

Examples

>>> a = np.array([[1,2], [3,4]])
>>> a.flatten()
array([1, 2, 3, 4])
>>> a.flatten('F')
array([1, 3, 2, 4])
get_fill_value()

The filling value of the masked array is a scalar. When setting, None will set to a default based on the data type.

Examples

>>> for dt in [np.int32, np.int64, np.float64, np.complex128]:
...     np.ma.array([0, 1], dtype=dt).get_fill_value()
...
999999
999999
1e+20
(1e+20+0j)
>>> x = np.ma.array([0, 1.], fill_value=-np.inf)
>>> x.fill_value
-inf
>>> x.fill_value = np.pi
>>> x.fill_value
3.1415926535897931 # may vary

Reset to default:

>>> x.fill_value = None
>>> x.fill_value
1e+20
get_imag()

The imaginary part of the masked array.

This property is a view on the imaginary part of this MaskedArray.

See also

real()

Examples

>>> x = np.ma.array([1+1.j, -2j, 3.45+1.6j], mask=[False, True, False])
>>> x.imag
masked_array(data=[1.0, --, 1.6],
             mask=[False,  True, False],
       fill_value=1e+20)
get_real()

The real part of the masked array.

This property is a view on the real part of this MaskedArray.

See also

imag()

Examples

>>> x = np.ma.array([1+1.j, -2j, 3.45+1.6j], mask=[False, True, False])
>>> x.real
masked_array(data=[1.0, --, 3.45],
             mask=[False,  True, False],
       fill_value=1e+20)
getfield(dtype, offset=0)

Returns a field of the given array as a certain type.

A field is a view of the array data with a given data-type. The values in the view are determined by the given type and the offset into the current array in bytes. The offset needs to be such that the view dtype fits in the array dtype; for example an array of dtype complex128 has 16-byte elements. If taking a view with a 32-bit integer (4 bytes), the offset needs to be between 0 and 12 bytes.

Parameters:
  • dtype (str or dtype) – The data type of the view. The dtype size of the view can not be larger than that of the array itself.
  • offset (int) – Number of bytes to skip before beginning the element view.

Examples

>>> x = np.diag([1.+1.j]*2)
>>> x[1, 1] = 2 + 4.j
>>> x
array([[1.+1.j,  0.+0.j],
       [0.+0.j,  2.+4.j]])
>>> x.getfield(np.float64)
array([[1.,  0.],
       [0.,  2.]])

By choosing an offset of 8 bytes we can select the imaginary part of the array for our view:

>>> x.getfield(np.float64, offset=8)
array([[1.,  0.],
       [0.,  4.]])
harden_mask()

Force the mask to hard.

Whether the mask of a masked array is hard or soft is determined by its hardmask property. harden_mask sets hardmask to True.

See also

hardmask()

hardmask

Hardness of the mask
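What a hard mask means in practice can be sketched as follows, with illustrative values. While the mask is hard, assigning to a masked element does not unmask it. The sketch also calls soften_mask(), the NumPy counterpart of harden_mask(), to show the contrast.

```python
import numpy as np

x = np.ma.array([1, 2, 3], mask=[0, 1, 0])

x.harden_mask()
x[1] = 99                          # ignored: element 1 stays masked
masked_when_hard = bool(x.mask[1])

x.soften_mask()
x[1] = 99                          # now the assignment unmasks the element
masked_when_soft = bool(x.mask[1])
```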

ids()

Return the addresses of the data and mask areas.

Parameters:None

Examples

>>> x = np.ma.array([1, 2, 3], mask=[0, 1, 1])
>>> x.ids()
(166670640, 166659832) # may vary

If the array has no mask, the address of nomask is returned. This address is typically not close to the data in memory:

>>> x = np.ma.array([1, 2, 3])
>>> x.ids()
(166691080, 3083169284) # may vary
imag

The imaginary part of the masked array.

This property is a view on the imaginary part of this MaskedArray.

See also

real

Examples

>>> x = np.ma.array([1+1.j, -2j, 3.45+1.6j], mask=[False, True, False])
>>> x.imag
masked_array(data=[1.0, --, 1.6],
             mask=[False,  True, False],
       fill_value=1e+20)
iscontiguous()

Return a boolean indicating whether the data is contiguous.

Parameters:None

Examples

>>> x = np.ma.array([1, 2, 3])
>>> x.iscontiguous()
True

iscontiguous returns one of the flags of the masked array:

>>> x.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
item(*args)

Copy an element of an array to a standard Python scalar and return it.

Parameters:*args (Arguments (variable number and type)) –
  • none: in this case, the method only works for arrays with one element (a.size == 1); that element is copied into a standard Python scalar object and returned.
  • int_type: this argument is interpreted as a flat index into the array, specifying which element to copy and return.
  • tuple of int_types: functions as does a single int_type argument, except that the argument is interpreted as an nd-index into the array.
Returns:z – A copy of the specified element of the array as a suitable Python scalar
Return type:Standard Python scalar object

Notes

When the data type of a is longdouble or clongdouble, item() returns a scalar array object because there is no available Python scalar that would not lose information. Void arrays return a buffer object for item(), unless fields are defined, in which case a tuple is returned.

item is very similar to a[args], except, instead of an array scalar, a standard Python scalar is returned. This can be useful for speeding up access to elements of the array and doing arithmetic on elements of the array using Python’s optimized math.

Examples

>>> np.random.seed(123)
>>> x = np.random.randint(9, size=(3, 3))
>>> x
array([[2, 2, 6],
       [1, 3, 6],
       [1, 0, 1]])
>>> x.item(3)
1
>>> x.item(7)
0
>>> x.item((0, 1))
2
>>> x.item((2, 2))
1
itemset(*args)

Insert scalar into an array (scalar is cast to array’s dtype, if possible)

There must be at least one argument; the last argument is taken as item. Then, a.itemset(*args) is equivalent to but faster than a[args] = item. The item should be a scalar value and args must select a single item in the array a.

Parameters:*args (Arguments) – If one argument: a scalar, only used in case a is of size 1. If two arguments: the last argument is the value to be set and must be a scalar, the first argument specifies a single array element location. It is either an int or a tuple.

Notes

Compared to indexing syntax, itemset provides some speed increase for placing a scalar into a particular location in an ndarray, if you must do this. However, generally this is discouraged: among other problems, it complicates the appearance of the code. Also, when using itemset (and item) inside a loop, be sure to assign the methods to a local variable to avoid the attribute look-up at each loop iteration.

Examples

>>> np.random.seed(123)
>>> x = np.random.randint(9, size=(3, 3))
>>> x
array([[2, 2, 6],
       [1, 3, 6],
       [1, 0, 1]])
>>> x.itemset(4, 0)
>>> x.itemset((2, 2), 9)
>>> x
array([[2, 2, 6],
       [1, 0, 6],
       [1, 0, 9]])
itemsize

Length of one array element in bytes.

Examples

>>> x = np.array([1,2,3], dtype=np.float64)
>>> x.itemsize
8
>>> x = np.array([1,2,3], dtype=np.complex128)
>>> x.itemsize
16
mask

Current mask.

max(axis=None, out=None, fill_value=None, keepdims=<no value>)

Return the maximum along a given axis.

Parameters:
  • axis ({None, int}, optional) – Axis along which to operate. By default, axis is None and the flattened input is used.
  • out (array_like, optional) – Alternative output array in which to place the result. Must be of the same shape and buffer length as the expected output.
  • fill_value ({var}, optional) – Value used to fill in the masked values. If None, use the output of maximum_fill_value().
  • keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the array.
Returns:

amax – New array holding the result. If out was specified, out is returned.

Return type:

array_like

See also

maximum_fill_value()
Returns the maximum filling value for a given datatype.
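As a short illustration of how masked entries are skipped, the sketch below computes the maximum of a small masked array overall and along each axis. It uses a plain NumPy masked array rather than a RegionMatrix, and the variable names are illustrative only.

```python
import numpy as np

x = np.ma.array([[1, 9, 3],
                 [4, 5, 6]],
                mask=[[0, 1, 0],
                      [0, 0, 0]])

overall_max = x.max()        # the masked 9 is skipped
column_max = x.max(axis=0)   # one maximum per column
row_max = x.max(axis=1)      # one maximum per row
```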
mean(axis=None, dtype=None, out=None, keepdims=<no value>)

Returns the average of the array elements along given axis.

Masked entries are ignored, and result elements which are not finite will be masked.

Refer to numpy.mean for full documentation.

See also

numpy.ndarray.mean()
corresponding function for ndarrays
numpy.mean()
Equivalent function
numpy.ma.average()
Weighted average.

Examples

>>> a = np.ma.array([1,2,3], mask=[False, False, True])
>>> a
masked_array(data=[1, 2, --],
             mask=[False, False,  True],
       fill_value=999999)
>>> a.mean()
1.5
min(axis=None, out=None, fill_value=None, keepdims=<no value>)

Return the minimum along a given axis.

Parameters:
  • axis ({None, int}, optional) – Axis along which to operate. By default, axis is None and the flattened input is used.
  • out (array_like, optional) – Alternative output array in which to place the result. Must be of the same shape and buffer length as the expected output.
  • fill_value ({var}, optional) – Value used to fill in the masked values. If None, use the output of minimum_fill_value.
  • keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the array.
Returns:

amin – New array holding the result. If out was specified, out is returned.

Return type:

array_like

See also

minimum_fill_value()
Returns the minimum filling value for a given datatype.
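When a slice along the chosen axis contains no unmasked entries, the corresponding result element is itself masked. A minimal sketch on a plain NumPy masked array (names are illustrative only):

```python
import numpy as np

x = np.ma.array([[1, 2],
                 [3, 4]],
                mask=[[1, 1],
                      [0, 0]])

overall_min = x.min()                    # only 3 and 4 are considered
row_min = x.min(axis=1)                  # first row is fully masked
row_min_2d = x.min(axis=1, keepdims=True)  # reduced axis kept with size one
```

Checking `row_min.mask` before using the values avoids treating a fill value as a real minimum.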
mini(axis=None)

Return the array minimum along the specified axis.

Deprecated since version 1.13.0: This function is identical to both:

  • self.min(keepdims=True, axis=axis).squeeze(axis=axis)
  • np.ma.minimum.reduce(self, axis=axis)

Typically though, self.min(axis=axis) is sufficient.

Parameters:axis (int, optional) – The axis along which to find the minima. Default is None, in which case the minimum value in the whole array is returned.
Returns:min – If axis is None, the result is a scalar. Otherwise, if axis is given and the array is at least 2-D, the result is a masked array with dimension one smaller than the array on which mini is called.
Return type:scalar or MaskedArray

Examples

>>> x = np.ma.array(np.arange(6), mask=[0, 1, 0, 0, 0, 1]).reshape(3, 2)
>>> x
masked_array(
  data=[[0, --],
        [2, 3],
        [4, --]],
  mask=[[False,  True],
        [False, False],
        [False,  True]],
  fill_value=999999)
>>> x.mini()
masked_array(data=0,
             mask=False,
       fill_value=999999)
>>> x.mini(axis=0)
masked_array(data=[0, 3],
             mask=[False, False],
       fill_value=999999)
>>> x.mini(axis=1)
masked_array(data=[0, 2, 4],
             mask=[False, False, False],
       fill_value=999999)

There is a small difference between mini and min:

>>> x[:,1].mini(axis=0)
masked_array(data=3,
             mask=False,
       fill_value=999999)
>>> x[:,1].min(axis=0)
3
nbytes

Total bytes consumed by the elements of the array.

Notes

Does not include memory consumed by non-element attributes of the array object.

Examples

>>> x = np.zeros((3,5,2), dtype=np.complex128)
>>> x.nbytes
480
>>> np.prod(x.shape) * x.itemsize
480
ndim

Number of array dimensions.

Examples

>>> x = np.array([1, 2, 3])
>>> x.ndim
1
>>> y = np.zeros((2, 3, 4))
>>> y.ndim
3
newbyteorder(new_order='S')

Return the array with the same data viewed with a different byte order.

Equivalent to:

arr.view(arr.dtype.newbyteorder(new_order))

Changes are also made in all fields and sub-arrays of the array data type.

Parameters:new_order (string, optional) –

Byte order to force; a value from the byte order specifications below. new_order codes can be any of:

  • ’S’ - swap dtype from current to opposite endian
  • {‘<’, ‘L’} - little endian
  • {‘>’, ‘B’} - big endian
  • {‘=’, ‘N’} - native order
  • {‘|’, ‘I’} - ignore (no change to byte order)

The default value (‘S’) results in swapping the current byte order. The code does a case-insensitive check on the first letter of new_order for the alternatives above. For example, any of ‘B’ or ‘b’ or ‘biggish’ are valid to specify big-endian.

Returns:new_arr – New array object with the dtype reflecting given change to the byte order.
Return type:array
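The effect of swapping the byte order can be sketched with the equivalent view form given above. The sketch uses that form rather than the method itself because, to my knowledge, the ndarray method is no longer available in NumPy 2.0 and later; the array and variable names are illustrative only.

```python
import numpy as np

# Two 16-bit integers stored explicitly as little-endian.
a = np.array([1, 256], dtype='<i2')

# Reinterpret the same bytes with the opposite byte order.
swapped = a.view(a.dtype.newbyteorder('S'))

# The underlying bytes are unchanged; only their interpretation differs.
same_bytes = swapped.tobytes() == a.tobytes()
```

Read with the opposite byte order, the bytes `01 00` become 256 and `00 01` become 1.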
nonzero()

Return the indices of unmasked elements that are not zero.

Returns a tuple of arrays, one for each dimension, containing the indices of the non-zero elements in that dimension. The corresponding non-zero values can be obtained with:

a[a.nonzero()]

To group the indices by element, rather than dimension, use instead:

np.transpose(a.nonzero())

The result of this is always a 2d array, with a row for each non-zero element.

Parameters:None
Returns:tuple_of_arrays – Indices of elements that are non-zero.
Return type:tuple

See also

numpy.nonzero()
Function operating on ndarrays.
flatnonzero()
Return indices that are non-zero in the flattened version of the input array.
numpy.ndarray.nonzero()
Equivalent ndarray method.
count_nonzero()
Counts the number of non-zero elements in the input array.

Examples

>>> import numpy.ma as ma
>>> x = ma.array(np.eye(3))
>>> x
masked_array(
  data=[[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]],
  mask=False,
  fill_value=1e+20)
>>> x.nonzero()
(array([0, 1, 2]), array([0, 1, 2]))

Masked elements are ignored.

>>> x[1, 1] = ma.masked
>>> x
masked_array(
  data=[[1.0, 0.0, 0.0],
        [0.0, --, 0.0],
        [0.0, 0.0, 1.0]],
  mask=[[False, False, False],
        [False,  True, False],
        [False, False, False]],
  fill_value=1e+20)
>>> x.nonzero()
(array([0, 2]), array([0, 2]))

Indices can also be grouped by element.

>>> np.transpose(x.nonzero())
array([[0, 0],
       [2, 2]])

A common use for nonzero is to find the indices of an array where a condition is True. Given an array a, the condition a > 3 is a boolean array and, since False is interpreted as 0, ma.nonzero(a > 3) yields the indices of a where the condition is true.

>>> a = ma.array([[1,2,3],[4,5,6],[7,8,9]])
>>> a > 3
masked_array(
  data=[[False, False, False],
        [ True,  True,  True],
        [ True,  True,  True]],
  mask=False,
  fill_value=True)
>>> ma.nonzero(a > 3)
(array([1, 1, 1, 2, 2, 2]), array([0, 1, 2, 0, 1, 2]))

The nonzero method of the condition array can also be called.

>>> (a > 3).nonzero()
(array([1, 1, 1, 2, 2, 2]), array([0, 1, 2, 0, 1, 2]))
partition(kth, axis=-1, kind='introselect', order=None)

Rearranges the elements in the array in such a way that the value of the element in kth position is in the position it would be in a sorted array. All elements smaller than the kth element are moved before this element and all equal or greater are moved behind it. The ordering of the elements in the two partitions is undefined.

New in version 1.8.0.

Parameters:
  • kth (int or sequence of ints) – Element index to partition by. The kth element value will be in its final sorted position and all smaller elements will be moved before it and all equal or greater elements behind it. The order of all elements in the partitions is undefined. If provided with a sequence of kth it will partition all elements indexed by kth of them into their sorted position at once.
  • axis (int, optional) – Axis along which to sort. Default is -1, which means sort along the last axis.
  • kind ({'introselect'}, optional) – Selection algorithm. Default is ‘introselect’.
  • order (str or list of str, optional) – When a is an array with fields defined, this argument specifies which fields to compare first, second, etc. A single field can be specified as a string, and not all fields need to be specified, but unspecified fields will still be used, in the order in which they come up in the dtype, to break ties.

See also

numpy.partition()
Return a partitioned copy of an array.
argpartition()
Indirect partition.
sort()
Full sort.

Notes

See np.partition for notes on the different algorithms.

Examples

>>> a = np.array([3, 4, 2, 1])
>>> a.partition(3)
>>> a
array([2, 1, 3, 4])
>>> a.partition((1, 3))
>>> a
array([1, 2, 3, 4])
prod(axis=None, dtype=None, out=None, keepdims=<no value>)

Return the product of the array elements over the given axis.

Masked elements are set to 1 internally for computation.

Refer to numpy.prod for full documentation.

Notes

Arithmetic is modular when using integer types, and no error is raised on overflow.

See also

numpy.ndarray.prod()
corresponding function for ndarrays
numpy.prod()
equivalent function
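Because masked elements are treated as 1, they do not affect the product. A minimal sketch on plain NumPy masked arrays (names are illustrative only):

```python
import numpy as np

x = np.ma.array([1, 2, 3, 4], mask=[0, 0, 1, 0])
masked_product = x.prod()          # 1 * 2 * 4, the masked 3 counts as 1

y = np.ma.array([[1, 2],
                 [3, 4]],
                mask=[[0, 1],
                      [0, 0]])
column_product = y.prod(axis=0)    # [1 * 3, 4]
```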
product(axis=None, dtype=None, out=None, keepdims=<no value>)

Return the product of the array elements over the given axis.

Masked elements are set to 1 internally for computation.

Refer to numpy.prod for full documentation.

Notes

Arithmetic is modular when using integer types, and no error is raised on overflow.

See also

numpy.ndarray.prod()
corresponding function for ndarrays
numpy.prod()
equivalent function
ptp(axis=None, out=None, fill_value=None, keepdims=False)

Return (maximum - minimum) along the given dimension (i.e. peak-to-peak value).

Warning

ptp preserves the data type of the array. This means the return value for an input of signed integers with n bits (e.g. np.int8, np.int16, etc) is also a signed integer with n bits. In that case, peak-to-peak values greater than 2**(n-1)-1 will be returned as negative values. An example with a work-around is shown below.

Parameters:
  • axis ({None, int}, optional) – Axis along which to find the peaks. If None (default) the flattened array is used.
  • out ({None, array_like}, optional) – Alternative output array in which to place the result. It must have the same shape and buffer length as the expected output but the type will be cast if necessary.
  • fill_value ({var}, optional) – Value used to fill in the masked values.
  • keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the array.
Returns:

ptp – A new array holding the result, unless out was specified, in which case a reference to out is returned.

Return type:

ndarray

Examples

>>> x = np.ma.MaskedArray([[4, 9, 2, 10],
...                        [6, 9, 7, 12]])
>>> x.ptp(axis=1)
masked_array(data=[8, 6],
             mask=False,
       fill_value=999999)
>>> x.ptp(axis=0)
masked_array(data=[2, 0, 5, 2],
             mask=False,
       fill_value=999999)
>>> x.ptp()
10

This example shows that a negative value can be returned when the input is an array of signed integers.

>>> y = np.ma.MaskedArray([[1, 127],
...                        [0, 127],
...                        [-1, 127],
...                        [-2, 127]], dtype=np.int8)
>>> y.ptp(axis=1)
masked_array(data=[ 126,  127, -128, -127],
             mask=False,
       fill_value=999999,
            dtype=int8)

A work-around is to use the view() method to view the result as unsigned integers with the same bit width:

>>> y.ptp(axis=1).view(np.uint8)
masked_array(data=[126, 127, 128, 129],
             mask=False,
       fill_value=999999,
            dtype=uint8)
put(indices, values, mode='raise')

Set storage-indexed locations to corresponding values.

Sets self._data.flat[n] = values[n] for each n in indices. If values is shorter than indices then it will repeat. If values has some masked values, the initial mask is updated accordingly; otherwise the corresponding values are unmasked.

Parameters:
  • indices (1-D array_like) – Target indices, interpreted as integers.
  • values (array_like) – Values to place in self._data copy at target indices.
  • mode ({'raise', 'wrap', 'clip'}, optional) – Specifies how out-of-bounds indices will behave. ‘raise’ : raise an error. ‘wrap’ : wrap around. ‘clip’ : clip to the range.

Notes

values can be a scalar or length 1 array.

Examples

>>> x = np.ma.array([[1,2,3],[4,5,6],[7,8,9]], mask=[0] + [1,0]*4)
>>> x
masked_array(
  data=[[1, --, 3],
        [--, 5, --],
        [7, --, 9]],
  mask=[[False,  True, False],
        [ True, False,  True],
        [False,  True, False]],
  fill_value=999999)
>>> x.put([0,4,8],[10,20,30])
>>> x
masked_array(
  data=[[10, --, 3],
        [--, 20, --],
        [7, --, 30]],
  mask=[[False,  True, False],
        [ True, False,  True],
        [False,  True, False]],
  fill_value=999999)
>>> x.put(4,999)
>>> x
masked_array(
  data=[[10, --, 3],
        [--, 999, --],
        [7, --, 30]],
  mask=[[False,  True, False],
        [ True, False,  True],
        [False,  True, False]],
  fill_value=999999)
ravel(order='C')

Returns a 1D version of self, as a view.

Parameters:order ({'C', 'F', 'A', 'K'}, optional) – The elements of a are read using this index order. ‘C’ means to index the elements in C-like order, with the last axis index changing fastest, back to the first axis index changing slowest. ‘F’ means to index the elements in Fortran-like index order, with the first index changing fastest, and the last index changing slowest. Note that the ‘C’ and ‘F’ options take no account of the memory layout of the underlying array, and only refer to the order of axis indexing. ‘A’ means to read the elements in Fortran-like index order if m is Fortran contiguous in memory, C-like order otherwise. ‘K’ means to read the elements in the order they occur in memory, except for reversing the data when strides are negative. By default, ‘C’ index order is used.
Returns:Output view is of shape (self.size,) (or (np.ma.product(self.shape),)).
Return type:MaskedArray

Examples

>>> x = np.ma.array([[1,2,3],[4,5,6],[7,8,9]], mask=[0] + [1,0]*4)
>>> x
masked_array(
  data=[[1, --, 3],
        [--, 5, --],
        [7, --, 9]],
  mask=[[False,  True, False],
        [ True, False,  True],
        [False,  True, False]],
  fill_value=999999)
>>> x.ravel()
masked_array(data=[1, --, 3, --, 5, --, 7, --, 9],
             mask=[False,  True, False,  True, False,  True, False,  True,
                   False],
       fill_value=999999)
real

The real part of the masked array.

This property is a view on the real part of this MaskedArray.

See also

imag

Examples

>>> x = np.ma.array([1+1.j, -2j, 3.45+1.6j], mask=[False, True, False])
>>> x.real
masked_array(data=[1.0, --, 3.45],
             mask=[False,  True, False],
       fill_value=1e+20)
recordmask

Get or set the mask of the array if it has no named fields. For structured arrays, returns an ndarray of booleans where entries are True if all the fields are masked, False otherwise:

>>> x = np.ma.array([(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)],
...         mask=[(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)],
...        dtype=[('a', int), ('b', int)])
>>> x.recordmask
array([False, False,  True, False, False])
repeat(repeats, axis=None)

Repeat elements of an array.

Refer to numpy.repeat for full documentation.

See also

numpy.repeat()
equivalent function
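As a brief sketch, repeating a masked array repeats its mask along with its data. This uses a plain NumPy masked array with illustrative names.

```python
import numpy as np

x = np.ma.array([1, 2, 3], mask=[0, 1, 0])

# Each element, masked or not, appears twice in the result.
repeated = x.repeat(2)
```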
reshape(*s, **kwargs)

Give a new shape to the array without changing its data.

Returns a masked array containing the same data, but with a new shape. The result is a view on the original array; if this is not possible, a ValueError is raised.

Parameters:
  • shape (int or tuple of ints) – The new shape should be compatible with the original shape. If an integer is supplied, then the result will be a 1-D array of that length.
  • order ({'C', 'F'}, optional) – Determines whether the array data should be viewed as in C (row-major) or FORTRAN (column-major) order.
Returns:

reshaped_array – A new view on the array.

Return type:

array

See also

reshape()
Equivalent function in the masked array module.
numpy.ndarray.reshape()
Equivalent method on ndarray object.
numpy.reshape()
Equivalent function in the NumPy module.

Notes

The reshaping operation cannot guarantee that a copy will not be made; to modify the shape in place, use a.shape = s.

Examples

>>> x = np.ma.array([[1,2],[3,4]], mask=[1,0,0,1])
>>> x
masked_array(
  data=[[--, 2],
        [3, --]],
  mask=[[ True, False],
        [False,  True]],
  fill_value=999999)
>>> x = x.reshape((4,1))
>>> x
masked_array(
  data=[[--],
        [2],
        [3],
        [--]],
  mask=[[ True],
        [False],
        [False],
        [ True]],
  fill_value=999999)
resize(newshape, refcheck=True, order=False)

Warning

This method does nothing, except raise a ValueError exception. A masked array does not own its data and therefore cannot safely be resized in place. Use the numpy.ma.resize function instead.

This method is difficult to implement safely and may be deprecated in future releases of NumPy.
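The sketch below shows both halves of the warning: the method raises, and numpy.ma.resize returns a new array of the requested shape. It uses a plain NumPy masked array, and the variable names are illustrative only.

```python
import numpy as np

x = np.ma.array([1, 2, 3], mask=[0, 1, 0])

# The in-place method is expected to raise ValueError.
try:
    x.resize((2, 3))
    raised = False
except ValueError:
    raised = True

# The module-level function returns a new masked array instead,
# repeating data and mask as needed to fill the new shape.
resized = np.ma.resize(x, (2, 3))
```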

round(decimals=0, out=None)

Return each element rounded to the given number of decimals.

Refer to numpy.around for full documentation.

See also

numpy.ndarray.round()
corresponding function for ndarrays
numpy.around()
equivalent function
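A minimal sketch of rounding a masked array, using a plain NumPy masked array with illustrative names. The mask is carried over to the result unchanged.

```python
import numpy as np

x = np.ma.array([1.234, 5.678, 9.999], mask=[0, 1, 0])

# Round to one decimal place; the masked entry stays masked.
rounded = x.round(1)
```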
searchsorted(v, side='left', sorter=None)

Find indices where elements of v should be inserted in a to maintain order.

For full documentation, see numpy.searchsorted

See also

numpy.searchsorted()
equivalent function
setfield(val, dtype, offset=0)

Put a value into a specified place in a field defined by a data-type.

Place val into a’s field defined by dtype and beginning offset bytes into the field.

Parameters:
  • val (object) – Value to be placed in field.
  • dtype (dtype object) – Data-type of the field in which to place val.
  • offset (int, optional) – The number of bytes into the field at which to place val.
Returns:

Return type:

None

See also

getfield()

Examples

>>> x = np.eye(3)
>>> x.getfield(np.float64)
array([[1.,  0.,  0.],
       [0.,  1.,  0.],
       [0.,  0.,  1.]])
>>> x.setfield(3, np.int32)
>>> x.getfield(np.int32)
array([[3, 3, 3],
       [3, 3, 3],
       [3, 3, 3]], dtype=int32)
>>> x
array([[1.0e+000, 1.5e-323, 1.5e-323],
       [1.5e-323, 1.0e+000, 1.5e-323],
       [1.5e-323, 1.5e-323, 1.0e+000]])
>>> x.setfield(np.eye(3), np.int32)
>>> x
array([[1.,  0.,  0.],
       [0.,  1.,  0.],
       [0.,  0.,  1.]])
setflags(write=None, align=None, uic=None)

Set array flags WRITEABLE, ALIGNED, (WRITEBACKIFCOPY and UPDATEIFCOPY), respectively.

These Boolean-valued flags affect how numpy interprets the memory area used by a (see Notes below). The ALIGNED flag can only be set to True if the data is actually aligned according to the type. The WRITEBACKIFCOPY and (deprecated) UPDATEIFCOPY flags can never be set to True. The flag WRITEABLE can only be set to True if the array owns its own memory, or the ultimate owner of the memory exposes a writeable buffer interface, or is a string. (The exception for string is made so that unpickling can be done without copying memory.)

Parameters:
  • write (bool, optional) – Describes whether or not a can be written to.
  • align (bool, optional) – Describes whether or not a is aligned properly for its type.
  • uic (bool, optional) – Describes whether or not a is a copy of another “base” array.

Notes

Array flags provide information about how the memory area used for the array is to be interpreted. There are 7 Boolean flags in use, only four of which can be changed by the user: WRITEBACKIFCOPY, UPDATEIFCOPY, WRITEABLE, and ALIGNED.

WRITEABLE (W) the data area can be written to;

ALIGNED (A) the data and strides are aligned appropriately for the hardware (as determined by the compiler);

UPDATEIFCOPY (U) (deprecated), replaced by WRITEBACKIFCOPY;

WRITEBACKIFCOPY (X) this array is a copy of some other array (referenced by .base). When the C-API function PyArray_ResolveWritebackIfCopy is called, the base array will be updated with the contents of this array.

All flags can be accessed using the single (upper case) letter as well as the full name.

Examples

>>> y = np.array([[3, 1, 7],
...               [2, 0, 0],
...               [8, 5, 9]])
>>> y
array([[3, 1, 7],
       [2, 0, 0],
       [8, 5, 9]])
>>> y.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
>>> y.setflags(write=0, align=0)
>>> y.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : False
  ALIGNED : False
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
>>> y.setflags(uic=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot set WRITEBACKIFCOPY flag to True
shape

Tuple of array dimensions.

The shape property is usually used to get the current shape of an array, but may also be used to reshape the array in-place by assigning a tuple of array dimensions to it. As with numpy.reshape, one of the new shape dimensions can be -1, in which case its value is inferred from the size of the array and the remaining dimensions. Reshaping an array in-place will fail if a copy is required.

Examples

>>> x = np.array([1, 2, 3, 4])
>>> x.shape
(4,)
>>> y = np.zeros((2, 3, 4))
>>> y.shape
(2, 3, 4)
>>> y.shape = (3, 8)
>>> y
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])
>>> y.shape = (3, 6)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: total size of new array must be unchanged
>>> np.zeros((4,2))[::2].shape = (-1,)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: Incompatible shape for in-place modification. Use
`.reshape()` to make a copy with the desired shape.

See also

numpy.reshape
similar function
ndarray.reshape
similar method
sharedmask

Share status of the mask (read-only).

shrink_mask()

Reduce a mask to nomask when possible.

Parameters:None
Returns:
Return type:None

Examples

>>> x = np.ma.array([[1, 2], [3, 4]], mask=[0]*4)
>>> x.mask
array([[False, False],
       [False, False]])
>>> x.shrink_mask()
masked_array(
  data=[[1, 2],
        [3, 4]],
  mask=False,
  fill_value=999999)
>>> x.mask
False
size

Number of elements in the array.

Equal to np.prod(a.shape), i.e., the product of the array’s dimensions.

Notes

a.size returns a standard arbitrary precision Python integer. This may not be the case with other methods of obtaining the same value (like the suggested np.prod(a.shape), which returns an instance of np.int_), and may be relevant if the value is used further in calculations that may overflow a fixed size integer type.

Examples

>>> x = np.zeros((3, 5, 2), dtype=np.complex128)
>>> x.size
30
>>> np.prod(x.shape)
30
soften_mask()

Force the mask to soft.

Whether the mask of a masked array is hard or soft is determined by its hardmask property. soften_mask sets hardmask to False.

See also

hardmask

sort(axis=-1, kind=None, order=None, endwith=True, fill_value=None)

Sort the array in place.

Parameters:
  • a (array_like) – Array to be sorted.
  • axis (int, optional) – Axis along which to sort. If None, the array is flattened before sorting. The default is -1, which sorts along the last axis.
  • kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, optional) – The sorting algorithm used.
  • order (list, optional) – When a is a structured array, this argument specifies which fields to compare first, second, and so on. This list does not need to include all of the fields.
  • endwith ({True, False}, optional) – Whether missing values (if any) should be treated as the largest values (True) or the smallest values (False). When the array contains unmasked values sorting at the same extremes of the datatype, the ordering of these values and the masked values is undefined.
  • fill_value ({var}, optional) – Value used internally for the masked values. If fill_value is not None, it supersedes endwith.
Returns:

sorted_array – Array of the same type and shape as a.

Return type:

ndarray

See also

numpy.ndarray.sort()
Method to sort an array in-place.
argsort()
Indirect sort.
lexsort()
Indirect stable sort on multiple keys.
searchsorted()
Find elements in a sorted array.

Notes

See numpy.sort for notes on the different sorting algorithms.

Examples

>>> a = np.ma.array([1, 2, 5, 4, 3],mask=[0, 1, 0, 1, 0])
>>> # Default
>>> a.sort()
>>> a
masked_array(data=[1, 3, 5, --, --],
             mask=[False, False, False,  True,  True],
       fill_value=999999)
>>> a = np.ma.array([1, 2, 5, 4, 3],mask=[0, 1, 0, 1, 0])
>>> # Put missing values in the front
>>> a.sort(endwith=False)
>>> a
masked_array(data=[--, --, 1, 3, 5],
             mask=[ True,  True, False, False, False],
       fill_value=999999)
>>> a = np.ma.array([1, 2, 5, 4, 3],mask=[0, 1, 0, 1, 0])
>>> # fill_value takes over endwith
>>> a.sort(endwith=False, fill_value=3)
>>> a
masked_array(data=[1, --, --, 3, 5],
             mask=[False,  True,  True, False, False],
       fill_value=999999)
squeeze(axis=None)

Remove single-dimensional entries from the shape of a.

Refer to numpy.squeeze for full documentation.

See also

numpy.squeeze()
equivalent function
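As a short sketch, squeezing removes length-one axes from both the data and the mask. The example uses a plain NumPy masked array of shape (1, 3, 1); names are illustrative only.

```python
import numpy as np

x = np.ma.array([[[1], [2], [3]]],
                mask=[[[0], [1], [0]]])   # shape (1, 3, 1)

squeezed_all = x.squeeze()          # drop every length-one axis
squeezed_first = x.squeeze(axis=0)  # drop only the first axis
```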
std(axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>)

Returns the standard deviation of the array elements along given axis.

Masked entries are ignored.

Refer to numpy.std for full documentation.

See also

numpy.ndarray.std()
corresponding function for ndarrays
numpy.std()
Equivalent function
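The sketch below illustrates that masked entries do not contribute to the standard deviation, and how ddof changes the divisor. It uses a plain NumPy masked array, and the names are illustrative only.

```python
import numpy as np

# The outlier 100.0 is masked, so only 1, 2, 3 and 4 are used.
x = np.ma.array([1.0, 2.0, 3.0, 4.0, 100.0], mask=[0, 0, 0, 0, 1])

population_std = x.std()        # divides by the unmasked count, 4
sample_std = x.std(ddof=1)      # divides by the unmasked count minus 1
```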
strides

Tuple of bytes to step in each dimension when traversing an array.

The byte offset of element (i[0], i[1], ..., i[n]) in an array a is:

offset = sum(np.array(i) * a.strides)

A more detailed explanation of strides can be found in the “ndarray.rst” file in the NumPy reference guide.

Notes

Imagine an array of 32-bit integers (each 4 bytes):

x = np.array([[0, 1, 2, 3, 4],
              [5, 6, 7, 8, 9]], dtype=np.int32)

This array is stored in memory as 40 bytes, one after the other (known as a contiguous block of memory). The strides of an array tell us how many bytes we have to skip in memory to move to the next position along a certain axis. For example, we have to skip 4 bytes (1 value) to move to the next column, but 20 bytes (5 values) to get to the same position in the next row. As such, the strides for the array x will be (20, 4).

See also

numpy.lib.stride_tricks.as_strided

Examples

>>> y = np.reshape(np.arange(2*3*4), (2,3,4))
>>> y
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],
       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
>>> y.strides
(48, 16, 4)
>>> y[1,1,1]
17
>>> offset=sum(y.strides * np.array((1,1,1)))
>>> offset//y.itemsize
17
>>> x = np.reshape(np.arange(5*6*7*8), (5,6,7,8)).transpose(2,3,1,0)
>>> x.strides
(32, 4, 224, 1344)
>>> i = np.array([3,5,2,2])
>>> offset = sum(i * x.strides)
>>> x[3,5,2,2]
813
>>> offset // x.itemsize
813
sum(axis=None, dtype=None, out=None, keepdims=<no value>)

Return the sum of the array elements over the given axis.

Masked elements are set to 0 internally.

Refer to numpy.sum for full documentation.

See also

numpy.ndarray.sum()
corresponding function for ndarrays
numpy.sum()
equivalent function

Examples

>>> x = np.ma.array([[1,2,3],[4,5,6],[7,8,9]], mask=[0] + [1,0]*4)
>>> x
masked_array(
  data=[[1, --, 3],
        [--, 5, --],
        [7, --, 9]],
  mask=[[False,  True, False],
        [ True, False,  True],
        [False,  True, False]],
  fill_value=999999)
>>> x.sum()
25
>>> x.sum(axis=1)
masked_array(data=[4, 5, 16],
             mask=[False, False, False],
       fill_value=999999)
>>> x.sum(axis=0)
masked_array(data=[8, 5, 12],
             mask=[False, False, False],
       fill_value=999999)
>>> print(type(x.sum(axis=0, dtype=np.int64)[0]))
<class 'numpy.int64'>
swapaxes(axis1, axis2)

Return a view of the array with axis1 and axis2 interchanged.

Refer to numpy.swapaxes for full documentation.

See also

numpy.swapaxes()
equivalent function
take(indices, axis=None, out=None, mode='raise')
tobytes(fill_value=None, order='C')

Return the array data as a string containing the raw bytes in the array.

The array is filled with a fill value before the string conversion.

New in version 1.9.0.

Parameters:
  • fill_value (scalar, optional) – Value used to fill in the masked values. Default is None, in which case MaskedArray.fill_value is used.
  • order ({'C','F','A'}, optional) –

    Order of the data item in the copy. Default is ‘C’.

    • ‘C’ – C order (row major).
    • ‘F’ – Fortran order (column major).
    • ‘A’ – Any, current order of array.
    • None – Same as ‘A’.

See also

numpy.ndarray.tobytes(), tolist(), tofile()

Notes

As for ndarray.tobytes, information about the shape, dtype, etc., but also about fill_value, will be lost.

Examples

>>> x = np.ma.array(np.array([[1, 2], [3, 4]]), mask=[[0, 1], [1, 0]])
>>> x.tobytes()
b'\x01\x00\x00\x00\x00\x00\x00\x00?B\x0f\x00\x00\x00\x00\x00?B\x0f\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00'
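
To illustrate the note on lost information, this sketch serialises a masked array with an explicit fill value and rebuilds a plain array from the bytes. The fill value -1 and the explicit int64 dtype are choices made for this example only.

```python
import numpy as np

x = np.ma.array(np.array([[1, 2], [3, 4]], dtype=np.int64),
                mask=[[0, 1], [1, 0]])

# Masked entries are replaced by the fill value before serialisation
raw = x.tobytes(fill_value=-1)

# Rebuilding the array requires dtype and shape to be supplied again;
# the mask is gone and only the fill value marks formerly masked entries
restored = np.frombuffer(raw, dtype=np.int64).reshape(2, 2)
```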
tofile(fid, sep='', format='%s')

Save a masked array to a file in binary format.

Warning

This function is not implemented yet.

Raises:NotImplementedError – When tofile is called.
toflex()

Transforms a masked array into a flexible-type array.

The flexible type array that is returned will have two fields:

  • the _data field stores the _data part of the array.
  • the _mask field stores the _mask part of the array.
Parameters:None
Returns:record – A new flexible-type ndarray with two fields: the first element containing a value, the second element containing the corresponding mask boolean. The returned record shape matches self.shape.
Return type:ndarray

Notes

A side-effect of transforming a masked array into a flexible ndarray is that meta information (fill_value, …) will be lost.

Examples

>>> x = np.ma.array([[1,2,3],[4,5,6],[7,8,9]], mask=[0] + [1,0]*4)
>>> x
masked_array(
  data=[[1, --, 3],
        [--, 5, --],
        [7, --, 9]],
  mask=[[False,  True, False],
        [ True, False,  True],
        [False,  True, False]],
  fill_value=999999)
>>> x.toflex()
array([[(1, False), (2,  True), (3, False)],
       [(4,  True), (5, False), (6,  True)],
       [(7, False), (8,  True), (9, False)]],
      dtype=[('_data', '<i8'), ('_mask', '?')])
tolist(fill_value=None)

Return the data portion of the masked array as a hierarchical Python list.

Data items are converted to the nearest compatible Python type. Masked values are converted to fill_value. If fill_value is None, the corresponding entries in the output list will be None.

Parameters:fill_value (scalar, optional) – The value to use for invalid entries. Default is None.
Returns:result – The Python list representation of the masked array.
Return type:list

Examples

>>> x = np.ma.array([[1,2,3], [4,5,6], [7,8,9]], mask=[0] + [1,0]*4)
>>> x.tolist()
[[1, None, 3], [None, 5, None], [7, None, 9]]
>>> x.tolist(-999)
[[1, -999, 3], [-999, 5, -999], [7, -999, 9]]
torecords()

Transforms a masked array into a flexible-type array.

The flexible type array that is returned will have two fields:

  • the _data field stores the _data part of the array.
  • the _mask field stores the _mask part of the array.
Parameters:None
Returns:record – A new flexible-type ndarray with two fields: the first element containing a value, the second element containing the corresponding mask boolean. The returned record shape matches self.shape.
Return type:ndarray

Notes

A side-effect of transforming a masked array into a flexible ndarray is that meta information (fill_value, …) will be lost.

Examples

>>> x = np.ma.array([[1,2,3],[4,5,6],[7,8,9]], mask=[0] + [1,0]*4)
>>> x
masked_array(
  data=[[1, --, 3],
        [--, 5, --],
        [7, --, 9]],
  mask=[[False,  True, False],
        [ True, False,  True],
        [False,  True, False]],
  fill_value=999999)
>>> x.toflex()
array([[(1, False), (2,  True), (3, False)],
       [(4,  True), (5, False), (6,  True)],
       [(7, False), (8,  True), (9, False)]],
      dtype=[('_data', '<i8'), ('_mask', '?')])
tostring(fill_value=None, order='C')

A compatibility alias for tobytes, with exactly the same behavior.

Despite its name, it returns bytes not strs.

Deprecated since version 1.19.0.

trace(offset=0, axis1=0, axis2=1, dtype=None, out=None)

Return the sum along diagonals of the array.

Refer to numpy.trace for full documentation.

See also

numpy.trace()
equivalent function
transpose(*axes)

Returns a view of the array with axes transposed.

For a 1-D array this has no effect, as a transposed vector is simply the same vector. To convert a 1-D array into a 2-D column vector, an additional dimension must be added. np.atleast_2d(a).T achieves this, as does a[:, np.newaxis]. For a 2-D array, this is a standard matrix transpose. For an n-D array, if axes are given, their order indicates how the axes are permuted (see Examples). If axes are not provided and a.shape = (i[0], i[1], ... i[n-2], i[n-1]), then a.transpose().shape = (i[n-1], i[n-2], ... i[1], i[0]).

Parameters:axes (None, tuple of ints, or n ints) –
  • None or no argument: reverses the order of the axes.
  • tuple of ints: i in the j-th place in the tuple means a’s i-th axis becomes a.transpose()’s j-th axis.
  • n ints: same as an n-tuple of the same ints (this form is intended simply as a “convenience” alternative to the tuple form)
Returns:out – View of a, with axes suitably permuted.
Return type:ndarray

See also

ndarray.T()
Array property returning the array transposed.
ndarray.reshape()
Give a new shape to an array without changing its data.

Examples

>>> a = np.array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
       [3, 4]])
>>> a.transpose()
array([[1, 3],
       [2, 4]])
>>> a.transpose((1, 0))
array([[1, 3],
       [2, 4]])
>>> a.transpose(1, 0)
array([[1, 3],
       [2, 4]])
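
The 1-D and n-D cases described in the text are not covered by the examples above. A short sketch with arbitrary example arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
unchanged = a.transpose()          # still shape (3,)
column_a = np.atleast_2d(a).T      # shape (3, 1)
column_b = a[:, np.newaxis]        # shape (3, 1)

b = np.zeros((2, 3, 4))
reversed_axes = b.transpose()      # no axes given: order is reversed
permuted = b.transpose(1, 0, 2)    # swap only the first two axes
```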
unshare_mask()

Copy the mask and set the sharedmask flag to False.

Whether the mask is shared between masked arrays can be seen from the sharedmask property. unshare_mask ensures the mask is not shared. A copy of the mask is only made if it was shared.

See also

sharedmask()
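
A minimal sketch of the effect, assuming that slicing a masked array produces a view whose mask is shared with the original:

```python
import numpy as np

x = np.ma.array([1, 2, 3], mask=[False, True, False])
y = x[:2]                     # slicing yields a view that shares the mask

shared_before = y.sharedmask
y.unshare_mask()              # copies the mask because it was shared
shared_after = y.sharedmask

y[0] = np.ma.masked           # only affects the private copy of the mask
```

After unshare_mask(), masking an element of y leaves the mask of x untouched.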

var(axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>)

Compute the variance along the specified axis.

Returns the variance of the array elements, a measure of the spread of a distribution. The variance is computed for the flattened array by default, otherwise over the specified axis.

Parameters:
  • a (array_like) – Array containing numbers whose variance is desired. If a is not an array, a conversion is attempted.
  • axis (None or int or tuple of ints, optional) –

    Axis or axes along which the variance is computed. The default is to compute the variance of the flattened array.

    New in version 1.7.0.

    If this is a tuple of ints, a variance is performed over multiple axes, instead of a single axis or all the axes as before.

  • dtype (data-type, optional) – Type to use in computing the variance. For arrays of integer type the default is float64; for arrays of float types it is the same as the array type.
  • out (ndarray, optional) – Alternate output array in which to place the result. It must have the same shape as the expected output, but the type is cast if necessary.
  • ddof (int, optional) – “Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default ddof is zero.
  • keepdims (bool, optional) –

    If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.

    If the default value is passed, then keepdims will not be passed through to the var method of sub-classes of ndarray, however any non-default value will be. If the sub-class’ method does not implement keepdims any exceptions will be raised.

Returns:

variance – If out=None, returns a new array containing the variance; otherwise, a reference to the output array is returned.

Return type:

ndarray, see dtype parameter above

See also

std(), mean(), nanmean(), nanstd(), nanvar(), ufuncs-output-type()

Notes

The variance is the average of the squared deviations from the mean, i.e., var = mean(abs(x - x.mean())**2).

The mean is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables.

Note that for complex numbers, the absolute value is taken before squaring, so that the result is always real and nonnegative.

For floating-point input, the variance is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-accuracy accumulator using the dtype keyword can alleviate this issue.

Examples

>>> a = np.array([[1, 2], [3, 4]])
>>> np.var(a)
1.25
>>> np.var(a, axis=0)
array([1.,  1.])
>>> np.var(a, axis=1)
array([0.25,  0.25])

In single precision, var() can be inaccurate:

>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> np.var(a)
0.20250003

Computing the variance in float64 is more accurate:

>>> np.var(a, dtype=np.float64)
0.20249999932944759 # may vary
>>> ((1-0.55)**2 + (0.1-0.55)**2)/2
0.2025
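
The examples above use plain arrays. For a masked array, masked entries are excluded from both the mean and the element count N, which the following sketch verifies by hand on made-up values:

```python
import numpy as np

# The last value is masked and therefore ignored
x = np.ma.array([1.0, 2.0, 3.0, 4.0, 100.0], mask=[0, 0, 0, 0, 1])

n = int(x.count())                                 # unmasked elements only
squared_dev = float(((x - x.mean()) ** 2).sum())   # sum of squared deviations

population_var = float(x.var())         # divisor N
sample_var = float(x.var(ddof=1))       # divisor N - ddof
```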
view(dtype=None, type=None, fill_value=None)

Return a view of the MaskedArray data.

Parameters:
  • dtype (data-type or ndarray sub-class, optional) – Data-type descriptor of the returned view, e.g., float32 or int16. The default, None, results in the view having the same data-type as a. As with ndarray.view, dtype can also be specified as an ndarray sub-class, which then specifies the type of the returned object (this is equivalent to setting the type parameter).
  • type (Python type, optional) – Type of the returned view, either ndarray or a subclass. The default None results in type preservation.
  • fill_value (scalar, optional) – The value to use for invalid entries (None by default). If None, then this argument is inferred from the passed dtype, or in its absence the original array, as discussed in the notes below.

See also

numpy.ndarray.view()
Equivalent method on ndarray object.

Notes

a.view() is used in two different ways:

a.view(some_dtype) or a.view(dtype=some_dtype) constructs a view of the array’s memory with a different data-type. This can cause a reinterpretation of the bytes of memory.

a.view(ndarray_subclass) or a.view(type=ndarray_subclass) just returns an instance of ndarray_subclass that looks at the same array (same shape, dtype, etc.) This does not cause a reinterpretation of the memory.

If fill_value is not specified, but dtype is specified (and is not an ndarray sub-class), the fill_value of the MaskedArray will be reset. If neither fill_value nor dtype are specified (or if dtype is an ndarray sub-class), then the fill value is preserved. Finally, if fill_value is specified, but dtype is not, the fill value is set to the specified value.

For a.view(some_dtype), if some_dtype has a different number of bytes per entry than the previous dtype (for example, converting a regular array to a structured array), then the behavior of the view cannot be predicted just from the superficial appearance of a (shown by print(a)). It also depends on exactly how a is stored in memory. Therefore if a is C-ordered versus fortran-ordered, versus defined as a slice or transpose, etc., the view may give different results.
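
The fill value rules above can be checked with a small sketch. The array contents and the fill values -1 and 7 are arbitrary choices for this example.

```python
import numpy as np

a = np.ma.array([1, 2, 3], mask=[0, 1, 0], dtype=np.int32, fill_value=-1)

# type= changes only the Python class; the memory is shared, not reinterpreted
plain = a.view(type=np.ndarray)

# No dtype given: the fill value is preserved
same_fill = a.view().fill_value

# New dtype, no fill_value: the fill value is reset to that dtype's default
reset_fill = a.view(np.float32).fill_value

# Explicit fill_value: it is used for the view
explicit_fill = a.view(fill_value=7).fill_value
```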

class fanc.matrix.RegionMatrixContainer

Bases: fanc.matrix.RegionPairsContainer, fanc.regions.RegionBasedWithBins

Class representing matrices where pixels correspond to genomic region pairs.

This is the common interface for all matrix-based classes, such as Hic or FoldChangeMatrix. It provides access to specialised matrix methods, most importantly matrix(), which assembles numpy arrays from the list of pairwise contacts stored in each object.

It inherits all region methods from RegionBased, and all edge/contact methods from RegionPairsContainer. You can use the same type of keys for matrix() that you would use for edges(), and additionally have the option to retrieve the observed/expected matrix.

import fanc
import numpy as np
hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")

# get the whole-genome matrix
m = hic.matrix()
type(m)  # fanc.matrix.RegionMatrix
isinstance(m, np.ndarray)  # True
m.shape  # 139, 139

# get just the chromosome 18 intra-chromosomal matrix
m = hic.matrix(('chr18', 'chr18'))
m.shape  # 79, 79

# get all rows of the whole-genome matrix
# corresponding to chromosome 18
m = hic.matrix('chr18')
m.shape  # 79, 139

# get unnormalised chromosome 18 matrix
m = hic.matrix(('chr18', 'chr18'), norm=False)

# get chromosome 18 O/E matrix
m = hic.matrix(('chr18', 'chr18'), oe=True)

# get log2-transformed chromosome 18 O/E matrix
m = hic.matrix(('chr18', 'chr18'), oe=True, log=True)
add_contact(contact, *args, **kwargs)

Alias for add_edge()

Parameters:
  • contact – Edge
  • args – Positional arguments passed to _add_edge()
  • kwargs – Keyword arguments passed to _add_edge()
add_contacts(contacts, *args, **kwargs)

Alias for add_edges()

add_edge(edge, check_nodes_exist=True, *args, **kwargs)

Add an edge / contact between two regions to this object.

Parameters:
  • edge – Edge, dict with at least the attributes source and sink, optionally weight, or a list of length 2 (source, sink) or 3 (source, sink, weight).
  • check_nodes_exist – Make sure that there are nodes that match source and sink indexes
  • args – Positional arguments passed to _add_edge()
  • kwargs – Keyword arguments passed to _add_edge()
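
The accepted input forms can be summarised in a small sketch. normalise_edge is a hypothetical helper written for this illustration and is not part of FAN-C; using None for a missing weight is an assumption.

```python
def normalise_edge(edge):
    """Hypothetical helper: reduce the supported inputs to (source, sink, weight)."""
    if isinstance(edge, dict):
        return edge["source"], edge["sink"], edge.get("weight")
    if isinstance(edge, (list, tuple)):
        if len(edge) == 2:
            source, sink = edge
            return source, sink, None
        if len(edge) == 3:
            return tuple(edge)
        raise ValueError("expected (source, sink) or (source, sink, weight)")
    # Anything else is treated as an Edge-like object with attributes
    return edge.source, edge.sink, getattr(edge, "weight", None)


from_dict = normalise_edge({"source": 5, "sink": 76, "weight": 94.0})
from_pair = normalise_edge([24, 28])
from_triple = normalise_edge((42, 42, 2120.0))
```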
add_edge_from_dict(edge, *args, **kwargs)

Direct method to add an edge from dict input.

Parameters:edge – dict with at least the keys “source” and “sink”. Additional keys will be loaded as edge attributes
add_edge_from_edge(edge, *args, **kwargs)

Direct method to add an edge from Edge input.

Parameters:edge – Edge
add_edge_from_list(edge, *args, **kwargs)

Direct method to add an edge from list or tuple input.

Parameters:edge – List or tuple. Should be of length 2 (source, sink) or 3 (source, sink, weight)
add_edge_simple(source, sink, weight=None, *args, **kwargs)

Direct method to add an edge from source and sink indexes and an optional weight.

Parameters:
  • source – Source region index
  • sink – Sink region index
  • weight – Weight of the edge
add_edges(edges, *args, **kwargs)

Bulk-add edges from a list.

List items can be any of the supported edge types, list, tuple, dict, or Edge. Repeatedly calls add_edge(), so may be inefficient for large amounts of data.

Parameters:edges – List (or iterator) of edges. See add_edge() for details
add_region(region, *args, **kwargs)

Add a genomic region to this object.

This method offers some flexibility in the types of objects that can be loaded. See parameters for details.

Parameters:region – Can be a GenomicRegion, a str in the form ‘<chromosome>:<start>-<end>[:<strand>]’, a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
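
To make the string form concrete, here is a minimal sketch of a parser for it. parse_region is a hypothetical function, not FAN-C's own parser, and it only handles plain integer coordinates.

```python
def parse_region(text):
    """Hypothetical parser for '<chromosome>:<start>-<end>[:<strand>]'."""
    fields = text.split(":")
    if len(fields) not in (2, 3):
        raise ValueError(f"cannot parse region string: {text!r}")
    chromosome = fields[0]
    start, end = (int(value) for value in fields[1].split("-"))
    strand = fields[2] if len(fields) == 3 else None
    return {"chromosome": chromosome, "start": start, "end": end, "strand": strand}


unstranded = parse_region("chr18:1000001-2000000")
stranded = parse_region("chr19:10-20:+")
```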
static bin_intervals(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into a fixed number of bins.

Parameters:
  • intervals – iterator of tuples (start, end, score)
  • bins – Number of bins to divide the region into
  • interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.
  • smoothing_window – Size of window (in bins) to smooth scores over
  • nan_replacement – NaN values in the scores will be replaced with this value
  • zero_to_nan – If True, will convert bins with score 0 to NaN
Returns:

iterator of tuples: (start, end, score)

static bin_intervals_equidistant(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into bins with a fixed size.

Parameters:
  • intervals – iterator of tuples (start, end, score)
  • bin_size – Size of each bin in base pairs
  • interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.
  • smoothing_window – Size of window (in bins) to smooth scores over
  • nan_replacement – NaN values in the scores will be replaced with this value
  • zero_to_nan – If True, will convert bins with score 0 to NaN
Returns:

iterator of tuples: (start, end, score)

bin_size

Return the length of the first region in the dataset.

Assumes all bins have equal size.

Returns:int
binned_regions(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)

Same as region_intervals, but returns GenomicRegion objects instead of tuples.

Parameters:
  • region – String or GenomicRegion object denoting the region to be binned
  • bins – Number of bins to divide the region into
  • bin_size – Size of each bin (alternative to bins argument)
  • smoothing_window – Size of window (in bins) to smooth scores over
  • nan_replacement – NaN values in the scores will be replaced with this value
  • zero_to_nan – If True, will convert bins with score 0 to NaN
  • args – Arguments passed to _region_intervals
  • kwargs – Keyword arguments passed to _region_intervals
Returns:

iterator of GenomicRegion objects

bins_to_distance(bins)

Convert fraction of bins to base pairs

Parameters:bins – float, fraction of bins
Returns:int, base pairs
chromosome_bins

Returns a dictionary of chromosomes and the start and end index of the bins they cover.

The returned [start, end] values are range-compatible, i.e. chromosome bins [0, 5] cover the bins with index 0, 1, 2, 3, and 4, but not 5.
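
The range-compatible convention can be illustrated with a sketch that derives such a dictionary from an ordered list of chromosome names, one per bin. chromosome_bins_sketch is a hypothetical function; the bin counts (79 for chr18, 139 in total) follow the example matrix shapes used elsewhere in this documentation.

```python
def chromosome_bins_sketch(bin_chromosomes):
    """Hypothetical helper: map each chromosome to [first bin, last bin + 1]."""
    bins = {}
    for ix, chromosome in enumerate(bin_chromosomes):
        if chromosome not in bins:
            bins[chromosome] = [ix, ix + 1]
        else:
            bins[chromosome][1] = ix + 1
    return bins


bins = chromosome_bins_sketch(["chr18"] * 79 + ["chr19"] * 60)

# The end index is exclusive, so it can be passed directly to range()
chr18_indexes = list(range(*bins["chr18"]))
```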

chromosome_lengths

Returns a dictionary of chromosomes and their length in bp.

chromosomes()

Get a list of chromosome names.

distance_to_bins(distance)

Convert base pairs to fraction of bins.

Parameters:distance – distance in base pairs
Returns:float, distance as fraction of bin size
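
Both conversions amount to scaling by the bin size. The following standalone functions are a sketch that takes bin_size as an explicit argument, which is an assumption made to keep the example self-contained; the methods documented here use the bin size of the object.

```python
def distance_to_bins_sketch(distance, bin_size):
    """Base pairs to (fractional) number of bins."""
    return distance / bin_size


def bins_to_distance_sketch(bins, bin_size):
    """(Fractional) number of bins to base pairs, as an int."""
    return int(bins * bin_size)


n_bins = distance_to_bins_sketch(2_500_000, 1_000_000)
n_bp = bins_to_distance_sketch(n_bins, 1_000_000)
```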
edge_data(attribute, *args, **kwargs)

Iterate over specific edge attribute.

Parameters:
  • attribute – Name of the attribute, e.g. “weight”
  • args – Positional arguments passed to edges()
  • kwargs – Keyword arguments passed to edges()
Returns:

iterator over edge attribute

edge_subset(key=None, *args, **kwargs)

Get a subset of edges.

This is an alias for edges().

Returns:generator (Edge)
edges

Iterate over contacts / edges.

edges() is the central function of RegionPairsContainer. Here, we will use the Hic implementation for demonstration purposes, but the usage is exactly the same for all compatible objects implementing RegionPairsContainer, including JuicerHic and CoolerHic.

import fanc

# file from FAN-C examples
hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")

We can easily find the number of edges in the sample Hic object:

len(hic.edges)  # 8695

When used in an iterator context, edges() iterates over all edges in the RegionPairsContainer:

for edge in hic.edges:
    # do something with edge
    print(edge)
    # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
    # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
    # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
    # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
    # ...

Calling edges() as a method has the same effect:

# note the '()'
for edge in hic.edges():
    # do something with edge
    print(edge)
    # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
    # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
    # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
    # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
    # ...

Rather than iterate over all edges in the object, we can select only a subset. If the key is a string or a GenomicRegion, all non-zero edges connecting the region described by the key to any other region are returned. If the key is a tuple of strings or GenomicRegion, only edges between the two regions are returned.

# select all edges between chromosome 19
# and any other region:
for edge in hic.edges("chr19"):
    print(edge)
    # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
    # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
    # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
    # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
    # ...

# select all edges that are only on
# chromosome 19
for edge in hic.edges(('chr19', 'chr19')):
    print(edge)
    # 90--116; bias: 0.00021173151730025176; sink_node: chr19:37000001-38000000; source_node: chr19:11000001-12000000; weight: 0.009104455243910825
    # 135--135; bias: 0.00018003890596887822; sink_node: chr19:56000001-57000000; source_node: chr19:56000001-57000000; weight: 0.10028167062466517
    # 123--123; bias: 0.00011063368998965993; sink_node: chr19:44000001-45000000; source_node: chr19:44000001-45000000; weight: 0.1386240135570439
    # 92--93; bias: 0.00040851066434864896; sink_node: chr19:14000001-15000000; source_node: chr19:13000001-14000000; weight: 0.10090213409411629
    # ...

# select inter-chromosomal edges
# between chromosomes 18 and 19
for edge in hic.edges(('chr18', 'chr19')):
    print(edge)
    # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
    # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
    # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
    # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
    # ...

By default, edges() will retrieve all edge attributes, which can be slow when iterating over a lot of edges. This is why all file-based FAN-C RegionPairsContainer objects support lazy loading, where attributes are only read on demand.

for edge in hic.edges('chr18', lazy=True):
    print(edge.source, edge.sink, edge.weight, edge)
    # 42 42 0.12291311562018173 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #0>
    # 24 28 0.025205961072838057 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #1>
    # 5 76 0.00961709840049876 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #2>
    # 66 68 0.03876763316345468 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #3>
    # ...

Warning

The lazy iterator reuses the same LazyEdge object in every iteration and overwrites its attributes. Do not use lazy iterators if you need to store edge objects for later access. For example, list(hic.edges()) works as expected, with all Edge objects stored in the list, whereas list(hic.edges(lazy=True)) results in a list of identical LazyEdge objects. Always do all edge processing inside the loop when working with lazy iterators!
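
The pitfall can be reproduced without any FAN-C objects. In this sketch, ReusedEdge and lazy_edges are hypothetical stand-ins that mimic an iterator yielding one reused object; the rows reuse the raw weights from the examples in this section.

```python
class ReusedEdge:
    """Hypothetical stand-in for a lazily loaded edge."""
    def __init__(self):
        self.source = self.sink = self.weight = None


def lazy_edges(rows):
    edge = ReusedEdge()              # one object for the whole iteration
    for source, sink, weight in rows:
        edge.source, edge.sink, edge.weight = source, sink, weight
        yield edge


rows = [(42, 42, 2120.0), (24, 28, 388.0), (5, 76, 94.0)]

# Pitfall: every list item is the same object, holding the last row
stored = list(lazy_edges(rows))
stored_weights = [edge.weight for edge in stored]

# Safe: extract what is needed inside the loop
copied_weights = [edge.weight for edge in lazy_edges(rows)]
```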

When working with normalised contact frequencies, such as obtained through matrix balancing in the example above, edges() automatically returns normalised edge weights. In addition, the bias attribute will (typically) have a value different from 1.

When you are interested in the raw contact frequency, use the norm=False parameter:

for edge in hic.edges('chr18', lazy=True, norm=False):
    print(edge.source, edge.sink, edge.weight)
    # 42 42 2120.0
    # 24 28 388.0
    # 5 76 94.0
    # 66 68 470.0
    # ...

You can also choose to omit all intra- or inter-chromosomal edges using intra_chromosomal=False or inter_chromosomal=False, respectively.

Returns:Iterator over Edge or equivalent.
edges_dict(*args, **kwargs)

Edges iterator with access by bracket notation.

This iterator always returns unnormalised edges.

Returns:dict or dict-like iterator
expected_values(selected_chromosome=None, norm=True, *args, **kwargs)

Calculate the expected values for genomic contacts at all distances.

This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.

It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for inter-chromosomal expected value.

Parameters:
  • selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
  • norm – If False, will calculate the expected values on the unnormalised matrix.
  • args – Not used in this context
  • kwargs – Not used in this context
Returns:

list of intra-chromosomal expected values, dict of intra-chromosomal expected values by chromosome, inter-chromosomal expected value
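
As an illustration of the averaging described above, the following sketch computes intra-chromosomal expected values from a small dense matrix for a single chromosome. expected_by_distance is a hypothetical function and not FAN-C's implementation; returning 0.0 for distances without any mappable pair is an assumption.

```python
import numpy as np


def expected_by_distance(matrix, mappable):
    """Average contact weight per bin separation, ignoring unmappable bins."""
    mappable = np.asarray(mappable, dtype=bool)
    n = matrix.shape[0]
    expected = []
    for d in range(n):
        # a pair (i, i + d) only counts if both bins are mappable
        valid = mappable[:n - d] & mappable[d:]
        values = np.diagonal(matrix, offset=d)[valid]
        expected.append(float(values.mean()) if values.size else 0.0)
    return expected


m = np.array([[8.0, 4.0, 2.0],
              [4.0, 6.0, 100.0],
              [2.0, 100.0, 1.0]])
expected = expected_by_distance(m, [True, True, False])
```

The list index corresponds to the number of separating bins, so expected[1] is the average over all mappable neighbouring pairs.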

expected_values_and_marginals(selected_chromosome=None, norm=True, *args, **kwargs)

Calculate the expected values for genomic contacts at all distances and the whole matrix marginals.

This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.

It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for inter-chromosomal expected value.

Parameters:
  • selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
  • norm – If False, will calculate the expected values on the unnormalised matrix.
  • args – Not used in this context
  • kwargs – Not used in this context
Returns:

list of intra-chromosomal expected values, dict of intra-chromosomal expected values by chromosome, inter-chromosomal expected value

find_region(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)

Find the region that is at the center of a region.

Parameters:query_regions – Region selector string, GenomicRegion, or list of the former
Returns:index (or list of indexes) of the region at the center of the query region
intervals(*args, **kwargs)

Alias for region_intervals.

mappable(region=None)

Get the mappability of regions in this object.

A “mappable” region has at least one contact to another region in the genome.

Returns:array where True means mappable and False unmappable
marginals(masked=True, *args, **kwargs)

Get the marginals vector of this Hic matrix.

Sums up all contacts for each bin of the Hi-C matrix. Unmappable regions will be masked in the returned vector unless the masked parameter is set to False.

By default, corrected matrix entries are summed up. To get uncorrected matrix marginals use norm=False. Generally, all parameters accepted by edges() are supported.

Parameters:
  • masked – Use a numpy masked array to mask entries corresponding to unmappable regions
  • kwargs – Keyword arguments passed to edges()
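
Conceptually, the marginals are the row sums of the contact matrix. A sketch on a small dense matrix, assuming for simplicity that a bin counts as mappable whenever its row sum is non-zero:

```python
import numpy as np

m = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 0.0, 0.0]])

marginals = m.sum(axis=1)
mappable = marginals > 0          # simplifying assumption, see above

# Mask the entries of bins without any contacts
masked_marginals = np.ma.masked_array(marginals, mask=~mappable)
```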
matrix(key=None, log=False, default_value=None, mask=True, log_base=2, *args, **kwargs)

Assemble a RegionMatrix from region pairs.

Parameters:
  • key – Matrix selector. See edges() for all supported key types
  • log – If True, log-transform the matrix entries. Also see log_base
  • log_base – Base of the log transformation. Default: 2; only used when log=True
  • default_value – (optional) set the default value of matrix entries that have no associated edge/contact
  • mask – If False, do not mask unmappable regions
  • args – Positional arguments passed to regions_and_matrix_entries()
  • kwargs – Keyword arguments passed to regions_and_matrix_entries()
Returns:

RegionMatrix
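
The assembly step can be sketched with plain NumPy. assemble_matrix is a hypothetical function that fills a symmetric matrix from (i, j, weight) entries and masks unmappable rows and columns; it returns a regular masked array rather than a RegionMatrix and does not reproduce FAN-C's actual implementation.

```python
import numpy as np


def assemble_matrix(entries, n, default_value=0.0, mappable=None):
    """Hypothetical helper: dense symmetric matrix from sparse entries."""
    m = np.full((n, n), default_value, dtype=float)
    for i, j, weight in entries:
        m[i, j] = weight
        m[j, i] = weight
    mask = np.zeros((n, n), dtype=bool)
    if mappable is not None:
        unmappable = ~np.asarray(mappable, dtype=bool)
        mask[unmappable, :] = True
        mask[:, unmappable] = True
    return np.ma.masked_array(m, mask=mask)


entries = [(0, 0, 2.0), (0, 1, 1.0), (1, 2, 5.0)]
matrix = assemble_matrix(entries, 3, mappable=[True, True, False])
```

Entries without an associated contact keep the default value, and the contact involving the unmappable bin 2 is hidden by the mask.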

classmethod merge(pairs, *args, **kwargs)

Merge two or more RegionPairsContainer objects.

Parameters:
  • pairs – list of RegionPairsContainer
  • args – Positional arguments passed to constructor of this class
  • kwargs – Keyword arguments passed to constructor of this class
possible_contacts()

Calculate the possible number of contacts in the genome.

This calculates the number of potential region pairs in a genome for any possible separation distance, taking into account the existence of unmappable regions.

It returns a list with the number of possible intra-chromosomal pairs, where the list index corresponds to the number of bins separating two regions, a dictionary with one such list per chromosome, and a single number of possible inter-chromosomal pairs.

Returns:possible intra-chromosomal pairs, possible intra-chromosomal pairs by chromosome, possible inter-chromosomal pairs
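
The counting idea can be sketched as follows. Both functions are hypothetical illustrations operating on per-bin mappability flags, not FAN-C's implementation.

```python
def possible_intra_pairs(mappable):
    """Number of mappable bin pairs (i, i + d) for every separation d."""
    n = len(mappable)
    return [
        sum(1 for i in range(n - d) if mappable[i] and mappable[i + d])
        for d in range(n)
    ]


def possible_inter_pairs(mappable_a, mappable_b):
    """Number of mappable bin pairs between two different chromosomes."""
    return sum(mappable_a) * sum(mappable_b)


intra = possible_intra_pairs([True, True, False, True])
inter = possible_inter_pairs([True, True, False, True], [True, False, True])
```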
region_bins(*args, **kwargs)

Return slice of start and end indices spanned by a region.

Parameters:args – provide a GenomicRegion here to get the slice of start and end bins of only this region. To get the slice over all regions leave this blank.
Returns:
region_intervals(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)

Return equally-sized genomic intervals and associated scores.

Use either bins or bin_size argument to control binning.

Parameters:
  • region – String or GenomicRegion object denoting the region to be binned
  • bins – Number of bins to divide the region into
  • bin_size – Size of each bin (alternative to bins argument)
  • smoothing_window – Size of window (in bins) to smooth scores over
  • nan_replacement – NaN values in the scores will be replaced with this value
  • zero_to_nan – If True, will convert bins with score 0 to NaN
  • args – Arguments passed to _region_intervals
  • kwargs – Keyword arguments passed to _region_intervals
Returns:

iterator of tuples: (start, end, score)

region_subset(region, *args, **kwargs)

Takes a GenomicRegion and returns all regions that overlap with the supplied region.

Parameters:region – String or GenomicRegion object for which covered bins will be returned.
regions

Iterate over genomic regions in this object.

Will return a GenomicRegion object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.

Returns:RegionIter
regions_and_edges(key, *args, **kwargs)

Convenient access to regions and edges selected by key.

Parameters:
  • key – Edge selector, see edges()
  • args – Positional arguments passed to edges()
  • kwargs – Keyword arguments passed to edges()
Returns:

list of row regions, list of col regions, iterator over edges

regions_and_matrix_entries(key=None, score_field=None, *args, **kwargs)

Convenient access to non-zero matrix entries and associated regions.

Parameters:
  • key – Edge key, see edges()
  • oe – If True, will divide observed values by their expected value at the given distance. False by default
  • oe_per_chromosome – If True (default), will do a per-chromosome O/E calculation rather than using the whole matrix to obtain expected values
  • score_field – (optional) any edge attribute that returns a number can be specified here for filling the matrix. Usually this is defined by the _default_score_field attribute of the matrix class.
  • args – Positional arguments passed to edges()
  • kwargs – Keyword arguments passed to edges()
Returns:

list of row regions, list of col regions, iterator over (i, j, weight) tuples

regions_dict

Return a dictionary with region index as keys and regions as values.

Returns:dict {region.ix: region, …}
static regions_identical(pairs)

Check if the regions in all objects in the list are identical.

Parameters:pairs – list of RegionBased objects
Returns:True if chromosome, start, and end are identical between all regions in the same list positions.
scaling_factor(matrix, weight_column=None)

Compute the scaling factor to another matrix.

Calculates the ratio of the number of contacts in this Hic object to the number of contacts in another Hic object.

Parameters:
  • matrix – A Hic object
  • weight_column – Name of the column to calculate the scaling factor on
Returns:

float

to_bed(file_name, subset=None, **kwargs)

Export regions as BED file

Parameters:
  • file_name – Path of file to write regions to
  • subset – optional GenomicRegion or str to write only regions overlapping this region
  • kwargs – Passed to write_bed()
to_bigwig(file_name, subset=None, **kwargs)

Export regions as BigWig file.

Parameters:
  • file_name – Path of file to write regions to
  • subset – optional GenomicRegion or str to write only regions overlapping this region
  • kwargs – Passed to write_bigwig()
to_gff(file_name, subset=None, **kwargs)

Export regions as GFF file

Parameters:
  • file_name – Path of file to write regions to
  • subset – optional GenomicRegion or str to write only regions overlapping this region
  • kwargs – Passed to write_gff()
class fanc.matrix.RegionMatrixTable(file_name=None, mode='a', tmpdir=None, partition_strategy='auto', additional_region_fields=None, additional_edge_fields=None, default_score_field='weight', default_value=0.0, _table_name_regions='regions', _table_name_edges='edges', _table_name_expected_values='expected_values', _edge_buffer_size='3G')

Bases: fanc.matrix.RegionMatrixContainer, fanc.matrix.RegionPairsTable

HDF5 implementation of the RegionMatrixContainer interface.

class ChromosomeDescription

Bases: tables.description.IsDescription

Description of the chromosomes in this object.

class MaskDescription

Bases: tables.description.IsDescription

class RegionDescription

Bases: tables.description.IsDescription

Description of a genomic region for PyTables Table

add_contact(contact, *args, **kwargs)

Alias for add_edge()

Parameters:
  • contact – Edge
  • args – Positional arguments passed to _add_edge()
  • kwargs – Keyword arguments passed to _add_edge()
add_contacts(contacts, *args, **kwargs)

Alias for add_edges()

add_edge(edge, check_nodes_exist=True, *args, **kwargs)

Add an edge / contact between two regions to this object.

Parameters:
  • edge – Edge, dict with at least the attributes source and sink, optionally weight, or a list of length 2 (source, sink) or 3 (source, sink, weight).
  • check_nodes_exist – Make sure that there are nodes that match source and sink indexes
  • args – Positional arguments passed to _add_edge()
  • kwargs – Keyword arguments passed to _add_edge()
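
The accepted input types can be pictured with a small, library-independent sketch. The helper name normalise_edge and the default weight of 1 are assumptions made for illustration only; they are not part of the FAN-C API, and the actual conversion inside add_edge() may differ.

```python
def normalise_edge(edge, default_weight=1):
    """Return a (source, sink, weight) tuple for the supported input types.

    Hypothetical helper: the default weight of 1 is an assumption.
    """
    if isinstance(edge, dict):
        # dict input: 'source' and 'sink' are required, 'weight' is optional
        return edge["source"], edge["sink"], edge.get("weight", default_weight)
    if isinstance(edge, (list, tuple)):
        if len(edge) == 2:
            return edge[0], edge[1], default_weight
        if len(edge) == 3:
            return edge[0], edge[1], edge[2]
        raise ValueError("list or tuple input must have length 2 or 3")
    # anything else is treated as an Edge-like object with attributes
    return edge.source, edge.sink, getattr(edge, "weight", default_weight)


as_dict = normalise_edge({"source": 3, "sink": 7, "weight": 2.5})
as_pair = normalise_edge([3, 7])
as_triple = normalise_edge((3, 7, 2.5))
```

All three calls describe the same pair of regions; only the two-element form falls back to the assumed default weight.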
add_edge_from_dict(edge, *args, **kwargs)

Direct method to add an edge from dict input.

Parameters:edge – dict with at least the keys “source” and “sink”. Additional keys will be loaded as edge attributes
add_edge_from_edge(edge, *args, **kwargs)

Direct method to add an edge from Edge input.

Parameters:edge – Edge
add_edge_from_list(edge, *args, **kwargs)

Direct method to add an edge from list or tuple input.

Parameters:edge – List or tuple. Should be of length 2 (source, sink) or 3 (source, sink, weight)
add_edge_simple(source, sink, weight=None, *args, **kwargs)

Direct method to add an edge from source and sink indexes, with an optional weight.

Parameters:
  • source – Source region index
  • sink – Sink region index
  • weight – Weight of the edge
add_edges(edges, flush=True, *args, **kwargs)

Bulk-add edges from a list.

List items can be any of the supported edge types: list, tuple, dict, or Edge. Repeatedly calls add_edge(), so may be inefficient for large amounts of data.

Parameters:edges – List (or iterator) of edges. See add_edge() for details
add_mask_description(name, description)

Add a mask description to the _mask table and return its ID.

Parameters:
  • name (str) – name of the mask
  • description (str) – description of the mask
Returns:

id of the mask

Return type:

int

add_region(region, *args, **kwargs)

Add a genomic region to this object.

This method offers some flexibility in the types of objects that can be loaded. See parameters for details.

Parameters:region – Can be a GenomicRegion, a str in the form ‘<chromosome>:<start>-<end>[:<strand>]’, a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
add_regions(regions, *args, **kwargs)

Bulk insert multiple genomic regions.

Parameters:regions – List (or any iterator) with objects that describe a genomic region. See add_region for options.
static bin_intervals(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into a fixed number of bins.

Parameters:
  • intervals – iterator of tuples (start, end, score)
  • bins – Number of bins to divide the region into
  • interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if the intervals argument does not cover the exact genomic range to be binned.
  • smoothing_window – Size of window (in bins) to smooth scores over
  • nan_replacement – NaN values in the scores will be replaced with this value
  • zero_to_nan – If True, will convert bins with score 0 to NaN
Returns:

iterator of tuples: (start, end, score)
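
As a rough illustration of binning into a fixed number of bins, the following sketch averages interval scores weighted by their overlap with each bin. The function name bin_intervals_sketch and the overlap-weighted averaging are assumptions; the library's own method may weight or round differently, and it returns an iterator rather than a list. Smoothing and NaN handling are left out.

```python
def bin_intervals_sketch(intervals, bins, interval_range=None):
    """Average (start, end, score) intervals into `bins` equal-width bins.

    Hypothetical sketch: coordinates are treated as 1-based and inclusive,
    and scores are averaged weighted by the number of overlapping bases.
    """
    intervals = list(intervals)
    if interval_range is None:
        interval_range = (intervals[0][0], intervals[-1][1])
    range_start, range_end = interval_range
    width = (range_end - range_start + 1) / bins

    binned = []
    for i in range(bins):
        bin_start = range_start + i * width
        bin_end = range_start + (i + 1) * width - 1
        weighted_sum, covered = 0.0, 0.0
        for start, end, score in intervals:
            overlap = min(end, bin_end) - max(start, bin_start) + 1
            if overlap > 0:
                weighted_sum += score * overlap
                covered += overlap
        score = weighted_sum / covered if covered > 0 else float("nan")
        binned.append((int(bin_start), int(bin_end), score))
    return binned


example_intervals = [(1, 100, 1.0), (101, 200, 3.0), (201, 300, 5.0), (301, 400, 7.0)]
binned_example = bin_intervals_sketch(example_intervals, bins=2)
```

Here four intervals of 100 bp are merged into two bins of 200 bp, each carrying the mean score of the two intervals it covers.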

static bin_intervals_equidistant(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into bins with a fixed size.

Parameters:
  • intervals – iterator of tuples (start, end, score)
  • bin_size – Size of each bin in base pairs
  • interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if the intervals argument does not cover the exact genomic range to be binned.
  • smoothing_window – Size of window (in bins) to smooth scores over
  • nan_replacement – NaN values in the scores will be replaced with this value
  • zero_to_nan – If True, will convert bins with score 0 to NaN
Returns:

iterator of tuples: (start, end, score)

bin_size

Return the length of the first region in the dataset.

Assumes all bins have equal size.

Returns:int
binned_regions(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)

Same as region_intervals, but returns GenomicRegion objects instead of tuples.

Parameters:
  • region – String or GenomicRegion object denoting the region to be binned
  • bins – Number of bins to divide the region into
  • bin_size – Size of each bin (alternative to bins argument)
  • smoothing_window – Size of window (in bins) to smooth scores over
  • nan_replacement – NaN values in the scores will be replaced with this value
  • zero_to_nan – If True, will convert bins with score 0 to NaN
  • args – Arguments passed to _region_intervals
  • kwargs – Keyword arguments passed to _region_intervals
Returns:

iterator of GenomicRegion objects

bins_to_distance(bins)

Convert fraction of bins to base pairs

Parameters:bins – float, fraction of bins
Returns:int, base pairs
chromosome_bins

Returns a dictionary of chromosomes and the start and end index of the bins they cover.

The returned bin indexes are range-compatible, i.e. chromosome bins [0, 5] cover bins 0, 1, 2, 3, and 4, not 5.
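
The range-compatible convention can be shown with a short sketch that does not depend on FAN-C. The function chromosome_bins_sketch and its input (one chromosome name per bin, in bin order) are hypothetical and only mimic the shape of the returned dictionary.

```python
def chromosome_bins_sketch(chromosome_per_bin):
    """Map each chromosome to [first bin index, last bin index + 1]."""
    bins = {}
    for ix, chromosome in enumerate(chromosome_per_bin):
        if chromosome not in bins:
            bins[chromosome] = [ix, ix + 1]
        else:
            # extend the exclusive end index
            bins[chromosome][1] = ix + 1
    return bins


# five bins on chr1 followed by three bins on chr2
cb = chromosome_bins_sketch(["chr1"] * 5 + ["chr2"] * 3)
start, end = cb["chr1"]
chr1_bins = list(range(start, end))
```

Because the end index is exclusive, the pair can be passed directly to range() or used as a slice.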

chromosome_lengths

Returns a dictionary of chromosomes and their length in bp.

chromosomes()

List all chromosomes in this regions table.

Returns:list of chromosome names

close(copy_tmp=True, remove_tmp=True)

Close this HDF5 file and run exit operations.

If file was opened with tmpdir in read-only mode: close file and delete temporary copy.

If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.

Parameters:
  • copy_tmp – If False, does not overwrite original with modified file.
  • remove_tmp – If False, does not delete temporary copy of file.
distance_to_bins(distance)

Convert base pairs to fraction of bins.

Parameters:distance – distance in base pairs
Returns:float, distance as fraction of bin size
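
The two conversions, bins_to_distance() and distance_to_bins(), are inverse operations on a fixed bin size. The sketch below takes the bin size as an explicit argument, whereas the methods read it from the object, and the truncation to int is an assumption about rounding.

```python
def distance_to_bins_sketch(distance, bin_size):
    """Base pairs to (fractional) number of bins."""
    return distance / bin_size


def bins_to_distance_sketch(bins, bin_size):
    """(Fractional) number of bins to base pairs; truncation is an assumption."""
    return int(bins * bin_size)


bin_size = 1_000_000
n_bins = distance_to_bins_sketch(2_500_000, bin_size)
n_bp = bins_to_distance_sketch(n_bins, bin_size)
```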
downsample(n, file_name=None)

Sample edges from this object.

Sampling is always done on uncorrected Hi-C matrices.

Parameters:
  • n – Sample size or reference object. If n < 1, it will be interpreted as a fraction of the total reads in this object.
  • file_name – Output file name for down-sampled object.
Returns:

RegionPairsTable
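
To illustrate how n is interpreted, here is a minimal sketch that samples individual reads without replacement from raw integer contact counts. The function downsample_sketch, its seed argument, and the read-expansion strategy are assumptions for illustration; the reference-object form of n is not covered, and the library's sampling procedure may differ.

```python
import random
from collections import Counter


def downsample_sketch(edges, n, seed=0):
    """Sample reads from (source, sink, count) edges with raw integer counts."""
    total = sum(int(count) for _, _, count in edges)
    if n < 1:
        # values below 1 are read as a fraction of all reads
        n = int(total * n)
    # expand every edge into one entry per read, then sample without replacement
    reads = [(s, k) for s, k, count in edges for _ in range(int(count))]
    sampled_reads = random.Random(seed).sample(reads, int(n))
    counts = Counter(sampled_reads)
    return [(s, k, c) for (s, k), c in sorted(counts.items())]


raw_edges = [(0, 0, 10), (0, 1, 6), (1, 1, 4)]
sampled = downsample_sketch(raw_edges, 0.5)
original = {(s, k): count for s, k, count in raw_edges}
```

With 20 reads in total, n=0.5 keeps 10 of them, and no edge can end up with more reads than it started with.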

edge_data(attribute, *args, **kwargs)

Iterate over specific edge attribute.

Parameters:
  • attribute – Name of the attribute, e.g. “weight”
  • args – Positional arguments passed to edges()
  • kwargs – Keyword arguments passed to edges()
Returns:

iterator over edge attribute

edge_subset(key=None, *args, **kwargs)

Get a subset of edges.

This is an alias for edges().

Returns:generator (Edge)
edges

Iterate over contacts / edges.

edges() is the central function of RegionPairsContainer. Here, we will use the Hic implementation for demonstration purposes, but the usage is exactly the same for all compatible objects implementing RegionPairsContainer, including JuicerHic and CoolerHic.

import fanc

# file from FAN-C examples
hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")

We can easily find the number of edges in the sample Hic object:

len(hic.edges)  # 8695

When used in an iterator context, edges() iterates over all edges in the RegionPairsContainer:

for edge in hic.edges:
    # do something with edge
    print(edge)
    # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
    # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
    # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
    # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
    # ...

Calling edges() as a method has the same effect:

# note the '()'
for edge in hic.edges():
    # do something with edge
    print(edge)
    # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
    # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
    # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
    # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
    # ...

Rather than iterate over all edges in the object, we can select only a subset. If the key is a string or a GenomicRegion, all non-zero edges connecting the region described by the key to any other region are returned. If the key is a tuple of strings or GenomicRegion, only edges between the two regions are returned.

# select all edges between chromosome 19
# and any other region:
for edge in hic.edges("chr19"):
    print(edge)
    # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
    # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
    # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
    # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
    # ...

# select all edges that are only on
# chromosome 19
for edge in hic.edges(('chr19', 'chr19')):
    print(edge)
    # 90--116; bias: 0.00021173151730025176; sink_node: chr19:37000001-38000000; source_node: chr19:11000001-12000000; weight: 0.009104455243910825
    # 135--135; bias: 0.00018003890596887822; sink_node: chr19:56000001-57000000; source_node: chr19:56000001-57000000; weight: 0.10028167062466517
    # 123--123; bias: 0.00011063368998965993; sink_node: chr19:44000001-45000000; source_node: chr19:44000001-45000000; weight: 0.1386240135570439
    # 92--93; bias: 0.00040851066434864896; sink_node: chr19:14000001-15000000; source_node: chr19:13000001-14000000; weight: 0.10090213409411629
    # ...

# select inter-chromosomal edges
# between chromosomes 18 and 19
for edge in hic.edges(('chr18', 'chr19')):
    print(edge)
    # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
    # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
    # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
    # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
    # ...

By default, edges() will retrieve all edge attributes, which can be slow when iterating over a lot of edges. This is why all file-based FAN-C RegionPairsContainer objects support lazy loading, where attributes are only read on demand.

for edge in hic.edges('chr18', lazy=True):
    print(edge.source, edge.sink, edge.weight, edge)
    # 42 42 0.12291311562018173 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #0>
    # 24 28 0.025205961072838057 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #1>
    # 5 76 0.00961709840049876 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #2>
    # 66 68 0.03876763316345468 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #3>
    # ...

Warning

The lazy iterator reuses the LazyEdge object in every iteration, and overwrites the LazyEdge attributes. Therefore, do not use lazy iterators if you need to store edge objects for later access. For example, list(hic.edges()) works as expected, with all Edge objects stored in the list, whereas list(hic.edges(lazy=True)) results in a list of identical LazyEdge objects. Always ensure you do all edge processing in the loop when working with lazy iterators!
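
The effect described in this warning can be reproduced without FAN-C. The class ReusedEdge and the generator lazy_edges below are hypothetical stand-ins that only imitate the object reuse; they are not the library's implementation.

```python
class ReusedEdge:
    """Hypothetical stand-in for a lazily loaded edge."""

    def __init__(self):
        self.source = None
        self.sink = None


def lazy_edges(pairs):
    edge = ReusedEdge()  # a single object, created once
    for source, sink in pairs:
        # attributes are overwritten in every iteration
        edge.source, edge.sink = source, sink
        yield edge


pairs = [(42, 42), (24, 28), (5, 76)]

# storing the objects: every list entry is the same object
stored = list(lazy_edges(pairs))
stored_sources = [edge.source for edge in stored]

# processing inside the loop: values are read before they are overwritten
in_loop = [(edge.source, edge.sink) for edge in lazy_edges(pairs)]
```

After the first loop, all stored entries report the values of the last edge, while the second loop extracts the values of each edge correctly.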

When working with normalised contact frequencies, such as obtained through matrix balancing in the example above, edges() automatically returns normalised edge weights. In addition, the bias attribute will (typically) have a value different from 1.

When you are interested in the raw contact frequency, use the norm=False parameter:

for edge in hic.edges('chr18', lazy=True, norm=False):
    print(edge.source, edge.sink, edge.weight)
    # 42 42 2120.0
    # 24 28 388.0
    # 5 76 94.0
    # 66 68 470.0
    # ...

You can also choose to omit all intra- or inter-chromosomal edges using intra_chromosomal=False or inter_chromosomal=False, respectively.

Returns:Iterator over Edge or equivalent.
edges_dict(*args, **kwargs)

Edges iterator with access by bracket notation.

This iterator always returns unnormalised edges.

Returns:dict or dict-like iterator
expected_values(selected_chromosome=None, norm=True, *args, **kwargs)

Calculate the expected values for genomic contacts at all distances.

This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.

It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for inter-chromosomal expected value.

Parameters:
  • selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
  • norm – If False, will calculate the expected values on the unnormalised matrix.
  • args – Not used in this context
  • kwargs – Not used in this context
Returns:

list of intra-chromosomal expected values, dict of intra-chromosomal expected values by chromosome, inter-chromosomal expected value
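
For a single chromosome, the averaging described above can be sketched as follows. The function expected_by_distance and its inputs (edges as (source, sink, weight) tuples and a per-bin mappability list) are assumptions for illustration; the library additionally computes per-chromosome and inter-chromosomal values.

```python
from collections import defaultdict


def expected_by_distance(edges, n_bins, mappable):
    """Average edge weight per bin separation, ignoring unmappable bins."""
    sums = defaultdict(float)
    for source, sink, weight in edges:
        sums[abs(sink - source)] += weight

    expected = []
    for distance in range(n_bins):
        # number of bin pairs at this distance where both bins are mappable
        possible = sum(
            1
            for i in range(n_bins - distance)
            if mappable[i] and mappable[i + distance]
        )
        expected.append(sums[distance] / possible if possible else 0.0)
    return expected


toy_edges = [(0, 0, 4.0), (1, 1, 6.0), (3, 3, 2.0),
             (0, 1, 3.0), (1, 3, 1.0), (0, 3, 0.5)]
toy_mappable = [True, True, False, True]
toy_expected = expected_by_distance(toy_edges, n_bins=4, mappable=toy_mappable)
```

Bin 2 is unmappable, so pairs involving it are excluded from the denominator: the diagonal is averaged over three bins rather than four.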

expected_values_and_marginals(selected_chromosome=None, norm=True, force=False, *args, **kwargs)

Calculate the expected values for genomic contacts at all distances and the whole matrix marginals.

This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.

It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for inter-chromosomal expected value.

Parameters:
  • selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
  • norm – If False, will calculate the expected values on the unnormalised matrix.
  • args – Not used in this context
  • kwargs – Not used in this context
Returns:

list of intra-chromosomal expected values, dict of intra-chromosomal expected values by chromosome, inter-chromosomal expected value

filter(edge_filter, queue=False, log_progress=True)

Filter edges in this object by using a MaskFilter.

Parameters:
  • edge_filter – Class implementing MaskFilter.
  • queue – If True, filter will be queued and can be executed along with other queued filters using run_queued_filters()
  • log_progress – If True, progress of iterating through all edges will be continuously reported.
find_region(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)

Find the region that is at the center of a region.

Parameters:query_regions – Region selector string, GenomicRegion, or list of the former
Returns:index (or list of indexes) of the region at the center of the query region
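
One way to picture the lookup is a binary search over sorted bin ends on a single chromosome. The function find_center_region and its arguments are hypothetical; the method itself works on region selectors and handles multiple chromosomes.

```python
import bisect


def find_center_region(region_starts, region_ends, query_start, query_end):
    """Index of the bin containing the midpoint of the query, or None."""
    center = (query_start + query_end) // 2
    # first bin whose end is not left of the midpoint
    ix = bisect.bisect_left(region_ends, center)
    if ix < len(region_starts) and region_starts[ix] <= center:
        return ix
    return None


starts = [1, 1000001, 2000001, 3000001]
ends = [1000000, 2000000, 3000000, 4000000]
center_ix = find_center_region(starts, ends, 1500000, 3500000)
outside_ix = find_center_region(starts, ends, 5000000, 6000000)
```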
flush(silent=False, update_mappability=True)

Write data to file and flush buffers.

Parameters:
  • silent – do not print flush progress
  • update_mappability – After writing data, update mappability and expected values
get_mask(key)

Search _mask table for key and return Mask.

Parameters:
  • key (str) – search by mask name
  • key (int) – search by mask ID
Returns:

Mask

get_masks(ix)

Extract mask IDs encoded in parameter and return masks.

IDs are powers of 2, so a single int field in the table can hold multiple masks by simply adding up the IDs. This is similar in principle to UNIX chmod (although that uses base 8).

Parameters:ix (int) – integer that is the sum of powers of 2. Note that this value is not necessarily itself a power of 2.
Returns:list of Masks extracted from ix
Return type:list (Mask)
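
The encoding can be decoded with plain bit operations. The helper decode_mask_ids is a hypothetical sketch that returns the individual IDs only; get_masks() itself returns Mask objects looked up from the _mask table.

```python
def decode_mask_ids(ix):
    """Split an integer into the powers of 2 it is composed of."""
    ids = []
    bit = 1
    while bit <= ix:
        if ix & bit:
            ids.append(bit)
        bit <<= 1
    return ids


# 13 = 1 + 4 + 8, i.e. three masks are set
combined = decode_mask_ids(13)
unmasked = decode_mask_ids(0)
```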
intervals(*args, **kwargs)

Alias for region_intervals.

mappable(region=None)

Get the mappability of regions in this object.

A “mappable” region has at least one contact to another region in the genome.

Returns:array where True means mappable and False unmappable
marginals(masked=True, *args, **kwargs)

Get the marginals vector of this Hic matrix.

Sums up all contacts for each bin of the Hi-C matrix. Unmappable regions will be masked in the returned vector unless the masked parameter is set to False.

By default, corrected matrix entries are summed up. To get uncorrected matrix marginals use norm=False. Generally, all parameters accepted by edges() are supported.

Parameters:
  • masked – Use a numpy masked array to mask entries corresponding to unmappable regions
  • kwargs – Keyword arguments passed to edges()
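
A minimal sketch of the summation, assuming edges are stored once per pair with source <= sink and the matrix is symmetric. The function marginals_sketch is hypothetical and omits masking of unmappable regions.

```python
def marginals_sketch(edges, n_bins):
    """Sum edge weights per bin for a symmetric matrix stored as upper triangle."""
    marginals = [0.0] * n_bins
    for source, sink, weight in edges:
        marginals[source] += weight
        if source != sink:
            # off-diagonal entries contribute to both bins
            marginals[sink] += weight
    return marginals


toy_marginals = marginals_sketch([(0, 0, 2.0), (0, 1, 1.0), (1, 2, 3.0)], n_bins=3)
```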
matrix(key=None, log=False, default_value=None, mask=True, log_base=2, *args, **kwargs)

Assemble a RegionMatrix from region pairs.

Parameters:
  • key – Matrix selector. See edges() for all supported key types
  • log – If True, log-transform the matrix entries. Also see log_base
  • log_base – Base of the log transformation. Default: 2; only used when log=True
  • default_value – (optional) set the default value of matrix entries that have no associated edge/contact
  • mask – If False, do not mask unmappable regions
  • args – Positional arguments passed to regions_and_matrix_entries()
  • kwargs – Keyword arguments passed to regions_and_matrix_entries()
Returns:

RegionMatrix

classmethod merge(matrices, *args, **kwargs)

Merge multiple RegionMatrixContainer objects.

Merging is done by adding the weight of edges in each object.

Parameters:matrices – list of RegionMatrixContainer
Returns:merged RegionMatrixContainer
possible_contacts()

Calculate the possible number of contacts in the genome.

This calculates the number of potential region pairs in a genome for any possible separation distance, taking into account the existence of unmappable regions.

It returns three values: a list with the number of possible intra-chromosomal pairs, where the list index corresponds to the number of bins separating two regions; a dictionary of such lists, one for each chromosome; and a single number of possible inter-chromosomal pairs.

Returns:possible intra-chromosomal pairs, possible intra-chromosomal pairs by chromosome, possible inter-chromosomal pairs
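
The counting can be sketched as follows, given a per-chromosome list of mappability flags. The function possible_contacts_sketch and its input format are assumptions for illustration and may not match the library's implementation in detail.

```python
def possible_contacts_sketch(chromosome_mappability):
    """Count possible region pairs; input maps chromosome -> list of bool per bin."""
    max_len = max(len(m) for m in chromosome_mappability.values())
    intra_total = [0] * max_len
    intra_by_chromosome = {}
    for chromosome, mappable in chromosome_mappability.items():
        counts = []
        for distance in range(len(mappable)):
            # pairs at this separation where both bins are mappable
            n = sum(
                1
                for i in range(len(mappable) - distance)
                if mappable[i] and mappable[i + distance]
            )
            counts.append(n)
            intra_total[distance] += n
        intra_by_chromosome[chromosome] = counts

    # every mappable bin can pair with every mappable bin on another chromosome
    n_mappable = [sum(m) for m in chromosome_mappability.values()]
    inter = 0
    for i in range(len(n_mappable)):
        for j in range(i + 1, len(n_mappable)):
            inter += n_mappable[i] * n_mappable[j]
    return intra_total, intra_by_chromosome, inter


toy_genome = {"chrA": [True, True, False, True], "chrB": [True, True]}
intra, intra_chrom, inter = possible_contacts_sketch(toy_genome)
```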
region_bins(*args, **kwargs)

Return slice of start and end indices spanned by a region.

Parameters:args – provide a GenomicRegion here to get the slice of start and end bins of only this region. To get the slice over all regions, leave this blank.
Returns:slice of start and end bin indices
region_data(key, value=None)

Retrieve or add vector data to this object. If there is existing data in this object with the same name, it will be replaced.

Parameters:
  • key – Name of the data column
  • value – vector with region-based data (one entry per region)
region_intervals(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)

Return equally-sized genomic intervals and associated scores.

Use either bins or bin_size argument to control binning.

Parameters:
  • region – String or GenomicRegion object denoting the region to be binned
  • bins – Number of bins to divide the region into
  • bin_size – Size of each bin (alternative to bins argument)
  • smoothing_window – Size of window (in bins) to smooth scores over
  • nan_replacement – NaN values in the scores will be replaced with this value
  • zero_to_nan – If True, will convert bins with score 0 to NaN
  • args – Arguments passed to _region_intervals
  • kwargs – Keyword arguments passed to _region_intervals
Returns:

iterator of tuples: (start, end, score)

region_subset(region, *args, **kwargs)

Takes a GenomicRegion and returns all regions that overlap with the supplied region.

Parameters:region – String or GenomicRegion object for which covered bins will be returned.
regions

Iterate over genomic regions in this object.

Will return a GenomicRegion object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.

Returns:RegionIter
regions_and_edges(key, *args, **kwargs)

Convenient access to regions and edges selected by key.

Parameters:
  • key – Edge selector, see edges()
  • args – Positional arguments passed to edges()
  • kwargs – Keyword arguments passed to edges()
Returns:

list of row regions, list of col regions, iterator over edges

regions_and_matrix_entries(key=None, score_field=None, *args, **kwargs)

Convenient access to non-zero matrix entries and associated regions.

Parameters:
  • key – Edge key, see edges()
  • oe – If True, will divide observed values by their expected value at the given distance. False by default
  • oe_per_chromosome – If True (default), will do a per-chromosome O/E calculation rather than using the whole matrix to obtain expected values
  • score_field – (optional) any edge attribute that returns a number can be specified here for filling the matrix. Usually this is defined by the _default_score_field attribute of the matrix class.
  • args – Positional arguments passed to edges()
  • kwargs – Keyword arguments passed to edges()
Returns:

list of row regions, list of col regions, iterator over (i, j, weight) tuples

regions_dict

Return a dictionary with region index as keys and regions as values.

Returns:dict {region.ix: region, …}
static regions_identical(pairs)

Check if the regions in all objects in the list are identical.

Parameters:pairs – list of RegionBased objects
Returns:True if chromosome, start, and end are identical between all regions in the same list positions.
run_queued_filters(log_progress=True)

Run queued filters.

Parameters:log_progress – If True, progress of iterating through all edges will be continuously reported.
scaling_factor(matrix, weight_column=None)

Compute the scaling factor to another matrix.

Calculates the ratio of the number of contacts in this Hic object to the number of contacts in another Hic object.

Parameters:
  • matrix – A Hic object
  • weight_column – Name of the column to calculate the scaling factor on
Returns:

float
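
As a minimal sketch of the ratio, assuming it is computed as this object's total weight divided by the other object's total weight. The function scaling_factor_sketch and its list inputs are hypothetical; the method itself reads the weights from the two Hic objects.

```python
def scaling_factor_sketch(weights_this, weights_other):
    """Ratio of total contacts in this matrix to total contacts in another."""
    return sum(weights_this) / sum(weights_other)


# this matrix holds twice as many contacts as the other one
factor = scaling_factor_sketch([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```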

subset(*regions, **kwargs)

Subset a Hic object by specifying one or more subset regions.

Parameters:
  • regions – string or GenomicRegion object(s)
  • kwargs – Supports file_name: destination file name of subset Hic object; tmpdir: if True, works in tmp until object is closed. Additional parameters are passed to edges()
Returns:

Hic

to_bed(file_name, subset=None, **kwargs)

Export regions as BED file

Parameters:
  • file_name – Path of file to write regions to
  • subset – optional GenomicRegion or str to write only regions overlapping this region
  • kwargs – Passed to write_bed()
to_bigwig(file_name, subset=None, **kwargs)

Export regions as BigWig file.

Parameters:
  • file_name – Path of file to write regions to
  • subset – optional GenomicRegion or str to write only regions overlapping this region
  • kwargs – Passed to write_bigwig()
to_gff(file_name, subset=None, **kwargs)

Export regions as GFF file

Parameters:
  • file_name – Path of file to write regions to
  • subset – optional GenomicRegion or str to write only regions overlapping this region
  • kwargs – Passed to write_gff()
class fanc.matrix.RegionPairsContainer

Bases: genomic_regions.regions.RegionBased

Class representing pairs of genomic regions.

This is the basic interface for all pair and matrix classes in this module. It inherits all methods from RegionBased, and is therefore based on a list of genomic regions (GenomicRegion) representing the underlying genome. You can use the regions() method to access genomic regions in an intuitive fashion, for example:

for region in rpc.regions('chr1'):
    # do something with region
    print(region)

For more details on region access, see the genomic_regions documentation, on which this module is built.

RegionPairsContainer adds methods for pairs of genomic regions on top of the RegionBased methods for individual regions. In the nomenclature of this module, which borrows from network analysis terminology, a pair of regions is represented by an Edge.

# iterate over all region pairs / edges in chr1
for edge in rpc.edges(("chr1", "chr1")):
    # do something with edge / region pair
    region1 = edge.source_node
    region2 = edge.sink_node

For more details, see the edges() method help.

This class itself is only an interface and cannot actually be used to add regions and region pairs. Implementations of this interface, i.e. subclasses such as RegionPairsTable, must override various hidden methods to give them full functionality.

  • _add_edge() is used to save region pairs / edges to the object. It receives a single Edge as input and should return the index of the added edge.
  • _edges_iter() is required by edges(). It is used to iterate over all edges in the object in no particular order. It should return a generator of Edge objects representing all region pairs in the object.
  • _edges_subset() is also used by edges(). It is used to iterate over a subset of edges in this object. It receives as input a key representing the requested subset (further described in edges()), and two lists of GenomicRegion objects, row_regions and col_regions representing the two dimensions of regions selected by key. It should return an iterator over Edge objects.
  • _edges_getitem() is used by edges() for retrieval of edges by bracket notation. For integer input, it should return a single Edge, for slice input a list of Edge objects.

The above methods cover all the basic RegionPairsContainer functionality, but for speed improvements you may also want to override the following method, which by default iterates over all edges:

  • _edges_length() which returns the total number of edges in the object
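
The division of labour between these hidden methods can be sketched with an in-memory stand-in. The class InMemoryPairs is hypothetical and deliberately does not subclass RegionPairsContainer: edges are plain (source, sink, weight) tuples and regions are plain region indexes, which simplifies what a real implementation would receive.

```python
class InMemoryPairs:
    """Hypothetical stand-in; edges are (source, sink, weight) tuples."""

    def __init__(self):
        self._edges = []

    def _add_edge(self, edge):
        # save one edge and return its index
        self._edges.append(edge)
        return len(self._edges) - 1

    def _edges_iter(self):
        # iterate over all edges, in no particular order
        yield from self._edges

    def _edges_subset(self, key, row_regions, col_regions):
        # simplification: regions are given as plain region indexes
        rows, cols = set(row_regions), set(col_regions)
        for source, sink, weight in self._edges:
            if (source in rows and sink in cols) or (source in cols and sink in rows):
                yield source, sink, weight

    def _edges_getitem(self, item):
        # int -> single edge, slice -> list of edges
        return self._edges[item]

    def _edges_length(self):
        # cheaper than counting while iterating over all edges
        return len(self._edges)


pairs = InMemoryPairs()
for toy_edge in [(0, 0, 4.0), (0, 1, 2.0), (1, 2, 1.0)]:
    pairs._add_edge(toy_edge)

subset = list(pairs._edges_subset(None, [0], [0, 1]))
```

A real subclass would work with Edge and GenomicRegion objects and leave the public edges() method to dispatch to these hooks.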
add_contact(contact, *args, **kwargs)

Alias for add_edge()

Parameters:
  • contact – Edge
  • args – Positional arguments passed to _add_edge()
  • kwargs – Keyword arguments passed to _add_edge()
add_contacts(contacts, *args, **kwargs)

Alias for add_edges()

add_edge(edge, check_nodes_exist=True, *args, **kwargs)

Add an edge / contact between two regions to this object.

Parameters:
  • edge – Edge, dict with at least the attributes source and sink, optionally weight, or a list of length 2 (source, sink) or 3 (source, sink, weight).
  • check_nodes_exist – Make sure that there are nodes that match source and sink indexes
  • args – Positional arguments passed to _add_edge()
  • kwargs – Keyword arguments passed to _add_edge()
add_edge_from_dict(edge, *args, **kwargs)

Direct method to add an edge from dict input.

Parameters:edge – dict with at least the keys “source” and “sink”. Additional keys will be loaded as edge attributes
add_edge_from_edge(edge, *args, **kwargs)

Direct method to add an edge from Edge input.

Parameters:edge – Edge
add_edge_from_list(edge, *args, **kwargs)

Direct method to add an edge from list or tuple input.

Parameters:edge – List or tuple. Should be of length 2 (source, sink) or 3 (source, sink, weight)
add_edge_simple(source, sink, weight=None, *args, **kwargs)

Direct method to add an edge from source, sink, and weight values.

Parameters:
  • source – Source region index
  • sink – Sink region index
  • weight – Weight of the edge
add_edges(edges, *args, **kwargs)

Bulk-add edges from a list.

List items can be any of the supported edge types, list, tuple, dict, or Edge. Repeatedly calls add_edge(), so may be inefficient for large amounts of data.

Parameters:edges – List (or iterator) of edges. See add_edge() for details
add_region(region, *args, **kwargs)

Add a genomic region to this object.

This method offers some flexibility in the types of objects that can be loaded. See parameters for details.

Parameters:region – Can be a GenomicRegion, a str in the form ‘<chromosome>:<start>-<end>[:<strand>]’, a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
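
As an illustration of the string form, the following sketch splits a region string into its parts. parse_region is a hypothetical helper written for this example; it is not the parser FAN-C uses, and it does not handle unit suffixes such as ‘mb’.

```python
def parse_region(text):
    """Split '<chromosome>:<start>-<end>[:<strand>]' into its parts.

    Hypothetical helper: returns (chromosome, start, end, strand), where
    strand is None if it was not given.
    """
    fields = text.split(":")
    if len(fields) not in (2, 3):
        raise ValueError("expected '<chromosome>:<start>-<end>[:<strand>]'")
    chromosome = fields[0]
    # Unpacking fails with ValueError if there is not exactly one '-'.
    start, end = (int(x) for x in fields[1].split("-"))
    strand = fields[2] if len(fields) == 3 else None
    return chromosome, start, end, strand
```

The resulting tuple maps directly onto the list form of length 3 (chromosome, start, end) described above.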
static bin_intervals(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into a fixed number of bins.

Parameters:
  • intervals – iterator of tuples (start, end, score)
  • bins – Number of bins to divide the region into
  • interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.
  • smoothing_window – Size of window (in bins) to smooth scores over
  • nan_replacement – NaN values in the scores will be replaced with this value
  • zero_to_nan – If True, will convert bins with score 0 to NaN
Returns:

iterator of tuples: (start, end, score)
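
One way to picture binning into a fixed number of bins is an overlap-weighted mean per bin. The sketch below is an assumption about the general technique, not a reproduction of FAN-C's algorithm: bin_intervals_sketch is a hypothetical name, coordinates are treated as 1-based and inclusive, and smoothing, NaN replacement, and zero handling are left out.

```python
import math


def bin_intervals_sketch(intervals, bins, interval_range=None):
    """Overlap-weighted mean score per bin (illustrative only)."""
    intervals = list(intervals)
    if interval_range is None:
        start = min(s for s, _, _ in intervals)
        end = max(e for _, e, _ in intervals)
    else:
        start, end = interval_range
    bin_width = (end - start + 1) / bins
    result = []
    for i in range(bins):
        bin_start = start + i * bin_width
        bin_end = start + (i + 1) * bin_width  # exclusive upper bound
        weighted_sum = 0.0
        covered = 0.0
        for s, e, score in intervals:
            # Treat the inclusive interval [s, e] as half-open [s, e + 1).
            overlap = min(e + 1, bin_end) - max(s, bin_start)
            if overlap > 0:
                weighted_sum += overlap * score
                covered += overlap
        mean = weighted_sum / covered if covered > 0 else math.nan
        # Bin boundaries are truncated to integers for reporting.
        result.append((int(bin_start), int(bin_end) - 1, mean))
    return result


scores = [(1, 10, 1.0), (11, 20, 3.0)]
two_bins = bin_intervals_sketch(scores, 2)
one_bin = bin_intervals_sketch(scores, 1)
```

With two bins each input interval maps onto its own bin; with a single bin the two scores are averaged with equal weight because both intervals cover ten base pairs.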

static bin_intervals_equidistant(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into bins with a fixed size.

Parameters:
  • intervals – iterator of tuples (start, end, score)
  • bin_size – Size of each bin in base pairs
  • interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.
  • smoothing_window – Size of window (in bins) to smooth scores over
  • nan_replacement – NaN values in the scores will be replaced with this value
  • zero_to_nan – If True, will convert bins with score 0 to NaN
Returns:

iterator of tuples: (start, end, score)

binned_regions(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)

Same as region_intervals, but returns GenomicRegion objects instead of tuples.

Parameters:
  • region – String or GenomicRegion object denoting the region to be binned
  • bins – Number of bins to divide the region into
  • bin_size – Size of each bin (alternative to bins argument)
  • smoothing_window – Size of window (in bins) to smooth scores over
  • nan_replacement – NaN values in the scores will be replaced with this value
  • zero_to_nan – If True, will convert bins with score 0 to NaN
  • args – Arguments passed to _region_intervals
  • kwargs – Keyword arguments passed to _region_intervals
Returns:

iterator of GenomicRegion objects

chromosome_lengths

Returns a dictionary of chromosomes and their length in bp.

chromosomes()

Get a list of chromosome names.

edge_data(attribute, *args, **kwargs)

Iterate over specific edge attribute.

Parameters:
  • attribute – Name of the attribute, e.g. “weight”
  • args – Positional arguments passed to edges()
  • kwargs – Keyword arguments passed to edges()
Returns:

iterator over edge attribute

edge_subset(key=None, *args, **kwargs)

Get a subset of edges.

This is an alias for edges().

Returns:generator (Edge)
edges

Iterate over contacts / edges.

edges() is the central function of RegionPairsContainer. Here, we will use the Hic implementation for demonstration purposes, but the usage is exactly the same for all compatible objects implementing RegionPairsContainer, including JuicerHic and CoolerHic.

import fanc

# file from FAN-C examples
hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")

We can easily find the number of edges in the sample Hic object:

len(hic.edges)  # 8695

When used in an iterator context, edges() iterates over all edges in the RegionPairsContainer:

for edge in hic.edges:
    # do something with edge
    print(edge)
    # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
    # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
    # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
    # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
    # ...

Calling edges() as a method has the same effect:

# note the '()'
for edge in hic.edges():
    # do something with edge
    print(edge)
    # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
    # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
    # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
    # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
    # ...

Rather than iterate over all edges in the object, we can select only a subset. If the key is a string or a GenomicRegion, all non-zero edges connecting the region described by the key to any other region are returned. If the key is a tuple of strings or GenomicRegion, only edges between the two regions are returned.

# select all edges between chromosome 19
# and any other region:
for edge in hic.edges("chr19"):
    print(edge)
    # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
    # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
    # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
    # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
    # ...

# select all edges that are only on
# chromosome 19
for edge in hic.edges(('chr19', 'chr19')):
    print(edge)
    # 90--116; bias: 0.00021173151730025176; sink_node: chr19:37000001-38000000; source_node: chr19:11000001-12000000; weight: 0.009104455243910825
    # 135--135; bias: 0.00018003890596887822; sink_node: chr19:56000001-57000000; source_node: chr19:56000001-57000000; weight: 0.10028167062466517
    # 123--123; bias: 0.00011063368998965993; sink_node: chr19:44000001-45000000; source_node: chr19:44000001-45000000; weight: 0.1386240135570439
    # 92--93; bias: 0.00040851066434864896; sink_node: chr19:14000001-15000000; source_node: chr19:13000001-14000000; weight: 0.10090213409411629
    # ...

# select inter-chromosomal edges
# between chromosomes 18 and 19
for edge in hic.edges(('chr18', 'chr19')):
    print(edge)
    # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
    # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
    # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
    # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
    # ...

By default, edges() will retrieve all edge attributes, which can be slow when iterating over a lot of edges. This is why all file-based FAN-C RegionPairsContainer objects support lazy loading, where attributes are only read on demand.

for edge in hic.edges('chr18', lazy=True):
    print(edge.source, edge.sink, edge.weight, edge)
    # 42 42 0.12291311562018173 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #0>
    # 24 28 0.025205961072838057 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #1>
    # 5 76 0.00961709840049876 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #2>
    # 66 68 0.03876763316345468 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #3>
    # ...

Warning

The lazy iterator reuses the LazyEdge object in every iteration and overwrites its attributes. Therefore, do not use lazy iterators if you need to store edge objects for later access. For example, list(hic.edges()) works as expected, with all Edge objects stored in the list, whereas list(hic.edges(lazy=True)) results in a list of identical LazyEdge objects. Always do all edge processing inside the loop when working with lazy iterators!
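
The pitfall can be reproduced without FAN-C using plain generators. In this sketch, ReusedEdge, eager_edges, and lazy_edges are hypothetical names that only mimic the object-reuse behaviour described in the warning.

```python
class ReusedEdge:
    """Hypothetical mutable record, standing in for LazyEdge."""

    def __init__(self):
        self.source = None
        self.sink = None


def eager_edges(pairs):
    # A fresh object per edge: safe to store for later.
    for source, sink in pairs:
        edge = ReusedEdge()
        edge.source, edge.sink = source, sink
        yield edge


def lazy_edges(pairs):
    # One shared object, overwritten on every iteration.
    edge = ReusedEdge()
    for source, sink in pairs:
        edge.source, edge.sink = source, sink
        yield edge


pairs = [(42, 42), (24, 28), (5, 76)]
# Store the objects first, then read their attributes afterwards.
stored_eager = [(e.source, e.sink) for e in list(eager_edges(pairs))]
stored_lazy = [(e.source, e.sink) for e in list(lazy_edges(pairs))]
# Reading attributes inside the loop is safe for the lazy variant.
in_loop = [(e.source, e.sink) for e in lazy_edges(pairs)]
```

Here stored_lazy contains three copies of the last edge, while in_loop recovers all three pairs because the values are extracted before the shared object is overwritten.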

When working with normalised contact frequencies, such as obtained through matrix balancing in the example above, edges() automatically returns normalised edge weights. In addition, the bias attribute will (typically) have a value different from 1.

When you are interested in the raw contact frequency, use the norm=False parameter:

for edge in hic.edges('chr18', lazy=True, norm=False):
    print(edge.source, edge.sink, edge.weight)
    # 42 42 2120.0
    # 24 28 388.0
    # 5 76 94.0
    # 66 68 470.0
    # ...

You can also choose to omit all intra- or inter-chromosomal edges using intra_chromosomal=False or inter_chromosomal=False, respectively.
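
In the example output above, each normalised weight matches the raw weight multiplied by the edge's bias value. The check below uses only numbers copied from that output; it is an observation about these example values, not a statement about how FAN-C computes weights internally.

```python
import math

# (raw weight, bias, normalised weight) copied from the example output above
examples = [
    (2120.0, 5.797788472650082e-05, 0.12291311562018173),   # 42--42
    (388.0, 6.496381719803623e-05, 0.025205961072838057),   # 24--28
    (94.0, 0.00010230955745211447, 0.00961709840049876),    # 5--76
    (470.0, 8.248432587969082e-05, 0.03876763316345468),    # 66--68
]

# True if raw * bias reproduces the normalised weight for every edge.
consistent = all(
    math.isclose(raw * bias, norm, rel_tol=1e-9)
    for raw, bias, norm in examples
)
```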

Returns:Iterator over Edge or equivalent.
edges_dict(*args, **kwargs)

Edges iterator with access by bracket notation.

This iterator always returns unnormalised edges.

Returns:dict or dict-like iterator
find_region(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)

Find the region that is at the center of a region.

Parameters:query_regions – Region selector string, GenomicRegion, or list of the former
Returns:index (or list of indexes) of the region at the center of the query region
intervals(*args, **kwargs)

Alias for region_intervals.

mappable(region=None)

Get the mappability of regions in this object.

A “mappable” region has at least one contact to another region in the genome.

Returns:array where True means mappable and False unmappable
classmethod merge(pairs, *args, **kwargs)

Merge two or more RegionPairsContainer objects.

Parameters:
  • pairs – list of RegionPairsContainer
  • args – Positional arguments passed to constructor of this class
  • kwargs – Keyword arguments passed to constructor of this class
region_bins(region)

Takes a genomic region and returns a slice of the bin indices that are covered by the region.

Parameters:region – String or GenomicRegion object for which covered bins will be returned.
Returns:slice
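
A slice of covered bin indices can be found with two binary searches over sorted bin coordinates. This is a sketch under assumptions: region_bins_sketch and its inputs are hypothetical, coordinates are 1-based and inclusive, and all bins lie on a single chromosome.

```python
from bisect import bisect_left, bisect_right


def region_bins_sketch(bin_starts, bin_ends, start, end):
    """Return a slice of the bin indices overlapping [start, end]."""
    first = bisect_left(bin_ends, start)   # first bin ending at or after start
    stop = bisect_right(bin_starts, end)   # one past the last bin starting at or before end
    return slice(first, stop)


# Ten hypothetical 1 Mb bins: 1-1000000, 1000001-2000000, ...
bin_size = 1_000_000
bin_starts = [i * bin_size + 1 for i in range(10)]
bin_ends = [(i + 1) * bin_size for i in range(10)]

covered = region_bins_sketch(bin_starts, bin_ends, 1, 5_000_000)
partial = region_bins_sketch(bin_starts, bin_ends, 1_500_000, 2_500_000)
```

Because the result is a slice, it can be used directly to index a list of regions or one dimension of a matrix.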
region_intervals(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)

Return equally-sized genomic intervals and associated scores.

Use either bins or bin_size argument to control binning.

Parameters:
  • region – String or GenomicRegion object denoting the region to be binned
  • bins – Number of bins to divide the region into
  • bin_size – Size of each bin (alternative to bins argument)
  • smoothing_window – Size of window (in bins) to smooth scores over
  • nan_replacement – NaN values in the scores will be replaced with this value
  • zero_to_nan – If True, will convert bins with score 0 to NaN
  • args – Arguments passed to _region_intervals
  • kwargs – Keyword arguments passed to _region_intervals
Returns:

iterator of tuples: (start, end, score)

region_subset(region, *args, **kwargs)

Takes a GenomicRegion and returns all regions that overlap with the supplied region.

Parameters:region – String or GenomicRegion object for which overlapping regions will be returned.
regions

Iterate over genomic regions in this object.

Will return a GenomicRegion object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.

Returns:RegionIter
regions_and_edges(key, *args, **kwargs)

Convenient access to regions and edges selected by key.

Parameters:
  • key – Edge selector, see edges()
  • args – Positional arguments passed to edges()
  • kwargs – Keyword arguments passed to edges()
Returns:

list of row regions, list of col regions, iterator over edges

regions_dict

Return a dictionary with region index as keys and regions as values.

Returns:dict {region.ix: region, …}
static regions_identical(pairs)

Check if the regions in all objects in the list are identical.

Parameters:pairs – list of RegionBased objects
Returns:True if chromosome, start, and end are identical between all regions in the same list positions.
to_bed(file_name, subset=None, **kwargs)

Export regions as BED file

Parameters:
  • file_name – Path of file to write regions to
  • subset – optional GenomicRegion or str to write only regions overlapping this region
  • kwargs – Passed to write_bed()
to_bigwig(file_name, subset=None, **kwargs)

Export regions as BigWig file.

Parameters:
  • file_name – Path of file to write regions to
  • subset – optional GenomicRegion or str to write only regions overlapping this region
  • kwargs – Passed to write_bigwig()
to_gff(file_name, subset=None, **kwargs)

Export regions as GFF file

Parameters:
  • file_name – Path of file to write regions to
  • subset – optional GenomicRegion or str to write only regions overlapping this region
  • kwargs – Passed to write_gff()
class fanc.matrix.RegionPairsTable(file_name=None, mode='a', tmpdir=None, additional_region_fields=None, additional_edge_fields=None, partition_strategy='auto', _table_name_regions='regions', _table_name_edges='edges', _edge_buffer_size='3G', _edge_table_prefix='chrpair_')

Bases: fanc.matrix.RegionPairsContainer, fanc.general.Maskable, fanc.regions.RegionsTable

HDF5 implementation of the RegionPairsContainer interface.

class ChromosomeDescription

Bases: tables.description.IsDescription

Description of the chromosomes in this object.

class MaskDescription

Bases: tables.description.IsDescription

class RegionDescription

Bases: tables.description.IsDescription

Description of a genomic region for PyTables Table

add_contact(contact, *args, **kwargs)

Alias for add_edge()

Parameters:
  • contact – Edge
  • args – Positional arguments passed to _add_edge()
  • kwargs – Keyword arguments passed to _add_edge()
add_contacts(contacts, *args, **kwargs)

Alias for add_edges()

add_edge(edge, check_nodes_exist=True, *args, **kwargs)

Add an edge / contact between two regions to this object.

Parameters:
  • edge – Edge, dict with at least the attributes source and sink, optionally weight, or a list of length 2 (source, sink) or 3 (source, sink, weight).
  • check_nodes_exist – Make sure that there are nodes that match source and sink indexes
  • args – Positional arguments passed to _add_edge()
  • kwargs – Keyword arguments passed to _add_edge()
add_edge_from_dict(edge, *args, **kwargs)

Direct method to add an edge from dict input.

Parameters:edge – dict with at least the keys “source” and “sink”. Additional keys will be loaded as edge attributes
add_edge_from_edge(edge, *args, **kwargs)

Direct method to add an edge from Edge input.

Parameters:edge – Edge
add_edge_from_list(edge, *args, **kwargs)

Direct method to add an edge from list or tuple input.

Parameters:edge – List or tuple. Should be of length 2 (source, sink) or 3 (source, sink, weight)
add_edge_simple(source, sink, weight=None, *args, **kwargs)

Direct method to add an edge from source, sink, and weight values.

Parameters:
  • source – Source region index
  • sink – Sink region index
  • weight – Weight of the edge
add_edges(edges, flush=True, *args, **kwargs)

Bulk-add edges from a list.

List items can be any of the supported edge types, list, tuple, dict, or Edge. Repeatedly calls add_edge(), so may be inefficient for large amounts of data.

Parameters:edges – List (or iterator) of edges. See add_edge() for details
add_mask_description(name, description)

Add a mask description to the _mask table and return its ID.

Parameters:
  • name (str) – name of the mask
  • description (str) – description of the mask
Returns:

id of the mask

Return type:

int

add_region(region, *args, **kwargs)

Add a genomic region to this object.

This method offers some flexibility in the types of objects that can be loaded. See parameters for details.

Parameters:region – Can be a GenomicRegion, a str in the form ‘<chromosome>:<start>-<end>[:<strand>]’, a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
add_regions(regions, *args, **kwargs)

Bulk insert multiple genomic regions.

Parameters:regions – List (or any iterator) with objects that describe a genomic region. See add_region for options.
static bin_intervals(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into a fixed number of bins.

Parameters:
  • intervals – iterator of tuples (start, end, score)
  • bins – Number of bins to divide the region into
  • interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.
  • smoothing_window – Size of window (in bins) to smooth scores over
  • nan_replacement – NaN values in the scores will be replaced with this value
  • zero_to_nan – If True, will convert bins with score 0 to NaN
Returns:

iterator of tuples: (start, end, score)

static bin_intervals_equidistant(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into bins with a fixed size.

Parameters:
  • intervals – iterator of tuples (start, end, score)
  • bin_size – Size of each bin in base pairs
  • interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.
  • smoothing_window – Size of window (in bins) to smooth scores over
  • nan_replacement – NaN values in the scores will be replaced with this value
  • zero_to_nan – If True, will convert bins with score 0 to NaN
Returns:

iterator of tuples: (start, end, score)

bin_size

Return the length of the first region in the dataset.

Assumes all bins have equal size.

Returns:int
binned_regions(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)

Same as region_intervals, but returns GenomicRegion objects instead of tuples.

Parameters:
  • region – String or GenomicRegion object denoting the region to be binned
  • bins – Number of bins to divide the region into
  • bin_size – Size of each bin (alternative to bins argument)
  • smoothing_window – Size of window (in bins) to smooth scores over
  • nan_replacement – NaN values in the scores will be replaced with this value
  • zero_to_nan – If True, will convert bins with score 0 to NaN
  • args – Arguments passed to _region_intervals
  • kwargs – Keyword arguments passed to _region_intervals
Returns:

iterator of GenomicRegion objects

bins_to_distance(bins)

Convert fraction of bins to base pairs

Parameters:bins – float, fraction of bins
Returns:int, base pairs
chromosome_bins

Returns a dictionary of chromosomes and the start and end index of the bins they cover.

The returned bin ranges are range-compatible, i.e. chromosome bins [0,5] cover bins 0, 1, 2, 3, and 4, not 5.
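
Range-compatible means the end index is exclusive, so the pair can be passed straight to range() or used as slice bounds. The dictionary below is a hypothetical example; real values depend on the loaded object.

```python
# Hypothetical bin boundaries; real values depend on the loaded object.
chromosome_bins = {"chr18": [0, 5], "chr19": [5, 8]}

start, end = chromosome_bins["chr18"]
# The end index is exclusive, exactly as in range() and slicing.
chr18_bin_indices = list(range(start, end))
```

Note that the end index of one chromosome equals the start index of the next, so consecutive ranges tile the bins without gaps or overlaps.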

chromosome_lengths

Returns a dictionary of chromosomes and their length in bp.

chromosomes()

List all chromosomes in this regions table.

Returns:list of chromosome names

close(copy_tmp=True, remove_tmp=True)

Close this HDF5 file and run exit operations.

If file was opened with tmpdir in read-only mode: close file and delete temporary copy.

If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.

Parameters:
  • copy_tmp – If False, does not overwrite original with modified file.
  • remove_tmp – If False, does not delete temporary copy of file.
distance_to_bins(distance)

Convert base pairs to fraction of bins.

Parameters:distance – distance in base pairs
Returns:float, distance as fraction of bin size
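
The two conversions, bins_to_distance() and distance_to_bins(), are inverse scalings by the bin size. The functions below are hypothetical sketches that take the bin size as an explicit argument; truncating to int in the bins-to-distance direction is an assumption made here, and the library's rounding may differ.

```python
def distance_to_bins_sketch(distance, bin_size):
    # Base pairs expressed as a (possibly fractional) number of bins.
    return distance / bin_size


def bins_to_distance_sketch(bins, bin_size):
    # Fraction of bins converted back to whole base pairs (truncated).
    return int(bins * bin_size)


as_bins = distance_to_bins_sketch(2_500_000, 1_000_000)
as_bp = bins_to_distance_sketch(as_bins, 1_000_000)
```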
downsample(n, file_name=None)

Sample edges from this object.

Sampling is always done on uncorrected Hi-C matrices.

Parameters:
  • n – Sample size or reference object. If n < 1 will be interpreted as a fraction of total reads in this object.
  • file_name – Output file name for down-sampled object.
Returns:

RegionPairsTable
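
Down-sampling raw counts can be pictured as drawing reads without replacement. The sketch below is an assumption about the general idea, not FAN-C's implementation: downsample_sketch is a hypothetical helper, it expects integer counts, it handles only a numeric n (not a reference object), and it expands every read in memory, which would be impractical for real data.

```python
import random
from collections import Counter


def downsample_sketch(edges, n, seed=0):
    """Sample reads without replacement from (source, sink, count) edges."""
    total = sum(count for _, _, count in edges)
    if n < 1:
        n = int(n * total)  # interpret n as a fraction of total reads
    # One entry per read, labelled with the edge it belongs to.
    reads = [(source, sink) for source, sink, count in edges for _ in range(count)]
    sampled = random.Random(seed).sample(reads, n)
    return sorted((s, k, c) for (s, k), c in Counter(sampled).items())


raw_edges = [(0, 0, 6), (0, 1, 3), (1, 1, 1)]
half = downsample_sketch(raw_edges, 0.5)
```

The sampled counts always sum to the requested sample size, and no edge can end up with more reads than it had originally.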

edge_data(attribute, *args, **kwargs)

Iterate over specific edge attribute.

Parameters:
  • attribute – Name of the attribute, e.g. “weight”
  • args – Positional arguments passed to edges()
  • kwargs – Keyword arguments passed to edges()
Returns:

iterator over edge attribute

edge_subset(key=None, *args, **kwargs)

Get a subset of edges.

This is an alias for edges().

Returns:generator (Edge)
edges

Iterate over contacts / edges.

edges() is the central function of RegionPairsContainer. Here, we will use the Hic implementation for demonstration purposes, but the usage is exactly the same for all compatible objects implementing RegionPairsContainer, including JuicerHic and CoolerHic.

import fanc

# file from FAN-C examples
hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")

We can easily find the number of edges in the sample Hic object:

len(hic.edges)  # 8695

When used in an iterator context, edges() iterates over all edges in the RegionPairsContainer:

for edge in hic.edges:
    # do something with edge
    print(edge)
    # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
    # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
    # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
    # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
    # ...

Calling edges() as a method has the same effect:

# note the '()'
for edge in hic.edges():
    # do something with edge
    print(edge)
    # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
    # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
    # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
    # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
    # ...

Rather than iterate over all edges in the object, we can select only a subset. If the key is a string or a GenomicRegion, all non-zero edges connecting the region described by the key to any other region are returned. If the key is a tuple of strings or GenomicRegion, only edges between the two regions are returned.

# select all edges between chromosome 19
# and any other region:
for edge in hic.edges("chr19"):
    print(edge)
    # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
    # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
    # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
    # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
    # ...

# select all edges that are only on
# chromosome 19
for edge in hic.edges(('chr19', 'chr19')):
    print(edge)
    # 90--116; bias: 0.00021173151730025176; sink_node: chr19:37000001-38000000; source_node: chr19:11000001-12000000; weight: 0.009104455243910825
    # 135--135; bias: 0.00018003890596887822; sink_node: chr19:56000001-57000000; source_node: chr19:56000001-57000000; weight: 0.10028167062466517
    # 123--123; bias: 0.00011063368998965993; sink_node: chr19:44000001-45000000; source_node: chr19:44000001-45000000; weight: 0.1386240135570439
    # 92--93; bias: 0.00040851066434864896; sink_node: chr19:14000001-15000000; source_node: chr19:13000001-14000000; weight: 0.10090213409411629
    # ...

# select inter-chromosomal edges
# between chromosomes 18 and 19
for edge in hic.edges(('chr18', 'chr19')):
    print(edge)
    # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
    # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
    # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
    # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
    # ...

By default, edges() will retrieve all edge attributes, which can be slow when iterating over a lot of edges. This is why all file-based FAN-C RegionPairsContainer objects support lazy loading, where attributes are only read on demand.

for edge in hic.edges('chr18', lazy=True):
    print(edge.source, edge.sink, edge.weight, edge)
    # 42 42 0.12291311562018173 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #0>
    # 24 28 0.025205961072838057 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #1>
    # 5 76 0.00961709840049876 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #2>
    # 66 68 0.03876763316345468 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #3>
    # ...

Warning

The lazy iterator reuses the LazyEdge object in every iteration and overwrites its attributes. Therefore, do not use lazy iterators if you need to store edge objects for later access. For example, list(hic.edges()) works as expected, with all Edge objects stored in the list, whereas list(hic.edges(lazy=True)) results in a list of identical LazyEdge objects. Always do all edge processing inside the loop when working with lazy iterators!

When working with normalised contact frequencies, such as obtained through matrix balancing in the example above, edges() automatically returns normalised edge weights. In addition, the bias attribute will (typically) have a value different from 1.

When you are interested in the raw contact frequency, use the norm=False parameter:

for edge in hic.edges('chr18', lazy=True, norm=False):
    print(edge.source, edge.sink, edge.weight)
    # 42 42 2120.0
    # 24 28 388.0
    # 5 76 94.0
    # 66 68 470.0
    # ...

You can also choose to omit all intra- or inter-chromosomal edges using intra_chromosomal=False or inter_chromosomal=False, respectively.

Returns:Iterator over Edge or equivalent.
edges_dict(*args, **kwargs)

Edges iterator with access by bracket notation.

This iterator always returns unnormalised edges.

Returns:dict or dict-like iterator
filter(edge_filter, queue=False, log_progress=True)

Filter edges in this object by using a MaskFilter.

Parameters:
  • edge_filter – Class implementing MaskFilter.
  • queue – If True, filter will be queued and can be executed along with other queued filters using run_queued_filters()
  • log_progress – If True, process iterating through all edges will be continuously reported.
find_region(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)

Find the region that is at the center of a region.

Parameters:query_regions – Region selector string, GenomicRegion, or list of the former
Returns:index (or list of indexes) of the region at the center of the query region
flush(silent=False, update_mappability=True)

Write data to file and flush buffers.

Parameters:
  • silent – do not print flush progress
  • update_mappability – After writing data, update mappability and expected values
get_mask(key)

Search _mask table for key and return Mask.

Parameters:
  • key (str) – search by mask name
  • key (int) – search by mask ID
Returns:

Mask

get_masks(ix)

Extract mask IDs encoded in parameter and return masks.

IDs are powers of 2, so a single int field in the table can hold multiple masks by simply adding up the IDs. Similar principle to UNIX chmod (although that uses base 8)

Parameters:ix (int) – integer that is the sum of powers of 2. Note that this value is not necessarily itself a power of 2.
Returns:list of Masks extracted from ix
Return type:list (Mask)
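
The encoding described above can be decoded by testing each bit of the integer. decode_mask_ids is a hypothetical helper written for illustration; it returns the raw power-of-2 IDs rather than Mask objects.

```python
def decode_mask_ids(ix):
    """Return the powers of 2 that sum to ix, in ascending order."""
    ids = []
    bit = 1
    while bit <= ix:
        if ix & bit:
            # This power of 2 is part of the sum, so the mask is set.
            ids.append(bit)
        bit <<= 1
    return ids


combined = 1 + 4  # two masks stored in a single integer field
mask_ids = decode_mask_ids(combined)
```

Because every ID occupies its own bit, adding IDs never causes collisions and the decomposition is unique.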
intervals(*args, **kwargs)

Alias for region_intervals.

mappable(region=None)

Get the mappability of regions in this object.

A “mappable” region has at least one contact to another region in the genome.

Returns:array where True means mappable and False unmappable
classmethod merge(pairs, *args, **kwargs)

Merge two or more RegionPairsTable objects.

Parameters:pairs – list of RegionPairsTable
Returns:merged RegionPairsTable
region_bins(*args, **kwargs)

Return slice of start and end indices spanned by a region.

Parameters:args – provide a GenomicRegion here to get the slice of start and end bins of only this region. To get the slice over all regions leave this blank.
Returns:slice
region_data(key, value=None)

Retrieve or add vector-data to this object. If there is existing data in this object with the same name, it will be replaced.

Parameters:
  • key – Name of the data column
  • value – vector with region-based data (one entry per region)
region_intervals(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)

Return equally-sized genomic intervals and associated scores.

Use either bins or bin_size argument to control binning.

Parameters:
  • region – String or GenomicRegion object denoting the region to be binned
  • bins – Number of bins to divide the region into
  • bin_size – Size of each bin (alternative to bins argument)
  • smoothing_window – Size of window (in bins) to smooth scores over
  • nan_replacement – NaN values in the scores will be replaced with this value
  • zero_to_nan – If True, will convert bins with score 0 to NaN
  • args – Arguments passed to _region_intervals
  • kwargs – Keyword arguments passed to _region_intervals
Returns:

iterator of tuples: (start, end, score)

region_subset(region, *args, **kwargs)

Takes a GenomicRegion and returns all regions that overlap with the supplied region.

Parameters:region – String or GenomicRegion object for which covered bins will be returned.
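The overlap selection can be sketched as follows, accepting a region string of the form <chromosome>[:<start>-<end>]. Both helper functions are hypothetical. The parser handles plain integer coordinates only, so unit suffixes such as "5mb" are out of scope here, and regions are represented as simple tuples rather than GenomicRegion objects.

```python
def parse_region(text):
    """Parse '<chromosome>[:<start>-<end>]' into (chromosome, start, end).

    Only plain integer coordinates are handled; start and end are None
    when the string names a whole chromosome.
    """
    chromosome, sep, span = text.partition(":")
    if not sep:
        return chromosome, None, None
    start, _, end = span.partition("-")
    return chromosome, int(start), int(end)


def overlapping_regions(regions, query):
    """Return all (chromosome, start, end) tuples overlapping the query."""
    chromosome, start, end = parse_region(query)
    result = []
    for c, s, e in regions:
        if c != chromosome:
            continue
        # A whole-chromosome query matches every region on that chromosome
        if start is None or (s <= end and e >= start):
            result.append((c, s, e))
    return result


regions = [("chr1", 1, 100), ("chr1", 101, 200), ("chr2", 1, 100)]
print(overlapping_regions(regions, "chr1:50-120"))
# [('chr1', 1, 100), ('chr1', 101, 200)]
```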
regions

Iterate over genomic regions in this object.

Will return a GenomicRegion object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.

Returns:RegionIter
regions_and_edges(key, *args, **kwargs)

Convenient access to regions and edges selected by key.

Parameters:
  • key – Edge selector, see edges()
  • args – Positional arguments passed to edges()
  • kwargs – Keyword arguments passed to edges()
Returns:

list of row regions, list of col regions, iterator over edges

regions_dict

Return a dictionary with region index as keys and regions as values.

Returns:dict {region.ix: region, …}
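The shape of the returned dictionary can be illustrated with a small stand-in type. Region below is a hypothetical namedtuple used only for this sketch, and the real GenomicRegion carries more attributes.

```python
from collections import namedtuple

# Minimal stand-in for a genomic region (assumption for illustration)
Region = namedtuple("Region", ["ix", "chromosome", "start", "end"])


def build_regions_dict(regions):
    """Map each region's index to the region itself."""
    return {region.ix: region for region in regions}


regions = [Region(0, "chr1", 1, 1000), Region(1, "chr1", 1001, 2000)]
by_ix = build_regions_dict(regions)
print(by_ix[1].start)  # 1001
```

A lookup table of this kind is convenient for resolving the source and sink indices of an edge to their regions.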
static regions_identical(pairs)

Check if the regions in all objects in the list are identical.

Parameters:pairs – list of RegionBased objects
Returns:True if chromosome, start, and end are identical between all regions in the same list positions.
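The comparison described above can be sketched on plain data. The example assumes each object is reduced to a list of (chromosome, start, end) tuples, and it treats lists of different lengths as not identical, which is an assumption. The name sketch_regions_identical is hypothetical and this is not the library's implementation.

```python
def sketch_regions_identical(region_lists):
    """Check position by position that all lists hold the same regions.

    region_lists: list of lists of (chromosome, start, end) tuples.
    """
    region_lists = [list(regions) for regions in region_lists]
    if not region_lists:
        return True
    first = region_lists[0]
    for other in region_lists[1:]:
        if len(other) != len(first):
            return False
        for a, b in zip(first, other):
            # Compare chromosome, start, and end only
            if tuple(a[:3]) != tuple(b[:3]):
                return False
    return True


a = [("chr1", 1, 100), ("chr1", 101, 200)]
b = [("chr1", 1, 100), ("chr1", 101, 200)]
c = [("chr1", 1, 100), ("chr1", 101, 250)]
print(sketch_regions_identical([a, b]))     # True
print(sketch_regions_identical([a, b, c]))  # False
```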
run_queued_filters(log_progress=True)

Run queued filters.

Parameters:log_progress – If True, progress of iterating through all edges will be continuously reported.
subset(*regions, **kwargs)

Subset a Hic object by specifying one or more subset regions.

Parameters:
  • regions – string or GenomicRegion object(s)
  • kwargs – Supports file_name: destination file name of subset Hic object; tmpdir: if True, works in tmp until object is closed. Additional parameters are passed to edges().
Returns:

Hic

to_bed(file_name, subset=None, **kwargs)

Export regions as BED file.

Parameters:
  • file_name – Path of file to write regions to
  • subset – optional GenomicRegion or str to write only regions overlapping this region
  • kwargs – Passed to write_bed()
to_bigwig(file_name, subset=None, **kwargs)

Export regions as BigWig file.

Parameters:
  • file_name – Path of file to write regions to
  • subset – optional GenomicRegion or str to write only regions overlapping this region
  • kwargs – Passed to write_bigwig()
to_gff(file_name, subset=None, **kwargs)

Export regions as GFF file.

Parameters:
  • file_name – Path of file to write regions to
  • subset – optional GenomicRegion or str to write only regions overlapping this region
  • kwargs – Passed to write_gff()
fanc.matrix.as_edge(edge)

Convert input to Edge.

Parameters:edge – Can be Edge, tuple or list of the form (source, sink, weight), tuple of the form (GenomicRegion, GenomicRegion), dict, or Edge equivalent
Returns:Edge
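The kind of input normalisation described above can be sketched with a stand-in class. SketchEdge and to_sketch_edge are hypothetical names and do not reproduce the library's Edge or as_edge. The sketch covers only 3-element tuples or lists, dicts with source, sink, and weight keys, and existing SketchEdge objects. The (GenomicRegion, GenomicRegion) form is omitted.

```python
from dataclasses import dataclass


@dataclass
class SketchEdge:
    """Stand-in for an edge; not the fanc Edge class."""
    source: int
    sink: int
    weight: float


def to_sketch_edge(edge):
    """Convert supported inputs to a SketchEdge."""
    if isinstance(edge, SketchEdge):
        return edge
    if isinstance(edge, dict):
        return SketchEdge(edge["source"], edge["sink"], edge["weight"])
    if isinstance(edge, (tuple, list)) and len(edge) == 3:
        source, sink, weight = edge
        return SketchEdge(source, sink, weight)
    raise TypeError("Unsupported edge input: {!r}".format(edge))


print(to_sketch_edge((1, 2, 3.0)))
print(to_sketch_edge({"source": 0, "sink": 4, "weight": 2.5}))
```

Accepting several input shapes in one place lets calling code pass whichever form is at hand while downstream code works with a single type.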