# RegionMatrixContainer

This interface simplifies and unifies working with matrix data in the context of genomic
region pairs, such as you would find in a Hi-C matrix. It builds on the
`RegionPairsContainer` interface (see the previous section, RegionPairsContainer),
which deals with scores and other attributes between genomic regions, and extends it by
adding functions for representing the scores in a numeric matrix.

After loading a dataset using `load()`, you can check for
support of the `RegionMatrixContainer` interface with:

```
import fanc

hic = fanc.load("examples/output/hic/binned/fanc_example_500kb.hic")
isinstance(hic, fanc.matrix.RegionMatrixContainer) # True if interface supported
```

The FAN-C classes currently supporting `RegionMatrixContainer` are:
`CoolerHic`, `JuicerHic`, `Hic`, `ABCompartmentMatrix`, `DifferenceMatrix`,
`FoldChangeMatrix`, `PeakInfo`, and `RaoPeakInfo`.

## The matrix function

To obtain the whole-genome, normalised matrix from an object, use the
`matrix()` function:

```
m = hic.matrix()
```

The `matrix()` function also supports matrix subsets:

```
m_chr19 = hic.matrix(('chr19', 'chr19'))
```

When using tuples as keys, the first entry will select the rows, and the second entry the columns of the matrix:

```
m_inter1 = hic.matrix(('chr18', 'chr19'))
m_inter1.shape # (157, 119)
m_inter2 = hic.matrix(('chr19', 'chr18'))
m_inter2.shape # (119, 157)
```
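The row/column orientation of tuple keys can be mimicked with plain NumPy slicing. The following is only a toy sketch, not FAN-C code: the `offsets` dictionary and the `submatrix` helper are hypothetical names, the bin counts are taken from the shapes shown above, and the matrix is an empty placeholder.

```python
import numpy as np

# Bin counts per chromosome, taken from the shapes in the example above
n_bins = {"chr18": 157, "chr19": 119}

# Map each chromosome to its slice of bins in a concatenated toy genome
offsets = {}
position = 0
for chrom, n in n_bins.items():
    offsets[chrom] = slice(position, position + n)
    position += n

# Placeholder for a whole-genome matrix (all zeros, for shape checks only)
genome = np.zeros((position, position))

def submatrix(matrix, key):
    """Select rows by the first key entry and columns by the second (hypothetical helper)."""
    row_chrom, col_chrom = key
    return matrix[offsets[row_chrom], offsets[col_chrom]]

inter1 = submatrix(genome, ("chr18", "chr19"))  # chr18 bins as rows
inter2 = submatrix(genome, ("chr19", "chr18"))  # chr19 bins as rows
```

Swapping the entries of the key swaps which chromosome supplies the rows, so the two shapes are transposes of each other.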

The returned object is of type `RegionMatrix`, which is a subclass
of NumPy's masked `array` with added perks for genomic region handling.

A `RegionMatrix` can be used like any other NumPy array,
for example to calculate marginals by summing up values in rows or columns:

```
import numpy as np

m_chr19.shape # (119, 119)
marginals = np.sum(m_chr19, axis=0)
marginals.shape # (119,)
marginals[:5]
# [1.0000000074007467, 0.9999999585562779,
# 1.0000000102533806, 0.999999987196381, 1.0000000140165086]
```

This Hi-C object is normalised on a per-chromosome basis, so each marginal is close to 1.

Rows and columns in a matrix can be masked, i.e. their entries are considered invalid and
are ignored for most downstream analysis to prevent artifacts. By default, FAN-C masks
regions that have no edges (after filtering), typically due to mappability issues.
You can turn off masking using the `mask=False` parameter:

```
m_unmasked = hic.matrix(mask=False)
```

However, we recommend working with masked matrices to ensure no unwanted edges are part of your analyses.
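To illustrate why masking matters, here is a minimal sketch using only NumPy's masked arrays. It does not use FAN-C: the toy matrix and the rule for building the mask (bins whose column sum is zero) are assumptions made for this example, and FAN-C's own filtering may mark regions differently.

```python
import numpy as np

# Toy symmetric 4x4 contact matrix; bin 2 has no contacts (e.g. unmappable)
raw = np.array([
    [4.0, 2.0, 0.0, 1.0],
    [2.0, 6.0, 0.0, 3.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 3.0, 0.0, 5.0],
])

# Flag bins without any contacts, then mask their entire row and column
empty = raw.sum(axis=0) == 0
mask = empty[:, None] | empty[None, :]
masked = np.ma.masked_array(raw, mask=mask)

# Reductions on the masked array skip masked entries
mean_unmasked = raw.mean()   # averages over all 16 cells, including empty ones
mean_masked = masked.mean()  # averages over the 9 valid cells only
```

In this toy case the unmasked mean is pulled down by the empty row and column, while the masked mean reflects only the bins that carry data.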

`RegionMatrix` objects also keep track of the regions corresponding to
columns and rows of a matrix:

```
m_inter1.row_regions
# [chr18:1-500000,
# chr18:500001-1000000,
# chr18:1000001-1500000,
# chr18:1500001-2000000,
# ...
m_inter1.col_regions
# [chr19:1-500000,
# chr19:500001-1000000,
# chr19:1000001-1500000,
# chr19:1500001-2000000,
# ...
```

You can subset a `RegionMatrix` using indexes or region intervals:

```
# subset by index
m_chr19_sub1 = m_chr19[0:3, 0:3]
m_chr19_sub1.row_regions
# [chr19:1-500000, chr19:500001-1000000, chr19:1000001-1500000]
m_chr19_sub1.col_regions
# [chr19:1-500000, chr19:500001-1000000, chr19:1000001-1500000]
# subset by region interval
m_chr19_sub2 = m_chr19['chr19:2mb-3mb', 'chr19:500kb-1mb']
m_chr19_sub2.row_regions
# [chr19:1500001-2000000, chr19:2000001-2500000, chr19:2500001-3000000]
m_chr19_sub2.col_regions
```

Note that region interval definitions are always interpreted as 1-based, inclusive, and any
overlapping region is returned (in the above example the region `chr19:1500001-2000000`
has a 1 base overlap with the requested interval).
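The overlap rule can be sketched in a few lines of plain Python. This is an illustration only, not FAN-C's implementation: `overlapping_bins` is a hypothetical helper, and the sketch assumes that `2mb-3mb` corresponds to positions 2,000,000 to 3,000,000.

```python
def overlapping_bins(bins, start, end):
    """Return all (start, end) bins sharing at least one base with the
    query interval; all coordinates are 1-based and inclusive."""
    return [(s, e) for s, e in bins if s <= end and e >= start]

# 500 kb bins covering the first 4 Mb of a chromosome
bin_size = 500_000
bins = [(i * bin_size + 1, (i + 1) * bin_size) for i in range(8)]

# Query corresponding to 'chr19:2mb-3mb' under the stated assumption
rows = overlapping_bins(bins, 2_000_000, 3_000_000)
# rows[0] is (1500001, 2000000): it shares only base 2,000,000 with the query
```

Because both ends are inclusive, a bin that ends exactly on the query start is selected even though only a single base is shared.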

`matrix()` supports all arguments also available for `edges()`,
but it is not necessary to use lazy loading.
You can, for example, output an uncorrected matrix with:

```
m_chr19_uncorr = hic.matrix(('chr19', 'chr19'), norm=False)
```

In addition, there are several parameters specific to `matrix()`.
Most notably, you can use the `oe=True` parameter to return an
observed/expected (O/E) matrix:

```
m_chr19_oe = hic.matrix(('chr19', 'chr19'), oe=True)
```

Internally, `oe=True` uses `expected_values` to calculate the expected
(average) weight of all edges connecting regions at a certain distance. The matrix
is then simply divided by the expected matrix. You may want to log2-transform the
matrix for a symmetric scale of values:

```
m_chr19_log_oe = hic.matrix(('chr19', 'chr19'), oe=True, log=True)
```
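The idea behind the O/E transformation can be sketched with NumPy alone. This is a simplified illustration and not FAN-C's implementation: `observed_over_expected` is a hypothetical helper that takes the plain mean of each diagonal as the expected value, ignores masking, and assumes a single intra-chromosomal matrix with no all-zero diagonals.

```python
import numpy as np

def observed_over_expected(m):
    """Divide each entry by the mean of all entries at the same bin distance."""
    n = m.shape[0]
    expected = np.zeros_like(m, dtype=float)
    for d in range(n):
        # Mean contact strength at distance d (the d-th diagonal)
        mean_d = np.diagonal(m, offset=d).mean()
        idx = np.arange(n - d)
        expected[idx, idx + d] = mean_d  # upper triangle
        expected[idx + d, idx] = mean_d  # lower triangle (symmetric)
    return m / expected

# Toy symmetric contact matrix
obs = np.array([
    [8.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 6.0, 12.0],
])

oe = observed_over_expected(obs)
log_oe = np.log2(oe)  # depletion becomes negative, enrichment positive
```

After the log2 transformation, values below the expectation are negative and values above it are positive, with 0 meaning "as expected".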