Common interfaces

Note

Before we start introducing the neat little interface functions that FAN-C provides for interacting with genomic regions, region pairs and matrix data, we want to briefly discuss the terminology used by FAN-C:

Chromosome conformation capture data describes associations between pairs of genomic regions. In the literature, you will see them called “bins” or “loci”; FAN-C generally uses the term “regions” to describe stretches of DNA defined by a set of coordinates.

Similarly, associations between genomic regions, such as those produced by Hi-C, have been called “interactions”, “contacts”, or “proximity”. While these names are certainly appropriate in some situations, they can also be misleading by implying more than the ligation frequency measured in an experiment. Some studies and tools therefore use other terminology, such as “pixels”. FAN-C uses a term from network analysis: “edges”. In networks, edges connect “nodes” and can be assigned a weight. In a Hi-C “network”, nodes correspond to regions, and the (normalised) ligation frequency (contact intensity, interaction frequency, proximity, …) is assigned as a weight to the “edge” connecting a pair of regions. Two regions in FAN-C are considered unconnected if their edge weight is 0.

Instead of calling the regions connected by an edge “region1” and “region2” or something similar, FAN-C calls them “source” and “sink”, again borrowing from network terminology. While an edge in a Hi-C matrix has no directionality (we cannot say that region A interacts with region B but not vice versa), the convention in FAN-C is that the region index of the source region is smaller than that of the sink region. We believe this is a sufficiently neutral terminology for FAN-C data.
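The source/sink convention can be illustrated with a small, self-contained sketch. This is not FAN-C's own Edge class: the names ToyEdge, make_edge and is_connected are hypothetical and exist only for this illustration. The sketch also assumes that an edge may connect a region to itself (the matrix diagonal), in which case source and sink are equal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyEdge:
    """Hypothetical stand-in for an edge: two region indices and a weight."""
    source: int
    sink: int
    weight: float = 1.0


def make_edge(ix_a, ix_b, weight=1.0):
    """Build an edge so that the smaller region index is always the source."""
    source, sink = sorted((ix_a, ix_b))
    return ToyEdge(source, sink, weight)


def is_connected(edge):
    """Two regions count as unconnected when their edge weight is 0."""
    return edge.weight != 0


# The order in which the two regions are given does not matter,
# because the edge itself has no directionality.
edge_ab = make_edge(7, 2, weight=0.35)
edge_ba = make_edge(2, 7, weight=0.35)
print(edge_ab)
print(edge_ab == edge_ba)
print(is_connected(make_edge(3, 4, weight=0)))
```

Sorting the two indices once, when the edge is created, means every later lookup can rely on a single canonical (source, sink) key per region pair.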

Being able to load different data types with the same command is the first convenient feature of load(). The second is that most objects returned by load(), and supported by FAN-C in general, share common interfaces that unify and greatly simplify the handling of different datasets. We summarise them here to provide a rough overview; the next sections discuss each interface in detail.

  • RegionBased: FAN-C builds heavily on the genomic_regions package, which provides a unified, powerful interface for region-based data. Its functionality includes iterators over regions that return the same type of object (GenomicRegion) regardless of data origin. You can query regions in a specific interval, bin scores associated with regions, and much more. This interface supports the following file types: BED, GFF, BigWig, Tabix, BEDPE, and most FAN-C files. Find out more in RegionBased.
  • RegionPairsContainer: This interface extends RegionBased by adding properties and functions for pairs of genomic regions and their relationships (“edges”). You can use the powerful fanc.RegionPairsContainer.edges() function to iterate over all edges in an object, query only a subset, normalise edge weights on the fly, hide and reveal filtered edges, and more. This interface supports FAN-C files like ReadPairs and Hic, single- and multi-resolution Cooler files, and Juicer files. To find out everything about this interface, go to RegionPairsContainer.
  • RegionMatrixContainer: In many cases, such as Hi-C data, edges and their associated scores (“weights”) can be represented as a matrix. This interface extends RegionPairsContainer with matrix-specific properties and functions, including a versatile matrix() function for retrieving whole-genome or subset matrices with support for on-the-fly normalisation and O/E transformation, masking of unmappable regions, and more. This interface supports FAN-C files like Hic and ComparisonMatrix, single- and multi-resolution Cooler files, and Juicer files. Go to RegionMatrixContainer for all the details.
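To make the layering of the three interfaces concrete, here is a toy sketch in plain Python. It does not use FAN-C and does not reproduce its implementation: the classes ToyRegionBased, ToyRegionPairs and ToyRegionMatrix, as well as their method signatures, are hypothetical and only mirror the idea that each interface extends the previous one.

```python
class ToyRegionBased:
    """Holds regions as (chromosome, start, end) tuples."""

    def __init__(self, regions):
        self._regions = list(regions)

    def regions(self, chromosome=None):
        """Iterate over (index, chromosome, start, end), optionally per chromosome."""
        for ix, (chrom, start, end) in enumerate(self._regions):
            if chromosome is None or chrom == chromosome:
                yield ix, chrom, start, end


class ToyRegionPairs(ToyRegionBased):
    """Adds weighted edges between pairs of region indices."""

    def __init__(self, regions, edges):
        super().__init__(regions)
        self._edges = {}
        for ix_a, ix_b, weight in edges:
            # Store each pair under a canonical key: source index <= sink index.
            key = (min(ix_a, ix_b), max(ix_a, ix_b))
            self._edges[key] = self._edges.get(key, 0.0) + weight

    def edges(self):
        """Iterate over (source, sink, weight), ordered by source, then sink."""
        for (source, sink), weight in sorted(self._edges.items()):
            yield source, sink, weight


class ToyRegionMatrix(ToyRegionPairs):
    """Adds a symmetric matrix view of the edges."""

    def matrix(self):
        n = len(self._regions)
        m = [[0.0] * n for _ in range(n)]
        for source, sink, weight in self.edges():
            m[source][sink] = weight
            m[sink][source] = weight
        return m


toy = ToyRegionMatrix(
    regions=[("chr1", 1, 1000), ("chr1", 1001, 2000), ("chr2", 1, 1000)],
    edges=[(0, 1, 2.0), (2, 0, 1.0), (1, 1, 5.0)],
)
chr1_regions = list(toy.regions("chr1"))
all_edges = list(toy.edges())
full_matrix = toy.matrix()
print(chr1_regions)
print(all_edges)
print(full_matrix)
```

Because the matrix object inherits from the pairs object, which in turn inherits from the region object, the same instance answers region queries, edge queries and matrix queries. This is the property that lets code written against the most basic interface work unchanged on the richer data types.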