Reads a Simple Interaction Format (SIF) file and builds an undirected igraph graph from it, together with the order-sensitive lookup structures listed under Value. Following the Java reference implementation's SIFReader, the column count is taken from the first line: a 2-column file uses columns 1 and 2 as the interacting nodes, and a 3-column file uses columns 1 and 3 (the middle interaction-type column is ignored). Node names are upper-cased, and self-interactions are discarded.

build_network(sif_path)

Arguments

sif_path

Character string giving the path to the SIF file. The file should be whitespace-delimited (spaces or tabs), with 2 or 3 columns.
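The column-selection rules in the description can be sketched in a few lines of base R. This is a hypothetical helper, read_sif_edges(), written for illustration and not the package's own reader; skipping blank lines and rows that are shorter than the first line is an assumption, and duplicate edges are left in place.

```r
# Hypothetical helper illustrating the documented SIF rules; not the
# package's own parser. Skipping blank/short lines is an assumption.
read_sif_edges <- function(sif_path) {
  lines <- trimws(readLines(sif_path, warn = FALSE))
  lines <- lines[nzchar(lines)]                      # drop blank lines
  fields <- strsplit(lines, "[[:space:]]+")          # spaces or tabs
  ncol <- length(fields[[1]])                        # column count from first line
  second <- switch(as.character(ncol),
                   "2" = 2L,                         # node A, node B
                   "3" = 3L,                         # node A, type (ignored), node B
                   stop("expected 2 or 3 columns, found ", ncol))
  from <- toupper(vapply(fields, `[`, character(1), 1L))
  to   <- toupper(vapply(fields, `[`, character(1), second))
  keep <- !is.na(to) & from != to                    # drop short rows and self-interactions
  data.frame(from = from[keep], to = to[keep], stringsAsFactors = FALSE)
}
```

The resulting edge table could then be passed to igraph::graph_from_data_frame(edges, directed = FALSE) to obtain an undirected graph; collapsing duplicate edges (for example with igraph::simplify()) is left to the caller in this sketch.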

Value

A list with the following elements:
* g: the igraph graph.
* nodes: node names, in Java networkNodeList order.
* nbr: named list of neighbour-name vectors, in Java HashSet order.
* nbr_idx: list of 1-based neighbour-id vectors aligned to nodes, in Java HashSet order, ready for run_greedy_search().
* name2id: named integer vector mapping each node name to its index in nodes.
* csr_offsets, csr_nbrs: a compressed sparse-row (CSR), 0-based adjacency used by the SA / GA component scorer.
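The exact layout of csr_offsets and csr_nbrs is not spelled out above. The sketch below assumes the conventional CSR layout: length(nodes) + 1 offsets starting at 0, with neighbour ids stored 0-based. The helpers to_csr() and csr_neighbours() are hypothetical and only illustrate how such arrays relate to nbr_idx and name2id.

```r
# Hypothetical sketch: derive a 0-based CSR adjacency from 1-based neighbour
# lists. Assumes the conventional layout: length(nbr_idx) + 1 offsets starting
# at 0, neighbours of node i stored at csr_nbrs[(offsets[i] + 1):offsets[i + 1]].
to_csr <- function(nbr_idx) {
  deg <- lengths(nbr_idx)                            # degree of each node
  list(
    csr_offsets = c(0L, cumsum(deg)),
    csr_nbrs    = as.integer(unlist(nbr_idx, use.names = FALSE)) - 1L  # to 0-based
  )
}

# Read back the 1-based neighbour ids of node i (1-based) from the CSR arrays.
csr_neighbours <- function(csr, i) {
  lo <- csr$csr_offsets[i]
  hi <- csr$csr_offsets[i + 1L]
  if (hi == lo) return(integer(0))                   # isolated node
  csr$csr_nbrs[(lo + 1L):hi] + 1L                    # back to 1-based ids
}

nodes   <- c("A", "B", "C", "D")
nbr_idx <- list(c(2L, 3L), c(1L, 3L), c(1L, 2L, 4L), 3L)  # triangle A-B-C plus D-C
name2id <- setNames(seq_along(nodes), nodes)
csr <- to_csr(nbr_idx)
csr_neighbours(csr, name2id[["C"]])                  # 1 2 4, i.e. A, B, D
```

A flat pair of integer vectors like this is easy to hand to compiled code without walking an R list per node, which is presumably why a CSR form is kept alongside nbr_idx.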

Details

To reproduce the Java implementation's greedy search bit-for-bit, this function also reconstructs two order-sensitive structures that the Java code derives from its HashMap/HashSet traversal:
* nodes: the node order of Java's networkNodeList (adjacency.keySet() iteration order), via java_node_order().
* nbr_idx: per-node neighbour lists in Java's HashSet iteration order, via java_neighbour_order().

These orders drive both the Monte-Carlo calibration (which shuffles z-scores in node order) and the greedy expansion/removal, so matching them is what makes the R/C++ output align with the Java reference. The igraph object is retained for the SA / GA algorithms, whose component scoring is order-independent.
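To illustrate why these Java orders can be reproduced at all, the sketch below predicts the iteration order of a java.util.HashSet<String> (or a HashMap key set) in base R. It is a hypothetical stand-in, not java_node_order() or java_neighbour_order() themselves. It assumes Java 8+ HashMap internals, default construction (initial capacity 16, load factor 0.75), insertion with no removals, ASCII node names, and no treeified buckets.

```r
# Hypothetical sketch, not java_node_order()/java_neighbour_order().
# Assumes Java 8+ HashMap internals, default capacity 16 and load factor 0.75,
# no removals, ASCII keys, and no treeified (red-black tree) buckets.

java_string_hash <- function(s) {
  # String.hashCode(): h = 31 * h + c, kept as an unsigned 32-bit value in a
  # double so R never overflows (utf8ToInt matches UTF-16 units for ASCII).
  h <- 0
  for (cp in utf8ToInt(s)) h <- (31 * h + cp) %% 2^32
  h
}

java_spread_bucket <- function(h, capacity) {
  # HashMap.hash(): h ^ (h >>> 16). The top 16 bits are unchanged and the low
  # 16 bits become low XOR high; the bucket is (capacity - 1) & hash.
  high <- h %/% 65536
  low  <- h %% 65536
  spread <- high * 65536 + bitwXor(as.integer(low), as.integer(high))
  spread %% capacity                                 # capacity is a power of two
}

java_hashset_order <- function(keys) {
  keys <- unique(keys)                               # add() ignores duplicates
  n <- length(keys)
  capacity <- 16
  while (n > 0.75 * capacity) capacity <- capacity * 2  # resize once size > threshold
  bucket <- vapply(keys, function(k) java_spread_bucket(java_string_hash(k), capacity),
                   numeric(1))
  # Buckets are visited in index order; within a bucket, entries keep their
  # insertion order (tail appends and resize splits both preserve it).
  keys[order(bucket, seq_len(n))]
}

java_hashset_order(c("B", "Q", "A"))                 # "Q" "A" "B": Q and A share bucket 1
```

Because bucket indices depend on the final table capacity, the same names can iterate in a different order once the set grows past a resize threshold: 12 keys fit in 16 buckets, while a 13th key forces a 32-bucket table.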