R/active_snw_search_utils.R
build_network.Rd

Reads a Simple Interaction Format (SIF) file and converts it into an
undirected igraph graph object. Following the Java
reference's SIFReader, the column count is taken from the first line: a 2-column
file uses columns 1 and 2 as the interacting nodes, a 3-column file uses columns
1 and 3 (the middle interaction-type column is ignored). Node names are
upper-cased and self-interactions are discarded.
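
The parsing rules above can be sketched in base R. This is a minimal illustration, not the package's implementation: `parse_sif_lines()` is a hypothetical helper, and whitespace-delimited fields are an assumption (the description does not state the delimiter).

```r
# Hypothetical helper (not part of the package): turn SIF text lines into an
# edge table following the rules described above.
parse_sif_lines <- function(lines) {
  lines <- lines[nzchar(trimws(lines))]                  # drop blank lines
  if (length(lines) == 0L) stop("no interactions found")
  # Assumption: fields are separated by whitespace (tabs or spaces).
  fields <- strsplit(trimws(lines), "[[:space:]]+")
  n_col <- length(fields[[1]])                           # column count from the first line
  if (!n_col %in% c(2L, 3L)) stop("expected a 2- or 3-column SIF file")
  target_col <- if (n_col == 2L) 2L else 3L              # skip the interaction-type column
  from <- toupper(vapply(fields, `[`, character(1), 1L))
  to   <- toupper(vapply(fields, `[`, character(1), target_col))
  keep <- !is.na(to) & from != to                        # discard self-interactions
  data.frame(from = from[keep], to = to[keep], stringsAsFactors = FALSE)
}

sif <- c("tp53 pp MDM2", "MDM2 pp mdm2", "akt1 pp MTOR")
edges <- parse_sif_lines(sif)
print(edges)

# With igraph installed, an undirected graph could then be built with:
# g <- igraph::graph_from_data_frame(edges, directed = FALSE)
```

The second input line becomes a self-interaction once both names are upper-cased, so it is dropped.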
build_network(sif_path)

A list with elements:
* g: an igraph graph.
* nodes: node names in Java networkNodeList order.
* nbr: named list of neighbour-name vectors in Java HashSet order.
* nbr_idx: list of 1-based neighbour-id vectors aligned to nodes, in Java HashSet order, ready for run_greedy_search().
* name2id: named integer vector mapping each node name to its index in nodes.
* csr_offsets / csr_nbrs: a compressed sparse-row (CSR), 0-based adjacency used by the SA / GA component scorer.

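
To make the relationship between these elements concrete, here is a small base-R sketch that derives name2id, nbr_idx, and a CSR adjacency from toy `nodes` and `nbr` values. The toy orders are arbitrary rather than real Java iteration orders, the offsets layout (length n + 1, starting at 0) is an assumption about the usual CSR convention, and `csr_neighbours()` is a hypothetical helper.

```r
# Toy stand-ins for the `nodes` and `nbr` elements of the returned list.
nodes <- c("TP53", "MDM2", "AKT1")
nbr <- list(TP53 = c("MDM2", "AKT1"), MDM2 = "TP53", AKT1 = "TP53")

# Node name -> 1-based index in `nodes`.
name2id <- setNames(seq_along(nodes), nodes)

# Per-node neighbour ids (1-based), aligned to `nodes`.
nbr_idx <- lapply(nodes, function(n) unname(name2id[nbr[[n]]]))

# CSR layout (assumed): offsets has length n + 1, neighbour ids are 0-based.
csr_offsets <- c(0L, cumsum(lengths(nbr_idx)))
csr_nbrs <- unlist(nbr_idx) - 1L

# Hypothetical accessor: 0-based neighbour ids of the i-th node (i is 1-based).
csr_neighbours <- function(i) {
  len <- csr_offsets[i + 1L] - csr_offsets[i]
  csr_nbrs[csr_offsets[i] + seq_len(len)]   # seq_len() handles isolated nodes
}

print(csr_offsets)
print(csr_nbrs)
print(csr_neighbours(1L))
```

Using `seq_len()` rather than a `from:to` range keeps the accessor correct for nodes with no neighbours.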
To reproduce the Java implementation's greedy search bit-for-bit, this function
also reconstructs two order-sensitive structures that the Java code derives from
its HashMap/HashSet traversal:
* nodes: the node order of Java's networkNodeList
(adjacency.keySet() iteration order), via java_node_order().
* nbr_idx: per-node neighbour lists in Java's HashSet iteration
order, via java_neighbour_order().
These orders drive both the Monte-Carlo calibration (which shuffles z-scores in
node order) and the greedy expansion/removal, so matching them is what makes the
R/C++ output align with the Java reference. The igraph object is retained
for the SA / GA algorithms, whose component scoring is order-independent.
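
The internals of java_node_order() and java_neighbour_order() are not shown here, but the main ingredient of any HashMap/HashSet order reconstruction is Java's string hash and bucket index. The sketch below reproduces `String.hashCode()` and the `h ^ (h >>> 16)` spreading step used by `java.util.HashMap` in base R. The function names are hypothetical, and table resizing and collision-chain ordering, which also affect iteration order, are deliberately not modelled.

```r
# Hypothetical helper: Java's String.hashCode() as a signed 32-bit value.
# Doubles are used with explicit wrap-around because R integers overflow to NA.
java_string_hash <- function(s) {
  h <- 0
  for (cp in utf8ToInt(s)) {
    h <- (31 * h + cp) %% 2^32        # wrap to unsigned 32 bits
  }
  if (h >= 2^31) h - 2^32 else h      # reinterpret as signed int
}

# Hypothetical helper: bucket index in a HashMap table of `n_buckets` slots
# (a power of two; 16 is Java's default initial capacity).
java_bucket <- function(s, n_buckets = 16L) {
  u  <- java_string_hash(s) %% 2^32   # unsigned view for bit arithmetic
  hi <- u %/% 65536                   # upper 16 bits
  lo <- u %% 65536                    # lower 16 bits
  spread <- hi * 65536 + bitwXor(lo, hi)   # h ^ (h >>> 16)
  spread %% n_buckets                 # same as spread & (n_buckets - 1)
}

print(java_string_hash("ABC"))
print(java_bucket("ABC"))
```

Iteration over a Java HashMap visits buckets in index order, so sorting names by bucket gives the iteration order only when no collisions or resizes are involved; a faithful reconstruction has to handle both.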