Quick Start

In this short tutorial, we will quickly go through the basic and advanced usage of SGL. The tutorial is composed of the following parts:

Basic usage

In this part, we will introduce the basic usage of SGL, including how to execute graph-related tasks and how to use the NAS (Neural Architecture Search) functionality.

Auto neural architecture search (TODO)

Advanced usage

In this part, we will introduce the advanced usage of SGL, including adopting user-defined datasets, building models under the SGAP paradigm, and implementing new graph operators and message operators.

Adopt user-defined datasets

SGL designs two base classes, NodeDataset and HeteroNodeDataset, for the homogeneous graph datasets and the heterogeneous graph datasets, respectively. We will take implementing a homogeneous graph dataset as an example below to explain how to adopt user-defined datasets.

To implement a new homogeneous graph dataset, one first has to inherit from the base class NodeDataset, which is introduced in detail in the data part. Then, two important virtual functions must be implemented:

  • download: download the raw files of the dataset from the Internet and store them in pre-defined places;

  • process: process the raw files fetched by download and store the processed file, whose contents are defined by the data class Graph.

The data class Graph is designed to store the critical data for the homogeneous graph; the corresponding data class for the heterogeneous graph is HeteroGraph. To instantiate Graph, one needs to at least provide the following information:

  • row: the row index of the edges in the graph;

  • col: the column index of the edges in the graph;

  • edge_weight: the weight of the edges in the graph;

  • edge_type: the type of the edges in the graph;

  • num_node: the total number of nodes in the graph;

  • node_type: the type of the nodes in the graph.
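As a rough illustration of these fields, the sketch below builds them for a toy undirected path graph using NumPy only. It does not call SGL: the string values chosen for edge_type and node_type and the dense adjacency check are assumptions made for illustration, and the actual Graph constructor is described in the data part.

```python
import numpy as np

# Toy undirected path graph with 4 nodes: 0-1, 1-2, 2-3.
edges = [(0, 1), (1, 2), (2, 3)]

# Store every undirected edge in both directions, one (row, col) pair per edge.
row = np.array([u for u, v in edges] + [v for u, v in edges])
col = np.array([v for u, v in edges] + [u for u, v in edges])
edge_weight = np.ones(len(row))  # unweighted graph: every edge has weight 1
num_node = 4

# Hypothetical labels; which values SGL expects here is an assumption.
edge_type = "edge"
node_type = "node"

# Dense adjacency matrix, built only to sanity-check the indices.
adj = np.zeros((num_node, num_node))
adj[row, col] = edge_weight
```

Here row[i] and col[i] together identify the i-th edge, and edge_weight[i] is its weight, so the three arrays always have the same length.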

The datasets in the datasets part all follow the same construction scheme.

Please refer to the data part for a more detailed introduction to the two base classes, NodeDataset and HeteroNodeDataset.

Build models under SGAP paradigm

SGL adopts SGAP (Scalable Graph Architecture Paradigm) as its training paradigm. Accordingly, its model construction paradigm differs from the conventional message passing paradigm. A detailed introduction to the model construction paradigm of SGL is provided in overview. Below, we explain how to build an SGC (Simple Graph Convolution) model in SGL.

As introduced in overview, a GNN model in SGL is composed of five parts:

  • pre_graph_op, pre_msg_op: Graph Operator and Message Operator for the Preprocessing stage;

  • base_model: Base Model for the Training stage;

  • post_graph_op, post_msg_op: Graph Operator and Message Operator for the Postprocessing stage.

Thus, after inheriting the base class BaseSGAPModel, users only have to assign a pre-defined or user-defined Graph Operator, Message Operator, or Base Model to each of these modules. The behaviors of the chosen Graph Operators, Message Operators, and Base Model determine the behavior of the resulting GNN model. The code for building SGC is provided below:

from sgl.models.base_model import BaseSGAPModel
from sgl.models.simple_models import LogisticRegression
from sgl.operators.graph_op import LaplacianGraphOp
from sgl.operators.message_op import LastMessageOp


class SGC(BaseSGAPModel):
    def __init__(self, prop_steps, feat_dim, output_dim):
        super(SGC, self).__init__(prop_steps, feat_dim, output_dim)

        self._pre_graph_op = LaplacianGraphOp(prop_steps, r=0.5)
        self._pre_msg_op = LastMessageOp()
        self._base_model = LogisticRegression(feat_dim, output_dim)

Note

LaplacianGraphOp, LastMessageOp, and LogisticRegression are a pre-defined Graph Operator, Message Operator, and Base Model, respectively.

Note

SGC does not have the Postprocessing stage in its training process. Thus, the modules used for the Postprocessing stage do not exist in the construction of SGC.
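To make the Preprocessing stage of SGC more concrete, here is a minimal NumPy sketch of what a Laplacian-style graph operator followed by a "last message" operator computes. It is not SGL's implementation: the helper names symmetric_norm and sgc_preprocess are hypothetical, and the normalization \(D^{r-1}(A+I)D^{-r}\) with added self-loops is an assumption about the convention used.

```python
import numpy as np


def symmetric_norm(adj, r=0.5):
    """Return D^(r-1) (A + I) D^(-r). Stand-in helper, not SGL's own code."""
    adj = adj + np.eye(adj.shape[0])  # self-loops keep every degree > 0
    deg = adj.sum(axis=1)
    return np.diag(deg ** (r - 1)) @ adj @ np.diag(deg ** (-r))


def sgc_preprocess(adj, x, prop_steps, r=0.5):
    """Propagate features prop_steps times and keep only the last result."""
    a_hat = symmetric_norm(adj, r)
    messages = [x]  # step 0: the raw features
    for _ in range(prop_steps):
        messages.append(a_hat @ messages[-1])
    return messages[-1]  # "last message" combination


# Path graph 0-1-2 with 2-dimensional node features.
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
x = np.arange(6, dtype=float).reshape(3, 2)
features = sgc_preprocess(adj, x, prop_steps=2)
```

The resulting features have the same shape as x and would then be fed to the Base Model (logistic regression for SGC). Because the propagation involves no trainable parameters, it can be computed once before training.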

In the following parts of this tutorial, we will introduce ways to implement new Graph Operators and Message Operators.

Implement new Graph Operators

As introduced in overview, the behaviors of the Graph Operators can be represented as follows: \(\textbf{M}=graph\_propagate(\textbf{A}, \textbf{X})\). Thus, the critical part of implementing new Graph Operators is to determine the value of the matrix \(\textbf{A}\).

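In SGL, after inheriting the base class GraphOp, users only need to implement the virtual function _construct_adj, which takes in the original adjacency matrix of the graph and outputs the desired propagation matrix. Below is the implementation of the PPR (Personalized PageRank) Graph Operator:

import scipy.sparse as sp

# NOTE: the two SGL import paths below are assumed; adjust them to your installation.
from sgl.operators.base_op import GraphOp
from sgl.operators.utils import adj_to_symmetric_norm


class PprGraphOp(GraphOp):
    def __init__(self, prop_steps, r=0.5, alpha=0.15):
        super(PprGraphOp, self).__init__(prop_steps)
        self.__r = r          # normalization exponent
        self.__alpha = alpha  # weight of the identity (restart) term

    def _construct_adj(self, adj):
        # Normalize the adjacency matrix, then mix it with the identity matrix.
        adj_normalized = adj_to_symmetric_norm(adj, self.__r)
        adj_normalized = (1 - self.__alpha) * adj_normalized + self.__alpha * sp.eye(adj.shape[0])
        return adj_normalized.tocsr()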
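As a small numerical check of the update used in _construct_adj above, the NumPy sketch below applies it to a three-node graph. It assumes, for illustration only, a row-normalized matrix \(D^{-1}(A+I)\) in place of SGL's normalization helper. Under that assumption every row of the result still sums to 1, and each node keeps at least a fraction alpha of its own features.

```python
import numpy as np

alpha = 0.15

# Star graph: node 0 is connected to nodes 1 and 2.
adj = np.array([[0., 1., 1.],
                [1., 0., 0.],
                [1., 0., 0.]])

# Row normalization D^-1 (A + I); an assumed stand-in for the SGL helper.
adj_loops = adj + np.eye(3)
a_row = adj_loops / adj_loops.sum(axis=1, keepdims=True)

# Same mixing step as in _construct_adj: (1 - alpha) * A_norm + alpha * I.
ppr = (1 - alpha) * a_row + alpha * np.eye(3)
row_sums = ppr.sum(axis=1)
```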

Please refer to the operators part for a more detailed introduction.

Implement new Message Operators

Similar to implementing new Graph Operators, implementing new Message Operators is easy in SGL. Users need to define the behavior of the new Message Operator, represented as \(\textbf{X}'=message\_aggregate(\textbf{M})\).

Practically speaking, users have to implement the virtual function _combine after inheriting the base class MessageOp. The code below provides the implementation of the ConcatMessageOp in SGL:

import torch

# NOTE: the SGL import path below is assumed; adjust it to your installation.
from sgl.operators.base_op import MessageOp


class ConcatMessageOp(MessageOp):
    def __init__(self, start, end):
        super(ConcatMessageOp, self).__init__(start, end)
        self._aggr_type = "concat"

    def _combine(self, feat_list):
        # Concatenate the selected propagated features along the feature dimension.
        return torch.hstack(feat_list[self._start:self._end])
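The effect of such a combine step can be checked without SGL or PyTorch. The sketch below uses NumPy arrays in place of tensors and two hypothetical free functions, combine_concat and combine_last, to contrast concatenating a slice of the propagated features with keeping only the final one.

```python
import numpy as np


def combine_concat(feat_list, start, end):
    """Concatenate feat_list[start:end] along the feature (column) axis."""
    return np.hstack(feat_list[start:end])


def combine_last(feat_list):
    """Keep only the features from the final propagation step."""
    return feat_list[-1]


# Three propagation steps, each yielding features for 4 nodes with 2 dimensions.
feat_list = [np.full((4, 2), float(step)) for step in range(3)]

concat_out = combine_concat(feat_list, 0, 3)  # 4 nodes, 3 * 2 = 6 dimensions
last_out = combine_last(feat_list)            # 4 nodes, 2 dimensions
```

Concatenation multiplies the feature dimension by the number of selected steps, so the input dimension of the downstream Base Model has to be sized accordingly.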

Please refer to the operators part for a more detailed introduction.