Tutorial: Develop your Algorithm in Java with PIE Model¶
GraphScope provides sufficient support for java programmers
to implement graph algorithms and run it on GraphScope Analytical Engine.
In this tutorial, you will first try to explore GraphScope JavaSDK with some example algorithms, then implement your own algorithm, finally submit to GraphScope.
Run example algorithms with example jar¶
An example jar which contains implementation of several graph algorithms(i.e. PageRank, SSSP, BFS) is provided in grape-demo.jar. You can run the graph algorithms provided in this jar by submitting the downloaded jar to GraphScope.
Here we provide an example to run SSSP
on p2p dataset.
import graphscope
from graphscope.dataset import load_p2p_network
from graphscope.framework.app import load_app
# turn on this line to enable log verbose
# graphscope.set_option(show_log=True)
"""Or launch session in k8s cluster"""
sess = graphscope.session(cluster_type='hosts')
graph = load_p2p_network(sess)
"""Java algorithm need to run on simple graph"""
graph = graph.project(vertices={"host": ['id']}, edges={"connect": ["dist"]})
sess.add_lib("/home/graphscope/grape-demo-0.19.0-shaded.jar") # replace path to grape-demo.jar
sssp=load_app(algo="java_pie:com.alibaba.graphscope.example.sssp.SSSP")
ctx=sssp(graph._project_to_simple(), src=6, threadNum=1)
"""Fetch the result via context"""
ctx.to_numpy("r")
For more info about GraphScope Python client, please refer to GraphScope Python Doc.
Implement your own algorithms in Java.¶
To implement java graph algorithms runnable on GraphScope, all you need is GRAPE-jdk
, which provide a PIE programming interface.
Prepare GRAPE-jdk¶
You can include GRAPE-jdk
as a dependency in your maven project by adding following configuration.
Get from Maven Central Repository¶
Find the latest version available on Maven-Central-Repository
Build from source¶
git clone https://github.com/alibaba/GraphScope
cd GraphScope/analytical_engine/java
mvn clean install -DskipTests # the version is specified in pom.xml's revision property
<dependency>
<groupId>com.alibaba.graphscope</groupId>
<artifactId>grape-jdk</artifactId>
<version>${grape-jdk-version}</version>
</dependency>
To address the dependency issue in jar packaging, you shall package all your
dependent jars in a fat-jar. For example, you can use maven-shade-plugin
to
include runtime jars in your fat-jar.
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.2.4</version>
</plugin>
Implement your algorithm¶
Different from the pregel interface provided by Apache Giraph and Spark GraphX, GRAPE-jdk
provides user with PIE programming interface. Unlike pregel’s vertex-centric
interface, PIE
models graph computing in a subgraph-centric
manner.
In PIE
model, the program requires less supersteps and the size of generated message has been drastically reduced, which lead to great performance improvement.
To implement a PIE
algorithm, you need to provide two separate functions, PEval
and IncEval
. PEval
function will be execute only once at the first round of computation, and IncEval
will be called for multiple times until convergence. You are also supposed to provide a class called Context
. You can put intermediate
results, init configuration in this class. The init
method will be called before
PEval
.
Here we provide a simple PIE algorithm which simply traverse the graph.
public class Traverse implements ParallelAppBase<Long, Long, Long, Long, TraverseContext>,
ParallelEngine {
@Override
public void PEval(IFragment<Long, Long, Long, Long> fragment,
ParallelContextBase<Long, Long, Long, Long> context,
ParallelMessageManager messageManager) {
TraverseContext ctx = (TraverseContext) context;
for (Vertex<Long> vertex : fragment.innerVertices()) {
AdjList<Long, Long> adjList = fragment.getOutgoingAdjList(vertex);
for (Nbr<Long, Long> nbr : adjList.iterator()) {
Vertex<Long> dst = nbr.neighbor();
//Update largest distance for current vertex
ctx.vertexArray.setValue(vertex, Math.max(nbr.data(), ctx.vertexArray.get(vertex)));
}
}
messageManager.forceContinue();
}
@Override
public void IncEval(IFragment<Long, Long, Long, Long> fragment,
ParallelContextBase<Long, Long, Long, Long> context,
ParallelMessageManager messageManager) {
TraverseContext ctx = (TraverseContext) context;
for (Vertex<Long> vertex : fragment.innerVertices()) {
AdjList<Long, Long> adjList = fragment.getOutgoingAdjList(vertex);
for (Nbr<Long, Long> nbr : adjList.iterator()) {
Vertex<Long> dst = nbr.neighbor();
//Update largest distance for current vertex
ctx.vertexArray.setValue(vertex, Math.max(nbr.data(), ctx.vertexArray.get(vertex)));
}
}
}
}
The corresponding context class parse the input parameter maxIteration
from a JSONObject.
public class TraverseContext extends
VertexDataContext<IFragment<Long, Long, Long, Long>, Long> implements ParallelContextBase<Long,Long,Long,Long> {
public GSVertexArray<Long> vertexArray;
public int maxIteration;
@Override
public void Init(IFragment<Long, Long, Long, Long> frag,
ParallelMessageManager messageManager, JSONObject jsonObject) {
createFFIContext(frag, Long.class, false);
//This vertex Array is created by our framework. Data stored in this array will be available
//after execution, you can receive them by invoking method provided in Python Context.
vertexArray = data();
maxIteration = 10;
if (jsonObject.containsKey("maxIteration")){
maxIteration = jsonObject.getInteger("maxIteration");
}
}
@Override
public void Output(IFragment<Long, Long, Long, Long> frag) {
//You can also write output logic in this function.
}
}
For more detail usages of GRAPE-jdk
, you can check GRAPE-jdk JavaDoc. For more example apps, please refer GraphScope/gs-algos.
Submit to GraphScope¶
To run your algorithm on GraphScope, your first need to obtain a fat-jar which contains your implementation.
mvn package
Then a jar with name {artifact}*-shaded.jar
shall be found under target/
.
By submit this jar to graphscope, you can run any algorithms contained with proper
parameters.
import graphscope
from graphscope import JavaApp
from graphscope.dataset import load_p2p_network
"""Or launch session in k8s cluster"""
sess = graphscope.session(cluster_type='hosts')
graph = load_p2p_network(sess)
graph = graph.project(vertices={"host": ['id']}, edges={"connect": ["dist"]})
# you can also use your own graph, refer to graphscope load graph tutorial.
# But remember project to single property graph before running algorithms.
sess.add_lib("{full/path/to/your/packed/jar}") # *-shaded.jar
app=load_app(algo="java_pie:com.alibaba.graphscope.example.Traverse") # java_pie:{you-full-class-name}
ctx=app(graph, "{param string}") # a=b,c=d
After computation, you can obtain the results stored in context with the help of Context
.
GraphScope JavaSDK with GitHub Template¶
If you don’t bother creating new project to try GRAPE-jdk
, we provide a template project GraphScope-Java-template. By click Use this template, you can create a new repository with same files and structure of the template repository. You can then try developing your own algorithms in this project.