Gremlin (query language)

Gremlin
Designed by	Marko A. Rodriguez
Developer	Apache TinkerPop of the Apache Software Foundation
First appeared	2009; 16 years ago
Stable release	3.7.0 / 31 July 2023; 20 months ago
OS	Cross-platform (multi-platform)
License	Apache License 2.0
Website	tinkerpop.apache.org
Dialects
	Gremlin‑Java8, Gremlin‑Groovy, Gremlin‑Python, Gremlin‑Scala, Gremlin‑Clojure, Gremlin‑PHP, Gremlin‑JavaScript, Gremlin‑Typeset
Influenced by
	Regular expression, XPath, Ripple, SPARQL, SQL, Java/JVM

Gremlin is a graph traversal language and virtual machine developed by Apache TinkerPop of the Apache Software Foundation. Gremlin works for both OLTP-based graph databases as well as OLAP-based graph processors. Gremlin's automata and functional language foundation enable Gremlin to naturally support: imperative and declarative querying; host language agnosticism; user-defined domain specific languages; an extensible compiler/optimizer, single- and multi-machine execution models; hybrid depth- and breadth-first evaluation with Turing completeness.^[2]

As an explanatory analogy, Apache TinkerPop and Gremlin are to graph databases what the JDBC and SQL are to relational databases. Likewise, the Gremlin traversal machine is to graph computing as what the Java virtual machine is to general purpose computing.^[3]

History

2009-10-30 the project is born, and immediately named "TinkerPop"
2009-12-25 v0.1 is the first release
2011-05-21 v1.0 is released
2012-05-24 v2.0 is released
2015-01-16 TinkerPop becomes an Apache Incubator project
2015-07-09 v3.0.0-incubating is released
2016-05-23 Apache TinkerPop becomes a top-level project
2016-07-18 v3.1.3 and v3.2.1 are first releases as Apache TinkerPop
2017-12-17 v3.3.1 is released
2018-05-08 v3.3.3 is released
2019-08-05 v3.4.3 is released
2020-02-20 v3.4.6 is released

Vendor integration

Gremlin is an Apache2-licensed graph traversal language that can be used by graph system vendors. There are typically two types of graph system vendors: OLTP graph databases and OLAP graph processors. The table below outlines those graph vendors that support Gremlin.

Vendor	Graph System
Neo4j	graph database
OrientDB	graph database
DataStax Enterprise (5.0+)	graph database
Hadoop (Giraph)	graph processor
Hadoop (Spark)	graph processor
InfiniteGraph	graph database
JanusGraph	graph database
Cosmos DB	graph database
Amazon Neptune	graph database
ArcadeDB	graph database

Traversal examples

The following examples of Gremlin queries and responses in a Gremlin-Groovy environment are relative to a graph representation of the MovieLens dataset.^[4] The dataset includes users who rate movies. Users each have one occupation, and each movie has one or more categories associated with it. The MovieLens graph schema is detailed below.

user--rated[stars:0-5]-->movie
user--occupation-->occupation
movie--category-->category

Simple traversals

For each vertex in the graph, emit its label, then group and count each distinct label.

gremlin> g.V().label().groupCount()
==>[occupation:21, movie:3883, category:18, user:6040]

What year was the oldest movie made?

gremlin> g.V().hasLabel('movie').values('year').min()
==>1919

What is Die Hard's average rating?

gremlin> g.V().has('movie','name','Die Hard').inE('rated').values('stars').mean()
==>4.121848739495798

Projection traversals

For each category, emit a map of its name and the number of movies it represents.

gremlin> g.V().hasLabel('category').as('a','b').
           select('a','b').
             by('name').
             by(inE('category').count())
==>[a:Animation, b:105]
==>[a:Children's, b:251]
==>[a:Comedy, b:1200]
==>[a:Adventure, b:283]
==>[a:Fantasy, b:68]
==>[a:Romance, b:471]
==>[a:Drama, b:1603]
==>[a:Action, b:503]
==>[a:Crime, b:211]
==>[a:Thriller, b:492]
==>[a:Horror, b:343]
==>[a:Sci-Fi, b:276]
==>[a:Documentary, b:127]
==>[a:War, b:143]
==>[a:Musical, b:114]
==>[a:Mystery, b:106]
==>[a:Film-Noir, b:44]
==>[a:Western, b:68]

For each movie with at least 11 ratings, emit a map of its name and average rating. Sort the maps in decreasing order by their average rating. Emit the first 10 maps (i.e. top 10).

gremlin> g.V().hasLabel('movie').as('a','b').
           where(inE('rated').count().is(gt(10))).
           select('a','b').
             by('name').
             by(inE('rated').values('stars').mean()).
           order().by(select('b'),decr).
           limit(10)
==>[a:Sanjuro, b:4.608695652173913]
==>[a:Seven Samurai (The Magnificent Seven), b:4.560509554140127]
==>[a:Shawshank Redemption, The, b:4.554557700942973]
==>[a:Godfather, The, b:4.524966261808367]
==>[a:Close Shave, A, b:4.52054794520548]
==>[a:Usual Suspects, The, b:4.517106001121705]
==>[a:Schindler's List, b:4.510416666666667]
==>[a:Wrong Trousers, The, b:4.507936507936508]
==>[a:Sunset Blvd. (a.k.a. Sunset Boulevard), b:4.491489361702127]
==>[a:Raiders of the Lost Ark, b:4.47772]

Declarative pattern matching traversals

Gremlin supports declarative graph pattern matching similar to SPARQL. For instance, the following query below uses Gremlin's match()-step.

What 80's action movies do 30-something programmers like? Group count the movies by their name and sort the group count map in decreasing order by value. Clip the map to the top 10 and emit the map entries.

gremlin> g.V().
           match(
             __.as('a').hasLabel('movie'),
             __.as('a').out('category').has('name','Action'),
             __.as('a').has('year',between(1980,1990)),
             __.as('a').inE('rated').as('b'),
             __.as('b').has('stars',5),
             __.as('b').outV().as('c'),
             __.as('c').out('occupation').has('name','programmer'),
             __.as('c').has('age',between(30,40))).
           select('a').groupCount().by('name').
           order(local).by(valueDecr).
           limit(local,10)
==>Raiders of the Lost Ark=26
==>Star Wars Episode V - The Empire Strikes Back=26
==>Terminator, The=23
==>Star Wars Episode VI - Return of the Jedi=22
==>Princess Bride, The=19
==>Aliens=18
==>Boat, The (Das Boot)=11
==>Indiana Jones and the Last Crusade=11
==>Star Trek The Wrath of Khan=10
==>Abyss, The=9

OLAP traversal

Which movies are most central in the implicit 5-stars graph?

gremlin> g = graph.traversal(computer(SparkGraphComputer))
==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer]
gremlin> g.V().repeat(outE('rated').has('stars', 5).inV().
                 groupCount('m').by('name').
                 inE('rated').has('stars', 5).outV()).
               times(4).cap('m')
==>Star Wars Episode IV - A New Hope	  35405394353105332
==>American Beauty	  31943228282020585
==>Raiders of the Lost Ark	31224779793238499
==>Star Wars Episode V - The Empire Strikes Back  30434677119726223
==>Godfather, The	30258518523013057
==>Shawshank Redemption, The	28297717387901031
==>Schindler's List	27539336654199309
==>Silence of the Lambs, The	26736276376806173
==>Fargo	 26531050311325270
==>Matrix, The	 26395118239203191

Gremlin graph traversal machine

Gremlin is a virtual machine composed of an instruction set as well as an execution engine. An analogy is drawn between Gremlin and Java.

Java Ecosystem	Gremlin Ecosystem
Apache Groovy programming language	Gremlin-Groovy
Scala programming language	Gremlin-Scala
Clojure programming language	Gremlin-Clojure
...	...
Java programming language	Gremlin-Java8
Java instruction set	Gremlin step library
Java virtual machine	Gremlin traversal machine

Gremlin steps (instruction set)

The following traversal is a Gremlin traversal in the Gremlin-Java8 dialect.

g.V().as("a").out("knows").as("b").
  select("a","b").
    by("name").
    by("age")

The Gremlin language (i.e. the fluent-style of expressing a graph traversal) can be represented in any host language that supports function composition and function nesting. Due to this simple requirement, there exists various Gremlin dialects including Gremlin-Groovy, Gremlin-Scala, Gremlin-Clojure, etc. The above Gremlin-Java8 traversal is ultimately compiled down to a step sequence called a traversal. A string representation of the traversal above provided below.

[GraphStep([],vertex)@[a], VertexStep(OUT,[knows],vertex)@[b], SelectStep([a, b],[value(name), value(age)])]

The steps are the primitives of the Gremlin graph traversal machine. They are the parameterized instructions that the machine ultimately executes. The Gremlin instruction set is approximately 30 steps. These steps are sufficient to provide general purpose computing and what is typically required to express the common motifs of any graph traversal query.

Given that Gremlin is a language, an instruction set, and a virtual machine, it is possible to design another traversal language that compiles to the Gremlin traversal machine (analogous to how Scala compiles to the JVM). For instance, the popular SPARQL graph pattern match language can be compiled to execute on the Gremlin machine. The following SPARQL query

SELECT ?a ?b ?c
WHERE {
  ?a a Person .
  ?a ex:knows ?b .
  ?a ex:created ?c .
  ?b ex:created ?c .
  ?b ex:age ? d .
    FILTER(?d < 30)
}

would compile to

[GraphStep([],vertex), MatchStep(AND,[[MatchStartStep(a), LabelStep, IsStep(eq(Person)), MatchEndStep], [MatchStartStep(a), VertexStep(OUT,[knows],vertex), MatchEndStep(b)], [MatchStartStep(a), VertexStep(OUT,[created],vertex), MatchEndStep(c)], [MatchStartStep(b), VertexStep(OUT,[created],vertex), MatchEndStep(c)], [MatchStartStep(b), PropertiesStep([age],value), MatchEndStep(d)], [MatchStartStep(d), IsStep(gt(30)), MatchEndStep]]), SelectStep([a, b, c])].

In Gremlin-Java8, the SPARQL query above would be represented as below and compile to the identical Gremlin step sequence (i.e. traversal).

g.V().match(
  as("a").label().is("person"),
  as("a").out("knows").as("b"),
  as("a").out("created").as("c"),
  as("b").out("created").as("c"),
  as("b").values("age").as("d"),
  as("d").is(gt(30))).
    select("a","b","c")

Gremlin Machine (virtual machine)

The Gremlin graph traversal machine can execute on a single machine or across a multi-machine compute cluster. Execution agnosticism allows Gremlin to run over both graph databases (OLTP) and graph processors (OLAP).

References

^ "Apache TinkerPop - Downloads". Retrieved 27 October 2023.
^ Rodriguez, Marko A. (2015). "The Gremlin graph traversal machine and language (invited talk)". The Gremlin Graph Traversal Machine and Language. pp. 1–10. arXiv:1508.03843. doi:10.1145/2815072.2815073. ISBN 9781450339025. S2CID 10533031.
^ "The Benefits of the Gremlin Graph Traversal Machine". 2015-09-14. Retrieved September 17, 2015.
^ "The Gremlin Graph Traversal Language". 2015-08-19. Retrieved August 22, 2015.

External links

Apache TinkerPop Homepage
sql2gremlin.com (TinkerPop2)
Rodriguez, M.A., "The Gremlin Graph Traversal Machine and Language," Proceedings of the ACM Database Programming Languages Conference, October, 2015.

[Gremlin_Releases-1] "Apache TinkerPop - Downloads". Retrieved 27 October 2023.

[Gremlin_Machine_and_Language-2] Rodriguez, Marko A. (2015). "The Gremlin graph traversal machine and language (invited talk)". The Gremlin Graph Traversal Machine and Language. pp. 1–10. arXiv:1508.03843. doi:10.1145/2815072.2815073. ISBN 9781450339025. S2CID 10533031.

[The_Benefits_of_the_Gremlin_Graph_Traversal_Machine-3] "The Benefits of the Gremlin Graph Traversal Machine". 2015-09-14. Retrieved September 17, 2015.

[NoSQLNow_Slides-4] "The Gremlin Graph Traversal Language". 2015-08-19. Retrieved August 22, 2015.

[1]

[2]

[3]

[4]

v t e The Apache Software Foundation
Top-level projects	Accumulo ActiveMQ Airavata Airflow Allura Ambari Ant Aries Arrow Apache HTTP Server APR Avro Axis Axis2 Beam Bloodhound Brooklyn Calcite Camel CarbonData Cassandra Cayenne CloudStack Cocoon Cordova CouchDB cTAKES CXF Derby Directory Drill Druid Empire-db Felix Flex Flink Flume FreeMarker Geronimo Groovy Guacamole Gump Hadoop HBase Helix Hive Iceberg Ignite Impala Jackrabbit James Jena JMeter Kafka Kudu Kylin Lucene Mahout Maven MINA mod_perl MyFaces Mynewt NiFi NetBeans Nutch NuttX OFBiz Oozie OpenEJB OpenJPA OpenNLP OрenOffice ORC PDFBox Parquet Phoenix POI Pig Pinot Pivot Qpid Roller RocketMQ Samza Shiro SINGA Sling Solr Spark Storm SpamAssassin Struts 1 Subversion Superset SystemDS Tapestry Thrift Tika TinkerPop Tomcat Trafodion Traffic Server UIMA Velocity Wicket Xalan Xerces XMLBeans Yetus ZooKeeper
Commons	BCEL BSF Daemon Jelly Logging
Incubator	Taverna
Other projects	Batik FOP Ivy Log4j
Attic	Apex AxKit Beehive iBATIS Click Continuum Deltacloud Etch Giraph Hama Harmony Jakarta Marmotta MXNet ODE River Shale Slide Sqoop Stanbol Tuscany Wave XML
Licenses	Apache License
Category

v t e Query languages
In current use	.QL ALPHA CQL Cypher DAX DMX Datalog GraphQL Graph Query Language Gremlin ISBL LDAP LINQ MQL MDX OQL OCL QUEL RDF SMARTS SPARQL SQL XQuery XPath YQL
Proprietary	YQL LINQ
Superseded	CODASYL