AllegroGraph RDFStore™:
Efficiently Processing Billions of RDF Triples
The Semantic Web RDF standard from W3C is making increasing inroads into government, the enterprise, bioinformatics, telecommunications, and other demanding arenas. Its simplicity and flexibility make it suitable for representing any structured or unstructured knowledge (be it call detail records from telecom providers or intelligence data from Homeland Security). However, in real-world applications the number of RDF triples can easily grow into the millions or even billions, making them difficult to process efficiently with traditional tools. Even working data sets often contain millions of triples.
Systems that must load, manipulate, and query such enormous triple stores require the best possible performance. Faced with this challenge, Franz has used its experience with the AllegroCache object database to engineer a massively scalable persistent triple store: AllegroGraph.
AllegroGraph efficiently performs the three most important tasks for a triple store: to load, store, and query data.
- Loading of triples, through its highly optimized RDF/XML and N-Triples parsers, is best-of-breed, particularly on large files. With just standard x86 64-bit hardware, it can load gigabytes of RDF data in minutes.
- Storage is persistent between application launches: triples are kept in on-disk binary trees, so there is no additional serialization or deserialization overhead.
- Querying is both flexible and performant. Multiple indices support fast access through a simple triple-level API, Allegro Prolog, or SPARQL (the emerging W3C standard RDF query language). When querying for a particular subject with ten triples, AllegroGraph can retrieve about 40,000 triples per second, from disk.
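The idea of multiple indices behind a triple-level API can be illustrated with a small in-memory sketch. This is not AllegroGraph's implementation or API; the class `TinyTripleStore` and its methods are hypothetical names chosen for illustration, and a real store would keep its indices on disk.

```python
from collections import defaultdict


class TinyTripleStore:
    """Toy in-memory triple store (hypothetical; for illustration only)."""

    def __init__(self):
        self.triples = set()
        # One index per position, so a lookup with any bound term
        # avoids scanning every triple.
        self.by_subject = defaultdict(set)
        self.by_predicate = defaultdict(set)
        self.by_object = defaultdict(set)

    def add(self, s, p, o):
        triple = (s, p, o)
        self.triples.add(triple)
        self.by_subject[s].add(triple)
        self.by_predicate[p].add(triple)
        self.by_object[o].add(triple)

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        candidates = None
        for index, term in ((self.by_subject, s),
                            (self.by_predicate, p),
                            (self.by_object, o)):
            if term is not None:
                found = index.get(term, set())
                candidates = found if candidates is None else candidates & found
        if candidates is None:
            candidates = self.triples  # no term bound: return everything
        return sorted(candidates)


store = TinyTripleStore()
store.add("ex:alice", "ex:knows", "ex:bob")
store.add("ex:alice", "ex:worksFor", "ex:franz")
store.add("ex:bob", "ex:knows", "ex:carol")

alice_triples = store.match(s="ex:alice")   # all triples about one subject
knows_triples = store.match(p="ex:knows")   # all triples with one predicate
print(alice_triples)
print(knows_triples)
```

Keeping one index per triple position trades extra storage and load-time work for lookups that do not depend on the total number of triples.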
High-performance Storage and Querying
AllegroGraph is designed with microsecond retrieval times as a basic target. On a 2 GHz AMD64 quad-processor machine with 16 GB of RAM, AllegroGraph hits a peak load speed of 20,000 triples per second, including thorough indexing and on-disk storage. Simple triple queries have a measured worst-case time of around 30 microseconds, dropping to as little as 300 nanoseconds on a cache hit.
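To put these figures in perspective, the following back-of-the-envelope calculation converts them into load times and query rates. It assumes, optimistically, that the quoted peak load speed is sustained for the whole load and that queries run one after another on a single thread; real workloads will differ.

```python
PEAK_LOAD_RATE = 20_000        # triples per second (quoted peak)
WORST_CASE_QUERY_S = 30e-6     # 30 microseconds
CACHE_HIT_QUERY_S = 300e-9     # 300 nanoseconds


def load_hours(triple_count, rate=PEAK_LOAD_RATE):
    """Hours needed to load triple_count triples at a constant rate."""
    return triple_count / rate / 3600


def queries_per_second(latency_s):
    """Sequential queries per second at a fixed per-query latency."""
    return 1 / latency_s


print(f"10 million triples: {load_hours(10_000_000) * 60:.1f} minutes")
print(f"1 billion triples:  {load_hours(1_000_000_000):.1f} hours")
print(f"worst case: {queries_per_second(WORST_CASE_QUERY_S):,.0f} queries/s")
print(f"cache hit:  {queries_per_second(CACHE_HIT_QUERY_S):,.0f} queries/s")
```

Under these assumptions a billion triples take roughly 13.9 hours to load, and a cache hit is about 100 times faster than the worst-case disk lookup.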
Powerful and Expressive Reasoning and Querying
AllegroGraph provides the broadest array of mechanisms to query and access knowledge in RDF triples:
- Low-level APIs allow fast, 'close-to-the-metal' access to triples by subject, predicate, and object.
- RDF Prolog brings concise, powerful, industry-standard, domain-specific reasoning to stored knowledge, for building relations on top of RDF data.
- SPARQL, the W3C standard RDF query language, gives native object, RDF, and XML responses to queries. Queries can be issued over sockets, HTTP, Lisp, or a Java / SAIL API.
- RDFS reasoning with owl:sameAs and owl:inverseOf predicates.
Furthermore, RacerPro has been integrated with AllegroGraph, exposing RDF data in AllegroGraph to Racer's highly optimized Description Logic (DL) reasoner. This combination is most suitable for ontology-driven applications. RacerPro's interfaces also include DIG over HTTP and support for rules (SWRL).
The ability to use a variety of query languages is a great advantage, allowing AllegroGraph to respond closely to the needs of the domain. Standard SPARQL gives you reusable queries and XML results to process with XSLT. Prolog provides domain-specific reasoning and is a well-studied mature language. RacerPro provides all the power of DL inference, enabling ontology-driven applications. And the clean Lisp and Java APIs let one exploit AllegroGraph's fast triple processing for any persistent knowledge needs.
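As a rough illustration of what reasoning over inverse and identity predicates involves, the sketch below forward-chains two simplified rules over a set of triples until no new facts appear. It is not AllegroGraph's reasoner: the function `infer` is a hypothetical name, the `ex:` data is made up, and the rules deliberately ignore details such as the transitivity of identity.

```python
INVERSE_OF = "owl:inverseOf"
SAME_AS = "owl:sameAs"


def infer(triples):
    """Forward-chain simplified inverse and identity rules to a fixpoint."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p == INVERSE_OF:
                # (s inverseOf o): every (a s b) implies (b o a), and vice versa.
                for a, q, b in facts:
                    if q == s:
                        new.add((b, o, a))
                    elif q == o:
                        new.add((b, s, a))
            elif p == SAME_AS:
                new.add((o, SAME_AS, s))  # identity is symmetric
                # Copy ordinary statements about s over to o.
                for a, q, b in facts:
                    if q in (SAME_AS, INVERSE_OF):
                        continue
                    if a == s:
                        new.add((o, q, b))
                    if b == s:
                        new.add((a, q, o))
        if new <= facts:       # nothing new was derived: fixpoint reached
            return facts
        facts |= new


data = [
    ("ex:hasParent", "owl:inverseOf", "ex:hasChild"),
    ("ex:bob", "ex:hasParent", "ex:alice"),
    ("ex:alice", "owl:sameAs", "ex:a_smith"),
]
closure = infer(data)
for triple in sorted(closure - set(data)):
    print(triple)   # only the derived facts
```

A production reasoner would more likely answer such questions at query time or with indexed rules rather than materializing every derived triple, but the derived facts are the same in spirit.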
Robustness
AllegroGraph is robust at every level. Transactions provide safety when importing data and adding triples, and fully journaled modification ensures the consistency and integrity of data in a failure scenario. Modifications can be continually streamed to backup files, or even to other databases over the network, for redundancy.
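The general technique of journaled, transactional modification can be sketched as an append-only log with commit markers, where recovery replays only committed transactions. The class `JournaledStore`, its JSON record format, and the file name are assumptions made for this illustration and say nothing about AllegroGraph's actual on-disk format.

```python
import json
import os
import tempfile


class JournaledStore:
    """Toy triple set backed by an append-only journal (hypothetical)."""

    def __init__(self, path):
        self.path = path
        self.triples = set()
        if os.path.exists(path):
            self._replay()

    def _replay(self):
        pending = []
        with open(self.path) as journal:
            for line in journal:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final write: ignore the rest
                if record["op"] == "add":
                    pending.append(tuple(record["triple"]))
                elif record["op"] == "commit":
                    self.triples.update(pending)
                    pending = []
        # Anything still pending was never committed and is discarded.

    def add_all(self, triples):
        """Write all triples plus a commit marker as one transaction."""
        triples = [tuple(t) for t in triples]
        with open(self.path, "a") as journal:
            for t in triples:
                journal.write(json.dumps({"op": "add", "triple": list(t)}) + "\n")
            journal.write(json.dumps({"op": "commit"}) + "\n")
            journal.flush()
            os.fsync(journal.fileno())  # force the journal to disk
        self.triples.update(triples)


path = os.path.join(tempfile.mkdtemp(), "journal.log")
store = JournaledStore(path)
store.add_all([("ex:a", "ex:p", "ex:b")])

# Simulate a crash in the middle of a second transaction:
# an "add" record reaches the journal, but no commit marker follows.
with open(path, "a") as journal:
    journal.write(json.dumps({"op": "add", "triple": ["ex:x", "ex:p", "ex:y"]}) + "\n")

recovered = JournaledStore(path)
print(recovered.triples)  # only the committed triple survives
```

Because the journal is a plain sequential stream, the same records could in principle be shipped to a backup file or another machine as they are written.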
Other Features
AllegroGraph offers facilities beyond the standard RDF requirements. Triples have additional optional fields, usable for source tracking, named graphs, or access control. This feature allows implementation of permissions, trust, and provenance layers easily. To support developing ontology-driven applications, simple inferencing (subclass, subproperty, identity, and inverse relations) is included out-of-the-box.
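One way to picture the optional extra fields is as a fourth element attached to each triple and used to filter statements by source. The sketch below is a minimal illustration under that assumption; the `Quad` type, the `graph` field name, the `src:` labels, and the trust policy are all hypothetical.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    subject: str
    predicate: str
    object: str
    graph: Optional[str] = None  # hypothetical extra field: source or named graph


quads = [
    Quad("ex:alice", "ex:phone", "555-0100", graph="src:crm"),
    Quad("ex:alice", "ex:phone", "555-0199", graph="src:web-scrape"),
    Quad("ex:alice", "rdf:type", "ex:Person"),  # no source recorded
]


def from_trusted_sources(statements, trusted_graphs):
    """Keep only statements whose graph field names a trusted source."""
    return [q for q in statements if q.graph in trusted_graphs]


trusted_view = from_trusted_sources(quads, {"src:crm"})
for q in trusted_view:
    print(q.subject, q.predicate, q.object, "from", q.graph)
```

The same field could equally carry an access-control label, with the filter applied per user instead of per source.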
Platforms
AllegroGraph runs on all popular 64-bit architectures and is 100% cross-platform compatible.
The AllegroGraph Java/HTTP version provides an HTTP interface for loading and retrieving RDF triples, and a Java API for developing large-scale semantic applications in Java.
System Requirements
Though best suited for 64-bit architectures, the current release of AllegroGraph runs on the 32-bit and 64-bit variants of all major operating systems such as Windows, Unix, Linux and Mac OS X. A minimum of 4 GB of memory is recommended.
AllegroGraph Documentation
- AllegroGraph Tutorial
- AllegroGraph Reference Guide
- AllegroGraph Java API Tutorial and Reference
- AllegroGraph SPARQL Tutorial
- twinql API Reference (AllegroGraph's SPARQL implementation)
- Notes on twinql's conformance to the W3C specification
- AllegroGraph LUBM50 Benchmarks
- AllegroGraph performance tuning
AllegroGraph products and try-out options
AllegroGraph version 1.2 is available for the Allegro Common Lisp development environment and as a stand-alone server application for Java developers. A free version of AllegroGraph is available, though limited to 50 million triples.
- Free Java Edition: Request your personal evaluation license for download at Franz Inc.
- Lisp Edition: View download instructions at Franz Inc.'s web site
For more details, such as platform requirements and pricing information, please contact us.
