Property graph - Misplaced Pages

The data model of "property graphs" , "labeled property graphs ", or "attributed graphs " has emerged since the early 2000s as a common denominator of various models of graph-oriented databases . It can be defined informally as follows:

#189810

27-484: Properties take the form of key-value pairs, as used for example in JSON . Keys are defined by character strings. Values are either numeric or also character strings. These properties fall within the usual definition of attributes as understood in entity-attribute-value or object-oriented modeling. This is why the phrase "attributed graph" is relevant. Unlike what is the case with RDF graphs, properties are not arcs of

54-554: A particular set of strings. In this context, the expressive power of a formalism is the set of sets of strings its instances describe, and comparing expressive power is a matter of comparing these sets. An important yardstick for describing the relative expressive power of formalisms in this area is the Chomsky hierarchy . It says, for instance, that regular expressions , nondeterministic finite automata and regular grammars have equal expressive power, while that of context-free grammars

81-696: A principle of parsimony . Alternatively, and more consistently, labels can be defined through type graphs, as special types associated with nodes and arcs. Attributed graphs, as defined above, are especially useful and relevant in that they provide an "umbrella" hypernymic concept ( i.e. common generalization) for several key graph-theoretic models, which have long-since been widely used in classical graph algorithms Knowledge graphs , usually represented as RDF graphs, are in fact hybrid labeled graphs, whose node labels correspond to instance identifiers ( IRI )s or literals , and edge labels identify types (not instances) of predicates . They have now acquired

108-522: A result, they are not fully general; they cannot be used, for example, to implement electronic mail headers (which are ordered and non-unique). In some applications, a name–value pair has a value that contains a nested collection of attribute–value pairs. Some data serialization formats such as JSON support arbitrarily deep nesting. Other data representations are restricted to one level of nesting, such as INI file 's section/name/value. Expressive power (computer science) In computer science ,

135-440: A single stakeholder. However, even in such environments, it may be needed to constrain the representation of specific subsets of the information entered into the database, in a way that may resemble a traditional database schema, while keeping the openness of the overall graph for addition of unforeseen data or configurations. For example, the description of a smart city falls under the open world assumption and will be described by

162-468: A trade-off between expressive power and analyzability. The more a formalism can express, the harder it becomes to understand what instances of the formalism say. Decision problems become harder to answer or completely undecidable . Formal language theory mostly studies formalisms to describe sets of strings , such as context-free grammars and regular expressions . Each instance of a formalism, e.g. each grammar and each regular expression, describes

189-427: A visibility which tends to obscure the longer-established use of graphs as direct model for systems of all kinds. Attributed graphs are, by their versatility and expressivity, the best-adapted for this type of modeling, where graphs which can rightly be called cyber-physical do not merely capture weakly structured about a physical system, as would be the case with a knowledge graph, but attempt to directly capture

216-418: Is a fundamental data representation in computing systems and applications . Designers often desire an open-ended data structure that allows for future extension without modifying existing code or data. In such situations, all or part of the data model may be expressed as a collection of 2-tuples in the form <attribute name, value> with each element being an attribute–value pair. Depending on

243-443: Is concerned, among other things, with database queries , e.g. formulas that, given the contents of a database, specify certain information to be extracted from it. In the predominant relational database paradigm, the contents of a database are described as a finite set of finite mathematical relations; Boolean queries, that always yield true or false , are formulated in first-order logic . It turns out that first-order logic

270-412: Is greater; what this means is that the sets of sets of strings described by the first three formalisms are equal, and a proper subset of the set of sets of strings described by context-free grammars. In this area, the cost of expressive power is a central topic of study. It is known, for instance, that deciding whether two arbitrary regular expressions describe the same set of strings is hard, while doing

297-431: Is lacking in expressive power: it cannot express certain types of Boolean queries, e.g. queries involving transitive closure . However, adding expressive power must be done with care: it must still remain possible to evaluate queries with reasonable efficiency, which is not the case, e.g., for second-order logic . Consequently, a literature sprang up in which many query languages and language constructs were compared on

SECTION 10

#1732898332190

324-577: Is undecidable, a fact known as Rice's Theorem . There are some results on conciseness as well; for instance, nondeterministic finite automata and regular grammars are more concise than regular expressions, in the sense that the latter can be translated to the former without a blowup in size (i.e. in O(1) ), while the reverse is not possible. Similar considerations apply to formalisms that describe not sets of strings, but sets of trees (e.g. XML schema languages ), of graphs , or other structures. Database theory

351-472: The structure of a physical system, as matched by the connectivity structure of the graph. In contrast, an RDF graph would mix structural relationships with attached properties, and category / class information with instance / individuals, drowning out the structure The expressivity of attributed graphs, on the level of higher order logic , is also far above that of RDF graphs, which is limited to first order logic . Properties of relationships, which are at

378-871: The expressive power (also called expressiveness or expressivity ) of a language is the breadth of ideas that can be represented and communicated in that language. The more expressive a language is, the greater the variety and quantity of ideas it can be used to represent. For example, the Web Ontology Language expression language profile (OWL2 EL) lacks ideas (such as negation ) that can be expressed in OWL2 RL (rule language). OWL2 EL may therefore be said to have less expressive power than OWL2 RL. These restrictions allow for more efficient ( polynomial time ) reasoning in OWL2 EL than in OWL2 RL. So OWL2 EL trades some expressive power for more efficient reasoning (processing of

405-417: The knowledge representation language). The term expressive power may be used with a range of meaning. It may mean a measure of the ideas expressible in that language: The first sense dominates in areas of mathematics and logic that deal with the formal description of languages and their meaning, such as formal language theory , mathematical logic and process algebra . In informal discussions,

432-927: The added possibility of specifying relations between these types and constraining them by keys and properties. The type graph is itself a property graph, linked by a relation of graph homomorphism with the graphs of instances that use the types it defines, playing a role similar to that of a schema in a data definition language . The ontologies , thesauri or taxonomies used to reference NGSI-LD types are also defined by graphs, but these are RDF graphs rather than property graphs, and they typically have broader scopes than database schemas. The complementary use, possible with NGSI-LD types, of type graphs and referencing of external ontologies, makes it possible to enforce strong data structuration and consistency, while affording semantic grounding and interoperability . Key-value A name–value pair , also called an attribute–value pair , key–value pair , or field–value pair ,

459-449: The different rows of this relational table, or to instances of the same generic entity / class. With the proposed definition, these labels could in fact be viewed as attributes defined only by a key, without an associated value (this is why κ {\displaystyle \kappa } is defined separately as a binary relation, and π as a partial function). The basic definition thus becomes much clearer, simpler, and satisfies

486-460: The graph proper . This is another reason why it would be preferable to call them attributed graphs, or graphs with properties, rather than "property graphs", which is misleading. Relationships are represented by arcs of the graph. These are often called edges , even though, strictly speaking, edges belong in undirected graphs . Arcs must have an identifier, a source node and a target node, and may have one or more attributes/properties in

513-459: The graph. Labels have a practical rather than theoretical justification, as they were originally intended for users of Entity-Relationship models and relational databases , to facilitate the import of their legacy data sets into graph databases :. labels make it possible to associate the same identifier (that of the relational table, or of the ER entity) to all graph nodes which would correspond to

540-876: The heart of the attributed graph model, require a very cumbersome reification process to be expressed in RDF. The NGSI-LD data model specified by ETSI has been the first attempt to standardize property graphs under a de jure standards body. Compared to the basic model defined here, the NGSI-LD meta-model adds a formal definition of basic categories (entity, relation, property) on the basis of semantic webstandards ( OWL , RDFS , RDF ), which makes it possible to convert all data represented in NGSI-LD into RDF datasets, through JSON-LD serialization. NGSI-LD entities, relations and properties are thus defined by reference to types which can themselves be defined by reference to ontologies , thesauri , taxonomies or microdata vocabularies, for

567-524: The lines of the basic model described here, possibly adding notions of labels, types, and schemas . Graph-oriented databases are, compared to relational databases , touted for not requiring the prior definition of a schema to start populating the base. This is desirable and suitable for environments and applications where one operates under an open world assumption , such as the description of complex systems and systems of systems , characterized by bottom-up organization and evolution, not control of

SECTION 20

#1732898332190

594-478: The particular application and the implementation chosen by programmers, attribute names may or may not be unique. Some of the applications where information is represented as name-value pairs are: Some computer languages implement name–value pairs, or more frequently collections of attribute–value pairs, as standard language features. Most of these implement the general model of an associative array : an unordered list of unique attributes with associated values. As

621-403: The previous sense Building upon widely adopted definitions, a property graph/attributed graph can be defined by a 7-tuple (N , A, P, V, α, κ {\displaystyle \kappa } , π), where A complementary construct, used in several implementations of property graphs with commercial graph databases, is that of labels , which can be associated both with nodes and arcs of

648-451: The purpose of ensuring the semantic interoperability of the corresponding information. The ISO/IEC JTC1/SC32 /WG3 group of ISO , which established the SQL standard, is in the process of specifying a new query language suitable for graph-oriented databases, called GQL (Graph Query Language). This standard will include the specification of a property graph data model, which should be along

675-411: The same for arbitrary context-free grammars is completely impossible . However, it can still be efficiently decided whether any given string is in the set. For more expressive formalisms, this problem can be harder, or even undecidable. For a Turing complete formalism, such as arbitrary formal grammars , not only this problem, but every nontrivial property regarding the set of strings they describe

702-502: The term often refers to the second sense, or to both. This is often the case when discussing programming languages . Efforts have been made to formalize these informal uses of the term. The notion of expressive power is always relative to a particular kind of thing that the language in question can describe, and the term is normally used when comparing languages that describe the same kind of things, or at least comparable kinds of things. The design of languages and formalisms involves

729-443: The upper level of a graph database, without a schema. However, specific technical sub-systems of this city remain top-down closed-world systems managed by a single operator, who may impose a stronger structuring of information, as customarily represented by a schema. The notions of "type graphs" and schemas make it possible to meet this need, with types playing a role similar to that of labels in classical graph databases, but with

#189810