In Chapter 2, we saw how to represent the state of an application area as a dataset. If a dataset is large, it can be difficult to answer questions based on that dataset. In this chapter, we look at various ways of querying a dataset to find just the information that we need.
The simplest form of query is a true-or-false question. Given a factoid and a dataset, we might want to know whether or not the factoid is true in that dataset. For example, we might want to know whether a person Art is the parent of Bob. Answering an atomic true-or-false question is simply a matter of checking whether the given factoid is a member of the dataset.
A more interesting form of query is a fill-in-the-blanks question. Given a factoid with "blanks", we might want values that, when substituted for the blanks make the query true. For example, we might want to look up the children of Art or the parents of Bill or pairs of parents and children.
An even more interesting form of query is a compound question. We might want values for which a Boolean combination of conditions is true. For example, we might want whether Art is the parent of Bob or the parent of Bud. Or we might want to find all people who have sons and who have no daughters.
We begin this chapter by looking at an extension of our dataset language that allows us to express such questions. In the next section, we define the syntax of our language; and, in the section thereafter, we define its semantics. We then look at some examples of using this language to query datasets. With that introduction behind us, we look at an important syntactic restriction, called safety. And, finally, we finish by discussing useful predefined concepts (e.g. arithmetic operators) that increase the power of our query language.
3.2 Query Syntax
The language of queries includes the language of datasets but provides some additional features that make it more expressive, viz. variables and query rules. Variables allow us to write fill-in-the-blanks queries. Query rules allow us to express compound queries, notably negations (to say that a condition is false), conjunctions (to say that several conditions are all true), and disjunctions (to say that at least one of several conditions is true).
In our query language, a variable is either a lone underscore or a string of letters, digits, and underscores beginning with an upper case letter. For example, _, X23, X_23, and Somebody are all variables.
An atomic sentence, or atom, is analogous to a factoid in a dataset except that the arguments may include variables as well as symbols. For example, if p is a binary predicate and a is a symbol and Y is a variable, then p(a,Y)is an atomic sentence.
A literal is either an atom or a negation of an atom. A simple atom is called a positive literal, The negation of an atom is called a negative literal. In what follows, we write negative literals using the negation sign ~. For example, if p(a,b) is an atom, then ~p(a,b) denotes the negation of this atom. Both are literals.
A query rule is an expression consisting of a distinguished atom, called the head and a collection of zero or more literals, called the body. The literals in the body are called subgoals. The predicate in the head of a query rule must be a new predicate (i.e. not one in the vocabulary of our dataset), and all of the predicates in the body must be dataset predicates.
In what follows, we write rules as in the example shown below. Here, goal(a,b) is the head; p(a,b) & ~q(b) is the body; and p(a,b) and ~q(b) are subgoals.
|goal(a,b) :- p(a,b) & ~q(b)|
As we shall see in the next section, a query rule is something like a reverse implication - it is a statement that the head of the rule (i.e. the overall goal) is true whenever the subgoals are true. For example, the rule above states that goal(a,b) is true if p(a,b) is true and q(b) is not true.
The expressive power of query rules is greatly enhanced through the use of variables. Consider, for example, the rule shown below. This is a more general version of the rule shown above. Instead of applying to just the specific objects a and b it applies to all objects. In this case, the rule states that goal is true of any object X and any object Y if p is true of X and Y and q is not true of Y.
|goal(X,Y) :- p(X,Y) & ~q(Y)|
A query is a non-empty, finite set of query rules. Typically, a query consists of just one rule. In fact, most Logic Programming systems do not support queries with multiple rules (at least not directly). However, queries with multiple rules are sometimes useful and do not add any major complexity, so in what follows we allow for the possibility of queries with multiple rules.
3.3 Query Semantics
An instance of an expression (atom, literal, or rule) is one in which all variables have been consistently replaced by ground terms (i.e. terms without variables). For example, if we have a language with symbols a and b, then the instances of goal(X,Y) :- p(X,Y) & ~q(Y) are shown below.
|goal(a,a) :- p(a,a) & ~q(a)|
|goal(a,b) :- p(a,b) & ~q(b)|
|goal(b,a) :- p(b,a) & ~q(a)|
|goal(b,b) :- p(b,b) & ~q(b)|
Given this notion, we can define the result of the application of a single rule to a dataset. Given a rule r and a dataset Δ, we define v(r,Δ) to be the set of all ψ such that (1) ψ is the head of an arbitrary instance of r, (2) every positive subgoal in the instance is a member of Δ, and (3) no negative subgoal in the instance is a member of Δ.
The extension of a query is the set of all facts that can be "deduced" on the basis of the rules in the program, i.e. it is the union of v(ri,Δ) for each ri in our query.
To illustrate these definitions, consider a dataset describing a small directed graph. In the sentences below, we use symbols to designate the nodes of the graph, and we use the p relation to designate the arcs of the graph.
Now suppose we were given the following query. Here, the predicate goal is defined to be true of every node that has an outgoing arc to another node and also an incoming arc from that node.
|goal(X) :- p(X,Y) & p(Y,X)|
Since there are two variables here and three symbols, there are nine instances of this rule, viz. the ones shown below.
|goal(a) :- p(a,a) & p(a,a)|
|goal(a) :- p(a,b) & p(b,a)|
|goal(a) :- p(a,c) & p(c,a)|
|goal(b) :- p(b,a) & p(a,b)|
|goal(b) :- p(b,b) & p(b,b)|
|goal(b) :- p(b,c) & p(c,b)|
|goal(c) :- p(c,a) & p(a,c)|
|goal(c) :- p(c,b) & p(b,c)|
|goal(c) :- p(c,c) & p(c,c)|
The body in the first of these instances is not satisfied. In fact, the body is true only in the sixth and eighth instances. Consequently, the extension of this query contains just the two atoms shown below.
The definition of semantics in terms of rule instances is simple and clear. However, Logic Programming systems typically do not implement query processing in this way. There are more efficient ways of computing such extensions. In subsequent chapters, we look at some algorithms of this sort.
A query rule is safe if and only if every variable that appears in the head or in any negative literal in the body also appears in at least one positive literal in the body.
The rule shown below is safe. Every variable in the head and every variable in the negative subgoal appears in a positive subgoal in the body. Note that it is okay for the body to contain variables that do not appear in the head.
|goal(X) :- p(X,Y,Z) & ~q(X,Z)|
By contrast, the two rules shown below are not safe. The first rule is not safe because the variable Z appears in the head but does not appear in any positive subgoal. The second rule is not safe because the variable Z appears in a negative subgoal but not in any positive subgoal.
|goal(X,Y,Z) :- p(X,Y)|
|goal(X,Y,X) :- p(X,Y) & ~q(Y,Z)|
To see why safety matters in the case of the first rule, suppose we had a database in which p(a,b) is true. Then, the body of the first rule is satisfied if we let X be a and Y be b. In this case, we can conclude that every corresponding instance of the head is true. But what should we substitute for Z? Intuitively, we could put anything there; but there could be many possibilities. While this is conceptually okay, it is practically problematic.
To see why safety matters in the second rule, suppose we had a database with just two facts, viz. p(a,b) and q(b,c). In this case, if we let X be a and Y be b and Z be anything other than c, then both subgoals are true, and we can conclude goal(a,b,a).
The main problem with this is that many people incorrectly interpret that negation as meaning there is no Z for which q(Y,Z) is true, whereas the correct reading is that q(Y,Z) needs to be false for just one value of Z. As we will see, there are various ways of expressing this second meaning without writing unsafe queries.
3.5 Predefined Concepts
In practical logic programming languages, it is common to predefine useful concepts. These typically include arithmetic functions (such as plus, times, max, min), string functions (such as concatenation), equality and inequality, aggregates (such as countofall), and so forth.
In Epilog, equality and inequality are expressed using the relations same and distinct. The sentence same(σ,τ) is true iff σ and τ are identical. The sentence distinct(σ,τ) is true if and only if σ and τ are different.
The evaluate relation is used to represent equations involving predefined functions. For example, we would write evaluate(plus(times(3,3),times(2,3),1),16) to represent the equation 3^2+2x3+1=16. If height is a binary predicate relating a figure and its height and if width is a binary predicate relating a figure and its width, we can define the area of the object as shown below. The area of X is A if the height of X is H and the width of X is W and A is the result of multiplying H and W.
|goal(X,A) :- height(X,H) & width(X,W) & evaluate(times(H,W),A)|
In logic programming languages that provide such predefined concepts, there are usually syntactic restrictions on their use. For example, if a query contains a subgoal with a comparison relation (such as same and distinct), then every variable that occurs in that subgoal must occur in at least one positive literal in the body and that occurrence must precede the subgoal with the comparison relation. If a query uses evaluate in a subgoal, then any variable that occurs in the first argument of that subgoal must occur in at least one positive literal in the body and that occurrence must precede the subgoal with the arithmetic relation. Details are typically found in the documentation of systems that supply such built-in concepts
In practical logic programming languages, it is also common to include predefined aggregate operators, such as setofall and countofall.
Aggregate operators are typically represented as relations with special syntax. For example the following rule uses the countofall operator to request the number of a person's children. N is the number of children of X if and only if N is the count of all Y such that X is the parent of Y.
|goal(X,N) :- person(X) & evaluate(countofall(Y,parent(X,Y)),N)|
As with special relations, there are syntactic restrictions on their use. In particular, aggregate subgoals must be safe in that all variables in the second argument must be included in the first argument or must be used within positive subgoals of the rule containing the aggregate.
3.6 Example - Kinship
Consider a variation of the Kinship application introduced in Chapter 2. In this case, our vocabulary consists of symbols (representing people) and a binary predicate parent (which is true of two people if and only if the person specified as the first argument is the parent of the person specified as the second argument).
Given data about parenthood expressed using this vocabulary, we can write queries to extract information about other relationships as well. For example, we can find grandparents and grandchildren by writing the query shown below. A person X is the grandparent of a person Z if X is the parent of a person Y and Y is the parent of Z. The variable Y here is a thread variable that connects the first subgoal to the second but does not itself appear in the head of the rule.
|goal(X,Z) :- parent(X,Y) & parent(Y,Z)|
In general, we can write queries with multiple rules. For example, we can collect all of the people mentioned in our dataset by writing the following multi-rule query. In this case the conditions are disjunctive (at least one must be true), whereas the conditions in the grandfather case are conjunctive (both must be true).
|goal(X) :- parent(X,Y)|
|goal(Y) :- parent(X,Y)|
In some cases, it is helpful to use built-in relations in our queries. For example, we can ask for all pairs of people who are siblings by writing the query rule shown below. We use the distinct condition here to avoid listing a person as his own sibling.
|goal(Y,Z) :- parent(X,Y) & parent(X,Z) & distinct(Y,Z)|
While we can express many common kinship relationships using our query language, there are some relationships that are just too difficult. For example, there is no way to ask for all ancestors of a person (parents, grandparents, great grandparents, and so forth). For this, we need the ability to write recursive queries. We show how to write such queries in the chapter on view definitions.
3.7 Example - Map Coloring
Consider the problem of coloring planar maps using only four colors, the idea being to assign each region a color so that no two adjacent regions are assigned the same color.
A typical map is shown below. Here we have six regions. Some are adjacent to each other, meaning that they cannot be assigned the same color. Others are not adjacent, meaning that they can be assigned the same color.
We can enumerate the hues to be used as shown below. The constants red, green, blue, purple, stand for the hues red, green, blue, and purple respectively.
In the case of the map shown above, our goal is to find six hues (one for each region of the map) such that no two adjacent regions have the same hue. We can express this goal by writing the query shown below.
| hue(C1) & hue(C2) & hue(C3) & hue(C4) & hue(C5) & hue(C6) &|
| distinct(C1,C2) & distinct(C1,C3) & distinct(C1,C5) & distinct(C1,C6) &|
| distinct(C2,C3) & distinct(C2,C4) & distinct(C2,C5) & distinct(C2,C6) &|
| distinct(C3,C4) & distinct(C3,C6) & distinct(C5,C6)|
Evaluating this query will result in 6-tuples of hues that ensure that no two adjacent regions have the same color. In problems like this one, we usually want only one solution rather than all solutions. However, finding even one solution is such cases can be costly. In Chapter 4, we discuss ways of writing such queries that makes the process of finding such solutions more efficient.
Exercise 3.1: For each of the following strings, say whether it is a syntactically legal query.
||goal(X) :- p(a,f(f(X)))
||goal(X,Y) :- p(X,Y) & ~p(Y,X)
||~goal(X,Y) :- p(X,Y) & p(Y,X)
||goal(P,Y) :- P(a,Y)
||goal(X) :- p(X,b) & p(X,p(b,c))
Exercise 3.2: Say whether each of the following queries is safe.
||goal(X,Y) :- p(X,Y) & p(Y,X)
|| goal(X,Y) :- p(X,Y) & p(Y,Z)
|| goal(X,Y) :- p(X,X) & p(X,Z)
|| goal(X,Y) :- p(X,Y) & ~p(Y,Z)
|| goal(X,Y) :- p(X,Y) & ~p(Y,Y)
Exercise 3.3: What is the result of evaluating the query goal(X,Z) :- p(X,Y) & p(Y,Z) on the dataset shown below.
Exercise 3.4: Assume we have a dataset with a binary predicate parent (which is true of two people if and only if the person specified as the first argument is the parent of the person specified as the second argument). Write a query that defines the property of being childless. Hint: use the aggregate operator countofall. And be sure your query is safe. (This exercise is not difficult, but it is slightly tricky.)
Exercise 3.5: For each of the following problems, write a query to solve the problem. Values should include just the digits 8, 1, 4, 7, 3 and each digit should be used at most once in the solution of each puzzle. Your query should express the problem as stated, i.e. you should not first solve the problem yourself and then have the query simply return the answer.
||The product of a 1-digit number and a 2-digit number is 284.
||The product of two 2-digit numbers plus a 1-digit number is 3,355.
||The product of a 3-digit number and a 1-digit number minus a 1 digit number is 1,137.
||The product of a 2-digit number and a 3-digit number is between 13,000 and 14,000.
||When a 3-digit number is divided by a 2-digit number the result is between 4 and 6.