Knowledge Interchange Format Specification

Chapter 2: Syntax


Introduction

As with many computer-oriented languages, the syntax of KIF is most easily described in three layers. First, there are the basic characters of the language. These characters can be combined to form lexemes. Finally, the lexemes of the language can be combined to form grammatically legal expressions. Although this layering is not strictly esential to the specification of KIF, it simplifies the description of the syntax by dealing with white space at the lexeme level and eliminating that detail from the expression level.

In this chapter, the syntax of KIF is presented using a modified BNF notation. All nonterminals and BNF punctuation are written in boldface, while characters in KIF are expressed in plain font. The notation {x1,...,xn} means the set of terminals x1,...,xn. The notation [nonterminal] means zero or one instances of nonterminal; nonterminal* means zero or more occurrences; nonterminal+ means one or more occurrences; nonterminal^n means n occurrences. The notation nonterminal1 - nonterminal2 refers to all of the members of nonterminal1 except for those in nonterminal2. The notation int(n) denotes the decimal representation of integer n. The nonterminals space, tab, return, linefeed, and page refer to the characters corresponding to ascii codes 32, 9, 13, 10, and 12, respectively. The nonterminal character denotes the set of all 128 ascii characters. The nonterminal empty denotes the empty string.

Characters

The alphabet of KIF consists of 7 bit blocks of data. In this document, we refer to KIF data blocks via their usual ASCII encodings as characters (as given in ref). Support for the universal unicode character set can accomplished in programs that use KIF through the use of an escape mechanism whereby each unicode character is represented by a sequence of ASCII characters. No standard syntax for this has been chosen as yet, but a choice will be made at the next meeting of the X3T2 committee.

KIF characters are classified as upper case letters, lower case letters, digits, alpha characters (non-alphabetic characters that are used in the same way that letters are used), special characters, white space, and other characters (every ascii character that is not in one of the other categories).


    upper ::= A | B | C | D | E | F | G | H | I | J | K | L | M | 
              N | O | P | Q | R | S | T | U | V | W | X | Y | Z

    lower ::= a | b | c | d | e | f | g | h | i | j | k | l | m | 
              n | o | p | q | r | s | t | u | v | w | x | y | z

    digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

    alpha ::= ! | $ | % | & | * | + | - | . | / | < | = | > | ? |
              @ | _ | ~ |

    special ::= " | # | ' | ( | ) | , | \ | ^ | `

    white ::= space | tab | return | linefeed | page
A normal character is either an upper case, a lower case character, a digit, or an alpha character.

    normal ::= upper | lower | digit | alpha

Lexemes

The process of converting characters into lexemes in called lexical analysis. The input to this process is a stream of characters, and the output is a stream of lexemes.

The function of a lexical analyzer is cyclic. It reads characters from the input string until it encounters a character that cannot be combined with previous characters to form a legal lexeme. When this happens, it outputs the lexeme corresponding to the previously read characters. It then starts the process over again with the new character. Whitespace causes a break in the lexical analysis process but otherwise is discarded.

There are five types of lexemes in KIF -- special lexemes, words, character references, character strings, and character blocks.

Each special character forms its own lexeme. It cannot be combined with other characters to form more complex lexemes, except through the ``escape'' syntax described below.

A word is a contiguous sequence of (1) normal characters or (2) other characters preceded by the escape character \.


    word ::= normal | word normal | word\character

Except for characters following \, the lexical analysis of words is case insensitive. The output lexeme for any word corresponds to the lexeme obtained by converting all letters not following \ to their upper case equivalents. For example, the word abc and the word ABC map into the same lexeme. The word a\bc maps into the same lexeme as the word A\bC, which is not the same as the lexeme for the word ABC.

A character reference consists of the characters #, \, and any character. Character references allow us to refer to characters as characters and differentiate them from one-character symbols, which may refer to other objects.


    charref ::= #\character
A character string is a series of characters enclosed in quotation marks. The escape character \ is used to permit the inclusion of quotation marks and the \ character itself within such strings.


    string ::= "quotable"

    quotable ::= empty | quotable strchar | quotable\character

    strchar ::= character - {",\}
Sometimes it is desirable to group together a sequence of arbitrary bits or characters without imposing escape characters, e.g. to encode images, audio, or video in special formats. Character blocks permit this sort of grouping through the use of a prefix that specifies how many of the following characters are to grouped together in this way. A character block consists of the character # followed by the decimal encoding of a positive integer n, the character q or Q, and then n arbitrary characters.


    block ::= # int(n) q character^n | # int(n) Q character^n
For the purpose of grammatical analysis, it is useful to subdivide the class of words a little further, viz. as variables, operators, and constants.

A variable is a word in which the first character is ? or @. A variable that begins with ? is called an individual variable. A variable that begins with an @ is called a sequence variable.


    variable ::= indvar | seqvar

    indvar ::= ?word

    seqvar ::= @word
Operators are used in forming complex expressions of various sorts. There are three types of operators in KIF -- term operators, sentence operators, and definition operators. Term operators are used in forming complex terms. Sentence operators and user operators are used in forming complex sentences. Definition operators are used in forming definitions.


    operator ::= termop | sentop | defop

    termop ::= listof | quote | if

    sentop ::= = | /= | not | and | or | => | <= | <=> |
               forall | exists

    defop ::= defobject | defunction | defrelation | deflogical |
              := | :-> | :<= | :=>
All other words are called constants.


    constant ::= word - variable - operator
Semantically, there are four categories of constants in KIF -- object constants, function constants, relation constants, and logical constants. Object constants are used to denote individual objects. Function constants denote functions on those objects. Relation constants denote relations. Logical constants express conditions about the world and are either true or false. KIF is unusual among logical languages in that there is no syntactic distinction among these four types of constants; any constant can be used where any other constant can be used. The differences between these categories of constants is entirely semantic.

Expressions

The legal expressions of KIF are formed from lexemes according to the rules presented in this section. There are three disjoint types of expressions in the language -- terms, sentences, and definitions. Terms are used to denote objects in the world being described; sentences are used to express facts about the world; and definitions are used to define constants. Definitions and sentences are called forms. A knowledge base is a finite set of forms.

There are nine types of terms in KIF -- individual variables, constants, character references, character strings, character blocks, functional terms, list terms, quotations, and logical terms. Individual variables, constants, character references, strings, and blocks were discussed earlier.


    term ::= indvar | constant | charref | string | block |
             funterm | listterm | quoterm | logterm
A functional term consists of a function constant and an arbitrary number of argument terms, terminated by an optional sequence variable and surrounded by matching parentheses. Note that there is no syntactic restriction on the number of argument terms -- the same function constant can be applied to different numbers of arguments; arity restrictions in KIF are treated semantically.


    funterm ::= (constant term* [seqvar])
A list term consists of the listof operator and a finite list of terms, terminated by an optional sequence variable and enclosed in matching parentheses.


    listterm ::= (listof term* [seqvar])
Quotations involve the quote operator and an arbitrary list expression. A list expression is either an atom or a sequence of list expressions surrounded by parentheses. An atom is either a word or a character reference or a character string or a character block. Note that the list expression embedded within a quotation need not be a legal expression in KIF.


    quoterm ::= (quote listexpr) | 'listexpr

    listexpr ::= atom | (listexpr*)

    atom ::= word | charref | string | block
Logical terms involve the if and cond operators. The if form allows for the testing of a single condition or multiple conditions. An optional term at the end allows for the specification of a default value when all of the conditions are false. The cond form is similar but groups the pairs of sentences and terms within parentheses and has no optional term at the end.


    logterm ::= (if logpair+ [term])

    logpair ::= sentence term

    logterm ::= (cond logitem*)

    logitem ::= (sentence term)
The following BNF defines the set of legal sentences in KIF. There are six types of sentences. We have already mentioned logical constants.


    sentence ::= constant | equation | inequality | 
                 relsent | logsent | quantsent
An equation consists of the = operator and two terms. An inequality consist of the /= operator and two terms.


    equation ::= (= term term)

    inequality ::= (/= term term)
A relational sentence consists of a relation constant and an arbitrary number of argument terms, terminated by an optional sequence variable. As with functional terms, there is no syntactic restriction on the number of argument terms in a relation sentence -- the same relation constant can be applied to any finite number of arguments.


    relsent ::= (constant term* [seqvar])

It is noteworthy that the syntax of relational sentences is the same as that of functional terms. On the other hand, their meanings are different. Fortunately, the context of each such expression determines its type (as an embedded term in one case or as a top-level sentence or argument to some sentential operator in the other case); and so this slight ambiguity causes no problems.

The syntax of logical sentences depends on the logical operator involved. A sentence involving the not operator is called a negation. A sentence involving the and operator is called a conjunction, and the arguments are called conjuncts. A sentence involving the or operator is called a disjunction, and the arguments are called disjuncts. A sentence involving the => operator is called an implication; all of its arguments but the last are called antecedents; and the last argument is called the consequent. A sentence involving the <= operator is called a reverse implication; its first argument is called the consequent; and the remaining arguments are called the antecedents. A sentence involving the <=> operator is called an equivalence.


    logsent ::= (not sentence) |
                (and sentence*) |
                (or sentence*) |
                (=> sentence* sentence) |
                (<= sentence sentence*) |
                (<=> sentence sentence)
There are two types of quantified sentences -- a universally quantified sentence is signalled by the use of the forall operator, and an existentially quantified sentence is signalled by the use of the exists operator. The first argument in each case is a list of variable specifications. A variable specification is either a variable or a list consisting of a variable and a term denoting a relation that restricts the domain of the specified variable.


    quantsent ::= (forall (varspec+) sentence) |
                  (exists (varspec+) sentence)

    varspec ::= variable | (variable constant)
Note that, according to these rules, it is permissible to write sentences with free variables, i.e. variables that do not occur within the scope of any enclosing quantifiers. The significance of the free variables in a sentence depends on the use of the sentence. When we assert the truth of a sentence with free variables, we are, in effect, saying that the sentence is true for all values of the free variables, i.e. the variables are universally quantified. When we ask whether a sentence with free variables is true, we are, in effect, asking whether there are any values for the free variables for which the sentence is true, i.e. the variables are existentially quantified.

The following BNF defines the set of legal KIF definitions. There are three types of definitions -- unrestricted, complete, and partial. Within each type, there are four cases, one for each category of constant. Object constants are defined using the defobject operator. Function constants are defined using the deffunction operator. Relation constants are defined using the defrelation operator. Logical constants are defined using the deflogical operator.


    definition ::= unrestricted | complete | partial

    unrestricted ::=
      (defobject constant [string] sentence*) |
      (deffunction constant [string] sentence*) |
      (defrelation constant [string] sentence*) |
      (deflogical constant [string] sentence*)

    complete ::=
      (defobject constant [string] := term) |
      (deffunction constant (indvar* [seqvar]) [string] := term) |
      (defrelation constant (indvar* [seqvar]) [string] := sentence) |
      (deflogical constant [string] := sentence)

    partial ::=
      (defobject constant [string] :-> indvar :<= sentence) |
      (defobject constant [string] :-> indvar :=> sentence) |
      (deffunction constant (indvar* [seqvar]) [string]
        :-> indvar :<= sentence) |
      (deffunction constant (indvar* [seqvar]) [string]
        :-> indvar :=> sentence) |
      (defrelation constant (indvar* [seqvar]) [string]
        :<= sentence) |
      (defrelation constant (indvar* [seqvar]) [string]
        :=> sentence) |
      (deflogical constant [string] :<= sentence)
      (deflogical constant [string] :=> sentence)
A form in KIF is either a sentence or a definition.


    form ::= sentence | definition
It is important to note that definitions are top level constructs. While definitions contain sentences, they are not themselves sentences and, therefore, cannot be written as constituent parts of sentences or other definitions.

A knowledge base is a finite set of forms. It is important to keep in mind that a knowledge base is a set of sentences, not a sequence; and, therefore, the order of forms within a knowledge base is unimportant. Order may have heuristic value to deductive programs by suggesting an order in which to use those sentences; however, this implicit approach to knowledge exchange lies outside of the definition of KIF.


Michael R. Genesereth