Chapter 4
Datasets


4.1 Introduction

In the worksheets we have seen thus far, all factoids are stored in a single dataset, viz. lambda. In some applications, it is useful to partition the data into multiple datasets, e.g. to represent different beliefs, different points in time, different locations, and so forth. As we shall see, partitioning data in this way is also useful for organizing data on secondary storage or in the cloud and for building collaborative worksheets.

4.2 Datasets

A dataset is a named collection of factoids. In the worksheets we have seen thus far, there is just one dataset, viz. lambda. However, it is possible to have multiple datasets on a single worksheet.

Each dataset on a worksheet must have a suitable declaration in the HTML file for that worksheet. The format of the declaration is similar to that used in defining lambda. The HTML fragment shown below defines a dataset named monday and a dataset named tuesday as well as the required dataset named lambda.

<dataset id='monday'> p(a,b) p(b,c) </dataset>
<dataset id='tuesday'> p(b,d) p(c,e) </dataset>
<dataset id='lambda'> value(pagecolor,black) </dataset>

Note that different datasets can contain different and even inconsistent data. Here, both monday and tuesday contain data for the p predicate, but they disagree on which p factoids hold.

When working with multiple datasets, we access and modify the facts stored in a dataset using the binary predicate true; for example, true(p(a,b),monday) holds if and only if p(a,b) is stored in monday. The true predicate can be used in the conditions and conclusions of operation definition rules. It can also be used in the bodies of view definitions but not in the heads of view definitions.

The view definition below defines a predicate q that is true of an object if and only if there is a p factoid in monday in which that object appears as the first argument.

q(X) :- true(p(X,Y),monday)
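To make these semantics concrete, here is a minimal Python model. It is not part of Worksheets: the names `datasets`, `true`, and `q` are hypothetical, each dataset is modeled as a set of ground facts, and a fact such as p(a,b) is written as the tuple ("p", "a", "b"). Under these assumptions, true is a membership test and the view q is the set of first arguments of p facts in monday.

```python
# Conceptual model only: each dataset is a named set of ground facts,
# with a fact such as p(a,b) represented as the tuple ("p", "a", "b").
datasets = {
    "monday": {("p", "a", "b"), ("p", "b", "c")},
    "tuesday": {("p", "b", "d"), ("p", "c", "e")},
    "lambda": {("value", "pagecolor", "black")},
}

def true(fact, dataset):
    """Model of the binary predicate true: is fact stored in dataset?"""
    return fact in datasets.get(dataset, set())

def q():
    """q(X) :- true(p(X,Y),monday), computed as the set of all such X."""
    return {f[1] for f in datasets["monday"] if f[0] == "p"}

print(sorted(q()))  # ['a', 'b']
```

Because the same fact can be present in one dataset and absent from another, true(("p", "a", "b"), "monday") is true here while the same test against "tuesday" is false.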

The following rule dictates that, if the user clicks the flip button, then the order of arguments in the p predicate should be reversed in the monday dataset.

click(flip) :: true(p(X,Y),monday) ==> ~true(p(X,Y),monday) & true(p(Y,X),monday)
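The flip rule can be sketched in Python as follows. This is an illustrative model, not the Worksheets engine, and it rests on an assumption about the update semantics: all bindings for X and Y are found in the state before the click, and then the deletions are applied before the additions. The function name `flip` is hypothetical.

```python
# Conceptual model of the flip rule; facts are tuples such as ("p", "a", "b").
# Assumption: conditions are evaluated against the state before the update,
# and deletions are applied before additions.
monday = {("p", "a", "b"), ("p", "b", "c")}

def flip(dataset):
    """click(flip) :: true(p(X,Y),monday) ==> ~true(p(X,Y),monday) & true(p(Y,X),monday)"""
    matches = [f for f in dataset if f[0] == "p"]        # every binding of X and Y
    deletions = set(matches)                              # ~true(p(X,Y),monday)
    additions = {("p", y, x) for (_, x, y) in matches}    # true(p(Y,X),monday)
    return (dataset - deletions) | additions

print(sorted(flip(monday)))  # [('p', 'b', 'a'), ('p', 'c', 'b')]
```

Computing all bindings first means each original fact is reversed exactly once; updating the set one fact at a time while scanning it could reverse a newly added fact a second time.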

Note that, using true, it is possible to write view definitions that combine information from multiple datasets. For example, the following rule defines r to be true of two objects if and only if p is true of those objects in both monday and tuesday.

r(X,Y) :- true(p(X,Y),monday) & true(p(X,Y),tuesday)
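In the same hypothetical set-of-tuples model, a conjunction of true conditions over two datasets amounts to intersecting them. With the data declared earlier, monday and tuesday share no p factoids, so r would be empty; the sketch below adds p(a,b) to tuesday purely for illustration.

```python
# Conceptual model only. The fact ("p", "a", "b") in tuesday is hypothetical,
# added so that r has something to return.
monday = {("p", "a", "b"), ("p", "b", "c")}
tuesday = {("p", "b", "d"), ("p", "c", "e"), ("p", "a", "b")}

def r(ds1, ds2):
    """r(X,Y) :- true(p(X,Y),monday) & true(p(X,Y),tuesday)"""
    return {(f[1], f[2]) for f in ds1 & ds2 if f[0] == "p"}

print(r(monday, tuesday))  # {('a', 'b')}
```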

Similarly, an operation definition can access and modify multiple datasets. The following rule deletes reciprocal p factoids, i.e. p(X,Y) from monday and p(Y,X) from tuesday, when the doit button is clicked.

click(doit) :: true(p(X,Y),monday) & true(p(Y,X),tuesday) ==> ~true(p(X,Y),monday) & ~true(p(Y,X),tuesday)
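A minimal Python sketch of the doit rule follows, again using the hypothetical set-of-tuples model. It finds every pair (X, Y) satisfying both conditions before changing anything, then removes p(X,Y) from monday and p(Y,X) from tuesday. The fact p(b,a) in tuesday is added for illustration so that the rule has a match.

```python
# Conceptual model only; p(b,a) in tuesday is hypothetical.
monday = {("p", "a", "b"), ("p", "b", "c")}
tuesday = {("p", "b", "a"), ("p", "c", "e")}

def doit(mon, tue):
    """click(doit) :: true(p(X,Y),monday) & true(p(Y,X),tuesday)
                      ==> ~true(p(X,Y),monday) & ~true(p(Y,X),tuesday)"""
    pairs = [(x, y) for (pred, x, y) in mon
             if pred == "p" and ("p", y, x) in tue]   # bindings satisfying both conditions
    new_mon = mon - {("p", x, y) for x, y in pairs}
    new_tue = tue - {("p", y, x) for x, y in pairs}
    return new_mon, new_tue

new_mon, new_tue = doit(monday, tuesday)
print(sorted(new_mon), sorted(new_tue))  # [('p', 'b', 'c')] [('p', 'c', 'e')]
```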

The dataset named lambda is special in that we do not need true to access or modify the data in that dataset. For example, the following rule defines the s predicate in terms of factoids in lambda, monday, and tuesday.

s(X,Y) :- p(X,Y) & true(p(X,Y),monday) & true(p(X,Y),tuesday)
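One way to picture the special status of lambda is as a default dataset: an unqualified atom such as p(X,Y) behaves like true(p(X,Y),lambda). The Python sketch below models this with a default argument. The dataset contents are hypothetical, and the model says nothing about how Worksheets actually stores lambda.

```python
# Conceptual model: an unqualified atom refers to lambda by default.
# All dataset contents here are hypothetical.
datasets = {
    "lambda": {("p", "a", "b"), ("p", "c", "e")},
    "monday": {("p", "a", "b"), ("p", "b", "c")},
    "tuesday": {("p", "a", "b"), ("p", "c", "e")},
}

def true(fact, dataset="lambda"):
    """Omitting the dataset models a bare lookup such as p(X,Y) in lambda."""
    return fact in datasets[dataset]

def s():
    """s(X,Y) :- p(X,Y) & true(p(X,Y),monday) & true(p(X,Y),tuesday)"""
    return {(f[1], f[2]) for f in datasets["lambda"]
            if f[0] == "p" and true(f) and true(f, "monday") and true(f, "tuesday")}

print(sorted(s()))  # [('a', 'b')]
```

Here p(c,e) is in lambda and tuesday but not monday, so only (a, b) satisfies all three conditions.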

As mentioned in the introduction, datasets are useful for organizing data. In particular, they can be used to separate syntactic data (such as the values of input fields and the colors of nodes) from semantic data (which capture the state of the worksheet's application area). They are also useful in situations where there can be different, possibly inconsistent, data. And they are especially useful in supporting persistent storage and collaboration, as described in the following sections.

4.3 Persistent Storage

One of the benefits of datasets is that they provide a way of connecting some or all of the data on a worksheet with files in secondary storage or in the cloud. This makes it possible to preserve data so that it can be accessed and modified at different times, in different browsers, and on different machines.

To enable persistent storage using Worksheets cloud storage, we annotate a dataset with the name of a file, given as the value of the dataset's src attribute. For example, the following HTML fragment specifies source files for the datasets monday and tuesday but not for wednesday.

<dataset id='monday' src='monday'> p(a,b) p(b,c) </dataset>
<dataset id='tuesday' src='tuesday'> p(b,d) p(c,e) </dataset>
<dataset id='wednesday'> p(c,d) p(d,e) </dataset>

Note that the lambda dataset is typically used for local storage and is often not saved. If we want to store the data in lambda, we can replicate that data in a named dataset that has a src attribute.

The simplest way to enable persistent storage is to provide the user with Save and/or Load buttons. Clicking the Save button saves the data in every dataset that has a src attribute to the corresponding file. Clicking the Load button loads the data from those files into the corresponding datasets.
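The behavior of the two buttons can be sketched in Python. This is a model under stated assumptions, not the Worksheets implementation: a plain dictionary named `storage` stands in for the cloud store, facts are serialized with the standard `json` module, and the functions `save` and `load` are hypothetical. As in the HTML fragment above, wednesday has no src and is therefore never saved or loaded.

```python
import json

# Hypothetical stand-in for cloud storage: maps a src file name to saved text.
storage = {}

# Each dataset id maps to its src attribute (or None) and its facts.
datasets = {
    "monday": {"src": "monday", "facts": {("p", "a", "b"), ("p", "b", "c")}},
    "tuesday": {"src": "tuesday", "facts": {("p", "b", "d"), ("p", "c", "e")}},
    "wednesday": {"src": None, "facts": {("p", "c", "d"), ("p", "d", "e")}},
}

def save():
    """Save button: write every dataset that has a src attribute to its file."""
    for ds in datasets.values():
        if ds["src"]:
            storage[ds["src"]] = json.dumps(sorted(ds["facts"]))

def load():
    """Load button: replace each src-backed dataset with its file's contents."""
    for ds in datasets.values():
        if ds["src"] and ds["src"] in storage:
            ds["facts"] = {tuple(f) for f in json.loads(storage[ds["src"]])}

save()
datasets["monday"]["facts"] = set()   # simulate losing the local copy
load()
print(sorted(datasets["monday"]["facts"]))  # [('p', 'a', 'b'), ('p', 'b', 'c')]
```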

It is also possible to enable automatic storage and retrieval. This is done by setting the broadcast and/or reception attributes of a dataset to true, as in the following examples.

<dataset id='monday' src='monday' broadcast='true' reception='true'> p(a,b) p(b,c) </dataset>
<dataset id='tuesday' src='tuesday' broadcast='true' reception='true'> p(b,d) p(c,e) </dataset>

When automatic storage is activated (i.e. when broadcast is true), the dataset is written to its source file whenever the dataset changes. When automatic retrieval is activated (i.e. when reception is true), the data file is periodically loaded into the dataset, overwriting any local data in that dataset.
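The following Python sketch models broadcast and reception for two copies of the same worksheet that share one source file. It is an illustration only: the `Dataset` class, its `change` and `poll` methods, and the dictionary `storage` are hypothetical, and the polling interval is left out. Consistent with the description above, `poll` overwrites the local data rather than merging it.

```python
import json

storage = {}  # hypothetical shared store: src name -> serialized facts

class Dataset:
    def __init__(self, src, broadcast=False, reception=False, facts=()):
        self.src, self.broadcast, self.reception = src, broadcast, reception
        self.facts = set(facts)

    def change(self, additions=(), deletions=()):
        """Apply a local change; with broadcast on, write through to the file."""
        self.facts = (self.facts - set(deletions)) | set(additions)
        if self.broadcast:
            storage[self.src] = json.dumps(sorted(self.facts))

    def poll(self):
        """Called periodically; with reception on, overwrite local data from the file."""
        if self.reception and self.src in storage:
            self.facts = {tuple(f) for f in json.loads(storage[self.src])}

# Two copies of the same worksheet, each with its own monday dataset.
a = Dataset("monday", broadcast=True, reception=True, facts={("p", "a", "b")})
b = Dataset("monday", broadcast=True, reception=True)
a.change(additions={("p", "b", "c")})
b.poll()
print(sorted(b.facts))  # [('p', 'a', 'b'), ('p', 'b', 'c')]
```

Because each poll replaces the whole dataset, this model behaves as "last writer wins"; how Worksheets resolves simultaneous edits is not covered here.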

Note that, whenever a dataset is loaded (i.e. whenever its file is read into it), all of the widgets in the worksheet's layout are refreshed except the widget that is currently in focus (if any). This avoids resetting the value of a widget that the user is in the middle of changing.
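The refresh policy can be sketched as a small Python function. The function name `refresh_widgets` and its parameters are hypothetical; the sketch only shows the rule of recomputing every widget's displayed value except the one in focus.

```python
def refresh_widgets(widgets, focused, compute_value):
    """Recompute every widget's displayed value except the one in focus.

    widgets: dict mapping widget id -> currently displayed value
    focused: id of the widget in focus, or None
    compute_value: function giving a widget's value from the current data
    """
    for name in widgets:
        if name != focused:
            widgets[name] = compute_value(name)
    return widgets

# The user is typing in message when new data arrives.
shown = {"chatroom": "hi", "message": "hel"}
fresh = {"chatroom": "hi there", "message": ""}
print(refresh_widgets(shown, "message", fresh.get))
# {'chatroom': 'hi there', 'message': 'hel'}
```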

4.4 Collaboration

Another benefit of datasets is that they support collaboration among worksheets. To enable such collaboration, we give the corresponding datasets on those worksheets the same src file, and we set their broadcast and reception attributes to true.

As an example of collaboration among different instances of the same worksheet, let's create a basic chatroom (where users can share messages with each other).

First, we add a text area widget with identifier chatroom. This widget serves as the chat window. We then add a single-line text field with identifier message below chatroom. See below.

As users type messages into this text field and press return, we want those messages to appear in chatroom on that worksheet and in the chatroom widget on all other instances of that worksheet.

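Next, we add the following rules to our worksheet. These rules keep the conversation in a dataset named mychat, which, as described above, should be given a src file and have its broadcast and reception attributes set to true.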

nonempty :- true(content(X),mychat)
select(message,X) :: ~nonempty ==> true(content(X),mychat)
select(message,X) :: true(content(Y),mychat) & stringappend(Y," ",X,Z) ==> true(content(Z),mychat) & ~true(content(Y),mychat)
value(chatroom,X) :- true(content(X),mychat)

The first rule defines a proposition nonempty that is true if and only if mychat contains a content factoid, i.e. if and only if the chatroom is not empty.

If the chatroom is empty (i.e. nonempty is false), the second rule adds the fact content(X) to the dataset mychat, where X is the text entered into the message widget.

If mychat already contains a fact of the form content(Y), then, when text X is entered into the message widget, the third rule removes content(Y) from mychat and adds the fact content(Z), where Z is the string formed by appending the quoted separator string and then X to the end of Y.

The final rule sets the value of the multi-line widget chatroom to the string X in the content(X) fact stored in mychat.
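The effect of these rules on a single worksheet can be traced with a short Python model. It assumes that stringappend(Y,S,X,Z) makes Z the concatenation of Y, S, and X, and it uses the single-space separator written in the rule; the names `select_message`, `chatroom_value`, and `SEPARATOR` are hypothetical. Sharing mychat across instances is not modeled here.

```python
SEPARATOR = " "  # the quoted separator string in the stringappend call

def select_message(mychat, x):
    """Model of the two select(message,X) rules; returns the new mychat."""
    current = [f for f in mychat if f[0] == "content"]
    if not current:                          # ~nonempty
        return mychat | {("content", x)}
    y = current[0][1]                        # true(content(Y),mychat)
    z = y + SEPARATOR + x                    # assumed meaning of stringappend(Y," ",X,Z)
    return (mychat - {("content", y)}) | {("content", z)}

def chatroom_value(mychat):
    """value(chatroom,X) :- true(content(X),mychat)"""
    return next((f[1] for f in mychat if f[0] == "content"), "")

chat = set()
for msg in ["hello", "how are you?"]:
    chat = select_message(chat, msg)
print(chatroom_value(chat))  # hello how are you?
```

Since the second rule fires only when mychat is empty and the third replaces the existing fact, mychat never holds more than one content fact, which is what lets the final rule read off a single value for chatroom.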

Now, let's consider an example of collaboration across heterogeneous worksheets. We start with two worksheets, coursesheet1 and coursesheet2.

In the first worksheet, we add four checkboxes, each named chosen, with the values cs103, cs109, cs121, and cs157, respectively.

[Figure: four checkboxes labeled CS 103, CS 109, CS 121, and CS 157]

In the second worksheet, we add a multi-valued selector with identifier chosen and the option values cs103, cs109, cs121, and cs157.

On each worksheet, we create a dataset to hold our data. Importantly, we specify the same value for the src attribute of the dataset on each worksheet, and we set the broadcast and reception attributes to true, as shown below.

<dataset id='data' src='mydata' broadcast='true' reception='true'> </dataset>

Finally, we add the following rules to the library on both worksheets.

<ruleset id='library'>
select(chosen,X) :: true(taken(X),data)
deselect(chosen,X) :: ~true(taken(X),data)
holds(chosen,X) :- true(taken(X),data)
</ruleset>

Given these definitions, if we open both worksheets side by side and make an edit to one of the worksheets, we will see that the other worksheet updates accordingly.
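The synchronization can be traced with a Python model of the two worksheets. This is a sketch under assumptions, not Worksheets code: the `Worksheet` class, its methods, and the shared dictionary `storage` are hypothetical, and each method mirrors one of the rules above plus the broadcast and reception behavior of a dataset whose src is mydata. The checkbox and selector widgets themselves are not modeled, since both reduce to the same select and deselect events.

```python
import json

storage = {}  # hypothetical shared file store keyed by src name

class Worksheet:
    """One worksheet whose dataset has src='mydata' with broadcast and reception on."""

    def __init__(self):
        self.data = set()

    def _broadcast(self):
        storage["mydata"] = json.dumps(sorted(self.data))

    def select(self, x):    # select rule: add taken(X) to the dataset
        self.data.add(("taken", x))
        self._broadcast()

    def deselect(self, x):  # deselect rule: remove taken(X) from the dataset
        self.data.discard(("taken", x))
        self._broadcast()

    def poll(self):         # reception: overwrite local data from the file
        if "mydata" in storage:
            self.data = {tuple(f) for f in json.loads(storage["mydata"])}

    def chosen(self):       # holds rule: the values shown as chosen
        return {f[1] for f in self.data if f[0] == "taken"}

sheet1, sheet2 = Worksheet(), Worksheet()   # checkbox and selector versions
sheet1.select("cs103")
sheet1.select("cs157")
sheet2.poll()
sheet2.deselect("cs103")
sheet1.poll()
print(sorted(sheet1.chosen()))  # ['cs157']
```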