CHAPTER 14   The External Database System

 

In this chapter, we cover Visual Prolog's external database system. An external database is composed of an external collection of chained terms; these chains give you direct access to data that is not a part of your Prolog program. The external database can be stored in any one of three locations: in a file, in memory, or in EMS-type expanded memory under DOS. The external database supports B+ trees, which provide fast data retrieval and the ability to sort quickly, and it supports multi-user access by a mechanism for serializing the file accesses inside transactions.

External Databases in Visual Prolog

Visual Prolog's internal fact¡¯s database, which uses asserta, assertz, retract, and retractall, is very simple to use and suitable for many applications. However, the RAM requirements of a database can easily exceed the capacity of your computer; the external database system has been designed partly with this problem in mind. For example, you might want to implement one or more of the following:

a stock control system with an large number of records

an expert system with many relations but only a few records with complicated structures

a filing system in which you store large text files in the database

your own database product--which maybe has nothing to do with a relational database system--in which data is linked together in other, nonrelational ways

a system including several of these possibilities

Visual Prolog's external database system supports these different types of applications, while meeting the requirement that some database systems must not lose data during update operations--even in the event of power failure.

Visual Prolog's external database predicates provide the following facilities:

efficient handling of very large amounts of data on disk

the ability to place the database in a file, in memory, or in EMS-type expanded memory cards under DOS

multi-user access

greater data-handling flexibility than provided by the sequential nature of Visual Prolog's automatic backtracking mechanism

the ability to save and load external databases in binary form

1. An Overview: What's in an External Database?

A Visual Prolog external database consists of two components: the data items--actually Prolog terms--stored in chains, and corresponding B+ trees, which you can use to access the data items very quickly.

The external database stores data items in chains (rather than individually) so that related items stay together. For example, one chain might contain part numbers to a stock list, while another might contain customer names. Simple database operations, such as adding new items or replacing and deleting old items, do not require B+ trees. These come into play when you want to sort data items or search the database for a given item; they are covered in detail later in this chapter.

(1) Naming Convention

The names of all the standard predicates concerned with database management follow a certain convention.

The first part of the name (db_, chain_, term_, and so on) is a reminder of what you must specify as input.

The second part of the name (flush, btrees, delete, and so on) is a reminder of what action occurs or what is returned or affected.

For example, db_delete deletes a whole database, chain_delete deletes a whole chain, and term_delete deletes a single term.

Figure 14.1: Structure of a Visual Prolog External Database

(2) External Database Selectors

It is possible to have several external databases simultaneously in memory, on disk, and in an EMS-type memory expansion card under DOS. With this flexibility, you can place external databases where they give the best speed and space compromise.

In order to distinguish between several open databases, you use a selector in every call to an external database standard predicate. You must declare these selectors in a domain called db_selector. This works like the file domain in the file system. For example, the following domains, declarations, external databases domain declaration declares customers and parts to be external database selectors:

    DOMAINS
       db_selector = customers; parts

2. Chains

An external database is a collection of Prolog terms. Some examples of terms are integers, reals, strings, symbol values, and compound objects; for instance, 32, -194, 3.1417, "Wally", wages, and book(dickens, "Wally goes to the zoo").

Inside an external database, the terms are stored in chains. A chain can contain any number of terms, and an external database can contain any number of chains. Each chain is selected by a name, which is simply a string.

The following figure illustrates the structure of a chain called MY_CHAIN.

Figure 14.2: Structure of a Chain

Database relations and database tables are modeled by chains of terms. For example, suppose you have a customer, supplier, and parts database, and you want to put all the data into a single database with three relations: one for customers, one for suppliers, and one for parts. You do this by putting the customers in one chain called customers, the suppliers in another chain called suppliers, and the parts in a chain called parts.

To insert a term in an external database, you must insert the term into a named chain. On the other hand, you can retrieve terms without explicitly naming the containing chain. In both cases, you must specify the domain to which the term belongs. In practice, it is best if all terms in the chain belong to the same domain, but there is actually no restriction on how terms are mixed in a database. It's up to you to ensure that a term you retrieve belongs to the same domain as it did when you inserted it.

The following is a simple example of setting up two chained databases, dba1 and dba2. In this example, all the customer data is in dba1 and all the parts data in dba2. For now, just look over this example; we need to introduce a lot more information before we can explain what's happening here.

/* Program ch14e01.pro */

 

PREDICATES

    access

 

CLAUSES

    access:-

        chain_terms(dba1,chain1,customers,customer(Name, ADDR),_),

        chain_terms(dba2,chain2,parts,part(Part, Id, Name),_),

        write("send ",Part," part num ",Id," to ",Addr), nl,

        fail.

    access.

 

GOAL

    % create the databases dba1 and dba2

    db_create(dba1, "dd1", in_memory),

    db_create(dba2, "dd1.bin", in_file),

 

    % insert customer facts into chain1 in dba1

    chain_insertz(dba1, chain1, customers,

    customer("Joe Fraser","123 West Side"), _),

    chain_insertz(dba1, chain1, customers,

    customer("John Smith","31 East Side"), _),

    chain_insertz(dba1, chain1, customers,

    customer("Diver Dan","1 Water Way"), _),

    chain_insertz(dba1, chain1, customers,

    customer("Dave Devine","123 Heaven Street"), _),

 

    % insert parts facts into chain2 in dba2

    chain_insertz(dba2, chain2, parts, part("wrench", 231,
                       "John Smith"), _),

    chain_insertz(dba2, chain2, parts, part("knife", 331,

                        "Diver Dan"), _),

    access,

    db_close(dba1), db_close(dba2),

    db_delete("dd1", in_memory),

    db_delete("dd1.bin", in_file).

This program first creates the databases dba1 (in memory) and dba2 (in a disk file). It then inserts facts into two chains: chain1 and chain2. After inserting the facts, it looks in these chains for a customer and the part ordered by that customer; finding these, it returns the address to which the shipper should ship the part. Finally, it closes and deletes the two databases.

3. External Database Domains

The external database uses six standard domains, summarized here:

Domain

What It's Used For

db_selector

Domain for declaring database selectors

bt_selector

Domain for declaring B+ tree selectors

place

Location of the database: in RAM, in a file, or in an extended memory system (EMS card under DOS)

accessmode

Decides how the file will be used.

denymode

Determines how other users can open the file.

ref

Reference to the location of a term in a chain

(1) Database Reference Numbers

Every time you insert a new term into an external database, Visual Prolog assigns it a database reference number. You can use the term's database reference number to retrieve, remove, or replace that term, or to get the next or previous term in the chain. You can also insert a database reference number in a B+ tree (as described later in this chapter), and then use the B+ tree to sort some terms or to carry out a fast search for a term.

Database reference numbers are independent of the database location and any possible packing operations. Once a reference number has been associated with a term, you can use that number to access that term--no matter which database management operations are subsequently carried out--until the term is deleted.

The ref Domain

Database reference numbers are special because you can insert them in facts sections and write them out with write or writef, but you can't type them in from the keyboard. You must declare the arguments to predicates handling database reference numbers as belonging to the standard domain ref.

When you delete a term with term_delete, the system will reuse that term's reference number when it inserts the next term into the external database. This happens automatically; however, if reference numbers have been stored in the facts section or in a B+ tree for some reason, it is your responsibility to ensure that a given reference is associated with the correct term.

To assist you in this, there is an error-checking option, enabled with the db_reuserefs standard predicate:

(2) db_reuserefs/2

db_reuserefs has the following form:

    db_reuserefs(DBase,ReUse)                              /* (i,i)*/

where DBase is a db_selector and ReUse is an unsigned integer. This should be set to 0 to enable checking for use of released terms, or 1 do disable this. The overhead of having the check enabled is very small (4 bytes per term, virtually no CPU overhead), but those 4 bytes will never be released. If you constantly create and release terms, your database will therefore grow at a steady rate. db_reuserefs's primary purpose is to assist you in tracking down bugs during development of programs.

4. Manipulating Whole External Databases

When you create a new external database, or open an existing one, you can place it in a file, in memory, or in EMS-type expanded memory under DOS, depending on the value of the Place argument in your call to db_create or db_open. After you've finished working with the external database, you close it with a call to db_close.

When you place an external database in main or expanded memory, closing the database with db_close does not delete the database from memory. You must do this explicitly with a call to db_delete, to free the memory the database occupies. If you close such an external database but don't delete it, you can later reopen it with the db_open predicate.

Since the external database system relies on the DOS buffer system, it will be very slow if no buffers have been allocated. To allocate 40 buffers (which isn't an excessive number), include the following line in your CONFIG.SYS file (a part of the DOS environment):

    buffers = 40

In this section, we discuss the predicates db_create, db_open, db_copy, db_loadems, db_saveems, db_close, db_delete, db_openinvalid, db_flush, db_garbagecollect, db_btrees, db_chains, and db_statistics.

(1) db_create/3

db_create creates a new database.

    db_create(Dbase, Name, Place)                       /* (i,i,i) */

If the database is placed in a disk file, the name of the file will be Name; if it's placed in memory or EMS under DOS, you'll need Name if you close the database and want to open it later. Dbase and Name correspond to the internal and external names for files.

Where you place an external database is determined by the Place argument. Place can take one of the following values:

in_file

The external database is placed in a disk file, and there will be only a minimum of main memory overhead.

in_memory

The external database is placed in the main memory--usually this will be done to achieve maximum performance.

in_ems

The database is placed in EMS-type expanded memory, if a suitable card is installed in the computer. in_ems is only relevant for DOS. On other platforms it has the same effect as in_memory

These values, in_file, in_memory, and in_ems, are elements of the pre-declared domain place, which corresponds to the following declaration:

    DOMAINS
       place = in_file; in_memory; in_ems

For example, here are two different calls to db_create:

    db_create(db_sel1,"MYFILE.DBA",in_file)
                                   /* Creates disk file MYFILE.DBA */

    db_create(db_sel2,"SymName2",in_memory)
                               /* Creates memory database SymName2 */

(2) db_open/3

db_open opens a previously created database, identified by Name and Place.

    db_open(Dbase, Name, Place)                         /* (i,i,i) */

If Place is in_memory or in_ems, Name is the database's symbolic file name; if Place is in_file, Name is the actual DOS-style file name.

(3) db_copy/3

Irrespective of where you initially place an external database, you can later move it to another location with the db_copy predicate.

    db_copy(Dbase, Name, Place)                        /* (i,i,i) */

For example, in this call to db_copy

    db_copy(my_base, "new_EMSbase", in_ems)

Visual Prolog copies the database identified by the database selector my_base into the new database file new_EMSbase, which is placed in EMS under DOS.

When you copy a database, the original still exists; you will have two copies until you explicitly delete the original.

Once you've moved a database, all processing can continue as if nothing happened, since all reference numbers to the external database terms will still be valid. In this way, if you're maintaining an external database in main memory, and free storage is running short, you can copy the database to a file and continue execution with the database in the file. An index set up to the external database in internal memory is still valid, even after you've copied the database to a file.

db_copy has several uses; you can use it to do the following:

Load a database from disk to memory and later save it again in binary form, instead of using save and consult with text files.

Copy a medium-sized database from disk to memory for faster access.

Pack a database containing too much free space; when the database is copied to another file, all free space will be eliminated.

(4) db_loadems/2 and db_saveems/2

db_copy performs a full-scale record-by-record copy of the database in question. This has the advantage that the resulting database will be compacted and without unused space, but for large databases the process can be time consuming.

For DOS only, db_loadems and db_saveems will transfer complete images of databases between disk and EMS:

    db_loadems(FileName,EmsName                           /* (i,i) */
    db_saveems(EmsName,FileName)
                        /* (i,i) */

The only restriction on their use is that there can be no more than one database in EMS.

(5) db_openinvalid/3

db_openinvalid allows you to open a database that's been flagged as invalid.

    db_openinvalid(Dbase, Name, Place)                  /* (i,i,i) */

If the power to the computer fails while a database is being updated, all the data in the database may be lost because part of some buffer has not been written to disk. A flag in the database indicates if it's in an invalid state after an update.

A database is recorded as being invalid after a call to any of the predicates that change the content in the database. These include chain_inserta, chain_insertz, chain_insertafter, term_replace, term_delete, chain_delete, bt_create, key_insert, and key_delete. The database is recorded as being valid once again when it is closed with db_close, or when db_flush is called to flush out the buffers.

By using db_openinvalid, it is sometimes possible to continue execution when a database is marked as invalid. This might make it possible to recover some data if all your backups have disappeared. However, all attempts to use an invalid database after opening it with db_openinvalid might yield unexpected results.

(6) db_flush/1

db_flush flushes the buffers and writes their contents to the appropriate destination in your database.

    db_flush(Dbase)                                         /* (i) */

When a database is updated it will be marked as invalid, and it remains flagged as invalid until it is either flushed with db_flush, or closed.

The level of security you employ for a given database will, of course, depend on how important its data is. The most basic level of data security is to keep backups on disk. At the intermediate level, you could call db_flush after each important database update. However, flushing the buffers is a relatively slow operation; if it's done too often, your database system will grind to a halt. Finally, if the contents of an external database are especially valuable, you could record all changes in a special log file or maintain two identical databases--perhaps on different disks.

(7) db_close/1

A call to db_close closes an open database.

    db_close(Dbase)                                        /* (i) */

If the database Dbase is placed in a disk file, the file will be closed. The database won't be deleted, even if it is placed in memory or in an EMS-type memory expansion card, and you can reopen it later through a call to db_open. You can use db_delete to remove a closed database from memory.

(8) db_delete/1

When the database is situated in memory or in an EMS-type memory expansion card, db_delete releases all the occupied space.

    db_delete(Name, Place)                                /* (i,i) */

When the database is situated in a file, db_delete erases the file. db_delete gives an error if the database Name does not exist in the given Place.

(9) db_garbagecollect/1

db_garbagecollect scans through the free lists in the database garbage collect and tries to merge some of the free space together into larger pieces.

    db_garbagecollect(Dbase)                                /* (i) */

This scanning and merging is done automatically when the database is placed in memory or in an EMS card.

Under normal circumstances, there should be no need to call this predicate. However, if there seems to be too much free space in the database that is not being reused when new terms are inserted, db_garbagecollect can regain some extra space.

(10) db_btrees/2

During backtracking, db_btrees successively binds BtreeName to the name of each B+ tree in the Dbase database.

    nondeterm db_btrees(Dbase, BtreeName)                 /* (i,o) */

The names are returned in sorted order. B+ trees are described later in this chapter.

(11) db_chains/2

During backtracking, db_chains successively binds ChainName to the name of each chain in the Dbase database.

    nondeterm db_chains(Dbase, ChainName)                /* (i,o) */

The names are returned in sorted order.

(12) db_statistics/5

db_statistics returns statistical information for the database Dbase.

    db_statistics(Dbase, NoOfTerms, MemSize, DbaSize, FreeSize)
                                    
                                                    /* (i,o,o,o,o) */

The arguments to db_statistics represent the following:

NoOfTerms is bound to the total number of terms in the database.

MemSize

is bound to the size--in bytes--of the internal tables stored in memory for the database.

DbaSize

is bound to the total number of bytes that the terms and descriptors in the Dbase database occupy. If Dbase is stored in a disk file, and DbaSize gets a value much smaller than the size of that file, the file can be compressed by using db_copy.

FreeSize

becomes bound to a value representing the free memory space; the value depends on where the database Dbase is currently placed.

When Dbase is placed in memory, FreeSize is bound to the number of unused bytes between the top of the global stack and the top of the heap. (Note: There might be some additional free bytes that are not included in this count.)

If Dbase is placed in EMS-type expanded memory, FreeSize is bound to the number of unoccupied bytes in that expansion memory.

When Dbase is placed in a file, FreeSize is bound to the number of unused bytes on the disk containing the file.

5. Manipulating Chains

To insert terms into an external database chain, you use the predicates chain_inserta, chain_insertz, or chain_insertafter. You can successively bind the terms in a chain, and their reference numbers, to the arguments of chain_terms, while chain_delete allows you to delete a whole chain of terms from the external database.

Four standard predicates return database reference numbers. These are chain_first, chain_last, chain_next, and chain_prev.

(1) chain_inserta/5 and chain_insertz/5

The predicates chain_inserta and chain_insertz correspond to asserta and assertz, respectively. These take the following form:

    chain_inserta(Dbase, Chain, Domain, Term, Ref) /* (i,i,i,i,o) */
    chain_insertz(Dbase, Chain, Domain, Term, Ref)
/* (i,i,i,i,o) */

chain_inserta inserts the term Term at the beginning of the chain Chain, while chain_insertz inserts Term at the chain's end. Dbase is the db_selector of the database, Domain is the domain of Term, and Ref is the database reference number corresponding to Term. For example, if my_dba is declared to be in the domain db_selector, like this:

    DOMAINS
       db_selector = my_dba; ....

then in this call to chain_inserta

    chain_inserta(my_dba, customer, person, p(john,
                                                                 "1 The Avenue", 32), NewRef)

customer is the name of the chain, and all customers are stored in one chain. It would be perfectly all right to store the suppliers as terms from the domain person but in a different chain, perhaps called supplier. person is the name of the domain to which p(john, "1 The Avenue", 32) belongs, as shown in this domain declaration:

    DOMAINS
       person = p(name, address, age)

If Chain doesn't already exist, these predicates will automatically create it.

(2) chain_insertafter/5

chain_insertafter inserts a term after a specified term, returning the inserted term's new reference number. It takes this format:

    chain_insertafter(Dbase, Domain, Ref, Term, NewRef)
                                                                                    /* (i,i,i,i,o) */

chain_insertafter inserts the term Term after the chain element specified by Ref, while NewRef is bound to the database reference number corresponding to Term after it's been inserted.

(3) chain_terms/5

During backtracking, chain_terms successively binds Term and Ref to each term and its associated database reference number in the specified Chain. chain_terms takes the form:

    chain_terms(Dbase, Chain, Domain, Term, Ref)    /* (i,i,i,o,o) */

(4) chain_delete/2

chain_delete deletes a specified chain from a given external database; this predicate takes the form:

    chain_delete(Dbase, Chain)                            /* (i,i) */

(5) chain_first/3 and chain_last/3

chain_first and chain_last return the database reference number for the first and last terms in a given chain, respectively.

    chain_first(Dbase, Chain, FirstRef)                 /* (i,i,o) */
    chain_last(Dbase, Chain, LastRef)
                 /* (i,i,o) */

(6) chain_next/3 and chain_prev/3

chain_next returns the reference number of the term following the given one, while chain_prev returns the reference number of the term preceding the given one.

    chain_next(Dbase, Ref, NextRef)                     /* (i,i,o) */
    chain_prev(Dbase, Ref, PrevRef)
                     /* (i,i,o) */

6. Manipulating Terms

Three standard predicates for external database management are all concerned with terms; these are term_replace, term_delete, and ref_term. Whenever you call any of the term-handling external database standard predicates, you must give the domain of the term as one of the arguments. Because of this, it's usually a good idea to declare all terms in a given database as alternatives in one domain, as in this declaration:

    DOMAINS
       terms_for_my_stock_control_database =
            customer(Customer, Name, ZipCode, Address);
            supplier(SupplierNo, Name, Address);
            parts(PartNo, Description, Price, SupplierNo)

Note that there are no restrictions on mixing types (domains) in an external database. One chain can contain text strings, another integers, a third some kind of compound structures, and so on. However, external database data items are not stored with type descriptors; for example, integers don't necessarily occupy just two bytes. It's your responsibility to retrieve a term into the same domain as that from which it was inserted. A run-time error will usually result if you attempt to mix domains.

(1) term_replace/4

term_replace replaces an old term (referenced by Ref, a database reference number) with a new term, Term.

    term_replace(Dbase, Domain, Ref, Term)            /* (i,i,i,i) */

(2) term_delete/3

term_delete erases the term stored under Ref, a given database reference number.

    term_delete(Dbase, Chain, Ref)                     /* (i,i,i) */

The storage occupied by the term will be released, and there must be no further references to Ref.

(3) ref_term/4

ref_term binds Term to the term stored under a given reference number, Ref.

    ref_term(Dbase, Domain, Ref, Term)                /* (i,i,i,o) */

7. A Complete Program Example

The following example program 2, uses nearly all the external database predicates introduced so far. Working first in memory, this program goes through the following sequence of operations:

This program then copies the database to a disk file and carries out the same sequence of activities twice with the database held on disk. Finally, it calculates--in hundredths of a second--the total time taken to carry out these activities. Note, however, that for illustration the program generates large amounts of output, which slows it down considerably. The true speed is only revealed if you remove the output statements. The program 3 is for UNIX, as time-calculation is done differently in UNIX, and terminal output is significantly slower in UNIX than in DOS.

Run the program to see what happens, and then try to alter the number of terms and study your system's performance. The DOS program appears below.

/* Program ch14e02.pro */

DOMAINS

    my_dom = f(string)

    db_selector = my_dba

 

PREDICATES

    write_dba(integer)

    read_dba

    rd(Ref)

    count_dba(integer)

    count(Ref, integer, integer)

    replace_dba

    replace(Ref)

    double_dba

    double(Ref)

    half_dba

    half(Ref)

    mixture

 

CLAUSES

    write_dba(0):-!.

    write_dba(N):-

        chain_inserta(my_dba,my_chain,my_dom,f("Prolog system"),_),

        chain_insertz(my_dba, my_chain, my_dom, f("Prolog Compiler"), _),

        N1=N-1,

        write_dba(N1).

 

    read_dba:-

        db_chains(my_dba, Chain),

        chain_terms(my_dba, Chain, my_dom, Term, Ref),nl,

        write("Ref=", Ref),

        write(", Term=", Term),

        fail.

    read_dba:-

        db_chains(my_dba, Chain),

        chain_first(my_dba, Chain, Ref),

        rd(Ref),

        fail.

        read_dba.

 

    rd(Ref):-

        ref_term(my_dba, my_dom, Ref, Term), nl,

        write(Term),

        fail.

    rd(Ref):-

        chain_next(my_dba,Ref,Next),!,rd(Next).

    rd(_).

 

    replace_dba:-

        chain_first(my_dba, my_chain, Ref),

        replace(Ref).

 

    replace(Ref):-

        term_replace(my_dba, my_dom, Ref, f("Prolog Toolbox")),

        chain_next(my_dba, Ref, NN),

        chain_next(my_dba, NN, Next),!,

        replace(Next).

        replace(_).

 

    half_dba:-

        chain_last(my_dba, my_chain, Ref),

        half(Ref).

 

    half(Ref):-

        chain_prev(my_dba, Ref, PP),

        chain_prev(my_dba, PP, Prev), !,

        term_delete(my_dba, my_chain, Ref),

        half(Prev).

    half(_).

 

    double_dba:-

        chain_first(my_dba, my_chain, Ref),

        double(Ref).

 

    double(Ref):-

        chain_next(my_dba, Ref, Next),!,

        chain_insertafter(my_dba, my_chain, my_dom, Ref,f("Programmers Guide"), _),

        double(Next).

        double(_).

 

    count_dba(N):-

        chain_first(my_dba, my_chain, Ref),

        count(Ref, 1, N).

 

    count(Ref, N, N2):-

        chain_next(my_dba, Ref, Next),!,

        N1=N+1,

        count(Next, N1, N2).

    count(_, N, N).

 

    mixture :-nl,

        write("Replace every second term:"),

        replace_dba,nl,

        write("Double the number of terms:"),

        double_dba,nl,

        write("Erase every second term:"),

        half_dba,nl,

        write("Use ref_term for all terms:"),

        read_dba,

        count_dba(N),nl,

        write("There are now ", N, " terms in the database"),

        db_statistics(my_dba, NoOfTerms, MemSize, DbaSize, FreSize),nl,

  writef("NoOfTerms=%, MemSize=%, DbaSize=%, FreeSize=%", NoOfTerms, MemSize,DbaSize,FreSize).

 

GOAL

    nl,nl,nl,

    write("¡¬tTEST OF DATABASE SYSTEM¡¬n¡¬t***********************¡¬n¡¬n"),

    time(H1, M1, S1, D1),

    db_create(my_dba, "dd.dat", in_memory),nl,nl,

    write("Write some terms in the database:"),

    write_dba(50),

    read_dba,

    mixture,nl,nl,

 

    write("Copy to file"),

    db_copy(my_dba, "dd.dat", in_file),

    db_close(my_dba),  db_delete("dd.dat", in_memory),

    db_open(my_dba, "dd.dat", in_file),

    mixture,

    db_close(my_dba),nl,nl,nl,

 

    write("Open the database on file"),

    db_open(my_dba, "dd.dat", in_file),

    mixture,

    db_close(my_dba),

 

    time(H2, M2, S2, D2),

    Time = (D2-D1)+100.0*((S2-S1)+60.0*((M2-M1)+ 60.0*(H2-H1))),nl,nl,

    write("Time = ", Time, "/100 Sec" ), nl.