CHAPTER 13    String-Handling in Visual Prolog

 

Visual Prolog provides several standard predicates for powerful and efficient string manipulations. In this chapter, we've divided these into two groups: the family of basic string-handling predicates, and a set of predicates used for converting strings to other types and vice versa. Strings may also be compared to each other, but this is covered in chapter 9.

String Processing

A few formalities apply to strings and string processing: the backslash acts as an escape character, allowing you to put characters that cannot be typed directly on the keyboard into strings. Please see the description on page 55.

1. Basic String-Handling Predicates

The predicates described in this section are the backbone of string-handling in Visual Prolog; as such, they serve several purposes:

dividing a string into component strings or tokens

building a string from specified strings or tokens

verifying that a string is composed of specified strings or tokens

returning a string, token, or list of these from a given string

verifying or returning the length of a string

creating a blank string of specified length

verifying that a string is a valid Visual Prolog name

formatting a variable number of arguments into a string variable

(1) frontchar/3

frontchar operates as if it were defined by the equation

    String1 = the concatenation of Char and String2

It takes this format:

    frontchar(String1,Char,String2)
                    
                    /* (i,o,o) (i,i,o) (i,o,i) (i,i,i) (o,i,i) */

frontchar takes three arguments; the first is a string, the second is a char (the first character of the first string), and the third is the rest of the first string.

frontchar can be used to split a string into a series of characters, to create a string from a series of characters, or to test the characters within a string. If the argument String1 is bound to a zero-length string, the predicate fails.

Example

In Program 1, frontchar is used to define a predicate that changes a string to a list of characters. Try the goal

    string_chlist("ABC", Z)

This goal will return Z bound to ['A','B','C'].

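/* Program ch13e01.pro */

DOMAINS
    charlist = char*                 /* list domain used by string_chlist */

PREDICATES
    string_chlist(string, charlist)

CLAUSES
    string_chlist("", []):-!.        /* empty string gives the empty list */
    string_chlist(S, [H|T]):-
        frontchar(S,H,S1),           /* split off the first character */
        string_chlist(S1,T).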
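
The reverse direction, creating a string from a series of characters, can be sketched with the (o,i,i) flow variant of frontchar. This is a minimal illustration rather than one of the numbered programs; the predicate name chlist_string is an assumption introduced here.

DOMAINS
    charlist = char*

PREDICATES
    chlist_string(charlist, string)  /* hypothetical helper, not a standard predicate */

CLAUSES
    chlist_string([], "").
    chlist_string([H|T], S):-
        chlist_string(T, S1),        /* build the string for the tail first */
        frontchar(S, H, S1).         /* (o,i,i): S is H followed by S1 */

GOAL
    chlist_string(['A','B','C'], S),
    write(S), nl.

Under these assumptions, the goal should bind S to "ABC".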

(2) fronttoken/3

fronttoken performs three related functions, depending on the type of flow pattern you use when calling it.

    fronttoken(String1, Token, Rest)
                                        /* (i,o,o) (i,i,o) (i,o,i) (i,i,i) (o,i,i) */

In the (i,o,o) flow variant, fronttoken finds the first token of String1, binds it to Token, and binds the remainder of String1 to Rest. The (i,i,o), (i,o,i), and (i,i,i) flow variants are tests; if the bound arguments are actually bound to the corresponding parts of String1 (the first token, everything after the first token, or both, respectively), fronttoken succeeds; otherwise, it fails.

The last flow variant (o,i,i) constructs a string by concatenating Token and Rest, then binds String1 to the result.

A sequence of characters is grouped as one token when it constitutes one of the following:

a name according to normal Visual Prolog syntax

a number (a preceding sign is returned as a separate token)

a non-space character

fronttoken is perfectly suited for decomposing a string into lexical tokens.

Example

Program 2 illustrates how you can use fronttoken to divide a sentence into a list of names. If Program 2 is given the goal:

    string_namelist("bill fred tom dick harry", X).

X will be bound to:

    [bill, fred, tom, dick, harry]

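/* Program ch13e02.pro */

DOMAINS
    name     = symbol
    namelist = name*                 /* list domain used by string_namelist */

PREDICATES
    string_namelist(string, namelist)

CLAUSES
    string_namelist(S,[H|T]):-
        fronttoken(S,H,S1),!,        /* take the first token off S */
        string_namelist(S1,T).
    string_namelist(_,[]).           /* no tokens left */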

(3) frontstr/4

frontstr splits String1 into two parts. It takes this format:

    frontstr(NumberOfChars, String1, StartStr, EndStr)
                                                                          /* (i,i,o,o) */

StartStr contains the first NumberOfChars characters in String1, and EndStr contains the rest. When frontstr is called, the first two parameters must be bound, and the last two must be free.
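
As a brief illustration of the (i,i,o,o) flow, the following goal splits a string after its third character. The variable names Start and End are chosen only for this sketch.

GOAL
    frontstr(3, "dogwood", Start, End),   /* Start = "dog", End = "wood" */
    write(Start, " ", End), nl.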

(4) concat/3

concat states that String3 is the string obtained by concatenating String1 and String2. It takes this format:

    concat(String1, String2, String3)
                
                             /* (i,i,o), (i,o,i), (o,i,i), (i,i,i) */

At least two of the parameters must be bound before you invoke concat, which means that concat always gives only one solution (in other words, it's deterministic). For example, the call

    concat("croco", "dile", In_a_while)

binds In_a_while to crocodile. In the same vein, if See_ya_later is bound, the call

    concat("alli", "gator", See_ya_later)

succeeds only if See_ya_later is bound to alligator.
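
The flow variants with a free first or second argument can be used to strip a known prefix or suffix. The goal below is a small sketch of the (i,o,i) variant; the variable name Tail is an assumption made for this example.

GOAL
    concat("croco", Tail, "crocodile"),   /* (i,o,i): Tail = "dile" */
    write(Tail), nl.

If the third argument does not start with the given first argument, the call simply fails.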

(5) str_len/2

str_len can perform three tasks: It either returns or verifies the length of a string, or it returns a string of blank spaces of a given length. It takes this format:

    str_len(StringArg, Length)              /* (i,o), (i,i), (o,i) */

str_len binds Length to the length of StringArg or tests whether StringArg has the given Length. The Length is an unsigned integer. In the third flow version, str_len returns a string of spaces with a given length. This can be used to allocate buffers, although makebinary is preferable, especially for binary data.
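
The three flow variants might be exercised as in the following sketch; the variable names Len and Blanks are assumptions introduced for illustration.

GOAL
    str_len("Visual Prolog", Len),        /* (i,o): Len = 13 */
    str_len("abc", 3),                    /* (i,i): succeeds, "abc" has length 3 */
    str_len(Blanks, 5),                   /* (o,i): Blanks is a string of five spaces */
    write(Len, " [", Blanks, "]"), nl.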

(6) isname/1

isname verifies that its argument is a valid name in accordance with Visual Prolog's syntax; it takes this format:

    isname(String)                                          /* (i) */

A name is a letter of the alphabet or an underscore character, followed by any number of letters, digits, and underscore characters. Leading and trailing spaces are ignored.
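
Because isname only succeeds or fails, it is typically used as a test inside a clause. The following sketch reports whether a string qualifies as a name; check_name is a hypothetical predicate written for this example, not a standard one.

PREDICATES
    check_name(string)               /* hypothetical helper */

CLAUSES
    check_name(S):-
        isname(S), !,
        write(S, " is a valid name"), nl.
    check_name(S):-
        write(S, " is not a valid name"), nl.

GOAL
    check_name("my_var1"),           /* starts with a letter: accepted */
    check_name("1abc").              /* starts with a digit: rejected */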

(7) format/*

format performs the same formatting as writef (see page 159), but format delivers the result in a string variable.

    format(OutputString,FormatString,Arg1,Arg2,Arg3,...,ArgN)
                                                                                /* (o,i,i,i,...,i) */
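
As a small sketch, assuming the %s (string) and %d (decimal integer) format specifiers described for writef, format could be called like this. The variable name Line and the sample values are assumptions made for illustration.

GOAL
    format(Line, "% is % years old", "Bill", 42),
    write(Line), nl.

The bare % directives above rely on the default formatting for each argument; "%s is %d years old" would be the explicitly typed equivalent. In both cases Line should be bound to "Bill is 42 years old".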

(8) subchar/3

subchar returns the character at a given position in a string; it takes the form:

    subchar(String,Position,Char)                       /* (i,i,o) */

The first character has position 1. For example,

    subchar("ABC",2,Char)

will bind Char to B. If the position specifies a character beyond the end of the string, subchar exits with an error.

(9) substring/4

substring returns a part of another string; it takes the form:

    substring(Str_in,Pos,Len,Str_out)                 /* (i,i,i,o) */

Str_out will be bound to a copy of the string starting with the Pos'th character, Len characters long, in Str_in. For example

    substring("GOLORP",2,3,SubStr)

binds SubStr to OLO. If Pos and Len specify a string partly or wholly outside of Str_in, substring exits with an error. However, it is not an error to ask for 0 bytes at the extreme end of the string:

    substring("ABC",4,0,SubStr)

will bind SubStr to an empty string (""), while

    substring("ABC",4,1,SubStr)     /* WRONG */

is an error. By the way, so is

    substring("ABC",5,-1,SubStr)    /* WRONG */
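
If a run-time error is undesirable, the bounds can be checked with str_len before substring is called. The following is only a sketch under that assumption; safe_substring is a hypothetical wrapper that fails instead of raising an error.

PREDICATES
    safe_substring(string, integer, integer, string)   /* hypothetical helper */

CLAUSES
    safe_substring(Str, Pos, Len, Sub):-
        Pos >= 1, Len >= 0,              /* reject non-positive positions and negative lengths */
        str_len(Str, Total),
        Pos + Len - 1 <= Total,          /* requested part must lie inside Str */
        substring(Str, Pos, Len, Sub).

GOAL
    safe_substring("GOLORP", 2, 3, S),   /* S = "OLO" */
    write(S), nl.

With this wrapper, safe_substring("ABC",4,0,S) still yields an empty string, while the two erroneous calls shown above fail quietly.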

(10) searchchar/3

searchchar returns the position of the first occurrence of a specified character in a string; it takes the form:

    searchchar(String,Char,Position)                   /* (i,i,o) */

For example,

    searchchar("ABEKAT",'A',Pos)

will bind Pos to 1. If the character isn't found, searchchar will fail. Note that searchchar is not re-satisfiable (i.e. if there are more occurrences of the specified character in the string, backtracking won't find them), but you can easily make your own:

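/* Program ch13e03.pro */

PREDICATES
    nondeterm nd_searchchar(string,char,integer)
    nondeterm nd_searchchar1(string,char,integer,integer)
    nondeterm nd_sc(string,char,integer,integer,integer)

CLAUSES
    nd_searchchar(Str,Ch,Pos):-
        nd_searchchar1(Str,Ch,Pos,0).

    nd_searchchar1(Str,Ch,Pos,Old):-
        searchchar(Str,Ch,Pos1),
        nd_sc(Str,Ch,Pos,Pos1,Old).

    /* Old is the number of characters already skipped */
    nd_sc(_,_,Pos,Pos1,Old):- Pos = Pos1+Old.
    nd_sc(Str,Ch,Pos,Pos1,Old):-
        frontstr(Pos1,Str,_,Rest),
        Old1 = Old + Pos1,
        nd_searchchar1(Rest,Ch,Pos,Old1).

GOAL
    nd_searchchar("abbalblablabbala",'a',P),
    write(P,'\n'),
    fail.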

This implements a non-deterministic predicate (nd_searchchar) which is plug-compatible with searchchar; if you don't mind typing the extra argument (Old) to nd_searchchar1 yourself, you can of course discard a level of calls.

(11) searchstring/3

searchstring returns the position of the first occurrence of a string in another string; it takes the form:

    searchstring(SourceStr,SearchStr,Pos)              /* (i,i,o) */

For example,

    searchstring("ABEKAT","BE",Pos)

will bind Pos to 2. If the search string isn't found in, or is longer than, the source string, searchstring will fail. As with searchchar, searchstring isn't re-satisfiable, but you can easily make your own. As a matter of fact, all that's necessary is to take Program 3 and do a global substitution with 'string' replacing 'char', and change the 'a' in the goal to a suitable search string, e.g. "ab":

    GOAL
       nd_searchstring("abbalblablabbala","ab",P),
       write(P,'\n'),
       fail.
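
Written out in full, the substitution described above might look like the following sketch. The predicate names nd_searchstring, nd_searchstring1 and nd_ss, and the nondeterm declarations, are assumptions derived from that description rather than a listing taken from a numbered program.

PREDICATES
    nondeterm nd_searchstring(string,string,integer)
    nondeterm nd_searchstring1(string,string,integer,integer)
    nondeterm nd_ss(string,string,integer,integer,integer)

CLAUSES
    nd_searchstring(Str,Sub,Pos):-
        nd_searchstring1(Str,Sub,Pos,0).

    nd_searchstring1(Str,Sub,Pos,Old):-
        searchstring(Str,Sub,Pos1),
        nd_ss(Str,Sub,Pos,Pos1,Old).

    /* Old is the number of characters already skipped */
    nd_ss(_,_,Pos,Pos1,Old):- Pos = Pos1+Old.
    nd_ss(Str,Sub,Pos,Pos1,Old):-
        frontstr(Pos1,Str,_,Rest),       /* drop everything up to the match start */
        Old1 = Old + Pos1,
        nd_searchstring1(Rest,Sub,Pos,Old1).

With the goal shown above, this version should write the positions 1, 8 and 11. Because only the characters up to and including the first character of each match are skipped, overlapping occurrences are reported as well.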