Investigations in Logic, Language and Computation
Erik Aarts
Abstract:
%Nr: DS-1995-19
%Author: Erik Aarts
%Title: Investigations in Logic, Language and Computation
In this dissertation the time complexity of three problems is considered.
The time complexity of a problem is the relation between the time a computer
needs to solve a problem and the size of that problem. In this relation, we
leave out constant factors. Suppose we have an object and an unsorted list,
and that we want to know whether the object occurs in the list or not. The
size of the problem is the length of the list. We call this length n. When
we walk through the list and compare each object in the list with the object
that we are looking for, the time needed is O(n) (order n) in the worst
case. When the relation between the computing time and the size of the
problem is a polynomial function (O(n k ) with k fixed), we say that the
problem is in P (Polynomial time).
Often a problem is hard to solve, but checking a solution is simple. Suppose
we have a big number which is the product of two big prime numbers and that
we are asked to find the two prime factors. A very simple algorithm is the
following: try all smaller numbers as a divisor of the number we have to
factor. If the number has n digits, we have to try O(10 n ) numbers as a
prime factor. Apparently, this is not polynomial time, but exponential. By
the way, there are much better algorithms, but these algorithms are not
polynomial time either. Checking a solution can be done in polynomial time,
however, if we have two numbers, we only have to multiply them in order to
see whether the product is indeed the number we had to factor. The class of
problems with the property that solutions can be checked easily, is called
NP (Nondeterministic Polynomial time). Problems in NP that are conjectured
to be not in P are called NP-complete. Factoring a number in prime numbers is
NP-complete.
The first part of this thesis describes the time complexity of problems in
two fields: categorial grammar and acyclic context-sensitive grammar.
Categorial grammars consist of a lexicon, in which types are assigned to
lexical elements, and a sequent calculus in which we can reason about
derivability of types. A type A is derivable from a sequence of types 130
Samenvatting to know whether it is in the categorial language or not. It is
not known whether this problem, when the standard Lambek calculus is used,
is in P or NP-complete. This dissertation shows that for certain fragments the
problem is in P. Those fragments are the grammars that use the non-associative
Lambek calculus and the second order calculus (with second order we mean
that the nesting depth is limited to two).
Acyclic context-sensitive grammars are rewrite grammars that generate tree
structures with crossing branches. We assign indices to context elements in
the rewrite rule. We look at the problem whether a certain string is in the
language generated. When we use normal context-sensitive grammars, this is a
very hard problem, harder than NP-complete problems. Therefore we use acyclic
grammars. We show that the problem, for these grammars, is in P if we vary
the sentences only, and keep the grammar fixed. If the grammar may change as
well, (and plays a role in the "size of the problem"), then the problem
becomes NP-complete.
The second part of this dissertation is about Prolog programs. Prolog is a
very simple, powerful programming language. The big difference with other
programming languages is that it is possible to code non-deterministic
programs. But computers are deterministic machines, so the Prolog program
must be converted to a deterministic program in some way. The standard way
to do this is depth-first with backtracking. If a choice must be made, the
first option is taken. When it turns out that this option does not lead to
any result, the next option is tried, and so on. In this dissertation we
consider an alternative search strategy (OLDT resolution). In this strategy,
one memoes which alternatives are tried and what the result was. This
strategy prevents that some subcomputation is done twice. In the standard
search, the search space can be represented as a tree. It is very well
possible that two nodes in the search space represent the same subproblem.
In OLDT resolution, all nodes represent different subproblems by definition;
the search space is not a tree anymore, but a graph. The idea that has been
elaborated in this dissertation is that the size of the search space is an
estimate for the time a Prolog program needs, because every branch in the
search space is visited only once. In the second part of this dissertation a
method is given to estimate the time a Prolog program needs under the
assumption that OLDT resolution is used in the execution of the program.
Finally we combine parts 1 and 2. We give Prolog programs, that compute
whether some string is in a categorial language (for both fragments). Then
we show that the time that those Prolog programs need, is polynomial in the
length of the sentence and in the length of the lexicon.