Logic Engineering. The Case of Description and Hybrid Logics
Carlos Areces

Abstract:

As the title indicates, there are two levels involved in the research carried out in this thesis: the general issue of understanding (and promoting) Logic Engineering, together with a detailed study of its particular instantiation for Description and Hybrid Languages.

For some years now, a trend has been developing in the field of computational logic: given the wide diversity of applications the field has advanced into (theorem proving, software and hardware verification, computational linguistics, knowledge representation, etc.), a multiplicity of formal languages has been developed, offering a wealth of alternatives to classical languages. With the advantages of this diversity of choice comes its complexity. How do we decide what the best formalism is for a given reasoning or modeling task? Or even more, what are the important rules to take into account when designing yet another formal language? How do we compare, how do we measure, how do we test? These are the questions that the young field of Logic Engineering is supposed to investigate and, if possible, answer.

We still know little about Logic Engineering, and as yet there are no general answers to these questions. Do not expect to find here a list of ``recipes'' for how things should be done. But much can be learned from analyzing in detail a particularly interesting case. This will be the main thrust of the work carried out in the thesis.

Description logics are a family of formal languages used for structured knowledge representation. They have been designed as a tool to describe information in terms of concepts and their interrelation (definitions), together with means to specify that certain elements of the domain actually fit such definitions (assertions). In addition, they provide a formal notion of inference in terms of this structured knowledge.
Description logics are the best example we know of a broad, homogeneous collection of formal languages with a clearly specified semantics (in terms of first-order models) devised to deal with particular applications. They offer an assortment of specialized inference mechanisms to handle tasks like knowledge classification, structuring, etc. The complexity of reasoning in the different languages of this family has been widely investigated; theorem provers effectively deciding some of the most expressive languages have been implemented (and they are among the fastest provers for non-classical languages available); and these languages have been successfully applied to many realistic problems, even at an industrial level.

Connections between description languages and modal logics have been investigated, but a unifying logical background theory explaining their expressive power and logical characteristics was largely missing. This is the role to be played by hybrid logics.

Hybrid languages are modal languages extended with the ability to explicitly refer to elements in the domain of a model. They were first introduced in the mid 1960s, in the field of temporal logic, and were subsequently developed mainly in a purely theoretical environment. The work in the field focused on investigating complete axiomatizations for these languages, characterizing their meta-logical properties and understanding their semantic and proof-theoretical behavior.

Hybrid languages provide the exact kind of expressive power required to match description languages. Having been optimized for applications, description logics are difficult to handle with classical model- and proof-theoretical tools, but given the close match between description and hybrid logics we will be able to apply these techniques to the hybrid logic counterpart of description logics instead.
Going in the other direction, description logics provide hybrid logics with extensively tested examples of useful languages, knowledge management lore, and implementations.

In this thesis we will draw these two complementary fields together and investigate in detail what each of them has to offer to the other. Given that the two areas have developed different techniques and evolved in divergent directions, ``trading'' between them will be especially fruitful. Description logics can export reasoning methods, complexity results and application opportunities, while hybrid logics have their model-theoretical tools, axiomatizations and analyses of expressive power to offer.

The particular aim of this thesis is, then, to explore and exploit the connections between description and hybrid logic, their similarities and differences. The main results we will present specifically concern this issue. But we hope to take the first steps in setting and discussing this work in the wider perspective of logic engineering, and provide a small contribution to the general issue of better understanding the rules behind the good design of new formal languages.

The thesis is organized in four parts. In the first, containing Chapter 1, we discuss different ways of identifying interesting fragments (and fragments of extensions) of first-order logic. We argue that traditional methods, like prenex normal form and finite variable fragments, are not completely satisfactory. We propose, instead, to capture relevant fragments _via translations_. The semantics of many formal languages (including modal, description and hybrid languages) can be given in terms of classical logics, and as such they can be considered fragments of classical languages.
But now, these fragments come together with an extremely simple presentation (modal languages, for example, are usually introduced as extensions of propositional logic) and with novel and powerful proof- and model-theoretical tools (simple tableau systems, elegant axiomatizations, fine-grained notions of equivalence between models, new model-theoretical constructions, etc.). Modal-like logics in general, and description and hybrid logics in particular, will be presented as examples of useful fragments identified in such a way.

Part II introduces both description and hybrid logics (in Chapters 2 and 3 respectively), providing the necessary background and the basic notions which will be used in the rest of the thesis. The chapters can be read independently and serve as introductions to the kinds of methods and results which have been developed in these areas. They also provide a detailed guide to the literature.

As we make clear in our presentation, description and hybrid logics are closely related, and their connections are spelled out in Chapter 4. We start by presenting already known embeddings of description languages into converse propositional dynamic logics, and discussing why they provide a less satisfactory match than the one obtained through hybrid languages. In particular, we highlight that two ingredients are needed for a successful embedding: the ability to refer to elements in the domain of a model, and the ability to make statements about the whole model from a local point. The first ingredient is needed to account for assertions, the second to account for definitions. Both are provided, in an elegant and direct way, by hybrid languages in the form of nominals, the satisfiability operator and the existential modality.
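
As a rough illustration of how these two ingredients are used, the correspondence can be sketched as follows. The notation is an assumption made for this sketch and need not coincide with the one used in the thesis: $C^{*}$ stands for the hybrid translation of a concept $C$, $\mathsf{A}$ is the universal dual of the existential modality, $a$ and $b$ are individuals read as nominals, and $R$ is a role read as a modality.

\[
\begin{array}{lcll}
C \sqsubseteq D & \longmapsto & \mathsf{A}(C^{*} \rightarrow D^{*}) & \mbox{(definition: a statement about the whole model)} \\
a : C & \longmapsto & @_{a}\, C^{*} & \mbox{(assertion: $a$ is an instance of $C$)} \\
(a,b) : R & \longmapsto & @_{a} \langle R \rangle b & \mbox{(assertion: $a$ is $R$-related to $b$)}
\end{array}
\]

The first line needs the global modality, since a definition constrains every element of the domain; the last two need nominals and the satisfiability operator, since an assertion talks about specific named elements.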
We also clarify the relation between local and global notions of consequence, the first being the standard notion of consequence for hybrid (and in general modal) languages while the second is predominant in the description logic community.

After providing two-way satisfaction-preserving translations between description and hybrid logics, we explore the transfer of results. We show how the embedding into hybrid languages provides sharp upper and lower complexity bounds, separations in terms of expressive power and characterizations, and meta-logical properties like interpolation and Beth definability. Concerning interpolation and Beth definability, to the best of our knowledge this is the first time that such results have been investigated in connection with description languages. Many of these results are obtained from the general theorems we will prove in Part III. We also discuss how results from description logics can fill important gaps which have not yet received attention in the hybrid logic community. Some examples are the known complexity bounds concerning description logics with counting operators, or the PSpace results when certain syntactic restrictions are imposed on the existential modality.

Part III of the thesis contains the core technical work. In Chapter 5 we show how ideas from description and hybrid logics can be put to work with benefit even when the subject is purely modal. In particular, aided by the notions of nominal/individual, we define well-behaved direct resolution methods for modal languages. This example shows how the additional flexibility provided by the ability to name states can be used to greatly simplify reasoning methods. We proceed to build on the basic resolution method and obtain extensions for description and hybrid languages.

In Chapters 6 and 7 we take a hybrid logic perspective as we dive into model-theoretical issues.
But we have already demonstrated in Chapter 4 how hybrid logic results shed light on description languages.

In Chapter 6 we turn to expressive power. We start by considering $\Hls(@,\downarrow)$, a very expressive hybrid language. The two main results concerning this language are Theorems~\ref{the:charac} and~\ref{general-arrow}. The first theorem provides a fivefold characterization of the first-order formulas equivalent to the translation of a formula in $\Hls(@,\downarrow)$. In particular, it identifies this fragment as the set of formulas which are invariant for generated submodels. Theorem~\ref{general-arrow} shows that the arrow interpolation property holds not only in this language, but also in any system obtained from $\Hls(@,\downarrow)$ by the addition of pure axioms.

In a more general perspective, the results in Chapter 6 show that $\Hls(@,\downarrow)$ is surprisingly well behaved in model-theoretical terms. As we discuss in this chapter, it can be characterized in many different and natural ways, it responds with ease to both modal and first-order techniques, and it possesses one of the strongest versions of the interpolation and Beth properties we are aware of for modal languages. For these reasons, $\Hls(@,\downarrow)$ can be used as a ``logical laboratory'': what we learn from it using the plethora of techniques it offers can provide us, in many cases, with intuitions on restrictions and extensions. We see this process in action throughout the chapter, as we are able to transfer certain results from $\Hls(@,\downarrow)$ to extensions and sublanguages.

In Chapter 7 we discuss complexity. We start with an excursion into undecidability and we prove that a small fragment of $\Hls(\downarrow)$ already has an undecidable local satisfiability problem. This is a hint that only very severe restrictions on the $\downarrow$ binder will bring us back into decidability.
We show in Theorem~\ref{the:hl-decnn} that if we restrict ourselves to sentences of $\Hls(\pmodop,\umodop,@,\downarrow)$, where $\downarrow$ appears non-nested, decidability is regained. In Chapter 4 we have already shown that even this restricted use of binding proves interesting from a description logic perspective.

We then turn to weaker languages (without binders) which remain closer to standard description languages. In Theorem~\ref{the:b.k.pspace} we prove that the addition of nominals and the satisfiability operator to the basic modal language $\logic{K}$ does not modify its complexity, while it greatly increases its expressive power. Interestingly, the same is not true when we extend the basic temporal language $\logic{K}_t$: the addition of just one nominal increases the complexity of the local satisfiability problem to \exptime, when the class of all models is considered. But usually temporal languages are interpreted on models where the accessibility relation is forced to adopt a ``time-like'' structure, the two best known cases being strict linear orders (linear time) and transitive trees (branching time). We prove in Theorems~\ref{the:t.linear} and~\ref{the:t.trees.pspace} that over these classes of models, complexity is tamed and again coincides with the complexity of the basic temporal language.

Part IV contains our conclusions and directions for further research. Here we highlight some of the lessons we have learned during the research presented in this thesis. As we said, we cannot hope yet for general answers concerning logic engineering, but we can proceed by analogy: the same questions we posed and answered for description and hybrid logics can be tested on other formal languages, and we have presented tools and methodologies (bisimulations, model construction and comparison games, translations, etc.) which are powerful and versatile enough to be useful in many diverse situations.
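
To make the hybrid operators discussed above more concrete (nominals, the satisfiability operator $@$ and the $\downarrow$ binder), the following is a minimal sketch of a model checker over a finite model. It is only an illustration under stated assumptions, not code from the thesis: the tuple encoding of formulas, the dictionary layout of the model and the function name `holds` are all hypothetical choices made for this example.

```python
# Illustrative sketch only: formula encoding and model layout are assumptions.
# Formulas are nested tuples:
#   ("prop", p)       propositional symbol
#   ("nom", i)        nominal, true at exactly one state
#   ("var", x)        state variable bound by a "down" binder
#   ("not", f), ("and", f, g)
#   ("dia", f)        diamond: some successor satisfies f
#   ("at", n, f)      @_n f: f holds at the state named by n
#   ("down", x, f)    binder: bind x to the current state, then evaluate f

def holds(model, state, formula, env=None):
    """Return True if `formula` is true at `state` in `model` under `env`."""
    env = env or {}
    op = formula[0]
    if op == "prop":
        return state in model["val"].get(formula[1], set())
    if op == "nom":
        return model["nom"][formula[1]] == state
    if op == "var":
        return env[formula[1]] == state
    if op == "not":
        return not holds(model, state, formula[1], env)
    if op == "and":
        return (holds(model, state, formula[1], env)
                and holds(model, state, formula[2], env))
    if op == "dia":
        return any(holds(model, t, formula[1], env)
                   for (s, t) in model["rel"] if s == state)
    if op == "at":
        name = formula[1]
        # The name may be a bound state variable or a nominal of the model.
        target = env[name] if name in env else model["nom"][name]
        return holds(model, target, formula[2], env)
    if op == "down":
        return holds(model, state, formula[2], {**env, formula[1]: state})
    raise ValueError(f"unknown operator: {op!r}")


# A three-state model: 0 -> 1, 1 -> 1, 1 -> 2; nominal i names state 2.
model = {
    "rel": {(0, 1), (1, 1), (1, 2)},
    "val": {"p": {2}},
    "nom": {"i": 2},
}

# "The current state sees itself": bind x here, then look for x among successors.
reflexive = ("down", "x", ("dia", ("var", "x")))
```

With this model, `holds(model, 1, reflexive)` is true because state 1 has a self-loop, while `holds(model, 0, reflexive)` is false. The formula `("at", "i", ("prop", "p"))` is true at every state, since $@$ jumps to the state named by `i` regardless of where evaluation starts. Reflexivity of the current state is the kind of property that the basic modal language cannot express locally but the binder can.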