Abstract
We present a visual and interactive taxonomic Artificial Intelligence (AI) tool, the Automated Taxonomic Concept Reasoner (ATCR), whose graphical web interface is under development and will also become available via an Application Programming Interface (API). The tool employs automated reasoning (Beeson 2014) to align multiple taxonomies visually, in a web browser, using user or expert-provided taxonomic articulations, i.e. "Region Connection Calculus (RCC-5) relationships between taxonomic concepts, provided in a specific logical language (Fig. 1). It does this by representing the problem of taxonomic alignment under these constraints in terms of logical inference, while performing these inferences computationally and leveraging the powerful Microsoft Z3 Satisfiability Modulo Theory (SMT) solver (de Moura and Bjørner 2008). This tool represents further development of utilities for the taxonomic concept approach, which fundamentally addresses the challenge of robust biodiversity data aggregation in light of multiple conflicting sources (and source classifications) from which primary biodiversity data almost invariably originate. The approach has proven superior to aggregation, based just on the syntax and semantics provided by the Darwin Core standard Franz and Sterner 2018).
Fig. 1 provides an artificial example of such an alignment. Two taxonomies, A and B, are shown. There are five taxonomic concepts, A.One, A.Two, A.Three, B.One and B.Two. A.Two and A.Three are sub-concepts (children) of A.One, and B.Two is a sub-concept (child) of B.One. These are represented by the direction of the grey arrows. The undirected mustard-coloured lines represent relationships, i.e., the articulations referred to in the previous paragraph. These may be of five kinds: congruent (==), includes (<) and included in (>), overlap (><), and disjointness. These five relationships are known in the AI literature as the Region Connection Calculus-5 (RCC-5) (Randell et al. 1992, Bennett 1994, Bennett 1994), and taken exclusively and in conjunction with each other, have certain desirable properties with respect to the representation of spatial relationships. The provided relationship (i.e. the articulation) may also be an arbitrary disjunction of these five fundamental kinds, thus allowing for representation of some degree of logical uncertainty. Then, and under three assumptions that:
"sibling" concepts are disjoint in their instances,
all instances of a parent concept are instances of at least one of its child concepts, and
every concept has at least one instance - the SMT-based automated reasoner is able to deduce the relationships represented by the undirected green lines. It is also able to deduce disjunctive relationships where these are logically implied.
"sibling" concepts are disjoint in their instances,
all instances of a parent concept are instances of at least one of its child concepts, and
every concept has at least one instance - the SMT-based automated reasoner is able to deduce the relationships represented by the undirected green lines. It is also able to deduce disjunctive relationships where these are logically implied.
ATCR is related to Euler/X (Franz et al. 2015), an existing tool for the same kinds of taxonomic alignment problems, which was used, for example, to obtain an alignment of two influential primate classifications (Franz et al. 2016). It differs from Euler/X in that it employs a different logical encoding that enables more efficient and more informative computational reasoning, and also in that it provides a graphical web interface, which Euler/X does not.