Today, a follow-up to Writing the Worst Datalog Ever in 26loc, maybe even the start of a series.🍿

Our 26-loc Datalog is naive. Nothing personal, it's a technical term: each iteration in `saturate`

rederives all the facts derived plus hopefully some new ones. The last iteration is guaranteed to be 100% redundant since by definition it's the one which derived nothing new!

Let's engineer a bad case (not that it's difficult given how purposefully unsophisticated our code is):

```
user=> (q
(set (for [i (range 50)] [:edge (inc i) i]))
'([i] [:edge+ 1 i])
'[([:edge+ i j] [:edge i j])
([:edge+ i j] [:edge+ i k] [:edge k j])])
#{[0]}
```

It runs in 10 seconds. Replace 50 by 100 and it runs in 4 minutes! 🐌

The solution to all these useless rederivations is well known and called **semi-naive evaluation**!

Don't forget: when we're not busy writing silly Datalog implementations, we are available to help you on your Clojure projects or working on ClojureDart or on our app Paktol (The positive spending tracker where money goes up!).

The idea behind semi-naive evaluation is to not keep rederiving from the same facts at each iteration. So the rule is to only consider facts which can be derived by using at least one fresh fact (derived during the previous iteration).

`saturate`

The first step is to split facts in two: fresh facts (`dfacts`

—the `d`

stands for diff or delta) and old news (`facts`

).

In the `saturate`

loop, we initialize `dfacts`

with the initial set of `facts`

because at the start of the computation everything is fresh. We keep looping while `dfacts'`

is not empty.

We will modify `match-rule`

to only return facts derived by using at least one fresh fact. However we'll still have to post-process its returned values with `(remove facts')`

just in case it accidentally rederives old facts.

```
(defn saturate [facts rules]
(loop [dfacts facts, facts #{}]
(let [facts' (into facts dfacts)
dfacts' (into #{} (comp (mapcat #(match-rule dfacts facts %)) (remove facts')) rules)]
(cond->> facts' (seq dfacts') (recur dfacts')))))
```

`match-rule`

Not much changes, we just pass an extra `dfacts`

argument through to `match-patterns`

.

```
(defn match-rule [dfacts facts [head & patterns]]
(for [env (second (match-patterns patterns dfacts facts))]
(into [] (map #(env % %)) head)))
```

Oh yes, that's true: we call `second`

on the value returned by `match-patterns`

and we explain why right below👇.

`match-patterns`

Here is usually where semi-naive gets gory in textbooks where a typical rule:

becomes a monster expression 🐲 such as:

👆And this is a simplified version since we lumped EDB and IDB together in the previous article.

It's tedious to implement and `match_patterns`

would end very different from its naive version.

However we can evolve the existing `match_patterns`

rather than replace it by looking at automatic differentiation with dual numbers for inspiration.

Automatic differentiation with dual numbers feels like magic: you compute the original function with special numbers and you get both the original result but also the value of the derivative and without knowing the expression of the derivative!

For example let's say you want to compute `x^2+x`

at 7, then you compute as usual but by writing `7+ε`

instead of 7 and simplifying using the magic property that `ε^2=0`

. (A non-null number whose square is zero isn't more bizarre than a number whose square is -1...)

Here we have computed both the value (56) of `x^2+x`

for `x=7`

but we also computed the value (15) of the derivative of `x^2+x`

(`2x+1`

) without knowing the formula of the derivative!

The monster expression 🐲 above is a kind of derivative and we'd like to compute it without implementing it.

In the same way `x`

is replaced by `x + ε`

in dual numbers we are going to replace `envs`

by `[envs denvs]`

where `envs`

are environments created using strictly matches over old facts and `denvs`

environments where at least one fresh fact was matched against.

The original `(for [fact facts env envs] ...)`

is thus going to be declined in 4 versions: `facts×envs`

, `facts×denvs`

, `dfacts×denvs`

and `dfacts×envs`

. Only the first one contributes to the `envs`

component; all the others by having a `d`

on `envs`

or `facts`

contribute to the `devs`

component.

Last, we have to be careful to not eagerly compute `envs`

since it's exactly what we want to avoid: rediriving the old facts. We do so by delaying its actual computation with `delay`

.

```
(defn match-patterns [patterns dfacts facts]
(reduce
(fn [[envs denvs] pattern]
[(-> #{} (into (for [fact facts env @envs] (match pattern fact env))) (disj nil) delay)
(-> #{}
(into (for [fact facts env denvs] (match pattern fact env)))
(into (for [fact dfacts env denvs] (match pattern fact env)))
(into (for [fact dfacts env @envs] (match pattern fact env)))
(disj nil))])
[(delay #{{}}) #{}] patterns))
```

which keeps the same structure as the original naive version:

```
(defn match-patterns [patterns facts]
(reduce
(fn [envs pattern]
(-> (for [fact facts env envs] (match pattern fact env))
set (disj nil)))
#{{}} patterns))
```

The `set`

has been replaced by `into #{}`

to make `envs`

and `denvs`

computations more similar but otherwise the `envs`

component of the semi-naive version is the original `envs`

computation in the naive version.

`q`

Not much to say except we pass an empty set as the `facts`

parameter to `match-rule`

.

```
(defn q [facts query rules]
(-> facts (saturate rules) (match-rule #{} query) set))
```

🥳 Here it is: semi-naive datalog in 30 lines of code!

Remember the bad case which took 10s for 50 and 4 minutes for 100?

```
user=> (q
(set (for [i (range 50)] [:edge (inc i) i]))
'([i] [:edge+ 1 i])
'[([:edge+ i j] [:edge i j])
([:edge+ i j] [:edge+ i k] [:edge k j])])
#{[0]}
```

Now it takes 350ms for 50 and 5s for 100! **It's respectively 30 and 50 times faster**: progress! 🎉

This datalog is maybe vaguely better but it's still as minimal/useless as in our previous article: we have made the engine more efficient but we don't have made the language more powerful.

See, if we try to compute Bart's siblings (reusing `edb`

and `rules`

from 26-loc Datalog):

```
user=> (q edb '([c] [:parent :bart p] [:parent c p]) rules)
#{[:lisa] [:maggie] [:bart]}
```

Bart is returned as his own sibling! This is because we can't constraint a variable to have a different value than another.

This is what we'll fix in the next installment of this Datalog series, we'll add constraints to be able to write:

```
([:sibling a b] [:parent a p] [:parent b p] (not= a b))
```

and have:

```
user=> (q edb '([c] [:sibling :bart c]) rules)
#{[:lisa] [:maggie]}
```

Share online if you'd like to see more open ended exploratory code (including keeping to grow this datalog)!

```
(defn match [pattern fact env]
(when (= (count pattern) (count fact))
(reduce (fn [env [p v]]
(let [p-or-v (env p p)]
(cond
(= p '_) env
(= p-or-v v) env
(symbol? p-or-v) (assoc env p v)
:else (reduced nil))))
env (map vector pattern fact))))
(defn match-patterns [patterns dfacts facts]
(reduce
(fn [[envs denvs] pattern]
[(-> #{} (into (for [fact facts env @envs] (match pattern fact env))) (disj nil) delay)
(-> #{}
(into (for [fact facts env denvs] (match pattern fact env)))
(into (for [fact dfacts env denvs] (match pattern fact env)))
(into (for [fact dfacts env @envs] (match pattern fact env)))
(disj nil))])
[(delay #{{}}) #{}] patterns))
(defn match-rule [dfacts facts [head & patterns]]
(for [env (second (match-patterns patterns dfacts facts))]
(into [] (map #(env % %)) head)))
(defn saturate [facts rules]
(loop [dfacts facts, facts #{}]
(let [facts' (into facts dfacts)
dfacts' (into #{} (comp (mapcat #(match-rule dfacts facts %)) (remove facts')) rules)]
(cond->> facts' (seq dfacts') (recur dfacts')))))
(defn q [facts query rules]
(-> facts (saturate rules) (match-rule #{} query) set))
```