All this talk about Artificial General Intelligence (AGI) made me wonder how far GPT has got towards formal logic. I should clarify: if it turns out it can do a lot of formal logic, I don’t think that is any kind of sign of AGI. The two are conceptually related but not correlated. Humans (supposedly) have AGI, yet they often fail at formal logic. Plenty of computer algorithms have been invented for formal logic, and we wouldn’t consider them AGI.

What I am trying to find out is how much formal logic GPT has extracted and abstracted out of language corpora. I know the term ‘abstraction’ (or even ‘generalisation’) might rub the statisticians among the readers the wrong way. So I should clarify this too: I don’t mean abstraction in the sense of how it works internally, but in the sense of how the results appear to an observer. So even if GPT isn’t actually doing any generalisation, if it looks like generalisation to me, I’ll call it that here.

For us to be able to say GPTs can do formal logic, I’d like to see, at a minimum, the following:

- *Modus ponens*. If P implies Q, and we know P to be true, it can infer that Q is true. For example, if we know all apples are fruits, and we know a particular object to be an apple, it should be able to infer that that particular object is also a fruit.
- *Modus tollens*. If P implies Q, and we know Q to be false, it can infer that P is false too. For example, if we know all apples are fruits, and we know a particular object to be NOT a fruit, it should be able to infer that that particular object is not an apple either.
- Logical connectives ∧ (and) and ∨ (or) in combination with modus ponens and modus tollens.

I’m not trying to make an exhaustive list; there are various kinds of formal logic, and deciding which logic and what subset to test is a research question in and of itself. But the three above would definitely be a sign of Predicate Logic.
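To pin down what I mean by the two rules, here is a minimal sketch of them in Python. The encoding is mine (implications as antecedent/consequent pairs, facts as a truth-value dictionary); it is only meant to make the rules concrete, not to model anything GPT does internally.

```python
# Implications are (antecedent, consequent) pairs; facts map
# proposition names to True/False.

def modus_ponens(implications, facts):
    """If P -> Q and P is True, conclude Q is True."""
    derived = dict(facts)
    for p, q in implications:
        if derived.get(p) is True:
            derived[q] = True
    return derived

def modus_tollens(implications, facts):
    """If P -> Q and Q is False, conclude P is False."""
    derived = dict(facts)
    for p, q in implications:
        if derived.get(q) is False:
            derived[p] = False
    return derived

# "All apples are fruits" as apple -> fruit
rules = [("apple", "fruit")]
print(modus_ponens(rules, {"apple": True}))    # derives fruit = True
print(modus_tollens(rules, {"fruit": False}))  # derives apple = False
```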

I tried the following using ChatGPT (GPT-4). For every question I started a new session.

**Q: If x is an apple, is x a fruit?**

A: Yes, if x is an apple, then x is a fruit. An apple is a type of fruit.

That was too easy, because it *knows* what an apple is. Let’s try arbitrary identifiers.

**Q: All Jmuis are Gbuis. My jee is a Jmui, is my jee a Gbui?**

A: Yes. Given the statements:

1. All Jmuis are Gbuis.

2. My jee is a Jmui.

It logically follows that your jee, being a Jmui, is also a Gbui.

Well, it’s so good it’s upsetting. Let’s pile on logical connectives:

**Q: All Hbuis are Gbuis if they’re also Jbuis. My jee is a Hbui and a Jbui. Is my jee a Gbui?**

A: Let’s break down the information provided step by step:

1. “All Hbuis are Gbuis if they’re also Jbuis.” This means that for something to be a Gbui, it has to be both a Hbui and a Jbui.

2. “My jee is a Hbui and a Jbui.” Given this, your jee satisfies both the conditions of being a Hbui and a Jbui.

Given the first statement, since your jee is both a Hbui and a Jbui, it means that your jee is a Gbui. So, yes, your jee is a Gbui.

Very impressive. How about negation?

**Q: All Hbuis are Gbuis if they’re also Jbuis. My jee is a Hbui but not a Jbui. Is my jee a Gbui?**

A: Let’s break it down step by step:

Statement 1: All Hbuis are Gbuis if they’re also Jbuis. This means that being a Hbui by itself doesn’t guarantee being a Gbui. For a Hbui to be a Gbui, it must also be a Jbui.

Statement 2: My jee is a Hbui but not a Jbui. From this, we understand that your jee is a Hbui, but it’s not a Jbui.

Given the conditions in Statement 1, for a Hbui to be a Gbui, it must also be a Jbui. But since your jee is not a Jbui, then based on the given information:

Your jee is not a Gbui.

Let’s make it more complicated by adding one more level of implication.

**Q: All Hbuis are Gbuis if they’re also Jbuis. All Mbuis are Hbuis. My jee is a Mbui, and a Jbui. Is my jee a Gbui?**

A: Let’s break this down step by step:

1. “All Hbuis are Gbuis if they’re also Jbuis.” This means that for something to be a Gbui, it needs to be both a Hbui and a Jbui.

2. “All Mbuis are Hbuis.” This means that if something is an Mbui, then it’s also a Hbui.

3. “My jee is a Mbui, and a Jbui.” From this, we can deduce that your jee is a Hbui (because all Mbuis are Hbuis) and it’s also a Jbui.

Given the first statement, if something is both a Hbui and a Jbui, then it’s a Gbui.

So, since your jee is both a Hbui and a Jbui, then it is a Gbui.

Answer: Yes, your jee is a Gbui.
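The chained inference above is classic forward chaining over Horn clauses, which can be sketched in a few lines. The encoding is mine: each rule is a (set of premises, conclusion) pair, with the predicate names taken from the dialogue, and modus ponens is applied repeatedly until no new facts appear.

```python
def forward_chain(rules, facts):
    """Apply every rule whose premises all hold until a fixed point."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [
    ({"Hbui", "Jbui"}, "Gbui"),  # Hbuis that are also Jbuis are Gbuis
    ({"Mbui"}, "Hbui"),          # all Mbuis are Hbuis
]
print(forward_chain(rules, {"Mbui", "Jbui"}))  # includes "Gbui"
```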

Again, too good. Let’s try modus tollens.

**Q: All Hbuis are Gbuis if they’re also Jbuis. My jee is not a Gbui, but it is a Hbui. Is my jee a Jbui?**

A: Let’s break this down step by step.

1. All Hbuis are Gbuis if they’re also Jbuis. This implies that if something is both a Hbui and a Jbui, then it is a Gbui.

2. My jee is not a Gbui, but it is a Hbui. This means your jee does not have the qualities of a Gbui, but does have the qualities of a Hbui.

To determine if your jee is a Jbui, let’s look at the reverse of the first statement: If a Hbui is not a Gbui, it cannot be a Jbui.

Your jee is a Hbui, but it is not a Gbui. So, using the logic from the first statement, your jee cannot be a Jbui.

Still good with correct reasoning.

I’m looking to throw a curve ball at this point.

**Q: All Hbuis are Gbuis if they’re also Jbuis. My jee is not a Gbui, but it is a Jbui. All Zorks are Hbuis if they’re also Storks. I know my jee is a Zork. Is my jee a Stork?**

A: Let’s break this down step by step:

1. “All Hbuis are Gbuis if they’re also Jbuis.” This implies that being a Jbui is a necessary condition for an Hbui to be a Gbui. If something is both an Hbui and a Jbui, then it is a Gbui.

2. “My jee is not a Gbui, but it is a Jbui.” Your jee cannot be both an Hbui and a Jbui since it would then have to be a Gbui based on the first statement. This means that your jee is not an Hbui.

3. “All Zorks are Hbuis if they’re also Storks.” If something is both a Zork and a Stork, then it is an Hbui.

4. “I know my jee is a Zork.” So, your jee is a Zork, but from point 2, we deduced that your jee is not an Hbui.

Given these conditions, it would be inconsistent for your jee to be both a Zork and a Stork since it would then have to be an Hbui. However, since we’ve already deduced that your jee is not an Hbui, this means your jee cannot be a Stork.

I’ll have to call it now. **GPT-4 is a lean mean logic machine**. Of course it depends on where you draw the line. At this point it not only recognises logical inference, it also recognises that what we have here is what we’d call *a theory*. Theories may well be inconsistent, but this one isn’t. It’s a bit didactic that it points this out, but that’s probably because its knowledge of logic comes from the educational material (books, etc.) it was trained on.
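That the theory is consistent, and that it forces the Stork answer, can be checked by brute force: enumerate every truth assignment to the five predicates (as applied to jee) and keep the ones satisfying all four statements. The propositional encoding below is my own; the predicate names come from the dialogue.

```python
from itertools import product

atoms = ["Hbui", "Gbui", "Jbui", "Zork", "Stork"]

def consistent(v):
    return (
        (not (v["Hbui"] and v["Jbui"]) or v["Gbui"])      # Hbui & Jbui -> Gbui
        and not v["Gbui"] and v["Jbui"]                    # jee: not Gbui, but Jbui
        and (not (v["Zork"] and v["Stork"]) or v["Hbui"])  # Zork & Stork -> Hbui
        and v["Zork"]                                      # jee is a Zork
    )

models = [dict(zip(atoms, vals))
          for vals in product([True, False], repeat=len(atoms))
          if consistent(dict(zip(atoms, vals)))]

print(len(models) > 0)                      # True: the theory is satisfiable
print(all(not m["Stork"] for m in models))  # True: Stork fails in every model
```

So the theory has exactly one model, and jee is not a Stork in it, matching GPT-4’s conclusion.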

It’s observable that the longer the inference chain I present, the more steps it produces to work out the answer. Eventually its context length will be the limiting factor. So if its reasoning extended beyond a few pages (apparently for ChatGPT with GPT-4 it’s 4096 tokens, roughly a handful of pages at this kind of text density), it might start producing nonsense. But it wouldn’t be fair to call that a failure; context lengths have been growing.

Another way to break it would be to give it something that requires recursion, like Peano Arithmetic expressed in plain language. I think it’d be able to do that to some depth, but eventually it will be limited by context length. It’d be far more interesting if it started to fail before it reached its context length. Maybe I’ll try this next.
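To show what kind of recursion that test would demand, here is Peano addition sketched over successor terms (my own toy encoding). Every extra level of nesting is one more inference step a model would have to carry through its context.

```python
# Numbers are "0" or ("S", n); addition recurses on the second argument:
#   a + 0    = a
#   a + S(b) = S(a + b)

def succ(n):
    return ("S", n)

def add(a, b):
    if b == "0":
        return a
    return succ(add(a, b[1]))

two = succ(succ("0"))
three = succ(two)
print(add(two, three))  # a tower of five successors around "0"
```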

Still, if we tried to use GPT for logical reasoning in a professional setting, we’d be taking a big risk, because GPT’s ‘tape’ (as in the tape of a Turing Machine) is narration, and narratives in the wild are full of inconsistencies. As long as it can spot a modus ponens, fine; but if the text starts to look like something else, it’ll jump ship before we know it.

That’s why we need more robust forms of Transformers. Recently I’ve been working on Neuro-Symbolic AI, a field that tries to bring the efficiency of Neural Networks to Symbolic AI, as in formal logic. Specifically, I’m trying to reconstruct the algorithms that make up CombInduce, our inductive synthesizer, so that they can be used to train models that represent sound logical structures. That way we can inspect how they work and be able to trust them.