Introduction
The Raven Paradox, also known as Hempel’s Paradox, is a paradox introduced by Carl Gustav Hempel (C. G. Hempel 1937; Carl G. Hempel 1945) that explores the notion of qualitative confirmation—specifically, what constitutes evidence for a particular hypothesis.
Statements with empirical content require validation through reference to the empirical reality around us. However, it is not possible to conclusively verify certain statements, such as universally quantified ones, by means of deductive logic. The sentence “All ravens are black” is an example of a universally quantified statement. No finite amount of observation could verify this statement, as it asserts validity for an infinite number of instances. Although it is not possible to conclusively verify such general hypotheses through observation and deductive logic, it can intuitively be said that each observation of a “black raven” provides evidence for the hypothesis and confirms it. In this sense, the non-deductive relationship between evidence and hypothesis is crucial for scientific practice, as science largely deals with such general hypotheses. Nevertheless, attempts to formalize an explicit theory of confirmation/disconfirmation encounter problems, including Hempel’s Raven Paradox. To fully grasp why Hempel formulated this paradox, it is important to understand the historical and philosophical context in which it emerged.
Hempel was a pioneering figure within the Logical Positivism movement, contributing greatly to its refinement and establishment within philosophical circles, and the paradox should be understood within this context. Logical Positivism was a philosophical movement that flourished in the 1920s and 1930s in central Europe, particularly among members of the Vienna Circle. It is characterized by an approach to philosophy, grounded in the belief that meaningful statements are those that can be empirically verified or are tautological (true by definition). This principle, known as the verifiability criterion of meaningfulness, was central to their efforts to distinguish “meaningful” questions—those that can be the subject of philosophical and scientific endeavour—from “meaningless” ones, which lack empirical content or logical form and thus fall outside the realm of legitimate discourse. By this standard, sentences that are neither tautological nor empirically verifiable, such as theological or metaphysical claims, were considered meaningless. Logical Positivists believed that scientific methodology, with its emphasis on empirical testing and observation, was the ideal way to acquire knowledge. They argued that the sole purpose of philosophy should be to clarify the language of science, ensuring that all philosophical discourse is grounded in empirical reality and logical analysis. However, while the verifiability criterion of meaningfulness ruled out statements without empirical content as “meaningless,” it also classified many universally quantified scientific statements, such as physical laws and theories, as “meaningless.”, which ultimately challenged the very foundation of scientific research.
Hempel’s Theory of Confirmation
Referring to confirmation as qualitative highlights the distinction between qualitative and quantitative confirmation. The notion of qualitative confirmation focuses on the nature of non-deductive relation between evidence and the hypothesis, without attempting to quantify it. While quantitative confirmation involves assigning numerical values to the strength of evidence, qualitative confirmation deals with how an evidence supports or undermines a hypothesis based purely on logical or conceptual connections. In exploring formal theories of confirmation, several approaches have emerged. The hypothetico-deductive (HD) model of confirmation suggests that evidence confirms a hypothesis if the hypothesis successfully predicts the evidence. Meanwhile, Bayesian confirmation theory employs probability to assess the degree to which evidence supports a hypothesis, focusing on how new evidence affects the likelihood of a hypothesis being true and thus follows a more quantitative approach. In this essay, we will focus on Hempel’s theory of confirmation to answer the question “What does it mean, that an evidence \(E\) supports a hypothesis \(H\) ?”
Hempel’s theory of confirmation is essentially a reconstruction of the Jean Nicod’s criterion of confirmation (Nicod and Lalande 1925). Nicod’s criterion is as follows:
“Consider the formula or the law: \(A\) entails \(B\). How can a particular proposition, or more briefly, a fact, affect its probability? If this fact consists of the presence of \(B\) in a case of \(A\), it is favourable to the law ‘\(A\) entails \(B\)’; on the contrary, if it consists of the absence of \(B\) in a case of \(A\), it is unfavourable to this law. It is conceivable that we have here the only two direct modes in which a fact can influence the probability of a law[…] Thus, the entire influence of particular truths or facts on the probability of universal propositions or laws would operate by means of these two elementary relations which we shall call confirmation and invalidation.”
In other words, according to Nicod’s criterion, a hypothesis of the form “All \(A\)’s are \(B\)” is confirmed by an observation of an \(A\) that is also a \(B\), and disconfirmed by an observation of an \(A\) that is not a \(B\).
Nevertheless, there are a few shortcomings of Nicod’s criterion of confirmation:
It is only applicable to hypotheses of universal conditional form, the applicability of the criterion is restricted to certain form of hypotheses.
It does not fulfill the equivalence condition. It makes confirmation depend not only on content of the hypothesis but also on its formulation.
The second shortcoming requires further explanation, what is the equivalence condition?
The equivalence condition basically states, that the hypotheses, which are logically equivalent, should be confirmed/disconfirmed by the same set of evidences.
Consider following two hypotheses:
- \(S_1\): All ravens are black
- \(S_2\): All non-black things are non-ravens
These two hypotheses are logically equivalent but an observation of a raven, which is black, would confirm the hypothesis \(S_1\) but would be neutral with respect to \(S_2\). Similarly an observation of an object, which is not black and not a raven, would confirm the hypothesis \(S_2\) but would be neutral with respect to \(S_1\). This suggests that, the Nicod’s criterion of confirmation does not purely depend on the content of the hypothesis but also on its formulation.
The Raven Paradox and Resolution Attempts
The Raven Paradox is a controversial conclusion derived from premises that are seemingly trivial, including the equivalence condition. Consider the following premises:
- Every observation of an A that is B supports a hypothesis of the form \((\forall x)(Ax \rightarrow Bx)\), meaning “All \(A\) are \(B\)”
- Hypotheses, which are logically equivalent, are confirmed/disconfirmed by the same set of evidences.
- \((\forall x)(Ax \rightarrow Bx)\) is logically equivalent to \((\forall x)(\neg Bx \rightarrow \neg Ax)\)
Now, consider the hypothesis \(H\) = “All ravens are black” or \((\forall x)(Ax \rightarrow Bx)\), where \(A\) refers to ravens and \(B\) to blackness. According to the premise \(S_3\), the hypothesis \(H\) is logically equivalent to the hypothesis \(H'\) = “All non-blacks are non-raven” or \((\forall x)(\neg Bx \rightarrow \neg Ax)\), meaning “Whatever is not black is not a raven”. By the premise \(P_1\), an observation of a non-black non-raven, for example a white shoe, supports the hypothesis \(H'\). According to the premise \(P_2\), hypotheses \(H\) and \(H'\) are confirmed by the same set of evidences and as a consequence the observation of a white shoe supports the hypothesis \(H\) = “All ravens are black”.
The issue is that accepting these premises $P_1, \(P_2\) and \(P_3\) as true logically leads to the conclusion that observing a white shoe supports the hypothesis that all ravens are black. This outcome clashes with our common sense and intuition. After all, why would the observation of a white shoe, seemingly unrelated to ravens, provide any support for a hypothesis, which is seemingly about ravens?
As with any deductive argument, there are several approaches to challenge the conclusion and, by extension, the paradox:
- Deny one or more of the premises.
- Deny the validity of the deduction.
- Deny that the conclusion is paradoxical.
First of all, attempting to deny the validity of the deduction would be absurd, as it follows basic principles of sound deductive logic. The argument is structured in a way that, given the premises, the conclusion necessarily follows. Therefore, any challenge must focus on the premises themselves or on the interpretation of the conclusion, rather than on the deductive process used to derive it. Additionally, since premise \(P_3\) is a sound principle in classical logic, it is also not reasonable to reject it. This leaves only two options for any attempt to resolve the paradox by focusing on the premises: either reject premise \(P_1\), which reflects Nicod’s criterion, or reject premise \(P_2\), the equivalence condition.
Hempel’s Resolution
Hempel himself, denied that the conclusion is paradoxical. He argued, that “impression of a paradoxical situation is not objectively founded; it is a psychological illusion” (Carl G. Hempel 1945). He argumented, that a general hypothesis such as “All ravens are black” is in fact a constraint on all objects, and not just on ravens. Furthermore, he pointed out that when discussing how evidence \(E\) supports a hypothesis \(H\), we implicitly rely on a set of background information \(B\). He suggested that the paradoxical conclusion would disappear if we did not factor in this background knowledge.
To elaborate on Hempel’s first point, the hypothesis “All ravens are black” can be seen not merely as a statement about ravens but as a universal constraint on all objects. In other words, the hypothesis asserts that if any object is a raven, then it must also be black. This means that the hypothesis implicitly makes a claim about every possible object in the world: any non-black object cannot be a raven. From this perspective, observing a non-black object, such as a white shoe, is relevant to the hypothesis because it confirms that the object in question is not a counterexample to the rule. While it may feel counterintuitive, Hempel’s view is that confirming the absence of non-black ravens is just as valid as confirming the presence of black ones.
Regarding the second point, Hempel stresses the role of background information in how evidence is interpreted. When we assess how a particular piece of evidence supports a hypothesis, we do so with additional background assumptions that often go unnoticed. For instance, if we already know that most objects around us are not ravens, observing a white shoe feels irrelevant to the hypothesis “All ravens are black.” However, if we imagine a scenario where the distribution of ravens and non-ravens is unknown, the observation of any non-black, non-raven object gains more significance. Hempel argued that the sense of paradox arises because we intuitively operate with background information, like knowing what ravens look like and their rarity, without realizing its influence on our reasoning. When this background information is accounted for, the seemingly paradoxical nature of the conclusion dissolves.
Hempel’s key insight is that confirmation should be seen as a three-way relationship: “evidence \(E\) supports the hypothesis \(H\) relative to background information \(B\),” rather than just a two-way link between \(E\) and \(H\). However, Hempel’s theory of confirmation does not offer a framework to systematically account for background information \(B\) in order to “isolate” the confirmational impact of evidence \(E\) on its own. (Fitelson and Hawthorne 2010) In other words, it does not make a distinction between “\(E\) confirms \(H\) given \(B\)” and “\(E \And B\) confirms \(H\)”. In contrast, Bayesian confirmation theory offers a more structured approach. Using Bayes’ Theorem, it quantifies how much evidence \(E\) raises the probability of the hypothesis \(H\) in light of background information \(B\). While Bayesian confirmation is fundamentally quantitative, it can easily be interpreted qualitatively. So, a set of alternative approaches for resolving the Raven Paradox can be found by looking at the Bayesian view on confirmation.
Bayesian Approach
Bayesianism is basically a probabilistic framework, which in simple words describe how to update the belief regarding the likelihood of a hypothesis in light of new evidence. Bayes’ Theorem explicates this as following:
\[ P(H|E) = \frac{P(E|H) \cdot P(H)}{P(E)} \]
Where:
\(P(H|E)\) is the posterior probability: the probability of the hypothesis \(H\) after observing evidence \(E\).
$P(E|H) $ is the likelihood: the probability of observing evidence \(E\) given that \(H\) is true.
\(P(H)\) is the prior probability: the initial probability of \(H\) before observing any evidence.
\(P(E)\) is the marginal likelihood: the overall probability of observing ( E ) under all possible hypotheses. It can also be written as \(P(E) = P(D|H) \cdot P(H) + P(D|\neg H) \cdot P(\neg H)\)
It is important to point out that the Bayesian formulation does not presuppose that the probabilities correspond to the objective likelihood of events—assuming such a thing as ‘objective probability’ even exists. Instead, these probabilities represent our degrees of belief or confidence in certain outcomes based on the available information. This subjective interpretation of probability is central to Bayesianism, as it allows us to update our beliefs dynamically in response to new evidence. In this regard, Bayesian reasoning is particularly well-suited for addressing Hempel’s Raven Paradox, as it helps us explore whether the so-called paradoxical conclusion is truly paradoxical or simply a psychological illusion rooted in our subjective beliefs. (Urbach and Howson 2006)
(Urbach and Howson 2006), demonstrates an elegant example of how Bayesian framework can provide a detailed analysis of the paradox.
Let \(H\) denote the hypothesis “All ravens are black”, \(P\) a suitable probability function, \(\theta\) the probability of a randomly chosen raven being black, \(R\) an observation of a black object, \(\overline{R}\) an observation of a non-black object, \(R\) an observation of a raven, \(\overline{R}\) an observation of a non-raven and \(\overline{R} \overline{B}\) for example an observation of a black raven.
\(P(H|\overline{R} \overline{B})\) represents our belief regarding how likely it is that the \(H\) is true, given we observe a non-black non-raven, e.g. a white shoe. According to Bayes’ Theorem
\[\begin{aligned} P(H|\overline{R} \overline{B}) & = \frac{P(\overline{R} \overline{B}|H) \cdot P(H)}{P(\overline{R} \overline{B})} \\ & = P(H) \cdot \frac{P(\overline{R} \overline{B}|H)}{P(\overline{R} \overline{B})} \end{aligned}\]The ratio between \(P(H|\overline{R} \overline{B})\) and \(P(H)\) can be seen as a measure for confirmation. This results in following analysis under some plausible assumptions:
\[\begin{aligned} \frac{{}P(H|\overline{R} \overline{B})}{P(H)} & = \frac{P(\overline{R} \overline{B}|H)}{P(\overline{R} \overline{B})} \\ & = \frac{P(\overline{B}|H)}{P(\overline{R} \overline{B})} \\ & =\frac{P(\overline{B})}{P(\overline{R} \overline{B})} \\ & = \frac{P(\overline{B})}{P(\overline{R}|\overline{B}) \cdot P(\overline{B})} \\ & = \frac{1}{P(\overline{R}|\overline{B})} \end{aligned}\]At first glance, some of the derivation steps may seem unclear due to the implicit assumptions involved. Let’s break them down:
Step (3) essentially applies Bayes’ Theorem.
Step (4) assumes that the probability of encountering a non-black non-raven (( )) is negligibly close to the probability of encountering any non-black object (()). This assumption seems reasonable, as there are far more non-black objects than ravens in general. While the equality used here is for simplicity and isn’t formally precise, it doesn’t affect the overall point of this analysis.
In step (5), it’s assumed that the probability of an object being non-black given that (H) is true is again very close to the prior probability (P()), which is justified by the overwhelming ratio of non-black things to ravens.
Under the assumptions mentioned, we come to the conclusion, that the ratio between \(P(H|\overline{R} \overline{B})\) and \(P(H)\), which is considered to be a measure for confirmation, is roughly equal to \(\frac{1}{P(\overline{R}|\overline{B})}\). Another plausible assumption given the world we live in is that there are overwhelmingly more non-ravens than raven so it is safe to assume that \[ \frac{1}{P(\overline{R}|\overline{B})} \approx \frac{1}{1 -\epsilon} > 1, \] where \(\epsilon\) is a very small number. As a consequence, this series of derivations under some plausible assumptions leads us to the result, that the observation of a white shoe does indeed confirms the hypothesis \(H\), that all ravens are black, but the degree of confirmation is almost negligible.
To understand why, consider the vast number of non-black, non-raven objects in the world. Observing one more of these objects, like a white shoe, barely shifts our confidence in the hypothesis, simply because such observations are so common. In Bayesian terms, while the likelihood ratio slightly favors ( H ), the weight of this evidence is lessened by the large number of similar observations. It’s like adding a single drop of water into an ocean, the water level technically rises, but it isn’t noticeable. This minimal confirmation highlights a crucial aspect of Bayesian reasoning: not all evidence is equally impactful. Although technically supportive, evidence like the white shoe is weak because it doesn’t address the core of the hypothesis. What truly matters is evidence directly tied to ravens themselves—seeing a black raven, for instance, offers much stronger support. In this regard, (Urbach and Howson 2006) ‘resolves’ the paradox by showing that it isn’t truly a paradox, while also shedding light on why it might seem like one to us.
Various Attempts
One approach that complements the Bayesian perspective is Patrick Maher’s use of Carnap’s theory of inductive probability (Maher 1999). Maher builds on the idea that observing non-ravens can indeed confirm the hypothesis, but in a way that differs from the intuitive, and often misleading, assumption that non-ravens inform us about the color of ravens. Instead, by reducing the total number of potential counterexamples to the hypothesis, Maher shows that such observations still contribute, albeit indirectly, to confirming “All ravens are black.”. A counterexample to the hypothesis would be a raven that is not black. Observing any raven that is black doesn’t falsify the hypothesis and thus supports it, but observing non-ravens, like white shoes or red apples, plays a different role. Even though a white shoe, for example, isn’t a raven and doesn’t tell us directly about raven colors, it still reduces the number of possible objects that could serve as a counterexample to the hypothesis. The more non-raven objects we observe, the more potential items we rule out as being non-black ravens. By systematically observing non-black, non-raven objects, we narrow the space of possible counterexamples that could disprove the claim that all ravens are black.
Another interesting attempt to resolve the paradox appeared on a single-page article titled “The White Shoe is a Red Herring” (Good 1967):
Suppose that we know we are in one or other of two worlds, and the hypothesis, H, under consideration is that all the crows in our world are black. We know in advance that in one world there are a hundred black crows, no crows that are not black, and a million other birds; and that in the other world there are a thousand black crows, one white one, and a million other birds. A bird is selected equiprobably at random from all the birds in our world. It turns out to be a black crow. This is strong evidence (a Bayes-Jefrreys-Turing factor 2 of about 10) that we are in the second world, wherein not all crows are black. Thus the observation of a black crow, in the circumstances described, undermines the hypothesis that all the crows in our world are black. Thus the initial premise of the paradox of confirmation is false, and no reference to the contrapositive is required
This thought experiment from (Good 1967) introduces an important point: the validity of Hempel’s first premise, Nicod’s criterion, depends heavily on the background information we consider. By imagining two possible worlds, the example shows that observing a black crow can sometimes weaken the hypothesis, depending on the specific context. This challenges the traditional understanding of Hempel’s paradox, suggesting that confirmation may be more context-dependent than originally thought.
Another example of such a configuration of background information is found in (Rosenkrantz 1977). Consider the following scenario: Three men, \(M_1\), \(M_2\), and \(M_3\), are wearing three different hats, \(\textit{Hat}_1\), \(\textit{Hat}_2\), and \(\textit{Hat}_3\). They enter a bar, leave their jackets and hats at the coat check, have some drinks, and become drunk. When it’s time to leave, they each pick a hat at random. Let \(H_\textit{hat}\) denote the hypothesis that each man picks a hat that does not belong to him. According to Nicod’s criterion, if we learn that \(M_1\) picked \(\textit{Hat}_2\), this evidence supports the hypothesis \(H_\textit{hat}\) since \(M_1\) has chosen a hat that isn’t his. If we also learn that \(M_2\) picked \(\textit{Hat}_1\), Nicod’s criterion would again suggest this supports the hypothesis, as \(M_2\) has also picked a hat that does not belong to him. However, if \(M_1\) and \(M_2\) had picked each other’s hats, we would implicitly know that \(M_3\) picked his own hat, meaning the hypothesis is actually false. In this case, the evidence of \(M_2\) picking \(\textit{Hat}_1\) would actually falsify the hypothesis rather than support it, contrary to what Nicod’s criterion would suggest.
However, Hempel responds in his article “The White Shoe: No Herring” (Carl G. Hempel 1967), by arguing that the paradox should be analysed without reference to background information. He insists that the consideration of additional factors compromises the logical clarity of the paradox. In his view, any observation of an A that is B must universally confirm the hypothesis, regardless of what we know about the world. Hempel’s response calls for a stricter interpretation of confirmation, where the process is kept pure and uninfluenced by external considerations. This debate over whether to include or exclude background information raises a broader question: can we really evaluate any hypothesis in isolation from its context? By questioning the universal applicability of Nicod’s criterion, arguments such as “The White Shoe is a Red Herring” highlight potential limitations in the original formulation.
Another alternative thought-provoking approach to address the paradox is Quine’s argument of natural kinds (Quine 1970). The Raven Paradox, as outlined by Hempel, relies on the assumption that objects can be neatly classified into categories like “raven” and “black” in such a way that a white shoe could logically confirm the hypothesis “All ravens are black”. However, Quine argues that not all groupings of objects are equally valid when it comes to inductive reasoning. Instead, he claims that categories that reflect the real, inherent structure of the world are essential to making sense of inductive generalizations, which he calls “natural kinds”. Quine’s notion of natural kinds provides a way to dismiss the paradox as a problem that arises from a misunderstanding of how we should group objects for the purposes of confirmation. In the case of the Raven Paradox, Quine would argue that the category of “non-black non-ravens” (e.g., white shoes) is not a natural kind, and thus observations of such objects cannot genuinely confirm or disconfirm a hypothesis about ravens. The paradox arises, in Quine’s view, because it mistakenly treats irrelevant categories as if they were meaningful for inductive reasoning.
This perspective is closely related to Goodman’s grue-emerald argument (Goodman 1983), which also questions the validity of certain categories in inductive logic. Goodman’s grue hypothesis highlights the problem of arbitrary classifications. By defining “grue” as something that is green before time \(T\) and blue after, we find that the same observations can support two contradictory hypotheses: “All emeralds are green” and “All emeralds are grue”. Both Goodman and Quine converge on the idea that not all predicates are equally valid when it comes to inductive generalizations. Goodman claims that certain predicates, such as ‘grue’, are artificially constructed and lack the intuitive appeal of others, such as ‘green’. Quine extends this notion by arguing that only natural kinds - those that reflect real divisions in nature - should be used to formulate hypotheses and confirm them by observation. According to Quine, the Raven Paradox, like Goodman’s grue problem, exposes how our reliance on artificial categories can distort inductive reasoning. In the Raven Paradox, the category of “non-black non-ravens” is treated as relevant for confirming a hypothesis about ravens, but Quine would argue that this category is too arbitrary to support any meaningful inference. The same issue appears in Goodman’s grue hypothesis: the classification of emeralds as “grue” is not based on a natural kind but on an arbitrary, division.
Verdict
To summarize, Hempel’s raven paradox has established itself as the paradox of confirmation, sparking a long line of philosophical discussions as various perspectives have emerged on how to approach and resolve it. I tend to view the paradox more as a common medium for discussing the non-deductive relationship between evidence and hypothesis rather than as a paradox in the traditional sense. Among all the attempts at resolution, I find myself most aligned with the approach provided by the Bayesian framework, which denies the paradoxical nature of the conclusion. It is, of course, not feasible to delve deeply into the extensive discussions surrounding Bayesian confirmation theory and the interplay between the Bayesian view of confirmation and the paradox. However, I firmly believe that the premises underlying the paradox and the deductions leading to the so-called paradoxical conclusion are sound and “bullet-proof,” with the configuration of background information serving as the actual “red herring” in this context. In that regard, I am inclined to explain why our intuition senses “something wrong” with the so-called paradoxical conclusion. In my opinion, the Bayesian framework offers the most “intuitive” way to address why our intuition fails. Although I tend to deny the paradoxical nature of the conclusion, I still find it challenging to defend the idea that a white shoe and a black raven provide the same kind of evidence for the hypothesis. It is clear to me that there must be a distinction between a white shoe and a black raven regarding the degree of confirmation they offer for the hypothesis, and that is precisely what the Bayesian approach aims to demonstrate.