What is at stake in contemporary debates about innateness

Jake Browning
Jul 7, 2023

It is often hard to figure out what, if anything, is at stake in discussions of innateness. Nativists are all big on learning, and empiricists acknowledge that some architecture is built in and that some basic learning algorithms (something like backprop) are innate. Thus, everyone's both a nativist and an empiricist, which means the distinction doesn't seem that useful.

But this is an oversimplification. People agree there is an innate architecture to the brain, and that some innate learning algorithms underlie learning; even B.F. Skinner needed the mechanisms for learning from reinforcement to be innate. The question largely turns on which innate algorithms are present and whether there are innate representations, that is, representations of certain things (like objects, numbers, etc.) that are encoded in our genes. This question largely underlies the debate about domain-specific versus domain-general learning, which is how most people in cognitive science talk about the issue. It is also genealogically related to an older debate, going back to Kant's rejection of Hume.

Contemporary Empiricism

The idea of domain-general learning is that we don't need innate, domain-specific concepts (such as object, causality, or number), because we can learn them using very general resources. Historically, these resources usually involved some mix of associations, such as Hume's resemblance, spatial contiguity, and before-and-after connection. The idea was that these could be combined: if a blue ball falls when dropped, a similar red ball will also fall. Hume had little idea how this might be cashed out in the brain, but he thought all our learning could come from habituation and reinforcement operating on our faculties of sensation, memory, and imagination (all of which were innate).

More recently, this program has gotten a boost from artificial neural networks (ANNs). These offer innate architectures that roughly explain how the brain might instantiate some of the innate faculties, along with rather simple learning algorithms that perform something like the associations Hume outlined. For example, as Cameron Buckner argues, a convolutional neural network can learn abstract notions of similarity between diverse categories without any need for innate representations; architecture and learning algorithm are enough.

Saying ANNs lack any innate representation is a half-truth. Rather, they start with a basically random, disordered, distributed representation that could end up representing all kinds of different things. What it ends up representing depends on what it is connected to as input (vision, hearing, multimodal, etc.) and on what is needed as output (an action by the organism, a response to another part of the brain, etc.). Learning orders the distributed representation so that it accomplishes a specific task, encoding the properties of the domain (whatever is being input) in useful ways (however it needs to be output for the system to act on it). Put simply, the randomness is reordered so that it becomes a domain-specific module: it learns the features necessary for recognizing objects, or performing math, or playing a video game. In short, the distributed representation ends up as a kind of “know-how,” a skill in which the environmental data is represented tacitly so as to guide action effectively.
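To make the idea of reordering concrete, here is a deliberately tiny Python sketch. It is a toy built on assumptions, not a model of cortex. A single logistic unit, the simplest possible "network," starts from small random weights and is trained by gradient descent on two hypothetical tasks. The helper names `train` and `make_task` and the toy data are invented for illustration. The same architecture, starting from the same random point, ends up weighting whichever input the task makes relevant.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(data, n_features, epochs=200, lr=0.5, seed=0):
    """Single logistic unit: random initial weights, shaped only by gradient descent."""
    rng = random.Random(seed)
    # Start from small random weights: nothing about the task is built in.
    w = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of cross-entropy loss w.r.t. the pre-activation
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def make_task(relevant, n=200, n_features=3, seed=1):
    """Hypothetical toy task: the label depends only on input feature `relevant`."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in range(n_features)]
        data.append((x, 1.0 if x[relevant] > 0 else 0.0))
    return data

# Same architecture, same random start, different "outputs" demanded.
w_a, _ = train(make_task(relevant=0), 3)
w_b, _ = train(make_task(relevant=1), 3)
print([round(v, 2) for v in w_a])
print([round(v, 2) for v in w_b])
```

The largest weight should land on input 0 in the first case and on input 1 in the second. The only things fixed in advance are the architecture and the learning rule.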

For those arguing the brain mostly consists of domain-general learning, the idea is that all our domain-specific knowledge (such as language or logic) arises out of these domain-general algorithms. As evidence, they point to studies showing, for example, that re-wiring the auditory network to receive visual inputs transforms it into a visual network. This suggests the vision system is not innately domain-specific: it is domain-general, and it becomes specialized because it develops the skill of using visual data effectively.

Contemporary Nativism

Most nativists grasp that empiricists acknowledge the innateness of neural network architecture and some learning rules. They therefore skip those arguments and focus on the status of innate representations, along with related arguments about domain-specific knowledge and the learning algorithms behind it.

The claim that we have certain innate representations is sometimes made by people studying cognitive psychology. They contend there is a core knowledge component (e.g., Spelke's recent book), consisting of concepts of object, number, place, agent, and so on, which is built into the system. Their evidence for this isn't that newborns open their eyes and start tracking this stuff. Rather, it is based on the incredible capacity of humans to learn rapidly, in some cases effectively instantaneously. As Gary Marcus rightly notes, humans learn a lot faster than ANNs, and this has led to speculation that humans rely on learning mechanisms other than gradient descent. A big one is hypothesis-testing: if babies see something unexpected, like a bouncy ball going flying when thrown, they'll often test whether other objects behave the same way by throwing them.

Hypothesis-testing is difficult with distributed representations, since it requires being able to make big changes rapidly, sometimes after just one example. It also assumes some basic concepts of the domain, which are refined through the hypothesis-testing. This kind of approach is much simpler to implement, however, with symbolic AI approaches using a “language of thought”: discrete representations, variable-binding, and logical operations. For example, an infant sees a ball bounce. The ball is an object, so the infant comes up with a hypothesis: objects bounce. So, the infant throws a block to test the hypothesis. By testing and updating their concept of object, they come to a better understanding of objects and bounciness, learning about shape, materials, surfaces, and so on.
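The infant example can be sketched as candidate elimination over explicit rules. This is a hypothetical Python toy, not an implementation of any particular language-of-thought theory. The `Obj` and `Hypothesis` types, the three-rule hypothesis space, and the observations are all invented for illustration. The sketch shows two features the paragraph stresses. First, each rule is stated over a variable `x` that can be bound to any object. Second, a single observation can rule a hypothesis out outright.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Obj:
    name: str
    material: str
    shape: str

@dataclass(frozen=True)
class Hypothesis:
    description: str
    # A rule over a variable x: given any object bound to x, predict whether it bounces.
    predicts_bounce: Callable[[Obj], bool]

# Hypothetical, hand-written hypothesis space, ordered from general to specific.
HYPOTHESES = [
    Hypothesis("all objects bounce", lambda x: True),
    Hypothesis("round objects bounce", lambda x: x.shape == "round"),
    Hypothesis("rubber objects bounce", lambda x: x.material == "rubber"),
]

def revise(hypotheses: List[Hypothesis], obj: Obj, bounced: bool) -> List[Hypothesis]:
    """Keep only the hypotheses consistent with one new observation."""
    return [h for h in hypotheses if h.predicts_bounce(obj) == bounced]

ball = Obj("ball", "rubber", "round")
block = Obj("block", "wood", "cube")
clay_ball = Obj("clay ball", "clay", "round")

live = revise(HYPOTHESES, ball, True)   # the ball bounced: every rule predicted that
live = revise(live, block, False)       # one thrown block eliminates "all objects bounce"
live = revise(live, clay_ball, False)   # one clay ball eliminates "round objects bounce"
print([h.description for h in live])    # ['rubber objects bounce']
```

The sketch also shows what this approach presupposes. Objects already come with attributes like material and shape, and the space of hypotheses is given in advance. That is the sort of prior domain knowledge nativists say hypothesis-testing needs.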

This position requires domain-specific knowledge for the big stuff. A baby needs to already know the basics of objects and causality, in outline, in order to use hypothesis-testing. If babies aren't sure what an "object" is, then testing a block may tell them nothing about a chair; and without some sense of "causality," a baby trying out an action at time-a will have no reason to expect the same action at time-b to produce the same outcome. Hypothesis-testing largely assumes a regular world; that assumption is the basis for experimenting in the first place.

Looking for Evidence

Since distributed representations are damn near impossible to hand-code and too complex to fit in the genome, but are really simple (if time-consuming) to learn, a plausible assumption is this: if we find a distributed representation, we are probably also looking at a representation learned using something approximating backprop, which is slow. Researchers are now running a bunch of similarity-score studies comparing representations in the brain with those in ANNs, and they show enough commonality to suggest this is broadly right in a lot of areas. ANNs also have a distinctive learning curve. There are a lot of failures at the beginning, then improvement that is gradual at first but sometimes becomes much, much quicker, and finally very slow improvement once the network is “trained.” This looks somewhat similar to the learning curves seen in animal cognition studies during the behaviorist years.
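One common way to compare brain and network representations is representational similarity analysis (RSA). Whether the studies in question use exactly this method is an assumption here; other approaches, such as fitting linear encoding models, are also used. The minimal Python sketch below compares two systems by the geometry of their responses rather than unit by unit. For each system, it computes how dissimilar the response patterns to every pair of stimuli are. It then rank-correlates the two sets of dissimilarities. The response numbers are made up for illustration, and the function names are hypothetical.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def rdm(responses):
    """Upper triangle of a representational dissimilarity matrix:
    1 - Pearson r between the response patterns to each pair of stimuli."""
    return [1 - pearson(responses[i], responses[j])
            for i, j in combinations(range(len(responses)), 2)]

def ranks(values):
    """Rank transform (ties broken by position, for brevity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r

def rsa_score(system_a, system_b):
    """Spearman-style rank correlation between two systems' dissimilarity structures."""
    return pearson(ranks(rdm(system_a)), ranks(rdm(system_b)))

# Hypothetical responses to the same 4 stimuli (one row per stimulus).
brain = [[1.0, 0.2, 0.1], [0.9, 0.3, 0.2],
         [0.1, 0.8, 0.9], [0.2, 1.0, 0.6]]              # 3 "voxels"
network = [[0.9, 0.1, 0.4, 0.0, 0.7], [0.8, 0.2, 0.5, 0.1, 0.6],
           [0.1, 0.9, 0.2, 0.8, 0.3], [0.0, 0.7, 0.3, 0.9, 0.2]]  # 5 units

print(round(rsa_score(brain, network), 3))
```

The method compares only the pattern of dissimilarities, so the two systems can have different numbers of units (here 3 and 5). That is what makes it usable across brains and networks. A system compared with itself scores 1.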

By contrast, we can also look for behavioral effects that are difficult for distributed representations, like rapid belief updating and variable-binding, as evidence for a language of thought. This evidence is not definitive. Still, these features are really hard to get through gradient descent, so we have a hunch that a distinctive, symbolic learning mechanism is at work. And since learning a language of thought and variable-binding with ANNs has proven extremely difficult (especially to do reliably), we have a prima facie reason to think these systems aren't learned through backprop. So, the assumed answer is some kind of innateness.

But these symbolic mechanisms have problems of their own: they prefer regularity and clear boundaries, and they really struggle with nuance and variation. There are difficulties explaining how discrete representations, especially of general categories, could be innate. Nor is there much insight into how discrete representations and variable-binding might be implemented in neurons. Thus, while this approach is popular with cognitive psychologists, it hasn't always been well received by cognitive neuroscientists.

The History

This shifts the debate from whether anything is innate (an easy question) to what kind of learning algorithm is innate. One option is a minimal innate mechanism using something like gradient descent. The other is a more robust innate mechanism, like a dedicated symbol-manipulation module. So, everyone's a nativist now. Why, then, treat this as related to the debates over domain-general and domain-specific learning? Part of the answer is historical, part recent.

As mentioned, Hume argued that the human mind runs on similarity, finding correlations in data, and that this is sufficient to explain all animal and human reasoning. It was a domain-general, empiricist approach. Kant rejected this. He claimed you couldn't learn necessary and universal logical laws from experience. And since logic is the ground of the necessary and universal truths of mathematics and Newtonian science, you couldn't learn math or science from experience either. So, correlation alone can't explain human reasoning abilities.

Kant's response is to posit innate representations (or, at least, representations that are disposed to arise with the first encounter with the appropriate target, which is also roughly how modern nativists think of innate representations). For Kant, humans necessarily represent the world as consisting of spatiotemporal "objects" interacting according to "causal" laws. Objects and causes are not necessarily facts about the universe, but they are necessarily the way we order and understand the universe.

For Kant, our innate, "pure" concepts of object and cause are highly abstract, but we automatically apply them to our sensations, organizing those sensations into logical judgments (not quite contemporary symbol manipulation, but a close predecessor). Kant regarded learning, then, as filling in the pure concepts of objective reality through experience. In effect, learning makes the abstract notion of "object" more concrete by teaching us about specific objects and their specific properties, as well as the specific causal laws explaining them. For the physical world, the goal is to create a logical totality, one running from the pure concepts of object and cause, through Newtonian concepts of mass and gravitation, and down to the microscopic.

But Kant also argued the physical world could not exhaust our experience. There were also agents that could not be reduced to causality: living beings and persons. Persons were self-determining, capable of choosing how they should act, and making sense of them required a whole host of (also innate) moral concepts. Similarly, living beings required a further set of concepts, such as goal-directedness, which was both intentional and causal yet not free in the way persons are. Our concepts for the physical, the moral, and the biological do not connect into a unified system. So Kant instead divided them into three domains, each of which formed a logical system incompatible with the others. In effect, humans possessed three different, domain-specific kinds of innate knowledge. These were largely empty at the start and could be fleshed out in only one way: by applying hypothesis-testing to our experiences.

Later in the 19th century, empirists (which is how they spelled it) argued this wouldn't do and tried to show you could learn logic and math from experience (see Mill and Helmholtz). The rough idea is that if purely abstract objects, like numbers, could be learned from mere correlations among sensations and memories, then representations of objects, living beings, and persons could be as well (though there were questions about causality). And hypothesis-testing, while real and important, could be treated as an emergent feature of associations. By contrast, anti-psychologistic neo-Kantians argued this was balderdash. Gottlob Frege especially argued it wouldn't work, because numbers and logical operations aren't objects of experience but specific ways of relating objects: rules, not things. These are, it so happens, the very rules that underlie hypothesis-testing; if hypothesis-testing is possible, it can't be because of anything we've experienced.

Frege also argued Kant hadn't been rigorous enough with logic, which should instead be understood in terms of symbolic logic: you know, discrete representations, variable-binding, and logical operations. Frege took symbolic logic to be the foundation not just of math and science, but also of language. In fact, he thought logic just is the structure of all thought. Ergo, all thinking is propositional and should be connected into a single, coherent, ideal symbolic logic connecting every possible true proposition to every other true proposition. It was a good, old-fashioned return to Platonism.

While Frege did not speculate about how learning works, the standard account among most of those inspired by his work was hypothesis-testing. This became so common that even some empiricists in the 20th century (e.g., Wilfrid Sellars) began to argue it was the only kind of learning available. Even associationism was taken to require hypothesis-testing. After all, gathering up correlations is useless unless you categorize them in ways that include some correlations and exclude others. So, the argument went, rules underlie the move from correlation to categorization, and rules depend on a hypothesis, which, of course, depends on logic, discrete symbols, and so on. This led to the 20th-century formalization of the learning problem: if human knowledge is propositional, then hypothesis-testing is the only game in town.

And Fodor loved that. He argued we needed to “psychologize Frege,” turning symbol manipulation into just how the brain works (a classic version of the computational theory of mind). And, since Minsky and Papert's 1969 book Perceptrons, everyone has known that neural networks struggle with logic, and thus with “thinking.” So, it was a short step to simply say, “that isn't even thinking! It's just conditioned responses to learned patterns. Plus, the categorizations tacitly rely on hypothesis-testing, rendering them dependent on logic (if indirectly).” The result is that much work on neural networks was relegated to the "non-cognitive": stuff happening below the level of thinking proper and explicable solely by pattern recognition.

Return to Contemporary Innateness

Thus, hypothesis-testing was the only game in town until about the 1980s, excluding a hardcore contingent of behaviorists. But the behaviorists couldn't explain language, so they were largely relegated to animal cognition (which caused issues). As a result, computational models in cognitive science largely relied on symbol manipulation. The discrete representations being manipulated typically had their meaning "innately," that is, programmed in by the human modeler. In many cases, these models successfully matched the learning profiles of humans, which provided a good reason to assume the brain relied on similar representations and algorithms.

But when connectionism arose (that is, modern multilayer neural networks trained using backpropagation), we suddenly saw a massive shift. A domain-general computational model could simply be wired up to some inputs and trained to provide the right outputs and, voilà, you'd have a domain-specific model. These models often (though not always) had a behavioral profile similar to a human's, but they also often (though not always) had the wrong kind of learning profile. Still, connectionist models were loosely based on the wiring of the brain, and following that wiring more closely (as with convolutional neural networks) often produced even better results. So there was a lot of promise.

And we're still there. Symbol-manipulation-based computational models are still a closer match to humans in behavior and learning profile, which suggests innate domain-specific mechanisms and representations are plausible. But there is still no story about how they're implemented in the brain. By contrast, artificial neural networks are producing distributed representations that resemble those in the brain. But their behavior and learning rate are still less similar to human data: they don't make the same mistakes, and they take a lot longer to learn. So, there is a big gap between the two.

At the moment, though, domain-general empiricism has the energy behind it. Neural networks keep getting better, and brain scans keep showing promising results, so a lot of scientists are hoping networks can learn faster and match human behavior better in the future. Still, nativist cognitive psychology has a lot of life left, because it has excellent models of human reasoning to play with. The debate between nativism and empiricism is alive and well, though it looks a bit more technical and explicit than it did a few hundred years ago.
