In the first post in this series I made an extraordinary claim about research I’ve been doing into a potentially revolutionary adaptation of the transformer architecture. Before we can look at the details of what I’ve been researching and how, and more importantly why, it has been able to achieve these results, we need to take a step back all the way to the beginning of AI research. Actually, we need to go back beyond that to the beginnings of research in the biology of nervous systems.
When Mary Shelley wrote Frankenstein, she did not arrive at the concept of a scientist piecing together parts of cadavers to create a “monster” by accident. She was inspired by what was happening in science at the time. Electricity was still a poorly understood phenomenon, and it would be more than a century before scientists fully nailed down its underlying mechanisms. Still, in experimenting with this mysterious phenomenon, the Italian physician Luigi Galvani was able to show that electricity was not confined to inanimate objects. In his early experiments, he drew a connection between electricity and the nervous system. Specifically, he showed that applying an electrical current to the nervous tissue of a freshly killed frog could make the frog’s legs twitch.
Not only did this inspire Mary Shelley, but it also revealed something immensely important about how the nervous system operates. Again, it would be almost a century before people were able to tease apart the detailed mechanisms, but it turns out that neurons mostly function as electrical wires…with one very, very important caveat. That caveat goes by the name “action potential”.
Essentially, each neuron maintains an internal electrical potential that is influenced by various chemical factors. The most important of these factors are neurotransmitters: chemicals released by neighboring neurons (ones nearly touching it) when those neurons become activated. Each release of neurotransmitters nudges the neuron’s internal potential away from its resting state, until the accumulated nudges push it past a critical point called the threshold. Crossing that threshold triggers the action potential: an electrical signal travels down the length of the neuron until it reaches the tips where the neuron makes close contact with other neurons. There, the signal triggers a release of neurotransmitters onto yet other neurons, each with its own threshold.
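To make the accumulate-then-fire idea concrete, here is a toy simulation in Python. It is a sketch under loudly stated assumptions: the resting potential, the critical point (which neuroscientists call the threshold), and the size of each nudge are illustrative numbers, not physiological measurements, and snapping straight back to rest after firing is a simplification of what real neurons do.

```python
# Toy "integrate-to-threshold" neuron.
# ASSUMPTION: all numbers below are illustrative, not real measurements.

RESTING_POTENTIAL = -70.0  # hypothetical resting value, in millivolts
THRESHOLD = -55.0          # hypothetical critical point, in millivolts


def simulate(nudges: list[float]) -> list[bool]:
    """Apply neurotransmitter 'nudges' one at a time.

    Returns, for each nudge, whether the neuron fired at that step.
    After firing, the potential resets to rest (a simplification).
    """
    potential = RESTING_POTENTIAL
    fired = []
    for nudge in nudges:
        potential += nudge                  # each release shifts the potential
        if potential >= THRESHOLD:
            fired.append(True)              # all-or-nothing: the signal goes out
            potential = RESTING_POTENTIAL   # reset after firing
        else:
            fired.append(False)             # below threshold: nothing is sent
    return fired


if __name__ == "__main__":
    print(simulate([5.0, 5.0, 5.0, 2.0, 5.0, 5.0, 5.0]))
    # → [False, False, True, False, False, False, True]
```

Notice that nothing partial ever leaves the neuron: at each step it either fires or stays silent, however close it came to the threshold.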
That last paragraph may seem like a lot, but what it boils down to is this: every neuron acts a bit like a voting system. Imagine you have a picture of a dress that might be blue or might be gold. You, a neuron, ask your group of close friends what they think, and they each vote one way or the other. If enough of your friends vote blue, you post the picture to neuron social media and state, confidently, that the dress is blue. The other neurons who follow you don’t know how many of your friends thought it might be gold; they only know that you are saying it is blue. Each of those followers also listens to other neurons, weighs all of those verdicts, and then confidently posts its own verdict, one way or the other. Eventually, this viral meme reaches the neuron that has a connection with the speech portion of the brain, and when that neuron makes its pronouncement, the result is the declaration: “the dress is gold.”
In other words, neurons operate as “discriminators”. There is no “maybe this” or “30% that”. According to a neuron, things are yes or no, with no in between.
One thing your math teacher probably never told you is that math has its limits. Certain fairly mundane concepts are difficult to capture in their fullness with mathematics alone. One such concept is that of a discrete decision. Math can do maybe. Math can do 30% with its eyes closed, but math has a problem with “yes or no”. I’ll explore this problem more in a later post. For now, what’s important to understand is that as biologists were revealing the nature of how neurons operate, mathematicians were looking on with envy. With neurons, biology had a mechanism for making “yes or no” decisions in a way that mathematics did not.
Driven by the desire to replicate this kind of discrimination, and as impatient as ever, mathematicians, along with their newly arrived and closely aligned peers, the computer scientists, did not wait for the biologists to work out all the details of how individual neurons come together to form something like “a brain”. Instead, much as Galvani had poked at random parts of a frog with energized wires, early mathematicians pushed ahead with designing a computer system that would mimic the operation of neurons. Each unit within the system would collect multiple incoming signals and combine their values into a single number; if that combined value passed a specific threshold, the unit would pass a new signal along to the next unit.
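A unit like this can be sketched in a few lines of Python. This is a minimal illustration, not a faithful reproduction of any historical machine: the function name `threshold_unit`, the equal weights, and the threshold values are hypothetical choices, and counting a sum exactly equal to the threshold as “yes” is an arbitrary convention. It reuses the dress vote from earlier, encoding blue as 1 and gold as 0.

```python
# Minimal threshold unit: combine incoming signals, compare to a threshold,
# emit a yes/no signal. ASSUMPTION: names, weights, and thresholds are hypothetical.


def threshold_unit(inputs: list[int], weights: list[float], threshold: float) -> int:
    """Combine incoming signals into one weighted sum and emit 1 ("yes") or 0 ("no")."""
    combined = sum(x * w for x, w in zip(inputs, weights))
    return 1 if combined >= threshold else 0


if __name__ == "__main__":
    # Five "friends" vote 1 (blue) or 0 (gold); each vote counts equally here.
    votes = [1, 0, 1, 1, 0]
    weights = [1.0] * len(votes)

    # Majority rule: say "blue" (1) if at least 3 of 5 friends said blue.
    verdict = threshold_unit(votes, weights, threshold=3.0)
    print(verdict)  # → 1

    # A follower sees only that single 1 (not the 3-to-2 split behind it),
    # plus the verdicts of two other units it follows, which both said gold (0).
    follower = threshold_unit([verdict, 0, 0], [1.0, 1.0, 1.0], threshold=2.0)
    print(follower)  # → 0, i.e. "gold"
```

The second call shows the social media effect: the follower never learns how close the first vote was, and because too few of the units it follows said blue, it confidently concludes “gold”.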
This design, developed by the psychologist Frank Rosenblatt, was given the name “Perceptron”, and it forms the foundation of nearly all the artificial intelligence in use today. The first physical implementation of this design, the Mark I Perceptron, was built in 1958 and by contemporary accounts showed tremendous success. Its ability to make “yes or no” and “this or that” decisions based only on visual input was far beyond anything mathematics alone had been able to achieve up to that point.
Of course, it was also a physically very large machine with onerous requirements for wiring and circuitry and a relatively slow response time. Much of the ensuing decades of artificial intelligence research would focus on ways to overcome these limitations. Unfortunately, for some of us who don’t necessarily have the greatest talent for advanced mathematics, much of the ensuing success in this realm came from reaching deep into the mathematician’s tool chest, pulling especially from the drawer labeled “Linear Algebra”. More on that in the next post…