Two Studies Pave The Way To Better Computer Object Recognition And Future Therapies For Visual Disorders
How is it possible for a human eye to figure out letters that are twisted and looped in crazy directions, like those in the little security test internet users are often given on websites?
It seems easy to us – the human brain just does it. But the apparent simplicity of this task is an illusion. The task is actually so complex, no one has been able to write computer code that translates these distorted letters the same way that the brain’s neural networks can. That’s why this test, called a CAPTCHA, is used to distinguish a human response from computer bots that try to steal sensitive information.
Now, a team of neuroscientists at the Salk Institute for Biological Studies has taken on the challenge of exploring how the brain accomplishes this remarkable task. Two studies published within days of each other demonstrate just how complex a visual task it is for the brain to decode a CAPTCHA, or any image made of simple and intricate elements.
The findings of the two studies, published recently in Neuron and in the Proceedings of the National Academy of Sciences (PNAS), take two important steps forward in understanding vision, and rewrite what was believed to be established science. The results show that what neuroscientists thought they knew about one piece of the puzzle was too simple to be true.
Their deep and detailed research – involving recordings from hundreds of neurons – may also have future clinical and practical implications, say the studies’ senior co-authors, Salk neuroscientists Tatyana Sharpee and John Reynolds.
“Understanding how the brain creates a visual image can help humans whose brains are malfunctioning in various different ways – such as people who have lost the ability to see,” says Sharpee, an associate professor in the Computational Neurobiology Laboratory. “One way of solving that problem is to figure out how the brain – not the eye, but the cortex – processes information about the world. If you have that code then you can directly stimulate neurons in the cortex and allow people to see.”
Reynolds, a professor in the Systems Neurobiology Laboratory, says an indirect benefit of understanding the way the brain works is the possibility of building computer systems that can act like humans.
“The reason that machines are limited in their capacity to recognize things in the world around us is that we don’t really understand how the brain does it as well as it does,” he says.
The scientists emphasize that these are long-term goals that they are striving to reach, a step at a time.
Integrating parts into wholes
In these studies, Salk neurobiologists sought to figure out how a part of the visual cortex known as area V4 is able to distinguish between different visual stimuli even as the stimuli move around in space. V4 is responsible for an intermediate step in neural processing of images.
“Neurons in the visual system are sensitive to regions of space – they are like little windows into the world,” says Reynolds. “In the earliest stages of processing, these windows – known as receptive fields – are small. They only have access to information within a restricted region of space. Each of these neurons sends brain signals that encode the contents of a little region of space – they respond to tiny, simple elements of an object such as an edge oriented in space, or a little patch of color.”
Neurons in V4 have larger receptive fields and can also compute more complex shapes such as contours. They accomplish this by integrating inputs from earlier visual areas in the cortex – that is, areas nearer the retina, which provides the input to the visual system. Those earlier areas have small receptive fields. V4 neurons then send that information on for higher-level processing that allows us to see complex images, such as faces, he says.
Both new studies investigated the issue of translation invariance – the ability of a neuron to recognize the same stimulus no matter where it happens to fall within the neuron’s receptive field.
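To make the idea concrete, here is a minimal toy sketch in Python. It is not the analysis used in either study. The one-dimensional "receptive field", the pattern values and every function name are assumptions chosen only for illustration. One model unit pools its best match over every position and is therefore fully translation invariant. The other responds only to a match at one fixed position, so its response falls off as the pattern shifts.

```python
# Toy 1-D illustration of translation invariance.
# Hypothetical model for illustration only; not taken from the Salk studies.

def local_matches(stimulus, template):
    """Return the match score (dot product) of the template at each position."""
    k = len(template)
    return [
        sum(s * t for s, t in zip(stimulus[i:i + k], template))
        for i in range(len(stimulus) - k + 1)
    ]

def invariant_response(stimulus, template):
    """Pool over positions: respond to the best match anywhere in the field."""
    return max(local_matches(stimulus, template))

def position_bound_response(stimulus, template, position):
    """Respond only to the match at a single fixed position in the field."""
    return local_matches(stimulus, template)[position]

def place(pattern, offset, field_size):
    """Embed the pattern in an otherwise empty field at the given offset."""
    field = [0] * field_size
    field[offset:offset + len(pattern)] = pattern
    return field

pattern = [1, 2, 1]   # assumed stand-in for a preferred visual feature
field_size = 8        # assumed size of the toy receptive field

# Present the same pattern at every possible position in the field.
shifted = [place(pattern, off, field_size)
           for off in range(field_size - len(pattern) + 1)]

invariant = [invariant_response(s, pattern) for s in shifted]
bound = [position_bound_response(s, pattern, 0) for s in shifted]

print(invariant)  # [6, 6, 6, 6, 6, 6]
print(bound)      # [6, 4, 1, 0, 0, 0]
```

The first list stays constant as the pattern moves, which is what full translation invariance would look like. The second drops to zero once the pattern leaves the unit's preferred position.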
The Neuron paper looked at translation invariance by analyzing the response of 93 individual neurons in V4 to images of lines and shapes like curves, while the PNAS study looked at responses of V4 neurons to natural scenes full of complex contours.
Dogma in the field is that V4 neurons all exhibit translation invariance.
“The accepted understanding is that individual neurons are tuned to recognize the same stimulus no matter where it is in their receptive field,” says Sharpee.
For example, a neuron might respond to a bit of the curve in the number 5 in a CAPTCHA image, no matter how the 5 is situated within its receptive field. Researchers believed that neuronal translation invariance – the ability to recognize any stimulus, no matter where it is in space – increases as an image moves up through the visual processing hierarchy.
“But what both studies show is that there is more to the story,” she says. “There is a trade-off between the complexity of the stimulus and the degree to which the cell can recognize it as it moves from place to place.”
A deeper mystery to be solved
The Salk researchers found that neurons that respond to more complicated shapes – like the curve in 5 or in a rock – demonstrated decreased translation invariance. “They need that complicated curve to be in a more restricted range for them to detect it and understand its meaning,” Reynolds says. “Cells that prefer that complex shape don’t yet have the capacity to recognize that shape everywhere.”
On the other hand, neurons in V4 tuned to recognize simpler shapes, like a straight line in the number 5, have increased translation invariance. “They don’t care where the stimulus they are tuned to is, as long as it is within their receptive field,” Sharpee says.
“Previous studies of object recognition have assumed that neuronal responses at later stages in visual processing remain the same regardless of basic visual transformations to the object’s image. Our study highlights where this assumption breaks down, and suggests simple mechanisms that could give rise to object selectivity,” says Jude Mitchell, a Salk research scientist who was the senior author on the Neuron paper.
“It is important that results from the two studies are quite compatible with one another, that what we find studying just lines and curves in the first experiment matches what we see when the brain experiences the real world,” says Sharpee, who is well known for developing a computational method to extract neural responses from natural images.
“What this tells us is that there is a deeper mystery here to be solved,” Reynolds says. “We have not figured out how translation invariance is achieved. What we have done is unpacked part of the machinery for achieving integration of parts into wholes.”
Minjoon Kouh, a former postdoctoral fellow at Salk, participated in the PNAS study. Salk postdoctoral researcher Anirvan Nandy and senior staff scientist Jude Mitchell, of the Salk Systems Neurobiology Laboratory, were co-authors of the Neuron paper.
Both studies were funded by grants from the National Institutes of Health (R01EY019493), the McKnight Scholarship and the Ray Thomas Edwards and W. M. Keck Foundations. In addition, the PNAS study received a grant from the Searle Funds. The Neuron study was additionally funded by grants from the Alfred P. Sloan Foundation, the National Institutes of Health (EY0113802), the Gatsby Charitable Foundation and the Schwartz Foundation, and a Pioneer Fund postdoctoral fellowship.