In this video, I'm going to describe some of the things that make it difficult to recognize objects in real scenes. We're incredibly good at this, so it's very hard for us to realize how difficult it is to take a bunch of numbers describing the intensities of pixels and go from there to the label of an object.

There are all sorts of difficulties. We have to segment the object out. We have to deal with variations in lighting and in viewpoint. We have to deal with the fact that the definitions of objects are quite complicated. It's also possible that getting from an image to an object requires huge amounts of knowledge, even for the lower-level processes that involve segmentation and dealing with viewpoint and lighting. If that's the case, it's going to be very hard for any hand-engineered program to do a good job of those things.

There are many reasons why it's hard to recognize objects in images. First of all, it's hard to segment an object out from the other things in an image. In the real world we move around, so we have motion cues, and we have two eyes, so we have stereo cues. You don't get those in static images, so it's very hard to tell which pieces go together as parts of the same object. Also, parts of an object can be hidden behind other objects, so you often don't see the whole of an object. You're so good at vision that you rarely notice this.

Another thing that makes it very hard to recognize objects is that the intensity of a pixel is determined as much by the lighting as by the nature of the object. So, for example, a black surface in bright light will give you much more intense pixels than a white surface in very gloomy light. Remember, to recognize an object you've got to convert a bunch of numbers, that is, the intensities of the pixels, into a class label, and these intensities vary for all sorts of reasons that have nothing to do with the nature of the object, or rather with the identity of the object. The first sketch below puts rough numbers on this.

Objects can also deform in a variety of ways. Even for relatively simple things like handwritten digits, there's a wide variety of different shapes that have the same name. A two, for example, could be very italic with just a cusp instead of a loop, or it could be a very loopy, very round two with a big loop.

It's also the case that for many types of object, the class is defined more by what the object is for than by its visual appearance. Consider chairs: there's a huge variety of things we call chairs, from armchairs to modern chairs made with steel frames and wood backs to basically anything you can sit on.

One other thing that makes it hard to recognize objects is that we see them from different viewpoints. There's a wide variety of viewpoints from which we can recognize a 3D object, and changes in viewpoint cause changes in the images that standard machine learning methods cannot cope with. The problem is that information hops about between the input dimensions. Typically, in vision, the input dimensions correspond to pixels, and if an object moves in the world and you don't move your eyes to follow it, the information about the object will land on different pixels. That's not the kind of thing we normally have to deal with in machine learning; the second sketch below shows it happening with a toy image.
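To put rough numbers on the lighting point: if pixel intensity behaves roughly like reflectance times illumination, a dark surface under strong light can out-shine a bright surface under weak light. The reflectance and illumination values in this little Python sketch are invented purely for illustration.

```python
# Pixel intensity is roughly reflectance * illumination, so a black surface
# in bright light can send more light to the camera than a white surface
# in gloomy light. (These numbers are made up for illustration.)
black_reflectance, white_reflectance = 0.05, 0.90
bright_light, gloomy_light = 1000.0, 10.0

print(black_reflectance * bright_light)  # 50.0 -- black surface, bright light
print(white_reflectance * gloomy_light)  # 9.0  -- white surface, gloomy light
```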
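And here is a minimal sketch of that dimension-hopping effect, assuming we flatten a tiny image into the kind of input vector a standard learner sees. The image size, the object, and the amount of shift are all made up for the example.

```python
import numpy as np

# A tiny 5x5 "image" with a bright 2x2 object in the top-left corner.
image = np.zeros((5, 5))
image[0:2, 0:2] = 1.0

# The same object, shifted two pixels to the right (a viewpoint change).
shifted = np.zeros((5, 5))
shifted[0:2, 2:4] = 1.0

# A standard learner sees each image as a flat vector of input dimensions.
x1 = image.flatten()
x2 = shifted.flatten()

# The object is identical, but the active input dimensions are disjoint:
# the information has hopped to different dimensions.
print(np.nonzero(x1)[0])  # [0 1 5 6]
print(np.nonzero(x2)[0])  # [2 3 7 8]
print(np.dot(x1, x2))     # 0.0 -- no overlap at all between the two vectors
```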
Just to stress that point, suppose we had a medical database in which one of the inputs is the age of a patient and another input is the weight of the patient. We start doing machine learning, and then we realize that some coder has changed which input dimension codes which property: for some of the records they've put weight where they should have put age, and age where they should have put weight. Obviously we wouldn't just carry on with our learning, because that would make everything go wrong; we'd try to do something to fix it. I call that phenomenon dimension hopping: information jumping from one input dimension to another. That's exactly what viewpoint does, and it's something we need to fix, preferably in a systematic way.
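As a hedged illustration of that point, here's a sketch of the age/weight mix-up corrupting a learner. The linear risk rule, the made-up patients, the fraction of swapped records, and the use of plain least squares as the "learning" are all invented for the example; the point is just that the same procedure no longer recovers the right coefficients once information has hopped between dimensions.

```python
import numpy as np

# Invented rule: risk depends on age and weight in different ways.
def true_risk(age, weight):
    return 0.08 * age + 0.01 * weight

rng = np.random.default_rng(0)
ages = rng.uniform(20, 80, size=1000)
weights = rng.uniform(50, 120, size=1000)
X = np.column_stack([ages, weights])
y = true_risk(ages, weights)

# Fit a linear model on clean data: it recovers the true coefficients.
coef, *_ = np.linalg.lstsq(np.column_stack([X, np.ones(len(X))]), y,
                           rcond=None)
print(coef[:2])  # approximately [0.08, 0.01]

# Now a coder swaps the age and weight columns for some of the records.
X_swapped = X.copy()
X_swapped[:300, [0, 1]] = X_swapped[:300, [1, 0]]

# Carrying on learning blindly smears the two coefficients together:
# information about age now sometimes lives in the weight dimension.
coef_bad, *_ = np.linalg.lstsq(
    np.column_stack([X_swapped, np.ones(len(X_swapped))]), y, rcond=None)
print(coef_bad[:2])  # no longer close to [0.08, 0.01]
```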