1
00:00:00,000 --> 00:00:05,013
In this video, I'm going to describe some
relatively simple models of neurons.

2
00:00:05,013 --> 00:00:10,081
I'll describe a number of different models
starting with simple linear and threshold

3
00:00:10,081 --> 00:00:14,088
neurons, and then describing slightly
more complicated models.

4
00:00:14,088 --> 00:00:20,028
These are much simpler than real neurons,
but they're still complicated enough to

5
00:00:20,028 --> 00:00:25,035
allow us to make neural nets that do some
very interesting kinds of machine

6
00:00:25,035 --> 00:00:28,083
learning.
In order to understand anything

7
00:00:28,083 --> 00:00:34,050
complicated, we have to idealize it.
That is, we have to make simplifications

8
00:00:34,050 --> 00:00:38,036
that allow us to get a handle on how it
might work.

9
00:00:38,036 --> 00:00:44,025
With atoms, for example, we simplify them
as behaving like little solar systems.

10
00:00:45,015 --> 00:00:49,019
Idealization removes the complicated
details that are not essential for

11
00:00:49,019 --> 00:00:54,010
understanding the main principles.
It allows us to apply mathematics, and to

12
00:00:54,010 --> 00:00:58,096
make analogies to other familiar systems.
And once we understand the basic

13
00:00:58,096 --> 00:01:03,088
principles, it's easy to add complexity,
and make the model more faithful to

14
00:01:03,088 --> 00:01:07,003
reality.
Of course, we have to be careful when we

15
00:01:07,003 --> 00:01:11,055
idealize something, not to remove the
thing that's giving it its main

16
00:01:11,055 --> 00:01:15,060
properties.
It's often worth understanding models that

17
00:01:15,060 --> 00:01:19,035
are known to be wrong, as long as we don't
forget they're wrong.

18
00:01:19,035 --> 00:01:24,000
So for example, a lot of work on neural
networks uses neurons that communicate

19
00:01:24,000 --> 00:01:28,083
real values rather than discrete spikes of
activity, and we know cortical neurons

20
00:01:28,083 --> 00:01:33,060
don't behave like that, but it's still
worth understanding systems like that, and

21
00:01:33,060 --> 00:01:37,000
in practice they can be very useful for
machine learning.

22
00:01:37,095 --> 00:01:42,081
The first kind of neuron I want to tell
you about is the simplest: it's a linear

23
00:01:42,081 --> 00:01:44,055
neuron.
It's simple.

24
00:01:44,055 --> 00:01:47,089
It's computationally limited in what it
can do.

25
00:01:47,089 --> 00:01:52,030
It may allow us to get insights into more
complicated neurons.

26
00:01:52,052 --> 00:01:59,085
But it may be somewhat misleading.
So in a linear neuron, the output y

27
00:02:00,018 --> 00:02:05,067
is a function of the bias of the neuron, b,
and the sum over all its incoming

28
00:02:05,067 --> 00:02:11,075
connections of the activity on an input
line times the weight on that line. That's

29
00:02:11,075 --> 00:02:17,069
the synaptic weight on the input line. And
if you plot that as a curve, that is, if you

30
00:02:17,069 --> 00:02:23,062
plot on the x-axis the bias plus the
weighted activities on the input lines, we

31
00:02:23,062 --> 00:02:26,081
get a straight line that goes through
zero.
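
The linear neuron just described can be sketched in a few lines of Python. This is only an illustration: the function name, argument names, and example numbers are assumptions, not something from the lecture.

```python
def linear_neuron(inputs, weights, bias):
    """Output y = b + sum_i x_i * w_i (hypothetical helper name)."""
    return bias + sum(x * w for x, w in zip(inputs, weights))


# Made-up activities and synaptic weights, chosen only for illustration.
y = linear_neuron([1.0, 2.0], [0.5, -0.25], bias=0.1)
print(y)
```

Because the output equals the bias plus the weighted input, plotting the output against that quantity gives the straight line through zero mentioned above.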

32
00:02:30,069 --> 00:02:34,086
Very different from linear neurons are
binary threshold neurons that were

33
00:02:34,086 --> 00:02:39,014
introduced by McCulloch and Pitts.
They actually influenced von Neumann when

34
00:02:39,014 --> 00:02:42,036
he was thinking about how to design a
universal computer.

35
00:02:43,090 --> 00:02:49,096
In a binary threshold neuron you first
compute a weighted sum of the inputs and

36
00:02:49,096 --> 00:02:56,009
then you send out a spike of activity if
that weighted sum exceeds the threshold.
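
As a rough sketch of the rule just stated, here is a binary threshold neuron in Python. The function name, the strict "greater than" comparison at the boundary, and the example weights are assumptions made for illustration.

```python
def binary_threshold_neuron(inputs, weights, threshold):
    """Emit a spike (1) if the weighted sum exceeds the threshold, else 0."""
    z = sum(x * w for x, w in zip(inputs, weights))
    return 1 if z > threshold else 0


# Illustrative choice: weights of 1 and a threshold of 1.5 make the
# neuron fire only when both binary inputs are on.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, binary_threshold_neuron([a, b], [1.0, 1.0], threshold=1.5))
```

With these particular numbers the neuron behaves like a logical AND of its two inputs, which is one way to picture spikes as truth values.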

37
00:02:56,069 --> 00:03:01,040
McCulloch and Pitts thought that the
spikes were like the truth values of

38
00:03:01,040 --> 00:03:04,057
propositions.
So each neuron is combining the truth

39
00:03:04,057 --> 00:03:09,003
values it gets from other neurons to
produce the truth value of its own.

40
00:03:09,003 --> 00:03:13,062
And that's like combining some
propositions to compute the truth value of

41
00:03:13,062 --> 00:03:17,078
another proposition.
At the time, in the 1940s, logic was the

42
00:03:17,078 --> 00:03:24,006
main paradigm for how the mind might work.
Since then people thinking about how the

43
00:03:24,006 --> 00:03:29,039
brain computes have become much more
interested in the idea that the brain is

44
00:03:29,039 --> 00:03:33,069
combining lots of different sources of
unreliable evidence.

45
00:03:33,069 --> 00:03:38,052
And so logic isn't such a good paradigm for
what the brain's up to.

46
00:03:39,043 --> 00:03:44,041
For a binary threshold neuron, you can
think of its input/output function as if

47
00:03:44,041 --> 00:03:48,070
the weighted input is above the threshold,
it gives an output of one.

48
00:03:48,070 --> 00:03:55,022
Otherwise, it gives an output of zero.
There are actually two equivalent ways to

49
00:03:55,022 --> 00:03:57,097
write the equations for a binary threshold
neuron.

50
00:03:58,024 --> 00:04:04,087
We can say that the total input Z is just
the activities on the input lines times

51
00:04:04,087 --> 00:04:09,024
the weights.
And then the output Y is one if that Z is

52
00:04:09,024 --> 00:04:15,031
above the threshold and zero otherwise.
Alternatively, we could say that the total

53
00:04:15,031 --> 00:04:20,018
input includes a bias term.
So the total input is what comes in on the

54
00:04:20,018 --> 00:04:23,066
input lines, times the weights, plus this
bias term.

55
00:04:23,066 --> 00:04:29,063
And then we could say the output is one if
that total input is above zero and is zero

56
00:04:29,063 --> 00:04:33,004
otherwise.
And the equivalence is simply that the

57
00:04:33,004 --> 00:04:38,074
threshold in the first formulation is equal to
the negative of the bias in the second

58
00:04:38,074 --> 00:04:44,049
formulation.
A kind of neuron that combines the

59
00:04:44,049 --> 00:04:49,033
properties of both linear neurons and
binary threshold neurons is a rectified

60
00:04:49,033 --> 00:04:54,051
linear neuron.
It first computes a linear weighted sum of

61
00:04:54,051 --> 00:04:59,025
its inputs, but then it gives an output
that's a non-linear function of this

62
00:04:59,025 --> 00:05:04,042
weighted sum.
So we compute Z in the same way as before.

63
00:05:05,010 --> 00:05:08,054
If z is below zero, we give an output of
zero.

64
00:05:08,054 --> 00:05:12,006
Otherwise, we give an output that's equal
to z.

65
00:05:12,006 --> 00:05:16,073
So above zero it is linear, and at zero it
makes a hard decision.

66
00:05:16,073 --> 00:05:23,016
So the input/output curve looks like this.
It's definitely not linear, but above zero

67
00:05:23,016 --> 00:05:27,022
it is linear.
So with a neuron like this, we can get a

68
00:05:27,022 --> 00:05:32,027
lot of the nice properties of linear
systems, when it's above zero.

69
00:05:32,027 --> 00:05:36,063
We can also get the ability to make
decisions, at zero.
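
A minimal Python sketch of the rectified linear neuron described above follows. The function and argument names are hypothetical, and the total input is assumed to include a bias term as in the second binary threshold formulation.

```python
def rectified_linear_neuron(inputs, weights, bias):
    """Output z if the total input z is above zero, otherwise 0."""
    z = bias + sum(x * w for x, w in zip(inputs, weights))
    return max(0.0, z)


# Made-up numbers: one case with total input above zero, one below.
print(rectified_linear_neuron([1.0, 2.0], [1.0, 1.0], bias=-1.0))
print(rectified_linear_neuron([1.0, 2.0], [1.0, 1.0], bias=-5.0))
```

The first call has a total input of 2 and passes it through unchanged; the second has a total input of -2 and is cut off at zero.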

70
00:05:40,036 --> 00:05:45,032
The neurons that we'll use a lot in this
course, and that are probably the commonest

71
00:05:45,032 --> 00:05:50,016
kinds of neurons to use in artificial
neural nets, are sigmoid neurons.

72
00:05:50,016 --> 00:05:55,044
They give a real valued output that is a
smooth and bounded function of their total

73
00:05:55,044 --> 00:05:59,051
input.
It's typical to use the logistic function,

74
00:05:59,051 --> 00:06:05,042
where the total input is computed as
before, as a bias plus what comes in on

75
00:06:05,042 --> 00:06:10,046
the input lines, weighted.
The output for a logistic neuron is one

76
00:06:10,046 --> 00:06:13,095
over one plus e to the minus, the total
input.

77
00:06:14,025 --> 00:06:19,014
If you think about that, if the total
input's big and positive,

78
00:06:19,014 --> 00:06:22,069
e to the minus a big positive number is
nearly zero.

79
00:06:22,069 --> 00:06:28,021
And so the output will be close to one.
If the total input's big and negative, e

80
00:06:28,021 --> 00:06:34,044
to the minus a big negative number is a
large number, and so the output will be

81
00:06:34,044 --> 00:06:38,045
close to zero.
So the input output function looks like

82
00:06:38,045 --> 00:06:42,016
this.
When the total input's zero, e to the

83
00:06:42,016 --> 00:06:48,074
minus zero is one, so the output's a half.
And the nice thing about a sigmoid is it

84
00:06:48,074 --> 00:06:53,047
has smooth derivatives.
The derivatives change continuously.
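
The logistic neuron can be sketched in Python as below. The function names are assumptions, and the derivative formula y(1 - y) is a standard property of the logistic function that the lecture does not state at this point.

```python
import math


def logistic_neuron(inputs, weights, bias):
    """Output y = 1 / (1 + e^(-z)), where z = b + sum_i x_i * w_i."""
    z = bias + sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-z))


def logistic_slope(y):
    """Derivative of the output with respect to z, written in terms of y."""
    return y * (1.0 - y)


# A single input with weight 1 and bias 0, so the total input equals x.
for x in (-10.0, 0.0, 10.0):
    y = logistic_neuron([x], [1.0], bias=0.0)
    print(x, y, logistic_slope(y))
```

The three sample inputs give outputs near zero, exactly one half, and near one, and the slope is largest at a total input of zero.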

85
00:06:53,047 --> 00:06:59,089
And so they're nicely behaved, and they
make it easy to do learning as we'll see

86
00:06:59,089 --> 00:07:04,069
in lecture three.
Finally, there are stochastic binary neurons.

87
00:07:04,069 --> 00:07:07,098
They use just the same equations as
logistic units.

88
00:07:07,098 --> 00:07:13,027
They compute their total input the same
way and they use the logistic function to

89
00:07:13,027 --> 00:07:18,018
compute a real value which is the
probability that they will output a spike.

90
00:07:18,018 --> 00:07:23,028
But then instead of outputting that
probability as a real number they actually

91
00:07:23,028 --> 00:07:28,070
make a probabilistic decision, and so what
they actually output is either a one or a

92
00:07:28,070 --> 00:07:30,089
zero.
They're intrinsically random.

93
00:07:32,025 --> 00:07:36,075
So they're treating p as the
probability of producing a one, not as a

94
00:07:36,075 --> 00:07:39,089
real number.
Of course, if the input is very big and

95
00:07:39,089 --> 00:07:42,079
positive they will almost always produce a
one.

96
00:07:42,079 --> 00:07:47,004
If the input's big and negative they'll
almost always produce a zero.
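
One possible way to sketch a stochastic binary neuron in Python is shown here. The function name, the use of a seeded `random.Random` generator, and the sample counts are assumptions made for illustration.

```python
import math
import random


def stochastic_binary_neuron(inputs, weights, bias, rng):
    """Output 1 with probability p = logistic(z), otherwise 0."""
    z = bias + sum(x * w for x, w in zip(inputs, weights))
    p = 1.0 / (1.0 + math.exp(-z))
    return 1 if rng.random() < p else 0


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
samples = [stochastic_binary_neuron([0.0], [1.0], 0.0, rng) for _ in range(10000)]
print(sum(samples) / len(samples))  # fraction of ones when p is one half
```

With a total input of zero the probability of a spike is one half, so roughly half of the sampled outputs should be ones.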

97
00:07:48,045 --> 00:07:52,067
We can do a similar trick with rectified
linear units.

98
00:07:52,067 --> 00:07:59,015
We can say that the output, the real
value that comes out of a rectified linear

99
00:07:59,015 --> 00:08:04,014
unit, if its input is above zero, is the
rate of producing spikes.

100
00:08:04,014 --> 00:08:08,076
So that's deterministic.
But once we've figured out this rate of

101
00:08:08,076 --> 00:08:13,016
producing spikes, the actual times at
which spikes are produced is a random

102
00:08:13,016 --> 00:08:14,098
process.
It's a Poisson process.

103
00:08:14,098 --> 00:08:19,073
So the rectified linear unit determines
the rate, but intrinsic randomness in the

104
00:08:19,073 --> 00:08:22,089
unit determines when the spikes are
actually produced.
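
The idea of a deterministic rate with random spike times might be sketched as follows. This assumes the Poisson process is simulated by drawing exponentially distributed gaps between spikes, which is a standard construction but not something the lecture spells out; the function names, the time units, and the numbers are hypothetical.

```python
import random


def rectified_rate(inputs, weights, bias):
    """Deterministic spike rate: the rectified linear output."""
    z = bias + sum(x * w for x, w in zip(inputs, weights))
    return max(0.0, z)


def poisson_spike_times(rate, duration, rng):
    """Random spike times in [0, duration) for a Poisson process."""
    times = []
    if rate <= 0.0:
        return times  # a zero rate never produces a spike
    t = rng.expovariate(rate)  # gap before the first spike
    while t < duration:
        times.append(t)
        t += rng.expovariate(rate)  # gap before the next spike
    return times


rng = random.Random(0)
rate = rectified_rate([2.0], [10.0], bias=5.0)  # 25 spikes per unit time
spikes = poisson_spike_times(rate, duration=10.0, rng=rng)
print(rate, len(spikes))
```

The rate is fixed by the inputs, while the number and timing of spikes in any one run vary around the expected rate times duration.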