In fact, in many ways artificial intelligence at web scale is already here. Think of machine translation between human languages, a feat that was thought to be impossible after the failures of AI's early days in the sixties. Think of recognizing images, as in Google Goggles, or recognizing faces, as happens with Picasa or Facebook every day. Last but not least, think about Watson, the IBM program which defeated world champions, or at least US national champions, at the Jeopardy quiz game in 2011. Certainly these tasks are worthy of being called successful artificial intelligence applications. In fact, think about the Jeopardy example: if the human participants had been behind a screen, like Watson, and had communicated using computer-generated language, could we actually have told the difference between the two? And in that sense, hasn't the Turing test been successfully passed? To a certain extent, therefore, web-scale artificial intelligence is already here.

Now what about data? What is all this about big data? Well, there are lots of web pages now. There are a billion Facebook users and many more Facebook pages, hundreds of millions of Twitter accounts, hundreds of millions of tweets per day, billions of Google queries per day, millions of servers, and petabytes of data powering all this. Clearly driving this explosion are Moore's Law, by which computing power doubles every eighteen months, and Kryder's Law, by which storage capacity grows even faster than Moore's Law.

In contrast with all this massive growth of data on the web, typical large enterprises such as banks, retail companies, or hospitals have far fewer servers. Maybe the largest banks have a few thousand or tens of thousands of servers, terabytes of data rather than petabytes, and only a few million transactions a day, nowhere near the billions. The technology used by large enterprises pretty much looks something like this: a bunch of databases where data is collected and cleaned, loaded into data warehouses, and then extracted into further databases on which statistics or reporting is performed. As a result, this approach simply does not work at the scale of the web. Google, Facebook, LinkedIn, eBay, and Amazon, which needed to process large volumes of big data on the web, did not use traditional databases. In fact, they could not; later in this course we'll study why, as well as what they replaced this technology with. In short, they used massive parallelism and a new programming paradigm for data processing called MapReduce, which is essentially at the heart of what is today called big data technology.
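
To make the MapReduce idea a little more concrete, here is a minimal, single-machine sketch of the classic word-count example in Python. It is only an illustration of the paradigm, not of any particular framework: the function names (map_fn, reduce_fn, mapreduce_wordcount) are hypothetical, and the in-memory grouping step stands in for the distributed shuffle that a real system such as Hadoop performs across many machines.

```python
from collections import defaultdict

# Map phase: emit a (word, 1) pair for every word in one document.
def map_fn(document):
    for word in document.lower().split():
        yield (word, 1)

# Reduce phase: combine all the counts collected for a single word.
def reduce_fn(word, counts):
    return (word, sum(counts))

def mapreduce_wordcount(documents):
    # Shuffle step: group intermediate values by key (the word).
    # A real framework does this across the network and on disk.
    groups = defaultdict(list)
    for doc in documents:
        for word, count in map_fn(doc):
            groups[word].append(count)
    # Apply the reducer to each group independently.
    return dict(reduce_fn(word, counts) for word, counts in groups.items())

if __name__ == "__main__":
    docs = ["big data on the web", "the web is big"]
    print(mapreduce_wordcount(docs))
    # {'big': 2, 'data': 1, 'on': 1, 'the': 2, 'web': 2, 'is': 1}
```

The key point is that map_fn runs independently on each document and reduce_fn runs independently on each word, which is what lets the same computation be spread across thousands of servers with essentially no coordination beyond the shuffle.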