Since Google invented MapReduce, BigTable, and distributed file systems, it has moved on and now uses something called Dremel.

Recall that in the early days of databases, the relational database was used both for transaction processing, that is, inserting new records, as well as for answering complex queries. Storage was expensive; it was too expensive, for example, to create a fresh copy of the entire data in a better form, more suited to efficient query processing. So one was happy with the compromise of the one-size-fits-all model of the relational database. Over the years storage became very cheap, and one started to move data into specialized column-oriented databases, and to have analytical queries that touched all the data performed using MapReduce, where large volumes of data would be read and then equally large volumes freshly written, again and again, as one performed more and more processing on them.

This was fine as long as you had terabytes, or hundreds of terabytes, of data, even at Google. But once one started dealing with petabytes of data, and wanted queries on such volumes, one could not afford to produce a new petabyte of data every time one processed the old petabyte. So the challenge of storage being a constraint once again enters the arena when one is dealing with very large volumes. At the same time, writing extremely large volumes is itself costly, so by avoiding writing again and again one introduces further efficiencies. This is essentially what Dremel does.

Dremel today powers Google's BigQuery, a service that one can access over the web. One can define extremely large tables, populate them through computations or by importing data from various sources, and execute extremely fast queries that process large volumes of data using the Dremel structure underneath.

There are two important innovations in Dremel, which was published only in 2010. First, it uses column-oriented storage, much like a column-oriented database in some sense, but for nested and possibly non-unique fields. For example, you could have a document with a field A; within that field A, another field B, which itself has, say, two different fields C and D that actually contain values. So the nested path A.B.C or A.B.D is how you would access this data.
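To make the nested-field idea concrete, here is a minimal Python sketch, not Dremel's record format: a single JSON-like record using the hypothetical field names A, B, C, and D from the example, with a small helper that collects every value reachable along a dotted path such as A.B.C.

```python
# A minimal sketch of a nested record with possibly repeated (non-unique)
# fields, using the hypothetical field names A, B, C, D from the example.
record = {
    "A": {
        "B": [                      # B is repeated: two groups inside A
            {"C": ["en", "fr"], "D": "http://example.org/1"},
            {"C": ["de"],       "D": "http://example.org/2"},
        ]
    }
}

def get_path(rec, path):
    """Collect all values reachable via a dotted path such as 'A.B.C'."""
    values = [rec]
    for field in path.split("."):
        next_values = []
        for v in values:
            child = v.get(field) if isinstance(v, dict) else None
            if child is None:
                continue
            # A repeated field contributes all of its occurrences.
            next_values.extend(child if isinstance(child, list) else [child])
        values = next_values
    return values

print(get_path(record, "A.B.C"))   # ['en', 'fr', 'de']
print(get_path(record, "A.B.D"))   # ['http://example.org/1', 'http://example.org/2']
```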
Further, in a particular record there could be multiple values for A.B.C. For example, you could have multiple names, multiple IP addresses, or whatever, for this particular nested field. This is very common in web-oriented, textual, unstructured data, and not that common in structured relational data; but this is the kind of large, petabyte-volume data that Google needs to process.

So the column orientation of the storage is fairly unique in that each nested field is stored contiguously: all the values of this nested field for record one and record two are stored close together on disk and are processed by leaf servers. Similarly, the nested field A.B.D is stored contiguously, and so on. So the first innovation is that the storage is column-oriented for nested and possibly non-unique fields.

The second innovation is that, instead of reading and writing data repeatedly as in MapReduce, one assumes that the intermediate data one produces is always much, much less than the original data. This is quite obvious if you are dealing with petabytes of data: you will not be producing more petabytes; you will be summarizing the data in some form, or selecting it, or querying it, exactly as in traditional relational databases, where you would query and get small results from large data. So the second innovation is that there is a tree of query servers that pass intermediate results from the root to the leaves and back, and the intermediate servers essentially execute a complex query plan, very similar in some respects to traditional SQL engines. However, these operate at a different scale: SQL engines predominantly operated in memory, whereas these operate in a distributed fashion across a tree of query servers, passing results back and forth across a network.

As a result, Google is able to demonstrate orders of magnitude better performance than MapReduce when performing queries on petabytes of data. Not only does it give more speed, it also clearly saves storage as compared to MapReduce. The underlying storage layer remains the distributed GFS file system, but Dremel is now widely used within Google and is available publicly through the BigQuery service.
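A rough Python sketch of the two innovations, under the same hypothetical A.B.C/A.B.D layout as above and not Dremel's actual implementation: records are shredded so that each nested path becomes its own contiguous column, and a toy two-level serving tree lets leaf servers scan their column partitions and return small partial aggregates that the root merely merges. Real Dremel additionally stores repetition and definition levels so records can be reconstructed, which this sketch omits.

```python
from collections import defaultdict

# Two records with the nested, repeated field layout used above (hypothetical data).
records = [
    {"A": {"B": [{"C": ["en", "fr"], "D": "u1"}, {"C": ["de"], "D": "u2"}]}},
    {"A": {"B": [{"C": ["en"], "D": "u3"}]}},
]

def shred(recs, paths):
    """Innovation 1: store each nested field path as its own contiguous column.
    All A.B.C values across all records end up next to each other, so a query
    touching only A.B.C never reads the bytes of A.B.D."""
    columns = defaultdict(list)
    for rec in recs:
        for path in paths:
            values = [rec]
            for field in path.split("."):
                nxt = []
                for v in values:
                    child = v.get(field) if isinstance(v, dict) else None
                    if child is not None:
                        nxt.extend(child if isinstance(child, list) else [child])
                values = nxt
            columns[path].extend(values)
    return columns

columns = shred(records, ["A.B.C", "A.B.D"])
print(columns["A.B.C"])   # ['en', 'fr', 'de', 'en'] -- stored contiguously

# Innovation 2: a toy two-level serving tree. Leaf servers scan their own
# partition of a column and return small partial aggregates; the root only
# merges those partials, so no huge intermediate data is ever written out.
def leaf_count(partition):
    counts = defaultdict(int)
    for value in partition:
        counts[value] += 1
    return counts

def root_merge(partials):
    total = defaultdict(int)
    for partial in partials:
        for value, n in partial.items():
            total[value] += n
    return dict(total)

partitions = [columns["A.B.C"][:2], columns["A.B.C"][2:]]   # pretend: one per leaf server
print(root_merge([leaf_count(p) for p in partitions]))       # {'en': 2, 'fr': 1, 'de': 1}
```

Even in this toy, the point of the tree is visible: what flows upward is a handful of counts, orders of magnitude smaller than the column that was scanned at the leaves.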
There is some effort at creating an open-source equivalent of Dremel. It is in its infancy right now; it is under Apache and is called Drill, but beyond the name I don't think they have made too much progress so far.

So we can now summarize our picture of how database technology has evolved over the years. We started out with the relational row store, which was essentially one-size-fits-all and still works fine for gigabytes of data. Then we moved on to column-oriented data warehouse technologies, specifically designed for OLAP queries, which scaled up to terabytes of data but required us to move off the relational row store into a data warehouse. In parallel, the web side created distributed NoSQL databases, which were a mix of row and column stores and also allowed MapReduce processing for bulk analysis; these scaled to tens of terabytes of data, or sometimes even larger volumes. In parallel with this, we have had in-memory databases emerging in the past few years, which can now do what the one-size-fits-all relational row stores did, again on gigabytes of data, but with an order of magnitude more performance. And for large-scale processing of petabytes of data, Google has evolved Dremel, which again is a one-size-fits-all model, for petabytes of data.

So we have three models today. We have Dremel, which only Google uses. We have in-memory databases, which are fine for doing OLAP on reasonably small databases. And for intermediate processing, to do things like computing classifiers on terabytes of data, distributed NoSQL is the preferred choice. At the same time, when you have terabytes of data and want to do OLAP queries very fast using SQL, there still remains a place for the column-store data warehouses; typically, the special-purpose appliances like Netezza, which use parallel computing and column storage, also have a place. This place, occupied by the column-store warehouses, might evolve toward a Dremel-like architecture in the future, once we actually have a publicly available version of Dremel. This is a space to watch carefully, and it is what big data technology is looking forward to in the next three