Since Google invented MapReduce, BigTable, and distributed file systems, it has moved on and now uses something called Dremel.

Recall that in the early days of databases, the relational database was used both for transaction processing, that is, inserting new records, as well as for answering complex queries. Storage was expensive; it was too expensive, for example, to create a fresh copy of the entire data in a better form, more suited to efficient query processing. So one was happy with the compromise of the one-size-fits-all model of the relational database. Over the years storage became very cheap, and one started to move data into specialized column-oriented databases, and to have analytical queries that touched all the data performed using MapReduce, where large volumes of data would be read and then equally large volumes freshly written, again and again, as one performed more and more processing on them.

This was fine as long as you had terabytes, or hundreds of terabytes, of data, even at Google. But once one started dealing with petabytes of data, and wanted queries on such volumes, one could not afford to produce a new petabyte of data every time one processed the old petabyte. So the challenge of storage being a constraint once again enters the arena when one is dealing with very large volumes. At the same time, writing extremely large volumes is itself costly, so by avoiding writing again and again one introduces further efficiencies. This is essentially what Dremel does.

Dremel today powers Google's BigQuery, a service that one can access over the web. One can define extremely large tables, populate them through computations or by importing data from various sources, and execute extremely fast queries that process large volumes of data using the Dremel structure underneath.

There are two important innovations in Dremel, which was published only in 2010. First, it uses column-oriented storage, much like a column-oriented database in some sense, but for nested and possibly non-unique fields. For example, you could have a document with a field A; within that field A, another field B, which itself has, say, two different fields C and D that actually contain values. So the nested path A.B.C or A.B.D is how you would access this data.
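To make the nested-field idea concrete, here is a minimal Python sketch, not Dremel's record format: a single JSON-like record using the hypothetical field names A, B, C, and D from the example, with a small helper that collects every value reachable along a dotted path such as A.B.C.

```python
# A minimal sketch of a nested record with possibly repeated (non-unique)
# fields, using the hypothetical field names A, B, C, D from the example.
record = {
    "A": {
        "B": [                      # B is repeated: two groups inside A
            {"C": ["en", "fr"], "D": "http://example.org/1"},
            {"C": ["de"],       "D": "http://example.org/2"},
        ]
    }
}

def get_path(rec, path):
    """Collect all values reachable via a dotted path such as 'A.B.C'."""
    values = [rec]
    for field in path.split("."):
        next_values = []
        for v in values:
            child = v.get(field) if isinstance(v, dict) else None
            if child is None:
                continue
            # A repeated field contributes all of its occurrences.
            next_values.extend(child if isinstance(child, list) else [child])
        values = next_values
    return values

print(get_path(record, "A.B.C"))   # ['en', 'fr', 'de']
print(get_path(record, "A.B.D"))   # ['http://example.org/1', 'http://example.org/2']
```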
Further, in a particular record there could be multiple values for A.B.C. For example, you could have multiple names, multiple IP addresses, or whatever, for this particular nested field. This is very common in web-oriented, textual, unstructured data, and not that common in structured relational data; but this is the kind of large, petabyte-volume data that Google needs to process.

So the column orientation of the storage is fairly unique in that each nested field is stored contiguously: all the values of this nested field for record one and record two are stored close together on disk and are processed by leaf servers. Similarly, the nested field A.B.D is stored contiguously, and so on. So the first innovation is that the storage is column-oriented for nested and possibly non-unique fields.

The second innovation is that, instead of reading and writing data repeatedly as in MapReduce, one assumes that the intermediate data one produces is always much, much less than the original data. This is quite obvious if you are dealing with petabytes of data: you will not be producing more petabytes; you will be summarizing the data in some form, or selecting it, or querying it, exactly as in traditional relational databases, where you would query and get small results from large data. So the second innovation is that there is a tree of query servers that pass intermediate results from the root to the leaves and back, and the intermediate servers essentially execute a complex query plan, very similar in some respects to traditional SQL engines. However, these operate at a different scale: SQL engines predominantly operated in memory, whereas these operate in a distributed fashion across a tree of query servers, passing results back and forth across a network.

As a result, Google is able to demonstrate orders of magnitude better performance than MapReduce when performing queries on petabytes of data. Not only does it give more speed, it also clearly saves storage as compared to MapReduce. The underlying storage layer remains the distributed GFS file system, but Dremel is now widely used within Google and is available publicly through the BigQuery service.
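A rough Python sketch of the two innovations, under the same hypothetical A.B.C/A.B.D layout as above and not Dremel's actual implementation: records are shredded so that each nested path becomes its own contiguous column, and a toy two-level serving tree lets leaf servers scan their column partitions and return small partial aggregates that the root merely merges. Real Dremel additionally stores repetition and definition levels so records can be reconstructed, which this sketch omits.

```python
from collections import defaultdict

# Two records with the nested, repeated field layout used above (hypothetical data).
records = [
    {"A": {"B": [{"C": ["en", "fr"], "D": "u1"}, {"C": ["de"], "D": "u2"}]}},
    {"A": {"B": [{"C": ["en"], "D": "u3"}]}},
]

def shred(recs, paths):
    """Innovation 1: store each nested field path as its own contiguous column.
    All A.B.C values across all records end up next to each other, so a query
    touching only A.B.C never reads the bytes of A.B.D."""
    columns = defaultdict(list)
    for rec in recs:
        for path in paths:
            values = [rec]
            for field in path.split("."):
                nxt = []
                for v in values:
                    child = v.get(field) if isinstance(v, dict) else None
                    if child is not None:
                        nxt.extend(child if isinstance(child, list) else [child])
                values = nxt
            columns[path].extend(values)
    return columns

columns = shred(records, ["A.B.C", "A.B.D"])
print(columns["A.B.C"])   # ['en', 'fr', 'de', 'en'] -- stored contiguously

# Innovation 2: a toy two-level serving tree. Leaf servers scan their own
# partition of a column and return small partial aggregates; the root only
# merges those partials, so no huge intermediate data is ever written out.
def leaf_count(partition):
    counts = defaultdict(int)
    for value in partition:
        counts[value] += 1
    return counts

def root_merge(partials):
    total = defaultdict(int)
    for partial in partials:
        for value, n in partial.items():
            total[value] += n
    return dict(total)

partitions = [columns["A.B.C"][:2], columns["A.B.C"][2:]]   # pretend: one per leaf server
print(root_merge([leaf_count(p) for p in partitions]))       # {'en': 2, 'fr': 1, 'de': 1}
```

Even in this toy, the point of the tree is visible: what flows upward is a handful of counts, orders of magnitude smaller than the column that was scanned at the leaves.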
There is some effort at creating an open-source equivalent of Dremel. It is in its infancy right now; it is under Apache and is called Drill, but beyond the name I don't think they have made too much progress so far.

So we can now summarize our picture of how database technology has evolved over the years. We started out with the relational row store, which was essentially one-size-fits-all and still works fine for gigabytes of data. Then we moved on to column-oriented data warehouse technologies, specifically designed for OLAP queries, which scaled up to terabytes of data but required us to move off the relational row store into a data warehouse. In parallel, the web side created distributed NoSQL databases, which were a mix of row and column stores and also allowed MapReduce processing for bulk analysis; these scaled to tens of terabytes of data, or sometimes even larger volumes. In parallel with this, we have had in-memory databases emerging in the past few years, which can now do what the one-size-fits-all relational row stores did, again on gigabytes of data, but with an order of magnitude more performance. And for large-scale processing of petabytes of data, Google has evolved Dremel, which again is a one-size-fits-all model, for petabytes of data.

So we have three models today. We have Dremel, which only Google uses. We have in-memory databases, which are fine for doing OLAP on reasonably small databases. And for intermediate processing, to do things like computing classifiers on terabytes of data, distributed NoSQL is the preferred choice. At the same time, when you have terabytes of data and want to do OLAP queries very fast using SQL, there still remains a place for the column-store data warehouses; typically, the special-purpose appliances like Netezza, which use parallel computing and column storage, also have a place. This place, occupied by the column-store warehouses, might evolve toward a Dremel-like architecture in the future, once we actually have a publicly available version of Dremel. This is a space to watch carefully, and it is what big data technology is looking forward to in the next three