Of course, as we have seen, parallel computing is not new; it has been around essentially since the early 80s, and databases have traditionally evolved to exploit parallel computing over the years.

In the beginning we had shared-memory databases, which still persist today: a large multiprocessor system with multiple CPUs sharing memory, a single operating system scheduling jobs or processes across the different CPUs, and a common disk or storage area network where all the data is stored. Examples of such systems abound; almost all servers today are multiprocessing shared-memory systems. But they have at most a few dozen processors, and even with each of them having multiple cores, we find shared-memory systems supporting at most a few hundred processing units. The shared-memory model simply doesn't scale beyond this level.

Databases have therefore exploited other parallel architectures, such as the shared-disk architecture and the shared-nothing architecture, which scale to a greater number of processors than shared memory. In a shared-disk architecture you may have multiple processors which communicate over a network but access a common disk system, which could be a storage area network or network-attached storage, using two different networks: one for communication between processors, and the other for accessing storage. The shared-nothing architecture, on the other hand, relies on local disks attached to each processor, so that the only communication that takes place over the network is between processors.

In parallel databases, both shared-nothing and the other architectures, SQL queries are executed in parallel by multiple processors. In the shared-nothing architecture, in addition, the data itself is distributed across different disks using a variety of partitioning schemes, such as different sets of rows on different disks, or, in the case of column-oriented specialized engines for analytical processing, different sets of columns on different disks; a small sketch of these two schemes follows below.

All this has happened in the database community, and parallel databases are now almost a given. They all support SQL. Some of them support transaction processing, where of course there is the additional overhead of managing transaction isolation and consistency across multiple processors, but we won't get into that right now.

The thing that is not handled properly by parallel databases is fault tolerance. They didn't have to handle fault tolerance because, with just a few dozen processors, you don't need to worry about a processor failing while executing a SQL query, which in any case will take a few seconds or a few minutes at most. When you are executing a large batch job which touches virtually all the data, however, the chances of a processor failing are much higher, especially with a large number of processors. In such situations the parallel database architectures are simply not fault tolerant. Fault tolerance in the parallel database world relies on having a hot-standby deployment which is identical to the primary, and essentially replicating data over a high-speed network between the primary and the hot standby. This is a very costly and still not completely fault-tolerant architecture compared to, say, the highly distributed, fault-tolerant MapReduce system running on a distributed file system such as GFS or HDFS. So here is where the dichotomy comes in: when you are doing large-volume analytical processing which touches all the data, the parallel database architecture simply doesn't work.
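To make the row- versus column-partitioning idea above concrete, here is a minimal sketch in Python. The "disks", column names, and three-way split are illustrative assumptions, not the layout of any particular parallel database; real shared-nothing engines use catalog metadata and far more sophisticated placement.

```python
# Sketch of shared-nothing partitioning, with each "disk" modelled as a Python list/dict.
NUM_DISKS = 3

# Row partitioning: each row is hashed on its key and placed on exactly one disk.
row_disks = [[] for _ in range(NUM_DISKS)]

def place_row(row):
    disk = hash(row["id"]) % NUM_DISKS        # hash partitioning on the row key
    row_disks[disk].append(row)

# Column partitioning: each column's values are kept together on their own disk.
col_disks = {"id": [], "name": [], "amount": []}

def place_columns(row):
    for col, value in row.items():
        col_disks[col].append(value)

if __name__ == "__main__":
    sample = [
        {"id": 1, "name": "alice", "amount": 120.0},
        {"id": 2, "name": "bob",   "amount":  75.5},
        {"id": 3, "name": "carol", "amount": 310.0},
    ]
    for r in sample:
        place_row(r)
        place_columns(r)

    # A row-partitioned scan can run on all disks in parallel, while an analytical
    # query like SUM(amount) only needs to read the single "amount" column disk.
    print("rows per disk:", [len(d) for d in row_disks])
    print("sum(amount):", sum(col_disks["amount"]))
```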
So, where databases have been evolving over the past few years, because of the big data technology that has emerged from the web, is in two directions.

On the one hand we have the NoSQL databases, which are based on big data technology. They are called NoSQL because, first, they don't support full ACID transactions, or rather fully isolated serializable transactions in the traditional sense. Second, instead of having complex indexing, they have sharded indexing, which is essentially partitioning of the data into different chunks or different blocks of disk; we'll come to sharded indexing shortly, and a small sketch appears below. Third, they don't support full joins; for a variety of reasons, they are restricted in the kinds of joins that they can perform efficiently. And lastly, they support column-oriented storage if needed, so that for very wide records, different parts of a record can be stored on different servers.

The other side of database evolution is in-memory databases, which has been driven by the increasing volume of main memory available on today's servers and the falling cost of memory. Today servers have, you know, 64 gigabytes or even 128 gigabytes of main memory, which allows many ordinary enterprise transaction systems, for most practical purposes, to actually reside in main memory. So real-time transactions become possible at much higher rates than before. Varieties of indexes can be supported just as in traditional databases, but now in memory, so different kinds of indexing structures are possible, and of course complex joins are possible. Of course the data still has to be small compared to web-scale petabytes, or many hundreds of terabytes, of data. But if you are talking about the gigabytes of data that a traditional enterprise transaction system rarely exceeds, an in-memory database is a happy compromise where OLAP queries, as well as all kinds of bulk data processing, can be performed fairly efficiently.

However, this is not the big data world; on the web that doesn't quite work, and therefore NoSQL databases have become quite popular. We will study some of the NoSQL databases and their concepts very shortly. As far as in-memory databases are concerned, many of the techniques that worked in traditional databases simply carry over, except that, because everything is in memory, one doesn't have to deal with the additional complexity of some parts of the index being cached versus being on disk, and things like that.
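Here is a minimal sketch of the sharded indexing idea mentioned above: instead of one large index, the key space is hash-partitioned into shards, and each shard keeps its own small local index. The shard count, key names, and dictionary-based "index" are illustrative assumptions, not the API of any particular NoSQL system.

```python
# Sketch of sharded indexing: route a key to one shard, then use that shard's local index.
NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]   # each shard holds a local key -> record map

def shard_for(key):
    return hash(key) % NUM_SHARDS              # a key is owned by exactly one shard

def put(key, record):
    shards[shard_for(key)][key] = record       # write touches only one shard's index

def get(key):
    return shards[shard_for(key)].get(key)     # point lookup: route, then local lookup

if __name__ == "__main__":
    put("user:42", {"name": "alice"})
    put("user:99", {"name": "bob"})
    print(get("user:42"))                      # {'name': 'alice'}
    # A query that is not on the shard key (e.g. "all users named bob") would have to
    # be broadcast to every shard, which is one reason such systems restrict joins
    # and complex secondary-index lookups.
```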