One of the first NoSQL databases was Google's Bigtable, along with its Hadoop equivalent, which is called HBase. Bigtable is, in many ways, probably the origin of the term big data. The paper describing it appeared only a few years ago and is one of the earliest in this area, and it is Google's implementation on top of the distributed GFS file system, which led to the development of many NoSQL databases in the first place.

A Bigtable table is distributed across many different servers by row first: the table is broken up into many tablets, each containing multiple rows. Tablet is the Bigtable term; in HBase, by the way, tablets are called regions. Each tablet in turn is broken up into column families, each containing a set of columns from among those in the table. Each column family, which contains a particular set of columns spanning multiple rows, is stored as a chunk of the distributed file system, GFS or HDFS. Each such chunk is served by a tablet server, with three separate replicas maintained in GFS as we have seen before, and the tablet servers together form the entire Bigtable.

Of course, in order to access any particular record, we need to know which tablet server the row and column we are looking for fall on. For that purpose there is a metadata table which keeps track of which tablet server a particular row lies on. The metadata table is itself another Bigtable, and therefore also comprises tablets, or regions, which are maintained on a separate set of tablet servers. One particular tablet is the root tablet, and all searches start from there. To look for a particular row and column, one looks up the root tablet, which tells us which child tablet among the metadata tablets contains the information about that row and column. The child tablet will in turn tell us which tablet server to look at, and there we can pick up the chunk where the required row and column family is actually stored.

Let's take a look at what data in a Bigtable looks like, to get a feel for how it works. Each row is indexed by some key, for example a transaction ID when one is storing sales transactions. Different column families can have multiple columns within them. For example, the location column family might contain a whole bunch of columns to do with location, all stored together in a single chunk; the sale column family might have another set of columns, and similarly for products and whatever other column families there might be. Each of them is stored separately on different chunk servers.

An important point about Bigtable is that, while the number of column families is fixed when you create a table, the number of columns within a column family can vary, so you can dynamically add new columns to a particular column family. Further, each column family for a particular row can have multiple entries. For example, the region could be both "US East Coast" and "US Northeast", and that is perfectly fine, unlike a relational database where a particular row-column combination can have only one value. Additionally, each of these different values for a particular row-column combination can be timestamped, so the location for this transaction might be "US East Coast" today, but tomorrow one is free to change it to "US Northeast". That change is not made by updating the value "US East Coast", but by inserting a new value with a new timestamp, so that one can always look up whichever version of the region one wants, depending on which timestamp one is searching for.
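To make the versioned-cell idea concrete, here is a minimal sketch using the HBase Java client (2.x API). The table name "invoices", the "location" column family, and the row key "txn-0001" are hypothetical names chosen for illustration, and the table is assumed to already exist with the column family configured to retain more than one version per cell.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionedCellExample {
    public static void main(String[] args) throws IOException {
        // Hypothetical "invoices" table with a "location" column family,
        // assumed to have been created with VERSIONS > 1 so old cells are kept.
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table invoices = conn.getTable(TableName.valueOf("invoices"))) {

            byte[] row = Bytes.toBytes("txn-0001");       // row key: transaction ID
            byte[] location = Bytes.toBytes("location");  // column family
            byte[] region = Bytes.toBytes("region");      // column qualifier

            // Two values for the same row/column, distinguished only by timestamp:
            // nothing is overwritten, a new version is inserted alongside the old one.
            Put put = new Put(row);
            put.addColumn(location, region, 1000L, Bytes.toBytes("US East Coast"));
            put.addColumn(location, region, 2000L, Bytes.toBytes("US Northeast"));
            invoices.put(put);

            // Read back all stored versions of location:region for this row.
            Get get = new Get(row);
            get.readVersions(10);  // setMaxVersions(10) on older 1.x clients
            Result result = invoices.get(get);
            for (Cell cell : result.getColumnCells(location, region)) {
                System.out.println(cell.getTimestamp() + " -> "
                        + Bytes.toString(CellUtil.cloneValue(cell)));
            }
        }
    }
}
```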
In effect, one can keep multiple snapshots of one's categorization of the data together in the same Bigtable, which is a tremendous advantage compared to a traditional relational database.

Because Bigtable and HBase rely on the underlying distributed file systems, GFS and HDFS respectively, they also inherit some of the properties of these systems. In particular, large parallel reads and inserts are supported efficiently, even simultaneously on the same table, unlike in a traditional relational database. Similarly, reading all the rows for a small number of column families in a large table, such as for an aggregation query that sums them all up, is efficient in a manner similar to column-oriented databases.

Of course, one of the downsides of the Bigtable or HBase architecture is that there is really only one key, the primary key, by which data is distributed across different processors or different chunk servers. If one wants to access data by any other column family, one cannot rely on any other index, and the only way to do it is by reading all the data. So for such queries Bigtable is not that efficient, and one needs to add additional structure to it to enable efficient queries. In fact, this is true for any mechanism one uses to store data in sharded form. By sharding, one means storing different rows on different pieces of disk or different servers, such as tablet servers on a distributed file system. So, if you have sharded data, where data is distributed across machines by some key, and you want to access it using another key, you need to do something smarter, which is essentially to create an index of some kind.

For example, let's take our Bigtable of records representing invoice transactions, which have to do with billing of some kind for some products. Our main table is the invoice table, whose keys are transaction IDs, and which might have different values for different timestamps as we discussed earlier. Now, if you want to search this table by some other column, such as by product, you would need to create index tables. These would also be Bigtables, but their keys would be the different product values, each entry telling us which key in the original table that particular product actually lies in. Similarly, one could create index tables for amounts, as well as for combinations of, say, the city and the status of the transaction.

It is also useful to create such an index in sorted form, so that when records are inserted into an index table they land in sorted order. As a result, a query which asks us to find all transactions with amounts in some range, say between 50 and 90, becomes easier, since all the values in such a range lie in the same contiguous piece of the amount index table. Once you have retrieved these index values, one knows which keys to access, and one can then fetch them directly from the original Bigtable or HBase table.
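As a rough sketch of how such an index table might be maintained and queried with the HBase Java client (2.x API), the following assumes a hypothetical main table "invoices" and a hand-maintained index table "invoices_by_amount" whose row keys are zero-padded amounts followed by the transaction ID, so that lexicographic key order matches numeric order; the range query then becomes a single contiguous scan of the index.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class AmountIndexExample {
    // Zero-pad amounts so lexicographic row-key order matches numeric order.
    static byte[] indexKey(long amount, String txnId) {
        return Bytes.toBytes(String.format("%010d#%s", amount, txnId));
    }

    public static void main(String[] args) throws IOException {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table index = conn.getTable(TableName.valueOf("invoices_by_amount"));
             Table invoices = conn.getTable(TableName.valueOf("invoices"))) {

            byte[] ref = Bytes.toBytes("ref");  // column family in the index table
            byte[] key = Bytes.toBytes("key");  // qualifier holding the main-table row key

            // Maintain the index: one index row per (amount, transaction).
            Put p = new Put(indexKey(75, "txn-0001"));
            p.addColumn(ref, key, Bytes.toBytes("txn-0001"));
            index.put(p);

            // Range query: all transactions with amount between 50 and 90.
            // Because index rows are stored in sorted key order, this is a
            // single contiguous scan of the index table.
            Scan scan = new Scan()
                    .withStartRow(Bytes.toBytes(String.format("%010d", 50)))
                    .withStopRow(Bytes.toBytes(String.format("%010d", 91)));  // stop row is exclusive
            try (ResultScanner hits = index.getScanner(scan)) {
                for (Result hit : hits) {
                    byte[] mainKey = hit.getValue(ref, key);
                    // Use the key found in the index to fetch the full record
                    // from the main table.
                    Result txn = invoices.get(new Get(mainKey));
                    System.out.println(Bytes.toString(mainKey) + " -> " + txn);
                }
            }
        }
    }
}
```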
One example where exactly such a structure is likely used is Google App Engine's Datastore, which, as many have speculated, is probably based on Bigtable and uses indexes in exactly this way to query the Bigtable efficiently.