Okay, but let's go back to this idea that, depending on the structure, or rather the distribution, of the data in the space, you can actually get query times looking closer to log N. So here's an example where we have a fairly uniform distribution of points across the space, and for a given query we've pruned a huge amount of the search space and only had to search two different boxes. Whereas in that more contrived example where our data lay on a circle, and based on that center-splitting heuristic we had for how to create the splits, we end up having to search almost every node in the tree. So that looks a lot closer to order N, not much savings in this case.

Okay, so like I said, to really offset the cost of creating this data structure, it's useful to use these things in cases where you're doing lots and lots of queries. So let's imagine asking for the nearest neighbor of every article in our corpus; that is, we're doing N queries. Let's look at the complexity of this. Under a brute force search, remember, it was order N per query, so for N queries we get something that's order N squared. On the other hand, for kd-trees it's going to be somewhere in the range of order N log N if we get lucky, and if we get unlucky it's going to be on the order of N squared, just like the brute force search. But just to note, this represents potentially very large savings for large N.

Okay, so let's look at a little empirical study of the number of inspections we have to do as a function of the number of observations, and then as a function of the dimension of the space that we're working in. Here we're increasing the number of observations, N, and in this case we see roughly a log N trend. But on the other hand, when we're looking at going to higher and higher dimensions, and just to be clear, we're saying our observations x_i live in R^d, so that's what d represents, we see an exponential increase in the number of inspections we have to do, the number of distance computations, or really the number of nodes in the tree we visit, to find our nearest neighbor. So for large d, this is really a regime where using kd-trees becomes pretty pointless. We'll return to this to talk about it a little bit more, and also to think about an alternative approach that will be more helpful to us in high-dimensional scenarios.

But before that, I just want to mention how we can think about using kd-trees to do k nearest neighbor search, instead of one nearest neighbor search. The extension is very intuitive and straightforward. Instead of just maintaining the distance to the nearest neighbor, we're going to maintain the distance to the kth nearest neighbor found so far. So in this example, we're looking at a two nearest neighbor task, and this is our distance to the second nearest neighbor found so far in the search of this first box. Then, just as before, based on the distances we've found so far, we backtrack and go search the other nodes, and we prune a node if the distance to that kth nearest neighbor is smaller than the distance to the bounding box of the node we're looking at. So the algorithm is almost identical; the only thing you would need to change in code is, instead of maintaining the distance to the nearest neighbor, maintain the distance to the kth nearest neighbor found so far.
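To make that last point concrete, here's a rough Python sketch of what that k nearest neighbor search could look like. This is just an illustrative sketch, not the course's implementation: the node attributes (mins, maxs, is_leaf, points, split_dim, split_value, left, right) are an assumed kd-tree layout, and the only real change from the single nearest neighbor version is that a small heap tracks the kth nearest distance found so far, and that distance drives the bounding-box pruning test.

```python
import heapq
import numpy as np

def dist_to_box(query, mins, maxs):
    """Distance from the query to a node's axis-aligned bounding box
    (zero if the query lies inside the box)."""
    closest = np.clip(query, mins, maxs)
    return np.linalg.norm(query - closest)

def knn_search(node, query, k, heap=None):
    """Collect the k nearest neighbors of `query` in a kd-tree.

    `heap` holds (-distance, point) pairs, so heap[0] always corresponds to
    the kth (i.e., worst) nearest neighbor found so far.
    Assumed node layout: mins, maxs, is_leaf, points, split_dim,
    split_value, left, right.
    """
    if heap is None:
        heap = []
    if node is None:
        return heap

    # Pruning step: once we have k candidates, if even the closest point of
    # this node's bounding box is farther than our current kth nearest
    # neighbor, nothing inside the box can improve the answer.
    if len(heap) == k and -heap[0][0] < dist_to_box(query, node.mins, node.maxs):
        return heap

    if node.is_leaf:
        for p in node.points:
            d = np.linalg.norm(query - p)
            if len(heap) < k:
                heapq.heappush(heap, (-d, tuple(p)))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, tuple(p)))
        return heap

    # Descend into the child containing the query first, then backtrack into
    # the other child (which the test above may prune entirely).
    if query[node.split_dim] <= node.split_value:
        near, far = node.left, node.right
    else:
        near, far = node.right, node.left
    knn_search(near, query, k, heap)
    knn_search(far, query, k, heap)
    return heap
```

Storing negated distances in the heap means the kth nearest distance found so far is always available in constant time, which is exactly the quantity the pruning test compares against the distance to a node's bounding box; with k = 1 this reduces to the single nearest neighbor search described earlier.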