Solution to the k-closest point in space problem

Adarsh Dubey
6 min readMar 26, 2022
Photo by Shubham Dhage on Unsplash

Well the title is pretty obvious to what I intend to do in this blog.. but if you have no context whatsoever as to what the context is, I will be giving the detail of the problem statement we will solve but if you are new to Data Sciences and Algorithms, I’d recommend go to the heaps post before this post.

So, let’s talk about the question we wanted to solve: so we are given our coordinate in 3-D space and we are given a stream of points around us. Now all the points are with respect to our position and we are always at origin i.e. (0,0,0). The distance between us and a point (x,y,z) is the euclidean distance(i.e. the length of straight line connecting me to the point). Now we are interested to know at any given time, what are the closest k points(k is +ve integer). We will receive a stream of points and we want to keep track of closest k points(assume like you are in a spaceship and the stream of points represent closest k- planets you can rest at).

I really hope the statement was clear but if it was not, I will be taking an example to tell you what that means.

Lets assume, we want the closest 2 points from a stream of points, i.e. k=2.

points = [(1,1,1),(-1,1,1),(2,1,1,)]

with this very easy example, I can tell you that the point of interest to us are (1,1,1) and (-1,1,1) because if we assume our position at (0,0,0); euclidean distances of above points are math.sqrt(1² + 1² + 1²), math.sqrt(-1² +1²+1²); math.sqrt(2²+1²+1²). “Why?” you ask? refer this wikipedia page to checkout how to get distance.(notice math.sqrt() is a function in every programming language in one form or the other).

Now, comes the crazy/interesting part.. How do we get the closest k points?

one option is to just calculate the distances of all the numbers and return the k min valued points every query but that, means you take the distance of all the points you get in the stream and store it and sort is to get the value .. but it’s a stream.. where will you keep all the points? what if I introduce float valued points? I can have infinite points between (0,0,0) and (1,0,0).. how do we handle the vastness of possible points?

Also can you imagine how slow your program be.. if we get points too fast and already have a few points in out bad, we are bound to miss quite a few points.(infact for N points in stream, your time complexity will be O(N²log(N)) and some to find the root because you will calculate distance and sort for all the N points. So.. if you had like 1000 points, and your system did like 10⁶ operations in one sec, your code will take approximately 7 sec for getting the answer. and if you got like one point per sec, you would have missed 7 points, if I add just one more 0 to the number of points, you’d be lost in space if you used this algorithm.. )..

So we can all agree the previous approach had some speed issues. But what else can we do? Well one thing I can do is instead of finding root, we can use the value (x²+y²+z²) for distance from me to point (x,y,z), now we get rid of some overhead in the code.. let’s make some good use of out DSA knowledge and try to throw some DS on the problem(now, since this blog is to solve a problem heaps problem , everyone knows what’s coming but play along for a bit because this is actually the same trick I’ll use in future DP, Recursion, backtracking, graphs and pretty much all good questions’ solution optimizations . Trust me this journey to discovery will be worth it).. so, We try to see the problem again and we see it needs us to keep track of the closest k points. now.. if we think of DS that can store and return the smallest value very fast would be our preferred weapon of choice.. So, we know BSTs store values in sorted form but if will still require us to store all the points and if the tree get’s skewed, we get screwed.

Again, we need the minimum k distances.. so how can we know very fast if the new point we see should belong to this elite k-values? Let’s say at some random point in time , we have a bag containing all the closest points. Now we need to check if the current point is smaller than the largest value in the bag. If it is, We remove the greatest point and add the current value, otherwise we already have out k-closest points. (Makes sense? why greatest of smallest? because if this point should be in our bag, it should replace some other point and the farthest point among the closest points need to leave to make room for a closer point.. also if it is smaller than the greatest point in our bag, it will be closer than the farthest point in the bag. I guess that should clear why we do what we do).

Now having seen all this requirements, what kind of bag will give me the max/min value in an instance? Well.. (drum roll for dramatic effects)…….. *Stacks!! 😅Nah.. just kidding, it’s HEAPS.. How to use them though?

Well as discussed above, our target is the greatest value among the closest points right.. so it make sense we need a Max-Heap kind of heap.. Below is the idea/algo to the solution of the problem..

so the below is the steps to do in your code.(I won’t provided the code as it’s just a language and you should be free to use whichever you want without any bias(in future I will provide python code though)):

  • initialize our max_heap. the heap inserts based on the (x²+y²+z²) value of the points.(check out the previous heap blog to checkout the logic for the same)
  • for every point in the stream of points, do following{

if max_heap.size()<k{

add the point to max_heap

}

else{

if dist(point)<dist(max_heap.root)//max_heap.root contains max value{

max_heap.pop_root(); // remove the max value of the mins

max_heap.push(point); // add the new point and heapify to maintain the heap structure

}

}

}

end the loop once points runout.

at the end of the above algo, we will have our closest k points.

Now, few final thoughts and analysis..

First things first, if N is small enough and k is large, you can just have a custom sorted function to sort the points based on the distance and save on space and it would be fast and pretty much all you can do..

But if k is a reasonable value and the stream is infinitely long, it will only take O(k) space and O(k*log(k)) time to get the answer.. imagine , from an infinite number of points, we were able to extract just the amount needed and it does not require to store anything extra or calculate anything extra.

Aren’t Heaps amazing?

Try finding the median of a stream now. Above discussion should give you enough push to think how to do it.. if it still makes no sense, feel free to reach-out to me on linkedin or comment on this post.

See you in another, post and I will try to cover some more crazy good `DSA`. Until then, take care and have fun (P.S. trying to get proper gear for youtube videos too😀✌ looking forward to your support there as well). Chao!!

--

--