Heaps, the python way

heap

Hello and welcome to another blog(which was supposed to be a vlog but well.. another time, another topic!), where I would like to discuss an amazing data structure(Heaps, as is evident from title), and see an implementation in python. Now the thing with heaps is that, they are easy to implement on your own, as per your own requirements and are quite powerful. I would be taking an example problem to show how custom heap implementation can be awesome.

What Is HEAP:

Lets, first talk about what data structure is and what heap is. If you are a pro at the definition and want to just focus on the implementations, feel free to skip this section(I won’t feel bad, I promise. As long as you enjoy the content and clap for the post, that is 😬).

So, a data structure is a data organisation, management, and storage format that enables efficient access and modification.<yes, it was copied from wikipedia>. So what does that mean? Well, a data structure helps you to store data and apply some functions to the data. Every programming language provides some out of the box data structures for you to work with, like array, maps etc. etc. but that is not always sufficient and sometimes, to create an application that can handle scale requires some modifications and special operations on data that are also fast. This requires the devs to come-up with some ‘more applicable to my purpose data structures’. Some common examples are trees, heaps, graphs .. these can be internally implemented using the trivial data structures but also provide some more operations that you can do in the data, and have a more optimised handling of data. For example, a Binary search tree can have a good use case if you want to make many searches very fast in a huge amount of data(stay tuned for “Trees are awesome” blog for a full analysis), also traversing it in-order, will give you values in sorted order(again, for full explanation, stay tuned for the “Trees are awesome” blog).

Heaps, are a special type of data structure that can give you minimum or maximum (depending on the type of heap) value of a collection of values in O(1) time. If you are not familiar with Big-O notations, just know that O(1) is the best you can do. It means that no matter the number of data points (10¹¹, 10¹², …) it will take a constant amount of time to tell you the minimum or maximum of those. This is very useful if you want to find , let’s say k-largest numbers from a stream of numbers(We will do something similar). Heaps are actually complete binary trees(that means, until the second last level, it is filled completely, and the last level is filled from left to right). This allows us to use an array to implement a Heap. Heaps are structurally hierarchical but implemented linearly (using arrays, as complete binary trees). This is some really cool stuff. The array can be transmitted as is and anyone who receives it can build tree form of the heap as it was intended to be. The array can also be seen as ‘level-order traversal’ of the complete binary tree (which has heap property, in our case, but just wanted to give you some more ideas).

We can also see that for a heap(array) for element at i-th index, it’s children are at 2*i<left child> and 2*i + 1<right child> for 1-based indexing (2*i + 1<left_child> and 2*i + 2<right_child> for 0 based index), and the parent of i-th node is at i//2 ((i-1)//2 for 0 based indexing), where //=>integer devision.

A Heap can be a Min-Heap for which the root (i.e. index 0 in the array) will contain the min value and throughout the tree, the paren is always lesser than both of the children(and hence all the values in that subtree, if you think about it), and Max-Heap for which each parent is greater than both of the children throughout the tree and the root (i.e. the first element in the array) contains the max value.

One thing to note here is that there is no relation between the children of a node, just between parent node and it’s child node. For example,

in the above heap, the ‘red’ node is a relation with ‘green’ node(‘red’>’green’) and ‘red’ has a relation with ‘blue’ node(‘red’> ‘blue’), but ‘green’ and ‘blue’ have no relation (‘blue’ can be > or < {or == if implementation demands so} ‘green’). There are a lot of things about heap that would make this blog very long, so I hope to explain the further theory with some practical work..

Implementing Heaps in Python:

We have had a good talk about this beautiful data structure that seems to be magical. So let’s dive into it’s implementation and demystify this data structure and master it’s powers..

I will be using python3 for the implementation but the concepts are same and can be easily translated to any other language you like.

To begin the implementation, let’s first discuss what the code should be able to do:

  1. It should be able to take in a values in form of array and convert it into a min/max heap(we will use heapify method to do this in linear time)
  2. it should be able to take in a new data value and put it to the right place in the heap(we will implement insert method which inserts to heap in O(log(N)) time, where N is the number of elements in heap)
  3. it should be able to tell you the min/max value in O(1) time(this will be just one return statement)
  4. it should be able to delete the min/max value(we will implement a delete method that does this in O(log(N)) time)

We will be implementing two classes for this demo, one will be for min_heap and the other to demo a max_heap

MinHeap:

  • the following implementation is assuming data to be int, but in the problem solving part, I will be showing it actually can be anything. The major purpose of this section is to deliver the concepts and basics.
  • Heapify: heapify converts an array of values to a min heap. point to know about heapify is that it builds the heap from top towards bottom. another fact is that a single node is always a heap(as a single element is the min/max of all the 1 values(mild obviousness here)). Heapify will work only when both the sub trees are heaps. We start at the last parent(which will be a N//2. can you see why? think about it..) and move upwards till root.
  • Insertion: Insertion will be a simple function that takes in the new value at the leaf and pushes it up the heap till it reaches a point where heap is maintained.
  • Deletion: Delete will be a fun one. It uses the fact that removing the leaf(last node) from heap will be the most inconsequential, hence, we swap the root with the last node and push the root value down to maintain heap after removing the last val(which is now what root held). If it sounds confusing, I hope the code will make it clear.

In the above code, calling the minHeap class with an array value will convert the array to a min heap and store it as such. You might also see that I am raising exceptions here and there, it is just to sound fancy, you van just return some messages as well. Also the reason I used images of code and not as a text is because this is just to show the concept, We will the heaps in the question as per our requirement so just focus on the concepts and ideas for now.

MaxHeap:

Now, if the above code made sense to you and you got the concepts I would encourage you to implement the MaxHeap class on your own, it just needs some flipping of the checks. If you are still having trouble understanding what max heap will do, worry not coz I have a ‘for your reference’ code ready for max heap.

Above disclaimer holds for this part of code as well.. focus on the concepts.

Also if you want, the codes are in this git repository .

Now, as promised, let’s see how heaps can be used to solve a seemingly hard problem.

Using Heap for a fun problem:

Problem Statement: Let’s say you are receiving a stream of 3-D points (in the form [x,y,z] ). We are given a point P0 = (x0,y0,z0) and in that stream, we want to keep track of k points, closest to the point P0. {the distance is euclidean i.e. distance(P1 and P2) = sqrt((x1-x2)² + (y1-y2)² + (z1-z2)²)}

Now since this is a stream of points, there can be infinitely many numbers so you can not store the distances somewhere and give the k smallest distances by sorting or otherwise..

we are assuming k is a reasonable value and not some ridiculously large value like 10²⁰ or like that. Although even if it is, there are ways to handle it but for our purpose, let us assume 1≤k≤10⁶ (which is like 12MB (3*4(bytes per int)*10⁶ = 12*10⁶ bytes ~ 12MB)worth of points approx which is quite a lot of points).

Let’s do some analysis and have fun with the problem..

let’s say we are receiving points as follows:

Points = [[1,2,3],[3,4,5],[-1,1,4],[0,0,1],[1,2,3]…]

P0 = [0,0,0] . i.e. k points closest to origin

take k=3 for this example

we can assume that we have a get_next_point() function that gives the next point in the stream.

Now, things to think here, 1. How will you deal with duplicates? 2. Do you really need to find the square root to get the closest points(we are not asked the distances of the points)? 3. How will heap help you get the points. notice that we just want k closest points, order is not important.

This blog is getting too long so probably will have a part 2 where I will discuss the solution to this problem and the thought process that goes into it. Until then take care, enjoy and try to think of a solution and I will see you in the next part (I promise it will be published real soon and link will be updated -> here).

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store