Implementation of an Efficient Priority Queue for a UNIX Scheduler

CS 161Programming Project

Richard Bragg



The Data Structure

After some careful analysis of which data structure to use for this project, I chose to implement my scheduler using a binary heap stored as an array [CLRS Chapter 6].  While the actual data structure itself is identical to the binary heap data structure discussed in CLRS, I did change the operations slightly to increase the overall efficiency of the heap to be used as a real priority queue.

The first change is in the Extract-Max function - rather than extracting the maximum node from the heap, then fixing the remaining nodes to maintain the heap property, I extract the maximum node and simply mark the heap as invalid.  Any heap operations that are invoked on a "valid" heap proceed as they would normally.  However, if the Insert operation is invoked on an "invalid" heap, then the new node is inserted at the newly vacated top position on the heap and moved down the heap to a legal position using a call to Heapify.  For all commands besides Insert called on an "invalid" heap, the heap must first be "fixed" using the same procedure that Extract-Max would typically use, then the algorithms proceed as usual.  The large gain here is obviously found only when an Insert call follows an Extract-Max call.  Under the normal implementation, a call to Extract-Max would return the top element in the heap, swap the bottom element in the heap to the top position, and then call Heapify from the top position - an O(lg n) operation.  If a call to Insert came next, the element would be added at the bottom of the heap and moved up to a legal position using Increase-Key - another O(lg n) operation.  Clearly, in my implementation, we eliminate the need for one of these two O(lg n) operations in the situation in which Insert follows Extract-Max.  This algorithmic improvement is key in the UNIX scheduler, since quite often a node is extracted and then re-inserted.

The second change is also in the Extract-Max function, and is really a byproduct of the first change.  Rather than dynamically allocating memory and making a copy of the node during Extract-Max, I can simply pass a pointer to the first node in the "invalid" heap with the stipulation that if the client wants to keep the node, it must make its own copy before calling any other function on the priority queue.  Since the scheduler only needs to look at the values in the maximum node, it is never necessary to make this copy, and we gain the time that would have been required for allocating the memory, copying the node, and freeing the memory using this method.



The Operations

Interface Methods

Helper Methods

Running Time

Per-Operation Basis
Since any of the operations could be called on an "invalid" heap, a call to either Heapify or Increase-Key could be required for any of the 5 basic operations - either directly in the case of Insert, or indirectly via a call to Fix-Heap in the other 4 operations.  Since the asymptotic running times for both Heapify and Increase-Key are bounded from above by O(lg n) where n is the number of nodes in the queue [CLRS], all operations must have per-operation running times of at least O(lg n).  It makes more sense then, to use an amortized analysis here, since the Fix-Heap operation is only necessary immediately after a call to Extract-Max.

Amortized Analysis
The important insight to make here is that when Extract-Max is called, it must pay an extra O(lg n) to account for the fact that it leaves the heap in an "invalid" state, and it will need that O(lg n) to return the heap to a "valid" state.  Thus whenever any of the other operations are called, we can ignore the cost of fixing the heap since it was paid in advance by Extract-Max.

Since the Insert operation is now only O(1) in some cases, clearly this is more efficient over a sequence of operations than the standard implementation of binary heaps presented in CLRS.


Best Case Input

The best case input would be any sequence of operations in which each of the operations were only called under the following conditions:

Since each of the 5 operations can be run in O(1) actual time in the best case, a sequence of any k operations subject to these constraints could be run in  actual time in the best case.


Worst Case Input

The worst case input for each operation would involve the operations being called under the following conditions:

The worst case running time of a sequence of k operations would be a sequence of k calls to Insert where each consecutive node inserted was larger than the previous one.  Only calls to Insert would be included in this worst case, since that is the only operation that increases the size of the priority cue, and thus is the only way to make a single operation take more time than the call before it.  This worst case running time on k operations would take , with the derivation as follows:


Comparison to Other Data Structures

In choosing the best data structure to use from the list we were given, I made the following table to aid in my decision, where I include my solution for comparison.

Comparison of Candidate Data Structures

Insert(K,I) Delete(E) Increase-Key(E,K) Maximum() Extract-Max() Other considerations
Doubly Linked Lists O(n) average/worst case O(1) O(n) average/worst case O(1) O(1) data structure to beat for this assignment
Binary Heaps O(lg n) average/worst case O(lg n) average/worst case O(lg n) average/worst case O(1) O(lg n) average/worst case efficient, simple to implement and debug
Binomial Queues O(lg n) worst case O(lg n) average/worst case O(lg n) average/worst case O(1) O(lg n) average/worst case difficult to implement, main advantage  is the Union operation [CLRS]
Leftist Trees O(lg n) O(1) to mark the node only, deleted later O(lg n) O(1) O(lg n) 30% slower than standard heaps [Jones86]
Pagodas O(n) worst case for some operations, complicated implementation [Mhatre01]
Fibonacci Heaps O(1) amortized O(lg n) amortized O(1) amortized O(1) amortized  O(lg n) amortized best when Extract-Max and Delete are called infrequently [CLRS], better  for large queue sizes
Modified Binary Heaps
(My solution)
O(1) on invalid heap, O(lg n) on valid heap (amortized) O(lg n) amortized O(1) amortized O(1) amortized O(lg n) amortized simple but efficient, especially when Extract-Max is followed by Insert

The textbook [CLRS] and [Mhatre01] were of tremendous help in filling in this table.  While searching for algorithms, I was looking for a data structure that was relatively straightforward to implement and debug (since I was working by myself), but also performed well compared to the other alternatives.  After reviewing all the other data structures, I determined that the simple binary heaps represented the ideal tradeoff between performance and ease of implementation.  It also provided the benefit of only requiring the memory to be allocated once (provided we know the maximum heap size), and once it was created all of the operations could be performed in place.  Once I noticed I could even improve the performance of binary heaps for the purpose of the scheduler using the modifications I have presented here, I felt I had been able to create a solution that was on par with the best from the list above!



Real-Time Analysis

The development of the algorithm presented here was not developed all at once.  It took a series of improvements.  First I implemented a binary heap exactly as it appeared in CLRS (Rev. 1).  Then I noticed that I could modify the algorithm to defer fixing the heap until the call following Extract-Max, since so often it was following by an Insert (Rev. 2).  Finally, I noticed that it was unnecessary to copy the node for the client in Extract-Max, since it was only needed temporarily by the scheduler (Rev. 3).  In Rev. 3, I also optimized the code itself as much as possible, using pointer arithmetic instead of array accesses and other such speed-ups I have learned over the years.  In the end, I was able to develop an implementation that equaled or outperformed the supplied doubly linked list in every case, including the test file "test.10000.a", which was supposedly one of the best case performances of that implementation.  While my implementation required slightly more comparisons for that input file, the fact that no memory needed to be allocated in my implementation probably made up the difference.  The following table shows the evolution of my project (all tests were performed on elaine27 and averaged over 5 trials).  Rev. 3 is what I have submitted with this report, and clearly shows a significant improvement from both the doubly linked list, and just the basic binary heap as defined in CLRS.

Real-Time Performance Improvements of the Binary Heap Implementation
(Total Time (seconds) / Client Time (seconds) / Comparisons)

Test File
Doubly Linked List 
(Sample Implemenation)
Binary Heap
(Rev. 1)
Binary Heap
(Rev. 2)
Binary Heap
(Rev. 3)
test.100.a
1.128 / 0.102 / 967,642
0.427 / 0.107 / 234,228 
0.398 / 0.095 / 211,467
0.358 / 0.069 / 211,467
test.100.b
1.139 / 0.101 / 968,434
0.427 / 0.111 / 234,388
 0.396 / 0.099 / 211,308
0.359 / 0.084 / 211,308
test.500.a
31.17 / 2.61 / 27,787,093 
3.426 / 0.790 / 1,957,773
3.254 / 0.818 / 1,851,882
3.08 / 0.666 / 1,851,882
test.500.a.rev
31.01 / 2.64 / 27,781,669
 3.428 / 0.804 / 1,955,706
 3.304 / 0.782 / 1,850,521
3.088 / 0.610 / 1,850,521
test.1000.a
168.47 / 12.93 / 151,071,987
10.632 / 2.620 / 6,097,461
 10.406 / 2.550 / 5,855,713
9.776 / 1.916 / 5,855,713
test.10000.a
3.068 / 0.634 / 1,206,286
 5.332 / 1.456 / 2,228,595
 3.819 / 1.01 / 1,460,753
3.064 / 0.632 / 1,460,753



References

[CLRS] Cormen, T.H, Leiserson, C.E., Rivest, R.L., and Stein, C.  Introduction to Algorithms, Second Edition.  McGraw Hill, 2001.
[Jones86] Jones, D.W., An Empirical Comparison of Priority-Queue and Event-Set-Implementations. Communications of the ACM, 1986. 29(4): p. 300-311.
[Mhatre01] Mhatre, N., A Comparative Performance Analysis of Real-Time Priority Queues.  Master's Thesis, Fall 2001, Florida State University.