Merging Sorted Lists of Similar Strings

Gene Myers

Merging T sorted lists into a single sorted result is a classic problem typically solved with an O(log T) priority-queue data structure the most basic of which is the simple heap.  We revisit this problem in the situation where the list elements are strings and the lists contain many identical or nearly identical elements.

By keeping certain auxiliary information with each heap node, we devise a method that performs no more comparisons than the sum of the lengths of all the strings, and another method that takes time proportional to the size of the resultant list, and so is especially efficient when many elements are equal.