Most Commonly Used Python Data Structures that are NOT built-in

This blog will introduce deque, bisect, defaultdict, Counter, OrderedDict, heapq. For every data structure, I’ll introduce their basic knowledge, example code and LeetCode problems. The four basic inbuilt data structures (Lists, Dictionary, Tuple, and Sets) can be learned from GeeksforGeeks.

Of course, this blog won’t cover all useful functions or data structures. I’ll update this article when I find more useful data structures.

The best way to understand these data structures is to read official documents. You can access these websites by click titles. However, to avoid people being overwhelmed by massive information, this article will only cover the most significant knowledge.

deque

“deque” is the abbreviation of “double-ended queue”[wikipedia]. The queue operations (push/pop) can be operated on both the head and tail of “deque” list. The time complexity of operations is O(1).

You should import collections library to use deque. The functions of deque are shown below.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
from collections import deque
queue = deque([iterable[, maxlen]]) # create deque object
queue.append(val) # Add x to the right side of the deque
queue.appendleft(val) # Add x to the left side of the deque
queue.clear() # Remove all elements from the deque leaving it with length 0
queue.count(val) # Count the number of deque elements equal to x
queue.copy() # Create a shallow copy of the deque
queue.extend(iterable) # Extend the right side of the deque
queue.extendleft(iterable) # Extend the left side of the deque
queue.insert(i, x) # Insert x into the deque at position i
queue.index(x[, start[, stop]]) # Return the position of x in the deque
queue.pop() # Remove and return an element from the right side of the deque
queue.popleft() # Remove and return an element from the left side of the deque
queue.reverse() # Reverse the elements of the deque in-place and then return None
queue.remove(val) # Remove the first occurrence of value
queue.rotate(n=1) # Rotate the deque n steps to the right

These functions are easy to understand from their name. If you learned Lists, all of the above functions can be implemented by List. So, why we need deque?

The document explains it in below.

Though list objects support similar operations, they are optimized for fast fixed-length operations and incur O(n) memory movement costs for pop(0) and insert(0, v) operations which change both the size and position of the underlying data representation.

It seems like there are some optimizations in deque for appendleft and popleft operations. Let’s do some tests!

1
2
3
4
5
6
7
8
9
# Test1: list.pop(0)
def test1():
l = list(range(1<<19))
tick = time.time()
while l:
l.pop(0)
print("f1 time: %.2f sec"%( time.time() - tick))
test1()
# output: f1 time: 128.42 sec
1
2
3
4
5
6
7
8
9
# Test2: deque.popleft()
def test2():
dq = deque(range(1<<19))
tick = time.time()
while dq:
dq.popleft()
print("f2 time: %.2f sec"%( time.time() - tick))
test2()
# output: f2 time: 0.05 sec

128.42 sec VS 0.05 sec !!!

Now, you know the difference. You can paractice it by LeetCode shown below.

Practice: LeetCode 102. Binary Tree Level Order Traversal

Counter

Counter is another class in the collections library. A Counter is a dict subclass for counting hashable objects. Therefore, you can call all functions in dict. Besides, specific functions for Counter are listed below.

1
2
3
4
5
6
from collections import Counter
cnt = Counter([iterable-or-mapping])
cnt.elements() # Return an iterator over elements repeating each as many times as its count
cnt.most_common([n]) # Return a list of the n most common elements and their counts from the most common to the least
cnt.subtract([iterable-or-mapping]) # Elements are subtracted from an iterable or from another mapping
cnt.update([iterable-or-mapping]) # Elements are counted from an iterable or added-in from another mapping

Practice: LeetCode 169. Majority Element ; LeetCode 242. Valid Anagram

OrderedDict

OrderedDict is a dictionary that remembers insertion order. Because it’s a dict subclass, all of the functions of dict can be used in OrderedDict either. Unlike the unsorted keys in dict, keys of OrderedDict is sorted by insertion order from early to late. Specific functions are shown below.

1
2
3
4
from collections import OrderedDict
odd = OrderedDict()
odd.popitem(last=True) # Return and Remove a pair in LIFO (FIFO) order if last is true (false)
odd.move_to_end(key, last=True) # Move an existing key to either end of an ordered dictionary

Practice: LeetCode 146. LRU Cache

defaultdict

defaultdict is a dict subclass that calls a factory function to supply missing values. dict class will raise KeyError exception if an unexistence key is used. To avoid such an error, we can use defaultdict. The example of the class is shown below.

1
2
from collections import defaultdict
d = defaultdict(lambda : value) # Value will be returned if key cannot be found in the dictionary

Practice: LeetCode 212. Word Search II

heapq

heapq“ is an implementation of the heap queue. The knowledge of heap can be found in the GeeksforGeeks and Wikipedia). Cited from GeeksforGeeks

Max-Heap (Min-Heap): In a Max-Heap (Min-Heap) the key present at the root node must be greatest (minimum) among the keys present at all of it’s children. The same property must be recursively true for all sub-trees in that Binary Tree.

The basic operations of heap include build empty heap, insert new element, get the top element, delete the top element. The time complexity of operations are O(n), O(log n), O(1), O(log n) respectively.

The implementation of heapq is shown below (cited from the official document).

This implementation uses arrays for which heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2] for all k, counting elements from zero. For the sake of comparison, non-existing elements are considered to be infinite. The interesting property of a heap is that its smallest element is always the root, heap[0].

There are some examples for heapq functions.

1
2
3
4
5
6
7
8
9
10
11
import heapq
heap = [] # use a list to store elements
heapq.heappush(heap,item) # Push the value item onto the heap
heapq.heappop(heap) # Pop and return the smallest item from the heap
heapq.heappushpop(heap, item) # Push item on the heap, then pop and return the smallest item from the heap
item = heap[0] # Access the smallest item from the heap
heapq.heapify(x) # Transform list x into a heap, in-place, in linear time
item = heapq.heapreplace(heap,item) # Pop and return the smallest item from the heap, and also push the new item
heapq.merge(*iterables, key=None, reverse=False) # Merge multiple sorted inputs into a single sorted output
heapq.nlargest(n, iterable, key=None) # Return a list with the n largest elements
heapq.nsmallest(n, iterable, key=None) # Return a list with the n smallest elements

You can also learn the use case of heapq, like the implementation of priority queue and theory in the official document.

Practice: LeetCode 703. Kth Largest Element in a Stream

bisect

Citation from official document

This module provides support for maintaining a list in sorted order without having to sort the list after each insertion.

Bisection method is one of the most commonly used methods. For searching an element in the sorted array, the time complexity of bisection searching is O(log n). Even if the implementation of bisection is simple, implementing it everywhere is cumbersome and inelegant.

The bisect library provides a standard way to do it. Functions of bisect are shown below.

1
2
3
4
5
6
7
import bisect
bisect.bisect_left(a, x, lo=0, hi=len(a)) # Return the insertion point for x in a to maintain sorted order. If x is already present in a, return the left most position
bisect.bisect_right(a, x, lo=0, hi=len(a)) # Return the insertion point for x in a to maintain sorted order. If x is already present in a, return the right most position
bisect.bisect(a, x, lo=0, hi=len(a)) # same as bisect_right
bisect.insort_left(a, x, lo=0, hi=len(a)) # Insert x in a in sorted order
bisect.insort_right(a, x, lo=0, hi=len(a))
bisect.insort(a, x, lo=0, hi=len(a)) # same as insort_right

Some interesting use cases of bisect are introduced in the official document.

Reference

[1] python 在LeetCode中常用的数据结构

Recommended Posts