In this article, we will walk through various sorting techniques in python
Anything that can be compared can be sorted, this includes sequence of Numeric values such as integers, floats and chars (using ASCII values).
[1, 3, 4, 2] -> [1, 2, 3, 4]
Sequence of Ordinal values, such as Educational degrees, Cloth sizes, etc. Ordinal values are nonnumeric values that have an order among them, i.e. they can be relatively compared. For example.
["10th", "M.Sc", "12th", "B.Sc"] -> ["10th", "12th", "B.Sc", "M.Sc"]
We can sort simple iterables, such as List of ints, List of floats, etc.
[1, 3, 4, 2] -> [1, 2, 3, 4]
Complex elements such as List of Lists, Complex json, User-defined classes can also be sorted given that you specify a key(s) for comparison.
For instance, we can sort the following list of lists where key = first column.
V V
[[2, 1, 3], [[2, 1, 3],
[9, 8, 7], -> [4, 5, 6],
[4, 5, 6]] [9, 8, 7]]
or this list of coordinates by x-axis
[(1, 2), (0, 1), (2, 2)] -> [(0, 1), (1, 2), (2, 2)]
you will notice the two kinds of sort functions generally used in python sequence.sort()
and new_sequence = sorted(sequence)
.
sequence.sort()
This is an inplace sort,which means that the iterable is mutated/changed and the original order is lost.
.sort()
can also provides a minor performance boost.
sorted(sequence)
This returns a new iterable which contains the sorted iterable. The original order is still preserved in 'sequence'.
let's take an example
list1 = [1, 3, 2, 4]
list1.sort()
print(list1)
[1, 2, 3, 4]
sequence.sort()
does not return anything
print(list1.sort())
None
list1 = [1, 3, 2, 4]
new_list1 = sorted(list1)
print(list1, new_list1)
[1, 3, 2, 4] [1, 2, 3, 4]
As you can see the new_list1
is a sorted copy of list1
and the original order is preserved in list1
list_int_rev = [1, 3, 4, 5, 2]
print(sorted(list_int_rev, reverse=True))
[5, 4, 3, 2, 1]
Tuples are immutable so sequence.sort()
will not work.
The sorted()
function always returns a list and we will have to convert it back to the original iterable type.
tup = (1, 3, 4, 5, 2)
sorted_tup = sorted(tup)
sorted_tup, tuple(sorted_tup)
([1, 2, 3, 4, 5], (1, 2, 3, 4, 5))
Similar to tuples, strings are also immutable in python and sorted("random string")
will return a sorted list of characters. We will have to join the results afterward.
sorted_char_list = sorted("ioaue")
sorted_string = "".join(sorted_char_list)
sorted_char_list, sorted_string
(['a', 'e', 'i', 'o', 'u'], 'aeiou')
Characters are sorted based on their ASCII values, so 'Z' (90) comes before 'a' (97).
ord("Z"), ord("a")
(90, 97)
char_list = sorted("iOaUe")
sorted_string = "".join(char_list)
char_list, sorted_string
(['O', 'U', 'a', 'e', 'i'], 'OUaei')
So convert the characters to lowercase before comparing.
We will talk about the key
parameter in detail in the next example.
char_list = sorted("iOaUe", key=str.lower)
sorted_string = "".join(char_list)
char_list, sorted_string
(['a', 'e', 'i', 'O', 'U'], 'aeiOU')
To sort a list of Ordinal values, we need to assign a relative order to it, i.e map them to numeric values.
This allows us to compare the ordinal values based on the mapping.
ordinal_list = ["B.Sc", "10th", "M.Sc", "12th", "B.Sc", "10th", "12th"]
ordinal_map = {"10th": 1, "M.Sc": 4, "12th": 2, "B.Sc": 3}
We need to let the sorted function know how to use the mapping for comparison.
Use the key
argument to do this and pass a function that returns the mapping.
def ordinal_mapper(key: str) -> int:
return ordinal_map[key]
sorted(ordinal_list, key=ordinal_mapper)
['10th', '10th', '12th', '12th', 'B.Sc', 'B.Sc', 'M.Sc']
Let's talk about lambda expressions while we are at it.
In a nutshell, lambda
expressions are temporary use functions, they have two parts
keyword argument expression
lambda x, y, ... : x + y + ..
Here the expression part is returned.
We can rewrite ordinal_mapper
in this way.
sorted(ordinal_list, key=lambda key: ordinal_map[key])
['10th', '10th', '12th', '12th', 'B.Sc', 'B.Sc', 'M.Sc']
Here lambda key : ordinal_map[key]
, means that when an ordinal value is passed, we return the mapped numeric value for comparison.
something like this
lambda "10th": 1
lambda "B.Sc": 3
so "B.Sc" is greater than "10th" because 3 > 1 .
This key_function(
keeps None
values at either at the beginning or end based on the none_at_front
arg.
By, default this is False
. If you wish to pass the argument use a partial function.
from functools import partial
def key_function(key: str, none_at_front: bool = False) -> int | float:
if key is None:
return float("-inf") if none_at_front else float("inf")
return key
sorted([1, 2, 6, None, 4, 9, None], key=key_function)
[1, 2, 4, 6, 9, None, None]
key_function_end = partial(key_function, none_at_front=True)
sorted([1, 2, 6, None, 4, 9, None], key=key_function_end)
[None, None, 1, 2, 4, 6, 9]
Sorting complex objects requires one or multiple primitive keys.
For instance, assume a list of coordinates.
[[1, 2], [2, 5], [3, 3], [4, 2]]
How would you sort it ? x-axis or y-axis ?
This is why we need to define a sort key for complex iterables, here we will sort by y-axis so the second element will be the sort key.
# Sorting y-axis
coordinates = [[1, 2], [2, 5], [3, 3], [4, 2]]
sorted(coordinates, key=lambda x: x[1])
[[1, 2], [4, 2], [3, 3], [2, 5]]
Assume we have a tuple of products ("product name", discount, price, "availability")
.
We will sort by availability then highest discount and finally lowest price.
We are sorting by multiple keys, consider ('Moto G5', 0.3, 200, True)
and ('Asus Tuf', 0.3, 1000, True)
, both have the same discount so we will order them by price.
products = [
("IPhone", 0.1, 1000, False),
("Oneplus", 0.5, 300, True),
("Moto G5", 0.3, 200, True),
("Asus Tuf", 0.3, 1000, True),
("Samgsung Galaxy", 0.4, 800, False),
]
# -value, means sort by descending order
sorted(products, key=lambda tup: (-tup[3], -tup[1], tup[2]))
[('Oneplus', 0.5, 300, True),
('Moto G5', 0.3, 200, True),
('Asus Tuf', 0.3, 1000, True),
('Samgsung Galaxy', 0.4, 800, False),
('IPhone', 0.1, 1000, False)]
sorted(products, key=lambda tup: (-tup[3], (1 - tup[1]) * tup[2]))
[('Moto G5', 0.3, 200, True),
('Oneplus', 0.5, 300, True),
('Asus Tuf', 0.3, 1000, True),
('Samgsung Galaxy', 0.4, 800, False),
('IPhone', 0.1, 1000, False)]
Let's take another example here we have a tuple of student marksheets
("student name", marks in maths, marks in com sc)
marksheet = [
("Edward", 70, 70),
("James", 80, 60),
("Kenway", 40, 50),
("James", 70, 90)
]
sorted_marksheet = sorted(marksheet, key=lambda x: x[1] + x[2], reverse=True)
print(tuple(sorted_marksheet))
(('James', 70, 90), ('Edward', 70, 70), ('James', 80, 60), ('Kenway', 40, 50))
Both ('Edward', 70, 70)
and ('James', 80, 60)
got the same total marks and they are sorted based on their original ordering.
This is known as Stable Sort, by default every sorting method in python is stable sort.
Here we are sorting by multiple keys, ("James", 70, 90)
and ("Edward", 70, 70)
, both have the same marks in maths so we will compare them by com sc marks.
sorted_marksheet = sorted(marksheet, key=lambda x: (x[1], x[2]), reverse=True)
print(tuple(sorted_marksheet))
(('James', 80, 50), ('James', 70, 90), ('Edward', 70, 70), ('Kenway', 40, 50))
This one is a little different, considering we will be sorting either key or value.
Currently, there are no direct ways to sort a dict
, because no matter what iterable you pass into sorted it will return a list.
we will need to sort the key-value tuple and convert it back to dict
.
rainfall = {"April": 20, "May": 15, "June": 40, "July": 70}
print(rainfall.items())
dict_items([('April', 20), ('May', 15), ('June', 40), ('July', 70)])
Consider the rainfall dict, we will first get the tuple using rainfall.items()
. Now you can sort the tuple as shown previously.
use key=lambda x: x[0]
to sort by key and key=lambda x: x[1]
to sort by value.
sorted_rainfall_by_val = sorted(rainfall.items(), key=lambda x: x[1])
dict(sorted_rainfall_by_val)
{'May': 15, 'April': 20, 'June': 40, 'July': 70}
sorted_rainfall_month = sorted(rainfall.items(), key=lambda x: x[0])
dict(sorted_rainfall_month)
{'April': 20, 'July': 70, 'June': 40, 'May': 15}
note we are not changing the strings so this returns a list of strings unlike sorting a string which returns a list of chars
string_list = ["aab", "aaa", "a", "aa", "b"]
sorted(string_list)
['a', 'aa', 'aaa', 'aab', 'b']
string_list = ["aab", "ac", "aa", "bb"] # assumes len(string) > 2
sorted(string_list, key=lambda string: string[1])
['aab', 'aa', 'bb', 'ac']
import re
my_list = ["1. mango", "2. cherry", "3. banana", "4. watermelon", "5. apple"]
sorted_list = sorted(my_list, key=lambda s: re.sub(r"[^a-zA-Z]", "", s))
print(sorted_list)
['5. apple', '3. banana', '2. cherry', '1. mango', '4. watermelon']
gamedf = [
["Zelda", "NES", 1986, 6.5],
["Mario.", "NES", 1985, 40.24],
["Minecraft", "Multi-platform", 2011, 200.0],
["GTA V", "Multi-platform", 2013, 140.0],
]
# Sorting by total sales
sorted(gamedf, key=lambda row: row[3], reverse=True)
[['Minecraft', 'Multi-platform', 2011, 200.0],
['GTA V', 'Multi-platform', 2013, 140.0],
['Mario.', 'NES', 1985, 40.24],
['Zelda', 'NES', 1986, 6.5]]
llist = [[2, 1, 3], [9, 8, 7], [4, 5, 6]]
sorted(llist, key=lambda row: max(row), reverse=True)
[[9, 8, 7], [4, 5, 6], [2, 1, 3]]
Take a look at this JSON object, we will be sorting this for the next few examples
video_games = {
"games": [
{
"title": "The Legend of Zelda",
"platform": "NES",
"year": 1986,
"sales_millions": 6.5,
"details": {
"genres": ["Action-Adventure"],
"developer": "Nintendo",
"ratings": {"IGN": 9.0, "GameSpot": 8.5},
},
},
{
"title": "Super Mario Bros.",
"platform": "NES",
"year": 1985,
"sales_millions": 40.24,
"details": {
"genres": ["Platformer"],
"developer": "Nintendo",
"ratings": {"IGN": 9.5, "GameSpot": 9.0},
},
},
{
"title": "Minecraft",
"platform": "Multi-platform",
"year": 2011,
"sales_millions": 200,
"details": {
"genres": ["Sandbox", "Survival"],
"developer": "Mojang Studios",
"ratings": {"IGN": 9.0, "GameSpot": 9.0},
},
},
{
"title": "Grand Theft Auto V",
"platform": "Multi-platform",
"year": 2013,
"sales_millions": 140,
"details": {
"genres": ["Action", "Adventure"],
"developer": "Rockstar North",
"ratings": {"IGN": 10, "GameSpot": 9.5},
},
},
]
}
games = video_games["games"]
sorted(games, key=lambda x: x["year"], reverse=True)
[{'title': 'Grand Theft Auto V',
'platform': 'Multi-platform',
'year': 2013,
'sales_millions': 140,
'details': {'genres': ['Action', 'Adventure'],
'developer': 'Rockstar North',
'ratings': {'IGN': 10, 'GameSpot': 9.5}}},
{'title': 'Minecraft',
'platform': 'Multi-platform',
'year': 2011,
'sales_millions': 200,
'details': {'genres': ['Sandbox', 'Survival'],
'developer': 'Mojang Studios',
'ratings': {'IGN': 9.0, 'GameSpot': 9.0}}},
{'title': 'The Legend of Zelda',
'platform': 'NES',
'year': 1986,
'sales_millions': 6.5,
'details': {'genres': ['Action-Adventure'],
'developer': 'Nintendo',
'ratings': {'IGN': 9.0, 'GameSpot': 8.5}}},
{'title': 'Super Mario Bros.',
'platform': 'NES',
'year': 1985,
'sales_millions': 40.24,
'details': {'genres': ['Platformer'],
'developer': 'Nintendo',
'ratings': {'IGN': 9.5, 'GameSpot': 9.0}}}]
sorted(games, key=lambda x: len(x["details"]["genres"]), reverse=True)
[{'title': 'Minecraft',
'platform': 'Multi-platform',
'year': 2011,
'sales_millions': 200,
'details': {'genres': ['Sandbox', 'Survival'],
'developer': 'Mojang Studios',
'ratings': {'IGN': 9.0, 'GameSpot': 9.0}}},
{'title': 'Grand Theft Auto V',
'platform': 'Multi-platform',
'year': 2013,
'sales_millions': 140,
'details': {'genres': ['Action', 'Adventure'],
'developer': 'Rockstar North',
'ratings': {'IGN': 10, 'GameSpot': 9.5}}},
{'title': 'The Legend of Zelda',
'platform': 'NES',
'year': 1986,
'sales_millions': 6.5,
'details': {'genres': ['Action-Adventure'],
'developer': 'Nintendo',
'ratings': {'IGN': 9.0, 'GameSpot': 8.5}}},
{'title': 'Super Mario Bros.',
'platform': 'NES',
'year': 1985,
'sales_millions': 40.24,
'details': {'genres': ['Platformer'],
'developer': 'Nintendo',
'ratings': {'IGN': 9.5, 'GameSpot': 9.0}}}]
sorted(
games,
key=lambda x: sum(x["details"]["ratings"].values()) / len(x["details"]["ratings"]),
reverse=True,
)
[{'title': 'Grand Theft Auto V',
'platform': 'Multi-platform',
'year': 2013,
'sales_millions': 140,
'details': {'genres': ['Action', 'Adventure'],
'developer': 'Rockstar North',
'ratings': {'IGN': 10, 'GameSpot': 9.5}}},
{'title': 'Super Mario Bros.',
'platform': 'NES',
'year': 1985,
'sales_millions': 40.24,
'details': {'genres': ['Platformer'],
'developer': 'Nintendo',
'ratings': {'IGN': 9.5, 'GameSpot': 9.0}}},
{'title': 'Minecraft',
'platform': 'Multi-platform',
'year': 2011,
'sales_millions': 200,
'details': {'genres': ['Sandbox', 'Survival'],
'developer': 'Mojang Studios',
'ratings': {'IGN': 9.0, 'GameSpot': 9.0}}},
{'title': 'The Legend of Zelda',
'platform': 'NES',
'year': 1986,
'sales_millions': 6.5,
'details': {'genres': ['Action-Adventure'],
'developer': 'Nintendo',
'ratings': {'IGN': 9.0, 'GameSpot': 8.5}}}]
Take a look at the Marksheet class, it is similar to the marksheet tuple we used earlier.
class Marksheet:
def __init__(self, student, com_sc, maths, english):
self.student = student
self.com_sc = com_sc
self.maths = maths
self.english = english
def total_marks(self):
return self.com_sc + self.maths + self.english
def __repr__(self):
return str(self)
def __str__(self):
return f"{self.student}({self.com_sc}, {self.maths}, {self.english})"
marksheets = [
Marksheet("Edward", 70, 70, 90),
Marksheet("James", 80, 50, 76),
Marksheet("Kenway", 40, 50, 82),
Marksheet("James", 70, 90, 61),
]
sorted(marksheets, key=lambda x: x.total_marks())
[Kenway(40, 50, 82), James(80, 50, 76), James(70, 90, 61), Edward(70, 70, 90)]
Now using lambda to define the sort key can be rather cumbersome, esspecially if someone who is using the Marksheet class is unaware of its internals. In such situations, it is better to define a default sort key.
You need to define the less than dunder method __lt__
to define the comparison key, which also works as sort key.
class Marksheet:
def __lt__(self, other):
return self.total_marks() < other.total_marks()
Add some static methods to predefine some sort keys.
class Marksheet:
def __init__(self, student, com_sc, maths, english):
self.student = student
self.com_sc = com_sc
self.maths = maths
self.english = english
def total_marks(self):
return self.com_sc + self.maths + self.english
@staticmethod
def by_maths(marksheet):
return marksheet.maths
@staticmethod
def by_name(marksheet):
return marksheet.student
def __lt__(self, other):
return self.total_marks() < other.total_marks()
def __repr__(self):
return str(self)
def __str__(self):
return f"{self.student}({self.com_sc}, {self.maths}, {self.english})"
marksheets = [
Marksheet("Edward", 70, 70, 90),
Marksheet("James", 80, 50, 76),
Marksheet("Kenway", 40, 50, 82),
Marksheet("James", 70, 90, 61),
]
sorted(marksheets)
[Kenway(40, 50, 82), James(80, 50, 76), James(70, 90, 61), Edward(70, 70, 90)]
sorted(marksheets, key=Marksheet.by_maths)
[James(80, 50, 76), Kenway(40, 50, 82), Edward(70, 70, 90), James(70, 90, 61)]
sorted(marksheets, key=Marksheet.by_name)
[Edward(70, 70, 90), James(80, 50, 76), James(70, 90, 61), Kenway(40, 50, 82)]
In all the examples we have seen so far the sort key is hardcoded. To define the sort key at runtime use itemgetter and attrgetter.
from operator import itemgetter, attrgetter
string_list = ["ca", "bb", "aa"]
for sort_index in [0, 1]:
print(sorted(string_list, key=itemgetter(sort_index)))
['aa', 'bb', 'ca']
['ca', 'aa', 'bb']
for sort_key in ["maths", "com_sc"]:
print(sorted(marksheets, key=attrgetter(sort_key)))
[James(80, 50, 76), Kenway(40, 50, 82), Edward(70, 70, 90), James(70, 90, 61)]
[Kenway(40, 50, 82), Edward(70, 70, 90), James(70, 90, 61), James(80, 50, 76)]
If you are familiar with c/c++, we use a comparator function for sorting instead of a sort key.
A comparator function essentially returns a relative ranking for sorting in the form of ('<' -> -1, '==' -> 0, '>' -> 1)
.
In general you will not find yourself using a compartor over a sort key. However, there are cases where you might want to use a comparator instead.
def comp(a, b):
if a < b:
return -1
elif a == b:
return 0
else:
return 1
from functools import cmp_to_key
def comp(a, b):
if a == "young" and b == "old":
return -1
elif a == b:
return 0
else:
return 1
sorted(["old", "young", "old", "young"], key=cmp_to_key(comp))
['young', 'young', 'old', 'old']
This is one such case where you cannot use sort keys.
The code below returns a valid sequence of Rock-Paper-Scissor, where the (i+1)th element beats or draws with the ith element
from functools import cmp_to_key
def rock_paper_scissor(a, b):
if a == b:
return 0
elif (a == 'rock' and b == 'scissor') or \
(a == 'paper' and b == 'rock') or \
(a == 'scissor' and b == 'paper'):
return 1
else:
return -1
sorted(["rock", "scissor", "paper", "paper", "rock", "scissor"], key=cmp_to_key(rock_paper_scissor))
['paper', 'paper', 'scissor', 'scissor', 'rock', 'rock']
sorted(
["rock", "rock", "scissor", "paper", "paper", "scissor"],
key=cmp_to_key(rock_paper_scissor),
)
['scissor', 'scissor', 'rock', 'rock', 'paper', 'paper']
sorted(
["paper", "rock", "rock", "paper", "scissor", "scissor"],
key=cmp_to_key(rock_paper_scissor),
)
['rock', 'rock', 'paper', 'paper', 'scissor', 'scissor']
sorted(
["rock", "rock", "paper", "paper", "scissor", "scissor"],
key=cmp_to_key(rock_paper_scissor),
)
['rock', 'rock', 'paper', 'paper', 'scissor', 'scissor']
You can use a comparator to sort continuous values with respect to a threshold.
def thresold_comparator(a, b):
if abs(a - b) < 0.001:
return 0
elif a < b:
return -1
else:
return 1
sorted([2.1001, 1.00002, 1.0001, 2.1], key=cmp_to_key(thresold_comparator))
[1.00002, 1.0001, 2.1001, 2.1]
This can also be done using sort keys by truncating the threshold
So far all the examples we have seen so far are offline sort. Offline sort is when we sort fixed-sized iterables.
Consider the case where we are required to sort a constant stream of data, this is called online sort.
Generally, self-balancing trees are used for online sorting, however, they are nonlinear data structures and fundamentally different than arrays/lists.
import random
def stream(lim=500000):
while True:
num = random.randint(0, 100)
print("->", num)
yield num
gen = stream()
In this dummy example, we are sorting the list after every insertion, which is similar to online sort but with a very poor time complexity of $O(n^2log(n))$
offlist = []
offlist.append(next(gen))
print(offlist)
offlist.append(next(gen))
offlist.sort()
print(offlist)
offlist.append(next(gen))
offlist.sort()
print(offlist)
-> 66
[66]
-> 50
[50, 66]
-> 2
[2, 50, 66]
You can use insertion sort to create an online sorting data structure, but remember it has a very poor $O(N)$ insertion time.
import bisect
class OnlineList:
def __init__(self):
self._container = []
self._length = 0
def append(self, value):
bisect.insort(self._container, value)
def __getitem__(self, index):
return self._container[index]
def __str__(self):
return str(self._container)
Onlist = OnlineList()
Onlist.append(24)
Onlist.append(112)
Onlist.append(32)
print(Onlist)
Onlist.append(next(gen))
print(Onlist)
Onlist.append(next(gen))
print(Onlist)
Onlist.append(next(gen))
print(Onlist)
print(Onlist[0], Onlist[-1])
[24, 32, 112]
-> 72
[24, 32, 72, 112]
-> 25
[24, 25, 32, 72, 112]
-> 53
[24, 25, 32, 53, 72, 112]
24 112
import time
Onlist = OnlineList()
start_time = time.time()
LIM = 500001
for _ in range(1, LIM):
Onlist.append(random.randint(0, LIM))
end_time = time.time()
f"Time taken to append {LIM-1} inputs: {end_time - start_time} seconds"
'Time taken to append 500000 inputs: 40.95912146568298 seconds'
Truth be told this is a decent performance for a $O(n^2)$ algorithm.
You need to implement either the __next__
and __iter__
dunder methods or the __getitem__
dunder method to implement iterable behavior. sorted()
will use this iterator to sort your custom iterable.
In this example, we have created a Singly Linked List and implemented the__next__
dunder method.
class Node:
def __init__(self, value):
if isinstance(value, Node):
self.value = value.value
else:
self.value = value
self.next = None
def __lt__(self, other):
return self.value < other.value
def __str__(self):
return f"({self.value})"
def __repr__(self):
return str(self)
class LinkedList:
def __init__(self, sequence=None):
self.head = None
self.tail = None
self._length = 0
self._next = None
try:
iter(sequence)
for e in sequence:
self.push_back(e)
except TypeError:
pass
def __len__(self):
return self._length
def __iter__(self):
self._next = self.head
return self
def __next__(self):
if self._next is None:
raise StopIteration
else:
tmp = self._next
self._next = self._next.next
return tmp
def is_empty(self):
return self.head is None
def push_back(self, value):
self._length += 1
new_node = Node(value)
if self.is_empty():
self.head = new_node
self._next = new_node
else:
self.tail.next = new_node
self.tail = new_node
def push_front(self, value):
self._length += 1
new_node = Node(value)
if self.is_empty():
self.tail = new_node
self._next = new_node
else:
new_node.next = self.head
self.head = new_node
def __str__(self):
arr = []
tmp = self.head
while tmp is not None:
arr.append(str(tmp))
tmp = tmp.next
return " -> ".join(arr)
def __repr__(self):
return str(self)
llist = LinkedList()
llist.push_back(5)
llist.push_back(8)
llist.push_back(1)
liter = iter(llist)
print(llist)
(5) -> (8) -> (1)
next(liter)
(5)
liter = iter(llist)
LinkedList(sorted(llist)), sorted(llist)
((1) -> (5) -> (8), [(1), (5), (8)])
This will actually not sort the linked list but a list of nodes, and as stated earlier sorted()
will always return a list, so we will have to convert it back to a LinkedList.
You can use this process to make your custom iterables sortable using sorted()
That's it for today, we will further discuss online sorting in the next article.