Sorting Algorithms in Python — Bucket Sort

thelicato
3 min readMar 16, 2022

--

I got sad news for you guys: this will be the last post about sorting algorithms. I know that there are a lot of sorting algorithms I did not describe (for example the super-famous Timsort). Unfortunately, the enthusiasm about this kind of post quickly reached zero, so this is all.

After this looong introduction, we can go back to what you want: Bucket Sort!

How Bucket Sort works

As in the previous posts about sorting algorithms not based on comparison, we have to describe what are the hypotheses we need to fulfill to be able to use this algorithm and get the right result:

  • The elements must be evenly distributed on the interval [0,1) (if you think about it, it is a huge limitation)

The approach is quite easy: assume you want to order an array of dimension n; the algorithm divides the interval [0,1) in n sub-arrays. After that, the elements are distributed in those n sub-arrays. Then you just order every sub-array and merge everything back together. Here is a practical example:

As you may notice the trick is to just add those values to the appropriate sub-array (there are some missing values, but it is just to help you understand the process). How to order every sub-array? It may seem odd but Insertion Sort is used!. As I said in the first post of this series Insertion Sort is used with small arrays. Statistically, if the hypotheses are satisfied (elements evenly distributed) we should have sub-arrays with just a few elements in them and for this reason, Insertion Sort can be used.

First Python Implementation

The first time I implemented this algorithm in Python I got this:

At the beginning of the algorithm, I create an array of n empty arrays. Using a simple for loop I add the values in the right sub-array and finally, I order every one of those using Insertion Sort. In the end, I merge everything back into the C variable. As you may understand from the title of this section there is a Second Python Implementation. Let’s see why.

Second Python Implementation

The second implementation is very similar to the first except for a simple difference. After running the previous code I realized that it was very slow. So I tried to understand which one was the slow part and it came out that more than 95% of the time was spent on the concatenation of the sub-arrays (lines 15 to 16 in the previous snippet). Just for fun, I tried implementing a very stupid concatenation element by element and, to my surprise, it was extremely faster (even though it is very ugly to watch).

I just show you the code of the concatenation part:

Extremely ugly, do you agree?

Conclusions

As written in the introduction this is the last post about sorting algorithms. As usual, you can download the source code from my GitHub repo. Stay tuned for a new series of posts coming in!

--

--

thelicato
thelicato

Written by thelicato

F R A G I L E — Handle with care 👨‍💻 Security Researcher 🖖 Incurable nerd 🎞️ Movie/TV Show addicted

Responses (1)