Tricky Sorting
Radix
Radix sort is the idea that you can sort numbers, or whatever you are sorting, by looking at portions of the items rather than the whole thing at once. There are two types of radix sort, each having particular use cases.
LSD- not the drug one, least significant digit. This is
typically more useful for numbers. You start at the end of
everything you are sorting, putting them in order. Like so:
941 —> 1 941 —> 4| 1 465 —> 4| 65
573 —> 3 465 ->6| 5 573—> 5| 73
465—> 5 573 —> 7| 3 941 —> 9| 41
This is advantageous when you are comparing numbers of different sizes (1, 435, etc.) There is another way to do Radix, MSD or most significant digit.
Most significant digit is actually typically used for words,
since radix sort sorts everything by value, and then
lexicographically. What that means is this:
a —> ab —> abc —>b
This is because b is greater in value than nothing, and c is
greater in value than nothing. But, because of the beginning
letter (a), they still come before b. You don’t want to sort
numbers lexicographically though, since it would end up like
this:
1 —> 10 —> 2
For visual learners
Heap sort
Complex Data Structures
Stop If you don’t know what a heap is, go to the data structures
page. Continue
A brief refresher (since I am sure NO ONE will not know what a heap is, but everyone likes a nice reminder). Heaps are semi-sorted piles of numbers. The are either max heaps (biggest values on top) or min heaps (smaller values on top). There is not guarantee of sorting, so just because you have 100 at the top of a heap, doesn’t mean 99 comes next. In fact, the remaining numbers in a heap could be all under the value of 10. It just means 100, is on top, and is the largest value (max heap).
Again, because it matters, overtime you add a value to a heap, you have to adjust the heap until it is in the right spot. That, is what we are about to talk about.
Since heaps are always numbers, there isn’t really a analogous way to talk about them. In a max heap, the highest value is on top, the two values below this value (the children nodes) are lower than the top value, and are greater than their children, and so on. This means, that as you go further down, you are going to find lower values. So, what if you want to add a value to a max heap? Easy, you add the value to to the top, pushing the current top value to down to the right or left.
Now, you sort the heap, which looks like this. You compare the
top value to its two children. If it is larger, you are done.
If not, you pick the greater of the two children, put it on top.
You then compare the new spot where the node is, against its two
new children. You continue to do this, swapping the values,
until everything is in the proper place.
For visual learners
Min heaps work the same way, just with minimum values. You can also add values to the bottom, and reverse it. Now, how long does this process take? Well in Big O, n log n.
Dijikstra's
Let’s talk about maps. There are so many companies who make maps that we could talk about: Tom Tom, UberNav, Apple....
Alright, enough jokes. Google does maps, and unlike some they are quite good at it. How do they solve the issue of determining the shortest path from one place to another? Or the shortest time? So, what you do is calculate the distance to each point individually. Each point that you have yet to reach is infinity distance away. When you reach a new node, you replace the existing value, if the value you calculated to reach it is shorter.
Here is how it would work.
Let’s say you started at point A, and you needed to get to point
New York, because that punk Chris owes you money.
You could go
from:
A to B to New York
or
A to C to B to New York.
Here is how you would calculate the distance. First, one step.
From A to B or A to C. If the distance from A to B is 10, and
the distance from A to C is 5, then you put in those values.
Shortest distance from A to B goes from infinity, to 10, and
from A to C goes to 5. Now, the next step. From B to NY is 48.
From C to B is 3.
For visual learners
The shortest distance to NY is set to 58, the shortest distance to B goes to 8. Now, the last step goes from B to NY, which goes from 58 to 56. This is Dijikstra’s algorithm. There are different variants of it. Some forms only calculate the distance between two points. The one I just explained assumed A was the source, and then calculates the distance to every node. Updating as it goes along.
A* (A Star, just like you aren't)
A* is an informed search algorithm for path-finding. It uses the advantages of Dijkstra’s (finding the shortest path) and Best First Search. It does this by using a guide to determine the best next point (based on the current/starting point) and the next best point (based on the goal end point).
Similar to Dijikstra’s, we could imagine that we must compute the lowest distance between two nodes. But instead of traveling through all the nodes, we take guesses at how to best reach our destination. We must know our starting position, and our ending position, which we do, A —> New York. Now, we calculate the distance between our points, and the points distance from New York. And try to take the best path based on that.
You can see the assumptions this makes, we have to know which way is closer to NY, but can you see the issues. What if the next spot is closer, but there are obstructions which inhibit the travel from the closer point?
Let’s say from A to B or A to C, B is closer to New York, but you
have to drive around some mountains, making the overall distance
longer. A star is efficient at calculating the lowest distance
between two points, but it requires some knowledge (distance
between nodes) and has some dangerous assumptions.
For visual learners