Big O Notation
Time for me to type 2000 words on a random tech subject because I'm bored. This time, I'm going to talk about the basics of Big O notation and why it (might) matter to you.
Big O is a tool we use to answer a simple question:
“About how bad is this thing going to get when the pile of stuff it has to cope with grows?”
The “thing” can be an algorithm, a database query, a React component, whatever. The “pile of stuff” might be the number of customers, photos, tweets, torrented movies you can't bear to delete even though you really didn't like watching them, whatever. Big O is your way to tell if your server will blow up.
Before we talk about it, though, you may have a few questions.
The Questioner
"Why not just time it?"
We could just run the code and look at a stopwatch afterwards, but:
- Your phone, my MacBook, and the Azure us-east-2c instance all have different opinions about time. And latency. And consistency.
- “Works fine on 20 items” is a lie you tell yourself to feel better about your impending scrum deadline. “Works fine on 200,000 items” is too. You just can't make a realistic estimate.
- Unlike you, who presumably have an 8-digit budget to be asking such questions (a. because you're clearly not worried about spinning up an entire AWS instance just to play craps with a test database, b. because then, presumably, you're a C-suite exec, which explains why you're reading this instead of listening to your subordinates), most people simply can't test suitable real-world scenarios. That could be down to hardware, software, or just plain time limitations.
Big O lets us forget the hardware for a second and just focus on how the algorithm itself behaves as input size grows to infinity, or, in regular terms, “how the thing behaves as we shove increasingly large piles of junk into it”. (Look, if you take offense to me calling your inputs 'junk', you have not met the average end user. I have seen entire Slack threads pasted inside Search columns.)
"How are we going to count to infinity?"
Patience and a really big abacus. Hope you're willing to work off the clock!
Just kidding. If you took first-year calculus, you can probably skip this part since you already know the answer. See, it turns out a lot of algorithms behave like functions, and if we take the limit of an algorithm, $\lim_{ n \to \infty } {O}(n)$, we can study how it behaves as it approaches the big infinity. This also lets us drop constant terms, small terms, and terms that don't grow quickly. For example, we can say stuff like $O(100k \cos^{2}{n} \cdot n) \approx O(n)$. If you're a big-words guy, this is called asymptotic behavior. But, as you may see later, quite a bit of the time we won't even need to go there: there is often a mathy way to get around needing to do the calculus (eww!).
"But our input stack will never be infinity!"
Well, it works mighty well as a placeholder for a really big value. Let's say you're making an app. You test some filtering algorithm with 100 items, then 1,000, then you even stretch it out to 100,000 items. You declare this functional, push to production, and all is well. Then, a week later, your app blows up on TikTok or however all the trendy apps blow up these days, and you've got 80,000 users waiting at the front door! Turns out that algorithm of yours was $O(n^{2})$, and now you can't capitalize on your five minutes of fame because your server instance keeps blowing up every time a user so much as breathes on it. Cue stress, anxiety, stroke at 30, death, no more Taco Tuesdays. Don't let this happen to you! Test using Big O Notation™ today!
Anyways, as ridiculous as that sounds, it does happen. Just ask Cloudflare.
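Purely as an illustration (a made-up filter, not anyone's real code), here's the kind of thing that quietly turns into $O(n^{2})$, and the cheap fix next to it:
def filter_blocked_quadratic(items, blocked):
    # "in" on a plain list scans the whole blocklist for every item:
    # O(n * m), which behaves like O(n^2) when both lists grow together.
    return [item for item in items if item not in blocked]
def filter_blocked_linear(items, blocked):
    # Build a set once (O(m)); each membership check is then O(1) on average,
    # so the whole filter is roughly O(n + m).
    blocked_set = set(blocked)
    return [item for item in items if item not in blocked_set]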
The Big O
In any case, let's talk about the notation itself.
It looks something like this: $$O(n)$$
This would be read as "the Big $O$ of $n$". $n$ is the size of your input stack, and $O(n)$ represents the runtime of the algorithm, a.k.a. how much time it takes to run as $n$ grows. This one is the simplest and a very common one, so common we have a name for it: linear time complexity. We call it that because the function is just $n$: if we have an input stack of size $n=10$, we need to execute $10$ steps to complete the task; $n=1000$, $1000$ steps, and so on. Quite a few of your favorite operations in any given programming language are $O(n)$, like looping through an array:
numbers = [1, 2, 3, 4, 5]
for num in numbers:  # O(n): one operation per element
    print(num)
Common $O$ Values
- $O(1)$ - “Fixed time.”
    - Like flicking a light switch. Flick the switch, lights come on. Whether it's just a lamp bulb or the lights of an entire stadium, you wait the same micro-second. A backend developer's wet dream.
    - Real things: looking up an array item by index, popping the last element off a stack.
- $O(n)$ - “Linear time.”
    - Explained earlier. With our light switch analogy, it's like if each lightbulb in the stadium had its own switch, and you had to run and flick all of them to turn the place's lights on.
    - Real things: looping through an array once, traversing down the worst path in an unbalanced binary tree, counting occurrences of an item in an array.
- $O(\log{n})$ - “Logarithmic time.”
    - The good thing about this one is that while runtime still grows without bound as $n \to \infty$, it grows a lot slower than $O(n)$, which is usually 'good-enough™'. With our light switch analogy, this is like... alright, I'm not using this analogy anymore. I think you get the idea. (There's a binary search sketch right after this list if you want to see one in action.)
    - Real things: binary search, balanced trees.
- $O(n \log{n})$ - “You'll see this one a lot if you sort things.”
    - While it seems kind of random, it's actually pretty cool how this came to be. Check Appendix A below for the process: I recommend reading it; it's fascinating.
    - Real things: merge sort, quicksort (average case), most database “ORDER BY” operations.
- $O(n^2)$ - “Quadratic time.”
    - New analogy. Everyone at a giant conference meeting has to shake everyone else's hand exactly once. By the time everyone's done, the first people will be dying from old age. You probably wouldn't want to deal with this one. (There's a handshake sketch right after this list.)
    - Real things (careful!): nested loops over the same collection, naive matrix operations we wrote at 2 a.m.
- $O(n^3)$ or $O(2^n)$ or worse - “Please don't.”
    - Each additional guest at the conference invites entire universes of new operations. Your laptop fan becomes a jet engine about to lift off your desk. You really don't want to have to deal with this one.
    - Real things (ABORT!!): travelling-salesperson brute force (for large $n$), cryptanalysis without cunning tricks.
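Since binary search keeps coming up as the poster child for $O(\log{n})$, here's a quick sketch of the standard textbook version (my own illustration, nothing project-specific):
def binary_search(sorted_items, target):
    # Each loop iteration halves the search range, so we take about log2(n) steps.
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1  # not found
print(binary_search([1, 3, 5, 7, 9, 11], 7))  # 3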
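And the handshake problem from the $O(n^2)$ entry, written out (again, purely illustrative):
def count_handshakes(guests):
    # Every pair shakes hands exactly once: nested loops over the same list, O(n^2).
    handshakes = 0
    for i in range(len(guests)):
        for j in range(i + 1, len(guests)):
            handshakes += 1
    return handshakes
print(count_handshakes(list(range(10))))  # 45 handshakes for 10 guests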
That's the basics of O notation. I've written a bit more on some questions and considerations you may have.
How do we pick the “n”?
Whichever thing is growing fastest and matters most. The $n$ is just a stand-in for the variable that measures the size or complexity of your input, and different problems have different definitions.
- If you have a giant list of user IDs, $n$ is that length.
- If it's a 2D grid, $h \times w$, the number of cells you need to process, might become your $n$.
When you pick $n$, think about which variable will cause resource problems first.
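For the grid case, a tiny made-up example of why $n$ ends up being the cell count rather than the height or width alone:
def count_alive_cells(grid):
    # grid is a list of h rows with w cells each; we touch every cell once,
    # so the work grows as O(h * w) -- the "n" that matters is the cell count.
    return sum(1 for row in grid for cell in row if cell)
grid = [[0, 1, 0], [1, 1, 0]]  # h = 2, w = 3, so n = 6 cells
print(count_alive_cells(grid))  # 3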
Asterisks and Other Considerations
- Worst-case is what the textbooks always dissect, but practically speaking, the best case or average case is what's going to come up in the wild.
- Don't get used to throwing away lower-order terms automatically: if you are working with tiny input sizes or data pools, those terms will have a huge impact on performance.
- Time isn't the only factor; the space in memory you take up to run your algorithm can be just as, if not more, important.
    - An algorithm might run incredibly fast ($O(1)$ time! Yayyy!), but if it requires storing all our data in 10TB of RAM ($O(n)$ space where $n$ is huge), it's just as impractical as a slow algorithm. We actually use Big O for memory usage too, helping us understand whether our solution will fit within the available resources. (There's a small time-vs-space sketch right after this list.)
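A minimal, made-up illustration of that trade-off: two ways to check a list for duplicates.
def has_duplicates_fast(items):
    # O(n) time, O(n) extra space: we pay memory to remember what we've seen.
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
def has_duplicates_small(items):
    # O(n^2) time, O(1) extra space: no extra memory, but we compare every pair.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False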
The Blind Spot
Let me tell you a story.
I was testing a sorting algorithm once. It was this thing that would let you inject multiple typed desiderata into a filtering/sort system to semi-randomly seed your results according to multiple factors. Don't ask; it didn't pan out. Anyways, according to my calculations, it should have been $O(n \log{n})$ in complexity: it worked kind of the same way as heap sort, I guess. But for some reason, when I was testing it, it didn't act like that at all. Whether I added $50$, $500$, or $5000$ values, it would always take around $10$ seconds or so; only when I bumped the dummy load up to $100,000$ values did it start to take a little longer. What the hell, man, I was thinking, what is happening?
Digging into some logs, I started spotting a pattern. Every time I'd start a test, the system would quietly shoot a bunch of GET requests to a dummy API (some vestigial bit from early on where I thought I'd benchmark runs against a database in a Cloudflare D1 bucket). Each request waited for a timeout before continuing. Guess how long the timeout was. If the cloud didn't answer (and it never did, because I hadn't gotten around to setting it up yet), the whole run would hang until the timeout expired, dragging everything down with it.
That's the lesson.
Big O tells you how the algorithm scales, but it doesn't account for everything that can cause a slowdown: slow networks, cache misses, hardware limitations, anything other than the algorithm itself. In this case, my sorting algorithm always had a 10-second weight dragging along behind it. Only when $n$ was so big that it generated a second, bigger weight (thanks, $O(n \log n)$! You did your job great :D) did the expected curve appear.
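To see why the curve hid for so long, here's a toy model with made-up constants (not my actual test harness): a fixed ~10 seconds of timeout overhead plus a small $n \log n$ term for the sort itself.
import math
def modeled_runtime(n, overhead=10.0, c=2e-5):
    # Hypothetical numbers: constant network-timeout overhead + c * n * log2(n) for the sort.
    return overhead + c * n * math.log2(n)
for n in (50, 500, 5000, 100_000):
    print(n, round(modeled_runtime(n), 2))
# 50      -> 10.01  (overhead dominates)
# 500     -> 10.09
# 5000    -> 11.23
# 100000  -> 43.22  (the n log n term finally shows up)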
Appendix A
Let's break down how we get $n \log{n}$ for a very common sorting algorithm: Merge Sort.
But first, a quick TL;DR of how merge sort works: we split the array into halves, sort each half, then merge them back together.
Splitting
We are splitting the array until we get subarrays of length $1$. To do this, it turns out we need $\log_{2}{n}$ steps. But how?
- We start with a size $n$ array.
- We split it into two halves, each of size $\frac{n}{2}$. We took $1$ step to do this.
- Then, we split those halves into halves again, each of size $\frac{n}{4}$. This takes $2$ steps.
- We keep going, and you will notice that the size of each subarray at step $k$ is $\frac{n}{2^{k}}$.
- We stop when the size of each subarray is $1$. Substituting in:
    - $\frac{n}{2^{k}} = 1 \longrightarrow n = 2^{k} \longrightarrow k = \log_{2}{n}$
Isn't that neat? (Explore recurrence trees for a more scientific explanation of this topic.)
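If you want to sanity-check that, here's a tiny loop (my own illustration, not part of merge sort itself) that counts how many times you can halve $n$ before the pieces reach size $1$:
import math
def halving_steps(n):
    # Count how many times n can be halved before reaching 1.
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps
print(halving_steps(1024), math.log2(1024))  # 10 10.0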
Merging
Going up each level of the recursion, we need to merge the sorted subarrays back together. At any given level, every element of the original array gets touched exactly once during merging, so the merging work per level is $O(n)$.
Intuitively, we thus have two pieces of work to combine: the splitting gives us $\log_{2}{n}$ levels (on the 'way down'), and the merging costs $O(n)$ per level (on the 'way up'). We combine these by multiplying: $$O(n) \cdot \log_{2}{n} = O(n \log_{2}{n}) = O(n \log{n})$$
Note that the base $2$ on the logarithm was dropped because in Big O notation we concern ourselves only with the shape of the function, and logarithms in different bases differ only by a constant factor. The change of base formula is the proof:
$\log_{2}{n}= \frac{\log_{10}n}{\log_{10}{2}} = \frac{1}{\log_{10}{2}}\log_{10}{n} \approx \log{n}$
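To make the whole thing concrete, here's a plain textbook merge sort in Python (a generic sketch, not lifted from any particular library):
def merge_sort(arr):
    # Base case: arrays of length 0 or 1 are already sorted.
    if len(arr) <= 1:
        return arr
    # Split: about log2(n) levels of halving.
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    # Merge: O(n) work across each level.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]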