Who’s Space is it Anyway?

A year or so ago my brother was building a JPEG encoder/decoder. While I stayed at his place on the last night before moving across the country we discussed the concepts behind JPEG compression and he asked me if I had any intuition about these transformations which, to him, were obscure (he’s a comp sci major. Gun developer with no linear algebra). When I told him that it was just a change of basis to the 2d DCT basis functions he looked at me blankly and then asked me a series of questions including aren’t basis vectors, well, vectors? How can you dot product a picture with a vector? How is this intuitive?

We didn’t have time that night to talk about it so I promised him something intuitive instead of rigorous and this is what I wrote while I flew across the country the next day.

Inner product spaces

Inner product spaces are a type of vector space, so let’s start there. What does it mean to be a vector space? It’s a space for vectors so the better question is what does it mean for something to be a vector. Merely that it satisfies two properties.

Vector Addition. We can add two vectors to get another vector: $\mathbf{U} + \mathbf{V} = \mathbf{W}$ .
Scalar Multiplication. We can multiply a vector $\mathbf{v}$ by a scalar $a$ and get another vector $a\mathbf{v}$ .

As long as they also follow the usual properties (associativity, commutivitity, etc.) these vectors form a pretty sensible space. Importantly, functions satisfy these properties. You can add $\sin(x)$ and $\cos(x)$ to get $\sin(x) + \cos(x)$ and you can multiply by a scalar to get $a\sin(x) + a\cos(x)$ . Remember, at this point our normal geometric intuitions about vector spaces are a bit useless. Cross products, dot products, and other operations haven’t been defined.

In order to become more useful our inner product spaces are vector spaces that include an “inner product”. Think of the inner product as measuring the angle between two vectors. The dot product is the particular inner product used in our nice comfortable warm Euclidean spaces we know so well. So, what’s the inner product for a vector space of functions? Often we use one based on integration. Given two functions f and g:

$\langle f, g \rangle = \int_a^b f(x)g(x) dx$

The limits a and b are a choice particular to the space you’re constructing and $\langle f, g \rangle$ is our notation for an inner product. So, if our two functions are orthogonal then $\langle f, g\rangle = 0$ . If a function/vector is normalised $\langle f, f\rangle = 1$ . This part, at least, is decently simple and should correspond to what you remember of the dot product. All you have to do is trust that this definition of an inner product is a good one. It is. If you want some intuition justifying this remember that the dot product for two vectors is

$u \cdot v = u_1 v_1 + u_2 v_2 \ldots u_n v_n$

On the other hand, if write down the numerical form for the inner product of our functions f and g by replacing the integral with a numerical equivalent (remember integration is just a sum of infinitely many points spaced infinitesimally far apart) you get

$\langle f, g\rangle = (f(a) g(a) + f(a + dx) g(a + dx) + f(a + 2dx) g(a + 2dx) \ldots f(b)g(b))dx$

You see the correspondence I’m trying to highlight? When we talked about the idea of an inner product between two functions you didn’t seem too happy about the idea. Hopefully this makes a bit more sense.

Now, let’s build out our space a bit. Remember, we define a vector in an inner product space, such as a Euclidean space with the dot product, by how much of each basis vector they have in them. If we have a set of $n$ basis vectors $u$ then we define $v$ as

$v = a_1u_1 + a_2u_2 \ldots a_nu_n$

How do we find our coefficients $a$ ? How do we find how much of a basis vector $v$ has in it? For an orthonormal basis it’s simply the inner product $\langle v, u_n\rangle$ for each $n$ . So the above becomes

$v = \langle v, u_1\rangle u_1 + \ldots \langle v, u_n\rangle u_n$

Now we’ve got all the tools we need we can build up a basis set and work out some vectors in the associated space. I’m going to do a well known set called the Legendre polynomials (actually a variation because for historical reasons the Legendre polynomials aren’t normal) which is defined usually over the interval $[-1,1]$ though there are plenty of other orthogoal polynomials you can use and you’re certainly not limited to polynomials alone for basis functions. How are we going to make our polynomials orthogonal? With the normal Gram-Schmidt process. I’m not going to summarise the process here. Wikipedia has a nice animation and it even uses the same notation we’re using.

I’m also not going to do this symbolically. If you want to see the symbolic version of each of these functions, check the Legendre wiki page (which, as above, will be some constant normalisation factor out for each function). Numerical will be plenty to see some nice lovely graphs and it’s a bit more terse.

import numpy as np

n = 100
x = np.linspace(-1, 1, n)

def inner(f, g):
    return np.trapz(f*g, x)

def norm(f):
    return inner(f, f)

def gram_schmidt(f):
    for u in basis:
        f = f - inner(f, u)*u
    f /= norm(f)**0.5
    return f

basis = []
for i in range(5):
    v = x**i
    basis.append(gram_schmidt(v))
    
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
for u in basis:
    plt.plot(x, u)
plt.xlabel('x')
plt.ylabel('basis functions')
plt.title('Orthonormal basis functions for polynomials to degree 5')
plt.show()

png

Under our definition of the inner product, these functions are all orthonormal. In fact, just to prove it:

inner(basis[1], basis[4])

-2.0816681711721685e-17

Effectively 0. Now comes the good part. So far, this has been a fairly academic exercise, but now we have a basis set of 5 functions, we can start inserting other functions into our 5-dimensional function space. How do we do that? Same as above, find the coefficients $a_n = \langle v, u_n\rangle$ . So, as a simple example, consider the polynomial

$f(x) = 3x^3 -2x + 5$

fv = np.zeros((len(basis),1))
f = 3*x**3 - 2*x + 5

for i, u in enumerate(basis):
    fv[i] = inner(f, u)
print("Our vector for the function is\n" + str(fv))

Our vector for the function is
[[ 7.07106781e+00]
 [-1.62616306e-01]
 [ 1.11022302e-16]
 [ 6.42341881e-01]
 [ 1.55431223e-15]]

This makes sense. The entries for the constant, $x$ , and $x^3$ basis functions are non-zero because that’s what our function is made of. Let’s compare the original function and the result after applying the vector to the basis functions.

basisMat = np.mat(np.transpose(basis))
fVec = np.dot(basisMat,fv) #our vector applied to the basis

plt.plot(x, f)
plt.plot(x, fVec, '-.r')
plt.title("Original polynomial and vector applied to basis functions")
plt.xlabel('x')
plt.show()

png

Exactly overlapping (it’s a bit difficult to see the blue line behind the dashed red). This is to be expected. For a polynomial of degree 4 or less we can describe it perfectly in such a basis because any such polynomial can be represented as a linear combination of our basis functions. On the other hand, you asked last night if any function could be represented this way and the answer is no. There’s no way of representing $\sin(x)$ as a linear combination of our existing basis functions. We could do it if we had an infinite number of polynomial basis functions (the basis set would then be “complete”). Still, we can do the calculation and see how close the approximation is and compare it to the Maclaurin series for $\sin(x)$ to degree 5 (to make it fair).

import math

f = np.sin(x)

for i, u in enumerate(basis):
    fv[i] = inner(f, u)
fVec = np.dot(basisMat,fv)
print("The vector for sin(x) is\n" + str(fv))

fMac = x - x**3/6
    
plt.plot(x, f, label="sin(x)")
plt.plot(x, fVec, '-.r', label="Orthonormal basis approximation")
plt.plot(x, fMac, '--g', label="Maclaurin series")
plt.title("Original polynomial and vector applied to basis functions")
plt.xlabel('x')
plt.legend()
plt.show()

The vector for sin(x) is
[[-1.04083409e-17]
 [ 7.37749435e-01]
 [ 2.08166817e-17]
 [-3.37449260e-02]
 [ 1.38777878e-17]]

png

I’m not going to lie. I imagined this as a much more convincing demonstration of the optimality of our approach compared to a simple power series. It would be more convincing over a large range but a Maclaurin series behaves quite well over a small interval around 0 (that’s kinda the point). Let’s zoom in on the first tenth of the plot.

plt.plot(x[0:10], f[0:10], label="sin(x)")
plt.plot(x[0:10], fVec[0:10], '-.r', label="Orthonormal basis approximation")
plt.plot(x[0:10], fMac[0:10], '--g', label="Maclaurin series")
plt.title("The same but zoomed in to the left side")
plt.xlabel('x')
plt.legend()
plt.show()

png

How do this apply in any way to compression and JPEGs?

Now you can see it. This way of approximating functions can be incredibly accurate but it relies on an intelligent choice of basis functions. As a bonus, if you choose them right, the entries for most basis functions might be negligible and you can perform compression without losing much information. For example, in our vector for $\sin(x)$ we have three entries that are practically zero, we can discard them and we’d only need to store that the 2nd coordinate is 0.737 and the 4th coordinate is -0.037. Now I’m telling you things you already know from JPEG. The DCT basis set is not only fairly accurate, it’s also great for compression.

So, now we’re here let’s break down that JPEG DCT transform.

$G_{u,v} = \frac{1}{4}\alpha(u)\alpha(v)\sum_{x=0}^7\sum_{y=0}^7 g_{x,y} \cos\left(\frac{(2x+1)u\pi}{16}\right) \cos\left(\frac{(2y+1)u\pi}{16}\right)$

Let’s start at the cos terms. These are our basis vectors! In the Wikipedia article in the caption for the associated picture it talks about representing a block as a linear combination of 64 patterns. This should sound familiar. These are those patterns, nested in a double sum over 8 entries each (getting 64). A sum is how we perform integration numerically (I was using the trapezoidal method above which is pretty close to the same thing). So, I’m going to simplify the above equation with the understanding that both cos terms are a basis vector/function/pattern indexed by u and v. I’ll denote it as $e_{u,v}(x,y)$ .

$G_{u,v} = \frac{1}{4}\alpha(u)\alpha(v)\sum_{x=0}^7\sum_{y=0}^7 g_{x,y} e_{u,v}(x,y)$

Now, if we’re finding the coefficients/entries for a vector in this DCT frequency space/basis we expect to be finding the inner product of our original 8x8 block to the basis functions/patterns. This is exactly what we’re doing and it’s why $g_{x,y}$ is sitting there in the double sum. In fact, as above it’s completely implicit in the notation for $\langle f, g\rangle$ that it’s over some range $x$ . We defined the inner product as this integral which operates over $x$ . Well, seeing as we’re operating over $x$ and $y$ here, I’d say it’s similarly implicit in the same notation, so we can express this as:

$G_{u,v} = \frac{1}{4}\alpha(u)\alpha(v)\langle g, e_{u,v}\rangle$

So, all we have are some normalisation factors to go which are there for normality purposes. You’ve asked the question why $\alpha (u)$ takes on special values for $u = 0$ . It’s because we require that $\langle e_{0, v}, e_{0, v}\rangle = 1$ , but unlike when u is any other value, the basis pattern/function doesn’t oscillate. Intuitively, as a constant it spends less time close to 0 and therefore has a higher norm. That is, $\cos(0) = 1$ over the entire $x$ will have a larger norm than $\cos(nx) \leq 1$ over the entire range. Back to simplifying. Way above with our polynomial basis set, we required our basis vectors to be normal, so I’ll redefine $e_{u,v}$ to include these normality constants in the relevant entry. As a result we have

$G_{u,v} = \langle g, e_{u,v}\rangle$

The purpose of getting to this simple representation is to show the correspondence to how we were finding coefficients before with $a_n = \langle v, u_n\rangle$ . That’s all we’re doing! We want the coefficients for our function, the 64-dimensional pixel block $g_{x,y}$ with respect to a desirable two dimensional DCT basis set, so we take its inner product with a series of orthonormal DCT vectors/functions/patterns using the inner product. Those coefficients then characterise the block in this new space.

I think at this point you should be fairly comfortably back in the JPEG realm and out of inner product spaces. Hope the diversion wasn’t too painful.