The Taylor series is a familiar tool in analysis and often provides effective polynomial approximations to complicated differentiable functions. The need arises because polynomials are easier objects to manipulate. The Taylor series of an infinitely differentiable function $f(x)$ around $x_0$ is the infinite sum of polynomial terms

$$f(x_0) + f^{\prime}(x_0)(x - x_0) + \frac{f^{\prime\prime}(x_0)}{2!}(x - x_0)^2 + \frac{f^{\prime\prime\prime}(x_0)}{3!}(x - x_0)^3 + \cdots$$
In this notation, the number of ticks on $f$ represents the order of the derivative. This means $f^{\prime}$ represents the first derivative, $f^{\prime\prime}$ the second and so on. In practice, we resort to a truncated sum where we ignore the higher-order derivatives. Vector inputs can also be handled, but the derivation of this result is most easily seen for scalar inputs.
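To make the truncation concrete, here is a minimal sketch that evaluates the truncated sum for $\sin$. The helper name `taylor_sin` and its parameters are my own choices for illustration; it relies only on the fact that the derivatives of $\sin$ cycle through $\sin, \cos, -\sin, -\cos$.

```python
import math

def taylor_sin(x, x0=0.0, order=5):
    """Truncated Taylor sum of sin around x0, keeping terms up to `order`."""
    # The n-th derivative of sin cycles with period 4: sin, cos, -sin, -cos.
    derivs = [math.sin(x0), math.cos(x0), -math.sin(x0), -math.cos(x0)]
    return sum(
        derivs[n % 4] / math.factorial(n) * (x - x0) ** n
        for n in range(order + 1)
    )

# Compare the exact value with the degree-5 truncation at a few points.
for x in (0.5, 2.0, math.pi):
    print(f"x={x:.4f}  sin={math.sin(x):+.6f}  taylor={taylor_sin(x):+.6f}")
```

The truncation is very accurate close to $x_0 = 0$ and degrades as $x$ moves away from it, which is the behaviour the rest of this post contrasts with a projection-based approximation.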
In this post, we want to derive Taylor polynomials using linear algebra. I must confess that this will not look elegant at all from a calculus standpoint. You've probably seen the most elegant and simple derivation already. I hope, however, you'll appreciate how beautiful the result is from the perspective of linear algebra. I call this "the hard way" not because it is conceptually hard but because it is a long-winded path to an otherwise straightforward concept. It is also important to note that what we'll arrive at is not Taylor series per se and is probably better characterized as a "polynomial" approximation.
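The key result that we'll use from linear algebra concerns orthogonal projections: the projection of a vector onto a subspace is the point of that subspace closest to the vector.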
Suppose $U$ is a finite-dimensional subspace of $V$, $v \in V$, and $u \in U$. Then

$$\lVert v - P_{U}v \rVert \leq \lVert v - u \rVert.$$

Furthermore, the inequality is an equality if and only if $u = P_{U}v$.
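To prove this result, we note the following chain of relations.

$$\lVert v - P_{U}v \rVert^2 \leq \lVert v - P_{U}v \rVert^2 + \lVert P_{U}v - u \rVert^2 = \lVert (v - P_{U}v) + (P_{U}v - u) \rVert^2 = \lVert v - u \rVert^2$$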
The first relation (the inequality) follows simply from the non-negativity of the norm $\lVert P_{U}v - u \rVert$ and the second follows from the Pythagorean theorem (yes, it works in abstract spaces too!). The Pythagorean theorem is applicable because the two vectors are orthogonal: $v - P_{U}v$ intuitively amounts to removing all components of $v$ in the subspace $U$ ^{2} and $P_{U}v - u$ belongs to $U$ (by definition of projection and additive closure of subspaces). As a consequence of this derivation, we also see that the inequality above holds with equality only when the norm of the second term is zero, implying $u = P_{U}v$.
This result says that the shortest distance between a vector and a subspace is given by the vector's orthogonal projection onto the subspace. It is akin to the familiar two-dimensional fact that the shortest path between a point $p$ and a line $x$ runs along the line $l$ through $p$ that is perpendicular to $x$.
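The same statement can be checked numerically in ordinary three-dimensional space. This is only a sketch: the matrix `A`, the vector `v` and the helper `project` are arbitrary choices made for illustration, and the subspace $U$ is taken to be the column span of `A`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Columns of A span a two-dimensional subspace U of R^3 (arbitrary example).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])

# Reduced QR gives an orthonormal basis of U in the columns of Q.
Q, _ = np.linalg.qr(A)

def project(v):
    """Orthogonal projection of v onto U: sum of <v, e_i> e_i over the basis."""
    return Q @ (Q.T @ v)

v = np.array([3.0, -1.0, 2.0])
p = project(v)
best = np.linalg.norm(v - p)

# Distances from v to 1000 random elements u of U; none should beat the projection.
others = [np.linalg.norm(v - A @ rng.normal(size=2)) for _ in range(1000)]

print("distance to projection:", best)
print("smallest distance to a random u in U:", min(others))
```

The residual `v - p` is orthogonal to both columns of `A`, which is exactly the condition that lets the Pythagorean theorem be used in the proof.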
The key message of this post is this: Taylor-style polynomial approximations can be viewed as an orthogonal projection from the space of continuous functions to the subspace of polynomials ^{3}.
We'll do this by example. Let $\mathcal{C}_{[-\pi, \pi]}$ be the space of continuous functions on the interval $[-\pi, \pi]$ and $\mathcal{P}_5$ be the space of polynomials of degree at most 5. We would like to find the best polynomial approximation to the function $f(x) = \sin(x)$, which does belong to $\mathcal{C}_{[-\pi, \pi]}$. The precise notion of "best" requires us to reformulate this problem in linear algebra speak.
By defining an inner product between two functions $f$ and $g$ in the space of continuous functions as $\langle f, g \rangle = \int_{-\pi}^{\pi} f(x) g(x) dx$, we obtain the induced norm $\lVert f \rVert = \sqrt{\langle f, f \rangle}$. We desire the best approximation in the sense of minimizing this norm. For a $v \in \mathcal{C}_{[-\pi,\pi]}$, we seek $u \in \mathcal{P}_5$ such that the norm $\lVert v - u \rVert$ is minimized. In our case $v$ is the sine function and from our previous result, we know that the minimum is achieved by the projection of $v$ onto the subspace $\mathcal{P}_5$. Hence, the polynomial we are after is

$$u = P_{\mathcal{P}_5} v.$$
As we've noted before, projections can be written in an orthonormal basis $e_1, \dots, e_6$ of $\mathcal{P}_5$ as

$$P_{\mathcal{P}_5} v = \langle v, e_1 \rangle e_1 + \cdots + \langle v, e_6 \rangle e_6.$$
Note that I've preemptively fixed the number 6 in the sequence above because the dimension (number of basis vectors) of $\mathcal{P}_5$ is 6. As one can verify, $1, x, x^2, x^3, x^4, x^5$ form a basis of $\mathcal{P}_5$. These, however, are not orthonormal. Nevertheless, we can form an orthonormal basis from a known one using the Gram–Schmidt process. Be warned, there are ugly numbers ahead.
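To see that the monomials are not orthonormal under the integral inner product on $[-\pi, \pi]$, one can tabulate their pairwise inner products. The sketch below approximates the integral with Gauss-Legendre quadrature, which is my choice of numerical method rather than anything this post depends on; the names `inner`, `monomials` and `gram` are illustrative.

```python
import numpy as np

# Gauss-Legendre nodes and weights on [-1, 1]; 50 nodes integrate
# polynomials of degree up to 99 exactly (up to rounding).
nodes, weights = np.polynomial.legendre.leggauss(50)

def inner(f, g):
    """Approximate <f, g> = integral of f(x) g(x) over [-pi, pi]."""
    x = np.pi * nodes  # map the nodes from [-1, 1] to [-pi, pi]
    return np.pi * np.sum(weights * f(x) * g(x))

# The monomial basis 1, x, ..., x^5 of P_5.
monomials = [lambda x, k=k: x**k for k in range(6)]

# gram[i, j] = <x^i, x^j>; an orthonormal basis would give the identity matrix.
gram = np.array([[inner(p, q) for q in monomials] for p in monomials])
print(np.round(gram, 3))
```

Entries pairing an even power with an odd power vanish by symmetry, but for instance $\langle 1, x^2 \rangle = 2\pi^3/3 \neq 0$ and $\langle 1, 1 \rangle = 2\pi \neq 1$, so the basis is neither orthogonal nor normalized.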
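By the Gram–Schmidt orthonormalization, for a given basis $b_1, \dots, b_m$, an orthonormal basis $e_1, \dots, e_m$ is given by

$$e_1 = \frac{b_1}{\lVert b_1 \rVert}, \qquad e_j = \frac{b_j - \sum_{i=1}^{j-1} P_{e_i} b_j}{\left\lVert b_j - \sum_{i=1}^{j-1} P_{e_i} b_j \right\rVert} \quad \text{for } j = 2, \dots, m.$$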
We've slightly overloaded the projection notation $P_{e_i}$ here for brevity. This actually is supposed to mean $P_{U_i}$ where $U_i$ is the span of the basis vector $e_i$, so that $P_{e_i} b_j = \langle b_j, e_i \rangle e_i$. Intuitively, this method simply removes the components of each basis vector that have already been covered by the previous basis vectors and then normalizes what is left.
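The procedure can be sketched with NumPy's `Polynomial` class, where the inner product on $[-\pi, \pi]$ is evaluated exactly through the antiderivative. The function names `inner` and `gram_schmidt` are mine. One deliberate deviation: the loop subtracts components from the running remainder rather than from the original $b_j$, which is mathematically equivalent but tends to behave better in floating point.

```python
import numpy as np
from numpy.polynomial import Polynomial

def inner(p, q):
    """<p, q> = integral of p(x) q(x) over [-pi, pi], via the antiderivative."""
    antiderivative = (p * q).integ()
    return float(antiderivative(np.pi) - antiderivative(-np.pi))

def gram_schmidt(basis):
    """Turn a list of linearly independent polynomials into an orthonormal list."""
    ortho = []
    for b in basis:
        r = b
        # Remove the components already covered by e_1, ..., e_{j-1}.
        for e in ortho:
            r = r - e * inner(r, e)
        # Normalize what is left.
        ortho.append(r / np.sqrt(inner(r, r)))
    return ortho

# Monomial basis 1, x, ..., x^5 of P_5.
monomials = [Polynomial([0.0] * k + [1.0]) for k in range(6)]
e = gram_schmidt(monomials)

# G[i, j] = <e_i, e_j> should be (numerically) the identity matrix.
G = np.array([[inner(p, q) for q in e] for p in e])

for k, poly in enumerate(e, start=1):
    print(f"e_{k} coefficients (constant term first):", np.round(poly.coef, 6))
```

The printed coefficients are the decimal versions of the "ugly numbers"; for example the first two come out as $1/\sqrt{2\pi}$ and $\sqrt{3/(2\pi^3)}\,x$.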
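We need to solve more than a few integrals for the inner product calculations as a part of the projections but they are straightforward. You'll often find yourself computing symmetric integrals of odd polynomials which just amount to zero. I will note the complete orthonormal basis here for reference.

$$\begin{aligned}
e_1 &= \sqrt{\tfrac{1}{2\pi}} \\
e_2 &= \sqrt{\tfrac{3}{2\pi}} \, \frac{x}{\pi} \\
e_3 &= \sqrt{\tfrac{5}{2\pi}} \, \frac{1}{2}\left(\frac{3x^2}{\pi^2} - 1\right) \\
e_4 &= \sqrt{\tfrac{7}{2\pi}} \, \frac{1}{2}\left(\frac{5x^3}{\pi^3} - \frac{3x}{\pi}\right) \\
e_5 &= \sqrt{\tfrac{9}{2\pi}} \, \frac{1}{8}\left(\frac{35x^4}{\pi^4} - \frac{30x^2}{\pi^2} + 3\right) \\
e_6 &= \sqrt{\tfrac{11}{2\pi}} \, \frac{1}{8}\left(\frac{63x^5}{\pi^5} - \frac{70x^3}{\pi^3} + \frac{15x}{\pi}\right)
\end{aligned}$$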
With this derived orthonormal basis for $\mathcal{P}_5$, we are now in a position to find the optimal (in the sense of the norm) projection of $\sin(x)$ using the projection identity

$$P_{\mathcal{P}_5} f = \sum_{i=1}^{6} \langle f, e_i \rangle e_i.$$
For $\sin(x)$, we note that its inner products with $e_1, e_3, e_5$ are zero: these basis functions are even and $\sin(x)$ is odd, so each integrand is an odd function integrated over an interval symmetric about 0. We note the following results for $f = \sin(x)$.
Summing these up, we get

$$f_a(x) = 0.987862\,x - 0.155271\,x^3 + 0.005643\,x^5.$$
We compare the exact function $f(x) = \sin(x)$ with $f_t(x) = x - \frac{x^3}{6} + \frac{x^5}{120}$, the degree-5 Taylor polynomial around 0, and $f_a(x)$, our orthogonal projection approximation.
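The coefficients of $f_a$ can be cross-checked numerically. The sketch below does not follow the Gram–Schmidt route taken in this post; instead it solves the equivalent normal equations $G c = b$ in the monomial basis, with $G_{ij} = \langle x^i, x^j \rangle$ and $b_i = \langle x^i, \sin \rangle$, using Gauss-Legendre quadrature for the integrals. Both the method and the variable names are my own choices for illustration.

```python
import numpy as np

# Gauss-Legendre quadrature mapped from [-1, 1] to [-pi, pi].
nodes, weights = np.polynomial.legendre.leggauss(60)
x = np.pi * nodes
w = np.pi * weights

# Columns of V are the monomials 1, x, ..., x^5 evaluated at the nodes.
V = np.vander(x, 6, increasing=True)

G = V.T @ (w[:, None] * V)   # Gram matrix <x^i, x^j>
b = V.T @ (w * np.sin(x))    # right-hand side <x^i, sin>

# Coefficients of the projection of sin onto P_5, constant term first.
coeffs = np.linalg.solve(G, b)
print(np.round(coeffs, 6))
```

The even-degree coefficients should vanish up to rounding, and the odd-degree ones should agree with the coefficients of $x$, $x^3$ and $x^5$ in $f_a$ to the digits quoted.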
Here is some quick code to generate these plots using Altair.
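```python
import altair as alt
import numpy as np
import pandas as pd

# Sample the interval [-pi, pi) in steps of 0.01.
x = np.arange(-np.pi, np.pi, 0.01)

f = np.sin  # exact function
ft = lambda x: x - (x**3 / 6) + (x**5 / 120)  # degree-5 Taylor polynomial around 0
fa = lambda x: 0.987862 * x - 0.155271 * x**3 + 0.005643 * x**5  # orthogonal projection

data = pd.DataFrame({'x': x, 'Original': f(x), 'Taylor': ft(x), 'Polynomial': fa(x)})

# One line per curve; the layers share the x encoding and differ in y field and color.
orig = alt.Chart(data).mark_line(color='green').encode(x='x', y='Original')
tayl = orig.mark_line(color='red').encode(x='x', y='Taylor')
poly = orig.mark_line(color='yellow').encode(x='x', y='Polynomial')

# Layer the three charts on a shared pair of axes.
orig + tayl + poly
```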
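Our approximation is indeed far more accurate over the whole interval $[-\pi, \pi]$ than the Taylor polynomial of the same degree, which is excellent near $0$ but is off by about $0.52$ at $x = \pm\pi$ and needs higher-order terms to do better. Note how the green and yellow curves stay very close and are virtually indistinguishable.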
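The visual impression can be backed by a number. This small sketch measures the largest absolute deviation from $\sin(x)$ on a grid over $[-\pi, \pi]$ for both polynomials; the grid size and the variable names are arbitrary choices of mine.

```python
import numpy as np

# Evenly spaced grid that includes both endpoints of [-pi, pi].
x = np.linspace(-np.pi, np.pi, 2001)

taylor = x - x**3 / 6 + x**5 / 120
projection = 0.987862 * x - 0.155271 * x**3 + 0.005643 * x**5

# Worst-case absolute error of each approximation on the grid.
err_taylor = np.max(np.abs(np.sin(x) - taylor))
err_projection = np.max(np.abs(np.sin(x) - projection))

print("max |sin - Taylor|     =", err_taylor)
print("max |sin - projection| =", err_projection)
```

Keep in mind that the projection is optimal for the integral norm defined earlier, not for this worst-case error, so the comparison is a convenient summary rather than the quantity that was actually minimized.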
This was a fun way to discover polynomial approximations to functions, and quite accurate ones at that. Of course, I promise to never use this in real life.
It is enough to think of fields as just real or complex numbers for now. You could also think of apples if you don't like abstract concepts (although you are probably going to have trouble thinking of irrational apples). ↩
Formally, we say $v - P_{U}v$ belongs to the orthogonal complement $U^{\perp}$ of $U$. $U^{\perp}$ is the set of all vectors that are orthogonal to all vectors in $U$. ↩
Applying the definitions may help us see why the set of all continuous functions is a vector space. To start with, the sum of two continuous functions is continuous and multiplication by a scalar also keeps the function continuous. Further, polynomials are just a subset of continuous functions and also satisfy these two closure properties. Trust me that all other necessary properties are also satisfied. ↩