Discovering Taylor series the hard way

Using linear algebra to (almost) derive Taylor series
Aug 30, 2020 · Last updated: Aug 30, 2020

Taylor series is a familiar tool in analysis and often provides effective polynomial approximations to complicated differentiable functions. The need arises because polynomials are easier objects to manipulate. We say that an infinitely differentiable function $f(x)$ can be represented around $x_0$ as an infinite sum of polynomial terms, its Taylor series:

$$f(x) = f(x_0) + \frac{f^{\prime}(x_0)}{1!} (x - x_0) + \frac{f^{\prime\prime}(x_0)}{2!} (x - x_0)^2 + \dots$$

In this notation, the number of primes on $f$ indicates the order of the derivative: $f^{\prime}$ is the first derivative, $f^{\prime\prime}$ the second, and so on. When used in practice, we resort to a truncated sum where we ignore the higher-order terms. Vector inputs can also be handled, but the derivation of this result is most easily seen for scalar inputs.
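
For a concrete instance, which we will return to later in this post, truncating the series for $f(x) = \sin(x)$ around $x_0 = 0$ after the fifth-order term gives

$$\sin(x) \approx x - \frac{x^3}{3!} + \frac{x^5}{5!}.$$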

In this post, we want to derive Taylor polynomials using linear algebra. I must confess that this will not look elegant at all from a calculus standpoint. You've probably seen the most elegant and simple derivation already. I hope, however, you'll appreciate how beautiful the result is from the perspective of linear algebra. I call this "the hard way" not because it is conceptually hard but because it is a long-winded path to an otherwise straightforward concept. It is also important to note that what we'll arrive at is not Taylor series per se and is probably better characterized as a "polynomial" approximation.

Background

The key result that we'll use from linear algebra is Orthogonal Projections.

Notation and Terminology

  • Finite Dimension ($n$): By definition, a vector space with a finite number of basis vectors, where a basis affords a unique way to represent every vector.
  • Vector Space, $U$ or $V$: Finite-dimensional vector spaces over some field $\mathbf{F}$ 1 that follow the usual rules: closure under addition and scalar multiplication, additive identity, additive inverse, multiplicative identity, and so on.
  • Vector Subspace: A subset of a vector space that is itself a vector space under the same operations; subspaces are to spaces what subsets are to sets.
  • Linear Functional $\phi$: This is a special name for linear maps that go from $V$ to the field $\mathbf{F}$, a scalar. Since every linear map has an associated matrix, it is equivalent to simply think of this as multiplying a $1 \times N$ matrix with an $N \times 1$ vector.
  • Inner Product $\langle v, u \rangle$: This is an abstract generalization of the dot product, and the usual rules apply. I will note an additional property of conjugate symmetry, $\langle v, u \rangle = \overline{\langle u, v \rangle}$, which isn't needed for real fields (because the conjugate of a real scalar is the scalar itself).
  • Norm $\lVert \cdot \rVert$: The norm gives us the notion of length, and hence distance, in vector spaces. It is defined as $\lVert v \rVert = \sqrt{\langle v, v \rangle}$, the square root of the inner product of a vector with itself. Our familiar Euclidean norm is also known as the $L_2$ norm.
  • Orthogonal Projection, $P_{U}v$: The orthogonal projection of a vector $v$ onto the subspace $U = \mathrm{span}(x)$ is defined as $P_{U}\,v = \frac{\langle v, x \rangle}{\lVert x \rVert^2} x$. This should look familiar, and the intuition of the "component" of a vector along a line usually works (although the subspace $U$ need not only be a line). Equivalently, we can write this in an orthonormal basis $e_1, \dots, e_m$ of $U$ as
    $$P_{U}v = \langle v, e_1 \rangle e_1 + \dots + \langle v, e_m \rangle e_m,$$
    which invokes the idea of projecting the vector onto each basis vector, where the coefficients defined by the inner products are unique. Since these basis vectors are normalized, each has unit norm: $\lVert e_k \rVert = 1~\forall~k \in \{1, \dots, m\}$. A small numerical sketch of this formula follows the list.
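
Here is a minimal numerical sketch of the projection formula above, assuming a hypothetical orthonormal basis $e_1, e_2$ of a two-dimensional subspace of $\mathbb{R}^3$ (the specific vectors are made up purely for illustration):

```python
import numpy as np

# A (deliberately simple) orthonormal basis e1, e2 of a 2-D subspace U of R^3.
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])

v = np.array([3.0, -2.0, 5.0])

# P_U v = <v, e1> e1 + <v, e2> e2
proj = (v @ e1) * e1 + (v @ e2) * e2
print(proj)  # [ 3. -2.  0.]

# The residual v - P_U v is orthogonal to every basis vector of U.
residual = v - proj
print(residual @ e1, residual @ e2)  # 0.0 0.0
```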

Orthogonal projections

Suppose $U$ is a finite-dimensional subspace of $V$, $v \in V$, and $u \in U$. Then

$$\lVert v - P_{U}v \rVert \leq \lVert v - u \rVert.$$

Furthermore, the inequality is an equality if and only if $u = P_{U}v$.

To prove this result, we note the following chain of (in)equalities.

$$\lVert v - P_{U}v \rVert^2 \leq \lVert v - P_{U}v \rVert^2 + \lVert P_{U}v - u \rVert^2 = \lVert v - u \rVert^2$$

The first step follows simply from the nonnegativity of $\lVert P_{U}v - u \rVert^2$ and the second follows by the Pythagorean theorem (yes, it works in abstract spaces too!). The Pythagorean theorem is applicable because the two vectors are orthogonal: $v - P_{U}v$ intuitively amounts to removing all components of $v$ in the subspace $U$ 2, and $P_{U}v - u$ belongs to $U$ (by definition of projection and additive closure of subspaces). Taking square roots gives the stated inequality. As a consequence of this derivation, we also see that the inequality above becomes an equality only when the norm of the second term is zero, implying $u = P_{U}v$.

This result says that the shortest distance between a vector and a subspace is achieved by the vector's orthogonal projection onto the subspace. It is akin to a two-dimensional result we've always been familiar with: the shortest path between a point $p$ and a line $x$ is along another line $l$ perpendicular to $x$ that goes through $p$.
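
As a quick sanity check of this minimization property, here is a small sketch building on the NumPy snippet above (the random trial vectors are just for illustration):

```python
# The distance from v to its projection never exceeds the distance from v to
# any other vector u in U = span(e1, e2).
rng = np.random.default_rng(0)
for _ in range(1000):
    a, b = rng.normal(size=2)
    u = a * e1 + b * e2  # an arbitrary vector in U
    assert np.linalg.norm(v - proj) <= np.linalg.norm(v - u) + 1e-12
print("The projection is closest among all sampled u in U.")
```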

Polynomial approximations as orthogonal projections

The key message of this post is this: Taylor series can be viewed as an orthogonal projection from the space of continuous functions onto the subspace of polynomials 3.

We'll do this by example. Let $\mathcal{C}_{[-\pi, \pi]}$ be the space of continuous functions in the range $[-\pi, \pi]$ and $\mathcal{P}_5$ be the space of polynomials of degree at most 5. We would like to find the best polynomial approximation to the function $f(x) = \sin(x)$, which does belong to $\mathcal{C}_{[-\pi, \pi]}$. The precise notion of "best" requires us to reformulate this problem in linear algebra speak.

By defining an inner product between two functions $f$ and $g$ in the space of continuous functions as $\langle f, g \rangle = \int_{-\pi}^{\pi} f(x) g(x)\, dx$, we can afford the notion of a norm. We desire the best approximation in the sense of minimizing this norm: for a $v \in \mathcal{C}_{[-\pi,\pi]}$, we seek $u \in \mathcal{P}_5$ such that the norm $\lVert v - u \rVert$ is minimized. In our case $v$ is the sine function, and from our previous result we know that the minimum is achieved by the projection of $v$ onto the subspace $\mathcal{P}_5$. Hence, the polynomial we are after is

$$u = P_{\mathcal{P}_5}v.$$

As we've noted before, projections can be written in an orthonormal basis $e_1, \dots, e_6$ of $\mathcal{P}_5$ as

$$P_{\mathcal{P}_5}x = \langle x, e_1 \rangle e_1 + \dots + \langle x, e_6 \rangle e_6.$$

Note that I've preemptively fixed the number 6 in the sum above because the dimension (number of basis vectors) of $\mathcal{P}_5$ is 6. As one can verify, $1, x, x^2, x^3, x^4, x^5$ form a basis of $\mathcal{P}_5$. These, however, are not orthonormal. Nevertheless, we can form an orthonormal basis from a known one using the Gram–Schmidt process. Be warned, there are ugly numbers ahead.

Orthonormal basis for polynomials

By Gram–Schmidt orthonormalization, we note that for a given basis $b_1, \dots, b_m$, an orthonormal basis $e_1, \dots, e_m$ is given by

$$\begin{aligned} u_1 &= b_1, \\ u_k &= b_k - \sum_{i=1}^{k-1} P_{e_i}b_k, ~\forall~k \geq 2, \\ e_k &= \frac{u_k}{\lVert u_k \rVert}. \end{aligned}$$

We've slightly overloaded the projection notation $P_{e_i}$ here for brevity; it really means $P_{U_i}$ where $U_i$ is the span of the basis vector $e_i$. Intuitively, this method simply removes from each basis vector the components already covered by the previous basis vectors and then normalizes what remains.

We need to solve more than a few integrals for the inner product calculations as a part of the projections but they are straightforward. You'll often find yourself computing symmetric integrals of odd polynomials which just amount to zero. I will note the complete orthonormal basis here for reference.

$$\begin{aligned} e_1 &= \sqrt{\frac{1}{2\pi}} \\ e_2 &= \sqrt{\frac{3}{2\pi^3}} x \\ e_3 &= \sqrt{\frac{45}{8\pi^5}} \left( x^2 - \frac{\pi^2}{3} \right) \\ e_4 &= \sqrt{\frac{175}{8\pi^7}} \left(x^3 - \frac{3\pi^2}{5}x \right) \\ e_5 &= \sqrt{\frac{11025}{128\pi^9}} \left(x^4 - \frac{6}{7}\pi^2 x^2 + \frac{3}{35} \pi^4 \right) \\ e_6 &= \sqrt{\frac{43659}{128\pi^{11}}} \left(x^5 - \frac{10\pi^2}{9} x^3 + \frac{5\pi^4}{21}x \right). \end{aligned}$$
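
If you'd rather not grind through the integrals by hand, here is a minimal sketch, assuming SymPy is available, that runs the Gram–Schmidt process above with the integral inner product and reproduces these expressions (up to how SymPy chooses to arrange them):

```python
import sympy as sp

x = sp.symbols('x')

def inner(f, g):
    # <f, g> = integral of f(x) g(x) over [-pi, pi], as defined earlier.
    return sp.integrate(f * g, (x, -sp.pi, sp.pi))

basis = [x**k for k in range(6)]  # 1, x, x^2, ..., x^5

ortho = []
for b in basis:
    # Remove the components of b along the previously computed e_i ...
    u = b - sum(inner(b, e) * e for e in ortho)
    # ... then normalize.
    ortho.append(sp.simplify(u / sp.sqrt(inner(u, u))))

for e in ortho:
    print(e)
```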

Computing the orthogonal projection

With this derived orthonormal basis for $\mathcal{P}_5$, we are now in a position to find the optimal (in the sense of norm) projection of $\sin(x)$ using the projection identity

$$P_{\mathcal{P}_5} f = \langle f, e_1 \rangle e_1 + \dots + \langle f, e_6 \rangle e_6.$$

For $\sin(x)$, we note that its inner products with $e_1, e_3, e_5$ are zero because these turn out to be integrals of odd functions over an interval symmetric around 0. We note the following results for $f = \sin(x)$, the sine function.

$$\begin{aligned} \langle f, e_2 \rangle e_2 &= \frac{3}{\pi^2}x \\ \langle f, e_4 \rangle e_4 &= \frac{175}{8\pi^6} \left(\frac{4\pi^2}{5} - 12 \right) \left(x^3 - \frac{3\pi^2}{5}x \right) \\ \langle f, e_6 \rangle e_6 &= \frac{43659}{64\pi^{10}} \left( \frac{8}{63} \pi^4 - \frac{40}{3} \pi^2 + 120 \right) \left(x^5 - \frac{10\pi^2}{9} x^3 + \frac{5\pi^4}{21}x \right) \end{aligned}$$

Summing these up, we get

$$P_{\mathcal{P}_5} f = 0.987862 x - 0.155271 x^3 + 0.005643 x^5.$$
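
Continuing the SymPy sketch from earlier (this assumes the `inner` function and the `ortho` list defined there), the same coefficients fall out of the projection identity:

```python
f = sp.sin(x)

# P_{P_5} f = <f, e_1> e_1 + ... + <f, e_6> e_6
proj = sum(inner(f, e) * e for e in ortho)

# Expand and evaluate the coefficients numerically;
# this should print roughly 0.005643*x**5 - 0.155271*x**3 + 0.987862*x.
print(sp.expand(proj).evalf(6))
```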

Visual comparisons

We compare the exact function $f(x) = \sin(x)$ with $f_t(x)$, the Taylor polynomial approximation, and $f_a(x)$, our orthogonal projection approximation.

$$\begin{aligned} f(x) &= \sin(x) \\ f_t(x) &= x - \frac{x^3}{3!} + \frac{x^5}{5!} \\ f_a(x) &= 0.987862 x - 0.155271 x^3 + 0.005643 x^5 \end{aligned}$$

Here is some quick code to generate these plots using Altair.

```python
import altair as alt
import pandas as pd
import numpy as np

x = np.arange(-np.pi, np.pi, 0.01)

f = np.sin
ft = lambda x: x - (x**3 / 6) + (x**5 / 120)
fa = lambda x: 0.987862 * x - 0.155271 * x**3 + 0.005643 * x**5

data = pd.DataFrame({'x': x, 'Original': f(x), 'Taylor': ft(x), 'Polynomial': fa(x)})

orig = alt.Chart(data).mark_line(color='green').encode(x='x', y='Original')
tayl = orig.mark_line(color='red').encode(x='x', y='Taylor')
poly = orig.mark_line(color='yellow').encode(x='x', y='Polynomial')

orig + tayl + poly
```
Figure: Comparing original (green) with Taylor (red) and Polynomial (yellow) approximations.

Our approximation is indeed very accurate compared to the Taylor polynomial, which demands higher-order terms to do better. Note how the green and yellow curves stay very close and are virtually indistinguishable.
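
To put a number on that visual impression, here is a quick check of the worst-case error of each approximation over the plotted grid (reusing `x`, `f`, `ft`, and `fa` from the snippet above):

```python
# Maximum absolute error over the grid on [-pi, pi].
print('Taylor    :', np.max(np.abs(f(x) - ft(x))))
print('Projection:', np.max(np.abs(f(x) - fa(x))))
```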

Summary

This was a fun way to discover polynomial approximations to functions, and quite accurate ones at that. Of course, I promise to never use this in real life.

Footnotes

  1. It is enough to think of fields as just real or complex numbers for now. You could also think of apples if you don't like abstract concepts (although you are probably going to have trouble thinking of irrational apples). ↩

  2. Formally, we say $v - P_{U}v$ belongs to the orthogonal complement $U^{\perp}$ of $U$. $U^{\perp}$ is the set of all vectors that are orthogonal to all vectors in $U$. ↩

  3. Applying the definitions may help us see why the set of all continuous functions is a vector space. To start with, the sum of two continuous functions is continuous, and multiplication by a scalar also keeps the function continuous. Further, polynomials are just a subset of continuous functions and also satisfy these two closure properties. Trust me that all other necessary properties are also satisfied. ↩
