Approximation Theory and Method part 2
Approximation operators
在前面的讨论中,我们得到了 best approximation 的一些性质. 但是实际上我们并不总是能有 best approximation 这么好的结果。那么我们能不能退而求其次,研究一下更具一般性的 approximation operators 呢?
We note that the operator $\boldsymbol X$ is defined to be a projection if the equation \(\boldsymbol{X}[\boldsymbol{X}(f)]=\boldsymbol{X}(f), \quad f \in \mathscr{B},\) is satisfied. Hence a sufficient condition for $X$ to be a projection is the equation \(X(a)=a, \quad a \in \mathscr{A}\) Most of the approximation methods that are considered in this book do satisfy condition (3.2), but an important exception is the Bernstein operator, which is discussed in Chapter 6. Sometimes $X(f)$ is written as $X f$.
The idea of a linear operator is also well known; namely, we define $X$ to be linear if the equation \(X(\lambda f)=\lambda \boldsymbol{X}(f)\)
holds for all $f \in \mathscr{B}$, where $\lambda$ is any real number, and if the equation \(\boldsymbol{X}(f+g)=\boldsymbol{X}(f)+\boldsymbol{X}(g)\) is obtained for all $f \in \mathscr{B}$ and for all $g \in \mathscr{B}$.
最好还是定义一下这些 approximation operator 的 norm.
Also we make frequent use of the norm of an approximation operator. The norm of $X$ is written as $\lVert X \rVert$, and it is the smallest real number such that the inequality \(\lVert \boldsymbol{X}(f) \rVert \leqslant\lVert \boldsymbol{X} \rVert\lVert f \rVert\) 或者可以这样写 \(\lVert \boldsymbol{X} \rVert = \sup _{ \lVert f \rVert \neq 0} \frac{\lVert \boldsymbol{X}(f) \rVert }{\lVert f \rVert}\) 这和我们定义矩阵时是类似的. 比如矩阵$A$ 的 norm: \(\lVert A \rVert = \sup _{ \lVert x \rVert \neq 0} \frac{\lVert Ax \rVert }{\lVert x \rVert}\) 关于 norm of approximation operator 的取值问题,可以考虑一个例子.
An example of an approximation operator that is useful because it is easy to apply is as follows. Let $\mathscr{B}$ be the space $\mathscr{C}[0,1]$ of real-valued functions that are continuous on $[0,1]$, and let $\mathscr{A}$ be the linear space $\mathscr{P}_1$ of all real polynomials of degree at most one. Then, in order that the calculation of an approximation to a function $f$ in $\mathscr{R}$ depends on only two function evaluations, we let $p$ be the polynomial in $\mathscr{A}$ that satisfies the interpolation conditions \(\left.\begin{array}{l} p(0)=f(0) \\ p(1)=f(1) \end{array}\right\rbrace\) Thus $p=X(f)$, where $X$ is a linear projection operator from $\mathscr{B}$ to $\mathscr{A}$. In order to define the norm of this operator we choose a norm for the space $\mathscr{C}[0,1]$. However, if the 2-norm \(\lVert f \rVert_2=\left\lbrace\int_0^1[f(x)]^2 \mathrm{~d} x\right\rbrace^{\frac{1}{2}}, \quad f \in \mathscr{C}[0,1],\) is used, we find that the operator $X$ is unbounded, because it is possible for $\lVert X f \rVert_2$ to be one when $\lVert f \rVert_2$ is arbitrarily small. It is therefore necessary to prefer the $\infty$-norm \(\lVert f \rVert_{\infty}=\max _{0 \leqslant x \leqslant 1}\lvert f(x) \rvert, \quad f \in \mathscr{C}[0,1]\) when considering approximation operators that are defined by interpolation conditions. In this case, because $p$ is in $\mathscr{P}_1$, equation (3.6) implies the inequality \(\begin{aligned} \lVert X(f) \rVert & =\lVert p \rVert \\ & =\max [\lvert p(0) \rvert,\lvert p(1) \rvert] \\ & =\max [\lvert f(0) \rvert,\lvert f(1) \rvert] \\ & \leqslant\lVert f \rVert, \quad f \in \mathscr{C}[0,1] . \end{aligned}\) 在这个例子中,我们的 approximation operator 可以理解为把 0 和 1 处,也就是函数的两端进行连线. 可以发现此时 2-norm 可以去到很大,而 inf-norm 是有上确界的.
Lebesgue constants
Theorem 3.1 Let $\mathscr{A}$ be a finite-dimensional linear subspace of a normed linear space $\mathscr{B}$, and let $X$ be a linear operator from $\mathscr{B}$ to $\mathscr{A}$ that satisfies the projection condition (3.2). For any $f$ in $\mathscr{B}$, let $d^*$ be the least distance \(d^*=\min _{a \in \mathbb{A}}\lVert f-a \rVert\) from $f$ to an element of $\mathscr{A}$. Then the error of the approximation $X(f)$ satisfies the bound \(\lVert f-X(f) \rVert \leqslant[1+\lVert X \rVert] d^*\) 在上面我们退而求其次,使用一个满足 projection 和 linearity 的 approximation operator 来进行逼近. 感觉这个逼近的结果和最优点有一段距离,在这里定理3.1 给出了一个上界: 肯定在最优附近的$[1+\lVert X \rVert]$范围内.
Polynomial approximations to differentiable functions
在后面我们会证明,对于任意一个函数$f \in \mathscr{C}[a, b]$,我们可以用多项式$p$进行逼近,并且可以让误差要多小有多小. \(\lVert f-p \rVert_{\infty} \leqslant \varepsilon\) 看起来这是一个很强,而且不错的结果.
Polynomial interpolation
所谓多项式插值,就是对多项式进行一个值的插()
比如有 $(n + 1)$ 组观测到的数据: $(x_i, y_i)$ for $i = 0,1,…,n$, 实际上背后的函数是$f$, 这个函数是什么函数都可以. 比如城市气温的函数, 我们想用多项式$p$进行逼近. 按照经验来看,比如我们有两个观测点,那我们能唯一确定一个 $p \in \mathscr P_1$, 如果有5个观测点,那用那我们能唯一确定一个 $p \in \mathscr P_4$.
关于$\mathscr P_n$, 我们知道$n$是多项式的最高次数. 比如 $p = 1+x$ 那么 $p \in \mathscr P _1$, $p = 1$ 那么 $p \in \mathscr P_0$
那多项式 $p = 0$ 是几次多项式呢?
可以考虑 $p = 0$ 的性质. 假设 $p \in \mathscr P_?$ 假设有个 $q \in \mathscr P_{4}$, $pq = 0$ 所以 $pq \in \mathscr P_{?}$.
所以感觉 $? = - \infty$ 蛮不错. 那就定义成 $p \in \mathscr P_{-\infty}$ .
希望在观测的地方,多项式的值就是实际函数的值. 那么这实际上就是一个解方程的问题: \(\left\lbrace\begin{array}{c} \sum c_i x_0^i=y_0 \\ \sum c_i x_1^i=y_1 \\ \vdots \\ \sum c_i x_n^i=y_n \end{array}\right.\) 想要知道这个方程组解的数量,实际上就是想知道 $\det(V)$, where $V$ is the Vandermonde matrix \(V=V\left(x_0, x_1, \cdots, x_m\right)=\left[\begin{array}{ccccc} 1 & x_0 & x_0^2 & \ldots & x_0^n \\ 1 & x_1 & x_1^2 & \ldots & x_1^n \\ 1 & x_2 & x_2^2 & \ldots & x_2^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_m & x_m^2 & \ldots & x_m^n \end{array}\right]\) The determinant of a square Vandermonde matrix is called a Vandermonde polynomial or Vandermonde determinant. Its value is the polynomial \(\operatorname{det}(V)=\prod_{0 \leq i<j \leq n}\left(x_j-x_i\right)\) 我们的假设是每个观测的点都不一样,所以 $\operatorname{det}(V) \neq 0$. 也就是有唯一解.
在引入 Lagrange interpolation 之前,其实这个思想和中国剩余定理很相似. 不妨先复习复习中国剩余定理.
有物不知其数,三三数之剩二,五五数之剩三,七七数之剩二。问物几何?
也就是求解同余方程组: \(\left\lbrace\begin{aligned} x & \equiv 2 \quad\left(\bmod 3\right) \\ x & \equiv 3 \quad\left(\bmod 5\right) \\ x & \equiv 2 \quad \left(\bmod 7\right) \end{aligned}\right.\) 我们想知道,解的
- existence
- uniqueness
- how to find
首先确定解的范围. 可以注意到,如果 $n_1$ 是一个解,那么 $n_2 = n_1 + 105$ 也是一个可行解. 那么只考虑 $R = [1, 105]$ 中的可行解就行了.
如果 $n_1,n_2 \in R$ 是两个不同的解, 那么$n_1 - n_2$ 一定被 3, 5, 7 整除,但是注意到 $-104 \leq n_1 - n_2 \leq 104$,这样的 $n_1,n_2$ 并不存在. 所以,如果有在 $R$ 中的解,这个解是唯一的.
关于存在性, 考虑问题集: \(\left\lbrace\begin{aligned} x & \equiv 0,1,2 \quad\left(\bmod 3\right) \\ x & \equiv 0,1,2,3,4 \quad\left(\bmod 5\right) \\ x & \equiv 0,1,2,3,4,5,6 \quad \left(\bmod 7\right) \end{aligned}\right.\) 意思是,在 $\mod{}$ 后面的数字确定的情况下,一共有$3\times 5 \times 7 = 105$ 个可能的问题. 可能的解有 $\lvert R \rvert = 105$ 个, 每个问题的解都是唯一的, 不同问题的解肯定不一样,所以每个问题都有唯一解.
最后考虑怎么找到解. 我们采取分治思想,把原问题分成互不相干的若干子问题,每个子问题都相对好求解. 最后把子问题的解合并起来就得到原问题的解.
我们首先求解每个子问题: \(\left\lbrace\begin{aligned} x & \equiv 1 \quad\left(\bmod 3\right) \\ x & \equiv 0 \quad\left(\bmod 5\right) \\ x & \equiv 0 \quad \left(\bmod 7\right) \end{aligned}\right. \quad \left\lbrace\begin{aligned} x & \equiv 0 \quad\left(\bmod 3\right) \\ x & \equiv 1 \quad\left(\bmod 5\right) \\ x & \equiv 0 \quad \left(\bmod 7\right) \end{aligned}\right. \quad \left\lbrace\begin{aligned} x & \equiv 0 \quad\left(\bmod 3\right) \\ x & \equiv 0 \quad\left(\bmod 5\right) \\ x & \equiv 1 \quad \left(\bmod 7\right) \end{aligned}\right.\) 通俗地讲就是求每一行式子的乘法逆元. 乘法逆元可以用 extended Euclidean algorithm 来求. 在这个问题中,上面三个子问题对应的解是 $70,21,15$. 这三个数字构成了问题集的解的线性基底,对于上述提到的 $105$ 个可能的问题,每个问题都是这个线性基底的线性组合. 比如在这里就是 $(2\cdot 70 + 3 \cdot 21 + 2\cdot 15) \mod 105$.
The Lagrange interpolation formula
回到这个问题本身. 类似于刚刚的思想, 我们希望找到一组多项式 $p_0, p_1, … ,p_n$ 一共 $n +1$ 个多项式,让每一个多项式 $p_k(x)$ 在「自己负责的」采样点 $x_k$ 处满足 $p(x_k) = 1$, 在其他地方 $p(x_i) = 0(i\neq k)$.
这样一来对于任意一个函数 $f$, 对于固定的采样点集合,我们都可以用上述的一组多项式基底来线性合成一个关于 $f$ 的多项式逼近.
For $k=0,1, \ldots, n$, let $l_k$ be the function \(l_k(x)=\prod_{\substack{j=0 \ j \neq k}}^n\left(x-x_j\right) /\left(x_k-x_j\right), \quad a \leqslant x \leqslant b .\) We note that $l_k \in \mathscr{P}n$ and that the equations \(l_k\left(x_i\right)=\delta_{k i}, \quad i=0,1, \ldots, n,\) hold, where $\delta{k i}$ has the value \(\delta_{k i}= \begin{cases}1, & k=i \ 0, & k \neq i\end{cases}\) Polynomial interpolation It follows that the function \(p=\sum_{k=0}^n f\left(x_k\right) l_k\)
\(p(x)=\sum_{k=0}^n f\left(x_k\right) l_k(x) .\) This method is called the Lagrange interpolation formula.
Properties
We can regrard the Lagrange interplation process as an operator from $\mathscr C [a,b]$ to $\mathscr P_n$. The operator is
- a projection
- linear
The error in polynomial interpolation
We use the notation $e$ for the error function of an approximation, and in this chapter it has the value \(e(x)=f(x)-p(x), \quad a \leqslant x \leqslant b\) where $p$ is the polynomial in $\mathscr{P}_n$ that satisfies the interpolation conditions (4.2). It should be clear that, if we change $f$ by adding to it an element of $\mathscr{P}_n$, then the interpolation process automatically adds the same element to $p$, which leaves $e$ unchanged. Expressions for the error should show this property. It is therefore appropriate, when $f \in \mathscr{C}^{(n+1)}[a, b]$, to state $e$ in terms of the derivative $f^{(n+1)}$, which is done in our next theorem. Theorem 4.2 For any set of distinct interpolation points $\left\lbrace x_i ; i=0,1, \ldots, n\right\rbrace$ in $[a, b]$ and for any $f \in \mathscr{C}^{(n+1)}[a, b]$, let $p$ be the element of $\mathscr{P}_n$ that satisfies the equations (4.2). Then, for any $x$ in $[a, b]$, the error (4.12) has the value \(e(x)=\frac{1}{(n+1) !} \prod_{j=0}^n\left(x-x_j\right) f^{(n+1)}(\xi)\) where $\xi$ is a point of $[a, b]$ that depends on $x$.
可以看到,误差的来源和
- 采样点的位置
- $f$ 本身
- 采样点的数量
有关.
Runge’s phenomenon
感觉上,采样点越多,误差应该越小才对. 可是事实上不是这样,如果我们只是简单均匀采样,多项式的误差并不会越来越小. \(f(x)=1 /\left(1+x^2\right), \quad-5 \leqslant x \leqslant 5\)
| n | $f\left(x_{n-\frac{1}{2}}\right)$ | $p\left(x_{n-\frac{1}{2}}\right)$ | $e\left(x_{n-\frac{1}{2}}\right)$ |
|---|---|---|---|
| 2 | 0.137931 | 0.759615 | -0.621684 |
| 4 | 0.066390 | -0.356826 | 0.423216 |
| 6 | 0.054463 | 0.607879 | -0.553416 |
| 8 | 0.049651 | -0.831017 | 0.880668 |
| 10 | 0.047059 | 1.578721 | -1.531662 |
| 12 | 0.045440 | -2.755000 | 2.800440 |
| 14 | 0.044334 | 5.332743 | -5.288409 |
| 16 | 0.043530 | -10.173867 | 10.217397 |
| 18 | 0.042920 | 20.123671 | -20.080751 |
| 20 | 0.042440 | -39.952449 | 39.994889 |
误差在变大?
![]()
图源维基百科. 和数据无关.
The Chebyshev interpolation points
用这个方法解决采样位置不好导致的误差变大问题.
Theorem 4.3 The norm of the Lagrange interpolation operator has the value \(\lVert \boldsymbol{X} \rVert=\max _{a \leqslant x \leqslant b} \sum_{k=0}^n\left\lvert l_k(x)\right\rvert\) where the functions $\left\lbrace l_k ; k=0,1, \ldots, n\right\rbrace$ are defined by equation (4.3). Proof. The definition of a norm and equation (4.6) give the identity \(\begin{aligned} \lVert \boldsymbol{X} \rVert & =\sup _{\lVert f \rVert \leqslant 1}\lVert \boldsymbol{X}(f) \rVert \\ & =\sup _{\lVert f \rVert \leqslant 1} \max _{a \leqslant x \leqslant b}\left\lvert \sum_{k=0}^n f\left(x_k\right) l_k(x)\right\rvert \\ & =\max _{a \leqslant x \leqslant b} \sup _{\lVert f \rVert \leqslant 1} \left\lvert \sum_{k=0}^n f\left(x_k\right) l_k(x)\right\rvert \\ & =\max _{a \leqslant x \leqslant b} \sum_{k=0}^n\left\lvert l_k(x)\right\rvert \end{aligned}\) which is the required result.
解释解释每一个等号都是为什么:
The first quality: Consider the definition of norm of approximation operator: The norm of $X$ is written as $\lVert X \rVert$, and it is the smallest real number such that the inequality $\lVert \boldsymbol{X}(f) \rVert \leqslant\lVert \boldsymbol{X} \rVert\lVert f \rVert$ holds. Here, if $\lVert f \rVert$ is not equal to one, then suppose $\lVert f \rVert = a \neq 0$, Then
Left hand side: \(\begin{aligned} \lVert \boldsymbol{X}((1/a)f) \rVert &= \lVert (1/a)\boldsymbol{X}(f) \rVert \quad \text{the approximation operator of Lagrange interpolation is linear}\\ &= \lvert (1/a) \rvert\cdot \lVert \boldsymbol{X}(f) \rVert \quad \text{absolute homogeneity of norm} \end{aligned}\)
Right hand side: \(\lVert \boldsymbol{X} \rVert\lVert (1/a)f \rVert = \lvert (1/a) \rvert\cdot \lVert \boldsymbol{X} \rVert\lVert f \rVert\quad \text{absolute homogeneity of norm}\)
That is, $\lvert (1/a) \rvert$ appears on both the LHS and RHS of $\lVert \boldsymbol{X}(f) \rVert \leqslant\lVert \boldsymbol{X} \rVert\lVert f \rVert$ when $\lVert f \rVert \neq 1$, it can thus be cancelled out.
So \(\begin{aligned} \lVert \boldsymbol{X} \rVert &= \sup _{ \lVert f \rVert \neq 0} \frac{\lVert \boldsymbol{X}(f) \rVert }{\lVert f \rVert} \\ &= \sup _{\lVert f \rVert = 1} \frac{\lVert \boldsymbol{X}(f) \rVert }{\lVert f \rVert} \\ &= \sup _{\lVert f \rVert = 1} \lVert \boldsymbol{X}(f) \rVert \\ &= \sup _{\lVert f \rVert \leq 1} \lVert \boldsymbol{X}(f) \rVert \end{aligned}\) The second equality: this is by the definition of infinity norm.
The third equality:
Theorem (supremum of supremum is interchangeable) \(\sup _{a \in A} \sup _{b \in B} F(a, b)=\sup _{(a, b) \in A \times B} F(a, b)=\sup _{b \in B} \sup _{a \in A} F(a, b)\) Let us prove the first equality and the second follows symmetrically. Let $S_1=\sup _{a \in A} \sup _{b \in B} F(a, b)$ and $S_2=\sup _{(a, b) \in A \times B} F(a, b)$. Suppose that $S_1<S_2$. Then, since $S_2$ is the least upper bound of the set $\lbrace F(a, b):(a, b) \in A \times B\rbrace$, there exist $\left(a_0, b_0\right) \in A \times B$ with $S_1<F(a, b)$. But then \(F\left(a_0, b_0\right) \leq \sup _{b \in B} F\left(a_0, b\right) \leq \sup _{a \in A} \sup _{b \in B} F(a, b)=S_1\) a contradiction. Next, suppose that $S_2<S_1$. Then, because $S_1$ is the least upper bound of $\left\lbrace\sup _{b \in B} F(a, b): a \in A\right\rbrace$, it follows that there is some $a_0 \in A$ such that $S_2<\sup _{b \in B} F\left(a_0, b\right)$. Similarly, there now exists $b_0 \in B$ such that $S_2<F\left(a_0, b_0\right)$, a contradiction. $\square$
Here, $\left\lvert \sum_{k=0}^n f\left(x_k\right) l_k(x)\right\rvert$ is continuous and is defined on a closed interval, so $\max {a \leqslant x \leqslant b}\left\lvert \sum{k=0}^n f\left(x_k\right) l_k(x)\right\rvert = \sup {a \leqslant x \leqslant b}\left\lvert \sum{k=0}^n f\left(x_k\right) l_k(x)\right\rvert$.
Then \(\begin{aligned} &\sup _{\lVert f \rVert \leqslant 1} \max _{a \leqslant x \leqslant b}\left\lvert \sum_{k=0}^n f\left(x_k\right) l_k(x)\right\rvert \\ =&\sup _{\lVert f \rVert \leqslant 1} \sup _{a \leqslant x \leqslant b}\left\lvert \sum_{k=0}^n f\left(x_k\right) l_k(x)\right\rvert \\ =&\sup _{a \leqslant x \leqslant b} \sup _{\lVert f \rVert \leqslant 1} \left\lvert \sum_{k=0}^n f\left(x_k\right) l_k(x)\right\rvert \\ =&\max _{a \leqslant x \leqslant b} \sup _{\lVert f \rVert \leqslant 1} \left\lvert \sum_{k=0}^n f\left(x_k\right) l_k(x)\right\rvert \\ \end{aligned}\)
The fourth equality: when $f(x_k) = 1$ for all $k$, $\left\lvert \sum_{k=0}^n f\left(x_k\right) l_k(x)\right\rvert $ reaches its maximum. Since $\lVert f \rVert \leqslant 1$ is a closed constraint, this is also its supremum.
Recall that
Theorem 3.1 Let $\mathscr{A}$ be a finite-dimensional linear subspace of a normed linear space $\mathscr{B}$, and let $X$ be a linear operator from $\mathscr{B}$ to $\mathscr{A}$ that satisfies the projection condition (3.2). For any $f$ in $\mathscr{B}$, let $d^*$ be the least distance
\(d^*=\min _{a \in \mathbb{A}}\lVert f-a \rVert\) from $f$ to an element of $\mathscr{A}$. Then the error of the approximation $X(f)$ satisfies the bound
\[\lVert f-\boldsymbol{X}(f) \rVert \leqslant [1+\lVert \boldsymbol{X} \rVert] d^*\]
我们想知道 $\lVert \boldsymbol X \rVert$ 好不好,至少想知道他至少是多少. 上面他做了一些实验,我们可以用观测到的数据给出 $\lVert \boldsymbol X \rVert$ 的一个下界. \(\begin{aligned} \lVert \boldsymbol{X} \rVert &\geq \frac{\lVert f-\boldsymbol{X}(f) \rVert }{d^*} - 1 \\ & \geq \frac{\max \left\lvert f(x_i)-\boldsymbol{X}(f(x_i))\right\rvert}{d^{*+}} \\ & = (39.994889 / 0.015507)-1 =10 986.71 \end{aligned}\) 还是蛮大的,毕竟 $\boldsymbol X$ 只说了是从什么东西映射到了什么东西,没说采样的时候是怎样采样,所以可以相当大. 所以我们在采样的时候要选择比较安全的方式,比如使用 Chebyshev interpolation points.
但是要注意,Chebyshev interpolation points 并不是万金油,尽管使用了这种方法,误差也相当大的情况依然存在.