# Digest of Essence of Linear Algebra

Yao Yao on November 17, 2016
$\newcommand{\icol}{ \bigl[ \begin{smallmatrix} #1 \end{smallmatrix} \bigr] } \newcommand{\irow}{ \begin{smallmatrix}(#1)\end{smallmatrix} }$

Greatest thanks to 3Blue1Brown!

## Chapter 1 - Vectors, what even are they?

The introduction of numbers as coordinates (by reference to the the particular division scheme of the open one-dimensional continuum) is an act of violence. ―Hermann Weyl

Mathematicians’ perspective:

• A vector can be anything where there’s a sensible notion of adding two vectors and multiplying a vector by a number.
• I.e. 只要实现了 “可以两两相加” 和 “可以与一个数相乘” 的都可以叫 vector
• Linear algebra revolves around vector addition and scalar multiplication. 线性代数是紧紧围绕着这两个基本概念的。

• An arrow inside a coordinate system, with it’s tail sitting at the origin (原点) (Sorry, Hermann Weyl)
• 虽然理论上，vectors can freely sit anywhere in the space.
• A vector $\vec v = \icol{a \newline b}$ gives instructions for how to get from its tail to its tip.
• First walk along 1st axis for $a$ units;
• then walk along 2nd axis for $b$ units.

• First walk along 1st axis for $a$ units;
• then walk along 2nd axis for $b$ units;
• then walk along 1st axis for $c$ units;
• then walk along 2nd axis for $d$ units.
• 或者
• First walk along 1st axis for $a+c$ units;
• then walk along 2nd axis for $b+d$ units.
• 简单的实数加法也可以这么理解，比如 $2+5$：
• First walk (along the only axis) for $2$ units;
• then walk for $5$ units;
• you would walk $7$ units in total.

### 1.2 Multiplying a vector by a number

• $2 \cdot \vec v$ means you stretch out $\vec v$ so that it’s $2$ times as long as before.
• First walk along 1st axis for $2a$ units;
• then walk along 2nd axis for $2b$ units.
• $-2 \cdot \vec v$ means first flip $\vec v$ then stretch it out by factor of $2$.
• $\frac{1}{3} \cdot \vec v$ means you squash it down so that it’s $\frac{1}{3}$ of the original length.

This process of 1) stretching or 2) squashing or 3) sometimes reversing the direction of a vector is called scaling. The number you used to scale the vector is called a scalar. (翻译为 “标量”. Never mind.) In fact, throughout linear algebra, one of main things that numbers do is to scale vectors.

## Chapter 2 - Linear combinations, span, and basis vectors

Mathematics requires a small dose, not of genius, but of imaginative freedom which, in a larger dose, would be insanity. ―Angus K. Rodgers

Unit vector: a vector that has a magnitude of one.

• Unit vector in $x$-direction $\hat i = \icol{1 \newline 0}$
• Unit vector in $y$-direction $\hat j = \icol{0 \newline 1}$

Think about each coordinate in $\icol{a \newline b}$ as a scalar. Then $\vec v = \icol{a \newline b} = a \hat i + b \hat j$. In this sense, $\icol{a \newline b}$ describes $\vec v$ as the sum of two scaled vectors. This is a surprisingly importance concept in linear algebra.

• Together, $\hat i$ and $\hat j$ are called the basis vectors of a $xy$ coordinate system. (坐标系的基向量)
• When you think about coordinates in $\vec v$ as scalars, the basis vectors are what those scalars scale to represent $\vec v$

What if we chose some arbitrary vectors as basis vectors? We could have done so and gotten a completely reasonable, new coordinate system.

• 亦即，如果有 $\vec v = a \vec p + b \vec q$，我们可以想象有一个 “$pq$ coordinate system”，它以 $\vec p$ 和 $\vec q$ 为 basis vectors，那么 $\vec v = a \vec p + b \vec q$ 就相当于是这个 $pq$ coordinate system 的一个 arrow 指向 $(a,b)$ 这个点。

Suppose $\vec p$ and $\vec q$ are 2-D vectors. Choose any possible $a$ and $b$ for $\vec v = a \vec p + b \vec q$. How many $\vec v$ can you get? The answer is that you can get every possible 2-D vectors. This means that a pair of 2-D basis vectors gives us a valid way to go back and forth between pairs of numbers and 2-D vectors.

• Rule of thumb: any time we describe vectors numerically, it depends on an implicit choice of what basis vectors we’re using.

Any time you scale two vectors and adding them like this, it’s called a linear combination of those two vectors.

• You can see “linear” this way: if you fix one scalar and let the other one change its value freely, the tips of the resulting vectors would draw a straight line.
• When your original vectors happen to line up (共线)，the tip of the resulting vector is limited to exactly the same line.
• If both vectors are zero, you’d just be stuck at the origin.

The span of $\vec p$ and $\vec q$ is the set of all possible vectors you can reach with a linear combination of them, or put simply is the set of all their linear combinations. So restating what we just discussed,

• The span of most 2-D vectors is all vectors in this 2-D space.
• When they line up, their span is all vectors whose tips sit on that line.
• The span of two vectors is basically a way of asking what are all the possible vector you can reach using only the 2 fundamental operations of vectors–vector addition and scalar multiplication.

### Vectors vs Points

In general, if you’re thinking about a vector on its own, think of it as an arrow. And if you’re dealing with a collections of vectors, it’s convenient to think of them all as points.

• The span of most 2-D vectors ends up being the entire infinite sheet of the 2-D space.
• When they line up (共线), their span is just that line.

In 3-D cases:

• Take 2 vectors in 3-D space that point in different directions. Their span is a flat sheet cutting through the origin of the 3-D space.
• Take 3 vectors:
• If the 3rd vector happens to be sitting on the span of the first two, their span would still be the flat sheet.
• In other words, adding a scaled 3rd vector to the linear combination doesn’t really give you access to any new vectors.
• Otherwise the span would be the whole of the 3-D space.
• You can imagine this way: when you scale the 3rd vector, it moves the span of the first 2 vectors along its direction, sweeping the flat sheet through all the 3-D space.

In the case where the 3rd vector was already sitting on the span of the first two, or the case where two vectors happen to line up, at least one of these vectors is redundant, not adding anything to our span. Whenever this happens, you could remove one vector without reducing then span. We say that these vectors are linearly dependent.

• Another way of phrasing that would be to say that one of the vectors can be expressed as a linear combinations of the others, since it’s already in the span of the others.
$\vec u \text{ is linearly dependent to } \vec v, \vec w \iff \exists a, b \text{ such that } \vec u = a \vec v + b \vec w$

On the other hand, if each vector really add another dimension to the span, they are said to be linearly independent.

$\vec u \text{ is linearly independent to } \vec v, \vec w \iff \forall a \text{ and } b, \vec u \neq a \vec v + b \vec w$

Technical definition of basis:

• The basis of a vector space is a set of linearly independent vectors that span this full space.
• $\hat i$ and $\hat j$ are called the basis vectors
• 它们俩合起来称作（被它们 span 构成的）2-D space 的 basis

## Chapter 3 - Linear transformations and matrices

Unfortunately, no one can be told what The Matrix is. You have to see it for yourself. ―Morpheus

Transformation is essentially a fancy word for “function”. 把 $xy$ 坐标画成网格，线性变换会旋转、拉伸、翻转这个网格（比如从正方形的格子变成菱形的）

Visually speaking, a transformation is linear if it has two properties:

1. all lines (平面内的所有 lines，不单单只是网格线) must remain lines, without getting curved;
2. and the origin must remain fixed in place

(满足 1 但是不满足 2 的我们称为 Affine Transformation，仿射变换)

In general, you should think of linear transformations as “keeping grid lines parallel and evenly spaced” (保持网格线平行并等距分布).

How would you describe one of these numerically? What formula did you give to the comupter so that if you give it the coordinates of a vector, it can give you the coordinates of where that vector lands (in the transformed space)? It turns out that you only need to record where the two basis vectors, $\hat i$ and $\hat j$, each land.

E.g. $\vec v = x \hat i + y \hat j \Rightarrow \text{Transformed } \vec v = x (\text{Transformed } \hat i) + y (\text{Transformed } \hat j)$

$\vec v = \icol{x \newline y} = x \icol{1 \newline 0} + y \icol{0 \newline 1} \Rightarrow \text{Transformed } \vec v = x \icol{i_1 \newline i_2} + y \icol{j_1 \newline j_2} = \icol{i_1x + j_1y \newline i_2x + j_2y}$

So a two dimensional linear transformation is completely described by just 4 numbers: $\icol{i_1 \newline i_2}$ and $\icol{j_1 \newline j_2}$. It’s common to package them into a $2\times2$ grid, $\icol{i_1 & j_1 \newline j_1 & j_2}$, called a $2\times2$ matrix, where you can interpret the columns as the two special vectors where $\hat i$ and $\hat j$ each land.

Matrix-vector multiplication is just a way to compute what the linear transformation does to a given vector. 矩阵向量乘法就是计算线性变换作用于给定向量的一种途径。

If $\text{Transformed } \hat i$ and $\text{Transformed } \hat j$ are linear dependent, it means the linear transformation squashes all of 2-D space on the line where those two vectors sit, also known as the 1-D span of those two linearly dependent vectors.

## Chapter 4 - Matrix multiplication as composition (of linear transformations)

To describe the effects of applying one linear transformation and then another.

$M_2 (M_1 \vec v) = (M_2 M_1) \vec v$

## Chapter 5 - The determinant (行列式)

The purpose of computation is insight, not numbers. ―Richard Hamming

Among those linear transformations, some of them seemed to stretch space out, while others squash it in. One thing that turns out to be pretty useful for understanding one of these transformations is to measure exactly how much it stretches or squashes things. More specificantly, to measure the factor by which the area of a given region increases or decreases.

This very special scaling factor is called the determinant of that transformation.

If the determinant of a 2-D transformation is 0, it squashes all of space onto a line, or even a single point. Since then, the area of any region would become 0.

However, in fact, a determinant can be negative. How could you scale an area by a negative amount? This has to do with the idea of orientation. Any transformation that turn over the area (想象把一张纸从正面翻到背面) is said to invert the orientation of space.

Another way to think about it is to consider $\hat i$ and $\hat j$. Notice that in their starting positions, $\hat j$ is to the left of $\hat i$ ($\hat j$ 在沿 $\hat i$ 剪开的左半边). If after transformation, $\hat j$ is on the right of $\hat i$, the orientation of space has been inverted.

Whenever the orientation of space is inverted, the determinant will be negative, but its absolute value still tells you the factor by which the area have been scaled.

If the determinant of a matrix is 0, the basis vectors inside this matrix are linearly depenedent. $\operatorname{det}(M_2M_1) = \operatorname{det}(M_2) \operatorname{det}(M_1)$

## Chapter 6 - Inverse matrices, column space, rank and null space (kernel)

To ask the right question is harder than to answer it. ―Georg Cantor

The main reasons that linear algebra is more broadly applicable and required for just about any technical discipline is that it let us solve certain systems of equations. By “systems of equations”, I mean you have a list of variables and a list of equations relating them.

To solve $A \vec x = \vec v$ is equivalent to, given linear transformation $A$, look for a vector $\vec x$, which lands on $\vec v$ after applying the transformation.

Let’s start with $\operatorname{det}(A) \neq 0$.

Applying the transformation on $\vec x$ in reverse is equivalent to applying the inverse of $A$ onto $\vec v$.

$\vec x = A^{-1} \vec v$ $A^{-1}A = I$

But when $\operatorname{det}(A) = 0$, there is no inverse of $A$ (i.e. $A^{-1}$ does not exist). You cannot “unsquash” a line into a plane. At least that’s not something a function can do because that would require transforming each individual vector on the line into a bunch of vectors on that plane but a function can only take a single input to a single output.

However, solution can still exist when $\operatorname{det}(A) = 0$. E.g. if your transformation $A$ squashes the space into a line and you’re lucky that $\vec v$ lives on that line.

You might notice that some of those zero determinant cases feel a lot more restrictive than others (有的是降二维有的只降一维). When the output of a transformation is a line, i.e. 1-D, we say the transformation has a rank of 1. So the rank of a matrix is the number of dimensions in the output of the transformation.

The set of all possible outputs from a matrix, is called the column space of the matrix. In other words, the column space is the span of the columns of the matrix. So a more precise definition of rank would be that it’s the number of dimensions in the column space.

When its rank is equal to the number of columns, we call a matrix full rank.

Note that $\icol{0 \newline 0}$ is always in the column space since linear transformations must keep the origin fixed in place. For a full rank transformation $A$, the only vector $\vec x$ that lands at the origin is the zero vector itself, but for a matrix $A’$ that are not full rank, which squash to a smaller dimension, you can have a whole bunch of vectors $\lbrace \vec{x’} \rbrace$ that land at the origin (比如 2-D 压缩到 1-D 就会有一条直线上的所有 vectors 被压缩到原点；3-D 压缩到 1-D 是有一整个平面上的 vectors 被压缩到原点). This set of vectors $\lbrace \vec{x’} \rbrace$ that lands on the origin is called the null space or the kernel of matrix $A’$. In terms of the linear system of equations, when $\vec v$ happens to be the zero vector, the null space of $A$ gives you all the possible solutions to $A \vec x = \vec v$.

## Chapter 6 Supplement - Nonsquare matrices as transformations between dimensions

A $3\times2$ matrix $A=\icol{a & b \newline c & d \newline e & f}$ would transform a 2-D vector $\vec x=\icol{x_1 \newline x_2}$ in 2-D space into a 3-D vector in 3-D space. 原来 2-D space 里的两个 2-D basis, $\icol{1 \newline 0}$ 和 $\icol{0 \newline 1}$, 现在变成了两个 3-D basis, $\icol{a \newline c \newline e}$ 和 $\icol{b \newline d \newline f}$。但是正常 3-D space 应该有 3 个 3-D basis vector，这说明 $A$ 的 column space 只可能是这个 3-D space 中的一个 plane。

A $2\times3$ matrix $A=\icol{a & b & c \newline d & e & f}$ would transform a 3-D vector $\vec x=\icol{x_1 \newline x_2 \newline x_3}$ in 3-D space into a 2-D vector in 2-D space. 原来 3-D space 里的三个 3-D basis, $\icol{1 \newline 0 \newline 0}$、$\icol{0 \newline 1 \newline 0}$ 和 $\icol{0 \newline 0 \newline 1}$, 现在变成了三个 2-D basis, $\icol{a \newline d}$、$\icol{b \newline e}$ 和 $\icol{c \newline f}$。虽然 2-D space 不需要 3 个 basis，但是不妨碍你用 3 个 basis 来 span 这个 2-D space。

## Chapter 7 - Dot products and duality

Let $\vec v = \icol{a \newline b}$, $\vec w = \icol{c \newline d}$ and $\theta$ be the angle between them.

\begin{aligned} \vec v \cdot \vec w &= \icol{a \newline b} \cdot \icol{c \newline d} = ac + bd \newline &= (\vec v)^{T} \vec w = (\vec w)^{T} \vec v \newline &= \vert \vec v \vert \times \vert \vec w \vert \times \cos \theta \newline \vert \vec v \cdot \vec w \vert &= (\text{length of projected } \vec v) \times (\text{length of } \vec w) \newline &= (\text{length of projected } \vec w) \times (\text{length of } \vec v) \end{aligned}

$(\vec v)^{T}$ 是这么一个 transformation：将 2-D space squash 到 1-D 数轴，basis $\icol{1 \newline 0}$ 压到数轴上 $a$ 这个点，basis $\icol{0 \newline 1}$ 压到数轴上 $b$ 这个点，$(\vec v)^{T} \vec w$ 就表示 $\vec w$ 会被压到数轴上 $ac + bd$ 这个点。  ## Chapter 8 - Cross products

### 8.1 Standard introduction

#### 非标准定义

$\vert \vec v \times \vec w \vert = \text{Area of parallelogram}$ (以 $\vec v$ 和 $\vec w$ 为两条边的平行四边形的面积). 当 $\vec v$ 在沿 $\vec w$ 切开的左侧时（此时 $\vec w$ 在沿 $\vec v$ 切开的右侧），符号为正；否则符号为负。

$\vec v \times \vec w = -\vec w \times \vec v$

Suppose $\vec v = \icol{a \newline b}$ and $\vec w = \icol{c \newline d}$

$\vec v \times \vec w = \operatorname{det} \big (\icol{a & c \newline b & d} \big )$ (叉乘等于向量合并成的矩阵的 determinant). 令这个合并而成的矩阵为 $A$。

$\because A \hat i = \icol{a & c \newline b & d} \icol{1 \newline 0} = \icol{a \newline b} = \vec v$

$\,\,\,\, A \hat j = \icol{a & c \newline b & d} \icol{0 \newline 1} = \icol{c \newline d} = \vec w$

$\,\,\,\, \text{and } \hat i \times \hat j = 1$

$\therefore \vec v \times \vec w = \operatorname{det}(A) \times (\hat i \times \hat j) = \operatorname{det}(A)$

#### 标准定义

The cross product is not a number, but a vector. $\vec v \times \vec w = \vec p$. The resulted-in vector $\vec p$’s length will be the area of the $vw$ parallelogram, and its direction will be perpendicular (垂直) to that parallelogram. But there are two directions perpendicular to that parallelogram. Here we need the “Right Hand Rule” (右手定则): 右手食指顺着 $\vec v$ 方向，中指顺着 $\vec w$ 方向，大拇指即为 $\vec p$ 的方向。

### Digress: dot product 与 cross product 的几何意义

• 令 $p = \vec v \cdot \vec w$，则 $p$ 值的大小代表 $\vec v$ 和 $\vec w$ 共线的程度
• 令 $\vec p = \vec v \times \vec w$，则 $p$ 的长度代表 $\vec v$ 和 $\vec w$ 垂直的程度

### 8.2 Deeper understanding with linear transformations

Our plan:

1. Define a 3D-to-1D linear transformation in terms of $\vec v$ and $\vec w$
2. Find its dual vector
3. Show that this dual vector is $\vec v \times \vec w$

Question: what 3-D vector $\vec p$ has the special property that when you take a dot product between $\vec p$ and some other vector $\icol{x \newline y \newline z}$, it gives the same result as if you took the same signed volumn of a parallelepiped (平行六面体) defined by this $\icol{x \newline y \newline z}$ along with $\vec v$ and $\vec w$

Answer: $\vec p = \vec v \times \vec w$

## Chapter 9 - Change of basis (基变换)

Mathematics is the art of giving the same name to different things. ―Henri Poincaré

A space has no grid. 所有的坐标系都是我们人为加上去的。如果一个 vector 用 basis $\hat i$ 和 $\hat j$ 表示为 $\icol{x \newline y}$，那么它在另外一组 basis $\hat{i’}$ 和 $\hat{j’}$ 应该如何表示？ (相当于从一个坐标系”翻译”到另外一个坐标系；坐标系的 origin 重合)

### How to translate a matrix?

$\therefore M’ = \icol{a & c \newline b & d}^{-1} M \icol{a & c \newline b & d}$

In general, whenever you see an expression like $A^{-1}MA$, it suggests a mathematical sort of empathy (同理心、换位思考、共情). The middle $M$ represents a transformation of some kind as you see it, and the outer two matrices represent the emapathy, the shift in perspective. And the full matrix product represents that same transformation, but as someone else sees it.

## Chapter 10 - Eigenvectors and eigenvalues

All these special vectors that remain on their spans after a lienar transformation are called the eigenvectors of the transformation. Each eigenvector has associated with it, what’s called an eigenvalue, which is just the factor by which it stretched or squashed during the transformation.

With any linear transformation described by a matrix, you could understand what it’s doing by reading off the columns of this matrix as the landing spots for basis vectors. But often a better way to get at the heart of what the linear transformation actually does, less dependent on your particular coordinate system, is to find the eigenvectors and eigenvalues.

$A \vec v = \lambda \vec v$

(现在这个公式看起来多好懂！$\lambda$ 是 eigenvalue，$\vec v$ 是 eigenvector)

\begin{aligned} A \vec v = \lambda I \vec v \newline (A - \lambda I) \vec v = \vec 0 \end{aligned}

If $\vec v = \vec 0$, it always stands. 如果 $\vec v \neq \vec 0$，我们可以看出 $(A - \lambda I)$ 是一个降维的 transformation，直接把 $\vec v$ squash 到原点，所以我们有 $\operatorname{det}(A-\lambda I) = 0$

A 2-D transformation doesn’t have to have (real) eigenvectors. 即 $\operatorname{det}(A-\lambda I) = 0$ 可能没有实数解。(但 complex eigenvector 和 complex eigenvalue 是存在的，这里不讨论)

What if both basis vectors are eigenvectors? 如果 $A$ 是一个 diagonal matrix，说明所有的 basis vector 都是 $A$ 的 eigenvector，对角线的值都是 eigenvalue。（diagonal matrix 相当于是 n 个 scalar，分别乘以 n 个 basis vector，即拉伸了 n 个 basis vector、并没有旋转）。所以 diagonal matrix 有这么一个特点：$A(A \vec v) = A^2 \vec v, A(A^2 \vec v) = A^3 \vec v, \dots$。

(如果没有足够的 eigenvector 能 span the full space，你就拼不出足够的 column 到 $\icol{a & c \newline b & d}$ 去给 $A$ 做 basis change)

## Chapter 11 - Abstract vector spaces

### Linear transformation for functions

Formal definition of linearity:

1. Additivity: $L(\vec v + \vec w) = L(\vec v) + L(\vec w)$
2. Scaling: $L(c \vec v) = c L(\vec v)$

“keeping grid lines parallel and evenly spaced” is really just an illustration of what these 2 properties mean in the specific case of points in 2-D space.

Another example: $\frac{d}{dx}$ operator is linear. $\frac{d}{dx}(f(x) + g(x)) = \frac{d}{dx}(f(x)) + \frac{d}{dx}(g(x)), \frac{d}{dx}(c f(x)) = c \frac{d}{dx}(f(x))$

To really drill in the parallele, let’s describe the derivative with a matrix. Let’s limit ourselves on polynomials (i.e. we have a space of all polynomials).

Define basis functions $b_n(x) = x^{n}$. Their roles are similar to $\hat i$, $\hat j$ and $\hat k$ in a 3-D space. Polynomial $a x^2 + bx + c$ can be rewritten as $\icol{c \newline b \newline a \newline \vdots}$. In this way, we can define

$\frac{d}{dx} = \icol{0 & 1 & 0 & 0 & \cdots \newline 0 & 0 & 2 & 0 & \cdots \newline 0 & 0 & 0 & 3 & \cdots \newline 0 & 0 & 0 & 0 & \cdots \newline \vdots & \vdots & \vdots & \vdots & \ddots }$

Linear algebra concepts Alternative names when applied to functions
Linear transformations Linear operators
Dot products Inner products
Cross products N/A
Eigenvectors Eigenfunctions

### What is a vector?

As long as you’re dealing with a set of objects where there’s a reasonable notion of scaling and adding, whether that’s a set of arrows in apce, list of numbers, functions or whatever you choose to define, all the tools developed in linear algebra regarding vectors, linear transformations and all that stuff, should be able to apply.

Thses sets of vetor-ish things (arrows, lists of numbers, functions, etc) are called vector spaces. These rules are called axioms. If a space statisfies these 8 axioms, it is a vector space. Axioms are not rules of nature, but an interface (consider Java Interface!)

The form that your vectors take doesn’t really matter as long as there is some notion of adding and scaling vectors that follow these axioms.

Abstractness is the price of generality.