Linear Algebra Review and Reference下载_在线阅读_26

is_640330

暂无简介

Linear Algebra Review and Reference Linear Algebra Review and Reference Zico Kolter (updated by Chuong Do) October 7, 2008 Contents 1 Basic Concepts and Notation 2 1.1 Basic Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 2 Matrix Multiplication 3 2.1 Vector-Vect...

Linear Algebra Review and Reference Zico Kolter (updated by Chuong Do) October 7, 2008 Contents 1 Basic Concepts and Notation 2 1.1 Basic Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 2 Matrix Multiplication 3 2.1 Vector-Vector Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2 Matrix-Vector Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.3 Matrix-Matrix Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 3 Operations and Properties 7 3.1 The Identity Matrix and Diagonal Matrices . . . . . . . . . . . . . . . . . . 8 3.2 The Transpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 3.3 Symmetric Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 3.4 The Trace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.5 Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 3.6 Linear Independence and Rank . . . . . . . . . . . . . . . . . . . . . . . . . 11 3.7 The Inverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 3.8 Orthogonal Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 3.9 Range and Nullspace of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . 12 3.10 The Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 3.11 Quadratic Forms and Positive Semidefinite Matrices . . . . . . . . . . . . . . 17 3.12 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . 18 3.13 Eigenvalues and Eigenvectors of Symmetric Matrices . . . . . . . . . . . . . 19 4 Matrix Calculus 20 4.1 The Gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 4.2 The Hessian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 4.3 Gradients and Hessians of Quadratic and Linear Functions . . . . . . . . . . 23 4.4 Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 4.5 Gradients of the Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . 25 4.6 Eigenvalues as Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 1 1 Basic Concepts and Notation Linear algebra provides a way of compactly representing and operating on sets of linear equations. For example, consider the following system of equations: 4x1 − 5x2 = −13 −2x1 + 3x2 = 9. This is two equations and two variables, so as you know from high school algebra, you can find a unique solution for x1 and x2 (unless the equations are somehow degenerate, for example if the second equation is simply a multiple of the first, but in the case above there is in fact a unique solution). In matrix notation, we can write the system more compactly as Ax = b with A = [ 4 −5 −2 3 ] , b = [ −13 9 ] . As we will see shortly, there are many advantages (including the obvious space savings) to analyzing linear equations in this form. 1.1 Basic Notation We use the following notation: • By A ∈ Rm×n we denote a matrix with m rows and n columns, where the entries of A are real numbers. • By x ∈ Rn, we denote a vector with n entries. By convention, an n-dimensional vector is often thought of as a matrix with n rows and 1 column, known as a column vector . If we want to explicitly represent a row vector — a matrix with 1 row and n columns — we typically write xT (here xT denotes the transpose of x, which we will define shortly). • The ith element of a vector x is denoted xi: x =   x1 x2 ... xn   . 2 Bin Wang Highlight • We use the notation aij (or Aij, Ai,j, etc) to denote the entry of A in the ith row and jth column: A =   a11 a12 · · · a1n a21 a22 · · · a2n ... ... . . . ... am1 am2 · · · amn   . • We denote the jth column of A by aj or A:,j: A =   | | |a1 a2 · · · an | | |   . • We denote the ith row of A by aTi or Ai,:: A =   — aT1 — — aT2 — ... — aTm —   . • Note that these definitions are ambiguous (for example, the a1 and a T 1 in the previous two definitions are not the same vector). Usually the meaning of the notation should be obvious from its use. 2 Matrix Multiplication The product of two matrices A ∈ Rm×n and B ∈ Rn×p is the matrix C = AB ∈ Rm×p, where Cij = n∑ k=1 AikBkj. Note that in order for the matrix product to exist, the number of columns in A must equal the number of rows in B. There are many ways of looking at matrix multiplication, and we’ll start by examining a few special cases. 3 2.1 Vector-Vector Products Given two vectors x, y ∈ Rn, the quantity xTy, sometimes called the inner product or dot product of the vectors, is a real number given by xTy ∈ R = [ x1 x2 · · · xn ]   y1 x2 ... yn   = n∑ i=1 xiyi. Observe that inner products are really just special case of matrix multiplication. Note that it is always the case that xTy = yTx. Given vectors x ∈ Rm, y ∈ Rn (not necessarily of the same size), xyT ∈ Rm×n is called the outer product of the vectors. It is a matrix whose entries are given by (xyT )ij = xiyj, i.e., xyT ∈ Rm×n =   x1 x2 ... xm   [ y1 y2 · · · yn ] =   x1y1 x1y2 · · · x1yn x2y1 x2y2 · · · x2yn ... ... . . . ... xmy1 xmy2 · · · xmyn   . As an example of how the outer product can be useful, let 1 ∈ Rn denote an n-dimensional vector whose entries are all equal to 1. Furthermore, consider the matrix A ∈ Rm×n whose columns are all equal to some vector x ∈ Rm. Using outer products, we can represent A compactly as, A =   | | |x x · · · x | | |   =   x1 x1 · · · x1 x2 x2 · · · x2 ... ... . . . ... xm xm · · · xm   =   x1 x2 ... xm   [ 1 1 · · · 1 ] = x1T . 2.2 Matrix-Vector Products Given a matrix A ∈ Rm×n and a vector x ∈ Rn, their product is a vector y = Ax ∈ Rm. There are a couple ways of looking at matrix-vector multiplication, and we will look at each of them in turn. If we write A by rows, then we can express Ax as, y = Ax =   — aT1 — — aT2 — ... — aTm —  x =   aT1 x aT2 x ... aTmx   . 4 Bin Wang Highlight In other words, the ith entry of y is equal to the inner product of the ith row of A and x, yi = a T i x. Alternatively, let’s write A in column form. In this case we see that, y = Ax =   | | |a1 a2 · · · an | | |     x1 x2 ... xn   =   a1  x1 +   a2  x2 + . . .+   an  xn . In other words, y is a linear combination of the columns of A, where the coefficients of the linear combination are given by the entries of x. So far we have been multiplying on the right by a column vector, but it is also possible to multiply on the left by a row vector. This is written, yT = xTA for A ∈ Rm×n, x ∈ Rm, and y ∈ Rn. As before, we can express yT in two obvious ways, depending on whether we express A in terms on its rows or columns. In the first case we express A in terms of its columns, which gives yT = xTA = xT   | | |a1 a2 · · · an | | |   = [ xTa1 xTa2 · · · xTan ] which demonstrates that the ith entry of yT is equal to the inner product of x and the ith column of A. Finally, expressing A in terms of rows we get the final representation of the vector-matrix product, yT = xTA = [ x1 x2 · · · xn ]   — aT1 — — aT2 — ... — aTm —   = x1 [ — aT1 — ] + x2 [ — aT2 — ] + ...+ xn [ — aTn — ] so we see that yT is a linear combination of the rows of A, where the coefficients for the linear combination are given by the entries of x. 2.3 Matrix-Matrix Products Armed with this knowledge, we can now look at four different (but, of course, equivalent) ways of viewing the matrix-matrix multiplication C = AB as defined at the beginning of this section. First, we can view matrix-matrix multiplication as a set of vector-vector products. The most obvious viewpoint, which follows immediately from the definition, is that the (i, j)th 5 Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight entry of C is equal to the inner product of the ith row of A and the jth row ofB. Symbolically, this looks like the following, C = AB =   — aT1 — — aT2 — ... — aTm —     | | |b1 b2 · · · bp | | |   =   aT1 b1 a T 1 b2 · · · a T 1 bp aT2 b1 a T 2 b2 · · · a T 2 bp ... ... . . . ... aTmb1 a T mb2 · · · a T mbp   . Remember that since A ∈ Rm×n and B ∈ Rn×p, ai ∈ R n and bj ∈ R n, so these inner products all make sense. This is the most “natural” representation when we represent A by rows and B by columns. Alternatively, we can represent A by columns, and B by rows. This representation leads to a much trickier interpretation of AB as a sum of outer products. Symbolically, C = AB =   | | |a1 a2 · · · an | | |     — bT1 — — bT2 — ... — bTn —   = n∑ i=1 aib T i . Put another way, AB is equal to the sum, over all i, of the outer product of the ith column of A and the ith row of B. Since, in this case, ai ∈ R m and bi ∈ R p, the dimension of the outer product aib T i is m× p, which coincides with the dimension of C. Chances are, the last equality above may appear confusing to you. If so, take the time to check it for yourself! Second, we can also view matrix-matrix multiplication as a set of matrix-vector products. Specifically, if we represent B by columns, we can view the columns of C as matrix-vector products between A and the columns of B. Symbolically, C = AB = A   | | |b1 b2 · · · bp | | |   =   | | |Ab1 Ab2 · · · Abp | | |   . Here the ith column of C is given by the matrix-vector product with the vector on the right, ci = Abi. These matrix-vector products can in turn be interpreted using both viewpoints given in the previous subsection. Finally, we have the analogous viewpoint, where we repre- sent A by rows, and view the rows of C as the matrix-vector product between the rows of A and C. Symbolically, C = AB =   — aT1 — — aT2 — ... — aTm —  B =   — aT1 B — — aT2 B — ... — aTmB —   . Here the ith row of C is given by the matrix-vector product with the vector on the left, cTi = a T i B. 6 Bin Wang Highlight Bin Wang Highlight It may seem like overkill to dissect matrix multiplication to such a large degree, especially when all these viewpoints follow immediately from the initial definition we gave (in about a line of math) at the beginning of this section. However, virtually all of linear algebra deals with matrix multiplications of some kind, and it is worthwhile to spend some time trying to develop an intuitive understanding of the viewpoints presented here. In addition to this, it is useful to know a few basic properties of matrix multiplication at a higher level: • Matrix multiplication is associative: (AB)C = A(BC). • Matrix multiplication is distributive: A(B + C) = AB + AC. • Matrix multiplication is, in general, not commutative; that is, it can be the case that AB 6= BA. (For example, if A ∈ Rm×n and B ∈ Rn×q, the matrix product BA does not even exist if m and q are not equal!) If you are not familiar with these properties, take the time to verify them for yourself. For example, to check the associativity of matrix multiplication, suppose that A ∈ Rm×n, B ∈ Rn×p, and C ∈ Rp×q. Note that AB ∈ Rm×p, so (AB)C ∈ Rm×q. Similarly, BC ∈ Rn×q, so A(BC) ∈ Rm×q. Thus, the dimensions of the resulting matrices agree. To show that matrix multiplication is associative, it suffices to check that the (i, j)th entry of (AB)C is equal to the (i, j)th entry of A(BC). We can verify this directly using the definition of matrix multiplication: ((AB)C)ij = p∑ k=1 (AB)ikCkj = p∑ k=1 ( n∑ l=1 AilBlk ) Ckj = p∑ k=1 ( n∑ l=1 AilBlkCkj ) = n∑ l=1 ( p∑ k=1 AilBlkCkj ) = n∑ l=1 Ail ( n∑ k=p BlkCkj ) = n∑ l=1 Ail(BC)lj = (A(BC))ij. Here, the first and last two equalities simply use the definition of matrix multiplication, the third and fifth equalities use the distributive property for scalar multiplication over addition, and the fourth equality uses the commutative and associativity of scalar addition. This technique for proving matrix properties by reduction to simple scalar properties will come up often, so make sure you’re familiar with it. 3 Operations and Properties In this section we present several operations and properties of matrices and vectors. Hope- fully a great deal of this will be review for you, so the notes can just serve as a reference for these topics. 7 Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight 3.1 The Identity Matrix and Diagonal Matrices The identity matrix , denoted I ∈ Rn×n, is a square matrix with ones on the diagonal and zeros everywhere else. That is, Iij = { 1 i = j 0 i 6= j It has the property that for all A ∈ Rm×n, AI = A = IA. Note that in some sense, the notation for the identity matrix is ambiguous, since it does not specify the dimension of I. Generally, the dimensions of I are inferred from context so as to make matrix multiplication possible. For example, in the equation above, the I in AI = A is an n× n matrix, whereas the I in A = IA is an m×m matrix. A diagonal matrix is a matrix where all non-diagonal elements are 0. This is typically denoted D = diag(d1, d2, . . . , dn), with Dij = { di i = j 0 i 6= j Clearly, I = diag(1, 1, . . . , 1). 3.2 The Transpose The transpose of a matrix results from “flipping” the rows and columns. Given a matrix A ∈ Rm×n, its transpose, written AT ∈ Rn×m, is the n ×m matrix whose entries are given by (AT )ij = Aji. We have in fact already been using the transpose when describing row vectors, since the transpose of a column vector is naturally a row vector. The following properties of transposes are easily verified: • (AT )T = A • (AB)T = BTAT • (A+B)T = AT +BT 3.3 Symmetric Matrices A square matrix A ∈ Rn×n is symmetric if A = AT . It is anti-symmetric if A = −AT . It is easy to show that for any matrix A ∈ Rn×n, the matrix A + AT is symmetric and the 8 Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight matrix A−AT is anti-symmetric. From this it follows that any square matrix A ∈ Rn×n can be represented as a sum of a symmetric matrix and an anti-symmetric matrix, since A = 1 2 (A+ AT ) + 1 2 (A− AT ) and the first matrix on the right is symmetric, while the second is anti-symmetric. It turns out that symmetric matrices occur a great deal in practice, and they have many nice properties which we will look at shortly. It is common to denote the set of all symmetric matrices of size n as Sn, so that A ∈ Sn means that A is a symmetric n× n matrix; 3.4 The Trace The trace of a square matrix A ∈ Rn×n, denoted tr(A) (or just trA if the parentheses are obviously implied), is the sum of diagonal elements in the matrix: trA = n∑ i=1 Aii. As described in the CS229 lecture notes, the trace has the following properties (included here for the sake of completeness): • For A ∈ Rn×n, trA = trAT . • For A,B ∈ Rn×n, tr(A+B) = trA+ trB. • For A ∈ Rn×n, t ∈ R, tr(tA) = t trA. • For A,B such that AB is square, trAB = trBA. • For A,B,C such that ABC is square, trABC = trBCA = trCAB, and so on for the product of more matrices. As an example of how these properties can be proven, we’ll consider the fourth property given above. Suppose that A ∈ Rm×n and B ∈ Rn×m (so that AB ∈ Rm×m is a square matrix). Observe that BA ∈ Rn×n is also a square matrix, so it makes sense to apply the trace operator to it. To verify that trAB = trBA, note that trAB = m∑ i=1 (AB)ii = m∑ i=1 ( n∑ j=1 AijBji ) = m∑ i=1 n∑ j=1 AijBji = n∑ j=1 m∑ i=1 BjiAij = n∑ j=1 ( m∑ i=1 BjiAij ) = n∑ j=1 (BA)jj = trBA. 9 Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Here, the first and last two equalities use the definition of the trace operator and matrix multiplication. The fourth equality, where the main work occurs, uses the commutativity of scalar multiplication in order to reverse the order of the terms in each product, and the commutativity and associativity of scalar addition in order to rearrange the order of the summation. 3.5 Norms A norm of a vector ‖x‖ is informally a measure of the “length” of the vector. For example, we have the commonly-used Euclidean or ℓ2 norm, ‖x‖2 = √√√√ n∑ i=1 x2i . Note that ‖x‖22 = x Tx. More formally, a norm is any function f : Rn → R that satisfies 4 properties: 1. For all x ∈ Rn, f(x) ≥ 0 (non-negativity). 2. f(x) = 0 if and only if x = 0 (definiteness). 3. For all x ∈ Rn, t ∈ R, f(tx) = |t|f(x) (homogeneity). 4. For all x, y ∈ Rn, f(x+ y) ≤ f(x) + f(y) (triangle inequality). Other examples of norms are the ℓ1 norm, ‖x‖1 = n∑ i=1 |xi| and the ℓ∞ norm, ‖x‖∞ = maxi |xi|. In fact, all three norms presented so far are examples of the family of ℓp norms, which are parameterized by a real number p ≥ 1, and defined as ‖x‖p = ( n∑ i=1 |xi| p )1/p . Norms can also be defined for matrices, such as the Frobenius norm, ‖A‖F = √√√√ m∑ i=1 n∑ j=1 A2ij = √ tr(ATA). Many other norms exist, but they are beyond the scope of this review. 10 Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight 3.6 Linear Independence and Rank A set of vectors {x1, x2, . . . xn} ⊂ R m is said to be (linearly) independent if no vector can be represented as a linear combination of the remaining vectors. Conversely, if one vector belonging to the set can be represented as a linear combination of the remaining vectors, then the vectors are said to be (linearly) dependent . That is, if xn = n−1∑ i=1 αixi for some scalar values α1, . . . , αn−1 ∈ R, then we say that the vectors x1, . . . , xn are linearly dependent; otherwise, the vectors are linearly independent. For example, the vectors x1 =   12 3   x2 =   41 5   x3 =   2−3 −1   are linearly dependent because x3 = −2x1 + x2. The column rank of a matrix A ∈ Rm×n is the size of the largest subset of columns of A that constitute a linearly independent set. With some abuse of terminology, this is often referred to simply as the number of linearly independent columns of A. In the same way, the row rank is the largest number of rows of A that constitute a linearly independent set. For any matrix A ∈ Rm×n, it turns out that the column rank of A is equal to the row rank of A (though we will not prove this), and so both quantities are referred to collectively as the rank of A, denoted as rank(A). The following are some basic properties of the rank: • For A ∈ Rm×n, rank(A) ≤ min(m,n). If rank(A) = min(m,n), then A is said to be full rank . • For A ∈ Rm×n, rank(A) = rank(AT ). • For A ∈ Rm×n, B ∈ Rn×p, rank(AB) ≤ min(rank(A), rank(B)). • For A,B ∈ Rm×n, rank(A+B) ≤ rank(A) + rank(B). 3.7 The Inverse The inverse of a square matrix A ∈ Rn×n is denoted A−1, and is the unique matrix such that A−1A = I = AA−1. Note that not all matrices have inverses. Non-square matrices, for example, do not have inverses by definition. However, for some square matrices A, it may still be the case that 11 Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight Bin Wang Highlight A−1 may not exist. In particular, we say that A is invertible or non-singular if A−1 exists and non-invertible or singular otherwise.1 In order for a squar

本文档为【Linear Algebra Review and Reference】，请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑，图片更改请在作品中右键图片并更换，文字修改请直接点击文字进行修改，也可以新增和删除文档中的内容。

Linear Algebra Review and Reference

热门搜索

历史搜索