Talk10-SOMS-PCA-ICA-MDS

The Elements of Statistical Learning, Chapter 14: Unsupervised Learning, Episode 3 of 3 (§4 to §7), BCI

Unsupervised Learning (§4 to §7)
- 14.4 Self-Organizing Maps
- 14.5 Principal Components, Principal Curves, Principal Surfaces
- 14.6 Factor Analysis, Independent Component Analysis, Exploratory Projection Pursuit
- 14.7 Multidimensional Scaling

14.4 Self-Organizing Maps (SOMs)
A constrained version of K-means (VQ), with the prototypes placed on a topological map (grid).
For K prototypes m_j in ℝ^p placed on a given map:
- find the m_j closest to the considered x_i
- move m_j and its neighbours m_k towards x_i:
    $m_k \leftarrow m_k + \alpha (x_i - m_k)$
Running the algorithm moves the prototypes inside the input distribution, subject to the map constraints.

SOMs have two parameters
- Neighbours are found according to the prototypes' parametrization l_j in {1, 2, …, q_1} × {1, 2, …, q_2}.
- The neighbourhood relationship is limited to a maximum distance r between m_j and m_k.
- 'Performance' depends on the learning rate α and the maximum neighbourhood distance r (and on the way these two parameters decrease).

SOMs can be extended
- The version presented above is the 'basic online' algorithm.
- Enhanced version: $m_k \leftarrow m_k + \alpha \, h(\|l_j - l_k\|)\, (x_i - m_k)$
- Batch version: $m_j \leftarrow \sum_k w_k x_k \big/ \sum_k w_k$
- Further developments: WEBSOM, …

Can SOMs separate sphere data? Well… and so what?

VQ aims to reduce the inputs (K ≤ N)
Vector Quantization (VQ), from §14.3.9: neural VQ algorithms are an adaptive (iterated) version of K-means.

SOMs are a constrained VQ algorithm
Intuition: the input space (concrete) versus the predefined map (abstract). On 'simple drawings' and 'long speeches'… ;) It's just an algorithm… so run it!
Source (modified): Growing Neural Gas, http://www.neuroinformatik.ruhr-uni-bochum.de/ini/VDM/research/gsn/DemoGNG/GNG_2.html
A 'dynamic' example illustrates the SOMs better.

14.5 Principal Components (PCA)
A set of q directions onto which the p-dimensional data are linearly projected -> dimension reduction, since q ≤ p.
Data: x_1, x_2, …, x_N
Model: $f(\lambda) = \mu + V_q \lambda$
The model is fitted by least squares with respect to the reconstruction error:
    $\min_{\mu, \{\lambda_i\}, V_q} \sum_{i=1}^{N} \| x_i - \mu - V_q \lambda_i \|^2$

Principal components are computed from matrix operations
Considering $\hat{\mu} = \bar{x}$ and $\hat{\lambda}_i = V_q^T (x_i - \bar{x})$, solve
    $\min_{V_q} \sum_{i=1}^{N} \| (x_i - \bar{x}) - V_q V_q^T (x_i - \bar{x}) \|^2$
The data are usually centered, and $V_q V_q^T = H_q$ is the projection matrix.
Solution: V_q consists of the first q columns of V obtained from the SVD $X = U D V^T$.

PCA can project data. PCA can separate sphere data.
PCA can be used for handwritten digits: data in ℝ^256, 2-D projections
    $\hat{f}(\lambda) = \bar{x} + \lambda_1 v_1 + \lambda_2 v_2$

PCA is already known
- Chapter 3: SVD decomposition of $X^T X = V D^2 V^T$
- Chapter 3: PCA regression and ridge regression
A known tool…
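As a concrete illustration of the SVD route above, here is a minimal NumPy sketch (an illustration only, not material from the talk or the book; the function name pca_project and the toy data are hypothetical): center the data, take the SVD, keep the first q right-singular vectors as V_q, and reconstruct with f(λ) = μ + V_q λ.

    import numpy as np

    def pca_project(X, q):
        """Rank-q PCA via the SVD of the centered data matrix."""
        x_bar = X.mean(axis=0)
        Xc = X - x_bar                              # center the data
        U, D, Vt = np.linalg.svd(Xc, full_matrices=False)
        Vq = Vt[:q].T                               # first q columns of V
        lambdas = Xc @ Vq                           # scores: lambda_i = Vq^T (x_i - x_bar)
        X_hat = x_bar + lambdas @ Vq.T              # reconstruction f(lambda) = mu + Vq lambda
        return lambdas, X_hat

    # Toy usage: 100 points in R^5 projected onto q = 2 principal components.
    X = np.random.default_rng(0).normal(size=(100, 5))
    scores, X_hat = pca_project(X, q=2)

The returned scores are the $\hat{\lambda}_i = V_q^T (x_i - \bar{x})$ of the slide above.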
14.5 Principal Curves (PCurves)
Let f(λ) be a parametrized smooth curve in ℝ^p: each coordinate of the p-dimensional f(λ) is a smooth function of λ (e.g. of the arc-length). Then f is a principal curve if
    $f(\lambda) = \mathrm{E}\left( X \mid \lambda(X) = \lambda \right)$
Related concepts: responsibility, principal points.
PCurves are a smooth generalization of (linear) PCA.

PCurves can be obtained automatically
Algorithm principle: consider the PCurve coordinates one by one, alternating
    $\hat{f}_j(\lambda) \leftarrow \mathrm{E}\left( X_j \mid \lambda(X) = \lambda \right)$
    $\hat{\lambda}_f(x) \leftarrow \arg\min_{\lambda'} \| x - \hat{f}(\lambda') \|^2$
These two steps are repeated until convergence.

14.5 Principal Surfaces (PSurfaces)
Generalization to more than one parameter (but rarely more than two):
    $f(\lambda_1, \lambda_2) = \left[ f_1(\lambda_1, \lambda_2), f_2(\lambda_1, \lambda_2), \ldots, f_p(\lambda_1, \lambda_2) \right]$
The algorithm for PCurves can be applied to PSurfaces.
Links with the SOMs.
PSurfaces can separate sphere data.

14.6 Factor Analysis (FA)
Goal: find the latent variables behind the observed ones.
Latent representation of the data (using the SVD): $X = U D V^T = S A^T$, with $S = \sqrt{N}\, U$ and $A^T = D V^T / \sqrt{N}$.

PCA estimates a latent variable model
    $X_1 = a_{11} S_1 + a_{12} S_2 + \cdots + a_{1p} S_p$
    $X_2 = a_{21} S_1 + a_{22} S_2 + \cdots + a_{2p} S_p$
    $\vdots$
    $X_p = a_{p1} S_1 + a_{p2} S_2 + \cdots + a_{pp} S_p$
or X = A S. Due to the hypotheses on X there are many solutions, since
    $X = A S = A R^T R S = A^* S^*$, with $R R^T = I$ and $\mathrm{Cov}(S^*) = \mathrm{Cov}(S)$.

Factor Analysis avoids the multi-solution problem
Idea: use q < p, i.e. X = A S + ε, with ε_j the information unique to X_j:
    $X_1 = a_{11} S_1 + a_{12} S_2 + \cdots + a_{1q} S_q + \varepsilon_1$
    $X_2 = a_{21} S_1 + a_{22} S_2 + \cdots + a_{2q} S_q + \varepsilon_2$
    $\vdots$
    $X_p = a_{p1} S_1 + a_{p2} S_2 + \cdots + a_{pq} S_q + \varepsilon_p$
The parameters are given by the covariance matrix
    $\Sigma = A A^T + D_\varepsilon$, with $D_\varepsilon = \mathrm{diag}\left[ \mathrm{Var}(\varepsilon_1), \ldots, \mathrm{Var}(\varepsilon_p) \right]$

Factor Analysis only partially avoids the multi-solution problem
There are still many possible solutions, since A and A R^T are equivalent in $\Sigma = A A^T + D_\varepsilon$.
This is presented as an 'advantage' of the method: it allows rotation of the selected factors (for interpretation purposes).

PCA and FA are related
- PCA: all the variability in an item should be used.
- FA: only the variability in common is used.
- In most cases the two methods yield similar results.
- PCA is preferred for data reduction; FA is preferred for structure detection.

FA can be generalized
    $X_1 = a_{11} F_1 + a_{12} F_2 + \cdots + a_{1q} F_q + \varepsilon_1$
    $\vdots$
    $X_p = a_{p1} F_1 + a_{p2} F_2 + \cdots + a_{pq} F_q + \varepsilon_p$

14.6 Independent Component Analysis (ICA)
Goal: find the source signals S from the mixed ones X.
ICA helps finding source signals from mixed ones.
Random variables are
- uncorrelated if the second cross-moment is zero
- independent if all cross-moments are zero
    $X_1 = a_{11} S_1 + a_{12} S_2 + \cdots + a_{1p} S_p$
    $\vdots$
    $X_p = a_{p1} S_1 + a_{p2} S_2 + \cdots + a_{pp} S_p$
ICA is based on independence (and not on orthogonality).
Thus solve X = A S and find A such that the S_j are independent; then S = B X with B = A^{-1}.
But an alternative formulation is needed: A and S are both unknown!

ICA is also subject to two ambiguities
- There is an infinity of solutions: (A divided by a constant) compensates for (S multiplied by the same constant); the variance of each S_j is therefore fixed to 1.
- The order of the components cannot be determined: $X = A S = A P^{-1} P S = A' S'$. Does the order really matter?
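To make X = A S and the two ambiguities concrete, here is a small sketch (not from the talk; it assumes scikit-learn is available, and the mixing matrix A, the sources and all variable names are hypothetical): two non-Gaussian sources are mixed linearly and then recovered with FastICA. The recovered sources typically come back in a different order and with a different sign and scale.

    import numpy as np
    from sklearn.decomposition import FastICA   # assumption: scikit-learn is installed

    t = np.linspace(0, 8, 2000)

    # Two non-Gaussian sources S (one per column): a sine wave and a sawtooth.
    S = np.c_[np.sin(3 * t), (t % 1.0) - 0.5]

    # Mix them with a hypothetical mixing matrix A: X = S A^T (rows = samples).
    A = np.array([[1.0, 0.5],
                  [0.4, 1.2]])
    X = S @ A.T

    # Recover the sources: ICA estimates B = A^{-1} only up to the scale and
    # permutation ambiguities mentioned above.
    ica = FastICA(n_components=2, random_state=0)
    S_hat = ica.fit_transform(X)   # estimated sources (arbitrary order/sign/scale)
    A_hat = ica.mixing_            # estimated mixing matrix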
ICA cannot cope with Gaussian signals
Suppose A is orthogonal and s_1, s_2 are Gaussian; x_1 and x_2 are then Gaussian, uncorrelated and with unit variance. The joint density is
    $p(x_1, x_2) = \frac{1}{2\pi} \exp\left( -\frac{x_1^2 + x_2^2}{2} \right)$
which is symmetric: it carries no information on A.

The key to the ICA model: maximize nongaussianity
Let X be a mixture of independent components. A single component can be found as
    $Y = b^T X = \sum_i b_i x_i$
Let $z = A^T b$; then $Y = b^T X = b^T A s = z^T s$.
Then refer to a broad interpretation of the CLT…

Kurtosis is a measure of nongaussianity
Simple, but not robust to outliers.

Negentropy is a measure of nongaussianity
The differential entropy is
    $H(Y) = -\int g(y) \log g(y) \, dy$
From information theory: the entropy is maximal for Gaussian signals. Use the negentropy
    $J(Y) = H(Y_{\mathrm{gauss}}) - H(Y)$
instead, but it needs to be estimated…

Mutual information can also measure nongaussianity
Mutual information, a.k.a. the Kullback-Leibler distance between g(y) and the product of its marginal densities:
    $I(Y) = \sum_{j=1}^{p} H(Y_j) - H(Y)$
Finding the B that minimizes the mutual information is equivalent to finding the directions in which the negentropy is maximized.

Preprocessing improves ICA
- Centering: X has zero mean.
- Whitening: X' such that $\mathrm{E}[X' X'^T] = I$.
- Others: application dependent.

ICA works better than PCA (two illustrative examples).
ICA can be used for handwritten digits.

14.6 Exploratory Projection Pursuit (EPP)
See Chapter 11. Find an 'interesting' direction onto which the multidimensional data can be projected. 'Interesting' means that the projection displays some structure of the data distribution. Gaussian distributions are the least interesting (from a structure point of view), so look for nongaussian distributions (as in ICA).

14.6 A different approach to ICA
The data are assigned to a class G = 1 and compared to a 'reference signal' of class G = 0. This is a kind of generalized logistic regression:
    $\log \frac{\Pr(G = 1)}{1 - \Pr(G = 1)} = f_1(a_1^T X) + f_2(a_2^T X)$
Projection Pursuit Regression techniques can therefore be applied.

14.7 Multidimensional Scaling (MDS)
Aims to project the data, as SOMs, PCurves, … do, but tries to preserve the pairwise distances. Based on distances or dissimilarities (only):
    $d_{ij} = \| x_i - x_j \|$
Goal: find values z_1, …, z_N minimizing the stress function (least squares or Kruskal-Shephard scaling); a small numerical sketch of this minimization is given at the end of the document:
    $S_D(z_1, z_2, \ldots, z_N) = \sum_{i \neq j} \left( d_{ij} - \| z_i - z_j \| \right)^2$

MDS can be used with other scalings
- Classical scaling, based on similarities s_{ij}, minimizes $\sum_{i \neq j} \left( s_{ij} - \langle z_i - \bar{z}, z_j - \bar{z} \rangle \right)^2$
- Sammon's mapping minimizes $\sum_{i \neq j} \frac{\left( d_{ij} - \| z_i - z_j \| \right)^2}{d_{ij}}$
- Shephard-Kruskal nonmetric scaling minimizes $\sum_{i \neq j} \left( \| z_i - z_j \| - \theta(d_{ij}) \right)^2$ over both the z_i and an increasing function θ

MDS can separate sphere data.

Unsupervised Learning (§4 to §7): summary
- SOMs: a constrained VQ algorithm
- PCA: finds (linear) orthogonal components
- PCurves: a smooth generalization of PCA
- PSurfaces: a 2(+)-D generalization of PCurves
- FA: finds the common (correlated) component parts
- ICA: finds independent components
- EPP: can be related to ICA
- MDS: tries to preserve pairwise distances
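To close, a minimal NumPy sketch of the Kruskal-Shephard stress minimization defined in §14.7 above (illustrative only: plain gradient descent with an ad-hoc step size and iteration count; the function name and the toy data are hypothetical, not from the talk):

    import numpy as np

    def kruskal_shephard_mds(D, k=2, n_iter=2000, lr=5e-4, seed=0):
        """Minimize S_D(z) = sum_{i != j} (d_ij - ||z_i - z_j||)^2 by gradient descent."""
        n = D.shape[0]
        Z = np.random.default_rng(seed).normal(size=(n, k))   # random initial configuration
        for _ in range(n_iter):
            diff = Z[:, None, :] - Z[None, :, :]       # pairwise differences z_i - z_j
            dist = np.linalg.norm(diff, axis=-1)       # pairwise distances ||z_i - z_j||
            np.fill_diagonal(dist, 1.0)                # guard against division by zero
            coef = (D - dist) / dist
            np.fill_diagonal(coef, 0.0)
            grad = -4.0 * (coef[:, :, None] * diff).sum(axis=1)   # dS/dz_i
            Z -= lr * grad                             # the step size may need tuning
        return Z

    # Toy usage: embed 50 points from R^3 into R^2 while preserving their distances.
    X = np.random.default_rng(1).normal(size=(50, 3))
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    Z = kruskal_shephard_mds(D)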