From 509f460d3ada50098cf63fb3de7c5ad25e4ae91c Mon Sep 17 00:00:00 2001 From: NotXia <35894453+NotXia@users.noreply.github.com> Date: Sat, 30 Sep 2023 13:29:03 +0200 Subject: [PATCH] Add SMM least squares problem --- .../img/linear_regression.png | Bin 0 -> 11103 bytes .../sections/_linear_systems.tex | 17 ++++- .../sections/_matrix_decomp.tex | 70 +++++++++++++++++- 3 files changed, 85 insertions(+), 2 deletions(-) create mode 100644 src/statistical-and-mathematical-methods-for-ai/img/linear_regression.png diff --git a/src/statistical-and-mathematical-methods-for-ai/img/linear_regression.png b/src/statistical-and-mathematical-methods-for-ai/img/linear_regression.png new file mode 100644 index 0000000000000000000000000000000000000000..ea712999f56594ea72bbc4425b60e9021da4b302 GIT binary patch literal 11103 zcmZ{K2T)T{w{8l7&_nMH&UxpOCzlR5ijo!!=6`&-|)6K)yn(bI6#fIuMn8~QqCAP_hO z_)^0ufs)1%e?8!b!d=ry69lSCraf_j0r!H=`esHTP^bt96d4Ty9RWp=OCV5?EC{sX z2m-03gFqbUtVUBc-~r{mn|eB+v-3}0b5SY?BoTZ=N7Eu?VmabSOz0P6;O5`pO5y{9K6-RRW<2pU^mAy0q6z%o0UU=ZN%blP2D{=F(uoM*+ zi$U@Q9p!IFv|j5g-<=}`-g!8R{%{5B9?XoU9MrID^kE&M&EKL+y}zIAoz>+W;(3bnD6z8|jE00lY0{OOdQ0 z2&sb@*G6(W8!oQgcnk}KEs<|OL0i-I8;SmAb$rq5Xc)(yd{={BFs60;niOydrR#WBpH&6`DcNScM4x|ZYoL8IP*M5qh-wJ`D{n#Zv&kGFmf;1ST zW-c;B`_c1K`OpqHuxzfwSNBlOjtP=!>P zv2DzIyWx=5f}!9VepyidcyH6Da^5J034a$6e*b)2o^?NP3 zI!HP0%(ndZp&A*M`IR%(RW16J(Y}zq;Vv_}rq>DQ&q!o#lP6|={Vr&nv9C&IA#V`F z7(lV2IoBAb+y=Rvr3AAxmlL{@5kn-%TbaF26>x;Ty z+UD=dWa)3q;0)#Q1B-$`MJC6`RVE1~B>=txBg#^6(l>GEBvaZmB)cs%$HguIBze{>VwqhBNADsf*kBJJqS z=KYf8aw9ZYD(ESsZwxJi4>WK{k+7kQ+zbYfr;Gi|qBGaW>?Mac-EGnvsX4`abW&L@jlu@58$ouzE0hY!>+fLF~NsoeIL- z&mGGgxwmt)og-~A^cTmJy}oz2pF0Q?d;jS}#cZ&eepBMUi=#fp?9=(^kG`FnfzG() zkhTF16cdVm$*8Gxc^S{fBqcubNJ_AibI?E9Sqx3N#g`|!rG&nOHnDZLu?A`?bf)bZ z!S^S)d*~9w;MuH%Pw&+#&fWLJ9aJ!t%!Qek#`ti22~(4G(vE8UeH?OFnENg)j(P^n z533WhkGXmY>;H?H@x0N6q0t45l2n}>Aa}62kdGLL~8qIN1XHnrq8I;u+PsFfA)nBS6hiS`M%ThxZ zx$_=T9~%LVUBm9apg8>fwYWgdby4=DnWuj6Ym^%xN{Bl&N(d!`eN&R`L$Pb3HrvT>k;^7)`m|%ZZIUu^xJj_woG=1JPMKP4p zP}KQ7+B4_40b0Rl^g3yZT z!raAVV(5j&E@LD9W^e<;F5Vo1Gu}l|G=Ay+<-qbSn|`7yyUkQIT`Zk1_QP-LClX7G zs~*5`zjV;3GobF?G(ac)%Tu~s&R!~D~(dmNUnFS{GBLvR7&qoL6Lgb4x|2+U_pAWdK)ACu4)!N)NWNTx#>jn~ru1(IiaZRpk+?-ADH*e>m6O z_!{G^O|T$TBW6Q(z%7r>bhICcjM3J;tcpx8`|_#Q%-qVM*2qAFcQP6uB16%Y%e00_ z8o|uy+r{mCILzFDPmJGb3ZUuE#_KpjGYbYA>b?>@rNMmn!dNX8ga}rI8bRJ3m@;jw z()I?%AEM27>*S?`>v3nAh|U&GePo}gAP2M^lcS_p^dN|bc&wQ&M6IVxFm)n0$Vh>j zEJ&D7P)W$JD20o47`sWv3M+!qiEA);Zb=e#bITf$lnlGZ9{6JVbhkQATUxL)eT2Ge z0xOOc{)=xQ-wJ9RFP?x0N-imp3h#3PF7sjFIX&bQ9!ODK7u{A>OK#C|e4K#?y8ywl z(4eD0a;cVh>zk`l)1nZz(j=Ft9YSg!1m?%92+!AFEw{nFV!*hNyUQ|b7@oqFh7M`^-Opz)35FIXMJ>zQ%OAO7wb0%kxbZ5Hv~njA6dZI6CA zz>9XwArJ1a{2daxI#^l~N~24&L8tof7j(fn0=Q>d;~?n!l)2 z+|J>y(Ae$t(kYVxNOo%5Bso>Vt)Y%&i+0cPh*cXw6#Og&VW3(}GnCDEmr)2xd(e(L! z4Yg?3>Pv#kB5uLG$*}j=fn{chXkv;jc^%6K8vUd`4Q!Bs{B?$E8+!7ezEA{JR1RA| zTWYSvEF~l8!7`v&9kdIDz*yFKSH4Zsa>5$<1D%>?^2-*lDwT0^ER|TwuYOp)B*QJ< z@D98ZFth|689r#xowj})RgsNkN0?vPUZT&)n9$&5-gX6Apdx4=l?1vOXx!i5Qv!nUQ%Rv|O)3n{n5PL;bCWWz ziF4N>fiu=d)#B2U2oMU{M$l5sc0yJc*Hl9MP_v(X5A7X5(nbm0&yAY=WIfb8!s}rG zyeSO47ZCF8!nMrMC#IOT0Tc9VN6qx7s^(~HUfA>1uzMe#w3_e#OV=%;vPPa9>dgI?vk zR}iKST7=>QMZX4~L$&RbfuU_`JbtTzWBVT9qf+&dok&W@ilM5#ggZa4!kQQjgBgi0 z^Ee)eE%B4CI7R$n%G->p^U+1s0XynXE!+YZA?bQ?Pf)^4$D7he{oS(H|E7o}t-o4% z%bSB5;R&Q!vbg2o@Nkfc6H3RR^_)GWG*Yr{0gTtfD*=O#EWu2-#~NKBJRtN+*lb#b zn{B6A`${dU=Ba9me&kIb)lR4u*dA8kODLppVWqDvHpZb_OQm)Gy9K{aiOLCPYk5HmfsGQ~^V z6Ibwi;^5D`oY{wy_(Fy2%tJmyl=1z_N<%!eODF6V5xp7N z-tdtCbZYS7omnRR+B2Yq4Dthu5fWB-E5PscY;dikl4qE3gGQ zP+&dN!k>!l-~hm}dHaG5S&`fW46@5yH&rK3+k9&nEc5kec8+&$CTwr-dfz3b&4=jU zSW&yQ7WRjC30{B6oLU5252a05B+nW)B7O*M63|81~?VW?m@$y_1G&xv=GcCzOmz3l>VZ4d}_<*z!aqnZSxl{iRx8 z)bh|CUGaG6Ant6C3?s5i@M=?}!fs;Gh8&OYY?F$Sz(V>i(Nk$^=o!tuU5$; z7_D_v)m-p45NLlqKDvBY^W*OU2luUjb)x*FTV8OoC~`JuV9N= zqv@|#Nz~EKmE=24H23c#g94I%eSqDyngJbIQhHgXF~{}a*=ebW4ex8+2=>9j-ZA%i z1%9)ZPUd(Kq(3*NkMU zM}98IeT8s}#r4~nZ6eN4<$;oIiAZ0zl%4p8@@rlJ9$G@K2MZlHM(_InSET0GaV~NE z;Hd84=WG}T`)NuRa+J@6YOF>DmjrJkhNwko++yB-h`~OCo%YzJV)ANf9?z~{@SkIx zW=Jey00T*t-ivEi=4ed>&Tfpj947II(cC@qA1s4=lMbGjaN+e3NToEU-yw(l>Qv{Q zK_E#KPe#*mOflCb^u}B$-F|uyExtG2n*QM5HTsJF8%=d|L(OmGo@7cm3pfn-#nihf zWD_hOLiy+>B7U7#O9EI)qOazJeanbU?N|g?dPdHHTf}dS za-+K835ivafjZkux!i7V2EX->M(Sdj!`zC=K6Tjp7i6!> zQ*?4r%b2P?lR4dJ6<)bgOIKGye}g`(G5b6zELHgm-_>nH&W;*T9itbXdactRUG;E^ zXIq=Zp?BG`gsl*vyz($n_-z(*61Vu<%yn;f0Ki(<>%_*?aJ7QyJvtVHjjrs)Y2IY+ zpY_o~#l|S|^kC6VTqA+3h_HcY3Dps+t8V8ex7CiimfI$)g>2e2C7F#+V_!x#Jx3&& zVyU7$6ryWZRO|jchH4f?8T&lhH#d#hyK*aB>5s>Gt~}0^{81xUK_%?tK&)I6$Ot;k z-lZAH>T%UW#G}mdpwh`)#;~jT?k(YxUqRs?;%ZALalmfKqT0QyRbTMbbftB?*ckoN zQ5TUEN_dY2Te{_zh#!YIJOExTg6cLmV|Y*z{E9MM%kkusY=Kv~g4fXnVoI6dSZOrB9@QxkK zwkI~*7r^X_wB4WZ!+8D7f=g}B>2MW@Dao-RfQA5YL?1)wJM z=Qehq<()zA(sEA`VhSF@qa(PZ7t9IEkfH-wpanL|X2%%zOKvi!Em1UGi2n-!V_=<% z3TyBE$11VUe{g7MG)*NP?IYD7O;q}W*{=G?I+o15zm3jE78cOND?zg9k3un< z$;`|B0eJIc+>Pg42%=)+Qrp+Lk1a;XOcQs#Y zo}$(qBtu<6 z^%N79zMt>p zP?bf@iwB)eS>yP}`p)*xt%S{XxbLs8uY}9qUQRwz%%g zy!rQ>d*mijc$cHdLb)^01-#77>+oY>Pjc(iEF_#RRCLx^pLdmi#DbMquh{u(1NGGV z(Q%6_UNu=VpDUr{(D7!MiEBPllb}YyARmLa!o9rOX<{OhPhSJH8~3U{*!SV1Nx)xW zq3|J}EUjCj=?&P4w24u%jHOyM;70-nrpU= zjV{P^%y)FNYOfNLqOdE-7EU(o<^D3X-orWg9QMxS+j6*t3;46B={}5&J*frziGo%u z**S^ah26NWqOB6svwq*9Gd)R0=*}hf%xpU~@Bsd$_GzV78`2ORr0%~| zL;A8KK)wu(7s?*X^eEM0&r#!NS-yi{f=EkrC#Sh3#H?;c7s!Gyu-un%>1?);C8hPH zCtc^r#IxQ&9f3r_MNFGtkOC-{1~qI4?6n+d41P$ZbJNEXpsi%p5xxR`x8B)P|9?^H zTXj9S*ZccKX-mrxKw({=(3v$clz-wW;+@Psl?KO|u&n(^`V*0wAs*atxfVIznXqK| zmX})R%MQihnA4EW<6Yj$%V>eh>lCUwTcg1*&+KXY6M;FjWz@M9AeT0MrquqbKzDDK zBy?maDmFjEw!)CXJL1Q?+2eylzGq6Qc3EKCWjADgEYUqY=)8S~o}$g%p5Skx1Nz&b z%7=h7-~BWU9XN8*&cBhD13=gd*ss`*3!U5d(=%*WgB@|mj{zmB*xzyFjhc`KVOmJO z4aqhEzazcQnv@8uk0HkX#UENZL^$I-zj!xaQPlRZ)1`2kwrDP_H$`>AbBxyr{;iJB zf+xzt^i4J`gnoGz%y{&?%%Jij34ulta(pPk-4vMw4&P&y@>+Hz8Cyl}{)} zuUF ztF*q1f$PSpqh4UPKl-#JoqUli3^;=yjZyk?q#!tR1w+)~U2*`86snYoEF5!@$_}d` zr9~HD5?T5kpoMj$$>NKBmO)o5MQ#D~z~Ob)BFpn7BO^$8Na^1+U%U0#RK5)fKLl@S z=ZVVFqA1h32@f)V0L(uJBhth*G5Nzj(*6bAsLg+ZYuL`)Do9c9k7Pfr1}K;8Slp=BA`c)U!UQ$CnMDh0&&m znJG-R5wKnEC1FzE{jkn-FZSdXr!Wf)JJGiK_Kn`RlXj?oR4KN+2TJVJGJIh7!8HZ@ zaPThgQud#Xm7nE>?djH>5_M;2I`2Wk68-wa|ScTf_^2K3vP%B5)757 zmHQm=#><|{<~8a-s{Wd@iV|P=&BXrDh(A=EhXpmx1ZfEd}VruA{3l6 zA8*GQYt-mkoA}ILRBSY-vS=$t`1aV-VW$Bk^Iu^J_P8d}LS$uTAJ?gJRdJ1DUvv~i zO4jTmF1rH^`%k{bO|ztc$MT!um-l9BGnY$v!75Y=79?pOdew_?y=Jl*0e{<~#%Mgm z)+sL4e0Ke(mEhik`C6VSdtmqm{ByF=DLtCKyOg_=#IUpGBc0$QYW;AlyqzoVn$Q7v zC&r}D*c)E9BXef7`VOSjxi9KmIgB?FzER!348*b})R&l?w)MPd+ESZruT-7BXYogtg zir-@N2gmHgwwAg|-FQaawi^?rXu2h4KhPr)!>oQLfr+s2b{-KfXM3mQ#tqCvCS)*s6mn z0EE%`_3^bimlaKO^M!rk@g6lGI7O1zfn?NxFd_X|?z30eb+v#G+SJ|j;>Qv1`COad zNO%`denGw}x;XuH{dvd~G##uEy|_nw^!}+hZ!??fpU-$cJU7u(m-0H=jzXaJ7V%3$ zmjKh{mR0?C2f@}uN!2+fYX4GR`vPjr#|O)Cq8qmsqej5%@FTFtg3vjdbRo=>7D7zW zH8#!lPnBTPutoueMmB-6*Zkd%q^P`P4UW1`Wi3ves~>*5cX2tZ6!g1@uj5AiY%;>Bw}M5@Rxcaa^Mg4*WQZP?k#tt@wY!(_{YM%use7q(S)Z%b_Zh2{${J{5SSoz$0Rz3op@Dac*rC5J1 zF&(dI27SzpZNs9+eRAZyaV6>T zJ`?-=(K9B~uJw{dW^i0GZ8MINAV)C5?iuV(d(XeaanH6W@EcLXQ?<7K-pqUMTRh1o zKKZ2IcNoAx#Ko`CQJIq?L6?A*mysr|#2Tzft;D@nCDXH6b_b0K%X(*)AaBt6+P)iZ zSsx-NWVm?}!0X{#i>6~fF{~7Xcg0>=v}fYYH&khDGej&r7T| zlUC;P>WCTY+VbG|l*EnRhkw!i-2_wHC?UBirNpqEWKt zguI=$wt6|&qT!d(hv8N&J2vOS+-Cf-a(%b4tO42qk$A$-tLmCJbGb0|*2SKX2Y_JV zETDBq$xTb zAR-_}ux&K{a1c-97(9Pb;PB76{=e?~A0;pwj&l&V3RBD-1MiA0Igv){@__cF_lfd0 z!%``7qu+iofw{v&CsfYor7Q2|^B8AYv;A^sYM>SDTU>gyOV9(^xtbdt9--GH>= z*~FwKKuBayuC8Y41El3;kUGNrNg?>mTb>h0ezR$YK@QZ-t;d31l}12bpr2@TE4Uo? zFBv6#Vx$yzPRr3OhwI>-5;$7G4=$6=84ziv1Qdr=PK3r!*z3;p55$M9zZNy-VSZ=s&k{1rqFv}74G+kA~wM6)`reqzbfoN%ZA4v#tpV>w|w zQq~sw5IXXqEp!yJYT?v;(XrWpruzy0wWUYHW$>h>Q*#`o{X^wUL1dMs)us`~AFyF4 zX$xhfY`Gh=w?S)IW=6PL^5Q>-Mp(?K!JmuSHnTx%Lx&cVQGKv=>*(s6)|=tlj76-j zY1K2%k*B4=yD(9}H500L`~2-^mN9$wEQhALE8!QmEAL#`h4gt|A73;pblT7gaN6_M z^$BI_Vy5An-I#rPP!5(#JiDgA{QFdR$a-svP+Nklr_+b#H_FzSis4d22Fr_scC@ABt~vrN6KC)>gH%b9Eh! zo((VgwvltR&RV)A26uStdH}iVcYR#lcKe_GTYd?%ac9gU3*`y!8s1B{pdJD$qD7$d z44B4zK5#=N1(mdY2`#PT6=pp!tEtM;e-sMI?|S?f06*IKcePD4=Qh!kL?AL6cD*f( z&Yh)v@RhTt&eA2#Mmf>gcQHfa>QU>zVoA4O ztbv-iR!(Wgtt;(VRx(bGZ_gj95Zm`ZVHhzLZry#*$jOAL-??CA;iQlBYR`)gMhz?z zE=JDHZ5QH2@%#xieN?*Ir=hYdOq_qd;5hwm&ObUx#RlB`5f>sdC^Prhh5VZCAL)ZD ze3eKpUcvNcpxMt@Hn72T#ua{)D@_lJP3X64zP2*=m-O!_lcTo|ZSmf$S5vEA?%ss! z-XTknbBNn8}%7^r9C<6}B@!zKeu&YAw7RZzOKpMx6;B zrkW(U2fAY7w2!+hSjNTt*WAc|wn7#kj2vIaBj>gcFxQH*_O+^OS5EUpxEJOKPCvPH z)tamOAU|ZTno#SH3QF*Z+J~tHQKhjxVb0S0r|apIuRav5S}Rr9enn^sytpZmM)~~f z&TT}Glq}-V{c-!}7m;g-waf#aAfmaKa$id1uGhVWBE8CkN}_i^D9$Z!#ujF}G(e=+ zIUsMoP#-PA_|>_@@5@2PD7!DOOv(nT?lf`Hsf#Tn$=uz1*nT%9uIg#e1cB>PoaDc3 zuLkQphC`LDTlwKldJ8tGv0Hqxu1n=1%xM3m(`+Vt0RuIQL+fud)+Gn5WOw|O$dhD0 zm@3QKzpLZlplJe~t&CR6O?Ad?5iTFY{g`}0!ion2$&tj(>cZMR86AyACdNDTCY<6X z5oAU@Y|e$EN?&Hl#<;1wa_WuW7m<~_`gLFJ%Np~%smFW9ROv(Np29B8Ugc)47MlGSqom+?W+yMu>;0?x=Fzfl#0INrKAN|n3KlS*O1z{%J z77jeG$S?EWX4ns=o%Ijy%xX`yoPoaKYb2?A*!ilZpt>{Jwi|?|0v}6nM`alC&fYu| zba#C>L;v;wyd_GuC)aaT97M2CoruO{nzk+Qv~O0Ggu2fi6qAkxYvyP%X`Z|zYb=jv z{^&G2^lCJ_Gj;K78Xdm4LVQzw#;@~^C-IFKC1tKdU}!AuS*b`;B7O67G5@sf3fz^d z!h3>WAOe>{UBjYvnVlxIRVbNb_Nz?tIZkB8-*5#oZLyY8G9I%e6BY-T%a}Gu)eR3j zf(fQS2fZH~EYKD5;~~{dH&HV1&P!wt0@8%$;r=Sc@$UeIj~r{Wm!tM)lrMDu-EY#H z{~JftTsHXhq|$SfDAp@V+%6J3{G!gXm!rAxEqp)E;zn6k_`c`7r|&G?L%OcXv-&ri6vr}98K+mty~SY zru|PkUyLc4iV!9NCzSp^KVxxYNAP*4pBLi~fsMH*e?vF5QMy*v%EPHz`9 zgR%|P#P?!!(T-Rm_3uCQe`3&G`@-}1-2~R0Ta_<&8Typp82*u;x7I?KD6TVae9M9K z6;%3sc<%<@uH5dHrSNA%2hATYIH6qWwwV1R!>@NvJi~rWHD3KkS{OsZzjm;`7OWhS zx^b8=At+KWoMdt_hjpt1edi2HW3p@QwL#mwe7CJ~_^Sd#tvc0X-cXal_TFx@|K)fm z049*u^Jh>=IMH7~!A%%AlPVl|-73( n$ +does not generally have a solution. +\marginnote{Linear least squares} +Therefore, instead of finding the exact solution, it is possible to search for a $\tilde{\vec{x}}$ such that: +\[ \matr{A}\tilde{\vec{x}} - \vec{b} \approx \nullvec \] +In other words, we aim to find a $\tilde{\vec{x}}$ that is close enough to solve the system. +This problem is usually formulated as: +\[ + \tilde{\vec{x}} = \arg\min_{\vec{x} \in \mathbb{R}^n} \Vert \matr{A}\vec{x} - \vec{b} \Vert_2^2 +\] +It always admits a solution and, depending on $\text{rank}(\matr{A})$, there two possible cases: +\begin{descriptionlist} + \item[$\text{rank}(\matr{A}) = n$] + The solution is unique for each $b \in \mathbb{R}^m$. + \marginnote{Normal equation} + It is found by solving the normal equation: + \[ \matr{A}^T\matr{A}\vec{x} = \matr{A}^T\vec{b} \] + $\matr{A}^T\matr{A}$ is symmetric definite positive and the system can be solved using the Cholesky factorization. + + \item[$\text{rank}(\matr{A}) < n$] \marginnote{Least squares using SVD} + The system admits infinite solutions. + Of all the solutions $S$, we are interested in the one with minimum norm: + \[ \vec{x}^* = \arg\min_{\vec{x} \in S} \Vert \vec{x} \Vert_2 \] + This problem can be solved using SVD: + \[ \vec{x}^* = \sum_{i=1}^{\text{rank}(\matr{A})} \frac{\vec{u}_i^T\vec{b}}{\sigma_i}\vec{v}_i \] +\end{descriptionlist} + + +\subsection{Application: Polynomial interpolation} +\marginnote{Polynomial interpolation} +Given a set of $m$ data $(x_i, y_i), i=1, \dots, m$, +we want to find a polynomial of degree $n$ ($m > n$) that approximates it. +In other words, we want to find a function: +\[ f(x) = c_0 + c_1 x + c_2 x^2 + \dots + c_n x^n \] +that minimizes the residual vector $\vec{r} = (r_1, \dots, r_m)$, +where $r_i = \vert y_i - f(x_i) \vert$. +We can formulate this as a linear system: +\[ + \vec{r} = \vec{y} - \matr{A}\vec{c} = + \begin{pmatrix} + y_1 \\ + \vdots \\ + y_m + \end{pmatrix} + - + \begin{pmatrix} + 1 & x_1 & x_1^2 & \dots & x_1^n \\ + \vdots & \vdots & \vdots & \ddots & \vdots \\ + 1 & x_m & x_m^2 & \dots & x_m^n + \end{pmatrix} + \begin{pmatrix} + c_0 \\ + \vdots \\ + c_n + \end{pmatrix} +\] +that can be solved as a linear least squares problem: +\[ \min_{\vec{c} \in \mathbb{R}^n} \Vert \vec{y} - \matr{A}\vec{c} \Vert_2^2 \] + +\begin{figure}[h] + \centering + \includegraphics[width=0.40\textwidth]{img/linear_regression.png} + \caption{Interpolation using a polynomial of degree 1} +\end{figure} + + + \section{Eigendecomposition vs SVD} \begin{center} \begin{tabular}{m{16em} | m{16em}}