From 1c9924dc727651a3660d3b3ed9731ec2b890b04f Mon Sep 17 00:00:00 2001
From: NotXia <35894453+NotXia@users.noreply.github.com>
Date: Sat, 2 Dec 2023 18:20:18 +0100
Subject: [PATCH] Add ML/DM data preprocessing

---
 .../img/mahalanobis.png                       | Bin 0 -> 35665 bytes
 src/machine-learning-and-data-mining/main.tex |   2 +
 .../sections/_classification.tex              |  37 +++-
 .../sections/_clustering.tex                  | 137 +++++++++++++++
 .../sections/_data_prepro.tex                 | 163 ++++++++++++++++++
 5 files changed, 331 insertions(+), 8 deletions(-)
 create mode 100644 src/machine-learning-and-data-mining/img/mahalanobis.png
 create mode 100644 src/machine-learning-and-data-mining/sections/_clustering.tex
 create mode 100644 src/machine-learning-and-data-mining/sections/_data_prepro.tex

diff --git a/src/machine-learning-and-data-mining/img/mahalanobis.png b/src/machine-learning-and-data-mining/img/mahalanobis.png
new file mode 100644
index 0000000000000000000000000000000000000000..fc5520df6fcd91472f5fc8673dd30f781ad5bdf8
Binary files /dev/null and b/src/machine-learning-and-data-mining/img/mahalanobis.png differ
diff --git a/src/machine-learning-and-data-mining/main.tex b/src/machine-learning-and-data-mining/main.tex
index 3593190..e25791d 100644
--- a/src/machine-learning-and-data-mining/main.tex
+++ b/src/machine-learning-and-data-mining/main.tex
@@ -29,7 +29,9 @@
     \input{sections/_data_lake.tex}
     \input{sections/_crisp.tex}
     \input{sections/_machine_learning.tex}
+    \input{sections/_data_prepro.tex}
     \input{sections/_classification.tex}
     \input{sections/_regression.tex}
+    \input{sections/_clustering.tex}

 \end{document}
\ No newline at end of file
diff --git a/src/machine-learning-and-data-mining/sections/_classification.tex b/src/machine-learning-and-data-mining/sections/_classification.tex
index 01c8909..33310b2 100644
--- a/src/machine-learning-and-data-mining/sections/_classification.tex
+++ b/src/machine-learning-and-data-mining/sections/_classification.tex
@@ -269,14 +269,6 @@ a macro (unweighted) average or a class-weighted average.
         When $\kappa = 1$, there is perfect agreement ($\sum_{i}^{\texttt{classes}} TP_i = 1$), when $\kappa = -1$, there is total disagreement ($\sum_{i}^{\texttt{classes}} TP_i = 0$) and when $\kappa = 0$, there is random agreement.
-
-
-    \item[Cost sensitive learning] \marginnote{Cost sensitive learning}
-        Assign a cost to the errors. This can be done by:
-        \begin{itemize}
-            \item Altering the proportions of the dataset by duplicating samples to reduce its misclassification.
-            \item Weighting the classes (possible in some algorithms).
-        \end{itemize}
 \end{description}


@@ -317,6 +309,35 @@ a macro (unweighted) average or a class-weighted average.
 \end{description}


+\subsection{Data imbalance}
+A classifier may not perform well when predicting a minority class of the training data.
+Possible solutions are:
+\begin{descriptionlist}
+    \item[Undersampling] \marginnote{Undersampling}
+        Randomly reduce the number of examples of the majority classes.
+
+    \item[Oversampling] \marginnote{Oversampling}
+        Increase the examples of the minority classes.
+
+        \begin{description}
+            \item[Synthetic minority oversampling technique (SMOTE)] \marginnote{SMOTE}
+                \begin{enumerate}
+                    \item Randomly select an example $x$ belonging to the minority class.
+                    \item Select a random neighbor $z_i$ among its $k$-nearest neighbors $z_1, \dots, z_k$.
+                    \item Synthesize a new example by selecting a random point on the segment between $x$ and $z_i$ in the feature space.
+                \end{enumerate}
+        \end{description}
+
+    \item[Cost sensitive learning] \marginnote{Cost sensitive learning}
+        Assign a cost to the errors, with higher weights for the minority classes.
+        This can be done by:
+        \begin{itemize}
+            \item Altering the class proportions of the dataset by duplicating the samples whose misclassification should be reduced.
+            \item Weighting the classes (possible in some algorithms).
+        \end{itemize}
+\end{descriptionlist}
+
+
 \section{Decision trees}


diff --git a/src/machine-learning-and-data-mining/sections/_clustering.tex b/src/machine-learning-and-data-mining/sections/_clustering.tex
new file mode 100644
index 0000000..3ba3d57
--- /dev/null
+++ b/src/machine-learning-and-data-mining/sections/_clustering.tex
@@ -0,0 +1,137 @@
+\chapter{Clustering}
+
+
+\section{Similarity and dissimilarity}
+
+\begin{description}
+    \item[Similarity] \marginnote{Similarity}
+        Measures how alike two objects are.
+        Often defined in the range $[0, 1]$.
+
+    \item[Dissimilarity] \marginnote{Dissimilarity}
+        Measures how two objects differ.
+        0 indicates no difference while the upper bound varies.
+\end{description}
+
+\begin{table}[ht]
+    \centering
+    \renewcommand{\arraystretch}{2}
+    \begin{tabular}{c | c | c}
+        \textbf{Attribute type} & \textbf{Dissimilarity} & \textbf{Similarity} \\
+        \hline
+        Nominal & $d(p, q) = \begin{cases} 0 & \text{if } p=q \\ 1 & \text{if } p \neq q \end{cases}$ & $s(p, q) = 1 - d(p, q)$ \\
+        \hline
+        Ordinal & $d(p, q) = \frac{\vert p - q \vert}{V}$ with $p, q \in \{ 0, \dots, V \}$ & $s(p, q) = 1 - d(p, q)$ \\
+        \hline
+        Interval or ratio & $d(p, q) = \vert p - q \vert$ & $s(p, q) = \frac{1}{1 + d(p, q)}$
+    \end{tabular}
+    \caption{Similarity and dissimilarity by attribute type}
+\end{table}
+
+\begin{description}
+    \item[Similarity properties] \phantom{}
+        \begin{enumerate}
+            \item $\texttt{sim}(p, q) = 1$ iff $p = q$.
+            \item $\texttt{sim}(p, q) = \texttt{sim}(q, p)$.
+        \end{enumerate}
+\end{description}
+
+
+\subsection{Distance}
+
+Given two $D$-dimensional data entries $p$ and $q$, possible distance metrics are:
+\begin{descriptionlist}
+    \item[Minkowski distance ($L_r$)] \marginnote{Minkowski distance}
+        \[ \texttt{dist}(p, q) = \left( \sum_{d=1}^{D} \vert p_d - q_d \vert^r \right)^{\frac{1}{r}} \]
+        where $r$ is a parameter.
+
+        Common values for $r$ are:
+        \begin{descriptionlist}
+            \item[$r = 1$]
+                Corresponds to the $L_1$ norm.
+                It is useful for discriminating 0 distance and near-0 distance as
+                an $\varepsilon$ change in the data corresponds to an $\varepsilon$ change in the distance.
+            \item[$r = 2$]
+                Corresponds to the Euclidean distance or $L_2$ norm.
+            \item[$r = \infty$]
+                Corresponds to the $L_\infty$ norm.
+                Considers only the dimension with the maximum difference.
+        \end{descriptionlist}
+
+    \item[Mahalanobis distance] \marginnote{Mahalanobis distance}
+        \[ \texttt{dist}(p, q) = \sqrt{ (p-q) \matr{\Sigma}^{-1} (p-q)^T } \]
+        where $\matr{\Sigma}$ is the covariance matrix of the dataset.
+        For a fixed Euclidean distance, the Mahalanobis distance of $p$ and $q$ decreases when the segment connecting them
+        points towards a direction of greater variation of the data.
+
+        \begin{figure}[h]
+            \centering
+            \includegraphics[width=0.35\textwidth]{img/mahalanobis.png}
+            \caption{The Mahalanobis distance between $(A, B)$ is greater than $(A, C)$, while the Euclidean distance is the same.}
+        \end{figure}
+\end{descriptionlist}
+
+\subsubsection{Distance properties}
+\begin{descriptionlist}
+    \item[Positive definiteness]
+        $\texttt{dist}(p, q) \geq 0$ and $\texttt{dist}(p, q) = 0$ iff $p = q$.
+    \item[Symmetry]
+        $\texttt{dist}(p, q) = \texttt{dist}(q, p)$.
+    \item[Triangle inequality]
+        $\texttt{dist}(p, q) \leq \texttt{dist}(p, r) + \texttt{dist}(r, q)$.
+\end{descriptionlist}
+
+
+
+\subsection{Vector similarity}
+
+\begin{description}
+    \item[Binary vectors]
+        Given two examples $p$ and $q$ with binary features, we can compute the following values:
+        \[
+            \begin{split}
+                M_{00} &= \text{ number of features equal to 0 for both $p$ and $q$} \\
+                M_{01} &= \text{ number of features equal to 0 for $p$ and 1 for $q$} \\
+                M_{10} &= \text{ number of features equal to 1 for $p$ and 0 for $q$} \\
+                M_{11} &= \text{ number of features equal to 1 for both $p$ and $q$}
+            \end{split}
+        \]
+        Possible similarity measures are:
+        \begin{descriptionlist}
+            \item[Simple matching coefficient] \marginnote{Simple matching coefficient}
+                $\texttt{SMC}(p, q) = \frac{M_{00} + M_{11}}{M_{00} + M_{01} + M_{10} + M_{11}}$
+            \item[Jaccard coefficient] \marginnote{Jaccard coefficient}
+                $\texttt{JC}(p, q) = \frac{M_{11}}{M_{01} + M_{10} + M_{11}}$
+        \end{descriptionlist}
+
+    \item[Cosine similarity] \marginnote{Cosine similarity}
+        Cosine of the angle between two vectors:
+        \[ \texttt{cos}(p, q) = \frac{p \cdot q}{\Vert p \Vert \cdot \Vert q \Vert} \]
+
+    \item[Extended Jaccard coefficient (Tanimoto)] \marginnote{Extended Jaccard coefficient (Tanimoto)}
+        Variation of the Jaccard coefficient for continuous values:
+        \[ \texttt{T}(p, q) = \frac{p \cdot q}{\Vert p \Vert^2 + \Vert q \Vert^2 - p \cdot q} \]
+\end{description}
+
+
+\subsection{Correlation}
+
+\begin{description}
+    \item[Pearson's correlation] \marginnote{Pearson's correlation}
+        Measure of linear relationship between a pair of quantitative attributes $e_1$ and $e_2$.
+        To compute Pearson's correlation, the values of $e_1$ and $e_2$ are first standardized and then ordered to obtain the vectors $\vec{e}_1$ and $\vec{e}_2$.
+        The correlation is then computed as the dot product between $\vec{e}_1$ and $\vec{e}_2$:
+        \[ \texttt{corr}(e_1, e_2) = \langle \vec{e}_1, \vec{e}_2 \rangle \]
+
+        Pearson's correlation has the following properties:
+        \begin{itemize}
+            \item If the variables are independent, then the correlation is 0 (but not vice versa).
+            \item If the correlation is 0, then there is no linear relationship between the variables.
+            \item $+1$ implies a perfect positive linear relationship, $-1$ implies a perfect negative linear relationship.
+        \end{itemize}
+
+    \item[Symmetric uncertainty]
+        Measure of correlation for nominal attributes:
+        \[ U(e_1, e_2) = 2 \frac{H(e_1) + H(e_2) - H(e_1, e_2)}{H(e_1) + H(e_2)} \in [0, 1] \]
+        where $H$ is the entropy.
+\end{description}
\ No newline at end of file
diff --git a/src/machine-learning-and-data-mining/sections/_data_prepro.tex b/src/machine-learning-and-data-mining/sections/_data_prepro.tex
new file mode 100644
index 0000000..95a1d88
--- /dev/null
+++ b/src/machine-learning-and-data-mining/sections/_data_prepro.tex
@@ -0,0 +1,163 @@
+\chapter{Data preprocessing}
+
+\section{Aggregation}
+\marginnote{Aggregation}
+
+Combining multiple attributes into a single one.
+Useful for:
+\begin{descriptionlist}
+    \item[Data reduction]
+        Reduce the number of attributes.
+
+    \item[Change of scale]
+        View the data at a more general level of detail (e.g. from cities and regions to countries).
+
+    \item[Data stability]
+        Aggregated data tend to have less variability.
+\end{descriptionlist}
+
+
+
+\section{Sampling}
+\marginnote{Sampling}
+Sampling can be used when the full dataset is too expensive to obtain or too expensive to process.
+A sample must be representative of the full dataset.
+
+Types of sampling techniques are:
+\begin{descriptionlist}
+    \item[Simple random] \marginnote{Simple random}
+        Extraction of a single element following a given probability distribution.
+ + \item[With replacement] \marginnote{With replacement} + Multiple extractions with repetitions following a given probability distribution + (i.e. multiple simple random extractions). + + If the population is small, the sample may underestimate the actual population. + + \item[Without replacement] \marginnote{Without replacement} + Multiple extractions without repetitions following a given probability distribution. + + \item[Stratified] \marginnote{Stratified} + Split the data and sample from each partition. + Useful when the partitions are homogeneous. +\end{descriptionlist} + +\begin{description} + \item[Sample size] + The sample size represents a tradeoff between data reduction and precision. + In a labeled dataset, it is important to consider the probability of sampling data from all the possible classes. +\end{description} + + + +\section{Dimensionality reduction} + +\begin{description} + \item[Curse of dimensionality] \marginnote{Curse of dimensionality} + Data with a high number of dimensions result in a sparse feature space + where distance metrics are ineffective. + + \item[Dimensionality reduction] \marginnote{Dimensionality reduction} + Useful to: + \begin{itemize} + \item Avoid the curse of dimensionality. + \item Reduce noise. + \item Reduce the time and space complexity of mining and learning algorithms. + \item Visualize multi-dimensional data. + \end{itemize} +\end{description} + +\subsection{Principal component analysis} \marginnote{PCA} +Projection of the data onto a lower-dimensional space that maximizes the variance of the projected data. +It can be proven that this problem can be solved by finding the eigenvectors of the covariance matrix of the data with the largest eigenvalues. + +\subsection{Feature subset selection} \marginnote{Feature subset selection} + Local technique to reduce dimensionality by: + \begin{itemize} + \item Removing redundant attributes. + \item Removing irrelevant attributes.
+ \end{itemize} + + This can be achieved by: + \begin{descriptionlist} + \item[Brute force] + Try all the possible subsets of features. + + \item[Embedded approach] + Feature selection is naturally done by the learning algorithm (e.g. decision trees). + + \item[Filter approach] + Features are filtered using domain-specific knowledge. + + \item[Wrapper approach] + A mining algorithm is used to select the best features. + \end{descriptionlist} + + + + +\section{Feature creation} +\marginnote{Feature creation} +Useful to help a learning algorithm capture data characteristics. +Possible approaches are: +\begin{descriptionlist} + \item[Feature extraction] + New features are extracted from the existing ones (e.g. from a picture of a face, the eye distance can be a new feature). + + \item[Mapping] + Projecting the data into a new feature space. + + \item[New features] + Add new, possibly redundant, features. +\end{descriptionlist} + + + +\section{Data type conversion} + +\subsection{One-hot encoding} \marginnote{One-hot encoding} + A discrete feature $E \in \{ e_1, \dots, e_n \}$ with $n$ unique values is replaced with + $n$ new binary features $H_{e_1}, \dots, H_{e_n}$ each corresponding to a value of $E$. + For each entry, if its feature $E$ has value $e_i$, then $H_{e_i} = \texttt{true}$ and the rest are \texttt{false}. + +\subsection{Ordinal encoding} \marginnote{Ordinal encoding} + A feature whose values have an ordering can be converted into a consecutive sequence of integers + (e.g. ["good", "neutral", "bad"] $\mapsto$ [1, 0, -1]). + +\subsection{Discretization} \marginnote{Discretization} + Convert a continuous feature to a discrete one. + \begin{description} + \item[Binarization] \marginnote{Binarization} + Given a continuous feature and a threshold, + it can be replaced with a new binary feature that is \texttt{true} if the value is above the threshold and \texttt{false} otherwise.
+ + \item[Thresholding] \marginnote{Thresholding} + Same as binarization but using multiple thresholds. + + \item[K-bins] \marginnote{K-bins} + A continuous feature is discretized using $k$ bins, each represented by an integer from $0$ to $k-1$. + \end{description} + + + +\section{Attribute transformation} +Useful for normalizing features with different scales and outliers. + +\begin{description} + \item[Mapping] \marginnote{Mapping} + Map the domain of a feature into a new set of values (i.e. apply a function). + + \item[Standardization] \marginnote{Standardization} + Transform a feature with Gaussian distribution, mean $\mu$ and standard deviation $\sigma$ into a standard normal distribution (zero mean and unit variance). + \[ x' = \frac{x - \mu}{\sigma} \] + + \item[Rescaling] \marginnote{Rescaling} + Map a feature into a fixed range (e.g. scale to $[0, 1]$ or $[-1, 1]$). + + \item[Affine transformation] \marginnote{Affine transformation} + Apply a linear transformation to a feature before rescaling it. + This method is more robust to outliers. + + \item[Normalization] \marginnote{Normalization} + Normalize each data row to unit norm. +\end{description}