{"id":746,"date":"2021-10-30T11:06:42","date_gmt":"2021-10-30T11:06:42","guid":{"rendered":"https:\/\/www.riskideas.com\/?p=746"},"modified":"2023-08-01T14:53:17","modified_gmt":"2023-08-01T14:53:17","slug":"a-topological-perspective-of-linear-regression","status":"publish","type":"post","link":"https:\/\/www.riskideas.com\/index.php\/2021\/10\/30\/a-topological-perspective-of-linear-regression\/","title":{"rendered":"A Topological Perspective of Linear Regression"},"content":{"rendered":"\n<p>Suppose you were presented with the following four graphical representations of datasets <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-3332e73fbb027cfaaecf29fd35567875_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#123;&#40;&#120;&#95;&#105;&#44;&#32;&#121;&#95;&#105;&#41;&#92;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"67\" style=\"vertical-align: -5px;\"\/> with the question: which of the following data sets follow a linear model?<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/uploads\/2021\/08\/Screen-Shot-2021-08-03-at-20.13.44-1024x913.png\" alt=\"\" class=\"wp-image-809\" width=\"720\" height=\"641\" srcset=\"https:\/\/www.riskideas.com\/wp-content\/uploads\/2021\/08\/Screen-Shot-2021-08-03-at-20.13.44-1024x913.png 1024w, https:\/\/www.riskideas.com\/wp-content\/uploads\/2021\/08\/Screen-Shot-2021-08-03-at-20.13.44-300x267.png 300w, https:\/\/www.riskideas.com\/wp-content\/uploads\/2021\/08\/Screen-Shot-2021-08-03-at-20.13.44-768x685.png 768w, https:\/\/www.riskideas.com\/wp-content\/uploads\/2021\/08\/Screen-Shot-2021-08-03-at-20.13.44-100x89.png 100w, https:\/\/www.riskideas.com\/wp-content\/uploads\/2021\/08\/Screen-Shot-2021-08-03-at-20.13.44-864x770.png 864w, 
https:\/\/www.riskideas.com\/wp-content\/uploads\/2021\/08\/Screen-Shot-2021-08-03-at-20.13.44-1200x1070.png 1200w, https:\/\/www.riskideas.com\/wp-content\/uploads\/2021\/08\/Screen-Shot-2021-08-03-at-20.13.44.png 1492w\" sizes=\"auto, (max-width: 720px) 100vw, 720px\" \/><figcaption class=\"wp-element-caption\">Figure 1. Four Data Sets<\/figcaption><\/figure><\/div>\n\n\n<p>Almost everyone will agree that B is linear. Some might also add A to the list of linear models, and very few would say that C and D are linear. In this article I will show that A, B, and C are in fact linear models (with a caveat regarding C), and that D is linearizable, subject to the same caveat.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n\n\n\n<p>First we want to understand what it means to be linear. I find the most practical definition to be: <strong>if we can solve the model using least squares (a closed-form solution), then we have a linear model<\/strong>. But this definition hides a lot of interesting nuance. 
Let&#8217;s start with something classic: one-dimensional linear regression (which is what we have in dataset B).<\/p>\n\n\n\n<p>Most of us are familiar with the representation:<p class=\"ql-center-displayed-equation\" style=\"line-height: 17px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-87ab85b585c5f55b0f1847c6bfedee8c_l3.png\" height=\"17\" width=\"109\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#36;&#36;&#121;&#95;&#105;&#61;&#92;&#98;&#101;&#116;&#97;&#95;&#49;&#32;&#120;&#95;&#105;&#32;&#43;&#32;&#92;&#98;&#101;&#116;&#97;&#95;&#48;&#36;&#36;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>which is solved by least squares regression:<p class=\"ql-center-displayed-equation\" style=\"line-height: 22px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-dd14959f1078f34455c701a2a5af033a_l3.png\" height=\"22\" width=\"148\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#36;&#36;&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#92;&#98;&#101;&#116;&#97;&#125;&#61;&#40;&#88;&#94;&#84;&#32;&#88;&#41;&#94;&#123;&#45;&#49;&#125;&#88;&#94;&#84;&#32;&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#121;&#125;&#36;&#36;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>where <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-491096d0b064912168d4bbea954e1b10_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#92;&#98;&#101;&#116;&#97;&#125;&#61;&#91;&#92;&#98;&#101;&#116;&#97;&#95;&#48;&#44;&#32;&#92;&#98;&#101;&#116;&#97;&#95;&#49;&#93;&#94;&#84;\" title=\"Rendered by QuickLaTeX.com\" height=\"20\" 
width=\"98\" style=\"vertical-align: -5px;\"\/>, <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-81c8a6509e4dd8e2f647ea0082f0919c_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#121;&#125;&#61;&#91;&#121;&#95;&#48;&#44;&#32;&#46;&#46;&#46;&#44;&#32;&#121;&#95;&#78;&#93;&#94;&#84;\" title=\"Rendered by QuickLaTeX.com\" height=\"20\" width=\"123\" style=\"vertical-align: -5px;\"\/>, <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-f25ef5f1bb58c78bcb8e63f37072c87e_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#120;&#125;&#61;&#91;&#120;&#95;&#48;&#44;&#32;&#46;&#46;&#46;&#44;&#32;&#120;&#95;&#78;&#93;&#94;&#84;\" title=\"Rendered by QuickLaTeX.com\" height=\"20\" width=\"126\" style=\"vertical-align: -5px;\"\/> and <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-342e91f462c5e9bf819870b43a449160_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#88;&#61;&#91;&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#49;&#125;&#44;&#32;&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#120;&#125;&#93;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"77\" style=\"vertical-align: -5px;\"\/>. Unfortunately, simply memorizing formulas hides the magic at work underneath, so let us unpack least squares regression with a bit of derivation.<\/p>\n\n\n\n<p>The linear model above was somewhat misleading. 
More precisely, the full representation includes an error term:<p class=\"ql-center-displayed-equation\" style=\"line-height: 17px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-2d1434f9d15b97753f4af503ca983c69_l3.png\" height=\"17\" width=\"145\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#36;&#36;&#121;&#95;&#105;&#61;&#92;&#98;&#101;&#116;&#97;&#95;&#49;&#32;&#120;&#95;&#105;&#32;&#43;&#32;&#92;&#98;&#101;&#116;&#97;&#95;&#48;&#32;&#43;&#32;&#92;&#101;&#112;&#115;&#105;&#108;&#111;&#110;&#95;&#105;&#36;&#36;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>We can also write this model slightly differently:<p class=\"ql-center-displayed-equation\" style=\"line-height: 17px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-501930058c83369a406eaf09612354e5_l3.png\" height=\"17\" width=\"145\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#36;&#36;&#92;&#101;&#112;&#115;&#105;&#108;&#111;&#110;&#95;&#105;&#61;&#92;&#98;&#101;&#116;&#97;&#95;&#49;&#32;&#120;&#95;&#105;&#32;&#43;&#32;&#92;&#98;&#101;&#116;&#97;&#95;&#48;&#32;&#45;&#32;&#121;&#95;&#105;&#36;&#36;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>(Strictly, rearranging the first equation gives the negative of this expression. The sign of the error makes no difference below, because a zero-mean normal distribution is symmetric.)<\/p>\n\n\n\n<p>Now suppose that <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-f0aa232b2f53bc9466aa901bd42d2ed6_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#101;&#112;&#115;&#105;&#108;&#111;&#110;&#95;&#105;\" title=\"Rendered by QuickLaTeX.com\" height=\"11\" width=\"12\" style=\"vertical-align: -3px;\"\/> is normally distributed with mean zero and variance &#963;<sup>2<\/sup>; then we have:<meta 
charset=\"utf-8\"><p class=\"ql-center-displayed-equation\" style=\"line-height: 45px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-afa6c8fe4e6a071754df29be2f37e6b6_l3.png\" height=\"45\" width=\"495\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#36;&#36;&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#80;&#125;&#92;&#123;&#92;&#101;&#112;&#115;&#105;&#108;&#111;&#110;&#95;&#105;&#124;&#92;&#109;&#97;&#116;&#104;&#98;&#98;&#123;&#92;&#98;&#101;&#116;&#97;&#125;&#92;&#125;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#50;&#92;&#112;&#105;&#92;&#115;&#105;&#103;&#109;&#97;&#94;&#50;&#125;&#125;&#101;&#94;&#123;&#45;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#101;&#112;&#115;&#105;&#108;&#111;&#110;&#95;&#105;&#94;&#50;&#125;&#123;&#50;&#92;&#115;&#105;&#103;&#109;&#97;&#94;&#50;&#125;&#125;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#92;&#115;&#113;&#114;&#116;&#123;&#50;&#92;&#112;&#105;&#92;&#115;&#105;&#103;&#109;&#97;&#94;&#50;&#125;&#125;&#101;&#94;&#123;&#45;&#92;&#102;&#114;&#97;&#99;&#123;&#40;&#92;&#98;&#101;&#116;&#97;&#95;&#49;&#32;&#120;&#95;&#105;&#32;&#43;&#32;&#92;&#98;&#101;&#116;&#97;&#95;&#48;&#32;&#45;&#32;&#121;&#95;&#105;&#41;&#94;&#50;&#125;&#123;&#50;&#92;&#115;&#105;&#103;&#109;&#97;&#94;&#50;&#125;&#125;&#61;&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#80;&#125;&#92;&#123;&#40;&#120;&#95;&#105;&#44;&#121;&#95;&#105;&#41;&#124;&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#92;&#98;&#101;&#116;&#97;&#125;&#92;&#125;&#36;&#36;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>Taking the whole dataset together and assuming the errors are independent of one another, the joint distribution is the product of these densities:<p class=\"ql-center-displayed-equation\" style=\"line-height: 49px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> 
&nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-eb91ad15d136aebb7c93378cb1595cd0_l3.png\" height=\"49\" width=\"503\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#36;&#36;&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#80;&#125;&#92;&#123;&#92;&#123;&#40;&#120;&#95;&#105;&#44;&#121;&#95;&#105;&#41;&#92;&#125;&#124;&#92;&#109;&#97;&#116;&#104;&#98;&#98;&#123;&#92;&#98;&#101;&#116;&#97;&#125;&#92;&#125;&#61;&#92;&#112;&#114;&#111;&#100;&#95;&#123;&#105;&#125;&#123;&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#80;&#125;&#92;&#123;&#40;&#120;&#95;&#105;&#44;&#121;&#95;&#105;&#41;&#124;&#92;&#109;&#97;&#116;&#104;&#98;&#98;&#123;&#92;&#98;&#101;&#116;&#97;&#125;&#92;&#125;&#125;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#40;&#92;&#115;&#113;&#114;&#116;&#123;&#50;&#92;&#112;&#105;&#92;&#115;&#105;&#103;&#109;&#97;&#94;&#50;&#125;&#41;&#94;&#78;&#125;&#101;&#94;&#123;&#45;&#92;&#115;&#117;&#109;&#95;&#123;&#105;&#125;&#123;&#92;&#102;&#114;&#97;&#99;&#123;&#40;&#92;&#98;&#101;&#116;&#97;&#95;&#49;&#32;&#120;&#95;&#105;&#32;&#43;&#32;&#92;&#98;&#101;&#116;&#97;&#95;&#48;&#32;&#45;&#32;&#121;&#95;&#105;&#41;&#94;&#50;&#125;&#123;&#50;&#92;&#115;&#105;&#103;&#109;&#97;&#94;&#50;&#125;&#125;&#125;&#36;&#36;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p><\/p>\n\n\n\n<p>To get the best model we want to maximize the likelihood of the joint distribution over the parameter <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-b6a7605b1bcca8f1b416eaf733f34e08_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#98;&#101;&#116;&#97;\" title=\"Rendered by QuickLaTeX.com\" height=\"17\" width=\"11\" style=\"vertical-align: -4px;\"\/>. 
Instead of maximizing the likelihood directly, we can maximize its logarithm. Because the log is strictly increasing, both reach their maximum at the same point. Taking the log turns the product into a sum: a term that does not depend on the parameters, minus the sum of squared errors divided by 2&#963;<sup>2<\/sup>. Maximizing the log-likelihood is therefore the same as minimizing the sum of squared errors, which is exactly least squares regression. Setting the gradient of that sum to zero gives the normal equations, whose solution is the closed-form formula above.<\/p>\n\n\n\n<p>Considering this derivation, we can now give a more fundamental definition of a linear model:<\/p>\n\n\n\n<p><strong>A data generating process is linear if, for any sufficiently large sample <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-93ff567377a8672272f238b014314a75_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#123;&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#120;&#125;&#95;&#105;&#92;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"32\" style=\"vertical-align: -5px;\"\/>, there exist functions:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-7b169beeff64bb5e342dead5ab1c933a_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#102;&#40;&#92;&#98;&#101;&#116;&#97;&#124;&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#120;&#125;&#95;&#105;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"56\" style=\"vertical-align: -5px;\"\/>, which is linear in its parameters <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-2475cf9241d72983cc2b3012a014b8de_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#92;&#98;&#101;&#116;&#97;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"17\" width=\"11\" style=\"vertical-align: -4px;\"\/><\/strong><\/li>\n\n\n\n<li><strong><img 
loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-3d59876b10edc9eb7903f5f56394ed65_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#103;&#40;&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#120;&#125;&#95;&#105;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"38\" style=\"vertical-align: -5px;\"\/> that is non-constant over <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-ecf030f940f517787145a85f7a90f283_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#109;&#97;&#116;&#104;&#98;&#102;&#123;&#120;&#125;&#95;&#105;\" title=\"Rendered by QuickLaTeX.com\" height=\"11\" width=\"16\" style=\"vertical-align: -3px;\"\/><\/strong><\/li>\n<\/ul>\n\n\n\n<p><strong>such that the sum of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-9c09a708375fde2676da319bcdfe8b24_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#102;\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"10\" style=\"vertical-align: -4px;\"\/> and <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-d208fd391fa57c168dc0f151de829fee_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#103;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"9\" style=\"vertical-align: -4px;\"\/> describes an N-dimensional normal distribution over the data.<\/strong><\/p>\n\n\n\n<p>Let&#8217;s take this definition and see if we can apply it to cases A to D above.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n\n\n\n<h6 class=\"wp-block-heading\">Model A<\/h6>\n\n\n\n<p>If we define <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-700028a77f5a80568629f7c6fd780a68_l3.png\" 
class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#102;&#40;&#40;&#92;&#109;&#117;&#95;&#120;&#44;&#32;&#92;&#109;&#117;&#95;&#121;&#41;&#124;&#40;&#120;&#95;&#105;&#44;&#32;&#121;&#95;&#105;&#41;&#41;&#61;&#91;&#92;&#109;&#117;&#95;&#120;&#44;&#32;&#92;&#109;&#117;&#95;&#121;&#93;&#94;&#84;\" title=\"Rendered by QuickLaTeX.com\" height=\"21\" width=\"230\" style=\"vertical-align: -6px;\"\/> and <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-578321152bdda62e596dc12c8dbacdeb_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#103;&#40;&#120;&#95;&#105;&#44;&#32;&#121;&#95;&#105;&#41;&#61;&#91;&#45;&#120;&#95;&#105;&#44;&#32;&#45;&#121;&#95;&#105;&#93;&#94;&#84;\" title=\"Rendered by QuickLaTeX.com\" height=\"20\" width=\"169\" style=\"vertical-align: -5px;\"\/>, then their sum (the offset between each point and the center of the cloud) follows a 2-dimensional normal distribution, <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-f33c3de189bb21c083603e240d7f0053_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#78;&#40;&#48;&#44;&#92;&#83;&#105;&#103;&#109;&#97;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"59\" style=\"vertical-align: -5px;\"\/>. 
The code below demonstrates this (including the code used to generate the data):<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import numpy as np\n\n# Generate data set A: 500 points from a normal cloud centered at (0, 0)\nrands = np.random.normal(size=1000)\nx = rands[:500]\/10\ny = rands[500:]\/10\n\n# Treat both coordinates as dependent variables: Y has shape (2, 500, 1)\nY = np.array([x, y])\nY = np.expand_dims(Y, -1)\n# The only regressor is a constant column of ones (same shape as Y)\nX = np.ones(Y.shape)\n# Batched normal equations, beta = (X^T X)^-1 X^T Y, one regression per coordinate\nbeta = np.squeeze(np.matmul(np.matmul(np.linalg.inv(np.matmul(np.transpose(X, [0, 2, 1]), X)), np.transpose(X, [0, 2, 1])), Y))\n\nprint(beta)  # prints (will differ slightly per run): [-0.00205515 -0.00054436]\nprint(np.allclose(beta, [x.mean(), y.mean()]))  # True: beta is the sample mean of each coordinate<\/pre>\n\n\n\n<p>Before moving to Model B, let&#8217;s first analyze this result. Firstly, <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-ee586888106a522acbe0acaa15e83e6f_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#40;&#92;&#109;&#117;&#95;&#120;&#44;&#32;&#92;&#109;&#117;&#95;&#121;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"20\" width=\"58\" style=\"vertical-align: -6px;\"\/> turns out to be the sample mean of <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-3332e73fbb027cfaaecf29fd35567875_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#123;&#40;&#120;&#95;&#105;&#44;&#32;&#121;&#95;&#105;&#41;&#92;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"67\" style=\"vertical-align: -5px;\"\/>. With a column of ones as the only regressor, the least squares formula reduces to the sum of the observations divided by their number. 
Secondly, we use the data <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-3332e73fbb027cfaaecf29fd35567875_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#123;&#40;&#120;&#95;&#105;&#44;&#32;&#121;&#95;&#105;&#41;&#92;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"67\" style=\"vertical-align: -5px;\"\/> as the dependent variables and let the regressor be constant. This second point is very important. Regressing one coordinate on the other would instead force, say, y to be a function of x, which is not the structure of this data. Thirdly (and finally), we ran two independent linear regressions, one per coordinate, which together describe a 2-dimensional linear model:<p class=\"ql-center-displayed-equation\" style=\"line-height: 22px;\"><span class=\"ql-right-eqno\"> &nbsp; <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-73af72824b60725ad5238f7feeb7de88_l3.png\" height=\"22\" width=\"227\" class=\"ql-img-displayed-equation quicklatex-auto-format\" alt=\"&#36;&#36;&#91;&#120;&#95;&#105;&#44;&#32;&#121;&#95;&#105;&#93;&#94;&#84;&#32;&#61;&#32;&#91;&#92;&#109;&#117;&#95;&#120;&#44;&#32;&#92;&#109;&#117;&#95;&#121;&#93;&#94;&#84;&#32;&#43;&#32;&#78;&#40;&#48;&#44;&#32;&#92;&#83;&#105;&#103;&#109;&#97;&#41;&#36;&#36;\" title=\"Rendered by QuickLaTeX.com\"\/><\/p>where <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-3a56934ebacb780284bd45586e00fcaa_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#78;&#40;&#48;&#44;&#32;&#92;&#83;&#105;&#103;&#109;&#97;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"59\" style=\"vertical-align: -5px;\"\/> denotes a draw from a zero-mean bivariate normal distribution with covariance matrix &#931;.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n\n\n\n<h6 class=\"wp-block-heading\">Model 
B<\/h6>\n\n\n\n<p>We did this one already, but let&#8217;s add it here for completeness. If we define <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-4886d175ea0780b3ae909992761ee0ca_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#102;&#40;&#40;&#92;&#98;&#101;&#116;&#97;&#95;&#48;&#44;&#32;&#92;&#98;&#101;&#116;&#97;&#95;&#49;&#41;&#124;&#40;&#120;&#95;&#105;&#44;&#32;&#121;&#95;&#105;&#41;&#41;&#61;&#92;&#98;&#101;&#116;&#97;&#95;&#48;&#43;&#92;&#98;&#101;&#116;&#97;&#95;&#49;&#32;&#120;&#95;&#105;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"234\" style=\"vertical-align: -5px;\"\/> and <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-6e694c8ee7bba09a6550c0d13b4f59f4_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#103;&#40;&#120;&#95;&#105;&#44;&#32;&#121;&#95;&#105;&#41;&#61;&#45;&#121;&#95;&#105;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"112\" style=\"vertical-align: -5px;\"\/>, then their sum is exactly the error term from the derivation above and follows a 1-dimensional normal distribution. And the code:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import numpy as np\n\n# Generate data set B: 500 evenly spaced x values on [-1, 1) plus noise, and a noisy line y = 2x + 1\nrands = np.random.normal(size=1000)\nx = np.arange(-1, 1, 0.01\/5*2) + rands[:500]\/10\ny = 2*x+1 + rands[500:]\/10\n\nY = y\n# Design matrix X = [1, x] with shape (500, 2)\nX = np.expand_dims(x, -1)\nX = np.concatenate([np.ones(X.shape), X], -1)\n# Normal equations: beta = (X^T X)^-1 X^T y\nbeta = np.squeeze(np.matmul(np.matmul(np.linalg.inv(np.matmul(np.transpose(X, [1, 0]), X)), np.transpose(X, [1, 0])), Y))\n\nprint(beta)  # prints (will differ slightly per run): [0.99938605 1.94454423]<\/pre>\n\n\n\n<p>And the results are as expected: an intercept close to 1 and a slope close to 2. 
We&#8217;ll move on to the next model, since there is not much new to see here.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n\n\n\n<h6 class=\"wp-block-heading\">Model C<\/h6>\n\n\n\n<p>For this one, let&#8217;s try a polynomial. If we define <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-6a1799d08206316a2a6d0cf2e6d8a62b_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#102;&#40;&#40;&#92;&#98;&#101;&#116;&#97;&#95;&#48;&#32;&#46;&#46;&#46;&#32;&#92;&#98;&#101;&#116;&#97;&#95;&#52;&#41;&#124;&#40;&#120;&#95;&#105;&#44;&#32;&#121;&#95;&#105;&#41;&#41;&#61;&#92;&#98;&#101;&#116;&#97;&#95;&#48;&#43;&#92;&#98;&#101;&#116;&#97;&#95;&#49;&#32;&#120;&#95;&#105;&#43;&#92;&#98;&#101;&#116;&#97;&#95;&#50;&#32;&#120;&#95;&#105;&#94;&#50;&#43;&#92;&#98;&#101;&#116;&#97;&#95;&#51;&#32;&#120;&#95;&#105;&#94;&#51;&#32;&#43;&#32;&#92;&#98;&#101;&#116;&#97;&#95;&#52;&#32;&#120;&#95;&#105;&#94;&#52;\" title=\"Rendered by QuickLaTeX.com\" height=\"20\" width=\"412\" style=\"vertical-align: -5px;\"\/> and <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-6e694c8ee7bba09a6550c0d13b4f59f4_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#103;&#40;&#120;&#95;&#105;&#44;&#32;&#121;&#95;&#105;&#41;&#61;&#45;&#121;&#95;&#105;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"112\" style=\"vertical-align: -5px;\"\/>, then their sum is again 1-dimensional normal. 
The code:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import numpy as np\n\n# Generate data set C: noise-free x on [-1, 1) and a noisy quadratic y = 1 + x + 2x^2\nrands = np.random.normal(size=1000)\nx = np.arange(-1, 1, 0.01\/5*2)\ny = x+1+2*x**2 + rands[500:]\/10\n\nY = y\n# Polynomial design matrix [1, x, x^2, x^3, x^4]: nonlinear in x but still linear in beta\nX = np.expand_dims(x, -1)\nX = np.concatenate([np.ones(X.shape), X, X**2, X**3, X**4], -1)\nbeta = np.squeeze(np.matmul(np.matmul(np.linalg.inv(np.matmul(np.transpose(X, [1, 0]), X)), np.transpose(X, [1, 0])), Y))\n\nprint(beta)  # prints (will differ slightly per run): [0.97756608  1.01012193  2.10034334 -0.0032706  -0.10355826]<\/pre>\n\n\n\n<p>Here we do have some new elements to analyze, so let&#8217;s take a moment. Firstly, notice that we did not add any randomness to the regressor, <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-c8700e0258243116de0d4f288e2e3b44_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#120;&#95;&#105;\" title=\"Rendered by QuickLaTeX.com\" height=\"11\" width=\"15\" style=\"vertical-align: -3px;\"\/>. This matters for linearity. When the dependent variable and the regressors entered the model linearly, any noise attached to them was added linearly as well, so the error term stayed normally distributed. However, squaring (or cubing, and so on) a normally distributed error does not give a normally distributed error. For example, the square of a standard normal variable follows a chi-squared distribution with one degree of freedom &#8211; try this out! Secondly, a second-order polynomial would have been enough. Even when fitting up to fourth order, the first three coefficients come out close to their true values (1, 1 and 2), while the higher-order ones stay small.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n\n\n\n<h6 class=\"wp-block-heading\">Model D<\/h6>\n\n\n\n<p>Some data that looks nonlinear hides a linear model inside it. Cyclic time series are one such example. 
Here we don&#8217;t have a time series per se, but the data does exhibit some cyclicality. Suitable functions are <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-3e90617dfda941424d117b99c473cb60_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#102;&#40;&#40;&#92;&#98;&#101;&#116;&#97;&#95;&#48;&#44;&#32;&#92;&#98;&#101;&#116;&#97;&#95;&#49;&#41;&#124;&#40;&#120;&#95;&#105;&#44;&#32;&#121;&#95;&#105;&#41;&#41;&#61;&#92;&#98;&#101;&#116;&#97;&#95;&#48;&#32;&#43;&#32;&#92;&#98;&#101;&#116;&#97;&#95;&#49;&#32;&#92;&#97;&#114;&#99;&#116;&#97;&#110;&#40;&#92;&#102;&#114;&#97;&#99;&#123;&#121;&#95;&#105;&#125;&#123;&#120;&#95;&#105;&#125;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"22\" width=\"301\" style=\"vertical-align: -8px;\"\/> and <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-727110207a26c6dc732158b62d4c72f6_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#103;&#40;&#120;&#95;&#105;&#44;&#121;&#95;&#105;&#41;&#61;&#92;&#115;&#113;&#114;&#116;&#123;&#120;&#95;&#105;&#94;&#50;&#43;&#121;&#95;&#105;&#94;&#50;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"32\" width=\"159\" style=\"vertical-align: -10px;\"\/>. Note that the code below regresses the radius on the angle. This corresponds to taking <em>g<\/em> with a minus sign, following the convention of Models B and C. With <em>g<\/em> exactly as written, the fitted coefficients would simply change sign. 
And the code:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import numpy as np\n\n# Generate data set D: points near the unit circle with a noisy angle and a noisy radius\nrands = np.random.normal(size=1000)\nt = np.arange(-1, 1, 0.01\/5*2)\n# The angle pi * t sweeps one full turn; the same angle noise is used for x and y\nx = np.cos((t + rands[:500]\/10)*np.pi)*(1 + rands[500:]\/10)\ny = np.sin((t + rands[:500]\/10)*np.pi)*(1 + rands[500:]\/10)\n\n# Change to polar coordinates; np.arctan(y\/x) folds the angle into (-pi\/2, pi\/2)\nr = np.sqrt(x**2 + y**2)\ntt = np.arctan(y\/x)\n\n# Regress the radius on a constant and the angle\nY = r\nX = np.expand_dims(tt, -1)\nX = np.concatenate([np.ones(X.shape), X], -1)\nbeta = np.squeeze(np.matmul(np.matmul(np.linalg.inv(np.matmul(np.transpose(X, [1, 0]), X)), np.transpose(X, [1, 0])), Y))\n\nprint(beta)  # prints (will differ slightly per run): [0.99578537 -0.0028933]<\/pre>\n\n\n\n<p>We have much more to say here. Firstly, the data fits a circle almost perfectly. We could have dropped the <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-0218b80f67ea2ecfd9ef6eb775f54541_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#97;&#114;&#99;&#116;&#97;&#110;\" title=\"Rendered by QuickLaTeX.com\" height=\"13\" width=\"50\" style=\"vertical-align: -1px;\"\/> term altogether, since its fitted coefficient is close to zero. In other words, we essentially have Model A in a different coordinate system (polar rather than Cartesian). Secondly, notice that, unlike in Model A, we started with two input variables and ended up with only one average. One could say that we lost some information, but that is not entirely true. If we tried to fit the data from Model A with this model, we would get some really bad results &#8211; again, I recommend trying it out. What we gained here is the relationship between the two inputs: their sum of squares is approximately 1. Try as you may, you won&#8217;t get better results. 
Simply put, we recovered the structure of the underlying generating model, so adding anything more would only fit the noise.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n\n\n\n<p>In this last section I want to bring all these examples together and circle back to the definition of a linear model given above. Firstly, one may ask what good it is to build models on transformations of the original data, since what interests us is the data itself rather than some derived quantity &#8211; i.e. we want some <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-ec88996bf6548406fe4b6cb46d054581_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#121;&#61;&#102;&#40;&#120;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"67\" style=\"vertical-align: -5px;\"\/>. Two remarks:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The representation above is well suited to anomaly detection, and this is how many statistical anomaly detection methods work. They detect some underlying pattern in the data, in both <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-0af556714940c351c933bba8cf840796_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#121;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"9\" style=\"vertical-align: -4px;\"\/> and <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-ede05c264bba0eda080918aaa09c4658_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#120;\" title=\"Rendered by QuickLaTeX.com\" height=\"8\" width=\"10\" style=\"vertical-align: 0px;\"\/> jointly, and check whether new points lie within that pattern. 
The framework above fits this use case naturally.<\/li>\n\n\n\n<li>If we aim only for <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-ec88996bf6548406fe4b6cb46d054581_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#121;&#61;&#102;&#40;&#120;&#41;\" title=\"Rendered by QuickLaTeX.com\" height=\"19\" width=\"67\" style=\"vertical-align: -5px;\"\/>, then we presuppose that <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-0af556714940c351c933bba8cf840796_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#121;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"9\" style=\"vertical-align: -4px;\"\/> is the dependent variable and <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-ede05c264bba0eda080918aaa09c4658_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#120;\" title=\"Rendered by QuickLaTeX.com\" height=\"8\" width=\"10\" style=\"vertical-align: 0px;\"\/> the regressor, when it could be the other way around, or neither. In fact, this way of thinking about regression cannot explain something like Models A and D.<\/li>\n<\/ol>\n\n\n\n<p>If we instead adopt the <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-2751c9330febc2815678351066922f0d_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#98;&#102;&#109;&#97;&#116;&#104;&#123;&#102;&#125;&#44;&#92;&#98;&#102;&#109;&#97;&#116;&#104;&#123;&#103;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"26\" style=\"vertical-align: -4px;\"\/> approach above, we can model a larger class of relationships, including the behavior in Model D. 
Furthermore, we have a closed-form technique to solve for the model, which greatly saves compute time (and degrees of freedom) when the model is one component of a more complicated, non-linear model that would otherwise be fit entirely by gradient descent.<\/p>\n\n\n\n<p>However, we need to be careful in how we think of &#8220;linearizing&#8221; models. An important criterion is that <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-1e66a900ab2e13679b7a834c5232b379_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#98;&#102;&#109;&#97;&#116;&#104;&#123;&#102;&#125;&#43;&#92;&#98;&#102;&#109;&#97;&#116;&#104;&#123;&#103;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"41\" style=\"vertical-align: -4px;\"\/> must be normally distributed over the data, while <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-27d09c3d038252be9437c2d269778d70_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#98;&#102;&#109;&#97;&#116;&#104;&#123;&#102;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"10\" style=\"vertical-align: -4px;\"\/> is linear in the parameters and <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.riskideas.com\/wp-content\/ql-cache\/quicklatex.com-2d71b70b956918239e3d15cd650cb4ae_l3.png\" class=\"ql-img-inline-formula quicklatex-auto-format\" alt=\"&#92;&#98;&#102;&#109;&#97;&#116;&#104;&#123;&#103;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"9\" style=\"vertical-align: -4px;\"\/> is non-constant over the data. These conditions do not always hold, so we must check (and test) them before accepting the results.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Suppose you were presented with the following four graphical representations of datasets with the question: which of the following data sets follow a linear model? 
Almost everyone will agree that &#8230;<\/p>\n","protected":false},"author":1,"featured_media":825,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[79,76],"tags":[],"class_list":["post-746","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data","category-math"],"_links":{"self":[{"href":"https:\/\/www.riskideas.com\/index.php\/wp-json\/wp\/v2\/posts\/746","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.riskideas.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.riskideas.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.riskideas.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.riskideas.com\/index.php\/wp-json\/wp\/v2\/comments?post=746"}],"version-history":[{"count":72,"href":"https:\/\/www.riskideas.com\/index.php\/wp-json\/wp\/v2\/posts\/746\/revisions"}],"predecessor-version":[{"id":873,"href":"https:\/\/www.riskideas.com\/index.php\/wp-json\/wp\/v2\/posts\/746\/revisions\/873"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.riskideas.com\/index.php\/wp-json\/wp\/v2\/media\/825"}],"wp:attachment":[{"href":"https:\/\/www.riskideas.com\/index.php\/wp-json\/wp\/v2\/media?parent=746"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.riskideas.com\/index.php\/wp-json\/wp\/v2\/categories?post=746"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.riskideas.com\/index.php\/wp-json\/wp\/v2\/tags?post=746"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}