Eurisko feed (Jekyll), updated 2023-07-09T14:34:19-07:00: https://eurisko.us/feed.xml
Debugging 101, published 2021-06-01T00:00:00-07:00: https://eurisko.us/debugging101
<p><i>Note to reader: this post was written before students had access to VS Code and its associated debugging capabilities. Consequently, some valuable debugging tools like breakpoints are not covered in the post.</i></p>
<p>Debugging can seem like a difficult challenge, but once you understand how to see what’s under the hood, it becomes much simpler. Many of your problems may not actually be problems at all, but simple placement or division errors. Hopefully these 5 tips will help solve all your problems – they have for me!</p>
<h2>Tip 1: Look at the Error</h2>
<p>If something is wrong, the error will give you the line, file, and sometimes even tell you how it went wrong. This is the most helpful for silly errors as the console will tell you if something is not defined, the wrong type, dividing by 0, etc. Once you know the error, it’ll either be an easy fix or a big problem.</p>
<p>For definition/attribute errors, check the variable names and look around where they are defined. You might have misspelled the name by accident or defined the variable after you called it or the function it’s in. Other than that, check the lines around the error and make sure your parentheses, brackets, curly brackets, quotes, and apostrophes are closed.</p>
<h3>Example 1.1: Misspelled, Missing and Misplaced Items</h3>
<p>Consider the code below. This is the proper version, which sets <code>sum</code> to 6 with no errors.</p>
<font size="3em">
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%">[<span style="color: #0000DD; font-weight: bold">1</span>] one <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">1</span>
[<span style="color: #0000DD; font-weight: bold">2</span>] five <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">5</span>
[<span style="color: #0000DD; font-weight: bold">3</span>] <span style="color: #007020">sum</span> <span style="color: #333333">=</span> one <span style="color: #333333">+</span> five
</pre></div>
</font>
<p><br /></p>
<p>Now, consider the following three snippets. The first snippet has a misspelling.</p>
<font size="3em">
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%">[<span style="color: #0000DD; font-weight: bold">1</span>] one <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">1</span>
[<span style="color: #0000DD; font-weight: bold">2</span>] fove <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">5</span>
[<span style="color: #0000DD; font-weight: bold">3</span>] <span style="color: #007020">sum</span> <span style="color: #333333">=</span> one <span style="color: #333333">+</span> five
</pre></div>
</font>
<p><br /></p>
<p>The second snippet has a missing item.</p>
<font size="3em">
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%">[<span style="color: #0000DD; font-weight: bold">1</span>] one <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">1</span>
[<span style="color: #0000DD; font-weight: bold">2</span>]
[<span style="color: #0000DD; font-weight: bold">3</span>] <span style="color: #007020">sum</span> <span style="color: #333333">=</span> one <span style="color: #333333">+</span> five
</pre></div>
</font>
<p><br /></p>
<p>The third snippet has a misplaced item.</p>
<font size="3em">
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%">[<span style="color: #0000DD; font-weight: bold">1</span>] one <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">1</span>
[<span style="color: #0000DD; font-weight: bold">2</span>]
[<span style="color: #0000DD; font-weight: bold">3</span>] <span style="color: #007020">sum</span> <span style="color: #333333">=</span> one <span style="color: #333333">+</span> five
[<span style="color: #0000DD; font-weight: bold">4</span>] five <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">5</span>
</pre></div>
</font>
<p><br /></p>
<p>All three snippets produce the same error message:</p>
<font size="3em">
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%">File <span style="background-color: #fff0f0">"file_name.py"</span>, line <span style="color: #0000DD; font-weight: bold">3</span>, <span style="color: #000000; font-weight: bold">in</span> <span style="color: #333333"><</span>module<span style="color: #333333">></span>
<span style="color: #007020">sum</span> <span style="color: #333333">=</span> one <span style="color: #333333">+</span> five
<span style="color: #FF0000; font-weight: bold">NameError</span>: name <span style="background-color: #fff0f0">'five'</span> <span style="color: #000000; font-weight: bold">is</span> <span style="color: #000000; font-weight: bold">not</span> defined
</pre></div>
</font>
<p><br /></p>
<p>Any problem with an undefined name raises a <code>NameError</code>. Attribute errors (these have to do with classes) have the same causes: a misspelled, missing, or misplaced attribute.</p>
<h3>Example 1.2: Wrong Types</h3>
<p>Type errors are very simple: they occur when you try to put two things together that don’t go together. The two kinds I see most often are as follows.</p>
<ul>
<li>Any code of the form <code>'string' + number</code> will have an error code <code>TypeError: can only concatenate str (not "int") to str</code>. With strings, the $+$ will put two strings together. But if the second variable is an integer and not a string, the console will print that it cannot concatenate str with integers. If it was a different kind of variable, the console would say what the variable type is in the quotation marks.</li>
<li>Any code of the form <code>None + number</code> will have an error code <code>TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'</code>. This is saying that there is no way to combine these two types using the $+$ sign, and it then displays the two types you are trying to use, in the order they appear in the code. To fix either of these errors, make sure the mismatched types never meet: for example, guard the operation with an if statement or convert one value to the right type first.</li>
</ul>
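<p>As a rough illustration of those two fixes (changing the type first, or guarding with an if statement), here is a minimal Python sketch. The names <code>label</code>, <code>count</code>, and <code>add_bonus</code> are made up for this example.</p>

```python
label = "Total: "
count = 3

# label + count would raise:
# TypeError: can only concatenate str (not "int") to str
message = label + str(count)  # convert the int to a str first
print(message)


def add_bonus(score, bonus=10):
    # Guard against None so we never evaluate None + number.
    if score is None:
        return bonus
    return score + bonus


print(add_bonus(None))
print(add_bonus(5))
```

<p>Which fix is appropriate depends on what a missing value should mean in your program; returning just the bonus is only one possible choice.</p>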
<h3>Example 1.3: Zero Division</h3>
<p>This is really self-explanatory. Any time you wind up dividing by zero, the console will spit out <code>ZeroDivisionError: division by zero</code>.</p>
<h3>Example 1.4: Syntax Error</h3>
<p>Syntax errors come from missing opening or closing quotation marks, parentheses, brackets, curly brackets, etc., or from anything typed in a way that leaves the computer confused about what you’re trying to tell it to do. The error message usually says where the syntax goes wrong, so you can head to that area and fix it. If you don’t understand the error message, go to Tip 4.</p>
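<p>To see what information a syntax error carries, one option (a sketch, assuming Python 3) is to hand a deliberately broken string to the built-in <code>compile</code> function and inspect the exception. The source string and file name below are invented for the demonstration, and the exact wording of the message varies between Python versions.</p>

```python
# A hypothetical one-line program with an unclosed parenthesis.
broken_source = "total = (1 + 2"

error = None
try:
    compile(broken_source, "example.py", "exec")
except SyntaxError as e:
    error = e
    # lineno says where to look; msg says what Python was unhappy about.
    print("line", e.lineno, ":", e.msg)
```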
<h2>Tip 2: Look Under the Hood</h2>
<p>There are times where no error is thrown, but you still aren’t getting the right output from your code. During times like this it is best to print out your variables. Printing out your variables can also help show if you are in an infinite while loop, as those will just go forever.</p>
<p>In general, I start by following the returned variable (abbreviated RV) from its initialization to its return to see exactly what happens to it. If the RV is based off of one or more intermediate variables, check first to see if the RV was ever correct in the first place. If not, just keep on backtracking until you get to a variable that IS computed correctly.</p>
<p>Tip 3 might be helpful here depending on how long your code is. Closely monitor and double check every way that variable is changed until the RV is finally correct. Now to double check that everything is working properly, try to run your function with a different set of data. You should know what the correct answer should be and the steps between before you try.</p>
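<p>Here is a small sketch of what following the RV can look like in practice. The function <code>average_of_squares</code> is a made-up example; the point is only that each intermediate variable is printed so it can be compared against a hand-computed value.</p>

```python
def average_of_squares(values):
    squares = [v * v for v in values]
    print("squares:", squares)      # first intermediate variable
    total = sum(squares)
    print("total:", total)          # second intermediate variable
    result = total / len(values)    # the returned variable (RV)
    print("result:", result)
    return result


# Known input with a known answer worked out by hand: (1 + 4 + 9) / 3
checked = average_of_squares([1, 2, 3])
```

<p>If the final number is wrong, the printed values show which step first departs from the hand calculation.</p>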
<h3>Example 2.1: Infinite While Loops</h3>
<p>A “while loop” is a kind of loop that will keep on restarting as long as its condition is true. A while loop becomes infinite when its condition is never false at the moment it is checked (the start of each run through) and there is either no “break” command or the “break” command is never triggered.</p>
<p>You can check if your loop is infinite by printing out the values of any variables involved in the condition. If things start going too fast for you to read, you can import the <code>time</code> library and use <code>time.sleep(number of seconds)</code> to make the code pause before continuing.</p>
<p>If the condition variables go past the stopping condition, check what values they take at the end of the looped code. You may need to make it a break condition instead (just the “break” command inside an if statement). If the condition variable is cycling, check what it is each time you change it. You may have a logical problem in your code.</p>
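<p>The sketch below puts these ideas together on an invented loop whose condition variable steps past its stopping value. The variable names and the <code>MAX_PASSES</code> safety cap are assumptions added so that the demonstration always terminates.</p>

```python
import time

# Hypothetical buggy loop: n is supposed to stop at 10 but steps right over it.
n = 1
passes = 0
MAX_PASSES = 8  # safety cap so this demo always stops

while n != 10:
    print("n =", n)           # watch the condition variable on every pass
    time.sleep(0.01)          # slow the output down enough to read it
    n += 2                    # 1, 3, 5, 7, 9, 11, ... never equals 10
    passes += 1
    if passes >= MAX_PASSES:  # a break condition inside an if statement
        print("stopping: n went past 10 without ever equalling it")
        break
```

<p>In this case the printed values make the cause visible, and changing the condition to <code>n &lt; 10</code> would be one way to fix it.</p>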
<h2>Tip 3: Separate into Chunks</h2>
<p>As your coding gets more advanced, the code you write generally gets longer. Longer code is more difficult to debug, so if there is any code that you don’t actually need, delete it. This reduces how much code you have to look through when something goes wrong. Of course, don’t simplify to the point that you can’t read your code, but try to remove as many unnecessary things as possible.</p>
<p>Second, generalize and create helper functions and child classes when possible. In general, try to break up your code so that in the long run there is less code to go through each time something breaks.</p>
<ul>
<li><i>Helper functions</i> are functions that carry out subtasks involved in your main function. I personally like to create helper functions for things that will be run over and over or things that are long and complex. For example, in almost all of my various graph classes, <code>set_breadth_first_distance_and_previous</code> is a long, semi-complex bit of code that is run multiple times and is just part of what the main function accomplishes. We call this in <code>calc_distance</code> and <code>calc_shortest_path</code>.</li>
<li>A <i>child class</i> gains access to all of the parent class's methods. This means that you can state one or more core methods inside of one class and then create child classes off of it. Each child class now has access to these core methods without having to restate them. Here is how you state a child class: <code>class ChildClassName (ParentClassName): ...</code></li>
</ul>
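<p>A minimal sketch of both ideas follows. The helper <code>clamp</code> and the classes <code>Shape</code>, <code>Rectangle</code>, and <code>Triangle</code> are hypothetical names chosen for illustration, not code from any particular assignment.</p>

```python
def clamp(value, low, high):
    # Helper function: one small subtask reused wherever it is needed.
    return max(low, min(high, value))


class Shape:
    def __init__(self, width, height):
        self.width = clamp(width, 0, 100)
        self.height = clamp(height, 0, 100)

    def describe(self):
        # Core method shared by every child class.
        return f"{type(self).__name__} with area {self.area()}"


class Rectangle(Shape):
    def area(self):
        return self.width * self.height


class Triangle(Shape):
    def area(self):
        return self.width * self.height / 2


print(Rectangle(3, 4).describe())
print(Triangle(3, 4).describe())
```

<p>If <code>describe</code> ever misbehaves, there is only one copy of it to inspect, which is the point of stating core methods once in the parent class.</p>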
<p>Now that your code is hopefully simplified, it’s time to find where things go wrong. You want to separate your code into “chunks” so that it’s more of a sieve. I would suggest making a copy of your code at this point. Find the general area where the code goes wrong, and then begin dissecting it. Check each helper function to see if it returns the proper thing, expand out comprehensions, and make sure everything is being set properly. Sometimes, everything will be correct except for one variable. Keep tracing back the error by repeating the same process on the area where that variable came from.</p>
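<p>As one example of expanding out a comprehension, the sketch below rewrites a one-line list comprehension as an explicit loop so that every step can be printed. The helper <code>is_valid</code> and the sample data are invented for the illustration.</p>

```python
def is_valid(score):
    # Hypothetical helper: keep scores between 0 and 100.
    return 0 <= score <= 100


scores = [95, -3, 120, 70]

# Compact version: hard to inspect when the output looks wrong.
compact = [s * 2 for s in scores if is_valid(s)]

# Expanded version: same logic, but each step can be printed and checked.
expanded = []
for s in scores:
    keep = is_valid(s)
    print(s, "-> keep?", keep)
    if keep:
        expanded.append(s * 2)

print(compact == expanded)
```

<p>Once the bug is found, the loop can be collapsed back into the comprehension.</p>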
<h2>Tip 4: Look Up the Error Code</h2>
<p>I personally find this method VERY helpful for Haskell. Haskell can be very particular about what it accepts, where, and how, and its error messages are not the most readable. Let us go through an example from a semi-recent quiz, where all of us had a certain problem.</p>
<h3>Example 4.1: Looking Up a Haskell Error</h3>
<p>As you might remember in Quiz 23, Problem 2 was a Haskell problem on calculating the GPA Average. The code I turned in went like this:</p>
<font size="3em">
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%">calcPoints :: Char <span style="color: #333333">-></span> Int
calcPoints char
    <span style="color: #333333">|</span> char <span style="color: #333333">==</span> <span style="background-color: #fff0f0">'A'</span> <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">4</span>
    <span style="color: #333333">|</span> char <span style="color: #333333">==</span> <span style="background-color: #fff0f0">'B'</span> <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">3</span>
    <span style="color: #333333">|</span> char <span style="color: #333333">==</span> <span style="background-color: #fff0f0">'C'</span> <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">2</span>
    <span style="color: #333333">|</span> char <span style="color: #333333">==</span> <span style="background-color: #fff0f0">'D'</span> <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">1</span>
    <span style="color: #333333">|</span> char <span style="color: #333333">==</span> <span style="background-color: #fff0f0">'F'</span> <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">0</span>
calcTotalPoints <span style="color: #007020">list</span> <span style="color: #333333">=</span> <span style="color: #007020">sum</span>([calcPoints x <span style="color: #333333">|</span> x <span style="color: #333333"><-</span> <span style="color: #007020">list</span>])
calcGPA <span style="color: #007020">list</span> <span style="color: #333333">=</span> (calcTotalPoints <span style="color: #007020">list</span>) <span style="color: #333333">/</span> (length <span style="color: #007020">list</span>)
main <span style="color: #333333">=</span> <span style="color: #008800; font-weight: bold">print</span>( calcGPA [<span style="background-color: #fff0f0">'A'</span>, <span style="background-color: #fff0f0">'B'</span>, <span style="background-color: #fff0f0">'B'</span>, <span style="background-color: #fff0f0">'C'</span>, <span style="background-color: #fff0f0">'C'</span>, <span style="background-color: #fff0f0">'C'</span>, <span style="background-color: #fff0f0">'D'</span>, <span style="background-color: #fff0f0">'F'</span>] )
</pre></div>
</font>
<p><br /></p>
<p>If you remember this, you know it didn’t work. The first thing I did to fix it was to look at the error code thrown out:</p>
<font size="3em">
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%">No instance <span style="color: #008800; font-weight: bold">for</span> (Fractional Int) arising <span style="color: #008800; font-weight: bold">from</span> <span style="color: #0e84b5; font-weight: bold">a</span> <span style="color: #0e84b5; font-weight: bold">use</span> <span style="color: #0e84b5; font-weight: bold">of</span> <span style="background-color: #fff0f0">'/'</span>
In the expression: (calcTotalPoints <span style="color: #007020">list</span>) <span style="color: #333333">/</span> (length <span style="color: #007020">list</span>)
In an equation <span style="color: #008800; font-weight: bold">for</span> <span style="background-color: #fff0f0">'calcGPA'</span>:
    calcGPA <span style="color: #007020">list</span> <span style="color: #333333">=</span> (calcTotalPoints <span style="color: #007020">list</span>) <span style="color: #333333">/</span> (length <span style="color: #007020">list</span>)
</pre></div>
</font>
<p><br /></p>
<p>From that I knew the problem was in <code>calcGPA</code>, but I checked <code>calcTotalPoints</code> just in case (it was fine). There was nothing else to do but look up the error code, so I copied and pasted the first line into the Google search engine.</p>
<p>I would recommend selecting one of the results from Stack Overflow, a question-and-answer site where people post coding questions and others answer them. The answers and tips on there are usually pretty good and the people usually explain why something is wrong or why they think their version is better.</p>
<p>One of the answers on the first link I found explains the error: both operands are <code>Int</code>s, and <code>Int</code> does not support the <code>/</code> operator because the result could be fractional. That is what “No instance for (Fractional Int)” means: <code>/</code> only works on <code>Fractional</code> types, and <code>Int</code> is not one of them. We just need to divide these Ints and get a decimal, so we run a new search: “haskell how to divide ints and get a decimal”.</p>
<p>Again choosing a link from Stack Overflow, we see that we need to convert these Ints into a fractional type (such as Float or Double) BEFORE dividing them, and we can do that with a nice little built-in function called <code>fromIntegral (...)</code>. Now knowing this we can make a few small adjustments to our code and see if it works.</p>
<font size="3em">
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%">calcPoints :: Char <span style="color: #333333">-></span> Int
calcPoints char
    <span style="color: #333333">|</span> char <span style="color: #333333">==</span> <span style="background-color: #fff0f0">'A'</span> <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">4</span>
    <span style="color: #333333">|</span> char <span style="color: #333333">==</span> <span style="background-color: #fff0f0">'B'</span> <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">3</span>
    <span style="color: #333333">|</span> char <span style="color: #333333">==</span> <span style="background-color: #fff0f0">'C'</span> <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">2</span>
    <span style="color: #333333">|</span> char <span style="color: #333333">==</span> <span style="background-color: #fff0f0">'D'</span> <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">1</span>
    <span style="color: #333333">|</span> char <span style="color: #333333">==</span> <span style="background-color: #fff0f0">'F'</span> <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">0</span>
calcTotalPoints <span style="color: #007020">list</span> <span style="color: #333333">=</span> <span style="color: #007020">sum</span>([calcPoints x <span style="color: #333333">|</span> x <span style="color: #333333"><-</span> <span style="color: #007020">list</span>])
calcGPA <span style="color: #007020">list</span> <span style="color: #333333">=</span> fromIntegral(calcTotalPoints <span style="color: #007020">list</span>) <span style="color: #333333">/</span> fromIntegral(length <span style="color: #007020">list</span>)
main <span style="color: #333333">=</span> <span style="color: #008800; font-weight: bold">print</span>( calcGPA [<span style="background-color: #fff0f0">'A'</span>, <span style="background-color: #fff0f0">'B'</span>, <span style="background-color: #fff0f0">'B'</span>, <span style="background-color: #fff0f0">'C'</span>, <span style="background-color: #fff0f0">'C'</span>, <span style="background-color: #fff0f0">'C'</span>, <span style="background-color: #fff0f0">'D'</span>, <span style="background-color: #fff0f0">'F'</span>] )
</pre></div>
</font>
<p><br /></p>
<p>This will return $2.125,$ which is the correct answer for the problem.</p>
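<p>As a cross-check on that number, the same calculation can be sketched in Python. This is not the quiz solution, just an independent way to confirm the expected value; the names <code>POINTS</code> and <code>calc_gpa</code> are chosen here for illustration.</p>

```python
POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def calc_gpa(grades):
    total_points = sum(POINTS[grade] for grade in grades)
    # In Python 3, / is true division, so no explicit conversion is needed.
    return total_points / len(grades)


gpa = calc_gpa(["A", "B", "B", "C", "C", "C", "D", "F"])
print(gpa)
```

<p>The grades add up to 17 points over 8 classes, and $17 / 8 = 2.125.$</p>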
<h2>Tip 5: Ask for Help from Teachers or Peers</h2>
<p>Sometimes, you really can’t tell what the problem is with your code. You’ve taken it apart, checked everything, it all seems to be working right, but for some reason it just isn’t outputting what it’s supposed to. Now it is time to surrender and ask for someone else to check your code. Sometimes the answer is such a small stupid thing that you never thought to check.</p>
<p>I would personally recommend asking if your debugging takes more than 45 minutes. You don’t want to annoy other people with silly questions that you could resolve yourself, but you also don’t want to waste time making no progress (even though there is nothing more satisfying than fixing your code after having worked on it for almost 10 hours straight!).</p>
<p>Now that your code is finished and bug-free, get ready to go through the whole debugging process again when you implement the next thing!</p>
Maia Dimas
Depth-First and Breadth-First Search, published 2021-06-01T00:00:00-07:00: https://eurisko.us/depthfirstandbreadthfirstsearch
<p>To understand what depth-first and breadth-first search are, we must first know what graphs and directed graphs are. A <b>graph</b> is a set of objects (often called nodes) that are connected to each other. A connection between a pair of nodes is called an <b>edge</b>, and edges make up the “structure” of a graph. Put simply, a graph is a set of nodes and edges. A <b>directed graph</b> is a special instance of a graph. The difference is that in a regular (undirected) graph, the edges are bidirectional, meaning that they don’t have a specific direction. However, in a directed graph, each edge has a specified direction.</p>
<p>Graphs and directed graphs can be used to model many real-life phenomena. For example, a graph can model a social network since communication between social accounts goes both ways. On the other hand, a directed graph can model cities connected by railroad tracks since each railroad track has a specific direction.</p>
<h2>Examples of Graphs</h2>
<p>The following picture is an example of an undirected graph:</p>
<center><img src="https://eurisko.us/images/blog/depthfirstandbreadthfirstsearch1undirectedgraph.png" style="border: none; height: 20em;" alt="icon" /></center>
<p><br /></p>
<p>Here, the labeled circles represent nodes and the lines between the nodes represent undirected edges.</p>
<p>In the context of a social network, think of node 1 as yourself. The nodes 2, 7, and 8 would be your friends. The nodes 3 and 6 would be the friends of 2 and the nodes 9 and 12 would be the friends of 8. Since no other nodes are connected to 7, you are node 7’s only friend. By the same logic, nodes 4 and 5 are friends of 3.</p>
<p>The following picture is an example of a directed graph:</p>
<center><img src="https://eurisko.us/images/blog/depthfirstandbreadthfirstsearch2directedgraph.png" style="border: none; height: 20em;" alt="icon" /></center>
<p><br /></p>
<p>Again, the nodes are labeled circles and the edges are the lines between the nodes. However, notice that these edges have an arrow designating the direction.</p>
<p>In the context of cities connected by railroad tracks, node A would be your home city. The directed edge $A \to B$ represents a railroad that goes from city A to city B (but does not go back to city A). From city A, we can only go to city B. From city B, we can only go to city C (since the edge connecting B and D is $D \to B$, not $B \to D$). From city C, we can only go to city E. When we get to city E, we can either go to city D (which in turn will allow us to loop back to city B) or city F (from which we can go to city D).</p>
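<p>The railroad description above can be written down as a list of directed edges and turned into a lookup of where each city leads. This is only a sketch: the edge list is read off the paragraph rather than the picture, and the name <code>destinations</code> is an assumption.</p>

```python
# Directed edges read off the description, written as (from, to).
edges = [("A", "B"), ("B", "C"), ("C", "E"), ("E", "D"),
         ("E", "F"), ("F", "D"), ("D", "B")]

destinations = {}
for start, end in edges:
    destinations.setdefault(start, []).append(end)
    destinations.setdefault(end, [])  # make sure every city has an entry

for city in sorted(destinations):
    print(city, "->", destinations[city])
```

<p>Because the edges are directed, B appears among D’s destinations but D does not appear among B’s.</p>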
<h2>Depth-First and Breadth-First Search</h2>
<p>Now that we know what graphs and directed graphs are, we can introduce depth-first and breadth-first search. As implied by the name, these methods traverse the graph by prioritizing either depth or breadth.</p>
<p><b>Depth-first search</b> prioritizes depth. Starting at some node in the graph, we travel along one of the edges that connects to this node. This edge takes us to a new node, and we repeat the process, traveling along one of the edges that connects to the new node. This takes us deeper and deeper into the graph. Once we reach a node for which there are no more edges that we can travel along, we backtrack and go back up the graph until there is a path that we haven’t taken already. We then go deep down the path as far as possible, and then repeat the process until we’ve visited each and every node in the graph.</p>
<p><b>Breadth-first search</b> prioritizes breadth. Instead of repeatedly traveling deep into the graph, breadth-first search has us traveling through each “layer” of the graph. If we think about a family tree diagram as an example, a “layer” would be a single generation of people. In the case of a graph, a “layer” would be a single generation of nodes, so to speak. In this context, breadth-first searching can be interpreted as traversing by “generation”, whereas depth-first searching can be interpreted as traversing by direct “branches” of the family tree.</p>
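<p>To make the “generation” versus “branch” picture concrete, here is a small sketch that runs both traversals on the friendships listed in the social-network example. It uses a plain dictionary of neighbors rather than a graph class, and the function names are chosen here for illustration.</p>

```python
# Undirected friendships taken from the social-network description.
edges = [(1, 2), (1, 7), (1, 8), (2, 3), (2, 6),
         (3, 4), (3, 5), (8, 9), (8, 12)]
neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, []).append(b)
    neighbors.setdefault(b, []).append(a)


def breadth_first_layers(root):
    # Group the nodes by "generation", one list per layer.
    visited = {root}
    layers = []
    current = [root]
    while current:
        layers.append(current)
        next_layer = []
        for node in current:
            for other in neighbors[node]:
                if other not in visited:
                    visited.add(other)
                    next_layer.append(other)
        current = next_layer
    return layers


def depth_first_order(node, visited=None):
    # Follow one branch as deep as possible before backtracking.
    if visited is None:
        visited = set()
    visited.add(node)
    order = [node]
    for other in neighbors[node]:
        if other not in visited:
            order.extend(depth_first_order(other, visited))
    return order


print(breadth_first_layers(1))
print(depth_first_order(1))
```

<p>Starting from node 1, the breadth-first layers come out as <code>[[1], [2, 7, 8], [3, 6, 9, 12], [4, 5]]</code>, while the depth-first order is <code>[1, 2, 3, 4, 5, 6, 7, 8, 9, 12]</code>.</p>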
<h2>Implementing a Graph Class</h2>
<p>A graph class is a collection of nodes along with methods for operating on the nodes. Each node has an index and a list of “neighbors” (or “parents”, in the case of a directed graph).</p>
<p>To initialize a graph, we can pass in a list of edges, where each edge is a tuple of node indices. From the edges list, we can determine all the possible indices of nodes, and then create a node that corresponds to each of those indices.</p>
<font size="3em">
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">class</span> <span style="color: #BB0066; font-weight: bold">Graph</span>:
    <span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">__init__</span>(<span style="color: #007020">self</span>, edges):
        <span style="color: #007020">self</span><span style="color: #333333">.</span>edges <span style="color: #333333">=</span> edges
        indices <span style="color: #333333">=</span> []
        <span style="color: #008800; font-weight: bold">for</span> edge <span style="color: #000000; font-weight: bold">in</span> edges:
            indices<span style="color: #333333">.</span>append(edge[<span style="color: #0000DD; font-weight: bold">0</span>])
            indices<span style="color: #333333">.</span>append(edge[<span style="color: #0000DD; font-weight: bold">1</span>])
        <span style="color: #007020">self</span><span style="color: #333333">.</span>nodes <span style="color: #333333">=</span> [Node(n) <span style="color: #008800; font-weight: bold">for</span> n <span style="color: #000000; font-weight: bold">in</span> <span style="color: #007020">range</span>(<span style="color: #007020">max</span>(indices) <span style="color: #333333">+</span> <span style="color: #0000DD; font-weight: bold">1</span>)]
</pre></div>
</font>
<p><br /></p>
<p>Now that we have initialized our graph, we are now able to build it. Building a graph involves setting the neighbors of each node.</p>
<font size="3em">
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">build_from_edges</span>(<span style="color: #007020">self</span>):
    <span style="color: #008800; font-weight: bold">for</span> edge <span style="color: #000000; font-weight: bold">in</span> <span style="color: #007020">self</span><span style="color: #333333">.</span>edges:
        i <span style="color: #333333">=</span> edge[<span style="color: #0000DD; font-weight: bold">0</span>]
        j <span style="color: #333333">=</span> edge[<span style="color: #0000DD; font-weight: bold">1</span>]
        <span style="color: #007020">self</span><span style="color: #333333">.</span>nodes[i]<span style="color: #333333">.</span>neighbors<span style="color: #333333">.</span>append(<span style="color: #007020">self</span><span style="color: #333333">.</span>nodes[j])
        <span style="color: #007020">self</span><span style="color: #333333">.</span>nodes[j]<span style="color: #333333">.</span>neighbors<span style="color: #333333">.</span>append(<span style="color: #007020">self</span><span style="color: #333333">.</span>nodes[i])
</pre></div>
</font>
<p><br /></p>
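<p>The snippets above rely on a <code>Node</code> class that is not shown. A minimal runnable sketch, assuming a node only needs an <code>index</code> and a <code>neighbors</code> list, might look like this. The edge list at the bottom is made up for the demonstration.</p>

```python
class Node:
    # Assumed minimal Node: just an index and a list of neighboring nodes.
    def __init__(self, index):
        self.index = index
        self.neighbors = []


class Graph:
    def __init__(self, edges):
        self.edges = edges
        indices = [index for edge in edges for index in edge]
        self.nodes = [Node(n) for n in range(max(indices) + 1)]

    def build_from_edges(self):
        # Undirected graph: each edge makes both endpoints neighbors.
        for i, j in self.edges:
            self.nodes[i].neighbors.append(self.nodes[j])
            self.nodes[j].neighbors.append(self.nodes[i])


graph = Graph([(0, 1), (1, 2), (1, 3), (3, 4)])
graph.build_from_edges()
neighbor_indices = [[n.index for n in node.neighbors] for node in graph.nodes]
print(neighbor_indices)
```

<p>Printing the neighbor indices of every node is a quick way to confirm the graph was built as intended before running any search on it.</p>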
<h2>Implementing Depth-First and Breadth-First Search</h2>
<p>To implement a breadth-first search, we use a queue, a data structure that has elements inserted and removed according to the first-in, first-out principle. Every time we visit a node, we want to add it to our queue if we haven’t visited it already. Our queue keeps track of which nodes we still need to “deal with”, so to speak. When we “deal with” a node, we add the node to our output array, add the node’s unvisited neighbors to the queue, and then finally remove the node from the queue. We continue doing this until there are no more nodes left in the queue.</p>
<font size="3em">
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">fetch_nodes_breadth_first</span>(<span style="color: #007020">self</span>, root_index):
    root_node <span style="color: #333333">=</span> <span style="color: #007020">self</span><span style="color: #333333">.</span>nodes[root_index]
    queue <span style="color: #333333">=</span> []
    queue<span style="color: #333333">.</span>append(root_node)
    visited <span style="color: #333333">=</span> {}
    visited[root_node<span style="color: #333333">.</span>index] <span style="color: #333333">=</span> <span style="color: #007020">True</span>
    result <span style="color: #333333">=</span> []
    result<span style="color: #333333">.</span>append(root_node)
    <span style="color: #008800; font-weight: bold">while</span> <span style="color: #007020">len</span>(queue) <span style="color: #333333">></span> <span style="color: #0000DD; font-weight: bold">0</span>:
        node <span style="color: #333333">=</span> queue[<span style="color: #0000DD; font-weight: bold">0</span>]
        <span style="color: #008800; font-weight: bold">for</span> neighbor <span style="color: #000000; font-weight: bold">in</span> node<span style="color: #333333">.</span>neighbors:
            <span style="color: #008800; font-weight: bold">if</span> neighbor<span style="color: #333333">.</span>index <span style="color: #000000; font-weight: bold">not</span> <span style="color: #000000; font-weight: bold">in</span> visited:
                queue<span style="color: #333333">.</span>append(neighbor)
                result<span style="color: #333333">.</span>append(neighbor)
                visited[neighbor<span style="color: #333333">.</span>index] <span style="color: #333333">=</span> <span style="color: #007020">True</span>
        queue <span style="color: #333333">=</span> queue[<span style="color: #0000DD; font-weight: bold">1</span>:]
    <span style="color: #008800; font-weight: bold">return</span> result
</pre></div>
</font>
<p><br /></p>
<p>Depth-first search is similar to breadth-first search. The main difference is that instead of a queue, we use a stack, a data structure that has elements inserted and removed according to the first-in, last-out principle.</p>
<font size="3em">
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">fetch_nodes_depth_first</span>(<span style="color: #007020">self</span>, root_index):
    root_node <span style="color: #333333">=</span> <span style="color: #007020">self</span><span style="color: #333333">.</span>nodes[root_index]
    stack <span style="color: #333333">=</span> []
    stack<span style="color: #333333">.</span>append(root_node)
    visited <span style="color: #333333">=</span> {}
    result <span style="color: #333333">=</span> []
    <span style="color: #008800; font-weight: bold">while</span> <span style="color: #007020">len</span>(stack) <span style="color: #333333">></span> <span style="color: #0000DD; font-weight: bold">0</span>:
        <span style="color: #888888"># take the most recently added node off the top of the stack</span>
        node <span style="color: #333333">=</span> stack[<span style="color: #333333">-</span><span style="color: #0000DD; font-weight: bold">1</span>]
        stack <span style="color: #333333">=</span> stack[:<span style="color: #333333">-</span><span style="color: #0000DD; font-weight: bold">1</span>]
        <span style="color: #008800; font-weight: bold">if</span> node<span style="color: #333333">.</span>index <span style="color: #000000; font-weight: bold">in</span> visited:
            <span style="color: #008800; font-weight: bold">continue</span>
        visited[node<span style="color: #333333">.</span>index] <span style="color: #333333">=</span> <span style="color: #007020">True</span>
        result<span style="color: #333333">.</span>append(node)
        <span style="color: #008800; font-weight: bold">for</span> neighbor <span style="color: #000000; font-weight: bold">in</span> node<span style="color: #333333">.</span>neighbors:
            <span style="color: #008800; font-weight: bold">if</span> neighbor<span style="color: #333333">.</span>index <span style="color: #000000; font-weight: bold">not</span> <span style="color: #000000; font-weight: bold">in</span> visited:
                stack<span style="color: #333333">.</span>append(neighbor)
    <span style="color: #008800; font-weight: bold">return</span> result
</pre></div>
</font>
<p><br /></p>
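<p>To see the two searches side by side, here is a minimal, self-contained sketch. It is not the class used in this post: it assumes the graph is stored as a plain dictionary (the hypothetical <code>graph</code> below) that maps each node to a list of its neighbors, and it uses <code>collections.deque</code> for the queue.</p>

```python
from collections import deque

# Hypothetical example graph: each key maps to the list of its neighbors.
graph = {
    0: [1, 2],
    1: [3],
    2: [4],
    3: [],
    4: [],
}


def breadth_first(graph, root):
    """Visit nodes layer by layer using a first-in, first-out queue."""
    queue = deque([root])
    visited = {root}
    order = []
    while queue:
        node = queue.popleft()  # the oldest element leaves first
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order


def depth_first(graph, root):
    """Follow one path as far as possible using a first-in, last-out stack."""
    stack = [root]
    visited = set()
    order = []
    while stack:
        node = stack.pop()  # the newest element leaves first
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                stack.append(neighbor)
    return order


print(breadth_first(graph, 0))  # [0, 1, 2, 3, 4]
print(depth_first(graph, 0))    # [0, 2, 4, 1, 3]
```

<p>The breadth-first order finishes the layer containing 1 and 2 before moving on, while the depth-first order follows the path through 2 and 4 to its end before coming back to 1.</p>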
<p>Keep in mind that each search method can produce more than one valid output. In breadth-first search, you can switch up the order of elements on the same layer. In depth-first search, you can choose different paths to start going down.</p>
Cayden Lau
To understand what depth-first and breadth-first search are, we must first know what graphs and directed graphs are. A graph is a set of objects (often called nodes) that are connected to each other. A pair of connected nodes is called an edge, and edges make up the “structure” of a graph. Put simply, a graph is a set of nodes and edges. A directed graph is a special instance of a graph. The difference is that in a regular (undirected) graph, the edges are bidirectional, meaning that they don’t have a specific direction. However, in a directed graph, each edge has a specified direction.
Efficiently Computing the Determinant of a Matrix, Part 1: Determinant by Cofactors
2021-06-01T00:00:00-07:00
https://eurisko.us/efficientlycomputingthedeterminantofamatrixpart1determinantbycofactors
<p><i>Note: This post is part 1 of a 2-part series: <a class="body" target="_blank" href="https://eurisko.us/20210601efficientlycomputingthedeterminantofamatrixpart1determinantbycofactors/">part 1</a>, <a class="body" target="_blank" href="https://eurisko.us/20210601efficientlycomputingthedeterminantofamatrixpart2determinantbyelementaryrowoperations/">part 2</a>.</i></p>
<p>The determinant is a number associated with a matrix that can be used to figure out many characteristics of that matrix. You can find the determinant of any square matrix.</p>
<p>For example, the determinant can be used to tell if a matrix is invertible, and it can be used to perform a change of variables in a higher-order integral. It can also tell us if a system of equations has a unique solution, and has multiple uses in calculus and linear algebra, including defining the characteristic polynomial of a matrix.</p>
<p>We denote the determinant of a matrix $A$ as $\vert A \vert$ or $\det(A).$</p>
<p>The determinant of a $1$ by $1$ matrix is the value in the matrix. In the following section, you will learn how to find the determinant of a $2$ by $2$ matrix. After that, you will learn $2$ different methods that you can use to find the determinant of any square matrix. One of those methods (using cofactors) will be very inefficient, whereas the other method (using elementary row operations) will be much more efficient.</p>
<h2>Determinant of a 2 by 2 Matrix</h2>
<p>The formula for finding the determinant of a $2$ by $2$ matrix is very simple. For a matrix</p>
<center>
$\begin{align*}
A = \begin{bmatrix}
a & b \\
c & d
\end{bmatrix}
\end{align*}$
</center>
<p><br /></p>
<p>we have</p>
<center>
$\begin{align*}
\text{det}(A) = ad - bc.
\end{align*}$
</center>
<p><br /></p>
<p>To show how simple finding the determinant of a $2$ by $2$ matrix can be, I’ll provide an example. Let’s say we have the matrix</p>
<center>
$\begin{align*}
A = \begin{bmatrix}
1 & 2 \\
3 & 4
\end{bmatrix}.
\end{align*}$
</center>
<p><br /></p>
<p>Using the formula, the determinant of this matrix is</p>
<center>
$\begin{align*}
\det(A) &= 1 \cdot 4 - 3 \cdot 2 \\
&= 4 - 6 \\
&= -2.
\end{align*}$
</center>
<p><br /></p>
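<p>The same formula can be written as a short Python function. This is only a sketch: the name <code>det_2x2</code> and the choice to store the matrix as a list of two rows are assumptions made for this example.</p>

```python
def det_2x2(matrix):
    """Return ad - bc for a 2 by 2 matrix given as a list of two rows."""
    (a, b), (c, d) = matrix
    return a * d - b * c


print(det_2x2([[1, 2], [3, 4]]))  # -2
```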
<h2>Determinant by Cofactors: 3 by 3 Example</h2>
<p>The cofactor method is a way to find the determinant of a matrix. This method can be used to find the determinant of any $n$ by $n$ matrix, although it is an inefficient method to use when $n$ is not small.</p>
<p>Here, I’ll demonstrate how to use the cofactor method to find the determinant of the $3$ by $3$ matrix shown below.</p>
<center>
$\begin{align*}
\begin{bmatrix}
0 & 2 & 4\\
7 & 24 & 1\\
2 & 6 & 8
\end{bmatrix}
\end{align*}$
</center>
<p><br /></p>
<p>The first step is to choose which row you want to expand. In general, you want to try to expand the row with the most zeros in it. You will see why in the next step. For this example, I’m going to expand the first row.</p>
<p>The second step is to expand your chosen row. In this step, we use the fact that the determinant of a $3$ by $3$ matrix is equal to a linear combination of the determinants of $2$ by $2$ matrices.</p>
<p>The coefficients of these $2$ by $2$ matrices are the elements of the row that we chose times each element’s sign from the sign chart. The sign chart for a $3$ by $3$ matrix, shown below, dictates whether the sign of an element needs to be flipped in the linear combination.</p>
<center>
$\begin{align*}
\begin{bmatrix}
+ & - & + \\
- & + & - \\
+ & - & +
\end{bmatrix}
\end{align*}$
</center>
<p><br /></p>
<p>For the $0$, the sign does not matter because $-0 = 0$. For the $2$ in the first row, we can see that the corresponding sign is a negative sign. That means that the sign of $2$ needs to be flipped to be a $-2$. For the $4$ in the first row, we see that the corresponding sign is a positive sign. That means that the sign of $4$ does not need to change.</p>
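<p>The sign chart follows a simple rule: the sign is positive when the row index plus the column index is even, and negative when it is odd. The small sketch below (the function name <code>sign_chart</code> is made up for this example) builds the chart for any size, counting rows and columns from $0$.</p>

```python
def sign_chart(n):
    """Build the n by n chart of signs: '+' where row + column is even."""
    return [
        ["+" if (row + col) % 2 == 0 else "-" for col in range(n)]
        for row in range(n)
    ]


for row in sign_chart(3):
    print(" ".join(row))
```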
<p>Each $2$ by $2$ matrix in the linear combination contains the elements of the $3$ by $3$ matrix that are not in the same row or column as its coefficient. To find these elements, go through each element in your chosen row and use your fingers (or imagine doing so) to block out the row and column that contain that element. All elements of the $3$ by $3$ matrix that aren’t covered by your fingers will be in that $2$ by $2$ matrix.</p>
<p>For the first $2$ by $2$ matrix, the corresponding coefficient is $0$, so we can stop here. This is because the determinant of any $2$ by $2$ matrix times $0$ is $0$, which we can disregard when doing addition. This is why it is important to choose to expand the row that has the most zeros.</p>
<p>For the second $2$ by $2$ matrix, the corresponding coefficient is $2$. We see that the $2$ that we are looking for is in the first row and the second column. Blocking out the first row and the second column with our fingers, we see that the $2$ by $2$ matrix will contain the elements $7$, $1$, $2$, and $8$. This gives us the matrix below.</p>
<center>
$\begin{align*}
\begin{vmatrix}
7 & 1 \\
2 & 8
\end{vmatrix}
\end{align*}$
</center>
<p><br /></p>
<p>For the third element in the first row, we use the same strategy as for the second element, where we use our fingers to block out all of the elements in the same row and column as the corresponding coefficient. That coefficient is now $4$, so we see that this $2$ by $2$ matrix will contain the elements $7$, $24$, $2$, and $6$. This gives us the matrix below.</p>
<center>
$\begin{align*}
\begin{vmatrix}
7 & 24 \\
2 & 6
\end{vmatrix}
\end{align*}$
</center>
<p><br /></p>
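<p>Blocking out a row and a column with your fingers can be sketched in Python as copying the matrix while skipping one row index and one column index. The helper name <code>minor</code> and the list-of-lists layout are assumptions made for this example.</p>

```python
def minor(matrix, row_index, col_index):
    """Copy the matrix, leaving out one row and one column."""
    return [
        [value for j, value in enumerate(row) if j != col_index]
        for i, row in enumerate(matrix)
        if i != row_index
    ]


matrix = [[0, 2, 4],
          [7, 24, 1],
          [2, 6, 8]]

print(minor(matrix, 0, 1))  # [[7, 1], [2, 8]]
print(minor(matrix, 0, 2))  # [[7, 24], [2, 6]]
```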
<p>Plugging in the $2$ by $2$ matrices, their corresponding coefficients, and the coefficients’ new signs, we have</p>
<center>
$\begin{align*}
\begin{vmatrix}
0 & 2 & 4 \\
7 & 24 & 1 \\
2 & 6 & 8
\end{vmatrix}
=
0
-
2 \times \begin{vmatrix}
7 & 1 \\
2 & 8
\end{vmatrix}
+
4 \times \begin{vmatrix}
7 & 24 \\
2 & 6
\end{vmatrix}
\end{align*}$
</center>
<p><br /></p>
<p>We can now solve each $2$ by $2$ matrix as usual, multiply those determinants by the coefficients, and add them up to find the determinant of the $3$ by $3$ matrix.</p>
<p>Doing this with our example, we have</p>
<center>
$\begin{align*}
\begin{vmatrix}
0 & 2 & 4 \\
7 & 24 & 1 \\
2 & 6 & 8
\end{vmatrix}
&=
0
-
2 \times \begin{vmatrix}
7 & 1 \\
2 & 8
\end{vmatrix}
+
4 \times \begin{vmatrix}
7 & 24 \\
2 & 6
\end{vmatrix} \\
&= -2 \times \left(7 \times 8 - 1 \times 2\right) + 4 \times \left(7 \times 6 - 24 \times 2\right) \\
&= -2 \times 54 + 4 \times (-6) \\
&= -132.
\end{align*}$
</center>
<p><br /></p>
<p>This method works for any $n$ by $n$ matrix, where $n$ is a natural number.</p>
<h2>Determinant by Cofactors: 4 by 4 Example</h2>
<p>Let’s see how we would use the cofactor method to find the determinant of the following $4$ by $4$ matrix:</p>
<center>
$\begin{align*}
\begin{vmatrix}
0 & 2 & 4 & 9 \\
7 & 24 & 1 & 14 \\
2 & 6 & 8 & 0 \\
7 & 2 & 8 & 13
\end{vmatrix}
\end{align*}$
</center>
<p><br /></p>
<p>Following the steps above, we choose the first row as the one to expand, because it is tied for the most zeros. Second, we expand the matrix. The coefficients will be $0, 2, 4,$ and $9,$ respectively. When we put the coefficients together with the signs of $+, -, +,$ and $-,$ respectively, we have the new coefficients $0, -2, 4,$ and $-9.$</p>
<p>For the coefficient $0,$ the corresponding matrix does not matter, because it is multiplied by $0.$</p>
<p>For the coefficient $-2,$ the corresponding $3$ by $3$ matrix is</p>
<center>
$\begin{align*}
\begin{vmatrix}
7 & 1 & 14 \\
2 & 8 & 0 \\
7 & 8 & 13
\end{vmatrix}
\end{align*}$
</center>
<p><br /></p>
<p>For the coefficient $4,$ the corresponding $3$ by $3$ matrix is</p>
<center>
$\begin{align*}
\begin{vmatrix}
7 & 24 & 14 \\
2 & 6 & 0 \\
7 & 2 & 13
\end{vmatrix}
\end{align*}$
</center>
<p><br /></p>
<p>For the coefficient $-9,$ the corresponding $3$ by $3$ matrix is</p>
<center>
$\begin{align*}
\begin{vmatrix}
7 & 24 & 1 \\
2 & 6 & 8 \\
7 & 2 & 8
\end{vmatrix}
\end{align*}$
</center>
<p><br /></p>
<p>Therefore, combining the $3$ by $3$ determinants with their coefficients, we see that</p>
<center>
$\begin{align*}
\begin{vmatrix}
0 & 2 & 4 & 9 \\
7 & 24 & 1 & 14 \\
2 & 6 & 8 & 0 \\
7 & 2 & 8 & 13
\end{vmatrix}
=
0 \cdot *
-
2 \cdot \begin{vmatrix}
7 & 1 & 14 \\
2 & 8 & 0 \\
7 & 8 & 13
\end{vmatrix}
+
4 \cdot \begin{vmatrix}
7 & 24 & 14 \\
2 & 6 & 0 \\
7 & 2 & 13
\end{vmatrix}
-
9 \cdot \begin{vmatrix}
7 & 24 & 1 \\
2 & 6 & 8 \\
7 & 2 & 8
\end{vmatrix}
\end{align*}$
</center>
<p><br /></p>
<p>We can use the same process from before to find the determinant of each $3$ by $3$ matrix, then add them together to find the determinant of the $4$ by $4$ matrix.</p>
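<p>Finishing this computation by hand takes a while, so here is a minimal recursive sketch that carries it out. It assumes the matrix is a plain list of lists and always expands the first row; the function name <code>cofactor_determinant</code> is made up for this example and is separate from the class method shown later in this post.</p>

```python
def cofactor_determinant(matrix):
    """Determinant of a square list-of-lists matrix by expanding the first row."""
    size = len(matrix)
    if size == 1:
        return matrix[0][0]
    total = 0
    for col_index, coefficient in enumerate(matrix[0]):
        if coefficient == 0:
            continue  # a zero coefficient contributes nothing to the sum
        # Minor: drop the first row and the current column.
        sub_matrix = [row[:col_index] + row[col_index + 1:] for row in matrix[1:]]
        sign = (-1) ** col_index
        total += sign * coefficient * cofactor_determinant(sub_matrix)
    return total


three_by_three = [[0, 2, 4],
                  [7, 24, 1],
                  [2, 6, 8]]

four_by_four = [[0, 2, 4, 9],
                [7, 24, 1, 14],
                [2, 6, 8, 0],
                [7, 2, 8, 13]]

print(cofactor_determinant(three_by_three))  # -132
print(cofactor_determinant(four_by_four))    # -13038
```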
<h2>Implementing Determinant by Cofactors</h2>
<p>Computing the determinant of a matrix using the cofactor method can take a long time. Instead of doing it all by hand, we can write a program that will do this for us. Below, I have a program that can compute the determinant of a matrix using the cofactor method. This program, technically called a method, is part of a class. In Python, the language that I’m using, a class describes a collection of data and methods that act on that data. When we call the class with parameters and set a variable equal to the result, the variable holds an object (an instance of the class). Inside the class, we use “self” to refer to this object. Self has the attributes that we have given it, and the methods in the class can manipulate those attributes and give us what we want.</p>
<p>The method below is from a class that I have created in Python. Its main attribute is called “elements”, which holds the elements of our matrix. It is really a list of lists, not an array. The method <code>calc_cofactor_method_determinant()</code> below takes self and finds the determinant of <code>self.elements</code> (the “elements” attribute) using the cofactor method. From now on, I may say “the matrix” to refer to <code>self.elements</code>.</p>
<p>To access an element of the matrix, we write <code>self.elements[i][j]</code>, where <code>i</code> is the index of the row that the element is in, and <code>j</code> is the index of the column that the element is in. The indices of lists start from $0$ and increase one at a time. That means the first row/column has index $0$, the second row/column has index $1$, and so on. So if we wanted to access the element in the first column and the first row, we would write <code>self.elements[0][0]</code>. If we wanted to access the element in the third column and the second row of a matrix, we would write <code>self.elements[1][2]</code>.</p>
<font size="3em">
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">calc_cofactor_method_determinant</span>(<span style="color: #007020">self</span>):
    <span style="color: #008800; font-weight: bold">if</span> <span style="color: #007020">self</span><span style="color: #333333">.</span>num_rows <span style="color: #333333">!=</span> <span style="color: #007020">self</span><span style="color: #333333">.</span>num_cols:
        <span style="color: #008800; font-weight: bold">raise</span> <span style="color: #FF0000; font-weight: bold">Exception</span>(<span style="background-color: #fff0f0">"Cannot take cofactor_method_determinant of a non-square matrix"</span>)
    <span style="color: #008800; font-weight: bold">elif</span> <span style="color: #007020">self</span><span style="color: #333333">.</span>num_rows <span style="color: #333333">==</span> <span style="color: #0000DD; font-weight: bold">1</span>:
        <span style="color: #008800; font-weight: bold">return</span> <span style="color: #007020">self</span><span style="color: #333333">.</span>elements[<span style="color: #0000DD; font-weight: bold">0</span>][<span style="color: #0000DD; font-weight: bold">0</span>]
    <span style="color: #008800; font-weight: bold">elif</span> <span style="color: #007020">self</span><span style="color: #333333">.</span>num_rows <span style="color: #333333">==</span> <span style="color: #0000DD; font-weight: bold">2</span>:
        a, b <span style="color: #333333">=</span> <span style="color: #007020">self</span><span style="color: #333333">.</span>elements[<span style="color: #0000DD; font-weight: bold">0</span>][<span style="color: #0000DD; font-weight: bold">0</span>], <span style="color: #007020">self</span><span style="color: #333333">.</span>elements[<span style="color: #0000DD; font-weight: bold">0</span>][<span style="color: #0000DD; font-weight: bold">1</span>]
        c, d <span style="color: #333333">=</span> <span style="color: #007020">self</span><span style="color: #333333">.</span>elements[<span style="color: #0000DD; font-weight: bold">1</span>][<span style="color: #0000DD; font-weight: bold">0</span>], <span style="color: #007020">self</span><span style="color: #333333">.</span>elements[<span style="color: #0000DD; font-weight: bold">1</span>][<span style="color: #0000DD; font-weight: bold">1</span>]
        <span style="color: #008800; font-weight: bold">return</span> a<span style="color: #333333">*</span>d <span style="color: #333333">-</span> b<span style="color: #333333">*</span>c
    <span style="color: #008800; font-weight: bold">else</span>:
        ans <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">0</span>
        row_index <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">0</span>
        <span style="color: #008800; font-weight: bold">for</span> col_index <span style="color: #000000; font-weight: bold">in</span> <span style="color: #007020">range</span>(<span style="color: #007020">self</span><span style="color: #333333">.</span>num_cols):
            coefficient_sign <span style="color: #333333">=</span> (<span style="color: #333333">-</span><span style="color: #0000DD; font-weight: bold">1</span>) <span style="color: #333333">**</span> col_index
            coefficient <span style="color: #333333">=</span> <span style="color: #007020">self</span><span style="color: #333333">.</span>elements[row_index][col_index]
            minor <span style="color: #333333">=</span> <span style="color: #007020">self</span><span style="color: #333333">.</span>calc_minor(row_index, col_index)
            ans <span style="color: #333333">+=</span> coefficient_sign <span style="color: #333333">*</span> coefficient <span style="color: #333333">*</span> minor<span style="color: #333333">.</span>calc_cofactor_method_determinant()
        <span style="color: #008800; font-weight: bold">return</span> ans
</pre></div>
</font>
<p><br /></p>
<p>First, because we can only take the determinant of a square matrix, we need to make sure that the matrix is a square matrix. If <code>self.num_rows</code> (which counts the number of rows in our matrix) is not equal to <code>self.num_cols</code> (which counts the number of columns in our matrix), then we raise an exception.</p>
<p>Then, we check if the matrix is either a $1$ by $1$ matrix or a $2$ by $2$ matrix. We do this because we know the formulas for computing the determinant of these matrices. If the matrix is a $1$ by $1$ matrix, then we return the only element in the matrix. If the matrix is a $2$ by $2$ matrix, then we use the formula from the $2$ by $2$ section above: we set $a, b, c,$ and $d$ to their respective numbers in the matrix and return $ad - bc.$</p>
<p>If a matrix is not a $1$ by $1$ matrix or a $2$ by $2$ matrix, then the process is more complicated. We start to use the cofactor method here. As usual with the cofactor method, the first step is to choose which row we want to expand. For simplicity, I chose the first row. Next, we loop through the number of columns in the matrix. Before the recursion kicks in, we must compute the first minor, its coefficient, and the coefficient’s sign.</p>
<p>We compute the sign by raising $-1$ to the index of the current column. If <code>col_index</code> equals $0$, for example, then the sign is $(-1)^{0} = 1$. When we multiply the coefficient by its sign, the coefficient keeps the same sign, as it should. In Python this must be written as <code>(-1) ** col_index</code>, with parentheses, because <code>**</code> is applied before the minus sign. To access the coefficient, we take the current element, which is at the row index $0$ and the current column index.</p>
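<p>The sign computation is easy to get subtly wrong in Python, because the <code>**</code> operator binds more tightly than a leading minus sign. The short sketch below (the variable names are made up for this example) compares the two spellings for the first four column indices.</p>

```python
# Without parentheses, Python reads -1 ** k as -(1 ** k), which is always -1.
unparenthesized = [-1 ** col_index for col_index in range(4)]

# With parentheses, the sign alternates as the sign chart requires.
parenthesized = [(-1) ** col_index for col_index in range(4)]

print(unparenthesized)  # [-1, -1, -1, -1]
print(parenthesized)    # [1, -1, 1, -1]
```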
<p>To compute the minor, we use the helper method called <code>calc_minor</code>. This method takes self, a row index, and a column index, and returns a modified version of self called minor that only has the elements that don’t have the same row index or column index as the parameters of the method.</p>
<p>Because we are using recursion, we take the determinant of that minor using the cofactor method, which in turn takes the determinants of the minors of that minor. This continues until we reach $2$ by $2$ matrices, whose determinants we can compute directly. Then, we multiply the determinant of the minor by its corresponding coefficient and that coefficient’s sign, and add this product to our answer. Then, <code>col_index</code> increases and we do the whole thing again for the second minor of the matrix.</p>
<p>Eventually, we loop through all the minors of the matrix. We just return our answer and we are done! We have successfully computed the determinant of a matrix using the cofactor method with the help of recursion.</p>
<h2>Time Complexity</h2>
<p>When we found the determinant of a $4 \times 4$ matrix, the first step was to expand it into $4$ matrices of size $3 \times 3$. When we found the determinant of a $3 \times 3$ matrix, the first step was to expand it into $3$ matrices of size $2 \times 2$. We can see a pattern here: when solving an $n \times n$ matrix, we expand it into $n$ matrices of size $(n-1) \times (n-1)$.</p>
<p>Continuing the expansion, when solving an $n \times n$ matrix, we would have $\dfrac{n!}{2}$ $2 \times 2$ matrices.</p>
<p>There are three operations per $2 \times 2$ matrix: multiplying the first and fourth elements, multiplying the second and third elements, and subtracting the latter from the former. Given that we have $\dfrac{n!}{2}$ $2 \times 2$ matrices, for the $2 \times 2$ matrices in the expansion of an $n \times n$ matrix, we have $\dfrac{3}{2}n!$ operations.</p>
<p>Finally, each $2 \times 2$ determinant also gets multiplied by a coefficient and added to a running total, and the larger minors have coefficients of their own. To account for this extra work, we double our count, which brings our estimate for finding the determinant of an $n \times n$ matrix to roughly $3 \cdot n!$ operations. This is very inefficient!</p>
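<p>The count of $2 \times 2$ matrices can be checked with a short recursive sketch that mirrors the expansion pattern above. The function name <code>count_2x2_determinants</code> is made up for this example, and it assumes no coefficients are zero, so that no branch of the expansion gets skipped.</p>

```python
from math import factorial


def count_2x2_determinants(n):
    """Count the 2 by 2 determinants reached when expanding an n by n matrix."""
    if n == 2:
        return 1
    # An n by n matrix expands into n minors of size (n - 1) by (n - 1).
    return n * count_2x2_determinants(n - 1)


for n in range(2, 8):
    # The recursive count agrees with n! / 2 for every size tried here.
    assert count_2x2_determinants(n) == factorial(n) // 2
    print(n, count_2x2_determinants(n))
```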
<p><i>This post is part 1 of a 2-part series. <a class="body" target="_blank" href="https://eurisko.us/20210601efficientlycomputingthedeterminantofamatrixpart2determinantbyelementaryrowoperations/">Click here to continue to part 2.</a></i></p>
Charlie Weinberger
Efficiently Computing the Determinant of a Matrix, Part 2: Determinant by Elementary Row Operations
2021-06-01T00:00:00-07:00
https://eurisko.us/efficientlycomputingthedeterminantofamatrixpart2determinantbyelementaryrowoperations
<p><i>Note: This post is part 2 of a 2-part series: <a class="body" target="_blank" href="https://eurisko.us/20210601efficientlycomputingthedeterminantofamatrixpart1determinantbycofactors/">part 1</a>, <a class="body" target="_blank" href="https://eurisko.us/20210601efficientlycomputingthedeterminantofamatrixpart2determinantbyelementaryrowoperations/">part 2</a>.</i></p>
<p>Computing the determinant by hand is often annoying, and the cofactor method takes rapidly more time as the matrix gets larger. A much more efficient way to compute the determinant of a matrix is to use elementary row operations. In this process, we reduce the matrix to echelon form and use the row operations we performed along the way to compute the determinant.</p>
<p>Our first step is to take the given matrix to echelon form, keeping track of the row operations we apply so that we can use them later when computing the determinant. Below, I have a randomly generated $3\times 3$ matrix:</p>
<center>
$\begin{align*}
\begin{bmatrix}
0 & 2 & 4\\
7 & 24 & 1\\
2 & 6 & 8
\end{bmatrix}
\end{align*}$
</center>
<p><br /></p>
<p>The first thing we would do is switch our top and bottom rows ($R_1$ and $R_3$). Then, we would divide our top row by $2$ to get our pivot row for column $1$. We’ll need to keep a list of all numbers we divide any row by and the number of times we switch rows.</p>
<center>
$\begin{align*}
\begin{bmatrix}
1&3&4\\
7&24&1\\
0&2&4
\end{bmatrix} , \text{row divisors:(2), row swaps:(1)}
\end{align*}$
</center>
<p><br /></p>
<p>Next we subtract $7\times R_{1}$ from $R_{2}$. After that we subtract $R_{2}$ from $R_{1}$.</p>
<center>
$\begin{align*}
\begin{bmatrix}
1&3&4\\
0&3&-27\\
0&2&4
\end{bmatrix}, \text{row divisors:(2), row swaps:(1)}
\end{align*}$
</center>
<p><br /></p>
<center>
$\begin{align*}
\begin{bmatrix}
1&0&31\\
0&3&-27\\
0&2&4
\end{bmatrix}, \text{row divisors:(2), row swaps:(1)}
\end{align*}$
</center>
<p><br /></p>
<p>Now we divide $R_{2}$ by $3$, and subtract $2\times R_{2}$ from $R_{3}$. We can then divide $R_{3}$ by $22$ as well.</p>
<center>
$\begin{align*}
\begin{bmatrix}
1&0&31\\
0&1&-9\\
0&0&1
\end{bmatrix},\text{row divisors:(2,3,22), row swaps:(1)}
\end{align*}$
</center>
<p><br /></p>
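<p>At this point we can see that the matrix does reduce to the identity, so we can stop here. Now we can use the row divisors and swaps we collected to calculate the determinant. This works because dividing a row by a number divides the determinant by that number, swapping two rows flips its sign, subtracting a multiple of one row from another leaves it unchanged, and the identity matrix has determinant $1$. To get the determinant, we take the product of the divisors and multiply it by $-1$ to the power of the number of row swaps:</p>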
<center>
$\begin{align*}
(2\times 3\times 22)\times (-1)^{1} = -132
\end{align*}$
</center>
<p><br /></p>
<p>If we check this against the cofactor method or a calculator, we can see that this is actually the determinant.</p>
<h2>Implementation</h2>
<p>Below is the actual code for computing the determinant via row operations that I implemented in my matrix class. The first thing it does is make a copy of the matrix, so that we don’t mutate the original matrix. Then we make a variable called <code>factors</code>, which accumulates the product of the numbers we divide rows by and the sign changes from row swaps, and is what the method returns.</p>
<p>After that, we ensure the matrix is square, since otherwise you can’t compute the determinant. We then loop through each column of the matrix, checking which row is the pivot row, and making sure the pivot row is where it’s supposed to be (in the position of the next row that needs a pivot). Once we have the pivot row in the correct position, we multiply the <code>factors</code> variable by the pivot number, and then reduce the column that number is in to a $1$ and $0$’s. Then we move on to the next row. If a column has no pivot, the determinant is $0$. After going through the whole matrix, we return the <code>factors</code> variable.</p>
<font size="3em">
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">calc_determinant</span>(<span style="color: #007020">self</span>):
    matrix <span style="color: #333333">=</span> <span style="color: #007020">self</span><span style="color: #333333">.</span>copy()
    factors <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">1</span>
    row_index <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">0</span>
    <span style="color: #008800; font-weight: bold">if</span> matrix<span style="color: #333333">.</span>num_cols <span style="color: #333333">!=</span> matrix<span style="color: #333333">.</span>num_rows:
        <span style="color: #008800; font-weight: bold">return</span> (<span style="background-color: #fff0f0">"Non-square matrices have no determinant."</span>)
    <span style="color: #008800; font-weight: bold">for</span> col <span style="color: #000000; font-weight: bold">in</span> <span style="color: #007020">range</span>(matrix<span style="color: #333333">.</span>num_cols):
        pivot <span style="color: #333333">=</span> matrix<span style="color: #333333">.</span>get_pivot_row(col)
        <span style="color: #008800; font-weight: bold">if</span> pivot <span style="color: #333333">!=</span> <span style="color: #007020">None</span>:
            <span style="color: #008800; font-weight: bold">if</span> pivot <span style="color: #333333">!=</span> row_index:
                matrix <span style="color: #333333">=</span> matrix<span style="color: #333333">.</span>swap_rows(row_index, pivot)
                factors <span style="color: #333333">*=</span> <span style="color: #333333">-</span><span style="color: #0000DD; font-weight: bold">1</span>
            <span style="color: #888888"># after any swap, the pivot row sits at row_index</span>
            factors <span style="color: #333333">*=</span> matrix<span style="color: #333333">.</span>elements[row_index][col]
            matrix <span style="color: #333333">=</span> matrix<span style="color: #333333">.</span>normalize_row(row_index)
            matrix <span style="color: #333333">=</span> matrix<span style="color: #333333">.</span>clear_above(row_index)
            matrix <span style="color: #333333">=</span> matrix<span style="color: #333333">.</span>clear_below(row_index)
            row_index <span style="color: #333333">+=</span> <span style="color: #0000DD; font-weight: bold">1</span>
        <span style="color: #008800; font-weight: bold">else</span>:
            <span style="color: #888888"># a column with no pivot means the determinant is 0</span>
            factors <span style="color: #333333">*=</span> <span style="color: #0000DD; font-weight: bold">0</span>
    <span style="color: #008800; font-weight: bold">return</span> factors
</pre></div>
</font>
<p><br /></p>
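<p>The method above relies on helper methods from my matrix class that are not shown here. As a self-contained illustration of the same idea, here is a minimal sketch that works on a plain list of lists. The function name <code>row_reduction_determinant</code> is made up for this example, it uses <code>Fraction</code> to avoid rounding errors, and it picks the first nonzero entry in each column as the pivot. Because of that pivot choice, the individual divisors differ from the hand-worked example, but their signed product is the same.</p>

```python
from fractions import Fraction


def row_reduction_determinant(rows):
    """Determinant via row reduction, tracking row divisors and row swaps."""
    matrix = [[Fraction(value) for value in row] for row in rows]
    size = len(matrix)
    divisors = []
    num_swaps = 0
    for col in range(size):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot_row = next((r for r in range(col, size) if matrix[r][col] != 0), None)
        if pivot_row is None:
            return Fraction(0)  # no pivot in this column: the determinant is 0
        if pivot_row != col:
            matrix[col], matrix[pivot_row] = matrix[pivot_row], matrix[col]
            num_swaps += 1
        pivot = matrix[col][col]
        divisors.append(pivot)
        matrix[col] = [value / pivot for value in matrix[col]]
        # Clear the entries below the pivot; this does not change the determinant.
        for r in range(col + 1, size):
            factor = matrix[r][col]
            matrix[r] = [x - factor * p for x, p in zip(matrix[r], matrix[col])]
    product = Fraction(1)
    for divisor in divisors:
        product *= divisor
    return product * (-1) ** num_swaps


print(row_reduction_determinant([[0, 2, 4], [7, 24, 1], [2, 6, 8]]))  # -132
```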
<h2>Time Complexity</h2>
<p>When using this method to compute the determinant, the computation is relatively fast compared to other methods. Let’s assume we have an $n \times n$ matrix that can be taken to echelon form. Then we can find the maximum number of row operations needed to find the determinant using this method. Assuming that none of the pivot entries are $1$, for each row you would divide by a number at some point, so that’s $n$ row operations so far. For each row, you will also have to subtract it from all the other rows to make it the pivot row, which is $n-1$ more row operations for each of the $n$ rows, or $n(n-1)$ in total. This gives us $n+n(n-1) = n^2$ total row operations. Each row operation touches up to $n$ entries, so in terms of individual arithmetic operations the count is on the order of $n^3$.</p>
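<p>To get a feel for how far apart the two methods are, the sketch below tabulates the rough counts estimated in this two-part series: about $3 \cdot n!$ operations for cofactors and $n^2$ row operations for row reduction. The helper name <code>estimates</code> is made up for this example, and the numbers are only the rough estimates from these posts, not measured running times.</p>

```python
from math import factorial


def estimates(n):
    """Return (cofactor estimate, row-operation estimate) for an n by n matrix."""
    return 3 * factorial(n), n ** 2


for n in (3, 5, 10, 15):
    cofactor_estimate, row_operation_estimate = estimates(n)
    print(f"n = {n:2d}: cofactors ~ {cofactor_estimate:,}, row operations ~ {row_operation_estimate}")
```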
<p>This is much more efficient than computing the determinant via the cofactor method, which can take on the order of $n!$ operations.</p>
Nathan Reynoso
Learning to Debug
2021-06-01T00:00:00-07:00
https://eurisko.us/learningtodebug
<p><i>Note to reader: this post was written before students had access to VS Code and its associated debugging capabilities. Consequently, some valuable debugging tools like breakpoints are not covered in the post.</i></p>
<p>At the beginning of the year, I knew nothing about Python. I learned different methods and syntaxes, but I had no idea what they meant, and if something went wrong, I was practically hopeless. My main source of help was asking friends, which is a very good resource, though I grew reliant upon them instead of using them as a tool for help. Looking back now, I realize that understanding what you’re coding (not just how Python works, but also the problem you are trying to solve) is very important. If you don’t know how the code is supposed to work, you can’t debug the issues.</p>
<h2>Write Clear, Concise Code</h2>
<p>When I look back at old problem files, it can be hard to understand my thought process in the code, even though it works. When coding, it is extremely important that the code makes sense and is readable, not just for you but also for others. This is crucial when debugging, so that you can easily find any discrepancies in your logic, especially if your code is not returning errors. A core part of having clear code is a systematic variable naming convention. If you are debugging by printing your variables, having good names makes it easy to know what they should be equal to, so you can clearly pick out anything wrong.</p>
<p>An example of unclear coding is as follows:</p>
<font size="3em">
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">funct</span>(var, var2):
    number <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">0</span>
    number <span style="color: #333333">+=</span> var
    number <span style="color: #333333">=</span> number <span style="color: #333333">+</span> var2
    <span style="color: #008800; font-weight: bold">return</span> number
</pre></div>
</font>
<p><br /></p>
<p>Though this code would run and return the sum of the two inputs, it can be hard to tell exactly what it’s doing. There are multiple ways that this code could be cleaned up:</p>
<ul>
<li>On line 1, the two input variables (<code>var</code>, <code>var2</code>) could be defined as <code>num_1</code> and <code>num_2</code> (or similar), to show that these are numbers.</li>
<li>Function naming is also very important. The name of the function can tell you what it does very easily without even having to look at the code. In this example, <code>funct</code> explains nothing about the function itself. Instead, <code>calc_sum</code> would be a better name for the function, as it is clear that you are taking the sum of the two inputs.</li>
<li>In the function's body, there are multiple confusions and inconsistencies. The method used to calculate the sum is fine: initialize a variable, add each input, return the variable. However, the execution can be confusing. Initializing the variable as 0 in line 2 is fine, but unnecessary. Lines 3 and 4 use two different methods of variable addition and reassignment, which can be confusing to switch between (line 3's method is typically better, as it is more concise). These lines could also be condensed into a single line.</li>
<li>Another question to ask is whether you need a variable at all. In this case, <code>number</code> is unnecessary, since you can just return the sum without assigning it to an intermediate variable.</li>
</ul>
<p>Using the changes listed above, here is a better version of the code:</p>
<font size="3em">
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">calc_sum</span>(num_1, num_2):
    <span style="color: #008800; font-weight: bold">return</span> num_1 <span style="color: #333333">+</span> num_2
</pre></div>
</font>
<p><br /></p>
<p>It’s much easier to read, and consequently, much easier to debug.</p>
<h2>Don't Do Too Much on Any Single Line</h2>
<p>There are often many different solutions to the same problem. Concise solutions are generally better, but not if they make the code messy. For example, list and dictionary comprehensions eliminate the need for a multi-line “for” loop but can become cluttered very easily. Comprehensions can be especially complex when they are nested, where multiple lists/dictionaries are being created within one big one. This can make the code very confusing to understand, especially if the reader has no context of the problem.</p>
<p>An example of condensed code becoming confusing is as follows:</p>
<font size="3em">
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">get_mod_binary</span>(n, low_range, high_range):
    <span style="color: #008800; font-weight: bold">return</span> [convert_to_binary(num<span style="color: #333333">*</span>n) <span style="color: #008800; font-weight: bold">for</span> num <span style="color: #000000; font-weight: bold">in</span> <span style="color: #007020">range</span>(low_range, high_range) <span style="color: #008800; font-weight: bold">if</span> num<span style="color: #333333">%</span>n<span style="color: #333333">==</span><span style="color: #0000DD; font-weight: bold">0</span>]
<span style="color: #008800; font-weight: bold">print</span>(get_mod_binary(<span style="color: #0000DD; font-weight: bold">5</span>, <span style="color: #0000DD; font-weight: bold">1</span>, <span style="color: #0000DD; font-weight: bold">50</span>))
<span style="color: #888888">#[11001, 110010, 1001011, 1100100, 1111101, 10010110, 10101111, 11001000, 11100001]</span>
</pre></div>
</font>
<p><br /></p>
<p>The above function’s purpose is to take in a number and a range, find all numbers divisible by $n$ in the provided range, and convert the product of $n$ and the divisible numbers to a binary number. The function is confusing to read visually, but to help with the readability, we can break up the list comprehension to show the flow of the problem more clearly:</p>
<font size="3em">
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">get_mod_binary</span>(n, low_range, high_range):
    binary_list <span style="color: #333333">=</span> []
    <span style="color: #008800; font-weight: bold">for</span> num <span style="color: #000000; font-weight: bold">in</span> <span style="color: #007020">range</span>(low_range, high_range):
        <span style="color: #008800; font-weight: bold">if</span> num<span style="color: #333333">%</span>n<span style="color: #333333">==</span><span style="color: #0000DD; font-weight: bold">0</span>:
            product <span style="color: #333333">=</span> num<span style="color: #333333">*</span>n
            binary_list<span style="color: #333333">.</span>append(convert_to_binary(product))
    <span style="color: #008800; font-weight: bold">return</span> binary_list
</pre></div>
</font>
<p><br /></p>
<p>Though the function has more lines, anyone reading your code can understand it better. Another upside to not having overly condensed code is the ease of the debugging process: it’s much easier to print intermediate steps in spaced-out code than in a compact list comprehension.</p>
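<p>As a rough illustration of that last point, the sketch below adds a print statement to the loop version. The post does not show <code>convert_to_binary</code>, so the version here is a hypothetical stand-in that returns the binary digits as an integer, and the <code>debug</code> flag is also an assumption added for the example.</p>

```python
def convert_to_binary(num):
    # Hypothetical stand-in for the helper used in the post: returns the
    # binary digits of a non-negative integer as a base-10 integer.
    return int(bin(num)[2:])


def get_mod_binary(n, low_range, high_range, debug=False):
    binary_list = []
    for num in range(low_range, high_range):
        if num % n == 0:
            product = num * n
            binary = convert_to_binary(product)
            if debug:
                # One line per intermediate step makes a wrong value easy to spot.
                print(f"num={num}, product={product}, binary={binary}")
            binary_list.append(binary)
    return binary_list


result = get_mod_binary(5, 1, 50, debug=True)
```

<p>Doing the same inside the one-line comprehension would mean rewriting it first, which is exactly the extra work that spaced-out code avoids.</p>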
<h2>Use the Right Data Structure</h2>
<p>Choosing the right structure to store and manipulate your data is very important. Not having the right data structure can make your code very confusing and difficult to work with.</p>
<p>One example of this is in my SQL parser in my custom DataFrame class. Originally, I stored each word as an item in a list, but this made it very difficult to work with, as it was hard to differentiate between the operations and the inputs to the operations. To solve this issue, I stored the data within a dictionary. This way, I could have the key be the operation and the value be the input to the operation. Changing the data structure made those operations much easier to carry out than before.</p>William WalliusNote to reader: this post was written before students had access to VS Code and its associated debugging capabilities. Consequently, some valuable debugging tools like breakpoints are not covered in the post.Simulating a Biological Neuron Using the Hodgkin-Huxley Model20210601T00:00:0007:0020210601T00:00:0007:00https://eurisko.us/simulatingabiologicalneuralnetworkusingthehodgkinhuxleymodel<p>To understand the Hodgkin-Huxley model, we must first understand what a neuron is. A neuron is a type of cell that is most prominently found in nerves and the brain, and neurons are the primary building blocks of the nervous system. Neurons are connected by synapses, which allow signals to be sent and received rapidly and precisely.</p>
<p>There are three major types of neurons:</p>
<ul>
<li>Sensory neurons make sense of the outside world (e.g. touch, pain, vision, hearing, and taste) by sending signals from sensory organs to the brain.</li>
<li>Interneurons make up most of the neurons in our bodies and brain, and allow us to think, see, and perceive things.</li>
<li>Motor neurons are in charge of contracting muscles so we can move.</li>
</ul>
<p>The most basic structure of a neuron consists of the main cell body (the plasma membrane, nucleus, etc.), dendrites, and an axon. Dendrites allow the neuron to receive signals, while the single axon is how the cell sends signals.</p>
<center><img src="https://eurisko.us/images/blog/simulatingabiologicalneuralnetworkusingthehodgkinhuxleymodel1neuronstructure.jpg" style="border: none; height: 20em;" alt="icon" /></center>
<p><br /></p>
<p>Neurons send signals via spikes in electrical activity called <b>action potentials</b>. Before we jump into modeling action potentials, let’s learn a bit more about them.</p>
<p>Each neuron has a resting membrane potential around -70 mV (it is negative because the surroundings of the neuron have accumulated ions). As neurotransmitters bind to the receptors, the neuron is depolarized, moving the membrane potential closer to 0 mV. When the membrane potential reaches a certain threshold (around -55 mV for a neuron starting at -70 mV), sodium channels open, allowing many sodium ions into the cell. Rapid depolarization follows, and the membrane potential becomes positive. This influx of positive charge is what is known as the action potential.</p>
<p>When the action potential reaches its peak, sodium channels close, potassium channels open, and the cell loses membrane potential (repolarization). The drop in membrane potential causes the neuron to become hyperpolarized, and enter a state where it is very difficult to cause the neuron to depolarize again. Eventually the neuron reaches its resting membrane potential again, where it is no longer hyperpolarized.</p>
<center><img src="https://eurisko.us/images/blog/simulatingabiologicalneuralnetworkusingthehodgkinhuxleymodel2actionpotential.jpg" style="border: none; height: 20em;" alt="icon" /></center>
<p><br /></p>
<h2>The Hodgkin-Huxley Model</h2>
<p>The Hodgkin-Huxley model is named after Alan Lloyd Hodgkin and Andrew Fielding Huxley, who shared the 1963 Nobel Prize in Physiology or Medicine with Sir John Carew Eccles. Hodgkin and Huxley modeled the action potentials of neurons using differential equations.</p>
<p>The model yields a graph of how electrical stimulus affects the voltage of a neuron over time, including its action potentials. Hodgkin and Huxley derived much of the model from experimentation, and combined those results with physics to yield usable differential equations.</p>
<p>From physics, we have that the current is proportional to the rate of change of voltage, with the capacitance as the constant of proportionality:</p>
<center>
$\begin{align*}
I = C \dfrac{dV}{dt},
\end{align*}$
</center>
<p><br /></p>
<p>and from this we can obtain</p>
<center>
$\begin{align*}
\dfrac{dV}{dt} = \dfrac{I}{C}.
\end{align*}$
</center>
<p><br /></p>
<p>In neurons, the membrane capacitance is roughly $1$ (in the model's units of $\mu\text{F}/\text{cm}^2$).</p>
<p>Next the current must be split into parts: the stimulus ($s$), the flux across the sodium and potassium channels ($I_{Na}$, $I_{K}$ respectively), and the current leakage ($I_{L}$). We then turn the previous equation into</p>
<center>
$\begin{align*}
\dfrac{dV}{dt} = \dfrac{1}{C} \left[ s - I_{Na} - I_{K} - I_{L} \right].
\end{align*}$
</center>
<p><br /></p>
<p>In this model it’s really just</p>
<center>
$\begin{align*}
\dfrac{dV}{dt} = s - I_{Na} - I_{K} - I_{L}
\end{align*}$
</center>
<p><br /></p>
<p>since $C=1.$</p>
<p>One thing to keep in mind for all of these equations is that $V$ represents voltage <i>offset</i> from the resting potential (-70 mV), and not the actual voltage of the neuron. This is done to make graphing and showing the model simpler.</p>
<p>The current across each channel (including the leakage) is related to the voltage difference relative to that channel’s equilibrium voltage. The proportionality constants were modeled experimentally, and some are written in terms of $n, m,$ and $h,$ which represent the probability of active/inactive channels. The equations can be written as follows (with $V$ representing voltage offset from the resting potential):</p>
<ul>
<li>$I_{Na}(V, m, h) = g_{Na}(m,h)(V - V_{Na})$, with equilibrium voltage $V_{Na} = 115$ and proportionality constant $g_{Na}(m,h) = 120m^{3}h$</li>
<li>$I_{K}(V, n) = g_{K}(n)(V - V_{K})$, with $V_{K} = -12$ and $g_{K}(n) = 36n^{4}$</li>
<li>$I_{L}(V) = 0.3(V - V_{L})$, with $V_{L}=10.6$</li>
</ul>
<p>These variables $n, m,$ and $h$ still have to be dealt with. The rates of change of these variables depend on functions of voltage (alphas and betas), and are written as follows:</p>
<center>
$\begin{align*}
\dfrac{\text dn}{\text dt} &= \alpha_n(V) (1-n) - \beta_n(V) n \\
\dfrac{\text dm}{\text dt} &= \alpha_m(V) (1-m) - \beta_m(V) m \\
\dfrac{\text dh}{\text dt} &= \alpha_h(V) (1-h) - \beta_h(V) h
\end{align*}$
</center>
<p><br /></p>
<p>The alpha and beta functions are shown here:</p>
<center>
$\begin{align*}
\alpha_n(V) &= \dfrac{0.01(10-V)}{\exp \left[ 0.1 (10-V) \right] - 1}, \quad& \alpha_m(V) &= \dfrac{0.1(25-V)}{\exp \left[ 0.1 (25-V) \right] - 1}, \quad& \alpha_h(V) &= 0.07 \exp \left[ -\dfrac{V}{20} \right], \\
\beta_n(V) &= 0.125 \exp \left[ -\dfrac{V}{80} \right], \quad& \beta_m(V) &= 4 \exp \left[ -\dfrac{V}{18} \right], \quad& \beta_h(V) &= \dfrac{1}{\exp \left[ 0.1(30-V) \right] + 1}.
\end{align*}$
</center>
<p><br /></p>
<p>In this particular example, the stimulus will be provided to the neuron at certain intervals</p>
<center>
$\begin{align*}
s(t) = \begin{cases}
150, & t \in [10,11] \cup [20,21] \cup [30,40] \cup [50,51] \cup [53,54] \\
& \phantom{t \in [} \cup [56,57] \cup [59,60] \cup [62,63] \cup [65,66] \\
0 & \text{otherwise}.
\end{cases}
\end{align*}$
</center>
<p><br /></p>
<p>Lastly, initial values must be dealt with. We have $V_{0} = 0$ (remember, $V$ represents the <i>offset</i> from the resting membrane potential), and $n_0, m_0, h_0$ can be approximated by setting $V = 0$ and setting each of $n, m, h$ to its asymptotic value:</p>
<center>
$\begin{align*}
n_0 &= \dfrac{\alpha_n(V_0)}{\alpha_n(V_0) + \beta_n(V_0)} \\
m_0 &= \dfrac{\alpha_m(V_0)}{\alpha_m(V_0) + \beta_m(V_0)} \\
h_0 &= \dfrac{\alpha_h(V_0)}{\alpha_h(V_0) + \beta_h(V_0)}
\end{align*}$
</center>
<p><br /></p>
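<p>The asymptotic value is where a gating variable stops changing: setting $\text dn / \text dt = 0$ gives $\alpha_n (1-n) = \beta_n n,$ so $n = \alpha_n / (\alpha_n + \beta_n),$ and likewise for $m$ and $h.$ As a quick check, here is a minimal sketch that evaluates these at $V_0 = 0.$ The names <code>RATES</code> and <code>asymptotic_value</code> are made up for this example and are not part of the students' code.</p>

```python
import math

# (alpha, beta) pairs from the formulas above, as functions of the voltage offset V.
RATES = {
    'n': (lambda V: 0.01 * (10 - V) / (math.exp(0.1 * (10 - V)) - 1),
          lambda V: 0.125 * math.exp(-V / 80)),
    'm': (lambda V: 0.1 * (25 - V) / (math.exp(0.1 * (25 - V)) - 1),
          lambda V: 4 * math.exp(-V / 18)),
    'h': (lambda V: 0.07 * math.exp(-V / 20),
          lambda V: 1 / (math.exp(0.1 * (30 - V)) + 1)),
}


def asymptotic_value(name, V):
    """Value at which d(name)/dt = 0 for a fixed V: alpha / (alpha + beta)."""
    alpha, beta = RATES[name]
    return alpha(V) / (alpha(V) + beta(V))


V_0 = 0
initial_values = {name: asymptotic_value(name, V_0) for name in RATES}
for name, value in initial_values.items():
    print(name, round(value, 4))
# n 0.3177
# m 0.0529
# h 0.5961
```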
<h2>Simulating a Hodgkin-Huxley Neuron</h2>
<p>Creating the Hodgkin-Huxley model in code (Python, in this case) first requires a class or function that can generate graphs from differential equations alone. Euler estimation is the method used to do just that. At its core, the Euler estimator takes the rate of change of a function at a point and predicts the next point based on that rate of change and the user-specified step size (rate of change times step size gives the change from the current point to the next point). As the step size decreases, the graph generally becomes more accurate.</p>
<p>For the Euler estimator used in this case, the user supplies the rate of change for each dependent variable, and the estimator uses those rates to calculate the points to graph. Each point has the form $(t,x),$ where $t$ represents time and $x$ holds the dependent variables in a Python dictionary. The estimator graphs each dependent variable against time.</p>
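<p>To make the idea concrete, here is a minimal sketch of such an estimator. It is not the students' actual EulerEstimator class (which is not shown in this post); the function name <code>euler_estimate</code> and its arguments are assumptions for illustration. It is tried on $\text dx / \text dt = x$ with $x(0) = 1,$ whose exact value at $t=1$ is $e \approx 2.718.$</p>

```python
def euler_estimate(derivatives, point, step_size, num_steps):
    """Return the list of (t, x) points produced by Euler estimation.

    derivatives: dict mapping each variable name to a function f(t, x)
    point: starting point (t, x), where x is a dict of variable values
    """
    t, x = point
    points = [(t, dict(x))]
    for _ in range(num_steps):
        # Rate of change times step size gives the change to the next point.
        changes = {name: f(t, x) * step_size for name, f in derivatives.items()}
        x = {name: x[name] + changes[name] for name in x}
        t += step_size
        points.append((t, dict(x)))
    return points


# Example: dx/dt = x, starting from the point (0, {'x': 1}).
derivatives = {'x': lambda t, x: x['x']}
coarse = euler_estimate(derivatives, (0, {'x': 1}), step_size=0.1, num_steps=10)
fine = euler_estimate(derivatives, (0, {'x': 1}), step_size=0.001, num_steps=1000)
print(coarse[-1][1]['x'])  # about 2.594
print(fine[-1][1]['x'])    # about 2.717
```

<p>The smaller step size lands much closer to $e,$ at the cost of 100 times as many steps.</p>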
<p>All that is left to do is turn the differentials and functions into Python functions, and use the Euler estimator to graph the model.</p>
<p>Here are some examples of turning the equations into code. There is one Python function for every “part” of the model: alphas and betas, initial values, currents, etc.</p>
<font size="3em">
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #008800; font-weight: bold">import</span> math
<span style="color: #888888"># Constants from the equations above (beta_n, I_K, and I_L are defined analogously and not shown)</span>
C <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">1</span>
V_Na <span style="color: #333333">=</span> <span style="color: #0000DD; font-weight: bold">115</span>

<span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">alpha_n</span>(t,x):
    V <span style="color: #333333">=</span> x[<span style="background-color: #fff0f0">'V'</span>]
    <span style="color: #008800; font-weight: bold">return</span> (<span style="color: #6600EE; font-weight: bold">0.01</span><span style="color: #333333">*</span>(<span style="color: #0000DD; font-weight: bold">10</span><span style="color: #333333">-</span>V)) <span style="color: #333333">/</span> (math<span style="color: #333333">.</span>exp(<span style="color: #6600EE; font-weight: bold">0.1</span><span style="color: #333333">*</span>(<span style="color: #0000DD; font-weight: bold">10</span><span style="color: #333333">-</span>V)) <span style="color: #333333">-</span> <span style="color: #0000DD; font-weight: bold">1</span>)

n_0 <span style="color: #333333">=</span> alpha_n(<span style="color: #0000DD; font-weight: bold">0</span>,{<span style="background-color: #fff0f0">'V'</span>:<span style="color: #0000DD; font-weight: bold">0</span>}) <span style="color: #333333">/</span> (alpha_n(<span style="color: #0000DD; font-weight: bold">0</span>,{<span style="background-color: #fff0f0">'V'</span>:<span style="color: #0000DD; font-weight: bold">0</span>}) <span style="color: #333333">+</span> beta_n(<span style="color: #0000DD; font-weight: bold">0</span>,{<span style="background-color: #fff0f0">'V'</span>:<span style="color: #0000DD; font-weight: bold">0</span>}))

<span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">I_Na</span>(t,x):
    V <span style="color: #333333">=</span> x[<span style="background-color: #fff0f0">'V'</span>]
    m <span style="color: #333333">=</span> x[<span style="background-color: #fff0f0">'m'</span>]
    h <span style="color: #333333">=</span> x[<span style="background-color: #fff0f0">'h'</span>]
    <span style="color: #008800; font-weight: bold">return</span> <span style="color: #0000DD; font-weight: bold">120</span><span style="color: #333333">*</span>(m<span style="color: #333333">**</span><span style="color: #0000DD; font-weight: bold">3</span>)<span style="color: #333333">*</span>h<span style="color: #333333">*</span>(V <span style="color: #333333">-</span> V_Na)

<span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">stim</span>(t):
    <span style="color: #888888"># Stimulus intervals from the definition of s(t)</span>
    intervals <span style="color: #333333">=</span> [(<span style="color: #0000DD; font-weight: bold">10</span>,<span style="color: #0000DD; font-weight: bold">11</span>), (<span style="color: #0000DD; font-weight: bold">20</span>,<span style="color: #0000DD; font-weight: bold">21</span>), (<span style="color: #0000DD; font-weight: bold">30</span>,<span style="color: #0000DD; font-weight: bold">40</span>), (<span style="color: #0000DD; font-weight: bold">50</span>,<span style="color: #0000DD; font-weight: bold">51</span>), (<span style="color: #0000DD; font-weight: bold">53</span>,<span style="color: #0000DD; font-weight: bold">54</span>), (<span style="color: #0000DD; font-weight: bold">56</span>,<span style="color: #0000DD; font-weight: bold">57</span>), (<span style="color: #0000DD; font-weight: bold">59</span>,<span style="color: #0000DD; font-weight: bold">60</span>), (<span style="color: #0000DD; font-weight: bold">62</span>,<span style="color: #0000DD; font-weight: bold">63</span>), (<span style="color: #0000DD; font-weight: bold">65</span>,<span style="color: #0000DD; font-weight: bold">66</span>)]
    <span style="color: #008800; font-weight: bold">if</span> <span style="color: #007020">any</span>(low <span style="color: #333333">&lt;=</span> t <span style="color: #333333">&lt;=</span> high <span style="color: #008800; font-weight: bold">for</span> low, high <span style="color: #000000; font-weight: bold">in</span> intervals):
        <span style="color: #008800; font-weight: bold">return</span> <span style="color: #0000DD; font-weight: bold">150</span>
    <span style="color: #008800; font-weight: bold">return</span> <span style="color: #0000DD; font-weight: bold">0</span>

<span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">dV_dt</span>(t,x):
    s <span style="color: #333333">=</span> stim(t)
    Na_curr <span style="color: #333333">=</span> I_Na(t,x)
    K_curr <span style="color: #333333">=</span> I_K(t,x)
    L_curr <span style="color: #333333">=</span> I_L(t,x)
    <span style="color: #008800; font-weight: bold">return</span> (<span style="color: #0000DD; font-weight: bold">1</span><span style="color: #333333">/</span>C) <span style="color: #333333">*</span> (s <span style="color: #333333">-</span> Na_curr <span style="color: #333333">-</span> K_curr <span style="color: #333333">-</span> L_curr)
</pre></div>
</font>
<p><br /></p>
<p>The end product looks something like this:</p>
<center><img src="https://eurisko.us/images/blog/simulatingabiologicalneuralnetworkusingthehodgkinhuxleymodel3endproduct.png" style="border: none; height: 20em;" alt="icon" /></center>
<p><br /></p>
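<p>For readers who want to reproduce a curve like this, below is a self-contained sketch that assembles the equations from this post into one simulation. It is an approximation of the approach rather than the original code: the 0.01 step size, the 80-unit time span, and the helper <code>x_over_expm1</code> (which guards against dividing by zero when $V$ is exactly 10 or 25) are all assumptions made for this example.</p>

```python
import math

# Constants from the post; V is the voltage offset from the resting potential.
C = 1.0
V_NA, V_K, V_L = 115.0, -12.0, 10.6
STIM_INTERVALS = [(10, 11), (20, 21), (30, 40), (50, 51), (53, 54),
                  (56, 57), (59, 60), (62, 63), (65, 66)]


def x_over_expm1(u):
    # u / (exp(u) - 1), using its limit of 1 at u = 0 to avoid dividing by zero.
    return 1.0 if abs(u) < 1e-9 else u / math.expm1(u)


# alpha_n and alpha_m are the post's formulas rewritten in terms of x_over_expm1.
def alpha_n(V): return 0.1 * x_over_expm1(0.1 * (10 - V))
def alpha_m(V): return x_over_expm1(0.1 * (25 - V))
def alpha_h(V): return 0.07 * math.exp(-V / 20)
def beta_n(V): return 0.125 * math.exp(-V / 80)
def beta_m(V): return 4 * math.exp(-V / 18)
def beta_h(V): return 1 / (math.exp(0.1 * (30 - V)) + 1)


def stim(t):
    return 150.0 if any(low <= t <= high for low, high in STIM_INTERVALS) else 0.0


def derivatives(t, x):
    V, n, m, h = x['V'], x['n'], x['m'], x['h']
    I_Na = 120 * m**3 * h * (V - V_NA)
    I_K = 36 * n**4 * (V - V_K)
    I_L = 0.3 * (V - V_L)
    return {
        'V': (stim(t) - I_Na - I_K - I_L) / C,
        'n': alpha_n(V) * (1 - n) - beta_n(V) * n,
        'm': alpha_m(V) * (1 - m) - beta_m(V) * m,
        'h': alpha_h(V) * (1 - h) - beta_h(V) * h,
    }


def simulate(t_end=80.0, step_size=0.01):
    # Start at rest, with n, m, h at their asymptotic values for V = 0.
    x = {'V': 0.0,
         'n': alpha_n(0) / (alpha_n(0) + beta_n(0)),
         'm': alpha_m(0) / (alpha_m(0) + beta_m(0)),
         'h': alpha_h(0) / (alpha_h(0) + beta_h(0))}
    points = [(0.0, dict(x))]
    for i in range(round(t_end / step_size)):
        rates = derivatives(i * step_size, x)
        # Euler step: new value = old value + rate of change * step size.
        x = {name: x[name] + step_size * rates[name] for name in x}
        points.append(((i + 1) * step_size, dict(x)))
    return points


points = simulate()
first_spike_peak = max(x['V'] for t, x in points if 10 <= t < 20)
print(round(first_spike_peak, 1))
```

<p>The returned list of $(t,x)$ points can be passed to any plotting tool to graph $V$ against $t.$</p>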
<h2>Analyzing the Model</h2>
<p>Now that we have the actual model, we can analyze how it relates to all the information about neurons above, as well as what the model could imply.</p>
<p>If we look at the first burst of stimulus (between $t=10$ and $t=11$), we can see the most basic reaction to stimulus: a sudden spike caused by both depolarization and the stimulus provided (two factors), a more gradual decrease in voltage as the neuron repolarizes (it’s slower than the increase because there is only one factor repolarizing the neuron), and then a dip where the neuron becomes hyperpolarized. This occurs at both $t=20$ and $t=30$ as well, but the example for $t=30$ will be discussed in the following paragraph.</p>
<p>The prolonged stimulus between $t=30$ and $t=40$ causes something interesting to occur in the neuron. The first thing to notice is that the neuron initially repolarizes at a slower rate. This is due to the constant stimulus being added into the system while the neuron is repolarizing. The voltage doesn’t keep going down because it roughly reaches the threshold where the cell can once again depolarize (-55 mV, which is represented as 15 mV on the graph). The odd half-spike that starts around $t=33$ is the second thing to analyze. The voltage does not increase as quickly as in previous spikes because the neuron is still repolarizing (losing voltage) while the stimulus is providing voltage. Eventually the little spike reaches the point where the sodium channels all close, and depolarization stops.</p>
<p>The preceding paragraph shows what happens when stimulus is constantly being poured into the system. But what happens when stimulus is provided in many short bursts? This question can be answered by analyzing the neuron’s behavior at $t = 50, 53, 56, 59, 62, 65$. The first burst of stimulus looks normal, but since the second burst of stimulus came before the neuron could properly repolarize (and while it was hyperpolarized), the amount of depolarization was very low. The next burst of stimulus (the third one) got a better reaction because the neuron was less hyperpolarized, but it was still less than what we would normally expect. The fourth and sixth bursts are similar to the second, and the fifth one to the first and third.</p>Justin HongTo understand the Hodgkin-Huxley model, we must first understand what a neuron is. A neuron is a type of cell that is most prominently found in nerves and the brain, and neurons are the primary building blocks of the nervous system. Neurons are connected by synapses, which allow signals to be sent and received rapidly and precisely.The Ultimate High School Computer Science Sequence: 9 Months In20210221T00:00:0008:0020210221T00:00:0008:00https://eurisko.us/theultimatehighschoolcomputersciencesequence9monthsin<p><i>The Eurisko sequence started during the summer of 2020 with an initial cohort of 5 high school students, all aged 15-16 years old and entering their junior year (11th grade). The content of this sequence is similar to what would be covered in upper-level undergraduate courses (e.g. data structures/algorithms ranging from linked lists & sorting algorithms to graphs & traversals), and some content may even be beyond (e.g. building a machine learning library in Python from the ground up). 
The students build everything from scratch: for example, instead of using external libraries like numpy or pandas, the students built their regressors and classifiers on top of matrix and dataframe classes that they wrote themselves.</i></p>
<hr />
<p>Last June, <a class="body" href="https://eurisko.us/jasonroberts" target="_blank">Jason Roberts</a>, the founder of <a class="body" href="https://www.mathacademy.us/" target="_blank">Math Academy</a> and one of the original developers of Uber’s real-time technology, most widely known for coining the term <a class="body" href="https://www.google.com/search?q=luck+surface+area" target="_blank">“luck surface area”</a>, asked me to teach his son Colby some computer science. Colby was just finishing up his sophomore year in <a class="body" href="http://www.theappacademy.us/index.html" target="_blank">App Academy</a>, but Jason felt that the curriculum was not geared towards students who had a strong aptitude in the subject, and that Colby could and should be learning a lot more. Plus, it’s the pandemic, and many extracurriculars have been shut down.</p>
<p>As I’ve come to expect with Jason, that initial idea grew quickly: he pulled in some of Colby’s buddies who had the necessary mathematical background, and we put together a summer computer science group that met Mon / Wed / Fri with ~10 hours of problem sets each week. Long story short (which I’ll elaborate on in a later post), the kids made progress faster than either of us could have possibly expected, and now App Academy is funding an official high school class in which a second cohort has joined the ranks.</p>
<p>It’s been 9 months since our initial summer crew (<a class="body" href="https://eurisko.us/colbyroberts" target="_blank">Colby</a>, <a class="body" href="https://eurisko.us/rileypaddock" target="_blank">Riley</a>, <a class="body" href="https://eurisko.us/georgemeza" target="_blank">George</a>, <a class="body" href="https://eurisko.us/davidgieselman" target="_blank">David</a>, and <a class="body" href="https://eurisko.us/elijahtarr" target="_blank">Elijah</a>) first started meeting, and in those 9 months, the students have gone from initially not knowing how to write helper functions, to building a machine learning library (and numerous other things) from scratch. This post is meant to summarize what we’ve done so far and what our plans are for the future.</p>
<p>All the relevant problem sets, quizzes/tests, and class recordings are documented in the class pages for <a class="body" href="https://eurisko.us/computationandmodeling2020summer" target="_blank">Computation & Modeling (Summer 2020)</a> and <a class="body" href="https://eurisko.us/machinelearning202021" target="_blank">Machine Learning (2020-21)</a>. Throughout this post I’ve included links to some of the more noteworthy problems that the students have completed, but any non-linked problems can also be found on those class pages. The students also have GitHub repositories which can be found on their Eurisko pages.</p>
<p>Lastly, before we dive in, here is a bit of important background information:</p>
<ul>
<li>These students are currently high school juniors (16-17 years old).</li>
<li>Whenever I say "implemented" or "built", I mean from scratch. The students aren't allowed to use external libraries. They have to build everything themselves. We've been working primarily in Python (though recently, we've also introduced C++ and Haskell). The students collaborate a lot, but every student writes every problem up on their own. We eat what we kill.</li>
<li>The students are all mathematically advanced and have learned at least through linear algebra and multivariable calculus. Most are in Math Academy, which means that by 11th grade, they've also learned much of differential equations, discrete math, and abstract algebra.</li>
<li>Most of the students had very little programming experience prior to Eurisko. Something as simple as checking if a string was a palindrome was not trivial to them. They didn't know how to write classes and helper functions, how to work with dictionaries, or even how to systematically debug their code. They've come a long way in a short time!</li>
</ul>
<h2>Why We're Doing It</h2>
<p>We want to teach students the art and craft of software development, while simultaneously giving equal weighting to the discipline of formal computer science and leveraging the advanced mathematics that the students have been learning through Math Academy. We also want the course to be very fun.</p>
<ul>
<li><i>The art and craft of software development.</i> We want the students to be good enough to do an internship at a tech company and be productive (as opposed to burdensome). This means they need to know how to write and debug code given high-level requirements, test their code effectively, and use source control like GitHub.</li>
<li><i>Leveraging mathematics.</i> We're interested in teaching the students a lot of machine learning because frankly, it's super cool, and it builds on their strengths. We want to pull the powerful and interesting tools of mathematics into practical use cases in the context of modeling and prediction (as opposed to just building CRUD apps).</li>
<li><i>Formal computer science.</i> In addition to becoming competent software developers, we wanted to teach the students undergraduate-level computer science (data structures / algorithms, programming languages, etc) so that no matter how competitive or rigorous the undergraduate curriculum they encounter, they'll be overprepared and find it easygoing (as opposed to overwhelming/intimidating).</li>
</ul>
<p>We chose Python as our primary programming language since it’s one of the most productive multipurpose languages, it’s a great learning language, and these days it’s the lingua franca of machine learning. But we also wanted to expose the students to the basics of multiple programming languages, so we picked C++ and Haskell as two other languages that would stretch the students in other ways. C++ forces them to think in terms of how the machine actually works, whereas Haskell represents a much more abstract idealized conception of computation. C++ and Haskell are also the kinds of languages that the students might run into early in an undergraduate computer science program, and if they’re not prepared, they could easily find themselves frustrated and struggling. So we’re taking preventative measures.</p>
<p>We’re also having the students learn SQL, since it’s an example of a different category of language: declarative, as opposed to imperative (Python, C++) or functional (Haskell). It also happens to be incredibly useful. In addition to learning SQL, the students will also build their own SQL parser, which will give them further insight into how programming languages are structured and what goes on behind the scenes when you run a program.</p>
<h2>What We've Done</h2>
<p>Here is a list of the main topics we’ve covered. We pulled the basics from MIT’s <a class="body" href="https://mitpress.mit.edu/books/introductioncomputationandprogrammingusingpythonsecondedition" target="_blank">Introduction to Computation and Programming Using Python</a>, and then added on a bunch of more advanced topics.</p>
<p><b>Algorithms and Data Structures.</b> The students have implemented linked lists, stacks, queues, sorting algorithms (selection sort, bubble sort, heapsort, mergesort, quicksort) and have used recursion in many different contexts. They’ve also implemented trees, undirected graphs, directed graphs, and weighted graphs, along with methods for computing depth-first / breadth-first traversals and distance / shortest path between two nodes (i.e. Dijkstra’s algorithm). They’ve also implemented a simple version of backtracking to solve a <a class="body" href="https://eurisko.us/files/all_problems_iteration_1.html#Problem441" target="_blank">magic square</a> and a mini sudoku puzzle. (Elijah wrote about the magic square problem <a class="body" href="https://eurisko.us/solvingmagicsquaresusingbacktracking/" target="_blank">here</a>.)</p>
<p><b>Optimization.</b> The students have implemented Newton’s method, gradient descent (both single-variable and multivariable), and grid search. They’ve also implemented randomized hill climbers in the context of the 8-queens problem.</p>
<p><b>Probability and Statistics.</b> The students have worked with many different types of distributions and have done some <a class="body" href="https://eurisko.us/files/all_problems_iteration_1.html#Problem411" target="_blank">basic Bayesian inference</a> (for example, if you have a set of numbers randomly selected from below some upper bound, the students can construct a confidence interval for the upper bound). The students also write up their problems in LaTeX (using <a class="body" href="https://www.overleaf.com/" target="_blank">Overleaf</a>).</p>
<p><b>Matrix, DataFrame, and EulerEstimator.</b> The students have built their own Matrix class from scratch, which includes methods to compute RREF, inverse, determinant, etc. They also built a DataFrame class which they use to manipulate datasets, and they built an EulerEstimator class that they use to simulate systems of differential equations.</p>
<p><b>Machine Learning.</b> The students have built linear/logistic regressors, a naive Bayes classifier, Gini decision trees, simple random forests, and they have just started building neural networks. These machine learning models all run on top of the Matrix, DataFrame, and Graph classes that the students had previously built. The students have also used these models for some <a class="body" href="https://eurisko.us/files/all_problems_iteration_1.html#Problem352" target="_blank">elementary prediction tasks</a> that have required the use of dummy variables and interaction terms. (George, Colby, and Riley wrote a 3-part series about linear and logistic regression: <a class="body" href="https://eurisko.us/linearandlogisticregressionpart1understandingthemodels/" target="_blank">part 1</a>, <a class="body" href="https://eurisko.us/linearandlogisticregressionpart2fittingthemodels/" target="_blank">part 2</a>, and <a class="body" href="https://eurisko.us/linearandlogisticregressionpart3categoricalvariablesinteractiontermsandnonlineartransformationsofvariables/" target="_blank">part 3</a>.)</p>
<p><b>Differential Equations.</b> After building an EulerEstimator from scratch, the students used it to simulate several systems of differential equations: a predator-prey model (which David wrote about <a class="body" href="https://eurisko.us/predatorpreymodelingwitheulerestimation/" target="_blank">here</a>), the SIR epidemiological model, the <a class="body" href="https://eurisko.us/files/all_problems_iteration_1.html#Problem522" target="_blank">Hodgkin-Huxley neuron</a> (work for which Hodgkin and Huxley won the Nobel Prize in 1963), and a network of Hodgkin-Huxley neurons connected together.</p>
<p><b>Object-Oriented Programming.</b> In addition to implementing numerous classes in the context of algorithms / data structures and machine learning, the students have been implementing the Space Empires board game along with intelligent agents that battle against each other. Space Empires is incredibly rich and complex and will be discussed extensively later in this post.</p>
<p><b>Programming Languages.</b> The students have recently started learning C++, Haskell, Shell, and SQL. So far, the exercises they’ve completed have been simple HackerRank-style problems.</p>
<p><b>Writing.</b> The students each wrote a blog post last semester (the posts are linked in the categories above). Elijah’s post made it to the front page of Hacker News last weekend (<a class="body" href="https://news.ycombinator.com/item?id=26126652" target="_blank">link</a>).</p>
<h2>Space Empires</h2>
<p>The Space Empires game is incredibly complicated, which is in part why we chose it as the “big project” for the class. It’s provided the students with extensive practice planning, writing, and debugging code that’s spread over multiple directories and files. The <a class="body" href="https://www.gmtgames.com/spaceemp/SERules4B.pdf" target="_blank">rule book</a> is incredibly dense, but here is the gist of how the game works:</p>
<ul>
<li>There are 2 players on a grid. Each player starts with a "home colony" and some initial ships, and their goal is to destroy the opponent's home colony by sending ships to attack it.</li>
<li>Players have a currency called Construction Points (CPs). Players receive CP income from their home colony on each turn, and they can use these CPs to buy more ships or "technology" for future ships. Technology supplements ships' stats: for example, buying movement technology would allow ships to move more spaces at once, and buying attack or defense technology would increase the attack or defense strength of ships during combat. Players also have to pay maintenance costs on their ships each round (if they don't pay the maintenance cost for a ship, they lose the ship).</li>
<li>After both players move their ships, combat occurs at any grid square that contains ships from both players. Combat proceeds in rounds until only one player's ships remain at that spot.</li>
<li>During each round of combat, a "combat order" is constructed, in which ships are sorted by their attack class. The first ship in the combat order can attack any other ship. A 10-sided die is rolled, and if the die roll is less than or equal to the attacker's (attack strength + attack technology) minus the defender's (defense strength + defense technology), then a hit is scored. Once a ship sustains a number of hits equal to its hull size, it is destroyed. This procedure is repeated for each ship in the combat order.</li>
<li>At the end of a round of combat, if there are still ships from both teams left over, another round of combat begins. Combat continues until only one team's ships occupy the square.</li>
</ul>
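<p>The hit rule in the list above can be sketched in a few lines of Python. This is only an illustrative sketch, not any student's actual implementation: the <code>Ship</code> class, its field names, and the two functions are hypothetical, the stat values are made up, and it assumes that a roll at or below the attacker's net strength counts as a hit.</p>

```python
import random
from dataclasses import dataclass


@dataclass
class Ship:
    # All field names are hypothetical, chosen only for illustration.
    attack_strength: int
    attack_technology: int
    defense_strength: int
    defense_technology: int
    hull_size: int
    hits_taken: int = 0

    @property
    def is_destroyed(self):
        # A ship is destroyed once its hits reach its hull size.
        return self.hits_taken >= self.hull_size


def hit_threshold(attacker, defender):
    """Largest die roll that still counts as a hit (assumed rule)."""
    attack = attacker.attack_strength + attacker.attack_technology
    defense = defender.defense_strength + defender.defense_technology
    return attack - defense


def resolve_attack(attacker, defender, roll=None):
    """Roll a 10-sided die (or use the given roll) and record a hit if scored."""
    if roll is None:
        roll = random.randint(1, 10)
    hit = roll <= hit_threshold(attacker, defender)
    if hit:
        defender.hits_taken += 1
    return hit


# Made-up stats for two identical small ships.
scout = Ship(attack_strength=3, attack_technology=0,
             defense_strength=0, defense_technology=0, hull_size=1)
target = Ship(attack_strength=3, attack_technology=0,
              defense_strength=0, defense_technology=0, hull_size=1)

missed = resolve_attack(scout, target, roll=4)  # 4 > 3, so no hit
landed = resolve_attack(scout, target, roll=3)  # 3 <= 3, so a hit
```

<p>Passing the roll in as an argument keeps the rule deterministic, which is convenient when two implementations of the game need to be compared outcome by outcome.</p>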
<p>There are many other details. I won’t mention them all here, but here are a few examples to get the point across:</p>
<ul>
<li>Players can send colony ships to colonize other planets. Then, players can collect CPs and build ships at those colonies.</li>
<li>The number of ships that a player can build on any given turn is limited by the player's number of shipyards at each colony.</li>
<li>During combat, if two ships have the same tactics level, then the defending ship attacks first. (The defending ship is the ship that was the first to occupy the grid square.)</li>
<li>Each round of combat starts with "ship screening", in which a player with more ships is given the opportunity to remove its ships from the combat round (but the number of ships that are left in combat must be at least the number of ships that the opponent has in that square).</li>
</ul>
<p>In our initial approach to implementing the game, we tried to implement the main rules of the game along with a subset of the more interesting details. Along the way, we created a couple of simple strategy players to test that our games gave the same results, and it seemed like things were going fine. But once the students built more complex custom strategies and tried to have the custom strategies battle against each other, we ran into tons of edge cases and details that we hadn’t otherwise considered, and everyone’s game implementations were giving different results. (Each student has their own implementation of the game.) After several weeks of attempting to reconcile the discrepancies in our games, we decided to peel back to a much simpler version of the game, reconcile any discrepancies on that simple version, and gradually work our way back up to the full implementation, continuing to reconcile discrepancies at each “level” along the way.</p>
<p><b>Level 1.</b> We started off with the simplest game imaginable: each player has 3 scout units and nothing else. There was no economic phase, no CP, and no technology. Level 1 consisted solely of each player moving their 3 units and engaging in combat. We created several strategy players, matched them up against each other, and engaged in pair coding sessions until all of the discrepancies in the outcomes were resolved.</p>
<p><b>Level 2.</b> We extended level 1 by introducing a single economic phase at the very beginning of the game, having players start with 3 shipyards in addition to 3 scouts (these are the normal starting conditions of the game), and allowing players to buy technology and/or more scouts during the single economic phase. This way, players would get some CP income from their home colony and have to make a choice between spending it all on a couple basic scouts, or buying just one scout with upgraded technology. Again, we created several basic strategy players and resolved any discrepancies in the outcomes of matchups.</p>
<p>Level 2 was also where we started competing with custom strategies. It turned out that the best strategy was to buy as many scouts as possible, wait for the opponent to attack, and then send all of one’s scouts on a direct path to attack the opponent’s home base once the opponent’s scouts had all been destroyed. We’ll call this the “camper” strategy because its units “camp” at the home colony and wait for the opponent to attack first.</p>
<p>The camper strategy exploited the fact that, when two units of the same tactics level are involved in combat, the defending ship gets to attack first. By waiting for the opponent to travel to the camper’s home colony, the camper was able to attack the opponent first. Additionally, because shipyards at a player’s home colony can engage in combat, the camper not only attacked first, but also had twice as many ships in the initial combat. These advantages gave the camper a much higher probability of winning the initial combat and destroying the opponent’s scouts, which in turn gave the camper a much higher probability of winning the second combat once it sent its scouts to the opponent’s home base.</p>
<p><b>Level 3.</b> We’re currently implementing level 3. Level 3 extends level 2 by introducing repeated economic phases. This means that the players get CP income on every turn, and have the opportunity to buy technology and/or more scouts on each turn. The optimal strategy is likely similar to the camper strategy from level 2, but it’s not entirely obvious what the best thing to do is while camping, and when the camper should pull the trigger and rush the opponent.</p>
<ul>
<li>If the opponent is going to attack the camper quickly, then it's in the camper's best interest to ignore technology and just buy as many scouts as possible. That way, it can outnumber the opponent.</li>
<li>If the opponent is going to wait a while before attacking the camper, then it's in the camper's best interest to first buy technology and then buy scouts only after all technology has been upgraded to the maximum. The reasoning for this strategy depends on a couple nuances: 1) because of maintenance costs, a player cannot maintain infinitely many scouts, and 2) a ship inherits the technology from the player at the time of building. This way, the resulting army of scouts will be both maximally large and equipped with maximum technology.</li>
<li>If the opponent is going to wait for the camper to attack first, then the camper may be able to exploit the opponent's way of detecting that the camper has attacked. For example, if the opponent attacks right after the camper does, then the camper can just send a single scout over to the opponent (as a sacrifice) and keep the rest of its scouts camped at the home base for the defender-attacks-first advantage. On the other hand, if the opponent refuses to attack until all of the camper's scouts are destroyed, then the camper can repeatedly build an army of scouts and send all but one of them to attack the opponent.</li>
</ul>
<p>We’re now at a stage where the optimal strategy is no longer obvious.</p>
<h2>How Well is it Working?</h2>
<p>Again, the kids are making progress faster than either Jason or I could have possibly expected. But they’re also having more fun than either Jason or I could have possibly expected, too. Class often runs over time due to interesting discussions surrounding Space Empires, and I have to regularly tell kids to leave class to go to their other classes. The kids are on Slack all the time, and I’ve heard them mention that “it’s the only class that matters.” It’s their hardest class, by far – even compared to their Math Academy classes (where they’re studying upper-division college math), which are in turn far more advanced than their AP classes. But they’re having a blast.</p>
Justin Skycak
The Eurisko sequence started during the summer of 2020 with an initial cohort of 5 high school students, all aged 15-16 years old and entering their junior year (11th grade). The content of this sequence is similar to what would be covered in upper-level undergraduate courses (e.g. data structures/algorithms ranging from linked lists & sorting algorithms to graphs & traversals), and some content may even be beyond (e.g. building a machine learning library in Python from the ground up).
The students build everything from scratch: for example, instead of using external libraries like numpy or pandas, the students built their regressors and classifiers on top of matrix and dataframe classes that they wrote themselves.
Linear and Logistic Regression, Part 1: Understanding the Models
2020-12-21T00:00:00-08:00
https://eurisko.us/linearandlogisticregressionpart1understandingthemodels
<p><i>Note: This post is part 1 of a 3-part series: <a class="body" target="_blank" href="https://eurisko.us/linearandlogisticregressionpart1understandingthemodels/">part 1</a>, <a class="body" target="_blank" href="https://eurisko.us/linearandlogisticregressionpart2fittingthemodels/">part 2</a>, <a class="body" target="_blank" href="https://eurisko.us/linearandlogisticregressionpart3categoricalvariablesinteractiontermsandnonlineartransformationsofvariables/">part 3</a>.</i></p>
<p>Regression means measuring data points and fitting a function to the trend they follow. It can be used to connect known variables to uncertain outcomes, such as estimating the probability of a heart attack from known traits. It can also be used to find the best amount of something, like the amount of toppings on a pizza: you can relate the amount of toppings to customer satisfaction and determine the amount that leads to the best reviews from customers.</p>
<p>There are two main types of regression I’m going to talk about: linear and logistic. Linear regression comes in the form of a straight line:</p>
<center><img src="https://eurisko.us/images/blog/linearandlogisticregressionpart1understandingthemodels1.png" style="border: none; height: 20em;" alt="icon" /></center>
<p><br /></p>
<p>Linear regression can be modeled with this equation:</p>
<center>
$\begin{align*}
y = \beta_0 + \beta_1x
\end{align*}$
</center>
<p><br /></p>
<p>Logistic regression is a form of regression that has a sigmoid shape with an upper and a lower limit. The curve starts near the lower limit, rises, and then levels out again as it approaches the upper limit, forming the graph’s S-like shape.</p>
<center><img src="https://eurisko.us/images/blog/linearandlogisticregressionpart1understandingthemodels2.png" style="border: none; height: 20em;" alt="icon" /></center>
<p><br /></p>
<p>Logistic regression can be modeled with this equation:</p>
<center>
$\begin{align*}
y = \dfrac{1}{1+e^{\beta x}}
\end{align*}$
</center>
<p><br /></p>
<p>These are the standard one-variable equations, but both can be extended to fit multiple variables. Linear becomes</p>
<center>
$\begin{align*}
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 +\ldots,
\end{align*}$
</center>
<p><br /></p>
<p>and logistic becomes</p>
<center>
$\begin{align*}
y = \dfrac{1}{1+e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \ldots}}.
\end{align*}$
</center>
<p><br /></p>
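<p>To make the two multi-variable formulas concrete, here is a minimal Python sketch that evaluates each of them. The function names and the coefficient values are made up for illustration; the logistic function uses the same sign convention as the equations above.</p>

```python
import math


def linear_model(betas, xs):
    """beta_0 + beta_1*x_1 + ... + beta_n*x_n"""
    return betas[0] + sum(b * x for b, x in zip(betas[1:], xs))


def logistic_model(betas, xs):
    """1 / (1 + e^(beta_0 + beta_1*x_1 + ... + beta_n*x_n))"""
    return 1 / (1 + math.exp(linear_model(betas, xs)))


betas = [1.0, -2.0, 0.5]  # made-up coefficients: beta_0, beta_1, beta_2
xs = [3.0, 4.0]           # made-up inputs: x_1, x_2

line_value = linear_model(betas, xs)     # 1 - 6 + 2 = -3
curve_value = logistic_model(betas, xs)  # 1 / (1 + e^(-3)), between 0 and 1
```

<p>The linear output can be any real number, while the logistic output always stays strictly between 0 and 1.</p>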
<p>These regressions can be used to fit different scenarios. The crucial difference between the regressions is their limits.</p>
<p>For example, say we want to model the population of a species in a new environment. The population can’t grow forever because of environmental constraints, so this is a clear-cut case for choosing logistic regression over linear regression.</p>
<p>Linear regression, on the other hand, doesn’t have bounds, so it suits quantities that can keep growing. Say you’re modeling experience with a game versus how many points you can score. The more you practice, the better you get, and the more points you can score. You could also use linear regression to model the yield of crops depending on how much water or fertilizer you use.</p>
<h2>More on Logistic Regression</h2>
<p>Logistic regression is also perfect for determining probabilities, because of its limits: a probability must be between $0\%$ and $100\%.$ A specific example of this is the weather. You can estimate the chance of rain or sunshine using past weather patterns. Linear regression can’t model probabilities because it has no limits, and there is no such thing as a $200\%$ chance of something happening.</p>
<p>Logistic regression also doesn’t have to stay within the bounds of $0$ to $1,$ though that is the usual choice. Remember this equation:</p>
<center>
$\begin{align*}
y = \dfrac{1}{1+e^{\beta x}}
\end{align*}$
</center>
<p><br /></p>
<p>Changing the numerator changes the upper limit of the regression:</p>
<center>
$\begin{align*}
y = \dfrac{U}{1+e^{\beta x}}
\end{align*}$
</center>
<p><br /></p>
<p>In the above equation, $U$ is the upper limit. This can be used for, say, a movie rating system that goes from 0 to 10 or 0 to 5 stars.</p>
<p>Not only can you change the upper bound, but you can also change the lower bound. Here is a generalized formula to fit a logistic regression for bounds of your choice:</p>
<center>
$\begin{align*}
y = L + \dfrac{U - L}{1+e^{\beta x}}
\end{align*}$
</center>
<p><br /></p>
<p>In this equation, $U$ is your upper limit, and $L$ is your lower limit. As before, the right limits depend on the scenario you are modeling.</p>
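<p>As a quick check of the generalized formula, the sketch below evaluates it for a 0-to-10 rating scale. The function name and the slope value are hypothetical. Note that with this sign convention a negative $\beta$ gives a curve that rises from $L$ to $U.$</p>

```python
import math


def bounded_logistic(x, beta, lower=0.0, upper=1.0):
    """L + (U - L) / (1 + e^(beta * x))"""
    return lower + (upper - lower) / (1 + math.exp(beta * x))


beta = -1.0  # made-up slope for illustration

midpoint = bounded_logistic(0, beta, lower=0, upper=10)    # halfway between L and U
far_left = bounded_logistic(-50, beta, lower=0, upper=10)  # very close to L = 0
far_right = bounded_logistic(50, beta, lower=0, upper=10)  # very close to U = 10
```

<p>Setting <code>lower=0</code> and <code>upper=1</code> recovers the standard logistic function.</p>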
<p>For example, say you have a crowd of people and you want to predict the direction in which the crowd will move. This could go from $-180^\circ$ to $180^\circ,$ where $0^\circ$ represents straight ahead, $-90^\circ$ represents left, and $90^\circ$ represents right. There are many different scenarios in which you wouldn’t want the standard $0$ to $1$ bounds.
<br /></p>
<p><i>This post is part 1 of a 3-part series. <a class="body" target="_blank" href="https://eurisko.us/linearandlogisticregressionpart2fittingthemodels/">Click here to continue to part 2.</a></i></p>
George Meza
Note: This post is part 1 of a 3-part series: part 1, part 2, part 3.
Linear and Logistic Regression, Part 2: Fitting the Models
2020-12-21T00:00:00-08:00
https://eurisko.us/linearandlogisticregressionpart2fittingthemodels
<p><i>Note: This post is part 2 of a 3-part series: <a class="body" target="_blank" href="https://eurisko.us/linearandlogisticregressionpart1understandingthemodels/">part 1</a>, <a class="body" target="_blank" href="https://eurisko.us/linearandlogisticregressionpart2fittingthemodels/">part 2</a>, <a class="body" target="_blank" href="https://eurisko.us/linearandlogisticregressionpart3categoricalvariablesinteractiontermsandnonlineartransformationsofvariables/">part 3</a>.</i></p>
<p>This is a blog post exploring how to fit linear and logistic regressions. First, note that linear and logistic regressors have different shapes. The shape of linear regression is a line, while the shape of logistic regression is a sigmoid:</p>
<center><img src="https://eurisko.us/images/blog/linearandlogisticregressionpart2fittingthemodels1.png" style="border: none; height: 20em;" alt="icon" /></center>
<p><br /></p>
<p>Also note that the same procedure can be used to fit a linear or logistic regressor, because the logistic equation can be rearranged to become a linear one.</p>
<center>
<table style="width:80%">
<tr>
<td width="50%"><b><center>Linear Function</center></b></td>
<td width="50%"><b><center>Logistic Function</center></b></td>
</tr>
<tr>
<td><center>$\beta_0 + \beta_1 x_1 + \ldots + \beta_n x_n=y$</center></td>
<td><center>$\begin{align*}
\dfrac{1}{1+e^{\beta_0 + \beta_1 x_1 + \ldots + \beta_n x_n}}&=y \\
\beta_0 + \beta_1 x_1 + \ldots + \beta_n x_n &= \ln\left(\dfrac{1}{y}-1\right)
\end{align*}$</center></td>
</tr>
</table>
</center>
<p><br /></p>
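<p>The rearrangement of the logistic equation can be spelled out step by step. Writing $z = \beta_0 + \beta_1 x_1 + \ldots + \beta_n x_n$ as shorthand (the symbol $z$ is introduced here only for convenience), we have</p>
<center>
$\begin{align*}
y &= \dfrac{1}{1+e^{z}} \\
\dfrac{1}{y} &= 1 + e^{z} \\
e^{z} &= \dfrac{1}{y} - 1 \\
z &= \ln\left(\dfrac{1}{y}-1\right).
\end{align*}$
</center>
<p><br /></p>
<p>Note that the logarithm is only defined when $0 < y < 1,$ so observed values of exactly $0$ or $1$ cannot be transformed this way.</p>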
<p>Let $y^\prime = y$ for the case of a linear regression, and $y^\prime = \ln\left(\dfrac{1}{y}-1\right)$ for the case of a logistic regression. Then, we need to fit the regression to the following dataset:</p>
<center>
$\begin{align*}
\left\{ \begin{matrix} (x_{11}, & x_{12}, & \ldots & x_{1n}, & y_1^\prime) \\ (x_{21}, & x_{22}, & \ldots & x_{2n}, & y_2^\prime) \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ (x_{m1}, & x_{m2}, & \ldots & x_{mn}, & y_m^\prime) \end{matrix} \right\}
\end{align*}$
</center>
<p><br /></p>
<p>So, we need to solve the matrix equation</p>
<center>
$\begin{align*}
\begin{pmatrix} 1 & x_{11} & x_{12} & \ldots & x_{1n} \\ 1 & x_{21} & x_{22} & \ldots & x_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{m1} & x_{m2} & \ldots & x_{mn} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_n \end{pmatrix} &\approx \begin{pmatrix} y_1^\prime \\ y_2^\prime \\ \vdots \\ y_m^\prime \end{pmatrix}.
\end{align*}$
</center>
<p><br /></p>
<p>We can put this in equation form and perform operations to isolate $\vec{\beta}\mathbin{:}$</p>
<center>
$\begin{align*}
\mathbf{X} \vec{\beta} &\approx \vec{y} \\
\mathbf{X}^T \mathbf{X} \vec{\beta} &\approx \mathbf{X}^T \vec{y} \\
\vec{\beta} &\approx \left( \mathbf{X}^T \mathbf{X} \right)^{-1} \mathbf{X}^T \vec{y}
\end{align*}$
</center>
<p><br /></p>
<p>This way of finding $\vec{\beta}$ involves using the pseudoinverse, $\left( \mathbf{X}^T \mathbf{X} \right)^{-1} \mathbf{X}^T.$ A matrix is not invertible unless it is square and we cannot guarantee this for $\mathbf{X},$ so we must take the pseudoinverse. By multiplying $\mathbf{X}$ by its transpose, we can ensure that the result is square, and therefore, we can compute the inverse (provided that the columns of $\mathbf{X}$ are linearly independent). Using the pseudoinverse minimizes the sum of squared error between the desired output $\vec{y}$ and the actual output $\mathbf{X}\vec{\beta}.$</p>
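<p>Here is a short sketch of why this choice of $\vec{\beta}$ minimizes the sum of squared error. The symbol $E$ for the error is introduced here only for this derivation. Writing the error as a function of $\vec{\beta}$ and setting its gradient to zero gives</p>
<center>
$\begin{align*}
E(\vec{\beta}) &= \left( \mathbf{X}\vec{\beta} - \vec{y} \right)^T \left( \mathbf{X}\vec{\beta} - \vec{y} \right) \\
\nabla E(\vec{\beta}) &= 2 \mathbf{X}^T \left( \mathbf{X}\vec{\beta} - \vec{y} \right) = \vec{0} \\
\mathbf{X}^T \mathbf{X} \vec{\beta} &= \mathbf{X}^T \vec{y},
\end{align*}$
</center>
<p><br /></p>
<p>which is the same equation that is solved for $\vec{\beta}$ above.</p>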
<p>For example, let’s fit a logistic regression to a medical data set</p>
<center>
$\begin{align*}
[(0, 0, 0.1), (1, 0, 0.2), (0, 2, 0.5), (4,5,0.6)]
\end{align*}$
</center>
<p><br /></p>
<p>which takes the form</p>
<center>
$\begin{align*}
(\textrm{amount medicine A}, \textrm{amount medicine B}, \textrm{survival probability}).
\end{align*}$
</center>
<p><br /></p>
<p>This data set is for a new treatment, where the first column shows the amount of medicine A and the second column shows the amount of medicine B. The third column records how well patients did when given the medicines in those amounts.</p>
<p>We can fit a logistic regression as follows:</p>
<center>
$\begin{align*}
\mathbf{X} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 2 \\ 1 & 4 & 5 \end{pmatrix} \ \ \& \ \ \vec{y} = \begin{pmatrix} \ln \left( \dfrac{1}{0.1} - 1 \right) \\ \ln \left( \dfrac{1}{0.2} - 1 \right) \\ \ln \left( \dfrac{1}{0.5} - 1 \right) \\ \ln \left( \dfrac{1}{0.6} - 1 \right) \end{pmatrix}
= \begin{pmatrix} \ln \left( 9 \right) \\ \ln \left( 4 \right) \\ \ln \left( 1 \right) \\ \ln \left( \dfrac{2}{3} \right) \end{pmatrix}
\end{align*}$
</center>
<p><br /></p>
<center>
$\begin{align*}
\vec{\beta} &\approx (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \vec{y} \\
\vec{\beta} &\approx \left(\begin{pmatrix} 1 & 1 & 1 & 1\\ 0 & 1 & 0 & 4 \\ 0 & 0 & 2 & 5 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 2 \\ 1 & 4 & 5 \end{pmatrix}\right)^{-1} \begin{pmatrix} 1 & 1 & 1 & 1\\ 0 & 1 & 0 & 4 \\ 0 & 0 & 2 & 5 \end{pmatrix} \begin{pmatrix}\ln \left( 9 \right) \\ \ln \left( 4 \right) \\ \ln \left( 1 \right) \\ \ln \left( \dfrac{2}{3} \right)\end{pmatrix} \\
\vec{\beta} &\approx \begin{pmatrix} 4 & 5 & 7 \\ 5 & 17 & 20 \\ 7 & 20 & 29 \end{pmatrix} ^{-1} \begin{pmatrix} 1 & 1 & 1 & 1\\ 0 & 1 & 0 & 4 \\ 0 & 0 & 2 & 5 \end{pmatrix} \begin{pmatrix}\ln \left( 9 \right) \\ \ln \left( 4 \right) \\ \ln \left( 1 \right) \\ \ln \left( \dfrac{2}{3} \right)\end{pmatrix} \\
\vec{\beta} &\approx \begin{pmatrix} 1.567 \\ 0.278 \\ -0.640 \end{pmatrix}
\end{align*}$
</center>
<p><br /></p>
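<p>The worked example can be double-checked with a short, self-contained Python sketch. This is not the Matrix class described in this post: the helper functions below are hypothetical stand-ins written with plain lists, and instead of forming the inverse of $\mathbf{X}^T \mathbf{X}$ the sketch solves $\mathbf{X}^T \mathbf{X} \vec{\beta} = \mathbf{X}^T \vec{y}$ directly by elimination, which gives the same $\vec{\beta}.$ The last coefficient comes out negative.</p>

```python
import math


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# (amount medicine A, amount medicine B, survival probability)
data = [(0, 0, 0.1), (1, 0, 0.2), (0, 2, 0.5), (4, 5, 0.6)]

X = [[1, a, b] for a, b, _ in data]              # leading 1 pairs with beta_0
y = [[math.log(1 / p - 1)] for _, _, p in data]  # transformed y values

Xt = transpose(X)
XtX = matmul(Xt, X)
Xty = [row[0] for row in matmul(Xt, y)]

beta = solve(XtX, Xty)  # approximately 1.567, 0.278, -0.640
```

<p>Solving the system rather than inverting the matrix is a common design choice because it takes fewer operations, but for a small example like this one either approach works.</p>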
<p>Now we know the logistic $\beta$’s, which are $\beta_0 = 1.567 \ \& \ \beta_1 = 0.278 \ \& \ \beta_2 = -0.640,$ so we plug them into the equation, leaving $x_1 \ \& \ x_2$ as the variables:</p>
<center>
$\begin{align*}
f(x_1,x_2) &=\dfrac{1}{1 + e ^ {\beta_0 + \beta_1 x_1 + \beta_2 x_2}} \\
&=\dfrac{1}{1 + e ^ {1.567 + 0.278 x_1 - 0.640 x_2}}
\end{align*}$
</center>
<p><br /></p>
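<p>To see how the fitted function behaves, the sketch below evaluates it at the four data points. The function name is hypothetical, and the coefficients are the rounded values from the worked example, with the last one negative. Since three coefficients are being fit to four data points, the predictions will not match the observed probabilities exactly.</p>

```python
import math

# Rounded coefficients from the worked example.
beta_0, beta_1, beta_2 = 1.567, 0.278, -0.640


def survival_probability(x_1, x_2):
    """1 / (1 + e^(beta_0 + beta_1*x_1 + beta_2*x_2))"""
    return 1 / (1 + math.exp(beta_0 + beta_1 * x_1 + beta_2 * x_2))


# (amount medicine A, amount medicine B, observed survival probability)
data = [(0, 0, 0.1), (1, 0, 0.2), (0, 2, 0.5), (4, 5, 0.6)]

predictions = [survival_probability(a, b) for a, b, _ in data]
for (a, b, observed), predicted in zip(data, predictions):
    print(f"A={a}, B={b}: observed {observed}, predicted {predicted:.3f}")
```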
<p>Because very little changes from the linear regressor to the logistic regressor, my Python code for the logistic regressor inherits from the linear regressor class and changes only 2 things: it transforms the $\vec{y}$ using $y^\prime = \ln \left( \dfrac{1}{y}-1 \right)$ and puts the $\beta$’s in a sigmoid function rather than a linear function.</p>
<p>First, let’s go through the code for the linear regressor. We start by importing a Matrix class and a DataFrame class that I had written to help process data. Then, we initialize the linear regressor:</p>
<font size="3em">
<! HTML generated using hilite.me ><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;borderwidth:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; lineheight: 125%"><span style="color: #008800; fontweight: bold">from</span> <span style="color: #0e84b5; fontweight: bold">matrix</span> <span style="color: #008800; fontweight: bold">import</span> Matrix
<span style="color: #008800; fontweight: bold">from</span> <span style="color: #0e84b5; fontweight: bold">dataframe</span> <span style="color: #008800; fontweight: bold">import</span> DataFrame
<span style="color: #008800; fontweight: bold">import</span> <span style="color: #0e84b5; fontweight: bold">math</span>
<span style="color: #008800; fontweight: bold">class</span> <span style="color: #BB0066; fontweight: bold">LinearRegressor</span>:
    <span style="color: #008800; fontweight: bold">def</span> <span style="color: #0066BB; fontweight: bold">__init__</span>(<span style="color: #007020">self</span>, dataframe, dependent_variable<span style="color: #333333">=</span><span style="backgroundcolor: #fff0f0">'ratings'</span>):
        <span style="color: #007020">self</span><span style="color: #333333">.</span>dependent_variable <span style="color: #333333">=</span> dependent_variable
        <span style="color: #007020">self</span><span style="color: #333333">.</span>independent_variables <span style="color: #333333">=</span> [column <span style="color: #008800; fontweight: bold">for</span> column <span style="color: #000000; fontweight: bold">in</span> dataframe<span style="color: #333333">.</span>columns <span style="color: #008800; fontweight: bold">if</span> column <span style="color: #333333">!=</span> dependent_variable]
        X_dataframe <span style="color: #333333">=</span> dataframe<span style="color: #333333">.</span>select_columns(<span style="color: #007020">self</span><span style="color: #333333">.</span>independent_variables)
        y_dataframe <span style="color: #333333">=</span> dataframe<span style="color: #333333">.</span>select_columns([<span style="color: #007020">self</span><span style="color: #333333">.</span>dependent_variable])
        <span style="color: #007020">self</span><span style="color: #333333">.</span>X <span style="color: #333333">=</span> Matrix(X_dataframe<span style="color: #333333">.</span>to_array())
        <span style="color: #007020">self</span><span style="color: #333333">.</span>y <span style="color: #333333">=</span> Matrix(y_dataframe<span style="color: #333333">.</span>to_array())
        <span style="color: #007020">self</span><span style="color: #333333">.</span>coefficients <span style="color: #333333">=</span> {}
</pre></div>
</font>
<p><br /></p>
<p>The way we would solve to get the $\vec{\beta}$’s is as follows:</p>
<font size="3em">
<! HTML generated using hilite.me ><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;borderwidth:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; lineheight: 125%">    <span style="color: #008800; fontweight: bold">def</span> <span style="color: #0066BB; fontweight: bold">solve_coefficients</span>(<span style="color: #007020">self</span>):
        beta <span style="color: #333333">=</span> (((<span style="color: #007020">self</span><span style="color: #333333">.</span>X<span style="color: #333333">.</span>transpose() <span style="color: #FF0000; backgroundcolor: #FFAAAA">@</span> <span style="color: #007020">self</span><span style="color: #333333">.</span>X)<span style="color: #333333">.</span>inverse()) <span style="color: #FF0000; backgroundcolor: #FFAAAA">@</span> <span style="color: #007020">self</span><span style="color: #333333">.</span>X<span style="color: #333333">.</span>transpose()) <span style="color: #FF0000; backgroundcolor: #FFAAAA">@</span> <span style="color: #007020">self</span><span style="color: #333333">.</span>y
        <span style="color: #007020">self</span><span style="color: #333333">.</span>set_coefficients(beta)
    <span style="color: #008800; fontweight: bold">def</span> <span style="color: #0066BB; fontweight: bold">set_coefficients</span>(<span style="color: #007020">self</span>, beta):
        <span style="color: #008800; fontweight: bold">for</span> i, column_name <span style="color: #000000; fontweight: bold">in</span> <span style="color: #007020">enumerate</span>(<span style="color: #007020">self</span><span style="color: #333333">.</span>independent_variables):
            <span style="color: #007020">self</span><span style="color: #333333">.</span>coefficients[column_name] <span style="color: #333333">=</span> beta[i]
</pre></div>
</font>
<p><br /></p>
<p>In order to find the actual prediction that the regression makes with the $\beta$’s, we need to plug the $\beta$’s into the regression function. For the linear regressor, this is just a linear function $f(x_1,\ldots, x_n)=\beta_0 + \beta_1 x_1 + \ldots + \beta_n x_n.$</p>
<font size="3em">
<! HTML generated using hilite.me ><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;borderwidth:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; lineheight: 125%">    <span style="color: #008800; fontweight: bold">def</span> <span style="color: #0066BB; fontweight: bold">predict</span>(<span style="color: #007020">self</span>, input_dict):
        <span style="color: #008800; fontweight: bold">return</span> <span style="color: #007020">self</span><span style="color: #333333">.</span>regression_function(input_dict)
    <span style="color: #008800; fontweight: bold">def</span> <span style="color: #0066BB; fontweight: bold">regression_function</span>(<span style="color: #007020">self</span>, input_dict):
        <span style="color: #008800; fontweight: bold">return</span> <span style="color: #007020">sum</span>([input_dict[key] <span style="color: #333333">*</span> <span style="color: #007020">self</span><span style="color: #333333">.</span>coefficients[key] <span style="color: #008800; fontweight: bold">for</span> key <span style="color: #000000; fontweight: bold">in</span> input_dict])
</pre></div>
</font>
<p><br /></p>
<p>For the logistic regression, it’s the same process but we need to transform the $y$ values:</p>
<font size="3em">
<! HTML generated using hilite.me ><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;borderwidth:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; lineheight: 125%"><span style="color: #008800; fontweight: bold">class</span> <span style="color: #BB0066; fontweight: bold">LogisticRegressor</span>(LinearRegressor):
    <span style="color: #008800; fontweight: bold">def</span> <span style="color: #0066BB; fontweight: bold">__init__</span>(<span style="color: #007020">self</span>, dataframe, dependent_variable<span style="color: #333333">=</span><span style="backgroundcolor: #fff0f0">'ratings'</span>):
        <span style="color: #007020">super</span>()<span style="color: #333333">.</span>__init__(dataframe, dependent_variable<span style="color: #333333">=</span>dependent_variable)
        <span style="color: #007020">self</span><span style="color: #333333">.</span>y <span style="color: #333333">=</span> <span style="color: #007020">self</span><span style="color: #333333">.</span>y<span style="color: #333333">.</span>apply(<span style="color: #008800; fontweight: bold">lambda</span> y: math<span style="color: #333333">.</span>log(<span style="color: #0000DD; fontweight: bold">1</span><span style="color: #333333">/</span>y <span style="color: #333333">-</span> <span style="color: #0000DD; fontweight: bold">1</span>))
</pre></div>
</font>
<p><br /></p>
<p>And we use a different regression function:</p>
<center>
$\begin{align*}
f(x_1,\ldots, x_n)=\dfrac{1}{1+e^{\beta_0 + \beta_1 x_1 + \ldots + \beta_n x_n}}
\end{align*}$
</center>
<p><br /></p>
<font size="3em">
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%">    <span style="color: #008800; font-weight: bold">def</span> <span style="color: #0066BB; font-weight: bold">regression_function</span>(<span style="color: #007020">self</span>, input_dict):
        linear_sum <span style="color: #333333">=</span> <span style="color: #007020">sum</span>([input_dict[key] <span style="color: #333333">*</span> <span style="color: #007020">self</span><span style="color: #333333">.</span>coefficients[key] <span style="color: #008800; font-weight: bold">for</span> key <span style="color: #000000; font-weight: bold">in</span> input_dict])
        <span style="color: #008800; font-weight: bold">return</span> <span style="color: #0000DD; font-weight: bold">1</span> <span style="color: #333333">/</span> (<span style="color: #0000DD; font-weight: bold">1</span> <span style="color: #333333">+</span> math<span style="color: #333333">.</span>e <span style="color: #333333">**</span> linear_sum)
</pre></div>
</font>
<p><br /></p>
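<p>As a quick check on why this transformation pairs with this regression function, the sketch below confirms that $\ln(1/y - 1)$ undoes $1/(1+e^z).$ The function names <code>transform_y</code> and <code>logistic</code> are made up for illustration and are not part of the regressor classes above. The transformation only makes sense when $y$ is strictly between $0$ and $1.$</p>

```python
import math

def transform_y(y):
    # Map a value strictly between 0 and 1 onto the linear scale: z = ln(1/y - 1)
    return math.log(1 / y - 1)

def logistic(z):
    # The logistic regression function used above: 1 / (1 + e^z)
    return 1 / (1 + math.exp(z))

ys = [0.1, 0.5, 0.9]
# Transforming and then applying the regression function should return the original values
round_trip = [logistic(transform_y(y)) for y in ys]
print(round_trip)
```

<p>Because the two functions are inverses, fitting a line to the transformed $y$ values and then passing the linear sum through the regression function brings predictions back to the original scale.</p>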
<p><i>This post is part 2 of a 3-part series. <a class="body" target="_blank" href="https://eurisko.us/linearandlogisticregressionpart3categoricalvariablesinteractiontermsandnonlineartransformationsofvariables/">Click here to continue to part 3.</a></i></p>Colby RobertsNote: This post is part 2 of a 3-part series: part 1, part 2, part 3.Linear and Logistic Regression, Part 3: Categorical Variables, Interaction Terms, and Nonlinear Transformations of Variables20201221T00:00:0008:0020201221T00:00:0008:00https://eurisko.us/linearandlogisticregressionpart3categoricalvariablesinteractiontermsandnonlineartransformationsofvariables<p><i>Note: This post is part 3 of a 3-part series: <a class="body" target="_blank" href="https://eurisko.us/linearandlogisticregressionpart1understandingthemodels/">part 1</a>, <a class="body" target="_blank" href="https://eurisko.us/linearandlogisticregressionpart2fittingthemodels/">part 2</a>, <a class="body" target="_blank" href="https://eurisko.us/linearandlogisticregressionpart3categoricalvariablesinteractiontermsandnonlineartransformationsofvariables/">part 3</a>.</i></p>
<p>This blog post will explore categorical variables, interaction terms, and nonlinear transformations of variables. All of these topics revolve around linear regression, which is a way of solving for the coefficients of a linear function that best fits a set of data points. If you want to understand how linear regression works, look at <a class="body" href="https://eurisko.us/linearandlogisticregressionpart1understandingthemodels/" target="_blank">Part 1</a> and <a class="body" href="https://eurisko.us/linearandlogisticregressionpart2fittingthemodels/" target="_blank">Part 2</a>.</p>
<h2>Using Linear and Logistic Regression with Categorical Variables</h2>
<p>For this blog post, you have to understand the general idea of fitting a line to a data set. However, keep an open mind about what the collection of data can be. It is not always as simple as $(x,y)$ coordinates. For most of this post, we will use the example of predicting the rating of a sandwich that can have some number of slices of beef, some number of tablespoons of peanut butter, and some condiments (mayo and jelly).</p>
<p>“Condiments” is a categorical variable because it takes non-numeric values. Each entry is a list of condiments that may include mayo, jelly, both, or neither. In other words, the condiments variable is “categorized” as mayo, jelly, both, or none. An example set of data points is shown below. Each row corresponds to a different sandwich.</p>
<center>
<table style="width:80%">
<tr>
<td width="25%"><b><center>beef</center></b></td>
<td width="25%"><b><center>pb</center></b></td>
<td width="25%"><b><center>condiments</center></b></td>
<td width="25%"><b><center>rating</center></b></td>
</tr>
<tr>
<td><center>$0$</center></td>
<td><center>$0$</center></td>
<td><center>[]</center></td>
<td><center>$1$</center></td>
</tr>
<tr>
<td><center>$0$</center></td>
<td><center>$0$</center></td>
<td><center>['mayo']</center></td>
<td><center>$1$</center></td>
</tr>
<tr>
<td><center>$0$</center></td>
<td><center>$0$</center></td>
<td><center>['jelly']</center></td>
<td><center>$4$</center></td>
</tr>
<tr>
<td><center>$0$</center></td>
<td><center>$0$</center></td>
<td><center>['mayo', 'jelly']</center></td>
<td><center>$0$</center></td>
</tr>
<tr>
<td><center>$5$</center></td>
<td><center>$0$</center></td>
<td><center>[]</center></td>
<td><center>$4$</center></td>
</tr>
<tr>
<td><center>$5$</center></td>
<td><center>$0$</center></td>
<td><center>['mayo']</center></td>
<td><center>$8$</center></td>
</tr>
<tr>
<td><center>$5$</center></td>
<td><center>$0$</center></td>
<td><center>['jelly']</center></td>
<td><center>$1$</center></td>
</tr>
<tr>
<td><center>$5$</center></td>
<td><center>$0$</center></td>
<td><center>['mayo', 'jelly']</center></td>
<td><center>$0$</center></td>
</tr>
<tr>
<td><center>$0$</center></td>
<td><center>$5$</center></td>
<td><center>[]</center></td>
<td><center>$5$</center></td>
</tr>
<tr>
<td><center>$0$</center></td>
<td><center>$5$</center></td>
<td><center>['mayo']</center></td>
<td><center>$0$</center></td>
</tr>
<tr>
<td><center>$0$</center></td>
<td><center>$5$</center></td>
<td><center>['jelly']</center></td>
<td><center>$9$</center></td>
</tr>
<tr>
<td><center>$0$</center></td>
<td><center>$5$</center></td>
<td><center>['mayo', 'jelly']</center></td>
<td><center>$0$</center></td>
</tr>
<tr>
<td><center>$5$</center></td>
<td><center>$5$</center></td>
<td><center>[]</center></td>
<td><center>$0$</center></td>
</tr>
<tr>
<td><center>$5$</center></td>
<td><center>$5$</center></td>
<td><center>['mayo']</center></td>
<td><center>$0$</center></td>
</tr>
<tr>
<td><center>$5$</center></td>
<td><center>$5$</center></td>
<td><center>['jelly']</center></td>
<td><center>$0$</center></td>
</tr>
<tr>
<td><center>$5$</center></td>
<td><center>$5$</center></td>
<td><center>['mayo', 'jelly']</center></td>
<td><center>$0$</center></td>
</tr>
</table>
</center>
<p><br /></p>
<p>As you can see here, we have numeric values for beef and peanut butter because the number of slices of beef and tablespoons of peanut butter can vary. But we don’t have a number to plug in for the condiments variable: a sandwich either has mayo or it doesn’t, and it either has jelly or it doesn’t. That is why the condiments are represented by name rather than by a number.</p>
<p>The way to handle a categorical variable like this is to treat each category as true or false, which can be represented numerically as a $1$ or a $0.$ So we break down the condiments variable into a mayo variable and a jelly variable, each of which is a $1$ if that condiment appears on the sandwich or a $0$ if it doesn’t.</p>
<center>
<table style="width:80%">
<tr>
<td width="20%"><b><center>beef</center></b></td>
<td width="20%"><b><center>pb</center></b></td>
<td width="20%"><b><center>mayo</center></b></td>
<td width="20%"><b><center>jelly</center></b></td>
<td width="20%"><b><center>rating</center></b></td>
</tr>
<tr>
<td><center>$0$</center></td>
<td><center>$0$</center></td>
<td><center>$0$</center></td>
<td><center>$0$</center></td>
<td><center>$1$</center></td>
</tr>
<tr>
<td><center>$0$</center></td>
<td><center>$0$</center></td>
<td><center>$1$</center></td>
<td><center>$0$</center></td>
<td><center>$1$</center></td>
</tr>
<tr>
<td><center>$0$</center></td>
<td><center>$0$</center></td>
<td><center>$0$</center></td>
<td><center>$1$</center></td>
<td><center>$4$</center></td>
</tr>
<tr>
<td><center>$0$</center></td>
<td><center>$0$</center></td>
<td><center>$1$</center></td>
<td><center>$1$</center></td>
<td><center>$0$</center></td>
</tr>
<tr>
<td><center>$5$</center></td>
<td><center>$0$</center></td>
<td><center>$0$</center></td>
<td><center>$0$</center></td>
<td><center>$4$</center></td>
</tr>
<tr>
<td><center>$5$</center></td>
<td><center>$0$</center></td>
<td><center>$1$</center></td>
<td><center>$0$</center></td>
<td><center>$8$</center></td>
</tr>
<tr>
<td><center>$5$</center></td>
<td><center>$0$</center></td>
<td><center>$0$</center></td>
<td><center>$1$</center></td>
<td><center>$1$</center></td>
</tr>
<tr>
<td><center>$5$</center></td>
<td><center>$0$</center></td>
<td><center>$1$</center></td>
<td><center>$1$</center></td>
<td><center>$0$</center></td>
</tr>
<tr>
<td><center>$0$</center></td>
<td><center>$5$</center></td>
<td><center>$0$</center></td>
<td><center>$0$</center></td>
<td><center>$5$</center></td>
</tr>
<tr>
<td><center>$0$</center></td>
<td><center>$5$</center></td>
<td><center>$1$</center></td>
<td><center>$0$</center></td>
<td><center>$0$</center></td>
</tr>
<tr>
<td><center>$0$</center></td>
<td><center>$5$</center></td>
<td><center>$0$</center></td>
<td><center>$1$</center></td>
<td><center>$9$</center></td>
</tr>
<tr>
<td><center>$0$</center></td>
<td><center>$5$</center></td>
<td><center>$1$</center></td>
<td><center>$1$</center></td>
<td><center>$0$</center></td>
</tr>
<tr>
<td><center>$5$</center></td>
<td><center>$5$</center></td>
<td><center>$0$</center></td>
<td><center>$0$</center></td>
<td><center>$0$</center></td>
</tr>
<tr>
<td><center>$5$</center></td>
<td><center>$5$</center></td>
<td><center>$1$</center></td>
<td><center>$0$</center></td>
<td><center>$0$</center></td>
</tr>
<tr>
<td><center>$5$</center></td>
<td><center>$5$</center></td>
<td><center>$0$</center></td>
<td><center>$1$</center></td>
<td><center>$0$</center></td>
</tr>
<tr>
<td><center>$5$</center></td>
<td><center>$5$</center></td>
<td><center>$1$</center></td>
<td><center>$1$</center></td>
<td><center>$0$</center></td>
</tr>
</table>
</center>
<p><br /></p>
<p>Now, our data consists of all numeric values, and we can plug it into a regressor.</p>
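<p>The conversion from the first table to the second can be sketched in a few lines of Python. This is only an illustration: the helper name <code>expand_condiments</code> and the list-of-dictionaries layout are assumptions, and only four rows of the table are included to keep it short.</p>

```python
# Four rows taken from the table above, stored as dictionaries
sandwiches = [
    {'beef': 0, 'pb': 0, 'condiments': [], 'rating': 1},
    {'beef': 5, 'pb': 0, 'condiments': ['mayo'], 'rating': 8},
    {'beef': 0, 'pb': 5, 'condiments': ['jelly'], 'rating': 9},
    {'beef': 5, 'pb': 5, 'condiments': ['mayo', 'jelly'], 'rating': 0},
]

def expand_condiments(row, categories=('mayo', 'jelly')):
    # Copy every column except the categorical one
    expanded = {key: value for key, value in row.items() if key != 'condiments'}
    # Add one 0/1 column per condiment: 1 if it is on the sandwich, 0 otherwise
    for name in categories:
        expanded[name] = 1 if name in row['condiments'] else 0
    return expanded

numeric_rows = [expand_condiments(row) for row in sandwiches]
for row in numeric_rows:
    print(row)
```

<p>Each resulting row contains only numbers, so it can be passed to a regressor like any other numeric data.</p>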
<h2>Interactions Between Variables</h2>
<p>Interaction terms are additional terms added to regression models to account for when two variables together have an effect that is different from each of them in isolation.</p>
<p>An example of this is when you are trying to fit a model that predicts the rating of a sandwich with beef or peanut butter. Without interaction terms, our model would be</p>
<center>
$\begin{align*}
y = \beta_0+\beta_1(\textrm{beef})+\beta_2(\textrm{pb})
\end{align*}$
</center>
<p><br /></p>
<p>where $\beta_1$ and $\beta_2$ are coefficients that our regression algorithm calculates.</p>
<p>Without interaction, the model will say “Well, beef is good, so I’ll give that a positive rating, and peanut butter is good, so I’ll give that a positive rating too.” Then we get to a sandwich with beef and peanut butter, and because there is no interaction, it says “Wow, beef is good, and peanut butter is good, so that sandwich will be the best sandwich ever,” which we both know is not true.</p>
<p>When we have a sandwich with peanut butter and beef, we need an interaction term that will recognize that because both peanut butter and beef are present, it is a bad sandwich. The way we do this is by adding a term to our model:</p>
<center>
$\begin{align*}
y = \beta_0+\beta_1(\textrm{beef})+\beta_2(\textrm{pb})+\beta_3(\textrm{beef})(\textrm{pb})
\end{align*}$
</center>
<p><br /></p>
<p>When either beef or peanut butter is $0,$ the interaction term will be $0$ and will not influence the rating. However, if we have both beef and peanut butter, the interaction term is nonzero, and a negative $\beta_3$ will be able to bring down the rating by a lot.</p>
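<p>To see the effect on real numbers, the sketch below fits both models to the four sandwiches from the table that have no condiments. The helpers <code>solve</code> and <code>fit</code> are a minimal, assumed least-squares implementation (normal equations plus Gaussian elimination) written for this example, not the regressor from the earlier parts.</p>

```python
def solve(A, b):
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [A | b]
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit(X, y):
    # Least squares via the normal equations (X^T X) beta = X^T y
    cols = list(zip(*X))
    XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    Xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    return solve(XtX, Xty)

# (beef, pb, rating) for the sandwiches with no condiments
data = [(0, 0, 1), (5, 0, 4), (0, 5, 5), (5, 5, 0)]
y = [rating for _, _, rating in data]

# Columns: constant, beef, pb
beta_plain = fit([[1, beef, pb] for beef, pb, _ in data], y)
# Columns: constant, beef, pb, beef * pb
beta_inter = fit([[1, beef, pb, beef * pb] for beef, pb, _ in data], y)

def predict(beta, row):
    return sum(b * x for b, x in zip(beta, row))

print(predict(beta_plain, [1, 5, 5]))      # model without the interaction term
print(predict(beta_inter, [1, 5, 5, 25]))  # model with the interaction term
```

<p>On these four sandwiches, the model without the interaction term predicts a rating of about $2$ for the beef and peanut butter sandwich, while the model with it reproduces the actual rating of $0$ because $\beta_3$ comes out negative (about $-0.32$).</p>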
<h2>Fitting Non-Linear Data with a Linear Regression</h2>
<p>An exciting property of linear regressions is that many nonlinear models (such as polynomials) can be fit using linear regression. The trick is to use linear regression to solve for the coefficients of <i>functions</i> of $x,$ as seen here:</p>
<center>
$\begin{align*}
y =\beta_0 + \beta_1f_1(x_1) + \beta_2f_2(x_2)+ \ldots +\beta_nf_n(x_n)
\end{align*}$
</center>
<p><br /></p>
<p>These $f_i(x)$ can be any functions of $x,$ such as $x^2,x^3, \ldots, x^n,$ which is how we fit polynomials. We can even fit more complex functions like</p>
<center>
$\begin{align*}
y = \beta_1\sin(x) + \beta_2\ln(x) + \beta_3\sqrt{x}.
\end{align*}$
</center>
<p><br /></p>
<p>The only constraint is that each term of the function must be of the form $\beta f(x)$. So, we could not regress $y =x^a$ because that would require fitting an exponent, not a coefficient. Linear regression can only solve for the coefficients of a model.</p>
<p>The way we fit a regression of the form</p>
<center>
$\begin{align*}
y =\beta_0+\beta_1f_1(x_1) + \beta_2f_2(x_2)+...+\beta_nf_n(x_n)
\end{align*}$
</center>
<p><br /></p>
<p>is to transform the data using the functions in the model, like so:</p>
<center>
$\begin{align*}
(x_1,x_2,\ldots,x_n,y) \rightarrow (f_1(x_1), f_2(x_2), \ldots , f_n(x_n), y)
\end{align*}$
</center>
<p><br /></p>
<p>The model is a linear function of the transformed data points. So by fitting the altered data to the model, we are solving a linear regression. And when we plug the coefficients back into the model, we will get a function that is the best fit for the original data.</p>
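<p>As a small illustration of this idea, the sketch below fits a quadratic by transforming each $x$ into the columns $(1, x, x^2)$ and running an ordinary least-squares fit on those columns. The data points are made up (they lie exactly on $y = 1 + 2x + 3x^2$), and <code>solve</code> and <code>fit</code> are assumed helper functions written for this example.</p>

```python
def solve(A, b):
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [A | b]
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit(X, y):
    # Least squares via the normal equations (X^T X) beta = X^T y
    cols = list(zip(*X))
    XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    Xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    return solve(XtX, Xty)

# Made-up points that lie on y = 1 + 2x + 3x^2
xs = [0, 1, 2, 3]
ys = [1, 6, 17, 34]

# Transform the data: each x becomes the row (1, f_1(x), f_2(x)) = (1, x, x^2)
transformed = [[1, x, x ** 2] for x in xs]

# The model is linear in the transformed columns, so a linear fit finds the coefficients
beta = fit(transformed, ys)
print(beta)
```

<p>The fitted coefficients come back close to $1,$ $2,$ and $3,$ even though the relationship between $x$ and $y$ is not a straight line.</p>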
<p>An important consequence of this is that we can fit a logistic regression using the linear regression algorithm. This is possible because the format of the logistic regression is</p>
<center>
$\begin{align*}
y = \dfrac{M}{1+e^{\beta_1x_1+\beta_2x_2+...+\beta_nx_n}}
\end{align*}$
</center>
<p><br /></p>
<p>which can be rearranged to</p>
<center>
$\begin{align*}
\beta_1x_1+\beta_2x_2+...+\beta_nx_n = \ln \left( \dfrac{M}{y} - 1 \right)
\end{align*}$
</center>
<p><br /></p>
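<p>which fits in our category of $\beta \cdot f(x)$: the left side is linear in the coefficients, so we can fit it with linear regression after transforming each $y$ to $\ln \left( \dfrac{M}{y} - 1 \right).$</p>Riley PaddockNote: This post is part 3 of a 3-part series: part 1, part 2, part 3.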