<h1 id="deploying-and-debugging-remotely-with-intellij">Deploying and Debugging Remotely with Intellij</h1>
<p>Benjamin Sepanski (ben.sepanski@veridise.com), SSE at Veridise. Published 2021-10-13 at <a href="https://bensepanski.github.io/posts/2021/10/13/Deploying-and-Debugging-Remotely-with-Intellij">https://bensepanski.github.io/posts/2021/10/13/Deploying-and-Debugging-Remotely-with-Intellij</a>.</p>
<p>In this blog post I will show you how to sync an <a href="https://www.jetbrains.com/idea/">Intellij IDEA</a> project with a remote server (called <em>deploying</em> the project to a server), and how to debug remote runs of that project.</p>
<p><strong><em>EDIT Jan 3, 2022</em></strong><em>: Note that this process is only available for the Intellij IDEA Ultimate edition, not the Community edition.</em></p>
<h2 id="related-guides">Related Guides</h2>
<p>Intellij has its own guides on these topics. Check them out here:</p>
<ul>
<li>Intellij <a href="https://www.jetbrains.com/help/idea/tutorial-deployment-in-product.html#before">Deployment Guide</a></li>
<li>Intellij <a href="https://www.jetbrains.com/help/idea/tutorial-remote-debug.html">Remote Debug Guide</a></li>
</ul>
<h2 id="introduction">Introduction</h2>
<p><a href="https://www.jetbrains.com/idea/">Intellij IDEA</a> is an incredibly powerful IDE. If you’re anything like me, it’s become an essential component of any Java program you write. However, compiling and running large applications on my laptop gets frustratingly slow. Since I have access to bigger and better machines, I want to compile and run on those remote servers. However, I need several things before this actually improves my workflow:</p>
<ol>
<li>Keeping the local and remote in sync should be easy <em>without</em> using git</li>
<li>Debugging applications running on the remote should be push-button</li>
<li>All file-editing should be performed locally (I want the power of Intellij without having to set up X-forwarding, etc.)</li>
</ol>
<p>(1) and (3) can be achieved using <a href="#deployment"><em>deployment</em></a>: setting up a remote clone of a project that Intellij syncs in the background. (2) can be achieved using <a href="#remote-debug">remote debug</a>.
In the rest of this post, I’ll show you how to set this up using an example project.</p>
<h2 id="setup">Setup</h2>
<p>I’m running Intellij 2021.2.1. Any recent version of Intellij should work. You’ll need to first set up SSH for your remote server, which I’ll call <code class="language-plaintext highlighter-rouge">remote</code>. For instance, you should be able to successfully run</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh remoteUserName@remote
</code></pre></div></div>
<p>I’ll assume that your SSH key is located in <code class="language-plaintext highlighter-rouge">~/.ssh/id_rsa</code>. Password-based authentication is a similar process.</p>
<p>For this example, we’ll start by making a new Java project using the <a href="https://maven.apache.org/">Maven</a> build system.</p>
<p><img src="/files/posts/2021-10-13-Deploying-and-Debugging-Remotely-with-Intellij/01-Intellij-Create-Project.png" alt="" /></p>
<p><img src="/files/posts/2021-10-13-Deploying-and-Debugging-Remotely-with-Intellij/02-Maven.png" alt="" /></p>
<p>Now we’re ready to set up for deployment!</p>
<h2 id="deployment">Deployment</h2>
<p>First, open the deployment configuration by going to <code class="language-plaintext highlighter-rouge">Tools</code> <code class="language-plaintext highlighter-rouge">></code> <code class="language-plaintext highlighter-rouge">Deployment</code> <code class="language-plaintext highlighter-rouge">></code> <code class="language-plaintext highlighter-rouge">Configuration</code></p>
<p><img src="/files/posts/2021-10-13-Deploying-and-Debugging-Remotely-with-Intellij/03-Configuration-Menu.png" alt="" /></p>
<p>Click <code class="language-plaintext highlighter-rouge">+</code>, and add an SFTP server. Choose whatever name you want; it doesn’t matter.</p>
<p><img src="/files/posts/2021-10-13-Deploying-and-Debugging-Remotely-with-Intellij/04-Add-SFTP.png" alt="" /></p>
<p>If you already have an SSH configuration set up in Intellij for your desired server, go ahead and select it. Otherwise, let’s set one up!
Click the three dots next to SSH configuration to get started:</p>
<p><img src="/files/posts/2021-10-13-Deploying-and-Debugging-Remotely-with-Intellij/05-Three-Dots.png" alt="" /></p>
<p>Enter the host, your remote username, and select your authentication type. I’m going to assume you’re using a password-protected private key in <code class="language-plaintext highlighter-rouge">~/.ssh/id_rsa</code>. Only change the port from 22 (or set the local port) if you know what you’re doing!</p>
<p>Once you’re done, press “Test Connection” to make sure it works.</p>
<p><img src="/files/posts/2021-10-13-Deploying-and-Debugging-Remotely-with-Intellij/06-SSH-Config.png" alt="" /></p>
<p>You can set the “root” directory if you wish. This sets what Intellij perceives as the root
directory of the remote server (not the root directory of your remote project, we’ll set that later).
If you do set the root, just remember that the file mappings are relative to the root you set.</p>
<p>Once you’re done, press OK and make sure your remote is in bold on the left menu. If it is not, select it and press the check mark to make it the default configuration.</p>
<p><img src="/files/posts/2021-10-13-Deploying-and-Debugging-Remotely-with-Intellij/07-Set-Default.png" alt="" /></p>
<p>Finally, we need to set up the file mappings.
On your remote, pick a path where you want the remote copy of your project to be stored. I’m going to use <code class="language-plaintext highlighter-rouge">~/intellijRemotes/&lt;projectName&gt;</code>.
Create that directory.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>myRemoteUserName@remote> <span class="nb">mkdir</span> ~/intellijRemotes/IntellijRemoteExample
</code></pre></div></div>
<p>Click on “Mappings”, and enter the path to the deployed project on your remote as the deployment path.</p>
<p><img src="/files/posts/2021-10-13-Deploying-and-Debugging-Remotely-with-Intellij/08-Mapping.png" alt="" /></p>
<p>Press OK, and now you’re good to go!
What exactly does that mean?</p>
<ul>
<li>Any file you save locally will be automatically uploaded to your remote.</li>
<li>You can upload, download, or sync any file or directory in your project by
<ol>
<li>Right-clicking the file or directory</li>
<li>Clicking “Deployment”</li>
<li>Selecting either upload, download, or sync</li>
</ol>
</li>
</ul>
<p>Look over some options by going to <code class="language-plaintext highlighter-rouge">Tools</code> <code class="language-plaintext highlighter-rouge">></code> <code class="language-plaintext highlighter-rouge">Deployment</code> <code class="language-plaintext highlighter-rouge">></code> <code class="language-plaintext highlighter-rouge">Options</code></p>
<p><img src="/files/posts/2021-10-13-Deploying-and-Debugging-Remotely-with-Intellij/09-Options.png" alt="" /></p>
<ul class="task-list">
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Delete target items when source ones do not exist
<ul>
<li>This is useful to avoid confusing errors where you’ve deleted a file locally, but Intellij does not delete the remote copy. I’d recommend setting this as long as you’re not using the remote to back up files.</li>
</ul>
</li>
<li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" checked="checked" />Create empty directories
<ul>
<li>Have Intellij create empty directories when you upload. This is helpful if you’re outputting things to an empty directory and want it to be created on the remote when you create it locally.</li>
</ul>
</li>
</ul>
<p>You can also exclude items by names/patterns in this menu. To exclude specific paths for a specific remote, click <code class="language-plaintext highlighter-rouge">Tools</code> <code class="language-plaintext highlighter-rouge">></code> <code class="language-plaintext highlighter-rouge">Deployment</code> <code class="language-plaintext highlighter-rouge">></code> <code class="language-plaintext highlighter-rouge">Configuration</code> and select “Excluded Paths”.</p>
<p>Note that you can create multiple remotes by repeating this process! Intellij only automatically uploads changes to the default. All other uploads, downloads, and syncs have to be manual.</p>
<h2 id="remote-debug">Remote Debug</h2>
<p>Intellij’s debugger is one of its most powerful features. There’s no reason you should lose that just because you want to run your program remotely.</p>
<p>First, we’re going to build a <a href="https://www.jetbrains.com/help/idea/run-debug-configuration.html">configuration</a> that will help us connect to our remote application. Start by clicking “Add configuration.”</p>
<p><img src="/files/posts/2021-10-13-Deploying-and-Debugging-Remotely-with-Intellij/10-Configuration.png" alt="" /></p>
<p>Click the <code class="language-plaintext highlighter-rouge">+</code> on the top left, and select “Remote JVM Debug”.</p>
<p><img src="/files/posts/2021-10-13-Deploying-and-Debugging-Remotely-with-Intellij/11-Remote-JVM-Debug.png" alt="" /></p>
<p>Name the configuration whatever you want. Enter the host name and whatever port you want to use to connect. If you have several maven projects/sub-projects, make sure to select the correct module classpath!</p>
<p>I usually use port 8000, but all that matters is that TCP connections can be made from your local IP address to your remote at that port. If you have issues, you can use <a href="https://www.acronis.com/en-us/articles/telnet/">this guide</a> to figure out which ports are open.</p>
<p><img src="/files/posts/2021-10-13-Deploying-and-Debugging-Remotely-with-Intellij/12-Debug-Setup.png" alt="" /></p>
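<p>If you would rather check reachability from a script, here is a minimal Python sketch of a TCP port check. The helper name <code class="language-plaintext highlighter-rouge">port_is_open</code> is made up for this example, and the check only succeeds when something is actually listening on the port.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and failed host name lookups.
        return False
</code></pre></div></div>
<p>With the placeholder names from this post, you would run <code class="language-plaintext highlighter-rouge">port_is_open("remote", 8000)</code> from your local machine. A result of <code class="language-plaintext highlighter-rouge">False</code> can mean either that a firewall blocks the port or that nothing is listening on it yet.</p>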
<p>Next, you’ll want to copy the “Command line arguments for remote JVM.” You’re going to need these arguments later.</p>
<p><img src="/files/posts/2021-10-13-Deploying-and-Debugging-Remotely-with-Intellij/13-JVM-Args.png" alt="" /></p>
<p>Once you’re done, press “OK”.</p>
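<p>Now, SSH into your remote, and run your application using the arguments you copied. On JDK 9 and later, a bare port such as <code class="language-plaintext highlighter-rouge">address=8000</code> only accepts connections from the remote machine itself, so use <code class="language-plaintext highlighter-rouge">address=*:8000</code> there.</p>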
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>myUserName@localhost> ssh myRemoteUserName@remote
myRemoteUserName@remote> java \
    -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000 \
    -jar myApp.jar
</code></pre></div></div>
<p>You should see this as output:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Listening for transport dt_socket at address: 8000
</code></pre></div></div>
<p>Now go back to the local Intellij instance and run the configuration you just created! It should connect and start debugging like normal.</p>
<p>Note that all terminal output from your application will appear on your remote terminal, not on the local Intellij terminal. However, the Debugger tab will work as usual.</p>
<h2 id="common-pitfalls">Common Pitfalls</h2>
<p>I hope this tutorial was helpful for you! Before I let you go, I’d like to warn you of a couple of pitfalls that I commonly ran into when I first started using this setup.</p>
<ol>
<li>Intellij only syncs changes made by saving/deleting files. If you switch git branches, you’ll need to manually sync with the remote.</li>
<li>Sometimes, the Intellij remote debug application continues running after the remote application has stopped. Make sure to manually stop it locally to avoid processes clogging up your remote machine. If necessary, use <code class="language-plaintext highlighter-rouge">htop</code> on the remote to check for and kill these processes.</li>
<li>If you use multiple remote deployments, only one can be the default at a time. You’ll have to manually sync other changes.</li>
<li>If you click <code class="language-plaintext highlighter-rouge">Tools</code> <code class="language-plaintext highlighter-rouge">></code> <code class="language-plaintext highlighter-rouge">Deployment</code> <code class="language-plaintext highlighter-rouge">></code> <code class="language-plaintext highlighter-rouge">Upload to ...</code> (resp. <code class="language-plaintext highlighter-rouge">Download</code>, <code class="language-plaintext highlighter-rouge">Sync</code>) it will only <code class="language-plaintext highlighter-rouge">Upload</code> (resp. <code class="language-plaintext highlighter-rouge">Download</code>, <code class="language-plaintext highlighter-rouge">Sync</code>) the file which is currently open. To <code class="language-plaintext highlighter-rouge">Upload</code> (resp. <code class="language-plaintext highlighter-rouge">Download</code>, <code class="language-plaintext highlighter-rouge">Sync</code>) the entire project, you need to right-click on the directory from the <code class="language-plaintext highlighter-rouge">Project</code> tab.</li>
</ol>
<p>Denali: A Goal-directed Superoptimizer. Benjamin Sepanski (ben.sepanski@veridise.com). Published 2021-03-10 at <a href="https://bensepanski.github.io/posts/2021/03/10/Denali-a-goal-directed-superoptimizer">https://bensepanski.github.io/posts/2021/03/10/Denali-a-goal-directed-superoptimizer</a>.</p>
<p>This blog post was written for <a href="https://www.cs.utexas.edu/~bornholt/">Dr. James Bornholt</a>’s <a href="https://www.cs.utexas.edu/~bornholt/courses/cs395t-21sp/">CS 395T: Systems Verification and Synthesis, Spring 2021</a>. It summarizes the context and contributions of the paper <a href="https://dl.acm.org/doi/10.1145/543552.512566">Denali: A Goal-directed Superoptimizer</a>, written by <a href="https://rjoshi.org/bio/index.html">Dr. Rajeev Joshi</a>, <a href="https://en.wikipedia.org/wiki/Greg_Nelson_(computer_scientist)">Dr. Greg Nelson</a>, and <a href="http://people.csail.mit.edu/randall/">Dr. Keith Randall</a>.</p>
<p>None of the technical ideas discussed in this post are my own; they are summaries/explanations based on the referenced works.</p>
<h1 id="denali-a-goal-directed-superoptimizer">Denali: A Goal-directed Superoptimizer</h1>
<p>Today’s paper is <a href="https://dl.acm.org/doi/10.1145/543552.512566">Denali: A Goal-directed Superoptimizer</a>. At the time of its publication (2002), it was one of the first <strong>superoptimizer</strong>s: a code generator which seeks to find truly optimal code. This is a dramatically different approach from traditional compiler optimizations, and is usually specific to efficiency-critical straight-line kernels written at the assembly level.</p>
<h2 id="background">Background</h2>
<h3 id="what-is-superoptimization">What <em>is</em> Super🦸optimization?</h3>
<p>Plenty of compilers are <em>optimizing compilers</em>. However, in the strictest sense of the word, they don’t really find an <em>optimal</em> translation. They just find one that, according to some heuristics, ought to improve upon a naive translation. Why? Finding optimal translations is, in general, undecidable. Even for simplified, decidable versions of the problem, it is prohibitively time consuming to insert into any mortal programmer’s build-run-debug development cycle.</p>
<p>However, sometimes it is worth the effort to find a <em>truly optimal</em> solution. To disambiguate between these two “optimization” procedures, we use the term <em>superoptimization</em> when we are seeking a “truly optimal” solution. Superoptimization is an offline procedure and typically targets straight-line sequences of machine code inside critical loops.</p>
<p>With a few simplifying assumptions, the shortest straight-line code is the fastest. Consequently, we seek the shortest program.</p>
<h3 id="superoptimization-the-pre-denali-era-beginning-of-time--2002">Super🦸optimization: The Pre-Denali Era (Beginning of Time – 2002)</h3>
<p><a href="https://en.wikipedia.org/wiki/Alexia_Massalin">Alexia Massalin</a> coined the term “<a href="https://en.wikipedia.org/wiki/Superoptimization">superoptimization</a>” in her 1987 paper <a href="https://dl.acm.org/doi/10.1145/36177.36194">Superoptimizer – A look at the Smallest Program</a>. Massalin used a (pruned) exhaustive search to find the shortest implementation of various straight line computations in the <a href="https://en.wikipedia.org/wiki/Motorola_68000#Instruction_set_details">68000 instruction set</a>. For instance, she found the shortest programs to compute the signum function, absolute value, max/min, and others. Her goal was to identify unintuitive idioms in these shortest programs so that performance engineers could use them in practice.</p>
<p>While Massalin’s technique was powerful, it did not scale well (the shortest programs were at most 13 instructions long in Massalin’s paper). Moreover, the output programs were not automatically verified to be equivalent to the input programs. They were instead highly likely to be equivalent, and had to be verified by hand.</p>
<p>Granlund &amp; Kenner <a href="https://dl.acm.org/doi/10.1145/143103.143146">followed up on Massalin’s work</a> in 1992 with the <a href="https://www.gnu.org/software/superopt/">GNU Superoptimizer</a>. They integrated a variation of Massalin’s superoptimizer into GCC to eliminate branching.</p>
<p>Until 2002, research in superoptimizers seemed to stall. Judging by citations during that period, most researchers considered Massalin’s work to fit inside the field of optimizing compilers. These researchers viewed superoptimization as a useful engineering tool, but of little theoretical interest or scalability. Rather, superoptimization was seen as an interesting application of brute-force search. Massalin and the GNU Superoptimizer seemed to become a token citation in the optimizing compiler literature.</p>
<h2 id="the-denali-approach">The Denali Approach</h2>
<h3 id="goal-directed-vs-brute-force">Goal-Directed vs. Brute Force</h3>
<p>Massalin’s superoptimizer relies on brute-force search: enumerate candidate programs until you find the desired program. Given the massive size of any modern instruction set, this does not scale well. However, since we want the shortest program, we have to rely on some kind of brute-force search. Denali’s insight is that Massalin’s search algorithm was enumerating <em>all</em> candidate programs, instead of only enumerating <em>relevant</em> candidate programs.</p>
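<p>To make the brute-force idea concrete, here is a minimal Python sketch. It is not Massalin’s implementation: the one-register instruction set (<code class="language-plaintext highlighter-rouge">double</code>, <code class="language-plaintext highlighter-rouge">inc</code>, <code class="language-plaintext highlighter-rouge">neg</code>) and the function names are invented for illustration. It enumerates programs in order of length and accepts the first one that agrees with the target on a handful of test inputs.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from itertools import product

# Hypothetical one-register instruction set (not the 68000).
INSTRUCTIONS = {
    "double": lambda x: x + x,
    "inc": lambda x: x + 1,
    "neg": lambda x: -x,
}


def run(program, x):
    """Execute a straight-line program (a sequence of instruction names) on input x."""
    for name in program:
        x = INSTRUCTIONS[name](x)
    return x


def shortest_program(target, tests, max_len=6):
    """Return the first shortest program that matches target on every test input."""
    for length in range(max_len + 1):
        # Enumerate every program of this length before trying longer ones.
        for program in product(INSTRUCTIONS, repeat=length):
            if all(run(program, x) == target(x) for x in tests):
                return list(program)
    return None


print(shortest_program(lambda x: 4 * x + 1, [0, 1, 2, -3, 7]))
</code></pre></div></div>
<p>Two things stand out even in this toy. The number of candidates grows exponentially with the program length, and a program that passes the tests is only <em>likely</em> to be correct, since agreement on a few inputs is not a proof of equivalence.</p>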
<p>Denali users specify their desired program as a set of (memory location, expression to evaluate) pairs. For instance, (<em>%rdi</em>, <em>2 * %rdi</em>) is the program which doubles the value of <em>%rdi</em>.</p>
<p>Denali’s algorithm only “enumerates” candidate programs which it can prove are equivalent to the desired program. For efficiency, it stores this enumeration in a compact graph structure called an <a href="#e-graphs">E-Graph</a>, then searches the E-Graph using a SAT solver.</p>
<h3 id="e-graphs">E-Graphs</h3>
<h4 id="what-is-an-e-graph">What is an E-Graph?</h4>
<p>An E-Graph is used to represent expressions. For instance, a literal 4 or a register value <em>%rdi</em> is represented as a node with no children.</p>
<p><img src="/files/posts/2021-03-10-Denali-a-goal-directed-superoptimizer/egraph-rdi-and-4.svg?token=AIEHSZA5TESJ3BC6LBD6FELAGFZ3O" alt="" /></p>
<p>The expression <em>%rdi * 4</em> is represented as a node ‘<em>*</em>’ whose first child represents <em>%rdi</em> and whose second child represents 4.</p>
<p><img src="/files/posts/2021-03-10-Denali-a-goal-directed-superoptimizer/egraph-rdi-times-4.svg?token=AIEHSZACZTJLTY7GVZSQQUDAGF2EM" alt="" /></p>
<p>Bigger expressions are represented just like you would think. For instance, the expression <em>%rdi * 4 + 1</em> would be represented as</p>
<p><img src="/files/posts/2021-03-10-Denali-a-goal-directed-superoptimizer/egraph-rdi-times-4-plus-1.svg?token=AIEHSZH4LPUKAMS5CIODF5TAGF3AO" alt="" /></p>
<p>So far, this just looks like an <a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax Tree</a>. E-Graphs are distinguished from ASTs by the ability to represent <strong>multiple equivalent expressions</strong>. Suppose we wish to add the equivalence 4<span style="color:blue">=</span>2**2 to our E-graph. We do this by adding a special <em><span style="color:blue">equivalence edge</span></em></p>
<p><img src="/files/posts/2021-03-10-Denali-a-goal-directed-superoptimizer/egraph-rdi-times-4-plus-1-with-exp.svg?token=AIEHSZB7XJBWPFALUQVHFJDAGF4SC" alt="" /></p>
<p>Since there is no machine exponentiation instruction, this does not look useful at first. However, now we can add a further <span style="color:blue">equivalence edge</span> based on the fact that <em>%rdi &lt;&lt; 2 <span style="color:blue">=</span> %rdi * 2**2 <span style="color:blue">=</span> %rdi * 4</em>.</p>
<p><img src="/files/posts/2021-03-10-Denali-a-goal-directed-superoptimizer/egraph-rdi-times-4-plus1-with-shift.svg?token=AIEHSZBXN3ND7R2LHKI3VYTAGF6EU" alt="" /></p>
<p>Since E-Graphs represent <strong>A=B</strong> by keeping both <strong>A</strong> and <strong>B</strong> around, they can become quite massive.</p>
<h4 id="how-do-we-build-e-graphs">How do we build E-Graphs?</h4>
<p>We can use proof rules to repeatedly grow the E-Graph and/or add <span style="color:blue">equivalence edge</span>s. If we keep applying our proof rules until our graph stops changing, then we’ve deduced all the ways we can provably compute our expression (relative to our proof rules). For instance, in the previous example we had only three proof rules:</p>
<ol>
<li>4 = 2**2</li>
<li>x * 2**n = x &lt;&lt; n</li>
<li>If a = b, and b = c, then a = c</li>
</ol>
<p>If we add more proof rules, we may be able to deduce faster ways to compute our expression.</p>
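<p>The saturation loop can be sketched in a few lines of Python. This is a toy under stated assumptions, not Denali’s data structure: the class <code class="language-plaintext highlighter-rouge">EGraph</code>, the function <code class="language-plaintext highlighter-rouge">saturate</code>, and the operator name <code class="language-plaintext highlighter-rouge">shl</code> (standing for the left shift) are all made up for this example. Equivalence edges are stored with a union-find structure, so rule 3 (transitivity) comes for free, and rules 1 and 2 are hard-coded.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class EGraph:
    """Toy E-Graph: a node is (op, child class ids...), classes are union-find sets."""

    def __init__(self):
        self.parent = []      # union-find parent pointer, one entry per class id
        self.node_class = {}  # node mapped to the id of the class it was created in

    def find(self, c):
        # Follow parent pointers to the representative of the class containing c.
        while self.parent[c] != c:
            c = self.parent[c]
        return c

    def add(self, op, *children):
        # Add a node (or look it up if it already exists) and return its class.
        node = (op,) + tuple(self.find(c) for c in children)
        if node not in self.node_class:
            self.parent.append(len(self.parent))
            self.node_class[node] = len(self.parent) - 1
        return self.find(self.node_class[node])

    def union(self, a, b):
        # Record an equivalence edge by merging the two classes.
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a

    def nodes_in(self, c):
        c = self.find(c)
        return [n for n, k in self.node_class.items() if self.find(k) == c]


def saturate(g):
    """Apply rules 1 and 2 until no new node or equivalence appears."""
    while True:
        before = (len(g.node_class), sum(g.find(c) == c for c in range(len(g.parent))))
        for node, cls in list(g.node_class.items()):
            if node == (4,):  # rule 1: 4 = 2**2
                two = g.add(2)
                g.union(cls, g.add("**", two, two))
            if node[0] == "*":  # rule 2: x * 2**n = x shl n
                x, y = node[1], node[2]
                for other in g.nodes_in(y):
                    if other[0] == "**" and (2,) in g.nodes_in(other[1]):
                        g.union(cls, g.add("shl", x, other[2]))
        after = (len(g.node_class), sum(g.find(c) == c for c in range(len(g.parent))))
        if after == before:
            return


# Build %rdi * 4 + 1, then grow the graph with the proof rules.
g = EGraph()
rdi = g.add("%rdi")
four = g.add(4)
prod = g.add("*", rdi, four)
root = g.add("+", prod, g.add(1))
saturate(g)
print(g.nodes_in(prod))  # the multiplication node and the shift node share a class
</code></pre></div></div>
<p>A real E-Graph also has to re-canonicalize nodes after classes merge (the congruence closure step). This sketch skips that, which is only safe because the example is so small.</p>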
<h4 id="other-uses-of-e-graphs">Other uses of E-Graphs</h4>
<p>An early variant of E-Graphs is described in the <a href="http://scottmcpeak.com/nelson-verification.pdf">Ph.D. thesis</a> of Greg Nelson, one of the Denali authors. Nelson used them in the automated theorem prover <a href="https://www.hpl.hp.com/techreports/2003/HPL-2003-148.pdf">Simplify</a> for equational reasoning. Since then, search over E-graphs via the <a href="https://people.eecs.berkeley.edu/~necula/Papers/NelsonOppenCong.pdf">congruence closure</a> algorithm has been adopted by many modern SMT solvers for reasoning about equality of uninterpreted functions (it is even taught in <a href="https://www.cs.utexas.edu/~isil/">Dr. Isil Dillig</a>’s <a href="https://www.cs.utexas.edu/~isil/cs389L/lecture11-6up.pdf">CS 389L course</a> here at UT!). For example, the <a href="https://github.com/Z3Prover/z3">Z3 SMT solver</a> <a href="https://github.com/Z3Prover/z3/blob/830f314a3f32ce896fcf93fd40666d4a390fc330/src/ast/euf/euf_enode.h#L30">implements an E-graph</a>, and the <a href="https://github.com/CVC4/CVC4">CVC4 Solver</a> <a href="https://github.com/CVC4/CVC4/blob/e4fd524b02054a3ac9724f184e55a983cb6cb6b9/src/theory/uf/equality_engine.h#L50">implements an incremental congruence closure</a>.</p>
<h3 id="search-over-e-graphs">Search over E-Graphs</h3>
<p>Nodes in an E-Graph that are connected by an <span style="color:blue">equivalence edge</span> represent expressions that are equivalent according to the proof rules. Therefore, we only need to evaluate one of the nodes. Denali can use a SAT solver to figure out the optimal choice of nodes. Their encoding is not too complicated.</p>
<p>The basic idea of the encoding is as follows:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>For each machine instruction node T,
L(i, T) = { 1 T starts executing on cycle i
{ 0 otherwise
</code></pre></div></div>
<p>Then, all we have to do is add constraints so that</p>
<ul>
<li>Exactly one instruction starts executing per cycle.</li>
<li>Each instruction’s arguments are available when the instruction gets executed.</li>
<li>Some node equivalent to the root node gets computed.</li>
</ul>
<p>Now we can find the shortest program encoded in our E-Graph by constraining the SAT solver to look for a program of length 1, then length 2, then length 3, and so on, until we find a solution.</p>
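<p>The “try a budget of 1, then 2, then 3” loop can be sketched in Python. This sketch does not build Denali’s SAT encoding. An exhaustive search over subsets of instruction nodes stands in for the SAT solver, the equivalence classes for <em>%rdi * 4 + 1</em> are written out by hand, and the latencies are made-up numbers. It also assumes instructions run one after another, so a program’s length in cycles is the sum of its latencies.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from itertools import combinations

# Hand-written equivalence classes for %rdi * 4 + 1 (a stand-in for a real E-Graph).
# Each class lists its alternative nodes as (op, child class names...).
CLASSES = {
    "rdi": [("%rdi",)],
    "one": [(1,)],
    "two": [(2,)],
    "four": [(4,), ("**", "two", "two")],
    "prod": [("*", "rdi", "four"), ("shl", "rdi", "two")],
    "root": [("+", "prod", "one")],
}
# Hypothetical latencies in cycles. "**" is absent: it is not a machine instruction.
LATENCY = {"+": 1, "shl": 1, "*": 4}


def computes_root(chosen):
    """Check that the chosen instruction nodes make the class "root" available."""
    # Classes containing a register or a constant are available for free.
    available = {c for c, nodes in CLASSES.items() if any(len(n) == 1 for n in nodes)}
    grew = True
    while grew:
        grew = False
        for cls, node in chosen:
            # An instruction can run once all of its argument classes are available.
            if cls not in available and all(a in available for a in node[1:]):
                available.add(cls)
                grew = True
    return "root" in available


def shortest_program(max_cycles=10):
    """Try cycle budgets 1, 2, 3, ... and return the first program that fits."""
    instructions = [(c, n) for c, nodes in CLASSES.items() for n in nodes if n[0] in LATENCY]
    for budget in range(1, max_cycles + 1):
        for size in range(1, len(instructions) + 1):
            for chosen in combinations(instructions, size):
                cycles = sum(LATENCY[n[0]] for _, n in chosen)
                if cycles == budget and computes_root(chosen):
                    return budget, [n for _, n in chosen]
    return None


print(shortest_program())
</code></pre></div></div>
<p>Because smaller budgets are tried first, the first program found is a shortest one. With these made-up latencies, the search picks the shift over the multiplication.</p>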
<h2 id="impact-of-the-denali-superoptimizer">Impact of the Denali Superoptimizer</h2>
<h3 id="preliminary-results">Preliminary Results</h3>
<p>The Denali paper presents several preliminary results. For the <a href="https://en.wikipedia.org/wiki/DEC_Alpha#:~:text=Alpha%2C%20originally%20known%20as%20Alpha,set%20computer%20(CISC)%20ISA.">Alpha instruction set architecture</a>, they are able to generate some programs of length up to 31 instructions. For comparison, the GNU superoptimizer is unable to generate (near) optimal instruction sequences of length greater than 5.</p>
<p>However, in addition to Denali’s built-in architectural axioms, the programmers specify program-specific axioms in their examples. This trades off automation for the ability to generate longer (near) optimal instruction sequences.</p>
<h3 id="superoptimization-the-post-denali-era-2002--present-day">Super🦸optimization: The Post-Denali Era (2002 – Present Day)</h3>
<p>Denali demonstrated that, for small programs, it is possible to generate provably equivalent, (nearly) optimal code. Since then, there has been a lot of interest in superoptimization. Here are some projects/papers that have popped up since Denali.</p>
<ul>
<li><a href="https://arxiv.org/pdf/1711.04422.pdf">Souper</a> is an <a href="https://github.com/google/souper">open-source</a> project that extracts straight-line code from the LLVM IR and applies superoptimization. It uses caching so that it can be run online (2017).
<ul>
<li>SMT-based goal-directed search</li>
<li>Maintained by <a href="https://www.google.com">Google</a>, jointly developed by researchers at Google, <a href="https://www.nvidia.com/en-us/">NVIDIA</a>, <a href="https://www.tno.nl/en/">TNO</a>, <a href="https://www.microsoft.com/en-us/">Microsoft</a>, <a href="https://www.ses.com/">SES</a>, and the University of Utah.</li>
</ul>
</li>
<li><a href="https://github.com/KTH/slumps">slumps</a> is based on Souper and targets WebAssembly (2020).</li>
<li><a href="http://stoke.stanford.edu/">STOKE</a> is a superoptimizer for x86 (2013) (which we will be <a href="https://www.cs.utexas.edu/~bornholt/courses/cs395t-21sp/schedule/">reading on March 22</a> after spring break)
<ul>
<li>Uses stochastic enumerative search.</li>
<li>Is still maintained and open source at <a href="https://github.com/StanfordPL/stoke">StanfordPL/stoke</a>.</li>
</ul>
</li>
<li><a href="https://www.embecosm.com/about/">embecosm</a>, a compiler research group, is developing <a href="https://www.embecosm.com/services/superoptimization/#Related-Services">GSO 2.0</a> (2015)</li>
<li>There has been research into <a href="https://theory.stanford.edu/~aiken/publications/papers/asplos06.pdf">automatic peephole optimizations</a> (2006).</li>
</ul>
<p>However, while there is active industry and research interest in the <em>problem</em> that Denali presented (finding a provably equivalent, near-optimal translation), most modern approaches (e.g. souper) rely on SMT-based synthesis techniques. Denali’s methods of superoptimization seem to have largely fallen by the wayside. Part of this is because Denali’s provably (almost) optimal program relies on a set of user-specified axioms, and is only optimal with respect to those axioms. Part of the appeal of an SMT solver is standardized theories for certain objects and operations.</p>
<p>Both enumerative search (e.g. STOKE) and goal-directed search (e.g. souper) are used today. In addition, Denali’s general notion of specification (a set of symbolic input-output pairs) is still used, with various project-specific modifications. Projects still rely on (often hand-written) heuristics to measure the cost/cycle-count of candidate programs.</p>
<h1 id="discussion-questions">Discussion Questions</h1>
<ul>
<li>As computation times are increasingly memory-bound, does superoptimization run into concerns with Amdahl’s law?</li>
<li>SMT solvers are powerful tools, but incredibly general-purpose. What types of computations are likely to be compute-bound, and can we use that domain-specific knowledge to make superoptimization faster?</li>
<li>Superoptimization seems naturally related to <a href="https://en.wikipedia.org/wiki/Automatic_vectorization">automated vectorization</a>. However, people seem to treat the two problems as separate. Is there any reason automated vectorization might make superoptimization much more difficult?</li>
<li><a href="https://github.com/flame/blis">BLIS</a> is a framework for creating <a href="http://www.netlib.org/blas/">BLAS</a> implementations for new architectures by only implementing small, finite computation kernels. Can BLIS be combined with superoptimization to automatically generate BLAS libraries for specific architectures?</li>
</ul>
<h1 id="references">References</h1>
<p>All graphs were built using <a href="https://graphviz.org/">graphviz</a>. The example E-Graph in section <a href="#what-is-an-e-graph">What is an E-Graph?</a> is based on the example in today’s paper.</p>