<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
        <title>Perpetually Curious Blog</title>
        <link>https://lambdafoo.com</link>
        <description><![CDATA[Personal opinions on technology, functional programming and various systems topics.]]></description>
        <atom:link href="https://lambdafoo.com/rss.xml" rel="self"
                   type="application/rss+xml" />
        <lastBuildDate>Wed, 25 Mar 2026 00:00:00 UT</lastBuildDate>
        <item>
    <title>Quick Hardware Performance Counters on macOS ARM64</title>
    <link>https://lambdafoo.com/posts/2026-03-25-mperf-hardware-counters-macos.html</link>
    <description><![CDATA[<div class="post">
  <h1 class="post-title">Quick Hardware Performance Counters on macOS ARM64</h1>
  <span class="post-date">March 25, 2026</span>
  <p>If you’ve ever profiled OCaml programs on Linux, you’ve probably reached for <code>perf stat</code>. It’s the go-to tool for grabbing hardware performance counters—cycles, instructions, cache misses—without any instrumentation overhead. On macOS, the equivalent story has been <strong>open Instruments</strong>, which is fine for GUI-driven investigation but terrible for automated benchmarking pipelines.</p>
<p>I wanted something I could stick in a shell script, get output as JSON, and run in the terminal. So I put together <a href="https://github.com/tmcgilchrist/mperf">mperf</a>, a <code>perf stat</code>-like CLI for Apple Silicon. Here is what it looks like:</p>
<pre><code>$ sudo ./mperf-stat -e cycles -e instructions -e l1d-tlb-misses -- ./my_benchmark

 Performance counter stats:

         1,234,567,890  cycles
         2,345,678,901  instructions                # 1.90 IPC
            12,345,678  l1d-tlb-misses

       0.543210 seconds wall time
       0.520000 seconds user
       0.020000 seconds sys</code></pre>
<h2 id="why-not-just-use-instruments">Why Not Just Use Instruments?</h2>
<p>Instruments is powerful, but it’s an interactive GUI tool. You can invoke it from the command line using <code>xctrace</code>, but the results need the same GUI tool to view them. Sometimes you just need a simple CLI tool that prints out the most interesting stats; in my case I want it invoked from a Makefile or a CI runner.</p>
<p>There’s also no good reason this should require full Xcode and Instruments. The hardware counters are right there in the CPU; the kernel exposes them through private frameworks. The only real requirement is root access; there is no need to disable SIP, sign code, or add other special entitlements.</p>
<h2 id="using-apples-private-frameworks">Using Apple’s Private Frameworks</h2>
<p>Apple Silicon exposes hardware performance counters through two private frameworks: <code>kperf.framework</code> and <code>kperfdata.framework</code>, living under <code>/System/Library/PrivateFrameworks/</code>. These are the same frameworks that Instruments uses internally. They’re undocumented, but <a href="https://gist.github.com/ibireme/173517c208c7dc333ba962c1f0d67d12">ibireme’s kpc_demo</a> showed that you can load them at runtime with <code>dlopen</code>, look up their functions with <code>dlsym</code>, and drive them from userspace.</p>
<p>The CPU-specific event databases live in <code>/usr/share/kpep/</code> as plist files—<code>a14.plist</code> for M1, <code>a15.plist</code> for M2, <code>as4.plist</code> for M4, and so on. mperf provides portable aliases (<code>cycles</code>, <code>instructions</code>, <code>branch-misses</code>, <code>l1d-cache-misses</code>, etc.) that resolve to the right event names for whatever chip you’re running on. You can also pass raw event names if you want something specific.</p>
<p>Apple Silicon gives you 2 fixed counters (cycles and instructions) plus 8 configurable counters, for a maximum of 10 simultaneous events. Unlike Linux perf, mperf doesn’t do multiplexing — if you ask for more than 10 events, it’s an error rather than a silently degraded estimate. More on that distinction below.</p>
<h2 id="the-multi-threading-problem">The Multi-Threading Problem</h2>
<p>A simple approach would be to fork a child, start counting, wait for it to exit, read counters. That works for single-threaded programs, but OCaml 5.x programs with multiple domains spawn multiple pthreads—each domain gets a domain thread plus a backup thread for systhreads. A 4-domain program has at least 8 pthreads, and naive per-thread measurement would miss most of the work.</p>
<p>This is where Apple’s <strong>Profile Every Thread (PET)</strong> mechanism comes in. Instead of reading counters for a single thread, PET sets up a kernel timer that fires periodically (default: every 1ms) and snapshots PMC values for <em>every</em> thread matching a PID filter. These samples get written to a kernel trace buffer (<code>kdebug</code>) with thread IDs and timestamps.</p>
<p>The approach is:</p>
<ol type="1">
<li>Fork a child process, held at a pipe barrier</li>
<li>Configure the PMC hardware with requested events</li>
<li>Set up PET sampling filtered to the child’s PID</li>
<li>Enable kdebug tracing for <code>PERF_KPC_DATA_THREAD</code> events</li>
<li>Release the child (close the pipe), let it exec the target command</li>
<li>Poll kdebug for samples until the child exits</li>
<li>For each thread, compute the delta between first and last sample</li>
<li>Sum deltas across all threads</li>
</ol>
<pre><code>Thread 1: [sample_0] -------- [sample_1] -------- [sample_N]
Thread 2:      [sample_0] -------- [sample_1] -------- [sample_N]
Thread 3:           [sample_0] -------- [sample_1] -------- [sample_N]
...

Result = Σ (thread_last - thread_first) for all threads</code></pre>
<p>This is fundamentally a sampling-based approximation rather than continuous counting. But for benchmarks that run longer than a few milliseconds, the results are accurate enough to be useful. The comparison with Linux <code>perf stat</code> is more nuanced than “exact vs approximate” though.</p>
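<p>Steps 7 and 8 can be sketched in a few lines of Python. This is a minimal illustration rather than mperf’s actual code: the <code>(thread_id, timestamp, counter_value)</code> tuples and the <code>sum_thread_deltas</code> name are assumptions made for the example, not the real kdebug record layout.</p>

```python
from collections import defaultdict

def sum_thread_deltas(samples):
    """Sum (last - first) counter values per thread.

    `samples` is an iterable of (thread_id, timestamp, counter_value)
    tuples; this shape is hypothetical, not the kdebug record format.
    """
    by_thread = defaultdict(list)
    for thread_id, timestamp, value in samples:
        by_thread[thread_id].append((timestamp, value))

    total = 0
    for readings in by_thread.values():
        readings.sort()  # order by timestamp
        first_value = readings[0][1]
        last_value = readings[-1][1]
        # A thread seen only once contributes 0: there is no interval to measure.
        total += last_value - first_value
    return total

samples = [
    (1, 0, 100), (1, 1, 400), (1, 2, 900),  # thread 1: 900 - 100
    (2, 1, 50), (2, 2, 250),                # thread 2: 250 - 50
    (3, 2, 70),                             # thread 3: single sample
]
print(sum_thread_deltas(samples))
```

<p>Thread 3 in the example contributes nothing because it was only sampled once, which is exactly the blind spot a sampling approach has for very short-lived threads.</p>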
<h2 id="sampling-period-trade-offs">Sampling Period Trade-offs</h2>
<p>The <code>-p</code> flag controls the sampling period. The default 1ms works well for most OCaml programs since domains typically live for the program’s duration. For short-lived benchmarks you can go faster at the cost of more overhead by setting smaller values for <code>-p</code>.</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Default 1ms - good balance for most programs</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="fu">sudo</span> ./mperf-stat <span class="at">-e</span> cycles <span class="at">-e</span> instructions <span class="at">--</span> ./benchmark</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a><span class="co"># 0.5ms - catches short-lived threads, higher overhead</span></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a><span class="fu">sudo</span> ./mperf-stat <span class="at">-p</span> 0.5 <span class="at">-e</span> cycles <span class="at">-e</span> instructions <span class="at">--</span> ./short_benchmark</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a><span class="co"># 5ms - less overhead for long-running workloads</span></span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a><span class="fu">sudo</span> ./mperf-stat <span class="at">-p</span> 5 <span class="at">-e</span> cycles <span class="at">-e</span> instructions <span class="at">--</span> ./long_benchmark</span></code></pre></div>
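<p>To get a feel for the trade-off, here is a rough back-of-the-envelope sketch in Python. It assumes a thread is sampled on every timer tick while it is alive, so the only unobserved work is the slice before its first sample and the slice after its last one, each shorter than one period. The function name is made up for this example and is not part of mperf.</p>

```python
def worst_case_unobserved_fraction(lifetime_ms, period_ms=1.0):
    """Upper bound on the share of a thread's lifetime outside [first, last] sample.

    Assumes one sample per timer tick while the thread is alive, so at most
    one period is lost at each end of the thread's lifetime.
    """
    if lifetime_ms <= 0:
        raise ValueError("lifetime_ms must be positive")
    return min(1.0, 2 * period_ms / lifetime_ms)

# (lifetime in ms, sampling period in ms)
for lifetime_ms, period_ms in [(500, 1.0), (500, 5.0), (1.5, 1.0), (1.5, 0.5)]:
    bound = worst_case_unobserved_fraction(lifetime_ms, period_ms)
    print(f"lifetime={lifetime_ms}ms period={period_ms}ms -> up to {bound:.1%} unobserved")
```

<p>A bound of 100% means the thread may not produce a usable delta at all, which is the case where a smaller <code>-p</code> value helps.</p>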
<h2 id="ocaml-bindings">OCaml Bindings</h2>
<p>Since the goal is integration with OCaml benchmarking services, mperf includes an OCaml library that wraps the CLI tool and parses its JSON output:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">open</span> Apple_perf_stat</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> () =</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a>  <span class="kw">match</span> run</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a>    ~events:[<span class="st">&quot;cycles&quot;</span>; <span class="st">&quot;instructions&quot;</span>; <span class="st">&quot;l1d-tlb-misses&quot;</span>; <span class="st">&quot;branch-misses&quot;</span>]</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a>    [<span class="st">&quot;./my_benchmark&quot;</span>; <span class="st">&quot;--size&quot;</span>; <span class="st">&quot;10000&quot;</span>]</span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a>  <span class="kw">with</span></span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a>  | <span class="dt">Ok</span> <span class="dt">result</span> -&gt;</span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a>    <span class="dt">Printf</span>.printf <span class="st">&quot;IPC: %.2f</span><span class="ch">\n</span><span class="st">&quot;</span> (get_ipc <span class="dt">result</span>);</span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a>    <span class="dt">Printf</span>.printf <span class="st">&quot;Wall time: %.3f s</span><span class="ch">\n</span><span class="st">&quot;</span> (wall_time_seconds <span class="dt">result</span>);</span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a>    (<span class="kw">match</span> get_counter <span class="st">&quot;l1d-tlb-misses&quot;</span> <span class="dt">result</span> <span class="kw">with</span></span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a>     | <span class="dt">Some</span> misses -&gt; <span class="dt">Printf</span>.printf <span class="st">&quot;TLB misses: %Ld</span><span class="ch">\n</span><span class="st">&quot;</span> misses</span>
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a>     | <span class="dt">None</span> -&gt; ())</span>
<span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a>  | <span class="dt">Error</span> e -&gt;</span>
<span id="cb4-15"><a href="#cb4-15" aria-hidden="true" tabindex="-1"></a>    <span class="dt">Printf</span>.eprintf <span class="st">&quot;Error: %s</span><span class="ch">\n</span><span class="st">&quot;</span> (string_of_error e);</span>
<span id="cb4-16"><a href="#cb4-16" aria-hidden="true" tabindex="-1"></a>    <span class="dt">exit</span> <span class="dv">1</span></span></code></pre></div>
<p>The library has no external dependencies beyond <code>unix</code> and <code>str</code>, which ship with the OCaml compiler distribution. It handles the root privilege check, argument construction, JSON parsing, and error reporting.</p>
<p>The idea is to use instruction counts as a stable performance metric. Wall-clock time varies with system load and thermal throttling; instruction counts are far more repeatable from run to run. Pairing that with IPC and cache miss rates gives you a pretty complete picture of where regressions come from. The limit of 10 counters on Apple Silicon is enough to get started.</p>
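<p>As a small worked example of the derived metrics, here is how IPC and a miss rate fall out of the raw counts. This is a Python sketch using the numbers from the sample output at the top of the post; the helper names are hypothetical and are not part of the OCaml library.</p>

```python
def ipc(instructions, cycles):
    """Instructions retired per cycle."""
    return instructions / cycles

def misses_per_kilo_instruction(misses, instructions):
    """Miss events per 1000 instructions (MPKI)."""
    return misses * 1000 / instructions

# Counts taken from the sample mperf-stat output earlier in the post.
cycles = 1_234_567_890
instructions = 2_345_678_901
l1d_tlb_misses = 12_345_678

print(f"IPC:  {ipc(instructions, cycles):.2f}")
print(f"MPKI: {misses_per_kilo_instruction(l1d_tlb_misses, instructions):.2f}")
```

<p>Normalising misses by instruction count makes the number comparable across commits even when the amount of work changes.</p>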
<h2 id="counting-accuracy">Counting Accuracy</h2>
<p>How accurate are the results from mperf? And how do they compare to Linux perf?</p>
<p>It’s tempting to describe this as “perf stat is exact, mperf is approximate” but that’s not the full picture. <code>perf stat</code> uses exact hardware counting <em>only when the requested events fit within the CPU’s physical PMU counters</em>. Once you exceed that limit, the kernel enables time-division multiplexing: it rotates events through the available counters via an <code>hrtimer</code> and scales the final counts:</p>
<pre><code>final_count = raw_count * (time_enabled / time_running)</code></pre>
<p>The <a href="https://perfwiki.github.io/main/tutorial/">perf wiki</a> is explicit about this: “It is very important to understand this is an estimate not an actual count. Depending on the workload, there will be blind spots which can introduce errors during scaling.” When multiplexing is active, <code>perf stat</code> shows a percentage column indicating what fraction of time each event was actually measured — anything below 100% means the count was extrapolated.</p>
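<p>The scaling formula is simple enough to sketch directly. The Python below is illustrative only; the function names are mine, and the times are assumed to be in the same unit (perf reports them in nanoseconds).</p>

```python
def scaled_count(raw_count, time_enabled, time_running):
    """Extrapolate a multiplexed count: raw_count * (time_enabled / time_running)."""
    if time_running == 0:
        return None  # the event was never scheduled, nothing to extrapolate from
    return raw_count * time_enabled / time_running

def measured_fraction(time_enabled, time_running):
    """Fraction of the enabled time the event was actually on a counter."""
    if time_enabled == 0:
        return 0.0
    return time_running / time_enabled

# An event that was on a hardware counter for 25 ms out of a 100 ms run.
raw, enabled_ns, running_ns = 500_000, 100_000_000, 25_000_000
print(scaled_count(raw, enabled_ns, running_ns))
print(f"{measured_fraction(enabled_ns, running_ns):.0%}")
```

<p>The extrapolation assumes the event rate during the unmeasured 75% matched the measured 25%, which is where bursty workloads introduce error.</p>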
<p>How many events can you request before multiplexing kicks in? It depends on how many PMU counters are exposed on the particular CPU. Here are a few examples from modern CPUs:</p>
<table>
<thead>
<tr>
<th>CPU</th>
<th>General-Purpose</th>
<th>Fixed</th>
<th>Total</th>
</tr>
</thead>
<tbody>
<tr>
<td>Intel Skylake (HT on)</td>
<td>4</td>
<td>3</td>
<td>7</td>
</tr>
<tr>
<td>Intel Ice Lake (HT off)</td>
<td>8</td>
<td>4</td>
<td>12</td>
</tr>
<tr>
<td>AMD Zen 3/4</td>
<td>6</td>
<td>0</td>
<td>6</td>
</tr>
<tr>
<td>ARM Cortex-A76 / Neoverse N1</td>
<td>6</td>
<td>1</td>
<td>7</td>
</tr>
<tr>
<td>Apple M1-M4</td>
<td>8</td>
<td>2</td>
<td>10</td>
</tr>
</tbody>
</table>
<p>On Linux you can check your own CPU with <code>dmesg</code>:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="ex">$</span> dmesg <span class="kw">|</span> <span class="fu">grep</span> <span class="at">-E</span> <span class="st">&quot;generic registers|fixed-purpose events&quot;</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="ex">...</span> generic registers:      4</span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a><span class="ex">...</span> fixed-purpose events:   3</span></code></pre></div>
<p>Note that the NMI watchdog steals one general-purpose counter by default on most distros (<code>cat /proc/sys/kernel/nmi_watchdog</code> — if it returns 1, you’ve lost a counter).</p>
<p>So in practice the comparison is:</p>
<ul>
<li><strong>perf stat with ~6 events on Skylake</strong>: exact counting, genuinely better than mperf’s PET approach.</li>
<li><strong>perf stat with 12+ events</strong>: time-multiplexed scaled estimates. Still different from PET sampling (perf counts every event while the counter is active, then extrapolates for inactive periods), but the result is still an estimate.</li>
<li><strong>mperf with up to 10 events</strong>: PET sampling-based approximation, but all counters are always physically active: no time-sharing, no scaling. The inaccuracy comes from sampling granularity (missing activity between snapshots), not from extrapolation.</li>
</ul>
<p>For the common case of measuring cycles + instructions + a handful of cache/TLB events (say 4-6 total), perf stat on Linux gives you exact counts and mperf gives you a close approximation. For larger event sets, perf falls back to multiplexing, and at that point both tools are producing estimates through different mechanisms. mperf’s hard limit of 10 events means its numbers are never scaled extrapolations: every requested counter is physically active for the whole run.</p>
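<p>A rough way to reason about when perf starts multiplexing is to compare the number of requested events against the counter budget from the table above. The Python sketch below is deliberately optimistic: it assumes any event can be placed on any free counter, whereas real PMUs restrict fixed counters to specific events and some events to specific general-purpose counters. The names are hypothetical.</p>

```python
# Counter totals copied from the table above: (general_purpose, fixed).
PMU_COUNTERS = {
    "Intel Skylake (HT on)": (4, 3),
    "Intel Ice Lake (HT off)": (8, 4),
    "AMD Zen 3/4": (6, 0),
    "ARM Cortex-A76 / Neoverse N1": (6, 1),
    "Apple M1-M4": (8, 2),
}

def counter_budget(cpu, nmi_watchdog=False):
    """Best-case number of events that can be counted simultaneously."""
    general_purpose, fixed = PMU_COUNTERS[cpu]
    if nmi_watchdog:
        general_purpose -= 1  # the watchdog occupies one general-purpose counter
    return general_purpose + fixed

def will_multiplex(cpu, num_events, nmi_watchdog=False):
    """True if the event set cannot fit even in the best case."""
    return num_events > counter_budget(cpu, nmi_watchdog)

print(will_multiplex("Intel Skylake (HT on)", 6, nmi_watchdog=True))
print(will_multiplex("Intel Skylake (HT on)", 12, nmi_watchdog=True))
```

<p>Because the placement assumption is optimistic, a <code>False</code> here only means multiplexing is avoidable, not that a specific event mix is guaranteed to fit.</p>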
<p>The other caveats: mperf relies on Apple’s private <code>kperf</code> and <code>kperfdata</code> frameworks, which could change with any macOS update (though they’ve been stable across M1 through M4). And very short-lived threads might be missed entirely if they complete within a single sampling period.</p>
<h2 id="getting-started">Getting Started</h2>
<div class="sourceCode" id="cb7"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="fu">git</span> clone https://github.com/tmcgilchrist/mperf</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="bu">cd</span> mperf</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="fu">make</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a><span class="fu">sudo</span> ./mperf-stat <span class="at">--</span> echo hello</span></code></pre></div>
<p>List all available events for your CPU with <code>./mperf-stat -l</code>.</p>
<p>The code is based on <a href="https://gist.github.com/ibireme/173517c208c7dc333ba962c1f0d67d12">ibireme’s kpc_demo.c</a>, with event documentation from <a href="https://github.com/jiegec/apple-pmu">jiegec/apple-pmu</a> and the <a href="https://developer.apple.com/documentation/apple-silicon/cpu-optimization-guide">Apple Silicon CPU Optimization Guide</a>.</p>
<h2 id="whats-next">What’s Next</h2>
<p>One possible follow-up is porting this to x86_64 macOS, where the PMC infrastructure should be similar. The more immediate goal is integrating mperf into an automated benchmarking pipeline for the OCaml compiler, tracking instruction counts and IPC across commits the way we’ve been doing with <a href="https://lambdafoo.com/posts/2025-02-24-ocaml-frame-pointers.html">frame pointers</a> and <a href="https://lambdafoo.com/posts/2025-02-15-ocaml-ebpf-usdt.html">eBPF</a> on Linux.</p>
<p>This fits into the broader effort of making OCaml programs easier to profile. Frame pointer support gives us <code>perf</code> and eBPF integration on Linux; mperf gives us hardware counter access on macOS without reaching for Instruments. Using frame pointers on macOS gives us accurate backtraces in Instruments when used with <code>xctrace</code>.</p>
</div>
]]></description>
    <pubDate>Wed, 25 Mar 2026 00:00:00 UT</pubDate>
    <guid>https://lambdafoo.com/posts/2026-03-25-mperf-hardware-counters-macos.html</guid>
    <dc:creator>Tim McGilchrist</dc:creator>
</item>
<item>
    <title>More CFI and frame pointers work</title>
    <link>https://lambdafoo.com/posts/2026-02-07-more-cfi-and-frame-pointers.html</link>
    <description><![CDATA[<div class="post">
  <h1 class="post-title">More CFI and frame pointers work</h1>
  <span class="post-date">February  7, 2026</span>
  <p>This month I set myself a goal to wrap up different pieces of work on OCaml that have been taking up space in my notes and my mind from last year. I wanted a clean slate to focus on the new year. I also enjoy reading other developers’ notes like <a href="https://www.dra27.uk/blog/platform/2026/01/16/dusting-off-the-branches.html">Opening up old release branches</a> and <a href="https://tromey.com/blog/?p=1111">Faster Faster GDB Startup</a>. So here is my minor contribution and we are jumping straight in!</p>
<h2 id="what-is-cfi">What is CFI?</h2>
<p>If you’ve ever tried to get a backtrace in GDB on Power and gotten garbage, that’s because there was no CFI to help the debugger unwind the call stack. Call Frame Information (CFI) is part of the DWARF standard; it describes the call frames and register usage of a compiled program. A call frame is a specific area of memory that is allocated on the stack, which typically corresponds to a function call in the source language. However, some functions, such as leaf functions or tail-recursive functions, won’t allocate a call frame. I have some ASCII stack frame diagrams later.</p>
<p>Within each call frame, the function saves a set of registers; which ones depends on the architecture. When returning from a function, those registers need to be restored. The ABI document spells out the details; for Power that’s the 64-Bit ELF V2 ABI Specification.</p>
<p>The code that allocates space on the call frame stack and performs the save operations is called the function prologue, and the corresponding code that performs the restore operation and deallocates the frame is called the epilogue.</p>
<p>The debugger uses CFI to:</p>
<ol type="1">
<li>Generate a backtrace by <em>unwinding</em> the function call stack.</li>
<li>Restore the state of registers when visiting previous functions higher up in the call stack.</li>
<li>Find registers that get spilled to the stack when they get overwritten within a function.</li>
</ol>
<p>For our purposes, we care about working backtraces and recovering key registers like the return address and frame pointer.</p>
<p>The DWARF specification introduces the term CFA (Canonical Frame Address), which is a specific point in the stack frame that is used as the base address for finding everything else in the frame. For example, the Link Register representing the address to return to when exiting a function is stored at CFA+16 on Power. With this terminology out of the way, let’s look at how to apply it to Power (if you want more details, check out the excellent book <a href="https://nostarch.com/building-a-debugger">Building a Debugger</a>).</p>
<h3 id="power-cfi">Power CFI</h3>
<p>When Power architecture support was added back into OCaml 5 in <a href="https://github.com/ocaml/ocaml/pull/12276">ocaml#12276</a> there was no support for Call Frame Information. Without CFI it is difficult to debug problems on Power and do performance engineering as you don’t have working backtraces. So last year I started work on adding DWARF Call Frame Information (CFI) directives to the Power backend so that debuggers like GDB can unwind the stack through OCaml frames, C-to-OCaml transitions, and runtime functions (GC, C calls, exceptions, effects etc).</p>
<p>For the Power architecture, the calling conventions are defined in the <a href="https://openpowerfoundation.org/specifications/64bitelfabi/">64-Bit ELF V2 ABI Specification: Power Architecture</a> document and specifically we’re interested in the section <em>2.2.2. The Stack Frame</em> which details the calling conventions for C. What registers are used? Which registers should be saved between function calls? What do those registers represent (return addresses or stack pointers)?</p>
<p>For 64-Bit ELF V2 ABI this looks like:</p>
<pre><code>
                                  Higher addresses
                             ┌─────────────────────────────┐
     Caller&#39;s SP  ─────────► │  Back-chain (caller&#39;s)      │  8 bytes
                             ├─────────────────────────────┤
                             │  Floating-Point Register    │  8 × (32 − N) bytes
                             │  Save Area                  │  fN saved at
                             │  (optional, callee saves)   │  caller_SP − 8×(32−N)
                             ├─────────────────────────────┤
                             │  General-Purpose Register   │  8 × (32 − N) bytes
                             │  Save Area                  │  rN saved at
                             │  (optional, callee saves)   │  (below FPR save area)
                             ├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤
                             │  Alignment padding          │  0 or 8 bytes
                             │  (for quadword alignment)   │  (if needed)
                             ├─────────────────────────────┤
                             │ Vector Register             │  16 × (32 − N) bytes
                             │ Save Area                   │  vN saved at
                             │ (optional, quadword aligned)│  (below GPR area + pad)
                             ├─────────────────────────────┤
                             │  Local Variable Space       │  variable
                             │  (optional)                 │
                             ├─────────────────────────────┤
     SP + 32      ─────────► │  Parameter Save Area        │  variable (optional)
                             │  (optional, 8 × num params) │  (quadword aligned)
                             ├─────────────────────────────┤
     SP + 24      ─────────► │  TOC Save (r2)              │  8 bytes
                             ├─────────────────────────────┤
     SP + 16      ─────────► │  LR Save                    │  8 bytes
                             ├─────────────────────────────┤
     SP + 8       ─────────► │  CR Save (word) + Reserved  │  4 + 4 bytes
                             ├─────────────────────────────┤
     SP           ─────────► │  Back-chain                 │  8 bytes
                             └─────────────────────────────┘
                                  Lower addresses
</code></pre>
<blockquote>
<p>The minimum stack frame size shall be 32 bytes. A minimum stack frame consists of the first 4 doublewords (back-chain doubleword, CR save word and reserved word, LR save doubleword, and TOC pointer doubleword), with padding to meet the 16-byte alignment requirement.</p>
</blockquote>
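<p>The 32-byte header and the 16-byte alignment rule are easy to check with a little arithmetic. This Python sketch uses the fixed offsets from the diagram above; <code>frame_size</code> and its <code>extra_bytes</code> parameter (standing in for locals, save areas and the parameter save area) are names invented for the example.</p>

```python
# Fixed header slots from the ELF V2 diagram above, as offsets from SP.
HEADER_OFFSETS = {"back_chain": 0, "cr_save": 8, "lr_save": 16, "toc_save": 24}
MIN_FRAME_SIZE = 32   # four doublewords
STACK_ALIGNMENT = 16  # SP must stay quadword aligned

def align_up(size, alignment=STACK_ALIGNMENT):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment

def frame_size(extra_bytes=0):
    """Header plus everything above it, padded to keep SP 16-byte aligned."""
    return align_up(MIN_FRAME_SIZE + extra_bytes)

for extra in (0, 8, 40):
    print(f"extra={extra:2d} bytes -> frame of {frame_size(extra)} bytes")
```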
<p>For OCaml we retain this 32 byte structure for compatibility with C and layer OCaml specific structure on top. Here are the interesting bits:</p>
<ol type="1">
<li><em>LR is always at CFA+16</em>. OCaml uses the Power ELF V2 ABI which mandates the caller reserves SP+16 for the link register save. So we use CFI to declare this fact using <code>.cfi_offset 65, 16</code>.</li>
<li><em>Function prologue instructions reordered</em>. The old prologue adjusted the stack pointer register (SP) first (<em>addi</em>), then saved LR. The new prologue saves LR/TOC into the caller’s frame first, then adjusts SP with <em>stdu</em> (store-with-update, which atomically writes the back-chain pointer). This more closely matches the Power ABI convention that C compilers follow and is <em>probably</em> better for async-signal safety. The Table of Contents (or TOC) is a pointer used on Power for position-independent code support.</li>
<li><em>cfi_startproc moved before instructions</em>. This assembler directive tells the debugger where a function begins. Previously it was added after the TOC setup and debug information; now it’s directly after the symbol label, so the unwinder covers the full function body and we avoid certain edge cases when setting breakpoints.</li>
<li><em>DWARF expressions for stack switches</em>. When switching between the OCaml stack and C stack (and the opposite direction), a moderately complex <em>.cfi_escape</em> expression computes the CFA by dereferencing the saved stack pointer from the <em>c_stack_link</em> structure. This was the bulk of the work: understanding the structure of the C stack frames to load the right saved registers. Unfortunately there aren’t many tools to debug when you get this wrong, so I ended up writing Python scripts to print out chunks of memory and work out which bits pointed to code or other stack frames. There is a project in here somewhere to run the stack machine for CFI and validate it against memory that I might get to later this year.</li>
<li><em>Signal frames</em>. Runtime entry points (caml_call_gc, caml_c_call, caml_call_realloc_stack, etc.) are marked with <em>.cfi_signal_frame</em> so the unwinder treats them as asynchronous interruption points, prints out “&lt;signal handler called&gt;” and avoids some stack integrity checks that cause GDB to stop unwinding.</li>
<li><em>Pre-initialized LR save area</em>. In caml_start_program, before calling into OCaml, the code stores the return address at SP+16 so the unwinder can find LR even if the first OCaml function has fun_frame_required = false. This was needed for exception handling code.</li>
</ol>
<p>Here’s a nice ASCII diagram of an OCaml Frame on Power:</p>
<pre><code>
                           Higher addresses
                      ┌─────────────────────────┐
        CFA+16  ───►  │  LR save (return addr)  │  8 bytes ◄── always at caller_SP+16
                      ├─────────────────────────┤
        CFA+8   ───►  │  TOC save (r2)          │  8 bytes ◄── always at caller_SP+8
                      ├─────────────────────────┤
        CFA     ───►  │                         │
                      ╞═════════════════════════╡ ◄── SP + frame_size  (= CFA)
                      │                         │
                      │  Reserved stack space   │  32 bytes
                      │  (ABI-mandated)         │
                      ├─────────────────────────┤
                      │  Float stack slots      │  num_float_slots × 8
                      ├─────────────────────────┤
                      │  Int stack slots        │  num_int_slots × 8
                      ├─────────────────────────┤
                      │  (stack_offset area)    │  trap frames (16)
                      │                         │  outgoing params, etc.
                      ├─────────────────────────┤
          SP    ───►  │  (16-byte aligned)      │
                      └─────────────────────────┘
                           Lower addresses</code></pre>
<p>Now we have working backtraces on Power for GDB. With working CFI in place, the next step was adding frame pointer support.</p>
<h3 id="more-frame-pointers">More frame pointers</h3>
<p>I wrote about the motivation for frame pointers in <a href="https://lambdafoo.com/posts/2025-02-24-ocaml-frame-pointers.html">Why do frame pointers matter for OCaml?</a>. At the end I mentioned I didn’t have working code for RISC-V, Power or s390x. This is the next chunk of work: getting frame pointers working for the full set of architectures.</p>
<h4 id="power-frame-pointers">Power Frame Pointers</h4>
<p>Power frame pointer support builds on the earlier CFI work and adds save/restore instructions for maintaining r31 (the frame pointer register). Power ABI documents use the term <em>back-chain</em> to describe the saved frame pointer for the previous frame, a term that also appears in the s390x documents later on. The main changes were:</p>
<ol type="1">
<li><em>Allocation pointer moved from r31 to r23</em>. The GC allocation pointer previously lived in r31, which is the standard Power ABI frame pointer register. Moving it to r23 (another callee-saved register) freed up r31 for its intended purpose.</li>
<li><em>Back-chain maintained for every frame</em>. Every frame allocation now uses stdu (store-with-update) instead of addi to adjust the stack pointer, which atomically writes the back-chain. This required some recalculation of instruction sizes to ensure we always emit valid branch instructions.</li>
<li><em>Trap frame was enlarged from 16 to 48 bytes</em>. This was the surprise! Trap frames are used by the OCaml runtime for handling exceptions and effects. The old 16-byte trap frame placed trap data at SP+32 and SP+40 (within the reserved area). With stdu-based allocation, the back-chain at new_SP points to old_SP, and the ABI expects old_SP+16 to hold LR. At trap_size=16, old_SP+16 = new_SP+48, which falls outside the allocated region. Increasing to 48 bytes ensures the LR save area at back_chain+16 lies within the trap frame, and trap data at new_SP+32/new_SP+40 doesn’t collide with the caller’s save area.</li>
<li><em>Runtime frame pointer fixups</em>. When the OCaml stack grows, the runtime needs to walk the stack and rewrite the saved FP values so the frame pointer chain remains valid.</li>
</ol>
<p>Now an OCaml Frame with frame pointers looks like this:</p>
<pre><code>                               Higher addresses
                          ┌─────────────────────────┐
            CFA+16  ───►  │  LR save (return addr)  │  8    ◄── ABI: caller_SP+16
                          ├─────────────────────────┤
            CFA+8   ───►  │  TOC save (r2)          │  8    ◄── ABI: caller_SP+8
                          ├─────────────────────────┤
            CFA     ───►  │  Back-chain (prev SP)   │  8    ◄── written by stdu
            ══════════════╪═════════════════════════╪══════════ FP after prologue
                          │                         │
                          │  Reserved stack space   │  32 bytes (ABI-mandated)
                          ├─────────────────────────┤
                          │  Float stack slots      │  num_float_slots × 8
                          ├─────────────────────────┤
                          │  Int stack slots        │  num_int_slots × 8
                          ├─────────────────────────┤
     fp_save_offset ───►  │  Saved r31 (old FP)     │  8    ◄── FP-only
                          ├─────────────────────────┤
                          │  (stack_offset area)    │  trap frames, outgoing
                          │                         │  params, etc.
                          ├─────────────────────────┤
            SP = FP ───►  │  (16-byte aligned)      │
                          └─────────────────────────┘
                               Lower addresses</code></pre>
<p>This change was fairly painful because I didn’t realise I needed to increase the trap size, and I ran into instructions trashing saved registers on the stack. GDB watchpoints, which break whenever something writes to a given memory address, were really useful here. Go read <a href="https://github.com/ocaml/ocaml/pull/14482">ocaml#14482</a> for more details.</p>
<h4 id="risc-v-frame-pointers">RISC-V Frame Pointers</h4>
<p>RISC-V frame pointer support was started by Miod Vallat in 2024, who had half the tests passing before I picked it up again. The surprising thing about RISC-V is that the frame pointer points to CFA, <em>above the saved registers</em>, rather than to the frame record itself like ARM64 does. The RISC-V ABI convention is s0 = sp + frame_size, pointing to the slot just past the <em>{saved_s0, saved_ra}</em> pair at the top of the frame.</p>
<p>Comparing this to ARM64 makes the difference more obvious (note x29 is the ARM64 frame pointer register).</p>
<pre><code>    RISC-V                              ARM64
    ──────                              ─────
         ┌───────────┐                       ┌───────────┐
    s0 ──▶  CFA      │                       │  CFA      │
         ├───────────┤                       ├───────────┤
    -8   │  ra       │                  -8   │  x30 (lr) │
         ├───────────┤                       ├───────────┤
    -16  │  old s0   │            x29 ──▶-16 │  old x29  │
         ├───────────┤                       ├───────────┤
         │  locals   │                       │  locals   │
         └───────────┘                       └───────────┘

    s0 = CFA                           x29 = CFA - 16
    s0 = sp + frame_size               x29 = sp + frame_size - 16</code></pre>
<p>The main pain point after working this out was adjusting the frame walking code in the runtime and the runtime assembly stubs. Because the frame pointer holds the CFA rather than the address of the frame record, <em>caml_try_realloc_stack</em> must account for this when it rewrites frame pointers after growing a fiber stack (OCaml 5 implements effect handlers as fiber stacks in the runtime): fp-&gt;prev is a CFA value, and the next frame record is at fp-&gt;prev - 1, which in pointer arithmetic on a 16-byte record means CFA − 16 bytes.</p>
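<p>To make the difference concrete, here is a small Python sketch of walking the same two-frame stack under both conventions. It follows the offsets in the diagram above; the function names, the dict-as-memory model and the addresses are invented for illustration and are not the runtime’s code.</p>
<pre class="python"><code># Toy unwinder contrasting the two conventions from the diagram above.
# Memory is a dict from address to 8-byte value; names are hypothetical.

RECORD_SIZE = 16  # {saved frame pointer, return address}

def unwind_riscv(memory, s0):
    """s0 holds the CFA; the frame record sits just below it."""
    return_addresses = []
    while s0 != 0:
        record = s0 - RECORD_SIZE                    # CFA - 16
        return_addresses.append(memory[record + 8])  # saved ra
        s0 = memory[record]                          # caller's s0, again a CFA
    return return_addresses

def unwind_arm64(memory, x29):
    """x29 points directly at the frame record."""
    return_addresses = []
    while x29 != 0:
        return_addresses.append(memory[x29 + 8])     # saved x30 (lr)
        x29 = memory[x29]                            # caller's x29
    return return_addresses

# The same two-frame stack under both conventions. The records live at
# 0x0fd0 and 0x0ff0; only the value kept in the frame pointer differs.
riscv_memory = {0x0fd0: 0x1000, 0x0fd8: 0xAAA, 0x0ff0: 0, 0x0ff8: 0xBBB}
arm64_memory = {0x0fd0: 0x0ff0, 0x0fd8: 0xAAA, 0x0ff0: 0, 0x0ff8: 0xBBB}

print([hex(ra) for ra in unwind_riscv(riscv_memory, s0=0x0fe0)])
print([hex(ra) for ra in unwind_arm64(arm64_memory, x29=0x0fd0)])</code></pre>
<p>Both walks visit the same records and collect the same return addresses. The only difference is the extra subtraction of 16 on RISC-V, which is exactly the adjustment the stack reallocation code has to make.</p>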
<p>This ticked off another architecture supporting frame pointers, and with that the list of Tier-1 architectures without frame pointer support got shorter. <a href="https://github.com/ocaml/ocaml/pull/14506">ocaml#14506</a> has the full code.</p>
<h4 id="s390x-frame-pointers">s390x Frame Pointers</h4>
<p>Now, s390x frame pointers seemed like they would be straightforward. I had implemented other platforms like Power and ARM64, and had a good understanding of which areas I needed to modify:</p>
<ol type="1">
<li>enable the configure flag,</li>
<li>update the code emission logic for frame sizes,</li>
<li>fix up the prologue/epilogue assembly for back-chain handling,</li>
<li>add CFI directives, handle effect stack switching, and</li>
<li>update the runtime’s assembly stubs.</li>
</ol>
<p>The usual drill; I even had it hand-written on paper. What I didn’t account for is the difference in stack layout, which boils down to this: OCaml wants to store the back-chain and r14 at SP+0 and SP+8 respectively, while the s390x ABI mandates the back-chain at SP+0 and r14 at SP+112. The s390x ABI also requires a minimum frame size of 160 bytes. I’ll write about this another time, but the result is I’m still working out what approach to take (the code is at <a href="https://github.com/tmcgilchrist/ocaml/pull/24">tmcgilchrist/ocaml#24</a>). All the options seem terrible right now.</p>
<h3 id="smaller-fixes">Smaller fixes</h3>
<p>While I was working in this area there were some smaller fixes to be made. Recall from earlier we need to write <em>.cfi_escape</em> expressions to describe how to find the C stack from OCaml (and the reverse). The expressions for s390x were slightly wrong and needed some tweaking; <a href="https://github.com/ocaml/ocaml/pull/14500">ocaml#14500</a> has more details. Once I had stack diagrams for s390x the work mostly involved writing Python to inspect memory and adjusting where the CFI expressions pointed to in memory.</p>
<p>Frame pointer support on FreeBSD was requested during ICFP 2025, and I thought it would be a simple matter of changing the configure script to allow it. Nothing is ever simple. The frame pointer tests were failing because the first symbol in a module would clash with the <em>module.entry</em> symbol (representing the module initialisation code associated with every module in OCaml e.g. camlFoo.entry). Once I re-discovered that detail (<a href="https://github.com/ocaml/ocaml/issues/4690">ocaml#4690</a>), the fix was simple. This seems to be a common problem for LLVM toolchains. For good measure I documented how to use <em>dtrace</em> and <em>pmcstat</em> to do CPU profiling, see <a href="https://github.com/ocaml/ocaml/pull/14486">ocaml#14486</a>.</p>
<h2 id="conclusion">Conclusion</h2>
<p>With all that, OCaml now has working debugger support on Power, completing the set of Tier-1 architectures where such tools work. This exceeds the support that existed in OCaml prior to 5.0 and resolves the one thorny issue I knew about on the way to feature parity. Closing the s390x CFI bug I reported in 2024 resolved the last known issue with debuggers on that platform.</p>
<p>On top of that we have frame pointer support for more architectures than ever. If you’re debugging or profiling OCaml on Power, RISC-V, ARM64, or AMD64, you now get clean backtraces. As I wrote in the OCaml manual <a href="https://ocaml.org/manual/5.4/profil.html#s:ocamlprof-time-profiling">section on profiling</a>, this is key to getting accurate profiling using Linux perf and macOS Instruments. FreeBSD users can profile with dtrace and pmcstat, and there is documentation on how to get started.</p>
<p>All these changes should eventually reach an OCaml release. If I’m lucky the smaller bug fixes will squeeze into 5.6, with the rest targeting 5.7. That leaves s390x frame pointers (still wrestling with the 160-byte minimum frame problem) and Windows (the elephant in the room). We shall see.</p>
</div>
]]></description>
    <pubDate>Sat, 07 Feb 2026 00:00:00 UT</pubDate>
    <guid>https://lambdafoo.com/posts/2026-02-07-more-cfi-and-frame-pointers.html</guid>
    <dc:creator>Tim McGilchrist</dc:creator>
</item>
<item>
    <title>Why do frame pointers matter for OCaml?</title>
    <link>https://lambdafoo.com/posts/2025-02-24-ocaml-frame-pointers.html</link>
    <description><![CDATA[<div class="post">
  <h1 class="post-title">Why do frame pointers matter for OCaml?</h1>
  <span class="post-date">February 24, 2025</span>
  <p>At the end of last year at Tarides, my colleagues and I worked on improving frame pointer support in OCaml. While pitching the work internally, someone asked: <strong>Why do frame pointers matter for OCaml?</strong> Here’s the answer I gave.</p>
<p>Frame pointers give us a reliable way to profile OCaml programs using standard tools like perf and eBPF. Instead of building OCaml-specific profiling solutions, we can reuse what already exists: Instruments on macOS and the whole ecosystem of eBPF tools on Linux. That’s a big win for a smaller community like OCaml’s. Targeting all Tier-1 supported platforms (AMD64, ARM64, RISC-V, s390x and Power) would give a consistent experience across them all; it’s embarrassing to omit a platform when listing features. As a bonus, frame pointers help debuggers when DWARF information is missing, providing a last resort option.</p>
<h1 id="what-are-frame-pointers">What are Frame Pointers?</h1>
<p>The frame pointer is a register (<code>%rbp</code> on AMD64 or <code>x29</code> on ARM64) that points to the base of the current stack frame. The stack frame (also known as the activation frame or the activation record) refers to the portion of the stack allocated to a single function call. This is where a function saves registers and keeps temporary storage while performing work. By saving the frame pointer along with the return address in every frame, the call stack for OCaml can be reconstructed in a process called unwinding. As an example, an ARM64 call stack looks like this (note that the stack grows downwards to lower addresses):</p>
<pre class="assembly"><code>          Stack                        Instruction Text
   ┌────────────────────┐            ┌────────────────────┐
   │Return Address (x30)┼───────────►│ ancestor function  │
┌─►│Saved Frame Pointer │            │                    │
│  ┼────────────────────┼            │                    │
│  │Local allocations   │     ┌─────►│ parent function    │
│  │                    │     │      │                    │
│  ├────────────────────┤     │      │                    │
│  │Return Address (x30)├─────┼─────►│ grand-parent func  │
└──┤Saved Frame Pointer │     │      │                    │
┌─►┼────────────────────┼     │      │                    │
│  │Local allocations   │     │   ┌─►│ current function   │
│  │                    │     │   │  │                    │
│  ├────────────────────┤     │   │  └────────────────────┘
│  │Return Address (x30)├─────┘   │
└──┤Saved Frame Pointer │         │
   ┼────────────────────┼◄──┐     │      Registers
   │Local allocations   │   │     │ ┌──────────────────────┐
   │                    │   └─────┼─┤ x29 - frame pointer  │
   └────────────────────┘◄──┐     │ │ x30 - link register  │
                            └─────┼─┤ sp  - stack pointer  │
                                  └─┼ pc  - program counter│
                                    └──────────────────────┘</code></pre>
<p>More specifically, frame pointers can be used by profilers to unwind the stack in a language agnostic way. Many languages like C/C++, Rust, and Erlang support maintaining frame pointers.</p>
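<p>As a rough illustration of why this is language agnostic, here is a Python sketch of what a profiler does with the frame pointer chain on ARM64: follow the saved <code>x29</code> values, read the return address next to each one, and look each address up in a symbol table. The symbol table, addresses and helper names are invented for the example; a real profiler reads them from the process and its binaries.</p>
<pre class="python"><code>import bisect

# Toy model of what a profiler does with frame pointers on ARM64.
# Addresses, symbol names and the dict-as-memory model are made up.

SYMBOLS = [(0x4000, "main"), (0x4100, "parent"), (0x4200, "current")]

def symbolize(address):
    """Map a code address to the last symbol starting at or before it."""
    starts = [start for start, _ in SYMBOLS]
    index = bisect.bisect_right(starts, address) - 1
    return SYMBOLS[index][1]

def backtrace(memory, pc, x29):
    """Follow saved frame pointers, collecting one name per frame."""
    names = [symbolize(pc)]
    while x29 != 0:
        names.append(symbolize(memory[x29 + 8]))  # saved x30, return address
        x29 = memory[x29]                          # saved x29 of the caller
    return names

# current was called from parent, which was called from main.
memory = {
    0x7f00: 0x7f40, 0x7f08: 0x4120,   # frame record of current
    0x7f40: 0,      0x7f48: 0x4010,   # frame record of parent
}
print(backtrace(memory, pc=0x4208, x29=0x7f00))</code></pre>
<p>Nothing in the walk depends on which language produced the frames. It only needs every function to keep the frame record in the agreed place.</p>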
<h1 id="return-of-the-frame-pointers">Return of the Frame Pointers</h1>
<p>We started with the ambitious goal of adding frame pointer support to all the Tier-1 platforms supported by OCaml. That included the popular AMD64 and ARM64 platforms that cover most users, and less common platforms like RISC-V, Power and s390x.</p>
<p>The specific focus was to address the recognised limitations of perf when used with OCaml 5 programs (<a href="https://github.com/ocaml/ocaml/issues/12563">#12563</a>). OCaml 5 (aka multicore) introduced non-contiguous stacks as part of the implementation of effects (see the PLDI 2021 paper on <a href="https://kcsrk.info/papers/retro-concurrency_pldi_21.pdf">retrofitting effect handlers – Section 5.5</a>). At a high level, these stacks are stored in memory allocated by the runtime and not on the system stack as would happen with C/C++. These non-contiguous stacks are essentially unknown to perf and do not work with its copying approach: perf copies the wrong memory, so traces produced for OCaml 5 appear truncated or contain incorrect values. The same problem occurs if you use DWARF call graphs. There perf copies a chunk of the stack at sample time and only decodes it via DWARF afterwards, and for OCaml 5 onwards what it copies might not be the stack at all. So the best solution is frame pointers.</p>
<p>We started by looking at <a href="https://github.com/ocaml/ocaml/pull/11144">#11144</a> which restored frame pointer support for AMD64 after the OCaml 5.0 release. It was clear that adding frame pointers required changes to both the backend assembly code generation and the OCaml runtime, and that the general design should follow the existing AMD64 approach. With that understanding the first step was to extend the <code>--enable-frame-pointers</code> autoconf file to allow configuring the compiler on macOS which only required a one line change of <code>[x86_64-*-linux*|x86_64-*-darwin*],</code> to recognise the new platform (more <a href="https://github.com/tmcgilchrist/ocaml/blob/c1eec79948f699f2c9d8425c61bcc29553243bf1/configure.ac#L2458-L2472">context</a>).</p>
<p>Now, that allowed configuring the compiler with <code>./configure --enable-frame-pointers</code> and since the work had already been done to codegen and runtime, we only needed to test the changes on AMD64 macOS.</p>
<p>The next platform we chose was ARM64, because it is a common platform for cloud vendors, with most offering a Linux ARM64 option, and all new Apple laptops come with an ARM64 CPU (aka Apple Silicon). Getting ARM64 working seemed like it would have the most impact, both for OCaml deployments and for local development where Apple laptops are quite common. The implementation again started with a simple change to add new Linux and macOS cases to autoconf: <code>[x86_64-*-linux*|x86_64-*-darwin*|aarch64-*-linux*|aarch64-*-darwin*],</code>. This exposes a configuration flag <code>Config.with_frame_pointers</code> that we can query when implementing the assembly code generation.</p>
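<p>The autoconf pattern is a shell-style glob over the target triple. As a rough approximation, Python’s <code>fnmatch</code> can show which targets it accepts. The pattern is copied from above; the target triples and the <code>frame_pointers_allowed</code> helper are examples I made up, and <code>fnmatch</code> is only an approximation of how a shell <code>case</code> statement matches.</p>
<pre class="python"><code>from fnmatch import fnmatchcase

# Pattern copied from the post; the target triples below are examples
# chosen for illustration, not taken from the OCaml configure script.
PATTERN = "x86_64-*-linux*|x86_64-*-darwin*|aarch64-*-linux*|aarch64-*-darwin*"

def frame_pointers_allowed(target):
    """Approximate a shell `case` match against each `|` alternative."""
    return any(fnmatchcase(target, alt) for alt in PATTERN.split("|"))

for target in ["x86_64-pc-linux-gnu", "aarch64-apple-darwin23.4.0",
               "riscv64-unknown-linux-gnu", "s390x-ibm-linux-gnu"]:
    print(target, frame_pointers_allowed(target))</code></pre>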
<p>The assembly generation code is in the <code>asmcomp</code> directory of the <a href="https://github.com/ocaml/ocaml">OCaml sources</a>. The common code is in that directory, and each supported platform has its own sub-directory, e.g. <code>asmcomp/arm64</code> for ARM64. Conceptually there were three changes: modifying the calling convention for OCaml code so that <code>x29</code> is used only as a frame pointer (previously it was treated as a general purpose register), adjusting the calculation of stack frame sizes to allow for an extra register save, and finally emitting the correct assembly to save/restore <code>x29</code>.</p>
<p>The end result is that frame pointer save and restore is implemented using <code>stp</code> and <code>ldp</code> instructions plus an extra <code>add</code> instruction for updating <code>x29</code>. This results in an extra <code>add</code> plus saving an extra register for functions allocating a stack frame. In assembly this looks like:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode asm"><code class="sourceCode fasm"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="co">;; function prologue</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>	stp	x29<span class="op">,</span> x30<span class="op">,</span> <span class="op">[</span><span class="kw">sp</span><span class="op">,</span> <span class="op">#-</span><span class="dv">16</span><span class="op">]</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>	<span class="bu">sub</span>	<span class="kw">sp</span><span class="op">,</span> <span class="kw">sp</span><span class="op">,</span> <span class="op">#</span><span class="dv">16</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a>	<span class="bu">add</span>	x29<span class="op">,</span>  <span class="kw">sp</span><span class="op">,</span> <span class="op">#</span><span class="dv">0</span></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a><span class="co">;; function epilogue</span></span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a>	<span class="bu">add</span>	<span class="kw">sp</span><span class="op">,</span> <span class="kw">sp</span><span class="op">,</span> <span class="op">#</span><span class="dv">16</span></span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a>	ldp	x29<span class="op">,</span> x30<span class="op">,</span> <span class="op">[</span><span class="kw">sp</span><span class="op">,</span> <span class="op">#-</span><span class="dv">16</span><span class="op">]</span></span></code></pre></div>
<p>Previously the store and load operated only on <code>x30</code>, the link register. The overhead is quite minimal. The compiler already maintained the <code>x29</code> register when calling into C and when using TSan. Plus there is minimal extra stack space usage, as we needed to be quad word aligned anyway, so the <code>x29</code> register is going into an already allocated space on the stack. I’m happy with how this turned out.</p>
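<p>The alignment argument is easy to check with a little arithmetic. The sketch below is a simplified model, not the compiler’s actual frame size calculation: the <code>frame_size</code> helper is hypothetical and assumes the frame holds only local slots plus the saved registers, rounded up to 16 bytes.</p>
<pre class="python"><code># Hypothetical helper showing why saving x29 is often free on ARM64:
# the stack pointer must stay 16-byte aligned, so a frame that only
# saved x30 already had 8 bytes of padding.

def frame_size(local_bytes, save_fp):
    """Round locals plus saved registers up to a multiple of 16."""
    saved = 16 if save_fp else 8      # x29 + x30, or x30 alone
    return (local_bytes + saved + 15) // 16 * 16

for local_bytes in [0, 8, 16, 24]:
    print(local_bytes, frame_size(local_bytes, False), frame_size(local_bytes, True))</code></pre>
<p>In this model the saved <code>x29</code> lands in what used to be padding whenever the locals are a multiple of 16 bytes, and costs one extra 16-byte step otherwise.</p>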
<p>It’s interesting to note that the <a href="https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms">ABI on macOS</a> requires maintaining frame pointers, and certain popular Linux distributions like <a href="https://ubuntu.com/blog/ubuntu-performance-engineering-with-frame-pointers-by-default">Ubuntu</a>, <a href="https://pagure.io/fesco/issue/2923">Fedora</a> and <a href="https://gitlab.archlinux.org/archlinux/rfcs/-/merge_requests/26">Arch</a> are reenabling frame pointers in recent distributions. So I expect the impact on usability will be high and the runtime overhead quite low.</p>
<h1 id="what-we-achieved-and-whats-left">What we achieved and what’s left?</h1>
<p>The short version is that frame pointers are now available for the two most popular Unix platforms (AMD64 and ARM64), which covers most deployments of OCaml and many OCaml developers. The work is split across these PRs:</p>
<ul>
<li><p><a href="https://github.com/ocaml/ocaml/pull/13163">#13163</a>: Enabled frame pointers on macOS x86_64. This worked with some minor autoconf changes to correctly detect macOS, plus test updates to remove Linux-specific backtrace formatting. Available in OCaml 5.3.</p></li>
<li><p><a href="https://github.com/ocaml/ocaml/pull/13500">#13500</a>: Added frame pointers support for ARM64 on Linux and macOS. This will be released in OCaml 5.4.</p></li>
<li><p><a href="https://github.com/ocaml/ocaml/pull/13595">#13595</a>: Fixed a bug introduced in #13050 where the wrong Canonical Frame Address (CFA) register was used for calls from OCaml into non-allocating C code. This appeared as incorrect backtraces in debuggers like GDB when setting a breakpoint inside the C code.</p></li>
<li><p><a href="https://github.com/ocaml/ocaml/issues/13575">#13575</a>, <a href="https://github.com/ocaml/ocaml/pull/13635">#13635</a>: Maintain OCaml frame pointers correctly even when using C libraries that do not support them, allowing mixing OCaml code with frame pointers with an operating system environment that might not have them.</p></li>
<li><p><a href="https://github.com/ocaml/ocaml/pull/13751/">#13751</a>: Proposes a new section for the OCaml manual detailing how to use frame pointers with Linux perf highlighting what works and what does not work (DWARF stack unwinding). Hopefully this will get merged soon and close the documentation gap for using perf reliably with OCaml programs. Should be available in OCaml 5.4.</p></li>
</ul>
<p>The original target of all Tier-1 platforms was an ambitious one. Unfortunately, we didn’t manage to finish everything. The work for adding RISC-V frame pointer support was started but needs further work. The work in progress is available at <a href="https://github.com/tmcgilchrist/ocaml/pull/22">tmcgilchrist/ocaml#22</a>. The work on <a href="https://github.com/tmcgilchrist/ocaml/pull/24">s390x</a> and Power didn’t get much further than diagramming stack frames and reading ABI documents. In particular the Power backend is also missing CFI support which makes debugging very difficult and really needs to be added first before tackling frame pointer support. Essentially you don’t even have a working debugger while debugging any problems with your frame pointer changes.</p>
<p>Next steps are finding time or help with finishing off the remaining platforms, starting with RISC-V. If you’d like to help please get in touch. After that comes some comprehensive benchmarking for the ARM64 platform. My intuition is that there is negligible overhead when adding frame pointers since it uses a load/store pair instruction, but firm results would be better.</p>
<p>If you’ve made it this far, thanks for reading.</p>
</div>
]]></description>
    <pubDate>Mon, 24 Feb 2025 00:00:00 UT</pubDate>
    <guid>https://lambdafoo.com/posts/2025-02-24-ocaml-frame-pointers.html</guid>
    <dc:creator>Tim McGilchrist</dc:creator>
</item>
<item>
    <title>Experimenting with OCaml and eBPF</title>
    <link>https://lambdafoo.com/posts/2025-02-15-ocaml-ebpf-usdt.html</link>
    <description><![CDATA[<div class="post">
  <h1 class="post-title">Experimenting with OCaml and eBPF</h1>
  <span class="post-date">February 15, 2025</span>
  <p>This post builds on the excellent book <em>BPF Performance Tools</em> by Brendan Gregg: how can we apply the techniques from Chapter 12, Languages, to OCaml?</p>
<p>First, OCaml is roughly equivalent to C: it’s a compiled language with a runtime written in C. It supports frame pointers using the <code>--enable-frame-pointers</code> configuration option on x86_64, with ARM64 support in OCaml 5.4. Ultimately the code we’re interested in is C, or looks roughly like C but with a different calling convention.</p>
<p>For tracing into the Linux kernel, you’ll need a distribution that is compiled with frame pointers like Ubuntu 24.04 and we can reuse the kernel’s own symbol table. There are some exceptions for inlined functions and some blacklisted functions that aren’t safe to trace. However for the pieces I’ve looked at like memory allocation and virtual memory, it is fine.</p>
<p>For the OCaml runtime written in C, it can be configured to include symbols, frame pointers and debuginfo for the portions written in C. The sections of the runtime written in assembly have symbols and frame pointers. For actual OCaml code it will have symbols and frame pointers, with limited debuginfo. Demo time!</p>
<h1 id="ocaml-function-tracing">OCaml Function Tracing</h1>
<p>We’ll use this test program, taken from bug report <a href="https://github.com/ocaml/ocaml/issues/13123">#13123</a> against OCaml.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="co">(* Build with:</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="co">  ocamlfind ocamlopt -package unix -package threads -thread -linkpkg -o liquidsoap_test.exe liquidsoap_test.ml *)</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> frame_size = <span class="fl">0.04</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> pcm_len = <span class="dt">int_of_float</span> (<span class="dv">44100</span>. *. frame_size)</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> channels = <span class="dv">2</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> mk_pcm () = <span class="dt">Array</span>.init channels (<span class="kw">fun</span> _ -&gt; <span class="dt">Array</span>.make pcm_len <span class="dv">0</span>.)</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> <span class="kw">rec</span> fn a =</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a>  <span class="kw">if</span> <span class="dt">Array</span>.length a &lt;&gt; <span class="dv">0</span> <span class="kw">then</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a>    <span class="dt">Gc</span>.full_major ();</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a>  <span class="kw">let</span> pcm = mk_pcm () <span class="kw">in</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a>  <span class="dt">ignore</span>(pcm);</span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a>  Unix.sleepf <span class="fl">0.04</span>;</span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a>  fn [||]</span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> () =</span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a>  <span class="kw">let</span> deadweigth = <span class="dt">Array</span>.make (<span class="dv">40</span> * <span class="dv">1024</span> * <span class="dv">1024</span>) <span class="dv">1</span> <span class="kw">in</span></span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a>  Unix.sleepf <span class="fl">0.04</span>;</span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a>  <span class="kw">let</span> th = Thread.create fn deadweigth <span class="kw">in</span></span>
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a>  Thread.join th</span></code></pre></div>
<p>And an opam switch created with frame pointers.</p>
<pre class="shell"><code>$ opam switch create 5.3.0 5.3.0+options ocaml-option-fp

$ ocamlfind ocamlopt -package unix -package threads -thread \
  -linkpkg -o liquidsoap_test.exe liquidsoap_test.ml</code></pre>
<p>Running this bpftrace one-liner will print the PID each time the <code>liquidsoap_test.exe</code> executable is run.</p>
<pre class="shell"><code>$ sudo bpftrace -e &#39;uprobe:/home/tsmc/ocaml-performance/liquidsoap_test.exe:caml_start_program { printf(&quot;OCaml run with process ID %d\n&quot;, pid); }&#39;</code></pre>
<p>Here we are using what we know about OCaml startup: <code>caml_start_program</code> is an assembly function that bridges the gap between the C startup code and OCaml, setting up the environment for the OCaml code. The section after <code>uprobe:</code> needs to point to the executable being run; change that if you want to trace something else.</p>
<p>Next, recall that we are dealing with a mix of regular C functions and OCaml functions. Listing the available probe points shows regular C functions prefixed with <code>caml_*</code> that are either part of the runtime or C primitives. The OCaml compiler performs name mangling, so anything coming from an OCaml source file will have a prefix <code>caml&lt;MODULE&gt;.</code> e.g. <code>camlStdlib__Domain*</code> for the <code>domain.ml</code> module from the standard library or <code>camlStdlib__Int.compare_296</code> for the compare function on Int. Armed with that knowledge, this command will list the available probe points:</p>
<pre class="shell"><code>$ sudo bpftrace -l &#39;uprobe:/home/tsmc/ocaml-performance/liquidsoap_test.exe:*&#39;</code></pre>
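<p>The naming scheme described above is regular enough to sort symbols mechanically. Here is a small Python sketch that does so. The <code>classify</code> helper and its parsing rules are my own, inferred from the example symbols in this post rather than from the compiler source, so treat it as an approximation.</p>
<pre class="python"><code># Sketch of the naming scheme described above. The parsing rules are
# inferred from the examples in this post, not from the compiler source.

def classify(symbol):
    """Return (kind, module, function) for a symbol name."""
    if symbol.startswith("caml_"):
        return ("runtime-or-primitive", None, symbol)
    if symbol.startswith("caml") and "." in symbol:
        module, _, function = symbol[len("caml"):].partition(".")
        # Strip the trailing numeric stamp, e.g. compare_296 becomes compare.
        name, _, stamp = function.rpartition("_")
        if name and stamp.isdigit():
            function = name
        return ("ocaml", module.replace("__", "."), function)
    return ("other", None, symbol)

for symbol in ["caml_start_program", "camlStdlib__Int.compare_296",
               "camlLiquidsoap_test.entry", "malloc"]:
    print(symbol, classify(symbol))</code></pre>
<p>Something like this is handy for post-processing the output of <code>bpftrace -l</code> when you only care about probes from a particular OCaml module.</p>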
<p>If we wanted to count the number of function calls in a binary, we could do it like so:</p>
<pre class="shell"><code>$ cat count.bt
// Print out the matched program
uprobe:/home/tsmc/ocaml-performance/liquidsoap_test.exe:caml_start_program
{
  printf(&quot;OCaml run with process ID %d\n&quot;, pid);
}

// Trace function calls
uprobe:/home/tsmc/ocaml-performance/liquidsoap_test.exe:camlLiquidsoap_test*
{
    @[probe] = count();
}

$ sudo bpftrace count.bt
Attaching 5 probes...
OCaml run with process ID 128477
^C

@[uprobe:/home/tsmc/ocaml-performance/liquidsoap_test.exe:camlLiquidsoap_test.entry]: 1
@[uprobe:/home/tsmc/ocaml-performance/liquidsoap_test.exe:camlLiquidsoap_test.fn_327]: 1
@[uprobe:/home/tsmc/ocaml-performance/liquidsoap_test.exe:camlLiquidsoap_test.mk_pcm_273]: 1029
@[uprobe:/home/tsmc/ocaml-performance/liquidsoap_test.exe:camlLiquidsoap_test.fun_601]: 2058</code></pre>
<p>Another thing we could do is see how much time is spent in the minor GC promotion function.</p>
<p>OCaml uses a bump-pointer allocator for the minor heap. When that is full it calls a C function to scan the minor heap, discard the garbage, and promote anything that survives into the major heap. I know that the main entry point for this is called <code>caml_empty_minor_heap_promote</code>. So this script will instrument the entry and exit for that function and print out a histogram of the time taken.</p>
<pre class="shell"><code># cat gcprofile.bt

uprobe:/home/tsmc/ocaml-performance/liquidsoap_test.exe:caml_start_program
{
  printf(&quot;Attaching to OCaml process ID %d\n&quot;, pid);
}

uprobe:/home/tsmc/ocaml-performance/liquidsoap_test.exe:caml_empty_minor_heap_promote
{
  @t = nsecs;
}

uretprobe:/home/tsmc/ocaml-performance/liquidsoap_test.exe:caml_empty_minor_heap_promote / @t /
{
  @minor_gc_times = hist(nsecs - @t);
}</code></pre>
<p>What about the major GC? The design of that is more complicated but I know <code>major_collection_slice</code> does the majority of the work, so we attach there.</p>
<pre class="shell"><code># cat gcprofile_major.bt

uprobe:/home/tsmc/ocaml-performance/liquidsoap_test.exe:caml_start_program
{
  printf(&quot;Attaching to OCaml process ID %d\n&quot;, pid);
}

uprobe:/home/tsmc/ocaml-performance/liquidsoap_test.exe:major_collection_slice
{
  @t = nsecs;
}

uretprobe:/home/tsmc/ocaml-performance/liquidsoap_test.exe:major_collection_slice / @t /
{
  @major_gc_slice_times = hist(nsecs - @t);
}</code></pre>
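<p>Both scripts aggregate with <code>hist()</code>, which groups values into power-of-two buckets. To get a feel for how timings end up grouped, here is a rough Python model of that bucketing. The sample durations are invented, and the output format is only loosely modelled on what bpftrace prints.</p>
<pre class="python"><code>from collections import Counter

# Rough Python model of a log2 histogram like the one bpftrace's hist()
# prints. The sample durations are invented for illustration.

def log2_bucket(value):
    """Return the inclusive lower bound of the power-of-two bucket."""
    if value == 0:
        return 0
    return 2 ** (value.bit_length() - 1)

def histogram(samples):
    """Count samples per bucket, ordered by bucket lower bound."""
    counts = Counter(log2_bucket(sample) for sample in samples)
    return dict(sorted(counts.items()))

durations_ns = [900, 1100, 1500, 2047, 2048, 70000]
for lower, count in histogram(durations_ns).items():
    print(f"[{lower}, {lower * 2}): {'@' * count}")</code></pre>
<p>The buckets get wider as the values grow, which is what you want for GC pauses: most slices are short, and the occasional long one still shows up in its own row instead of being averaged away.</p>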
<h1 id="take-away">Take Away</h1>
<p>OCaml programs can be traced with eBPF and bpftrace. You need to install OCaml with frame pointers enabled and use a Linux distribution like Ubuntu 24.04 that also enables frame pointers, so you can trace into system libraries. The OCaml runtime and certain primitives use a symbol prefix of <code>caml_</code> and OCaml code uses a prefix of <code>caml&lt;MODULE&gt;</code> where <code>&lt;MODULE&gt;</code> is the OCaml module containing the code. This partially covers the functionality in <a href="https://ocaml.org/manual/5.3/profil.html">ocamlprof</a> which lets you profile function counts and branches taken in things like while, if and try. With eBPF we can count the function calls, but more work needs to be done to support the branching constructs. Essentially we need a USDT implementation for OCaml that understands OCaml’s name mangling and calling conventions. The upside is eBPF can be applied to any OCaml binary without needing a recompile.</p>
<p>Next steps are adding USDT probes to the OCaml runtime, so there is a static API for the GC, and after that expose USDT probe points from OCaml programs.</p>
</div>
]]></description>
    <pubDate>Sat, 15 Feb 2025 00:00:00 UT</pubDate>
    <guid>https://lambdafoo.com/posts/2025-02-15-ocaml-ebpf-usdt.html</guid>
    <dc:creator>Tim McGilchrist</dc:creator>
</item>
<item>
    <title>Building OCaml from assembly</title>
    <link>https://lambdafoo.com/posts/2024-08-30-building-ocaml-from-assembly.html</link>
    <description><![CDATA[<div class="post">
  <h1 class="post-title">Building OCaml from assembly</h1>
  <span class="post-date">August 30, 2024</span>
  <p>At work I’ve been focusing on improving the debugging experience with OCaml.
As part of that I’ve discovered how some of the pieces fit together, that might
be obvious in retrospect, but are interesting to at least me so I’m going to
post details about them here.</p>
<p>The first nugget is you can hand compile an OCaml program into a final executable.
What do I mean? You can ask the OCaml compiler to output all the assembly generated
that goes into a library or executable. Then take that and call the assembler yourself
to build it. First let’s review how the compiler works.</p>
<h2 id="compilation-pipeline">Compilation Pipeline</h2>
<p>Here is a <em>grossly</em> simplified overview of the OCaml compiler. We feed in OCaml source code
in the form of ml/mli files, which flow through each stage and eventually end up
being emitted as either object files or textual assembly files. The first step from
OCaml Source to Parse Tree uses <a href="https://gallium.inria.fr/~fpottier/menhir/">menhir</a> to parse
and generate an untyped <a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">AST</a> representing
the code in the source file. This is then type
checked into a typed tree, this is where the type theory happens. After that, there are some stages
where the typed tree is transformed into representations more suitable for generating assembly.
The final stage traverses the CMM/Linear AST generating assembly code for a specific
family of CPUs (like x86_64 or ARM64).</p>
<pre><code>                                      
 ┌──────────────┐   ┌──────────────┐  
 │ OCaml Source │   │  Parse Tree  │  
 │              ┼───►              │  
 └──────────────┘   └──────┬───────┘  
                           │          
 ┌──────────────┐   ┌──────▼───────┐  
 │    Lambda    │   │  Typed Tree  │  
 │              ◄───┼              │  
 └──────┬───────┘   └──────────────┘  
        │                             
 ┌──────▼───────┐   ┌──────────────┐  
 │  CMM/Linear  │   │    Emit      │  
 │              ┼───►   Assembly   │  
 └──────────────┘   └──────────────┘  
                                      </code></pre>
<p>Finally, this assembly is compiled by the system C compiler to produce object files or
executables to be run. So we could treat the OCaml compiler as a <em>fancy</em> way to
just generate assembly files, which we can then mess with to do things like add <a href="https://dwarfstd.org">DWARF
information</a> or optimise assembly routines, or just for pure fun.</p>
<h2 id="ocaml-source">OCaml source</h2>
<p>We start with an OCaml program taken from <a href="https://doi.org/10.1145/3453483.3454039">Retrofitting Effect Handlers onto OCaml</a>. This program doesn’t compute anything interesting, but it does show how OCaml’s FFI to C works and how to pass control between the two, which is what makes it interesting here.</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>$ cat meander.ml</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="kw">external</span> ocaml_to_c</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>         : <span class="dt">unit</span> -&gt; <span class="dt">int</span> = <span class="st">&quot;ocaml_to_c&quot;</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="kw">exception</span> E1</span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a><span class="kw">exception</span> E2</span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> c_to_ocaml () = <span class="dt">raise</span> E1</span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> _ = <span class="dt">Callback</span>.register</span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a>          <span class="st">&quot;c_to_ocaml&quot;</span> c_to_ocaml</span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> omain () =</span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a>  <span class="kw">try</span> <span class="co">(* h1 *)</span></span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a>    <span class="kw">try</span> <span class="co">(* h2 *)</span> ocaml_to_c ()</span>
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a>    <span class="kw">with</span> E2 -&gt; <span class="dv">0</span></span>
<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a>  <span class="kw">with</span> E1 -&gt; <span class="dv">42</span></span>
<span id="cb2-14"><a href="#cb2-14" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> _ = <span class="kw">assert</span> (omain () = <span class="dv">42</span>)</span>
<span id="cb2-15"><a href="#cb2-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-16"><a href="#cb2-16" aria-hidden="true" tabindex="-1"></a>$ cat meander_c.c</span>
<span id="cb2-17"><a href="#cb2-17" aria-hidden="true" tabindex="-1"></a><span class="ot">#include &lt;caml/mlvalues.h&gt;</span></span>
<span id="cb2-18"><a href="#cb2-18" aria-hidden="true" tabindex="-1"></a><span class="ot">#include &lt;caml/callback.h&gt;</span></span>
<span id="cb2-19"><a href="#cb2-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-20"><a href="#cb2-20" aria-hidden="true" tabindex="-1"></a>value ocaml_to_c (value <span class="dt">unit</span>) {</span>
<span id="cb2-21"><a href="#cb2-21" aria-hidden="true" tabindex="-1"></a>    caml_callback<span class="co">(*caml_named_value</span></span>
<span id="cb2-22"><a href="#cb2-22" aria-hidden="true" tabindex="-1"></a><span class="co">                  (&quot;c_to_ocaml&quot;), Val_unit);</span></span>
<span id="cb2-23"><a href="#cb2-23" aria-hidden="true" tabindex="-1"></a><span class="co">    return Val_int(0);</span></span>
<span id="cb2-24"><a href="#cb2-24" aria-hidden="true" tabindex="-1"></a><span class="co">}</span></span></code></pre></div>
<p>Reading from the bottom of the file, <code>meander.ml</code> asserts that the function <code>omain</code>
returns the value <code>42</code>. It gets that value by calling <code>ocaml_to_c</code> which is actually an
external C function defined in <code>meander_c.c</code>, imported into OCaml using
<code>external</code> in the first line of <code>meander.ml</code>. The C function calls back into
OCaml using <code>caml_callback</code> which executes the <code>c_to_ocaml</code> function. An exception is
raised, unwinding everything back to <code>omain</code> with its try/with blocks.</p>
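<p>The same control flow can be sketched in pure OCaml, which makes it easier to see why the result is <code>42</code>. This is only an illustration: <code>ocaml_to_c_stub</code> is a hypothetical stand-in for the C function and simply calls <code>c_to_ocaml</code> directly instead of going through <code>caml_callback</code>.</p>
<pre class="ocaml"><code>exception E1
exception E2

(* Same body as in meander.ml, annotated so it can be used as a statement. *)
let c_to_ocaml () : unit = raise E1

(* Hypothetical stand-in for the C stub ocaml_to_c: call back, then return 0. *)
let ocaml_to_c_stub () : int =
  c_to_ocaml ();
  0

let omain () =
  try (* h1 *)
    try (* h2 *) ocaml_to_c_stub ()
    with E2 -&gt; 0
  with E1 -&gt; 42

(* E1 is not handled by h2, so it propagates to h1. *)
let () = assert (omain () = 42)</code></pre>
<p>The inner handler only matches <code>E2</code>, so <code>E1</code> passes straight through it and the stub never gets to return its <code>0</code>.</p>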
<p>To compile this program we use the OCaml 5.2 compiler.</p>
<pre class="shell"><code>$ ocamlopt --version
5.2.0
$ ocamlopt meander_c.c meander.ml -o meander.exe
$ ./meander.exe
$ echo $?
0</code></pre>
<p>Running the program under macOS gives a successful exit code, so it must have got
<code>42</code> and the assertion passed. Try changing the value 42 to something else to check.</p>
<p>Next we will pull apart what the compiler is doing to generate the final
executable. Run <code>ocamlopt</code> with these flags:</p>
<pre class="shell"><code> $ ocamlopt meander_c.c meander.ml -o meander.exe -S -g -dstartup -verbose

+ cc  -O2 -fno-strict-aliasing -fwrapv -pthread -pthread  -D_FILE_OFFSET_BITS=64 -c -g -I&#39;/Users/tsmc/code/ocaml/owee/_opam/lib/ocaml&#39; &#39;meander_c.c&#39;
+ cc -c -Wno-trigraphs  -o &#39;meander.o&#39; &#39;meander.s&#39;
+ cc -c -Wno-trigraphs  -o &#39;/var/folders/z_/7yzlrkjn6pd441zs1qhzpjv00000gn/T/camlstartup9b503b.o&#39; &#39;meander.exe.startup.s&#39;
+ cc -O2 -fno-strict-aliasing -fwrapv -pthread  -pthread   -o &#39;meander.exe&#39;  &#39;-L/Users/tsmc/code/ocaml/owee/_opam/lib/ocaml&#39;  &#39;/var/folders/z_/7yzlrkjn6pd441zs1qhzpjv00000gn/T/camlstartup9b503b.o&#39; &#39;/Users/tsmc/code/ocaml/owee/_opam/lib/ocaml/std_exit.o&#39; &#39;meander.o&#39; &#39;/Users/tsmc/code/ocaml/owee/_opam/lib/ocaml/stdlib.a&#39; &#39;meander_c.o&#39; &#39;/Users/tsmc/code/ocaml/owee/_opam/lib/ocaml/libasmrun.a&#39;     -lpthread</code></pre>
<p>Focusing on the <code>ocamlopt</code> command, the flag <code>-S</code> asks the compiler to generate the assembly
files for the OCaml source, <code>-g</code> asks for debug information to be included, <code>-dstartup</code>
generates the startup file that bridges between the C startup and OCaml (more on that later)
and <code>-verbose</code> tells <code>ocamlopt</code> to print out what commands it’s running.</p>
<p>So, what has been printed out? The first line is compiling the <code>meander_c.c</code> file into
an object file, the <code>meander_c.o</code> file in the current directory. Then we have a <code>meander.s</code>
file being compiled (assembled) into another object file. This is the output of compiling
the <code>meander.ml</code> OCaml source into assembly. The <code>-verbose</code> option doesn’t show how that
file gets created. The third line is compiling the startup file from <code>meander.exe.startup.s</code>
into another object file. The final step is calling the linker via <code>cc</code> to generate the final
<code>meander.exe</code> file. You can see all the object files from previous steps plus the OCaml stdlib
<code>_opam/lib/ocaml/stdlib.a</code> and <code>_opam/lib/ocaml/std_exit.o</code> from the local opam switch
plus the OCaml libraries being added to the search path as
<code>-L/Users/tsmc/code/ocaml/owee/_opam/lib/ocaml</code>. It is not that dissimilar to building a
C program.</p>
<p>What about those assembly files? <code>meander.s</code> is our ARM64 assembly file for <code>meander.ml</code>.
Open it up and search for <code>entry</code>. If you’re on Linux or another architecture like x86_64
the assembly will be different but the names will be the same. This is the entry point
called when executing the program: the OCaml runtime jumps to the symbol <code>_camlMeander.entry</code>.</p>
<pre class="assembly"><code>	.globl	_camlMeander.entry
L114:
	mov	x16, #34
	stp	x16, x30, [sp, #-16]!
	bl	_caml_call_realloc_stack
	ldp	x16, x30, [sp], #16
_camlMeander.entry:
	.cfi_startproc
	ldr	x16, [x28, #40]
	add	x16, x16, #328
	cmp	sp, x16
	bcc	L114
	sub	sp, sp, #16
	.cfi_adjust_cfa_offset	16
	.cfi_offset 30, -8
	str	x30, [sp, #8]</code></pre>
<p>Search for other symbols like <code>omain</code> and <code>c_to_ocaml</code>:</p>
<pre class="assembly"><code>	.globl	_camlMeander.omain_278
_camlMeander.omain_278:
	.loc	1	8
	.cfi_startproc
	sub	sp, sp, #16
	.cfi_adjust_cfa_offset	16
	.cfi_offset 30, -8
	str	x30, [sp, #8]
....
_camlMeander.c_to_ocaml_273:
	.file	1	&quot;meander.ml&quot;
	.loc	1	5
	.cfi_startproc
	sub	sp, sp, #16
	.cfi_adjust_cfa_offset	16
	.cfi_offset 30, -8
	str	x30, [sp, #8]</code></pre>
<p>All the code is there, we just need to assemble it. On my machine (macOS ARM64) running this
command will give me an executable <code>meander.exe</code> without even using <code>ocamlopt</code>.</p>
<pre class="shell"><code>$ gcc -O2 -fno-strict-aliasing -fwrapv -pthread -D_FILE_OFFSET_BITS=64 \
      -c -g -I&#39;/Users/tsmc/code/ocaml/owee/_opam/lib/ocaml&#39; &#39;meander_c.c&#39;
$ gcc -c -Wno-trigraphs -o &#39;meander.o&#39; &#39;meander.s&#39;
$ gcc -c -Wno-trigraphs -o meanderCamlStartup.o meander.exe.startup.s
$ gcc -o &#39;meander.exe&#39; &#39;-L/Users/tsmc/code/ocaml/owee/_opam/lib/ocaml&#39; &#39;meanderCamlStartup.o&#39; \
       &#39;/Users/tsmc/code/ocaml/owee/_opam/lib/ocaml/std_exit.o&#39; &#39;meander.o&#39; \
       &#39;/Users/tsmc/code/ocaml/owee/_opam/lib/ocaml/stdlib.a&#39; &#39;meander_c.o&#39; \
       &#39;/Users/tsmc/code/ocaml/owee/_opam/lib/ocaml/libasmrun.a&#39; -lpthread</code></pre>
<p>Try it out. You’ll need to change <code>/Users/tsmc/code/ocaml/owee/_opam</code> to the directory of your own
local opam switch for OCaml 5.2.</p>
<h2 id="startup-file">Startup file</h2>
<p>What about that startup file, <code>meander.exe.startup.s</code>? What is it for?
Open the file and search for <code>_caml_program</code>. This is the entry point called by the
startup code written in C.</p>
<pre class="shell"><code>_caml_program:
	.cfi_startproc
	ldr	x16, [x28, #40]
	add	x16, x16, #328
	cmp	sp, x16
	bcc	L136
	sub	sp, sp, #16
	.cfi_adjust_cfa_offset	16
	.cfi_offset 30, -8
	str	x30, [sp, #8]
L135:
	bl	_camlCamlinternalFormatBasics$entry
L137:
	adrp	x0, _caml_globals_inited@GOTPAGE
	ldr	x0, [x0, _caml_globals_inited@GOTPAGEOFF]
	ldr	x2, [x0, #0]
	add	x3, x2, #1
	dmb	ishld
	str	x3, [x0, #0]
	bl	_camlStdlib$entry</code></pre>
<p>The code is responsible for calling the <code>entry</code> initialisation function for all
imported modules. In <code>meander.ml</code> we only include a couple of functions from the
standard library so we have <code>_camlStdlib$entry</code>, <code>_camlStdlib__Sys$entry</code> etc then
we finally call <code>_camlMeander$entry</code> which we saw earlier.</p>
<p>We need this assembly file to generate an object file for linking into the final executable.
Without it the linker won’t have the <code>_caml_program</code> symbol available and none of the OCaml Stdlib will
be initialised. A fun exercise is to re-write this file to not call all those <code>entry</code> functions
but still provide <code>_caml_program</code> and call into <code>_camlMeander$entry</code>.</p>
<p>I made a small <a href="https://github.com/ocaml/ocaml/pull/13217">PR #13217</a> to improve this behaviour
by looping over a table of functions to call rather than generating large slabs of identical code.</p>
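<p>As a rough model of the table idea (not the code from the PR), the startup sequence can be pictured in OCaml as a loop over a table of initialisers. Everything here is hypothetical: <code>entry</code>, <code>entry_table</code> and <code>initialised</code> are names made up for the sketch, and the real entry functions are assembly symbols rather than OCaml closures.</p>
<pre class="ocaml"><code>(* Record the order in which the modelled initialisers run. *)
let initialised : string list ref = ref []

(* Hypothetical stand-in for a module's entry function. *)
let entry name () = initialised := name :: !initialised

(* Dependencies come first, the program's own module last. *)
let entry_table =
  [| entry &quot;CamlinternalFormatBasics&quot;;
     entry &quot;Stdlib&quot;;
     entry &quot;Stdlib__Sys&quot;;
     entry &quot;Meander&quot; |]

(* One loop over the table replaces a long run of repeated call sequences. *)
let caml_program () = Array.iter (fun init -&gt; init ()) entry_table

let () =
  caml_program ();
  assert (List.rev !initialised
          = [ &quot;CamlinternalFormatBasics&quot;; &quot;Stdlib&quot;; &quot;Stdlib__Sys&quot;; &quot;Meander&quot; ])</code></pre>
<p>The point of the model is only the shape: the list of modules becomes data, and the code that walks it stays the same size however many modules are linked.</p>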
<h2 id="bonus">Bonus</h2>
<p>Now you don’t need the OCaml compiler to write OCaml.</p>
<p>But seriously the purpose for discovering this was to investigate adding DWARF debugging
information to OCaml on macOS. That’s a different topic for next time.</p>
</div>
]]></description>
    <pubDate>Fri, 30 Aug 2024 00:00:00 UT</pubDate>
    <guid>https://lambdafoo.com/posts/2024-08-30-building-ocaml-from-assembly.html</guid>
    <dc:creator>Tim McGilchrist</dc:creator>
</item>
<item>
    <title>Getting Started with LLDB on OCaml</title>
    <link>https://lambdafoo.com/posts/2024-08-03-lldb-ocaml.html</link>
    <description><![CDATA[<div class="post">
  <h1 class="post-title">Getting Started with LLDB on OCaml</h1>
  <span class="post-date">August  3, 2024</span>
  <p>This post is a companion to KC’s excellent <a href="https://kcsrk.info/ocaml/gdb/2024/01/20/gdb-ocaml/">Getting Started with GDB on OCaml</a> that shows how to debug OCaml programs with GDB. I wanted to demonstrate the same functionality using LLDB on Linux ARM64. The aim is to show the beginnings of debugging OCaml programs with LLDB and highlight a few LLDB tricks I’ve found.</p>
<p>We will start with the same program:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="co">(* fib.ml *)</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> <span class="kw">rec</span> fib n =</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>  <span class="kw">if</span> n = <span class="dv">0</span> <span class="kw">then</span> <span class="dv">0</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>  <span class="kw">else</span> <span class="kw">if</span> n = <span class="dv">1</span> <span class="kw">then</span> <span class="dv">1</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a>  <span class="kw">else</span> fib (n<span class="dv">-1</span>) + fib (n<span class="dv">-2</span>)</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> main () =</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a>  <span class="kw">let</span> r = fib <span class="dv">20</span> <span class="kw">in</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a>  <span class="dt">Printf</span>.printf <span class="st">&quot;fib(20) = %d</span><span class="ch">\n</span><span class="st">&quot;</span> r</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> _ = main ()</span></code></pre></div>
<p>Compiled with OCaml version 5.2.0.</p>
<pre class="shell"><code>$ ocamlopt --version
5.2.0
$ ocamlopt -g -o fib.exe fib.ml
$ ./fib.exe 20
fib(20) = 6765</code></pre>
<p>The program prints the 20th Fibonacci number, nothing special but interesting because it has recursion. Now start up an lldb session.</p>
<pre class="shell"><code>$ lldb ./fib.exe</code></pre>
<h2 id="setting-breakpoints">Setting breakpoints</h2>
<p>We want to set breakpoints in the <code>fib</code> function. The first way to set breakpoints is based on OCaml function names. Due to a process called name mangling, they look slightly different in the executable. Since we don’t know the exact names we can use tab completion to help us.</p>
<pre class="shell"><code>(lldb) br s -n camlFib.fib_ # press tab to show the possible matches
(lldb) br s -n camlFib.fib_270 # There is only one matching ending 270
Breakpoint 1: where = fib.exe`camlFib.fib_270 + 76, address = 0x0000000000051084</code></pre>
<p>You can also set breakpoints using lldb’s file name and line number combination. This time we will set a breakpoint in the <code>main</code> function by asking for line 6 in <code>fib.ml</code>; that line is blank, so lldb places the breakpoint at the start of <code>main</code> on line 7.</p>
<pre class="shell"><code>(lldb) br s -f fib.ml -l 6
Breakpoint 2: where = fib.exe`camlFib.main_271, address = 0x0000000000050f48
(lldb)</code></pre>
<p>Now we can run the program.</p>
<pre class="shell"><code>Breakpoint 2: where = fib.exe`camlFib.main_272, address = 0x00000000000510c8
(lldb) run
Process 11987 launched: &#39;/home/tsmc/projects/fib.exe&#39; (aarch64)
Process 11987 stopped
* thread #1, name = &#39;fib.exe&#39;, stop reason = breakpoint 2.1
    frame #0: 0x0000aaaaaaaf10c8 fib.exe`camlFib.main_272 at fib.ml:7
   4   	  else if n = 1 then 1
   5   	  else fib (n-1) + fib (n-2)
   6   	
-&gt; 7   	let main () =
   8   	  let r = fib 20 in
   9   	  Printf.printf &quot;fib(20) = %d\n&quot; r
   10  	</code></pre>
<p>The program execution starts in the lldb session and we stop at the breakpoint at <code>main</code>. LLDB has a terminal UI mode for stepping through the file. It can be started by typing <code>gui</code> at the <code>lldb</code> prompt, and should look similar to this.</p>
<figure>
<img src="../images/lldb-aarch64-fib.png" alt="Terminal image of lldb running fib.ml showing gui" />
<figcaption aria-hidden="true">Terminal image of lldb running fib.ml showing gui</figcaption>
</figure>
<p>Note that we can see both breakpoints highlighted on the line numbers, the backtrace of how we got here and the current line is highlighted. Use <code>Esc</code> to exit the terminal UI mode and go back to the lldb prompt. We will use the lldb prompt for the rest of the post.</p>
<h2 id="examining-the-stack">Examining the stack</h2>
<p>You can step through the OCaml program with lldb commands <code>n</code> and <code>s</code>. After a few <code>n</code>’s, examine the backtrace using the <code>bt</code> command.</p>
<pre class="shell"><code>(lldb) bt
* thread #1, name = &#39;fib.exe&#39;, stop reason = breakpoint 1.1
  * frame #0: 0x0000aaaaaaaf1084 fib.exe`camlFib.fib_270 at fib.ml:5
    frame #1: 0x0000aaaaaaaf108c fib.exe`camlFib.fib_270 at fib.ml:5
    frame #2: 0x0000aaaaaaaf108c fib.exe`camlFib.fib_270 at fib.ml:5
    frame #3: 0x0000aaaaaaaf108c fib.exe`camlFib.fib_270 at fib.ml:5
    frame #4: 0x0000aaaaaaaf10f4 fib.exe`camlFib.main_272 at fib.ml:8
    frame #5: 0x0000aaaaaaaf11bc fib.exe`camlFib.entry at fib.ml:11
    frame #6: 0x0000aaaaaaaee684 fib.exe`caml_program + 476
    frame #7: 0x0000aaaaaab46b48 fib.exe`caml_start_program + 132
    frame #8: 0x0000aaaaaab46640 fib.exe`caml_main [inlined] caml_startup(argv=&lt;unavailable&gt;) at startup_nat.c:145:7
    frame #9: 0x0000aaaaaab4663c fib.exe`caml_main(argv=&lt;unavailable&gt;) at startup_nat.c:151:3
    frame #10: 0x0000aaaaaaaee310 fib.exe`main(argc=&lt;unavailable&gt;, argv=&lt;unavailable&gt;) at main.c:37:3
    frame #11: 0x0000fffff7d784c4 libc.so.6`__libc_start_call_main(main=(fib.exe`main at main.c:31:1), argc=1, argv=0x0000fffffffffb58) at libc_start_call_main.h:58:16
    frame #12: 0x0000fffff7d78598 libc.so.6`__libc_start_main_impl(main=0x0000aaaaaaba0de0, argc=16, argv=0x000000000000000f, init=&lt;unavailable&gt;, fini=&lt;unavailable&gt;, rtld_fini=&lt;unavailable&gt;, stack_end=&lt;unavailable&gt;) at libc-start.c:360:3
    frame #13: 0x0000aaaaaaaee3b0 fib.exe`_start + 48</code></pre>
<p>You can see the backtrace includes the recursive calls to the <code>fib</code> function, the <code>main</code> function in <code>fib.ml</code>, followed by some assembly functions and a number of functions from the OCaml runtime. In between frame #8 and #5 is where the runtime, written in C, switches into assembly to set up the environment to execute the OCaml program. Then we actually enter the OCaml program at frame #5 via <code>camlFib.entry</code>. This function calls initialisation functions for the program and any dependencies like Stdlib that get used.</p>
<h2 id="examining-values">Examining values</h2>
<p>The support for examining OCaml values in LLDB, as you would for say C, is a bit lacking. Not enough information is being emitted by the OCaml compiler to do this yet. So we need to understand how OCaml represents values at runtime and what the OCaml calling conventions are. First we will look at examining values.</p>
<p>Here we are on ARM64 so our registers are named <code>x0-x30</code> with <code>sp</code> representing the stack pointer.
The first <a href="https://github.com/ocaml/ocaml/blob/5.2.0/asmcomp/arm64/proc.ml#L168-L172">16 arguments are passed in registers</a>, starting from register x0. So the argument to the <code>fib</code> function should be in the <code>x0</code> register. We also know that the argument to fib is an integer. OCaml uses 63-bit tagged integers (on 64-bit machines) with the least-significant bit set to 1. Given a machine word or a register holding an OCaml integer, the integer value is obtained by right shifting the value by 1.</p>
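<p>The encoding can be sketched in OCaml itself by modelling the machine word as an <code>Int64</code>. The names <code>tag</code>, <code>untag</code> and <code>is_immediate</code> are made up for this illustration and are not part of the OCaml runtime.</p>
<pre class="ocaml"><code>(* Encode an OCaml int as the 64-bit word held in a register: 2n + 1. *)
let tag (n : int) : int64 =
  Int64.logor (Int64.shift_left (Int64.of_int n) 1) 1L

(* Immediate integers have the least-significant bit set. *)
let is_immediate (w : int64) : bool = Int64.logand w 1L = 1L

(* Decode by shifting right by one; the arithmetic shift keeps the sign. *)
let untag (w : int64) : int = Int64.to_int (Int64.shift_right w 1)

let () =
  assert (tag 20 = 41L);
  assert (untag 41L = 20);
  assert (untag 11L = 5);
  assert (is_immediate (tag (-3)));
  assert (untag (tag (-3)) = -3)</code></pre>
<p>So a register holding the raw word 11 corresponds to the OCaml integer 5, and a raw 41 corresponds to 20.</p>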
<p>Putting that all together, we can examine the arguments to <code>fib</code> at the breakpoint in <code>fib</code> like so.</p>
<pre class="shell"><code>(lldb) p $x0 &gt;&gt; 1
(unsigned long) 5</code></pre>
<p>Given we have a recursive fib function this printing corresponds to <code>fib(5)</code>. Have a go at moving up and down the recursive fib calls using <code>up</code> or <code>down</code> and print out the arguments. You can also examine the evaluation order of arguments in <code>fib</code>, noting that the evaluation order of arguments in OCaml is unspecified but 5.2.0 evaluates right-to-left.</p>
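<p>The evaluation order can also be observed from the OCaml side with a small experiment. This is a sketch with made-up names (<code>note</code>, <code>order</code>); since the order is unspecified, the program only reports what it saw rather than assuming a result.</p>
<pre class="ocaml"><code>(* Record the order in which the two operands are evaluated. *)
let order : string list ref = ref []

(* Log the label as a side effect, then return the value unchanged. *)
let note label v =
  order := label :: !order;
  v

(* Same shape as fib (n-1) + fib (n-2): two operands of one addition. *)
let sum = note &quot;left&quot; 1 + note &quot;right&quot; 2

let () =
  Printf.printf &quot;sum = %d, evaluated: %s\n&quot; sum
    (String.concat &quot; then &quot; (List.rev !order))</code></pre>
<p>Compile it with the same <code>ocamlopt</code> used for <code>fib.ml</code> and compare the printed order with what you see when stepping through the recursive calls in lldb.</p>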
<h2 id="advanced-printing">Advanced printing</h2>
<p>Examining values using bit shifting is tedious. We can do better by writing our own printing functions in Python. The OCaml compiler distribution comes with some scripts to make examining OCaml values in LLDB easier. Note they have historically been used by OCaml maintainers to develop the compiler, so they might be a little rough or missing features (PRs to improve this situation are welcome). With that, let’s see what we can do.</p>
<p>Since we are using OCaml 5.2.0, we need to get that source code.</p>
<pre class="shell"><code># I&#39;m working within ~/projects directory on my machine
$ git clone https://github.com/ocaml/ocaml --branch 5.2.0</code></pre>
<p>Start up a new lldb session, load the lldb script, and get to a breakpoint in the recursive fib calls.</p>
<pre class="shell"><code>lldb ./fib.exe
(lldb) command script import ../ocaml/tools/lldb.py
(lldb) br s -f fib.ml -l 1
Process 12014 launched: &#39;/home/tsmc/projects/fib.exe&#39; (aarch64)
Process 12014 stopped
* thread #1, name = &#39;fib.exe&#39;, stop reason = breakpoint 4.1
    frame #0: 0x0000aaaaaaaf1038 fib.exe`camlFib.fib_270 at fib.ml:2
   1   	(* fib.ml *)
-&gt; 2   	let rec fib n =
   3   	  if n = 0 then 0
   4   	  else if n = 1 then 1
   5   	  else fib (n-1) + fib (n-2)
   6   	
   7   	let main () =
</code></pre>
<p>As earlier, the first argument is in the <code>x0</code> register. We can now examine the value with the Python script.</p>
<pre class="shell"><code>(lldb) p (value)$x0
(value) 41 caml:20</code></pre>
<p><code>value</code> is the type of OCaml values defined in the OCaml runtime. The script <code>tools/lldb.py</code> installs a pretty printer for values of type <code>value</code>. Here it pretty-prints the first argument, which is <code>20</code>.</p>
<p>We can also print other kinds of OCaml values. Create this file with some interesting OCaml values:</p>
<pre class="shell"><code>$ cat test_blocks.ml
(* test_blocks.ml *)

type t = {s : string; i : int}

let main a b =
  print_endline &quot;Hello, world!&quot;;
  print_endline a;
  print_endline b.s

let _ = main &quot;foo&quot; {s = &quot;bar&quot;; i = 42}</code></pre>
<p>Now we need to compile it, start an lldb session and break on the main function.</p>
<pre class="shell"><code>$ ocamlopt -g -o test_blocks.exe test_blocks.ml
$ lldb ./test_blocks.exe
(lldb) target create &quot;./test_blocks.exe&quot;
Current executable set to &#39;/home/tsmc/projects/test_blocks.exe&#39; (aarch64).
(lldb) command script import ../ocaml/tools/lldb.py
OCaml support module loaded. Values of type &#39;value&#39; will now
print as OCaml values, and an &#39;ocaml&#39; command is available for
heap exploration (see &#39;help ocaml&#39; for more information).
(lldb) br s -n camlTest_blocks.main_273
Breakpoint 1: where = test_blocks.exe`camlTest_blocks.main_273 + 40, address = 0x0000000000019ab0
(lldb) run
Process 12043 launched: &#39;/home/tsmc/projects/test_blocks.exe&#39; (aarch64)
Process 12043 stopped
* thread #1, name = &#39;test_blocks.exe&#39;, stop reason = breakpoint 1.1
    frame #0: 0x0000aaaaaaab9ab0 test_blocks.exe`camlTest_blocks.main_273 at test_blocks.ml:4
   1   	type t = {s : string; i : int}
   2   	
   3   	let main a b =
-&gt; 4   	  print_endline &quot;Hello, world!&quot;;
   5   	  print_endline a;
   6   	  print_endline b.s
   7   	
(lldb)</code></pre>
<p>Let’s examine the two arguments to main:</p>
<pre class="shell"><code>(lldb) p (value)$x0
(value) 187649984891864 caml(-):&#39;Hello, world!&#39;&lt;13&gt;
(lldb) p (value)$x1
(value) 187649984891808 caml(-):(&#39;bar&#39;, 42)</code></pre>
<p>What is going on here? Didn’t we say the first argument is in <code>x0</code>? Our breakpoint has been set a little after the function entry, by which point the original value of <code>x0</code> has been stored on the stack and the <code>x0</code> register has been reused to hold the argument to <code>print_endline "Hello, world!";</code>. The second argument in <code>x1</code> is as expected.</p>
<p>To find the original <code>x0</code> value we need to look at assembly (don’t worry too much about the specifics of ARM assembly).</p>
<pre class="shell"><code>(lldb) dis
test_blocks.exe`camlTest_blocks.main_273:
    0xaaaaaaab9a88 &lt;+0&gt;:  ldr    x16, [x28, #0x28]
    0xaaaaaaab9a8c &lt;+4&gt;:  add    x16, x16, #0x158
    0xaaaaaaab9a90 &lt;+8&gt;:  cmp    sp, x16
    0xaaaaaaab9a94 &lt;+12&gt;: b.lo   0xaaaaaaab9a78 ; camlStd_exit.code_end
    0xaaaaaaab9a98 &lt;+16&gt;: sub    sp, sp, #0x20
    0xaaaaaaab9a9c &lt;+20&gt;: str    x30, [sp, #0x18]
    0xaaaaaaab9aa0 &lt;+24&gt;: str    x0, [sp]
    0xaaaaaaab9aa4 &lt;+28&gt;: str    x1, [sp, #0x8]
(lldb) reg r sp
      sp = 0x0000aaaaaab3d160
(lldb) memory read -s8 -fx -l2 0x0000aaaaaab3d160
0xaaaaaab3d160: 0x0000aaaaaab10bc8 0x0000aaaaaab10ba0
0xaaaaaab3d170: 0x0000fffffffff8e0 0x0000aaaaaaab9b38
0xaaaaaab3d180: 0x0000000000000000 0x0000aaaaaaab94fc
0xaaaaaab3d190: 0x0000000000000000 0x0000aaaaaaae05c8
(lldb) p (value)0x0000aaaaaab10bc8
(value) 187649984891848 caml(-):&#39;foo&#39;&lt;3&gt;</code></pre>
<p>The disassembled code is the function prologue code, which is saving <code>x0</code> onto the stack using <code>str x0, [sp]</code>. To get the original value for <code>x0</code> we read sp (Stack Pointer), retrieve the data at that address and then print it using <code>value</code>. We get back to our argument passed to main, which was <code>foo</code> and can confirm that by looking at the source code.</p>
<h2 id="extras">Extras</h2>
<p>A few useful extras for debugging OCaml programs.</p>
<p>You can set breakpoints based on addresses, this is useful when you know a specific instruction you want to break on. From the previous session, set a breakpoint on the <code>sub sp, sp, #0x20</code> address.</p>
<pre class="shell"><code>(lldb) br s -a 0xaaaaaaab9a98
Breakpoint 7: where = test_blocks.exe`camlTest_blocks.main_273 + 16, address = 0x0000aaaaaaab9a98
(lldb) run
There is a running process, kill it and restart?: [Y/n] y
Process 12070 exited with status = 9 (0x00000009) killed
Process 12078 launched: &#39;/home/tsmc/projects/test_blocks.exe&#39; (aarch64)
Process 12078 stopped
* thread #1, name = &#39;test_blocks.exe&#39;, stop reason = breakpoint 7.1
    frame #0: 0x0000aaaaaaab9a98 test_blocks.exe`camlTest_blocks.main_273 at test_blocks.ml:3
   1   	type t = {s : string; i : int}
   2   	
-&gt; 3   	let main a b =
   4   	  print_endline &quot;Hello, world!&quot;;
   5   	  print_endline a;
   6   	  print_endline b.s
   7   	
(lldb) p (value)$x0
(value) 187649984891848 caml(-):&#39;foo&#39;&lt;3&gt;</code></pre>
<p>Now we can print out the value of <code>x0</code> before it gets saved on the stack.</p>
<p>We can also look up symbols in the executable using <code>image lookup -r -n &lt;symbol_name&gt;</code> if we are not sure of the specific name we want.</p>
<pre class="shell"><code>(lldb) image lookup -r -n camlTest
4 matches found in /home/tsmc/projects/test_blocks.exe:
        Address: test_blocks.exe[0x0000000000019a78] (test_blocks.exe.PT_LOAD[0]..text + 1912)
        Summary: test_blocks.exe`camlStd_exit.code_end
        Address: test_blocks.exe[0x0000000000019b48] (test_blocks.exe.PT_LOAD[0]..text + 2120)
        Summary: test_blocks.exe`camlTest_blocks.code_end
        Address: test_blocks.exe[0x0000000000019ae0] (test_blocks.exe.PT_LOAD[0]..text + 2016)
        Summary: test_blocks.exe`camlTest_blocks.entry
        Address: test_blocks.exe[0x0000000000019a88] (test_blocks.exe.PT_LOAD[0]..text + 1928)
        Summary: test_blocks.exe`camlTest_blocks.main_273</code></pre>
<p>Finally, setting breakpoints on macOS with LLDB is slightly broken, so you need to look up the symbol name and then set the breakpoint based on the address of the symbol. We can combine <code>image lookup</code> with setting breakpoints on addresses to debug on macOS.</p>
<h2 id="more-for-later">More for later</h2>
<p>There is a lot more to say about debugging OCaml programs using LLDB and there is ongoing work to improve debugger support in OCaml. Get in touch if you would like to be involved.</p>
</div>
]]></description>
    <pubDate>Sat, 03 Aug 2024 00:00:00 UT</pubDate>
    <guid>https://lambdafoo.com/posts/2024-08-03-lldb-ocaml.html</guid>
    <dc:creator>Tim McGilchrist</dc:creator>
</item>
<item>
    <title>Debugging OCaml with Emacs</title>
    <link>https://lambdafoo.com/posts/2024-03-25-ocaml-debugging-with-emacs.html</link>
    <description><![CDATA[<div class="post">
  <h1 class="post-title">Debugging OCaml with Emacs</h1>
  <span class="post-date">March 25, 2024</span>
  <p>This post started as a summary of my March Hacking Days effort at <a href="https://tarides.com">Tarides</a>.</p>
<p>I have been working on improving the debugging situation for OCaml and wanted to see how easily I could set up debug support in Emacs using DAP. Debug Adapter Protocol (DAP) is a wire protocol for communicating between an editor or IDE and a debug server like <a href="https://lldb.llvm.org">LLDB</a> or <a href="https://sourceware.org/gdb/">GDB</a>, providing an abstraction over debugging, similar to how <a href="https://microsoft.github.io/language-server-protocol/">Language Server Protocol (LSP)</a> provides language support for editors.</p>
<p>OCaml comes with support for debugging native programs with GDB and LLDB, and bytecode programs using <a href="https://v2.ocaml.org/manual/debugger.html">ocamldebug</a> and <a href="https://github.com/hackwaly/ocamlearlybird">earlybird</a>. In this post we will cover setting up and debugging both kinds of programs. I am using an M3 Mac so all examples will show ARM64 assembly and macOS specific paths. The same setup should work on Linux. I use <a href="https://github.com/bbatsov/prelude">prelude</a> to configure my Emacs with my own customisations in <code>.emacs/personal</code>; adjust for your own personal Emacs setup.</p>
<p>Let’s start with the following program to compute Fibonacci sequence:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="co">(* fib.ml *)</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> <span class="kw">rec</span> fib n =</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>  <span class="kw">if</span> n = <span class="dv">0</span> <span class="kw">then</span> <span class="dv">0</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>  <span class="kw">else</span> <span class="kw">if</span> n = <span class="dv">1</span> <span class="kw">then</span> <span class="dv">1</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a>  <span class="kw">else</span> fib (n<span class="dv">-1</span>) + fib (n<span class="dv">-2</span>)</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> main () =</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a>  <span class="kw">let</span> r = fib <span class="dv">20</span> <span class="kw">in</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a>  <span class="dt">Printf</span>.printf <span class="st">&quot;fib(20) = %d&quot;</span> r</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> _ = main ()</span></code></pre></div>
<p>And this <code>dune</code> configuration in the same directory:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>(executable</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a> (name fib)</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> (modules fib)</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a> (modes exe byte))</span></code></pre></div>
<p>And this <code>dune-project</code> configuration in the same directory:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>(lang dune <span class="fl">3.11</span>)</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>(map_workspace_root <span class="kw">false</span>)</span></code></pre></div>
<p>Create an empty <code>opam</code> switch in the same directory and install dune:</p>
<pre class="shell"><code>$ opam switch create . 5.1.1 --no-install
$ opam install dune</code></pre>
<p>This gives us everything we need to try out all the different debuggers.</p>
<h2 id="emacs-configuration">Emacs configuration</h2>
<p>Emacs has <a href="https://github.com/emacs-lsp/dap-mode">dap-mode</a> that provides everything we need for DAP integration. Install it using <code>M-x package-install</code> and choose the <code>dap-mode</code> package. I have the following lines in my <code>.emacs/personal/init.el</code> that will require the packages we need and set up some convenient key bindings:</p>
<pre class="emacs-lisp"><code>;; Require dap-mode plus the two extra files we need
(require &#39;dap-mode)
(require &#39;dap-codelldb)
(require &#39;dap-ocaml)

;; Set up key bindings using use-package.
(use-package dap-mode
  :bind ((&quot;C-c M-n&quot; . dap-next)
         (&quot;C-c M-s&quot; . dap-step-in)
         (&quot;C-c M-a&quot; . dap-step-out)
         (&quot;C-c M-w&quot; . dap-continue)))</code></pre>
<p>Save and restart Emacs, then we can move on to setting up bytecode debugging.</p>
<h2 id="bytecode-debugging">Bytecode debugging</h2>
<p>The <a href="https://github.com/hackwaly/ocamlearlybird">earlybird</a> project provides DAP support for debugging OCaml bytecode. OCaml has a bytecode compiler that produces portable bytecode executables which can be run with <code>ocamlrun</code>, the interpreter for OCaml bytecode. Earlybird uses the (undocumented) protocol of <code>ocamldebug</code> to communicate with a bytecode executable, inheriting the same <a href="https://v2.ocaml.org/manual/debugger.html">functionality as ocamldebug</a>.</p>
<p>Start by installing the <code>earlybird</code> package:</p>
<pre class="shell"><code>opam install earlybird</code></pre>
<p>Then create a file in <code>.vscode/launch.json</code> with this configuration:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode json"><code class="sourceCode json"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&quot;version&quot;</span><span class="fu">:</span> <span class="st">&quot;0.2.0&quot;</span><span class="fu">,</span></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&quot;configurations&quot;</span><span class="fu">:</span> <span class="ot">[</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a>        <span class="fu">{</span></span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;name&quot;</span><span class="fu">:</span> <span class="st">&quot;OCaml earlybird (experimental)&quot;</span><span class="fu">,</span></span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;type&quot;</span><span class="fu">:</span> <span class="st">&quot;ocaml.earlybird&quot;</span><span class="fu">,</span></span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;request&quot;</span><span class="fu">:</span> <span class="st">&quot;launch&quot;</span><span class="fu">,</span></span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;program&quot;</span><span class="fu">:</span> <span class="st">&quot;./_build/default/fib.bc&quot;</span><span class="fu">,</span></span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;stopOnEntry&quot;</span><span class="fu">:</span> <span class="kw">true</span><span class="fu">,</span></span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;cwd&quot;</span><span class="fu">:</span> <span class="st">&quot;${workspaceFolder}&quot;</span></span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a>        <span class="fu">}</span><span class="ot">,</span></span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a>    <span class="ot">]</span></span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a><span class="fu">}</span></span></code></pre></div>
<p>Build the project with <code>dune build</code> to create the <code>fib.bc</code> bytecode file. Finally start a debugger by running <code>M-x dap-debug</code>. It will prompt you to choose a session; we want <code>OCaml earlybird (experimental)</code>, the configuration named above. It will start earlybird and immediately stop before executing any OCaml code.</p>
<figure>
<img src="/images/earlybird-dap-template.png" alt="Starting earlybird from Emacs" />
<figcaption aria-hidden="true">Starting earlybird from Emacs</figcaption>
</figure>
<p>To set breakpoints you need to open the OCaml source file in <code>_build/default/fib.ml</code> and click on the source lines you want to stop at. Here is what it looks like after a few recursions. Use the buttons to control the debugger or use the keybindings we added to step through the execution. Curiously these keybindings are not pre-defined, so here I’ve tried to reuse mappings from <a href="https://v2.ocaml.org/manual/debugger.html#s:inf-debugger">ocamldebug</a>.</p>
<figure>
<img src="/images/earlybird-dap-startup.png" alt="Running earlybird through fibonacci" />
<figcaption aria-hidden="true">Running earlybird through fibonacci</figcaption>
</figure>
<h2 id="native-debugging">Native debugging</h2>
<p>OCaml can also produce native binaries that can be debugged using native debuggers like GDB or LLDB, depending on your platform. Here we will use LLDB on macOS, but Linux LLDB works too – just change the name of the program you want to debug.</p>
<p>Add another section to <code>.vscode/launch.json</code> for starting lldb.</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode json"><code class="sourceCode json"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>        <span class="fu">{</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;type&quot;</span><span class="fu">:</span> <span class="st">&quot;lldb&quot;</span><span class="fu">,</span></span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;request&quot;</span><span class="fu">:</span> <span class="st">&quot;launch&quot;</span><span class="fu">,</span></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;name&quot;</span><span class="fu">:</span> <span class="st">&quot;LLDB with ocamlopt&quot;</span><span class="fu">,</span></span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;program&quot;</span><span class="fu">:</span> <span class="st">&quot;./fib.exe&quot;</span><span class="fu">,</span></span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;args&quot;</span><span class="fu">:</span> <span class="ot">[]</span><span class="fu">,</span></span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;stopOnEntry&quot;</span><span class="fu">:</span> <span class="kw">true</span><span class="fu">,</span></span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;cwd&quot;</span><span class="fu">:</span> <span class="st">&quot;${workspaceFolder}&quot;</span></span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a>        <span class="fu">}</span><span class="er">,</span></span></code></pre></div>
<p>Run <code>M-x dap-codelldb-setup</code>, which will download the <code>codelldb</code> DAP program that we are using to communicate with LLDB. This gets installed into <code>.extension/vscode/codelldb</code>. Now compile the fib program with <code>ocamlopt -g -o fib.exe fib.ml</code> and start a debugger session with <code>M-x dap-debug</code>, choosing the <code>LLDB with ocamlopt</code> option. You should see something similar to:</p>
<figure>
<img src="/images/lldb-dap-startup.png" alt="codelldb dap startup" />
<figcaption aria-hidden="true">codelldb dap startup</figcaption>
</figure>
<p>DAP as set up with LLDB on macOS is a little broken: it is missing support for setting breakpoints on symbols and on line numbers in source code. Fixes for both will be coming soon. Linux LLDB works better in this scenario. Setting breakpoints using line numbers in source code requires fixes to the OCaml compiler, while setting breakpoints on symbols is supported in <code>codelldb</code> but not exposed in <code>dap-mode</code>.</p>
<p>The second option is debugging native binaries built with Dune, which is slightly different for two reasons. First, Dune places the executable into <code>_build/default/fib.exe</code>; second, Dune produces slightly different symbols. Start by adding a new section in <code>.vscode/launch.json</code> for Dune built executables:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode json"><code class="sourceCode json"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>        <span class="fu">{</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;type&quot;</span><span class="fu">:</span> <span class="st">&quot;lldb&quot;</span><span class="fu">,</span></span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;request&quot;</span><span class="fu">:</span> <span class="st">&quot;launch&quot;</span><span class="fu">,</span></span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;name&quot;</span><span class="fu">:</span> <span class="st">&quot;LLDB with Dune&quot;</span><span class="fu">,</span></span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;program&quot;</span><span class="fu">:</span> <span class="st">&quot;./_build/default/fib.exe&quot;</span><span class="fu">,</span></span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;args&quot;</span><span class="fu">:</span> <span class="ot">[]</span><span class="fu">,</span></span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;stopOnEntry&quot;</span><span class="fu">:</span> <span class="kw">true</span><span class="fu">,</span></span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;cwd&quot;</span><span class="fu">:</span> <span class="st">&quot;${workspaceFolder}&quot;</span></span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a>        <span class="fu">}</span><span class="er">,</span></span></code></pre></div>
<p>Remove the old <code>fib.exe</code> in the project directory (dune will complain if you don’t) and run <code>dune build</code>. Start up a new DAP session with <code>M-x dap-debug</code> and choose <code>LLDB with Dune</code>. You should see the same debugger session as before.</p>
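<p>To see the symbol difference for yourself, one option is to list a few symbols from each binary with <code>nm</code>. This is only a sketch: <code>show_fib_symbols</code> is a hypothetical helper name, and <code>/tmp/fib-ocamlopt.exe</code> is an assumed location for a copy of the <code>ocamlopt</code> build kept outside the project directory. The exact symbol names depend on your compiler and Dune versions, so none are shown here.</p>
<pre class="shell"><code>#!/bin/sh
# Hypothetical helper: print a few symbols mentioning "fib" for each
# executable given, skipping any path that has not been built yet.
show_fib_symbols() {
  for exe in "$@"; do
    if [ -f "$exe" ]; then
      echo "== $exe"
      nm "$exe" | grep -i 'fib' | head -n 5
    fi
  done
}

# Assumed paths: a saved copy of the ocamlopt build and the Dune build.
show_fib_symbols /tmp/fib-ocamlopt.exe ./_build/default/fib.exe</code></pre>
<p>Paths that do not exist are skipped silently, so the script is safe to run before either binary has been built.</p>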
<h2 id="conclusion">Conclusion</h2>
<p>Debugging OCaml with DAP inside Emacs is possible. There are working options for both bytecode programs and native programs which work reasonably well.</p>
<p>Use <code>dap-mode</code> with:</p>
<pre class="emacs-lisp"><code>(require &#39;dap-mode)
(require &#39;dap-codelldb)
(require &#39;dap-ocaml)

(use-package dap-mode
  :bind ((&quot;C-c M-n&quot; . dap-next)
         (&quot;C-c M-s&quot; . dap-step-in)
         (&quot;C-c M-a&quot; . dap-step-out)
         (&quot;C-c M-w&quot; . dap-continue)))
</code></pre>
<p>and a <code>launch.json</code> of</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode json"><code class="sourceCode json"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&quot;version&quot;</span><span class="fu">:</span> <span class="st">&quot;0.2.0&quot;</span><span class="fu">,</span></span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&quot;configurations&quot;</span><span class="fu">:</span> <span class="ot">[</span></span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a>        <span class="fu">{</span></span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;name&quot;</span><span class="fu">:</span> <span class="st">&quot;OCaml earlybird (experimental)&quot;</span><span class="fu">,</span></span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;type&quot;</span><span class="fu">:</span> <span class="st">&quot;ocaml.earlybird&quot;</span><span class="fu">,</span></span>
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;request&quot;</span><span class="fu">:</span> <span class="st">&quot;launch&quot;</span><span class="fu">,</span></span>
<span id="cb11-8"><a href="#cb11-8" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;program&quot;</span><span class="fu">:</span> <span class="st">&quot;./_build/default/fib.bc&quot;</span><span class="fu">,</span></span>
<span id="cb11-9"><a href="#cb11-9" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;stopOnEntry&quot;</span><span class="fu">:</span> <span class="kw">true</span><span class="fu">,</span></span>
<span id="cb11-10"><a href="#cb11-10" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;cwd&quot;</span><span class="fu">:</span> <span class="st">&quot;${workspaceFolder}&quot;</span></span>
<span id="cb11-11"><a href="#cb11-11" aria-hidden="true" tabindex="-1"></a>        <span class="fu">}</span><span class="ot">,</span></span>
<span id="cb11-12"><a href="#cb11-12" aria-hidden="true" tabindex="-1"></a>        <span class="fu">{</span></span>
<span id="cb11-13"><a href="#cb11-13" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;type&quot;</span><span class="fu">:</span> <span class="st">&quot;lldb&quot;</span><span class="fu">,</span></span>
<span id="cb11-14"><a href="#cb11-14" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;request&quot;</span><span class="fu">:</span> <span class="st">&quot;launch&quot;</span><span class="fu">,</span></span>
<span id="cb11-15"><a href="#cb11-15" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;name&quot;</span><span class="fu">:</span> <span class="st">&quot;LLDB with Dune&quot;</span><span class="fu">,</span></span>
<span id="cb11-16"><a href="#cb11-16" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;program&quot;</span><span class="fu">:</span> <span class="st">&quot;./_build/default/fib.exe&quot;</span><span class="fu">,</span></span>
<span id="cb11-17"><a href="#cb11-17" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;args&quot;</span><span class="fu">:</span> <span class="ot">[]</span><span class="fu">,</span></span>
<span id="cb11-18"><a href="#cb11-18" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;stopOnEntry&quot;</span><span class="fu">:</span> <span class="kw">true</span><span class="fu">,</span></span>
<span id="cb11-19"><a href="#cb11-19" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;cwd&quot;</span><span class="fu">:</span> <span class="st">&quot;${workspaceFolder}&quot;</span></span>
<span id="cb11-20"><a href="#cb11-20" aria-hidden="true" tabindex="-1"></a>        <span class="fu">}</span><span class="ot">,</span></span>
<span id="cb11-21"><a href="#cb11-21" aria-hidden="true" tabindex="-1"></a>        <span class="fu">{</span></span>
<span id="cb11-22"><a href="#cb11-22" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;type&quot;</span><span class="fu">:</span> <span class="st">&quot;lldb&quot;</span><span class="fu">,</span></span>
<span id="cb11-23"><a href="#cb11-23" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;request&quot;</span><span class="fu">:</span> <span class="st">&quot;launch&quot;</span><span class="fu">,</span></span>
<span id="cb11-24"><a href="#cb11-24" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;name&quot;</span><span class="fu">:</span> <span class="st">&quot;LLDB with ocamlopt&quot;</span><span class="fu">,</span></span>
<span id="cb11-25"><a href="#cb11-25" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;program&quot;</span><span class="fu">:</span> <span class="st">&quot;./fib.exe&quot;</span><span class="fu">,</span></span>
<span id="cb11-26"><a href="#cb11-26" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;args&quot;</span><span class="fu">:</span> <span class="ot">[]</span><span class="fu">,</span></span>
<span id="cb11-27"><a href="#cb11-27" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;stopOnEntry&quot;</span><span class="fu">:</span> <span class="kw">true</span><span class="fu">,</span></span>
<span id="cb11-28"><a href="#cb11-28" aria-hidden="true" tabindex="-1"></a>            <span class="dt">&quot;cwd&quot;</span><span class="fu">:</span> <span class="st">&quot;${workspaceFolder}&quot;</span></span>
<span id="cb11-29"><a href="#cb11-29" aria-hidden="true" tabindex="-1"></a>        <span class="fu">}</span></span>
<span id="cb11-30"><a href="#cb11-30" aria-hidden="true" tabindex="-1"></a>    <span class="ot">]</span></span>
<span id="cb11-31"><a href="#cb11-31" aria-hidden="true" tabindex="-1"></a><span class="fu">}</span></span></code></pre></div>
<p>The same setup will work under VSCode with the <code>CodeLLDB</code> and <code>OCaml Platform</code> extensions installed. Happy Emacs debugging.</p>
<h2 id="future-work">Future Work</h2>
<p>I’m working on improving the OCaml debugging experience on macOS and Linux. Currently the macOS LLDB experience is behind that on Linux LLDB, so that is the first goal. Then I want to improve the DWARF encodings for OCaml and generally improve the native debugger experience.</p>
</div>
]]></description>
    <pubDate>Mon, 25 Mar 2024 00:00:00 UT</pubDate>
    <guid>https://lambdafoo.com/posts/2024-03-25-ocaml-debugging-with-emacs.html</guid>
    <dc:creator>Tim McGilchrist</dc:creator>
</item>
<item>
    <title>ICFP 2022 Review</title>
    <link>https://lambdafoo.com/posts/2022-10-11-icfp-2022-review.html</link>
    <description><![CDATA[<div class="post">
  <h1 class="post-title">ICFP 2022 Review</h1>
  <span class="post-date">October 11, 2022</span>
  <p>I wrote up the <a href="https://tarides.com/blog/2022-10-10-icfp-2022-review">highlights of ICFP 2022</a> for the Tarides blog. It was great to get back to in-person conferences and to get the chance to meet people. Thanks to my employer Tarides for covering the cost.</p>
<p>For me personally the OCaml Workshop was fantastic from beginning to end; read the blog post for the full details.
Outside of OCaml I spent time in the Haskell Implementors Workshop, hearing about the new features for GHC, and was excited by the progress that Cabal is making.</p>
<p>My takeaway research topics are:</p>
<ul>
<li>Typed Effect Systems especially <a href="https://github.com/koka-lang/koka">Koka</a>.</li>
<li>Delimited Continuations for both OCaml and Haskell.</li>
<li>Lock-free data structures, <a href="https://github.com/ocaml-multicore/reagents">Reagents</a> and <a href="https://hackage.haskell.org/package/stm">STM</a>.</li>
</ul>
</div>
]]></description>
    <pubDate>Tue, 11 Oct 2022 00:00:00 UT</pubDate>
    <guid>https://lambdafoo.com/posts/2022-10-11-icfp-2022-review.html</guid>
    <dc:creator>Tim McGilchrist</dc:creator>
</item>
<item>
    <title>OCaml with Emacs in 2022</title>
    <link>https://lambdafoo.com/posts/2022-09-07-ocaml-with-emacs-2022.html</link>
    <description><![CDATA[<div class="post">
  <h1 class="post-title">OCaml with Emacs in 2022</h1>
  <span class="post-date">September  7, 2022</span>
  <p>I am revisiting my <a href="https://lambdafoo.com/posts/2021-10-29-getting-started-with-ocaml.html">OCaml setup post from 2021</a> because I needed to set up a new macOS machine. The official OCaml site points newcomers to <a href="https://ocaml.org/docs/up-and-running#editor-support-for-ocaml">Visual Studio Code</a>, which is a fine choice to get started. However I am using <a href="https://www.gnu.org/s/emacs/">Emacs</a>, have done so for over 20 years, and did not find a good description of how to set things up with Emacs. Here I could digress into why Emacs, but I will just strongly encourage any developer to invest heavily in learning their editor, with Emacs being a fine choice.</p>
<h2 id="beginnings">Beginnings</h2>
<p>On macOS I use the pre-compiled GUI version of Emacs from <a href="https://emacsformacosx.com">emacsformacosx</a>, preferring that over compiling it by hand or using the version in <a href="https://brew.sh">homebrew</a>. I have done both previously, but find the emacsformacosx version saves me time and effort. The GUI version was also removed from homebrew at some point in the past.</p>
<p>Next I choose to use an Emacs distro over the base Emacs setup; again this is a time-saving choice and especially useful if you are new to Emacs. Use <a href="https://github.com/bbatsov/prelude">Prelude</a>, which is an enhanced Emacs 25.1+ distribution that should make your experience with Emacs both more pleasant and more powerful. It gives a great modern setup for Emacs with minimal fuss. Once that is cloned and installed the Lisp config begins.</p>
<h2 id="prelude-onfiguration">Prelude configuration</h2>
<p>Prelude provides a base experience of packages available with some configuration. The configuration goes into <code>~/.emacs.d/tsmc/prelude-modules.el</code> where <code>tsmc</code> is your macOS user name. The same path would apply for Linux. A sample <code>prelude-modules.el</code> is provided at <a href="https://github.com/bbatsov/prelude/blob/master/sample/prelude-modules.el">https://github.com/bbatsov/prelude/blob/master/sample/prelude-modules.el</a>.</p>
<p>I choose the following modules to enable with <code>prelude-lsp</code> and <code>prelude-ocaml</code> being the core OCaml related choices. The other bits are optional but useful for editing lisp or navigating code.</p>
<pre class="emacs-lisp"><code>(require &#39;prelude-ivy) ;; A mighty modern alternative to ido
(require &#39;prelude-company)
(require &#39;prelude-emacs-lisp)
(require &#39;prelude-lisp) ;; Common setup for Lisp-like languages
(require &#39;prelude-lsp) ;; Base setup for the Language Server Protocol
(require &#39;prelude-ocaml)</code></pre>
<p>Now for the customisation to get LSP working properly.
There are 3 main pieces:</p>
<ul>
<li>direnv - for automatically configuring shell environments</li>
<li>ocaml-lsp-server - the core lsp implementation for OCaml</li>
<li>lsp-mode - the Emacs mode that drives everything</li>
</ul>
<h2 id="direnv-the-necessary-magic">direnv the necessary magic</h2>
<p>direnv is a small program to load/unload environment variables based on $PWD (current working directory). This program ensures that when you open an OCaml file the correct opam switch is chosen and the tools installed in that switch are made available to Emacs. Opam is the <a href="https://opam.ocaml.org">OCaml package manager</a> and manages local sandboxes of packages called switches. Without direnv Emacs will not find the correct tools and you would need to mess with the Emacs paths to get it right. I have done that and it is much simpler with direnv.</p>
<p>So <code>brew install direnv</code> and create a <code>.envrc</code> file in an OCaml project with <code>eval $(opam env --set-switch)</code> inside. Compared to my previous post I have been using local opam switches, which exist inside an OCaml project. They are created with <code>opam switch create . 4.14.0 --with-test --deps-only -y</code> and appear as an <code>_opam</code> directory in the project root. Next run <code>direnv allow</code> to tell direnv it is safe to use the <code>.envrc</code> file in this directory. The reason I have switched is that I often need to test different OCaml versions, so removing the <code>_opam</code> directory and recreating it is the simpler option.</p>
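<p>The direnv steps above can be collected into a small script. This is a minimal sketch rather than a definitive setup: it assumes you run it from the project root, that overwriting any existing <code>.envrc</code> is acceptable, and it skips <code>direnv allow</code> when direnv is not installed.</p>
<pre class="shell"><code>#!/bin/sh
# Sketch: write the .envrc described above into the current directory.
# Assumption: any existing .envrc here can be overwritten.
set -eu

# Single quotes keep $(...) literal so direnv evaluates it later.
printf '%s\n' 'eval $(opam env --set-switch)' > .envrc

# Approve the new file only when direnv is on PATH.
if command -v direnv > /dev/null; then
  direnv allow
fi</code></pre>
<p>After it runs, <code>cat .envrc</code> should show the single <code>eval</code> line.</p>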
<h2 id="ocaml-lsp-server">OCaml LSP Server</h2>
<p>The OCaml LSP server needs to be installed in the current switch, so run <code>opam update &amp;&amp; opam install ocaml-lsp-server -y</code>. This will make ocaml-lsp-server available to Emacs via direnv.</p>
<p>There is an opportunity here to use Emacs Lisp to install <code>ocaml-lsp-server</code> if it is missing, or to allow lsp-mode to download and install it itself. I would like to have this working in future. Next, back into Lisp.</p>
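<p>Until that exists in Emacs Lisp, the same check-then-install idea can be sketched in shell. This swaps in a shell function for the Emacs Lisp approach described above; <code>ensure_ocamllsp</code> is a hypothetical helper name, not part of opam or lsp-mode, and it assumes the right switch is already active in the current shell.</p>
<pre class="shell"><code>#!/bin/sh
# Hypothetical helper: install ocaml-lsp-server into the current switch
# only when the ocamllsp binary is not already on PATH.
ensure_ocamllsp() {
  if command -v ocamllsp > /dev/null; then
    echo "ocamllsp already installed"
  elif command -v opam > /dev/null; then
    # Refresh the package list first, then install only if that worked.
    if opam update; then
      opam install ocaml-lsp-server -y
    else
      return 1
    fi
  else
    echo "opam not found; install opam first"
    return 1
  fi
}</code></pre>
<p>Call <code>ensure_ocamllsp</code> from the project root after <code>direnv</code> has loaded the switch; it does nothing beyond printing a message when <code>ocamllsp</code> is already available.</p>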
<h2 id="emacs-lsp-mode">Emacs LSP mode</h2>
<p>Create a file init.el in <code>~/.emacs.d/tsmc/</code> substituting your Unix user name for <code>tsmc</code>. Thanks to emacs-prelude the configuration is very small.</p>
<pre class="emacs-lisp"><code>;;; init.el --- @tsmc configuration entry point.

(prelude-require-packages &#39;(use-package direnv))
;; Use direnv to select the correct opam switch and set the path
;; that Emacs will use to run commands like ocamllsp, merlin or dune build.
;; Assumption: direnv-mode is not already enabled elsewhere in your config.
(direnv-mode)

(use-package lsp-mode
  :hook
  (tuareg-mode . lsp))
;; Attach lsp hook to modes that require it, here we bind to tuareg-mode rather than
;; prelude-ocaml. For unknown reasons the latter does not bind properly and does not
;; start lsp-mode

(provide &#39;tsmc)
;;; init.el ends here</code></pre>
<p>We require two packages, <code>use-package</code> and <code>direnv</code>, and then tell Emacs to start lsp-mode when <code>tuareg-mode</code> is started. Tuareg-mode is one of the OCaml modes available for Emacs, the other being <code>caml-mode</code> which I have not really used. Now quit and restart Emacs. Open an ml file inside the project you started earlier and ocaml-lsp should start up.</p>
<p>The types for expressions and modules will display on mouse hover or beside the definition. Hovering the mouse over a function or type will display the type plus the documentation comments for it. A successful <code>dune build</code> for the project is required to generate the data used by ocaml-lsp-server. At this point in time <code>prelude</code> relies on <code>merlin</code>, an assistant for editing OCaml code that is used by <code>ocaml-lsp-server</code> internally but is also available as a standalone tool. So I often have both installed; <code>opam install merlin</code> should be enough to get it installed too.</p>
<p>At this point I am mostly happy: the types and documentation display as required. Navigating using <code>M-.</code> shows a preview of the type / function under point and return will take me to the definition. This is vastly improved in OCaml 4.14 (with the work on Shapes), which I have switched to for everything I can. Switching between ml and mli files is <code>C-c C-a</code>, and there is more; run <code>M-x describe-mode</code> to show everything available.</p>
<p>The annoyances are more fundamental to how LSP wants to work. It uses what I am calling a push based interaction, where it generates the information for types and documentation in the background and pushes it into the Emacs buffer. You never need to ask what is the type, it will display for you. Sometimes I want to ask for what a type is inside an expression, with LSP you are encouraged to mouse hover over something rather than having a key binding for it. So far I haven’t found the lisp function that drives the hover functionality but when I do I will bind it to a key. The second issue is also around mouse usage to drive LSP functionality like rename or annotate types. I would strongly prefer a key chord driven approach to that. Again I will set this up once I find the right lsp functions. For now I use <code>C-c C-t</code> from merlin to summon the types for things.</p>
<p>Overall the experience is solid. Types and docs appear as required. Navigation works. The speed has been good so far. LSP mode is less janky than it was 1 year ago.</p>
<h2 id="alternatives">Alternatives</h2>
<p>There is a fine alternative LSP mode for Emacs, <a href="https://github.com/joaotavora/eglot">Eglot</a>. It takes a more minimal approach and uses a pull based interaction, where you ask for the information with key bindings rather than having it pushed at you via UI elements. For example, the type of a function is requested rather than shown by default.</p>
<p>The corresponding configuration I was using previously is:</p>
<pre class="emacs-lisp"><code>(use-package eglot
  :config
  (define-key eglot-mode-map
    (kbd &quot;C-c C-t&quot;) #&#39;eldoc-print-current-symbol-info)

  :hook
  ((tuareg-mode . eglot-ensure)))</code></pre>
<p>Again <code>use-package</code> configures the mode, and the hook triggers Eglot to be loaded whenever <code>tuareg-mode</code> is. It does so through the <code>eglot-ensure</code> function, which starts an Eglot session for the current buffer if there isn’t one. No further configuration is needed in Emacs, as Eglot knows the LSP server is called <code>ocamllsp</code> and will look for it on the Unix PATH.</p>
<h2 id="summary">Summary</h2>
<p>Getting started with OCaml using Emacs can be a struggle. Emacs is a fine editor but the documentation can be difficult to handle. Hopefully following through this setup will yield a working Emacs / LSP setup for OCaml.</p>
<p>In future I want to try binding more things to keys so I use the mouse less, and to streamline installing the OCaml LSP server. After that, support for more interesting code interactions like extracting modules or hoisting let bindings would be nice to have. Happy hacking!</p>
</div>
]]></description>
    <pubDate>Wed, 07 Sep 2022 00:00:00 UT</pubDate>
    <guid>https://lambdafoo.com/posts/2022-09-07-ocaml-with-emacs-2022.html</guid>
    <dc:creator>Tim McGilchrist</dc:creator>
</item>
<item>
    <title>Getting Started with OCaml in 2021</title>
    <link>https://lambdafoo.com/posts/2021-10-29-getting-started-with-ocaml.html</link>
    <description><![CDATA[<div class="post">
  <h1 class="post-title">Getting Started with OCaml in 2021</h1>
  <span class="post-date">October 29, 2021</span>
  <p>OCaml is an awesome language with many fine features. I enjoy using it immensely!</p>
<p>Unfortunately, it suffers from a perceived weakness in how to get started. Like any new skill, there can be a learning curve. The tools are all there, but combining them for a good developer experience might seem difficult at first.</p>
<p>Often I’ve found that the barrier for getting into a new language is less about the
new features of that language and more about learning the tools to become productive in that
language. The package managers, build tools, and editor integration of a new language can be confusing, making for an awful experience.</p>
<p>Perhaps my opinionated guide to getting started with OCaml in 2021 will help reduce any mental blocks against trying out this excellent language.</p>
<h2 id="install-opam">Install Opam</h2>
<p>First it’s necessary to install OCaml and Opam.
<a href="https://opam.ocaml.org">Opam</a> is the default package manager for OCaml projects.
Ignore the other options for now; once you know more about what you want, you can make
an informed choice. For now, if you speak Opam, you’ll get the most out of the community.</p>
<p>On Linux, use your local package manager, e.g., <code>apt-get install opam</code> for Debian and <code>apt install opam</code>
for Ubuntu. For macOS, use Homebrew: <code>brew install opam</code>. I’ll assume if you run
something else, you can handle looking up <a href="https://opam.ocaml.org/doc/Install.html#Using-your-distribution-39-s-package-system">how to install things</a>.</p>
<p>On my Mac I get Opam 2.1.0:</p>
<pre class="shell"><code>$ opam --version
2.1.0</code></pre>
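<p>Once you’ve got Opam installed, run <code>opam init</code> once to initialise it if you haven’t used it on this machine before. After that you should be able to move on to the next step.</p>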
<h2 id="choose-an-ocaml-version">Choose an OCaml Version</h2>
<p>I strongly recommend that you pick a single OCaml version that your project will compile against.
Supporting multiple compiler versions is possible and usually not too difficult, but it complicates
the process right now.</p>
<p>Running <code>opam switch list-available</code> will show you a long list of every possible OCaml compiler.
Choose the latest mainline compiler identified by <code>Official release X.XX.X</code>, where currently the latest
is <code>4.13.0</code>. Ignore the others.</p>
<pre class="shell"><code>opam switch list-available
...
ocaml-variants                         4.12.0+domains                         OCaml 4.12.0, with support for multicore domains
ocaml-variants                         4.12.0+domains+effects                 OCaml 4.12.0, with support for multicore domains and effects
ocaml-variants                         4.12.0+options                         Official release of OCaml 4.12.0
ocaml-base-compiler                    4.12.1                                 Official release 4.12.1
ocaml-variants                         4.12.1+options                         Official release of OCaml 4.12.1
ocaml-variants                         4.12.2+trunk                           Latest 4.12 development
ocaml-base-compiler                    4.13.0~alpha1                          First alpha release of OCaml 4.13.0
ocaml-variants                         4.13.0~alpha1+options                  First alpha release of OCaml 4.13.0
ocaml-base-compiler                    4.13.0~alpha2                          Second alpha release of OCaml 4.13.0
ocaml-variants                         4.13.0~alpha2+options                  Second alpha release of OCaml 4.13.0
ocaml-base-compiler                    4.13.0~beta1                           First beta release of OCaml 4.13.0
ocaml-variants                         4.13.0~beta1+options                   First beta release of OCaml 4.13.0
ocaml-base-compiler                    4.13.0~rc1                             First release candidate of OCaml 4.13.0
ocaml-variants                         4.13.0~rc1+options                     First release candidate of OCaml 4.13.0
ocaml-base-compiler                    4.13.0~rc2                             Second release candidate of OCaml 4.13.0
ocaml-variants                         4.13.0~rc2+options                     Second release candidate of OCaml 4.13.0
ocaml-base-compiler                    4.13.0                                 Official release 4.13.0
ocaml-variants                         4.13.0+options                         Official release of OCaml 4.13.0
ocaml-variants                         4.13.1+trunk                           Latest 4.13 development
ocaml-variants                         4.14.0+trunk                           Current trunk
...</code></pre>
<p>At this point, install the latest OCaml 4.13.0:</p>
<pre class="shell"><code>$ opam switch create 4.13.0

&lt;&gt;&lt;&gt; Installing new switch packages &lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;  🐫
Switch invariant: [&quot;ocaml-base-compiler&quot; {= &quot;4.13.0&quot;} | &quot;ocaml-system&quot; {= &quot;4.13.0&quot;}]

&lt;&gt;&lt;&gt; Processing actions &lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;&lt;&gt;  🐫
∗ installed base-bigarray.base
∗ installed base-threads.base
∗ installed base-unix.base
∗ installed ocaml-options-vanilla.1
⬇ retrieved ocaml-base-compiler.4.13.0  (https://opam.ocaml.org/cache)
∗ installed ocaml-base-compiler.4.13.0
∗ installed ocaml-config.2
∗ installed ocaml.4.13.0
Done.</code></pre>
<p>You can start using this version by typing the following:</p>
<pre class="shell"><code>$ opam switch set 4.13.0</code></pre>
<p>And verify which switch you are using:</p>
<pre class="shell"><code>$ opam switch show
4.13.0</code></pre>
<p>When you work with several OCaml projects, it’s best to create a switch per project, as it keeps
each project isolated and prevents issues with installing conflicting versions of libraries.
For example, I use a naming scheme of <code>ocaml-version-project-name</code>,
e.g., <code>4.13.0-ocurrent</code>. Then in each project directory, run <code>opam switch link 4.13.0-ocurrent</code>
to set up that named switch for that specific directory. Opam will take care of setting that switch
in your shell when you change into that directory.</p>
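<p>The per-project switch workflow above can be wrapped in a small shell function. This is only a sketch: <code>project_switch</code> is a hypothetical helper, not something Opam ships, and it assumes the <code>ocaml-version-project-name</code> naming scheme described above.</p>
<pre class="shell"><code># Hypothetical helper (not part of Opam): create a switch named
# VERSION-PROJECT and link it to the current directory.
project_switch() {
  name=&quot;$1-$2&quot;
  # Create the named switch from the given compiler version.
  opam switch create &quot;$name&quot; &quot;$1&quot; || return 1
  # Make the switch the default for this directory and its children.
  opam switch link &quot;$name&quot;
}</code></pre>
<p>Running <code>project_switch 4.13.0 ocurrent</code> from inside the project directory would then create and link a switch called <code>4.13.0-ocurrent</code>.</p>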
<h2 id="creating-your-project-directory">Creating Your Project Directory</h2>
<p>For this step we need the <a href="https://dune.readthedocs.io">Dune</a> build tool, so go ahead and install it with <code>opam install dune</code>.
Dune comes with a simple scaffolding command to create an empty project that is really useful
to get started.</p>
<p>I’m calling my project <code>box</code>, so run:</p>
<pre class="shell"><code>$ dune init proj box
Success: initialized project component named box</code></pre>
<p>In the project generated, we get a library component, a CLI, and a test component, which will
all compile out of the box.</p>
<pre class="shell"><code>$ cd box
$ tree
.
├── bin
│   ├── dune
│   └── main.ml
├── box.opam
├── lib
│   └── dune
└── test
    ├── box.ml
    └── dune

3 directories, 6 files</code></pre>
<p>Let’s try a compile:</p>
<pre class="shell"><code>$ dune build @all
Info: Creating file dune-project with this contents:
| (lang dune 2.8)
| (name box)
</code></pre>
<p>Running the CLI:</p>
<pre class="shell"><code>$ dune exec bin/main.exe
Hello, World!</code></pre>
<p>Each of the <code>bin</code>, <code>lib</code>, and <code>test</code> directories contains the source code in the form of <code>*.ml</code> files,
along with a <code>dune</code> file which tells Dune how to build the source and on what libraries it depends.
The <code>bin/dune</code> file for <code>box</code> declares an <code>executable</code> with the public name <code>box</code> that depends on the <code>box</code>
library.</p>
<pre class="shell"><code>(executable
 (public_name box)
 (name main)
 (libraries box))</code></pre>
<h2 id="adding-a-dependency">Adding a Dependency</h2>
<p>CLI tools require command line parsing, and <code>Cmdliner</code> is a common library that implements it.
We need to add it in two places: first in the <code>dune-project</code> file, to get it installed, and then
in <code>bin/dune</code>, to say where we’re using it.</p>
<p>One small digression: when generating our project, <code>dune</code> created a <code>box.opam</code> file. This describes
our project to Opam, telling it what libraries it requires and what the project does. You need this
if you ever publish a package for other people to use. Newer versions of Dune can generate the <code>box.opam</code>
file from a <code>dune-project</code> file. Having a single source of information is helpful, so let’s edit <code>dune-project</code> to look like this:</p>
<pre class="shell"><code>(lang dune 2.8)
(name box)

(generate_opam_files true)

(package
 (name box)
 (depends
  (ocaml (&gt;= 4.13.0))
  (cmdliner (&gt;= 0.9.8)))
 (synopsis &quot;Box cli&quot;))</code></pre>
<p>Remove the existing file with <code>rm box.opam</code> to test the generation. Now run <code>dune build @all</code> to regenerate the Opam
file. The generated file should be checked in, but any further edits should be made in the top-level <code>dune-project</code>
file. The regenerated <code>box.opam</code> should look like this:</p>
<pre class="shell"><code>$ cat box.opam
# This file is generated by dune, edit dune-project instead
opam-version: &quot;2.0&quot;
synopsis: &quot;Box cli&quot;
depends: [
  &quot;dune&quot; {&gt;= &quot;2.8&quot;}
  &quot;ocaml&quot; {&gt;= &quot;4.13.0&quot;}
  &quot;cmdliner&quot; {&gt;= &quot;0.9.8&quot;}
  &quot;odoc&quot; {with-doc}
]
build: [
  [&quot;dune&quot; &quot;subst&quot;] {dev}
  [
    &quot;dune&quot;
    &quot;build&quot;
    &quot;-p&quot;
    name
    &quot;-j&quot;
    jobs
    &quot;@install&quot;
    &quot;@runtest&quot; {with-test}
    &quot;@doc&quot; {with-doc}
  ]
]
</code></pre>
<p>The final step is to actually install the <code>cmdliner</code> library. Run <code>opam install . --deps-only -ty</code>,
which will look at the <code>*.opam</code> files present and install just their dependencies with the correct
version bounds.
The <code>-y</code> says yes to installing the packages. You can leave it off if you would rather confirm by pressing <code>Y</code> yourself or if
you want to review what will be installed.
<code>-t</code> will run the package tests, which isn’t always necessary, but it’s sometimes useful for certain
packages with native C components.</p>
<p>Alternatively, you could run <code>opam install cmdliner</code>, but as this doesn’t look at the version constraints in your <code>*.opam</code> files, you might not get the version you expect.</p>
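<p>The second place mentioned at the start of this section is <code>bin/dune</code>. A minimal sketch of the updated file, assuming the generated stanza is otherwise unchanged, simply adds <code>cmdliner</code> to the <code>libraries</code> field:</p>
<pre class="shell"><code>$ cat bin/dune
(executable
 (public_name box)
 (name main)
 (libraries box cmdliner))</code></pre>
<p>With the library installed, <code>dune build @all</code> should then be able to link it into the executable.</p>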
<h2 id="editor-tooling">Editor Tooling</h2>
<p>Finally, you’ll want to get comfy with your chosen editor. If you have a preference, you should use the native LSP support in that editor, along with the OCaml language server, which you can install with <code>opam install ocaml-lsp-server</code>. OCaml is standardising on LSP for editor interaction. If you have no editor preference, then start with <a href="https://code.visualstudio.com">VSCode</a> and install the OCaml LSP package from the Marketplace.</p>
<p>Personally, I’m using Emacs with the LSP mode <code>eglot</code>, which works really nicely, along with some customisations to bind certain LSP actions to keys. I highly recommend getting into Emacs as an editor because the customisation via a fully-featured language, like Lisp, is fantastic if you live in your editor like I do.</p>
<p>This post is an update to an earlier post by <a href="https://adambard.com/blog/getting-started-with-ocaml/">Adam</a> in 2017, and I hope this short tutorial helps get you started with OCaml!</p>
</div>
]]></description>
    <pubDate>Fri, 29 Oct 2021 00:00:00 UT</pubDate>
    <guid>https://lambdafoo.com/posts/2021-10-29-getting-started-with-ocaml.html</guid>
    <dc:creator>Tim McGilchrist</dc:creator>
</item>

    </channel>
</rss>
