Perpetually Curious Blog

Building OCaml from assembly

Tim McGilchrist

August 30, 2024

At work I’ve been focusing on improving the debugging experience with OCaml. Along the way I’ve discovered how some of the pieces fit together. They might be obvious in retrospect, but they are interesting to me at least, so I’m going to post details about them here.

The first nugget is that you can hand compile an OCaml program into a final executable. What do I mean? You can ask the OCaml compiler to output all the assembly it generates for a library or executable, then take that and call the assembler yourself to build it. First let’s review how the compiler works.

Compilation Pipeline

Here is a grossly simplified overview of the OCaml compiler. We feed in OCaml source code in the form of ml/mli files, which flow through each stage and are eventually emitted as either object files or textual assembly files. The first step, from OCaml Source to Parse Tree, uses menhir to parse the source file and generate an untyped AST representing its code. This is then type checked into a typed tree; this is where the type theory happens. After that, several stages transform the typed tree into representations more suitable for generating assembly. The final stage traverses the CMM/Linear AST, generating assembly code for a specific family of CPUs (like x86_64 or ARM64).

                                      
 ┌──────────────┐   ┌──────────────┐  
 │ OCaml Source │   │  Parse Tree  │  
 │              ┼───►              │  
 └──────────────┘   └──────┬───────┘  
                           │          
 ┌──────────────┐   ┌──────▼───────┐  
 │    Lambda    │   │  Typed Tree  │  
 │              ◄───┼              │  
 └──────┬───────┘   └──────────────┘  
        │                             
 ┌──────▼───────┐   ┌──────────────┐  
 │  CMM/Linear  │   │    Emit      │  
 │              ┼───►   Assembly   │  
 └──────────────┘   └──────────────┘

Finally, this assembly is passed to the system C compiler, which assembles it into object files and links them into an executable to be run. So we could treat the OCaml compiler as a fancy way to generate assembly files, which we can then mess with to do things like add DWARF information or optimise assembly routines, or just for pure fun.
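To make the stages above concrete, here is a small OCaml sketch that pairs each stage in the diagram with the ocamlopt flag that prints (or, for -S, keeps) that representation. The stage type and the flag_for_stage and dump_command helpers are hypothetical names made up for illustration; only the flags themselves belong to ocamlopt, and it is worth checking them against ocamlopt -help for your compiler version.

```ocaml
(* The stages from the diagram above. This type is purely illustrative. *)
type stage = Parse_tree | Typed_tree | Lambda | Cmm | Linear | Assembly

(* ocamlopt flag that dumps (or, for -S, keeps) each representation. *)
let flag_for_stage = function
  | Parse_tree -> "-dparsetree"
  | Typed_tree -> "-dtypedtree"
  | Lambda -> "-dlambda"
  | Cmm -> "-dcmm"
  | Linear -> "-dlinear"
  | Assembly -> "-S"

let pipeline = [Parse_tree; Typed_tree; Lambda; Cmm; Linear; Assembly]

(* Build a command line that dumps every stage for one source file. *)
let dump_command file =
  String.concat " "
    (("ocamlopt" :: List.map flag_for_stage pipeline) @ ["-c"; file])

let () = print_endline (dump_command "meander.ml")
```

Running the printed command on a small file is a quick way to see how much each stage changes the program.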

OCaml source

We start with an OCaml program taken from Retrofitting Effect Handlers onto OCaml. This program doesn’t compute anything interesting, but it does show how OCaml’s FFI to C works and how control passes between the two languages, which is what makes it interesting here.

$ cat meander.ml
external ocaml_to_c
         : unit -> int = "ocaml_to_c"
exception E1
exception E2
let c_to_ocaml () = raise E1
let _ = Callback.register
          "c_to_ocaml" c_to_ocaml
let omain () =
  try (* h1 *)
    try (* h2 *) ocaml_to_c ()
    with E2 -> 0
  with E1 -> 42
let _ = assert (omain () = 42)

$ cat meander_c.c
#include <caml/mlvalues.h>
#include <caml/callback.h>

value ocaml_to_c (value unit) {
    caml_callback(*caml_named_value
                  ("c_to_ocaml"), Val_unit);
    return Val_int(0);
}

Reading from the bottom of the file, meander.ml asserts that the function omain returns the value 42. It gets that value by calling ocaml_to_c, which is actually an external C function defined in meander_c.c and imported into OCaml using external in the first line of meander.ml. The C function calls back into OCaml using caml_callback, which executes the c_to_ocaml function. That function raises an exception, unwinding everything back to omain with its try/with blocks.
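The exception flow described above can be reproduced without any C at all. The following is a minimal sketch in which ocaml_to_c is a hypothetical pure-OCaml stand-in for the C stub: it calls c_to_ocaml directly instead of going through caml_callback. It only models which handler catches the exception, not the stack switching between C and OCaml frames.

```ocaml
exception E1
exception E2

(* Same body as in meander.ml, with an explicit return type. *)
let c_to_ocaml () : unit = raise E1

(* Stand-in for the C function in meander_c.c (an assumption for this sketch):
   call back into OCaml, then return 0 if the callback ever returns. *)
let ocaml_to_c () : int =
  c_to_ocaml ();
  0

let omain () =
  try (* h1 *)
    try (* h2 *) ocaml_to_c ()
    with E2 -> 0   (* h2 does not match E1, so the exception keeps unwinding *)
  with E1 -> 42    (* h1 catches E1 *)

let () = Printf.printf "omain () = %d\n" (omain ())
```

The inner handler only matches E2, so E1 skips it and is caught by the outer handler, which is why omain returns 42.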

To compile this program we use the OCaml 5.2 compiler.

$ ocamlopt --version
5.2.0
$ ocamlopt meander_c.c meander.ml -o meander.exe
$ ./meander.exe
$ echo $?
0

Running the program under macOS gives a successful exit code, so it must have got 42 and the assertion passed. Try changing the value 42 to something else to check.

Next we will pull apart what the compiler is doing to generate the final executable. Run ocamlopt with these flags:

 $ ocamlopt meander_c.c meander.ml -o meander.exe -S -g -dstartup -verbose

+ cc  -O2 -fno-strict-aliasing -fwrapv -pthread -pthread  -D_FILE_OFFSET_BITS=64 -c -g -I'/Users/tsmc/code/ocaml/owee/_opam/lib/ocaml' 'meander_c.c'
+ cc -c -Wno-trigraphs  -o 'meander.o' 'meander.s'
+ cc -c -Wno-trigraphs  -o '/var/folders/z_/7yzlrkjn6pd441zs1qhzpjv00000gn/T/camlstartup9b503b.o' 'meander.exe.startup.s'
+ cc -O2 -fno-strict-aliasing -fwrapv -pthread  -pthread   -o 'meander.exe'  '-L/Users/tsmc/code/ocaml/owee/_opam/lib/ocaml'  '/var/folders/z_/7yzlrkjn6pd441zs1qhzpjv00000gn/T/camlstartup9b503b.o' '/Users/tsmc/code/ocaml/owee/_opam/lib/ocaml/std_exit.o' 'meander.o' '/Users/tsmc/code/ocaml/owee/_opam/lib/ocaml/stdlib.a' 'meander_c.o' '/Users/tsmc/code/ocaml/owee/_opam/lib/ocaml/libasmrun.a'     -lpthread

Focusing on the ocamlopt command, the flag -S asks the compiler to generate the assembly files for the OCaml source, -g asks for debug information to be included, -dstartup generates the startup file that bridges between the C startup and OCaml (more on that later) and -verbose tells ocamlopt to print out what commands it’s running.

So, what has been printed out? The first line compiles the meander_c.c file into an object file, meander_c.o, in the current directory. The second assembles meander.s into another object file; meander.s is the output of compiling the meander.ml OCaml source into assembly. The -verbose option doesn’t show how that file gets created. The third line assembles the startup file meander.exe.startup.s into another object file. The final step calls the linker via cc to generate the final meander.exe file. You can see all the object files from the previous steps, plus the OCaml stdlib _opam/lib/ocaml/stdlib.a and _opam/lib/ocaml/std_exit.o from the local opam switch, plus the OCaml library directory being added to the search path as -L/Users/tsmc/code/ocaml/owee/_opam/lib/ocaml. It is not that dissimilar to building a C program.

What about those assembly files? meander.s is our ARM64 assembly file for meander.ml. Open it up and search for entry. If you’re on Linux or another architecture like x86_64 the assembly will be different but the names will be the same. This is the entry point called when executing the program: the OCaml runtime jumps to the symbol _camlMeander.entry.

	.globl	_camlMeander.entry
L114:
	mov	x16, #34
	stp	x16, x30, [sp, #-16]!
	bl	_caml_call_realloc_stack
	ldp	x16, x30, [sp], #16
_camlMeander.entry:
	.cfi_startproc
	ldr	x16, [x28, #40]
	add	x16, x16, #328
	cmp	sp, x16
	bcc	L114
	sub	sp, sp, #16
	.cfi_adjust_cfa_offset	16
	.cfi_offset 30, -8
	str	x30, [sp, #8]

Search for other symbols like omain and c_to_ocaml:

	.globl	_camlMeander.omain_278
_camlMeander.omain_278:
	.loc	1	8
	.cfi_startproc
	sub	sp, sp, #16
	.cfi_adjust_cfa_offset	16
	.cfi_offset 30, -8
	str	x30, [sp, #8]
....
_camlMeander.c_to_ocaml_273:
	.file	1	"meander.ml"
	.loc	1	5
	.cfi_startproc
	sub	sp, sp, #16
	.cfi_adjust_cfa_offset	16
	.cfi_offset 30, -8
	str	x30, [sp, #8]

All the code is there; we just need to assemble it. On my machine (macOS ARM64) running these commands gives me an executable meander.exe without even using ocamlopt.

$ gcc -O2 -fno-strict-aliasing -fwrapv -pthread -D_FILE_OFFSET_BITS=64 \
      -c -g -I'/Users/tsmc/code/ocaml/owee/_opam/lib/ocaml' 'meander_c.c'
$ gcc -c -Wno-trigraphs -o 'meander.o' 'meander.s'
$ gcc -c -Wno-trigraphs -o meanderCamlStartup.o meander.exe.startup.s
$ gcc -o 'meander.exe' '-L/Users/tsmc/code/ocaml/owee/_opam/lib/ocaml' 'meanderCamlStartup.o' \
       '/Users/tsmc/code/ocaml/owee/_opam/lib/ocaml/std_exit.o' 'meander.o' \
       '/Users/tsmc/code/ocaml/owee/_opam/lib/ocaml/stdlib.a' 'meander_c.o' \
       '/Users/tsmc/code/ocaml/owee/_opam/lib/ocaml/libasmrun.a' -lpthread

Try it out. You’ll need to change /Users/tsmc/code/ocaml/owee/_opam to the path of your own local opam switch for OCaml 5.2.

Startup file

What about that startup file, meander.exe.startup.s? What is it for? Open the file and search for _caml_program; this is the entry point called by the startup code written in C.

_caml_program:
	.cfi_startproc
	ldr	x16, [x28, #40]
	add	x16, x16, #328
	cmp	sp, x16
	bcc	L136
	sub	sp, sp, #16
	.cfi_adjust_cfa_offset	16
	.cfi_offset 30, -8
	str	x30, [sp, #8]
L135:
	bl	_camlCamlinternalFormatBasics$entry
L137:
	adrp	x0, _caml_globals_inited@GOTPAGE
	ldr	x0, [x0, _caml_globals_inited@GOTPAGEOFF]
	ldr	x2, [x0, #0]
	add	x3, x2, #1
	dmb	ishld
	str	x3, [x0, #0]
	bl	_camlStdlib$entry

The code is responsible for calling the entry (initialisation) function of every imported module. In meander.ml we only use a couple of functions from the standard library, so we have _camlStdlib$entry, _camlStdlib__Sys$entry and so on, and then we finally call _camlMeander$entry, which we saw earlier.

We need this assembly file to generate an object file for linking into the final executable. Without it the linker won’t have the _caml_program symbol available and none of the OCaml Stdlib will be initialised. A fun exercise is to rewrite this file so it does not call all those entry functions but still provides _caml_program and calls into _camlMeander$entry.

I made a small PR, #13217, to improve this behaviour so that it loops over a table of functions to call rather than generating large slabs of identical code.
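As a rough picture of the "table of functions" idea, here is a toy OCaml model of the startup code. It is only a sketch under my own assumptions: the names globals_inited, entry, entries and caml_program are stand-ins for illustration, the module list is abbreviated, and the real startup code is generated assembly rather than OCaml.

```ocaml
(* Toy model of the startup code. All names here are illustrative. *)
let globals_inited = ref 0   (* stands in for caml_globals_inited *)
let trace = ref []           (* records which entries ran, newest first *)

(* A fake module entry function that just records its own name. *)
let entry name () = trace := name :: !trace

(* Table of entry functions in initialisation order (abbreviated). *)
let entries =
  [| entry "CamlinternalFormatBasics";
     entry "Stdlib";
     entry "Stdlib__Sys";
     entry "Meander" |]

(* Call each entry in turn, bumping the counter after each one, instead of
   emitting one copy of the call-and-increment sequence per module. *)
let caml_program () =
  Array.iter (fun init -> init (); incr globals_inited) entries

let () =
  caml_program ();
  List.iter print_endline (List.rev !trace)
```

The loop body corresponds to the repeated call and counter increment visible in the assembly listing above.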

Bonus

Now we don’t need the OCaml compiler to write OCaml.

But seriously the purpose for discovering this was to investigate adding DWARF debugging information to OCaml on macOS. That’s a different topic for next time.

Getting Started with LLDB on OCaml

Tim McGilchrist

August 3, 2024

This post is a companion to KC’s excellent Getting Started with GDB on OCaml that shows how to debug OCaml programs with GDB. I wanted to demonstrate the same functionality using LLDB on Linux ARM64. The aim is to show the beginnings of debugging OCaml programs with LLDB and highlight a few LLDB tricks I’ve found.

We will start with the same program:

(* fib.ml *)
let rec fib n =
  if n = 0 then 0
  else if n = 1 then 1
  else fib (n-1) + fib (n-2)

let main () =
  let r = fib 20 in
  Printf.printf "fib(20) = %d\n" r

let _ = main ()

Compiled with OCaml version 5.2.0.

$ ocamlopt --version
5.2.0
$ ocamlopt -g -o fib.exe fib.ml
$ ./fib.exe 20
fib(20) = 6765

The program prints the 20th Fibonacci number, nothing special but interesting because it has recursion. Now start up an lldb session.

$ lldb ./fib.exe

Setting breakpoints

We want to set breakpoints in the fib function. The first way to set breakpoints is based on OCaml function names. Due to a process called name mangling, they look slightly different in the executable. Since we don’t know the exact names, we can use tab completion to help us.

(lldb) br s -n camlFib.fib_ # press tab to show the possible matches
(lldb) br s -n camlFib.fib_270 # There is only one matching ending 270
Breakpoint 1: where = fib.exe`camlFib.fib_270 + 76, address = 0x0000000000051084
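The mangled name in the output above follows a visible pattern: the prefix caml, the module name, a dot, the function name, an underscore and a number. Here is a minimal OCaml sketch that rebuilds it. The mangle helper is hypothetical, written only from the names shown in this post; the number is picked by the compiler and can change between builds, so treat it as something to discover with tab completion rather than predict.

```ocaml
(* Hypothetical helper: rebuild a symbol name in the shape seen in lldb.
   The stamp is an internal compiler number, not something you choose. *)
let mangle ~modname ~fn ~stamp =
  Printf.sprintf "caml%s.%s_%d" modname fn stamp

(* The part you can type before pressing tab in lldb. *)
let completion_prefix ~modname ~fn =
  Printf.sprintf "caml%s.%s_" modname fn

let () =
  print_endline (completion_prefix ~modname:"Fib" ~fn:"fib");
  print_endline (mangle ~modname:"Fib" ~fn:"fib" ~stamp:270)
```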

You can also set breakpoints using lldb’s file name and line number combination. This time we will set a breakpoint in the main function, which starts at line 6 in fib.ml.

(lldb) br s -f fib.ml -l 6
Breakpoint 2: where = fib.exe`camlFib.main_271, address = 0x0000000000050f48
(lldb)

Now we can run the program.

Breakpoint 2: where = fib.exe`camlFib.main_272, address = 0x00000000000510c8
(lldb) run
Process 11987 launched: '/home/tsmc/projects/fib.exe' (aarch64)
Process 11987 stopped
* thread #1, name = 'fib.exe', stop reason = breakpoint 2.1
    frame #0: 0x0000aaaaaaaf10c8 fib.exe`camlFib.main_272 at fib.ml:7
   4   	  else if n = 1 then 1
   5   	  else fib (n-1) + fib (n-2)
   6   	
-> 7   	let main () =
   8   	  let r = fib 20 in
   9   	  Printf.printf "fib(20) = %d\n" r
   10

The program execution starts in the lldb session and we stop at the breakpoint at main. LLDB has a terminal UI mode for stepping through the file. This can be started by typing gui at the lldb prompt.

In this mode we can see both breakpoints highlighted on the line numbers, the backtrace of how we got here, and the current line highlighted. Use Esc to exit the terminal UI mode and go back to the lldb prompt. We will use the lldb prompt for the rest of the post.

Examining the stack

You can step through the OCaml program with lldb commands n and s. After a few n’s, examine the backtrace using the bt command.

(lldb) bt
* thread #1, name = 'fib.exe', stop reason = breakpoint 1.1
  * frame #0: 0x0000aaaaaaaf1084 fib.exe`camlFib.fib_270 at fib.ml:5
    frame #1: 0x0000aaaaaaaf108c fib.exe`camlFib.fib_270 at fib.ml:5
    frame #2: 0x0000aaaaaaaf108c fib.exe`camlFib.fib_270 at fib.ml:5
    frame #3: 0x0000aaaaaaaf108c fib.exe`camlFib.fib_270 at fib.ml:5
    frame #4: 0x0000aaaaaaaf10f4 fib.exe`camlFib.main_272 at fib.ml:8
    frame #5: 0x0000aaaaaaaf11bc fib.exe`camlFib.entry at fib.ml:11
    frame #6: 0x0000aaaaaaaee684 fib.exe`caml_program + 476
    frame #7: 0x0000aaaaaab46b48 fib.exe`caml_start_program + 132
    frame #8: 0x0000aaaaaab46640 fib.exe`caml_main [inlined] caml_startup(argv=) at startup_nat.c:145:7
    frame #9: 0x0000aaaaaab4663c fib.exe`caml_main(argv=) at startup_nat.c:151:3
    frame #10: 0x0000aaaaaaaee310 fib.exe`main(argc=, argv=) at main.c:37:3
    frame #11: 0x0000fffff7d784c4 libc.so.6`__libc_start_call_main(main=(fib.exe`main at main.c:31:1), argc=1, argv=0x0000fffffffffb58) at libc_start_call_main.h:58:16
    frame #12: 0x0000fffff7d78598 libc.so.6`__libc_start_main_impl(main=0x0000aaaaaaba0de0, argc=16, argv=0x000000000000000f, init=, fini=, rtld_fini=, stack_end=) at libc-start.c:360:3
    frame #13: 0x0000aaaaaaaee3b0 fib.exe`_start + 48

You can see the backtrace includes the recursive calls to the fib function and the main function in fib.ml, followed by some assembly functions and a number of functions from the OCaml runtime. In between frame #8 and #5 is where the runtime, written in C, switches into assembly to set up the environment to execute the OCaml program. Then we actually enter the OCaml program at frame #5 via camlFib.entry. This function calls initialisation functions for the program and any dependencies like Stdlib that get used.

Examining values

The support for examining OCaml values in LLDB, as you would for say C, is a bit lacking. Not enough information is being emitted by the OCaml compiler to do this yet. So we need to understand how OCaml represents values at runtime and what the OCaml calling conventions are. First we will look at examining values.

Here we are on ARM64, so our registers are named x0-x30 with sp representing the stack pointer. The first 16 arguments are passed in registers, starting from register x0. So the argument to the fib function should be in the x0 register. We also know that the argument to fib is an integer. OCaml uses 63-bit tagged integers (on 64-bit machines) with the least-significant bit set to 1. Given a machine word or a register holding an OCaml integer, the integer value is obtained by right shifting the value by 1.
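The tagging scheme is easy to check in OCaml itself. This is a small sketch with helper names of my own (tag, untag, is_immediate) that models machine words as ordinary OCaml ints; it shows the arithmetic only and does not read real registers.

```ocaml
(* Encode an integer the way the runtime stores it: shift left, set bit 0. *)
let tag n = (n lsl 1) lor 1

(* Decode a tagged word. asr is an arithmetic shift, so negatives survive. *)
let untag w = w asr 1

(* A word with the lowest bit set is an immediate integer, not a pointer. *)
let is_immediate w = w land 1 = 1

let () =
  Printf.printf "tag 20 = %d\n" (tag 20);
  Printf.printf "untag 11 = %d\n" (untag 11);
  Printf.printf "untag (tag (-3)) = %d\n" (untag (tag (-3)))
```

So a register holding 41 encodes the OCaml integer 20, and a register holding 11 encodes 5.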

Putting that all together, we can examine the arguments to fib at the breakpoint in fib like so.

(lldb) p $x0 >> 1
(unsigned long) 5

Given we have a recursive fib function this printing corresponds to fib(5). Have a go at moving up and down the recursive fib calls using up or down and print out the arguments. You can also examine the evaluation order of arguments in fib, noting that the evaluation order of arguments in OCaml is unspecified but 5.2.0 evaluates right-to-left.
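If you would rather observe the evaluation order from inside OCaml than from the debugger, here is a minimal sketch. The note helper is my own invention: it records a label each time an argument is evaluated. Because the order is unspecified, the program prints whatever your compiler does rather than assuming a result.

```ocaml
(* Labels of evaluated arguments, newest first. *)
let order = ref []

(* Record that an argument was evaluated, then return its value. *)
let note label v = order := label :: !order; v

let add a b = a + b

(* Both arguments have a visible side effect, so the order shows up. *)
let sum = add (note "left" 1) (note "right" 2)

let observed = List.rev !order

let () =
  Printf.printf "sum = %d, evaluated in order: %s\n"
    sum (String.concat ", " observed)
```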

Advanced printing

Examining values using bit shifting is tedious. We can do better by writing our own printing functions in Python. The OCaml compiler distribution comes with some scripts to make examining OCaml values in LLDB easier. Note they have historically been used by OCaml maintainers to develop the compiler, so they might be a little rough or missing features (PRs to improve this situation are welcome). With that, let’s see what we can do.

Since we are using OCaml 5.2.0, we need to get that source code.

# I'm working within ~/projects directory on my machine
$ git clone https://github.com/ocaml/ocaml --branch 5.2.0

Start up a new lldb session, load the lldb script, and get to a breakpoint in the recursive fib calls.

$ lldb ./fib.exe
(lldb) command script import ../ocaml/tools/lldb.py
(lldb) br s -f fib.ml -l 1
Process 12014 launched: '/home/tsmc/projects/fib.exe' (aarch64)
Process 12014 stopped
* thread #1, name = 'fib.exe', stop reason = breakpoint 4.1
    frame #0: 0x0000aaaaaaaf1038 fib.exe`camlFib.fib_270 at fib.ml:2
   1   	(* fib.ml *)
-> 2   	let rec fib n =
   3   	  if n = 0 then 0
   4   	  else if n = 1 then 1
   5   	  else fib (n-1) + fib (n-2)
   6   	
   7   	let main () =

As earlier, the first argument is in the x0 register. We can examine the value now with the Python script.

(lldb) p (value)$x0
(value) 41 caml:20

value is the type of OCaml values defined in the OCaml runtime. The script tools/lldb.py installs a pretty printer for values of type value. Here it pretty-prints the first argument, which is 20.

We can also print other kinds of OCaml values. Create this file with some interesting OCaml values:

$ cat test_blocks.ml
(* test_blocks.ml *)

type t = {s : string; i : int}

let main a b =
  print_endline "Hello, world!";
  print_endline a;
  print_endline b.s

let _ = main "foo" {s = "bar"; i = 42}

Now we need to compile it, start an lldb session and break on the main function.

$ ocamlopt -g -o test_blocks.exe test_blocks.ml
$ lldb ./test_blocks.exe
(lldb) target create "./test_blocks.exe"
Current executable set to '/home/tsmc/projects/test_blocks.exe' (aarch64).
(lldb) command script import ../ocaml/tools/lldb.py
OCaml support module loaded. Values of type 'value' will now
print as OCaml values, and an 'ocaml' command is available for
heap exploration (see 'help ocaml' for more information).
(lldb) br s -n camlTest_blocks.main_273
Breakpoint 1: where = test_blocks.exe`camlTest_blocks.main_273 + 40, address = 0x0000000000019ab0
(lldb) run
Process 12043 launched: '/home/tsmc/projects/test_blocks.exe' (aarch64)
Process 12043 stopped
* thread #1, name = 'test_blocks.exe', stop reason = breakpoint 1.1
    frame #0: 0x0000aaaaaaab9ab0 test_blocks.exe`camlTest_blocks.main_273 at test_blocks.ml:4
   1   	type t = {s : string; i : int}
   2   	
   3   	let main a b =
-> 4   	  print_endline "Hello, world!";
   5   	  print_endline a;
   6   	  print_endline b.s
   7   	
(lldb)

Let’s examine the two arguments to main:

(lldb) p (value)$x0
(value) 187649984891864 caml(-):'Hello, world!'<13>
(lldb) p (value)$x1
(value) 187649984891808 caml(-):('bar', 42)

What is going on here? Didn’t we say the first argument is in x0? Our breakpoint has been set a little after the function entry, by which point the original value of x0 has been stored on the stack and the x0 register has been reused to hold the argument to print_endline "Hello, world!". The second argument in x1 is as expected.

To find the original x0 value we need to look at assembly (don’t worry too much about the specifics of ARM assembly).

(lldb) dis
test_blocks.exe`camlTest_blocks.main_273:
    0xaaaaaaab9a88 <+0>:  ldr    x16, [x28, #0x28]
    0xaaaaaaab9a8c <+4>:  add    x16, x16, #0x158
    0xaaaaaaab9a90 <+8>:  cmp    sp, x16
    0xaaaaaaab9a94 <+12>: b.lo   0xaaaaaaab9a78 ; camlStd_exit.code_end
    0xaaaaaaab9a98 <+16>: sub    sp, sp, #0x20
    0xaaaaaaab9a9c <+20>: str    x30, [sp, #0x18]
    0xaaaaaaab9aa0 <+24>: str    x0, [sp]
    0xaaaaaaab9aa4 <+28>: str    x1, [sp, #0x8]
(lldb) reg r sp
      sp = 0x0000aaaaaab3d160
(lldb) memory read -s8 -fx -l2 0x0000aaaaaab3d160
0xaaaaaab3d160: 0x0000aaaaaab10bc8 0x0000aaaaaab10ba0
0xaaaaaab3d170: 0x0000fffffffff8e0 0x0000aaaaaaab9b38
0xaaaaaab3d180: 0x0000000000000000 0x0000aaaaaaab94fc
0xaaaaaab3d190: 0x0000000000000000 0x0000aaaaaaae05c8
(lldb) p (value)0x0000aaaaaab10bc8
(value) 187649984891848 caml(-):'foo'<3>

The disassembled code is the function prologue, which saves x0 onto the stack using str x0, [sp]. To get the original value for x0 we read sp (the stack pointer), retrieve the data at that address and then print it using value. We get back the argument passed to main, which was foo, and can confirm that by looking at the source code.
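The shapes the pretty printer reported above (a string block, and a record printed as a two-field block) can also be inspected from inside OCaml using the standard library’s Obj module. This is a sketch for exploration only: Obj is an unsafe, low-level interface that ordinary programs should avoid, and the variable names are my own.

```ocaml
type t = {s : string; i : int}

(* Untyped views of the same kinds of values passed to main. *)
let record = Obj.repr {s = "bar"; i = 42}
let text = Obj.repr "foo"

(* Field 0 of the record block points at the string block for s. *)
let field_s : string = Obj.obj (Obj.field record 0)
(* Field 1 holds the tagged integer for i. *)
let field_i : int = Obj.obj (Obj.field record 1)

let () =
  Printf.printf "record: block=%b size=%d tag=%d\n"
    (Obj.is_block record) (Obj.size record) (Obj.tag record);
  Printf.printf "fields: s=%S i=%d\n" field_s field_i;
  Printf.printf "string: block=%b has string tag=%b\n"
    (Obj.is_block text) (Obj.tag text = Obj.string_tag);
  Printf.printf "int 42 is immediate: %b\n" (Obj.is_int (Obj.repr 42))
```

Strings and records live in heap blocks and are passed as pointers, while integers are passed directly as tagged words.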

Extras

A few useful extras for debugging OCaml programs.

You can set breakpoints based on addresses, which is useful when you know a specific instruction you want to break on. From the previous session, set a breakpoint on the address of the sub sp, sp, #0x20 instruction.

(lldb) br s -a 0xaaaaaaab9a98
Breakpoint 7: where = test_blocks.exe`camlTest_blocks.main_273 + 16, address = 0x0000aaaaaaab9a98
(lldb) run
There is a running process, kill it and restart?: [Y/n] y
Process 12070 exited with status = 9 (0x00000009) killed
Process 12078 launched: '/home/tsmc/projects/test_blocks.exe' (aarch64)
Process 12078 stopped
* thread #1, name = 'test_blocks.exe', stop reason = breakpoint 7.1
    frame #0: 0x0000aaaaaaab9a98 test_blocks.exe`camlTest_blocks.main_273 at test_blocks.ml:3
   1   	type t = {s : string; i : int}
   2   	
-> 3   	let main a b =
   4   	  print_endline "Hello, world!";
   5   	  print_endline a;
   6   	  print_endline b.s
   7   	
(lldb) p (value)$x0
(value) 187649984891848 caml(-):'foo'<3>

Now we can print out the value of x0 before it gets saved on the stack.

We can also look up symbols in the executable using image lookup -r -n if we are not sure of the specific name we want.

(lldb) image lookup -r -n camlTest
4 matches found in /home/tsmc/projects/test_blocks.exe:
        Address: test_blocks.exe[0x0000000000019a78] (test_blocks.exe.PT_LOAD[0]..text + 1912)
        Summary: test_blocks.exe`camlStd_exit.code_end
        Address: test_blocks.exe[0x0000000000019b48] (test_blocks.exe.PT_LOAD[0]..text + 2120)
        Summary: test_blocks.exe`camlTest_blocks.code_end
        Address: test_blocks.exe[0x0000000000019ae0] (test_blocks.exe.PT_LOAD[0]..text + 2016)
        Summary: test_blocks.exe`camlTest_blocks.entry
        Address: test_blocks.exe[0x0000000000019a88] (test_blocks.exe.PT_LOAD[0]..text + 1928)
        Summary: test_blocks.exe`camlTest_blocks.main_273

Finally, setting breakpoints on macOS with LLDB is slightly broken, so you need to look up the symbol name and then set the breakpoint based on the address of the symbol. We can combine image lookup with setting breakpoints on addresses to debug on macOS.

More for later

There is a lot more to say about debugging OCaml programs using LLDB and there is ongoing work to improve debugger support in OCaml. Get in touch if you would like to be involved.

Debugging OCaml with Emacs

Tim McGilchrist

March 25, 2024

This post started as a summary of my March Hacking Days effort at Tarides.

I have been working on improving the debugging situation for OCaml and wanted to see how easily I could set up debug support in Emacs using DAP. Debug Adapter Protocol (DAP) is a wire protocol for communicating between an editor or IDE and a debug server like LLDB or GDB, providing an abstraction over debugging, similar to how Language Server Protocol (LSP) provides language support for editors.

OCaml comes with support for debugging native programs with GDB and LLDB, and bytecode programs using ocamldebug and earlybird. In this post we will cover setting up and debugging both kinds of programs. I am using an M3 Mac so all examples will show ARM64 assembly and macOS specific paths. The same setup should work on Linux. I use prelude to configure my Emacs, with my own customisations in .emacs/personal; adjust for your own Emacs setup.

Let’s start with the following program to compute a Fibonacci number:

(* fib.ml *)
let rec fib n =
  if n = 0 then 0
  else if n = 1 then 1
  else fib (n-1) + fib (n-2)

let main () =
  let r = fib 20 in
  Printf.printf "fib(20) = %d" r

let _ = main ()

And this dune configuration in the same directory:

(executable
 (name fib)
 (modules fib)
 (modes exe byte))

And this dune-project configuration in the same directory:

(lang dune 3.11)
(map_workspace_root false)

Create an empty opam switch in the same directory and install dune:

$ opam switch create . 5.1.1 --no-install
$ opam install dune

This gives us everything we need to try out all the different debuggers.

Emacs configuration

Emacs has dap-mode, which provides everything we need for DAP integration. Install it using M-x package-install and choose the dap-mode package. I have the following lines in my .emacs/personal/init.el that require the packages we need and set up some convenient key bindings:

; Require dap-mode plus the two extra files we need
(require 'dap-mode)
(require 'dap-codelldb)
(require 'dap-ocaml)

; Setup key bindings using use-package.
(use-package dap-mode
  :bind (("C-c M-n" . dap-next)
         ("C-c M-s" . dap-step-in)
         ("C-c M-a" . dap-step-out)
         ("C-c M-w" . dap-continue)))

Save and restart Emacs, then we can move onto setting up bytecode debugging.

Bytecode debugging

The earlybird project provides DAP support for debugging OCaml bytecode. OCaml has a bytecode compiler that produces portable bytecode executables which can be run with ocamlrun, the interpreter for OCaml bytecode. Earlybird uses the (undocumented) protocol of ocamldebug to communicate with a bytecode executable, inheriting the same functionality as ocamldebug.

Start by installing the earlybird package:

opam install earlybird

Then create a file in .vscode/launch.json with this configuration:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "OCaml earlybird (experimental)",
            "type": "ocaml.earlybird",
            "request": "launch",
            "program": "./_build/default/fib.bc",
            "stopOnEntry": true,
            "cwd": "${workspaceFolder}"
        }
    ]
}

Build the project with dune build to create the fib.bc bytecode file. Finally start a debugger by running M-x dap-debug. It will prompt you to choose a session; we want OCaml earlybird (experimental) from the named configuration above. It will start earlybird and immediately stop it before executing any OCaml code.

To set breakpoints you need to open the OCaml source file in _build/default/fib.ml and click on the source lines you want to stop at. Use the buttons to control the debugger or use the keybindings we added to step through the execution. Curiously, no keybindings are pre-defined, so here I’ve tried to reuse the mappings from ocamldebug.

Native debugging

OCaml can also produce native binaries that can be debugged using native debuggers like GDB or LLDB, depending on your platform. Here we will use LLDB on macOS, but Linux LLDB works too – just change the name of the program you want to debug.

Add another section to .vscode/launch.json for starting lldb.

        {
            "type": "lldb",
            "request": "launch",
            "name": "LLDB with ocamlopt",
            "program": "./fib.exe",
            "args": [],
            "stopOnEntry": true,
            "cwd": "${workspaceFolder}"
        },

Run M-x dap-codelldb-setup, which will download the codelldb DAP program that we are using to communicate with LLDB. This gets installed into .extension/vscode/codelldb. Now compile the fib program with ocamlopt -g -o fib.exe fib.ml and start up a debugger session with M-x dap-debug, choosing the LLDB with ocamlopt option.

As currently set up, DAP with LLDB on macOS is a little broken: it is missing support for setting breakpoints on symbols and on line numbers in source code. Fixes for both will be coming soon. Linux LLDB works better in this scenario. Setting breakpoints using line numbers in source code requires fixes to the OCaml compiler, while setting breakpoints on symbols is supported in codelldb but not exposed in dap-mode.

The second option is debugging native binaries built with Dune, which is slightly different for two reasons. First, Dune places the executable in _build/default/fib.exe, and second, Dune produces slightly different symbols. Start by adding a new section in .vscode/launch.json for Dune built executables:

        {
            "type": "lldb",
            "request": "launch",
            "name": "LLDB with Dune",
            "program": "./_build/default/fib.exe",
            "args": [],
            "stopOnEntry": true,
            "cwd": "${workspaceFolder}"
        },

Remove the old fib.exe in the project directory (dune will complain if you don’t) and run dune build. Start up a new DAP session with M-x dap-debug and choose LLDB with Dune. You should see the same debugger session as before.

Conclusion

Debugging OCaml with DAP inside Emacs is possible. There are working options for both bytecode programs and native programs which work reasonably well.

Use dap-mode with:

(require 'dap-mode)
(require 'dap-codelldb)
(require 'dap-ocaml)

(use-package dap-mode
  :bind (("C-c M-n" . dap-next)
         ("C-c M-s" . dap-step-in)
         ("C-c M-a" . dap-step-out)
         ("C-c M-w" . dap-continue)))

and a launch.json of

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "OCaml earlybird (experimental)",
            "type": "ocaml.earlybird",
            "request": "launch",
            "program": "./_build/default/fib.bc",
            "stopOnEntry": true,
            "cwd": "${workspaceFolder}"
        },
        {
            "type": "lldb",
            "request": "launch",
            "name": "LLDB with Dune",
            "program": "./_build/default/fib.exe",
            "args": [],
            "stopOnEntry": true,
            "cwd": "${workspaceFolder}"
        },
        {
            "type": "lldb",
            "request": "launch",
            "name": "LLDB with ocamlopt",
            "program": "./fib.exe",
            "args": [],
            "stopOnEntry": true,
            "cwd": "${workspaceFolder}"
        }
    ]
}

The same setup will work under VSCode with the CodeLLDB and OCaml Platform extensions installed. Happy Emacs debugging.

Future Work

I’m working on improving the OCaml debugging experience on macOS and Linux. Currently the macOS LLDB experience is behind that on Linux LLDB, so that is the first goal. Then I want to improve the DWARF encodings for OCaml and generally improve the native debugger experience.

ICFP 2022 Review

Tim McGilchrist

October 11, 2022

I wrote up the highlights of ICFP 2022 for the Tarides blog. It was great to get back to in-person conferences again and to get the chance to meet people. Thanks to my employer Tarides for covering the cost.

For me personally the OCaml Workshop was fantastic from beginning to end; read the blog post for the full details. Outside of OCaml I spent time in the Haskell Implementors Workshop, hearing about the new features for GHC, and was excited by the progress that Cabal is making.

My take away research topics are:

Typed Effect Systems especially Koka.
Delimited Continuations for both OCaml and Haskell.
Lockfree data structures, Reagents and STM.

OCaml with Emacs in 2022

Tim McGilchrist

September 7, 2022

I am revisiting my OCaml setup post from 2021 because I needed to set up a new macOS machine. The official OCaml site points newcomers to Visual Studio Code, which is a fine choice to get started. However, I am using Emacs and have done so for over 20 years, and I did not find a good description of how to set things up with Emacs. Here I could digress into why Emacs, but I will just strongly encourage any developer to invest heavily in learning their editor, with Emacs being a fine choice.

Beginnings

On macOS I use the pre-compiled GUI version of Emacs from emacsformacosx, preferring that over compiling it by hand or using the version in homebrew. I have done both previously, but find the emacsformacosx version saves me time and effort; plus the GUI version was removed from homebrew at some point in the past.

Next I choose to use an Emacs distro over the base Emacs setup, again this is a time saving choice and especially useful if you are new to Emacs. Use Prelude, which is an enhanced Emacs 25.1+ distribution that should make your experience with Emacs both more pleasant and more powerful. It gives a great modern setup for Emacs with minimal fuss. Once that is cloned and installed the Lisp config begins.

Prelude configuration

Prelude provides a base set of packages along with some configuration. The module configuration goes into ~/.emacs.d/tsmc/prelude-modules.el, where tsmc is your macOS user. The same path applies on Linux. A sample prelude-modules.el is provided at https://github.com/bbatsov/prelude/blob/master/sample/prelude-modules.el

I choose the following modules to enable with prelude-lsp and prelude-ocaml being the core OCaml related choices. The other bits are optional but useful for editing lisp or navigating code.

(require 'prelude-ivy) ;; A mighty modern alternative to ido
(require 'prelude-company)
(require 'prelude-emacs-lisp)
(require 'prelude-lisp) ;; Common setup for Lisp-like languages
(require 'prelude-lsp) ;; Base setup for the Language Server Protocol
(require 'prelude-ocaml)

Now for the customisation to get LSP working properly. There are 3 main pieces:

direnv - for automatically configuring shell environments
ocaml-lsp-server - the core lsp implementation for OCaml
lsp-mode - the Emacs mode that drives everything

direnv the necessary magic

direnv is a small program to load/unload environment variables based on $PWD (current working directory). This program ensures that when you open an OCaml file the correct opam switch is chosen and the tools installed in that switch are made available to Emacs. Opam is the OCaml package manager and manages local sandboxes of packages called switches. Without direnv, Emacs will not find the correct tools and you would need to mess with Emacs paths to get it right. I have done that, and it is much simpler with direnv.

So brew install direnv and create a .envrc file in an OCaml project with eval $(opam env --set-switch) inside. Compared to my previous post, I am now using local opam switches, which exist inside an OCaml project. They are created with opam switch create . 4.14.0 --with-test --deps-only -y and appear as an _opam directory in the project root. Next run direnv allow to tell direnv it is safe to use the .envrc file in this directory. The reason I have switched is that I often need to test different OCaml versions, so removing the _opam directory and recreating it is the simpler option.

OCaml LSP Server

OCaml LSP server needs to be installed in the current switch, so run opam update && opam install ocaml-lsp-server -y. This makes ocaml-lsp-server available to Emacs via direnv.

There is an opportunity here to use Emacs Lisp to install ocaml-lsp-server if it is missing, or to allow lsp-mode to download and install it itself. I would like to have this working in future. Next, back into Lisp.

Emacs LSP mode

Create a file init.el in ~/.emacs.d/tsmc/ substituting your Unix user name for tsmc. Thanks to emacs-prelude the configuration is very small.

;;; init.el --- @tsmc configuration entry point.

(prelude-require-packages '(use-package direnv))
;; Use direnv to select the correct opam switch and set the path
;; that Emacs will use to run commands like ocamllsp, merlin or dune build.

(use-package lsp-mode
  :hook
  (tuareg-mode . lsp))
;; Attach lsp hook to modes that require it, here we bind to tuareg-mode rather than
;; prelude-ocaml. For unknown reasons the latter does not bind properly and does not
;; start lsp-mode

(provide 'tsmc)
;;; init.el ends here

We require two packages, use-package and direnv, and then tell Emacs to start lsp-mode when tuareg-mode starts. Tuareg-mode is one of the OCaml modes available for Emacs, the other being caml-mode, which I have not really used. Now quit and restart Emacs. Open an ml file inside the project you set up earlier and ocaml-lsp should start up.

The types for expressions and modules will display on mouse hover or beside the definition. Hovering the mouse over a function or type will display the type plus the documentation comments for it. A successful dune build for the project is required to generate the data used by ocaml-lsp-server. At this point in time Prelude relies on merlin, an assistant for editing OCaml code that is used by ocaml-lsp-server internally but is also available as a standalone tool. So I often have both installed; opam install merlin should be enough to get it installed too.

At this point I am mostly happy: the types and documentation display as required. Navigating using M-. shows a preview of the type or function under point, and return takes me to the definition. This is vastly improved in OCaml 4.14 (with the work on Shapes), which I have switched to for everything I can. Switching between ml and mli files is C-c C-a, and there is more; run M-x describe-mode to show everything available.

The annoyances are more fundamental to how LSP wants to work. It uses what I am calling a push based interaction, where it generates the information for types and documentation in the background and pushes it into the Emacs buffer. You never need to ask what the type is; it will display for you. Sometimes, though, I want to ask for the type of something inside an expression, and with LSP you are encouraged to hover the mouse over it rather than use a key binding. So far I haven’t found the Lisp function that drives the hover functionality, but when I do I will bind it to a key. The second issue is also around mouse usage to drive LSP functionality like rename or annotate types. I would strongly prefer a key chord driven approach to that. Again I will set this up once I find the right lsp functions. For now I use C-c C-t from merlin to summon the types for things.

Overall the experience is solid. Types and docs appear as required. Navigation works. The speed has been good so far. LSP mode is less janky than it was 1 year ago.

Alternatives

There is a fine alternative LSP mode for Emacs, Eglot. It takes a more minimal approach and uses a pull based interaction, where you ask for the information with key bindings rather than having it pushed at you via UI elements. For example, the type of a function is requested rather than shown by default.

The corresponding configuration I was using previously is:

(use-package eglot
  :config
  (define-key eglot-mode-map
    (kbd "C-c C-t") #'eldoc-print-current-symbol-info)

  :hook
  ((tuareg-mode . eglot-ensure)))

Again use-package configures the mode, and the hook triggers Eglot to be loaded when tuareg-mode is. The eglot-ensure function starts an Eglot session for the current buffer if there isn’t one. No further configuration is needed in Emacs, as Eglot knows the LSP server is called ocamllsp and will look for it on the Unix PATH.

Summary

Getting started with OCaml using Emacs can be a struggle. Emacs is a fine editor but the documentation can be difficult to handle. Hopefully following through this setup will yield a working Emacs / LSP setup for OCaml.

In future I want to try binding more things to keys so I use the mouse less and streamline the installing of the ocaml lsp server. Then after that adding support for more interesting code interactions like extracting modules or hoisting let bindings would be nice to have. Happy hacking!

Getting Started with OCaml in 2021

Tim McGilchrist

October 29, 2021

OCaml is an awesome language with many fine features. I enjoy using it immensely!

Unfortunately, it suffers from a perceived weakness in how to get started. Like any new skill, there can be a learning curve. The tools are all there, but combining them for a good developer experience might seem difficult at first.

Often I’ve found that the barrier for getting into a new language is less about the new features of that language and more about learning the tools to become productive in that language. The package managers, build tools, and editor integration of a new language can be confusing, making for an awful experience.

Perhaps my opinionated guide to getting started with OCaml in 2021 will help reduce any mental blocks against trying out this excellent language.

Install Opam

First it’s necessary to install OCaml and Opam. Opam is the default package manager for OCaml projects. Ignore the other options for now; once you know more about what you want, you can make an informed choice. For now, if you speak Opam, you’ll get the most out of the community.

On Linux, use your local package manager, e.g., apt-get install opam for Debian and apt install opam for Ubuntu. For macOS, use Homebrew: brew install opam. I’ll assume if you run something else, you can handle looking up how to install things.

On my Mac I get Opam 2.1.0:

$ opam --version
2.1.0

Once you’ve got Opam installed, you should be able to move on to the next step.

Choose an OCaml Version

I strongly recommend that you pick a single OCaml version that your project will compile against. Supporting multiple compiler versions is possible and usually not too difficult, but it complicates the process right now.

Running opam switch list-available will show you a long list of every possible OCaml compiler. Choose the latest mainline compiler, identified by Official release X.XX.X, where currently the latest is 4.13.0. Ignore the others.

opam switch list-available
...
ocaml-variants                         4.12.0+domains                         OCaml 4.12.0, with support for multicore domains
ocaml-variants                         4.12.0+domains+effects                 OCaml 4.12.0, with support for multicore domains and effects
ocaml-variants                         4.12.0+options                         Official release of OCaml 4.12.0
ocaml-base-compiler                    4.12.1                                 Official release 4.12.1
ocaml-variants                         4.12.1+options                         Official release of OCaml 4.12.1
ocaml-variants                         4.12.2+trunk                           Latest 4.12 development
ocaml-base-compiler                    4.13.0~alpha1                          First alpha release of OCaml 4.13.0
ocaml-variants                         4.13.0~alpha1+options                  First alpha release of OCaml 4.13.0
ocaml-base-compiler                    4.13.0~alpha2                          Second alpha release of OCaml 4.13.0
ocaml-variants                         4.13.0~alpha2+options                  Second alpha release of OCaml 4.13.0
ocaml-base-compiler                    4.13.0~beta1                           First beta release of OCaml 4.13.0
ocaml-variants                         4.13.0~beta1+options                   First beta release of OCaml 4.13.0
ocaml-base-compiler                    4.13.0~rc1                             First release candidate of OCaml 4.13.0
ocaml-variants                         4.13.0~rc1+options                     First release candidate of OCaml 4.13.0
ocaml-base-compiler                    4.13.0~rc2                             Second release candidate of OCaml 4.13.0
ocaml-variants                         4.13.0~rc2+options                     Second release candidate of OCaml 4.13.0
ocaml-base-compiler                    4.13.0                                 Official release 4.13.0
ocaml-variants                         4.13.0+options                         Official release of OCaml 4.13.0
ocaml-variants                         4.13.1+trunk                           Latest 4.13 development
ocaml-variants                         4.14.0+trunk                           Current trunk
...

At this point, install the latest OCaml 4.13.0:

$ opam switch create 4.13.0

<><> Installing new switch packages <><><><><><><><><><><><><><><><><><><><>  🐫
Switch invariant: ["ocaml-base-compiler" {= "4.13.0"} | "ocaml-system" {= "4.13.0"}]

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><><><><>  🐫
∗ installed base-bigarray.base
∗ installed base-threads.base
∗ installed base-unix.base
∗ installed ocaml-options-vanilla.1
⬇ retrieved ocaml-base-compiler.4.13.0  (https://opam.ocaml.org/cache)
∗ installed ocaml-base-compiler.4.13.0
∗ installed ocaml-config.2
∗ installed ocaml.4.13.0
Done.

You can start using this version by typing the following:

$ opam switch set 4.13.0

And verify which switch you are using:

$ opam switch show
4.13.0

When you work with several OCaml projects, it’s best to create a switch per project, as it keeps each project isolated and prevents issues with installing conflicting versions of libraries. For example, I use a naming scheme of ocaml-version-project-name, e.g., 4.13.0-ocurrent. Then in each project directory, run opam switch link 4.13.0-ocurrent to set up that named switch for that specific directory. Opam will take care of setting that switch in your shell when you change into that directory.

Creating Your Project Directory

For this step we need the Dune build tool, so go ahead and install it with opam install dune. Dune comes with a simple scaffolding command to create an empty project that is really useful to get started.

I’m calling my project box, so run:

$ dune init proj box
Success: initialized project component named box

In the project generated, we get a library component, a CLI, and a test component, which will all compile out of the box.

$ cd box
$ tree
.
├── bin
│   ├── dune
│   └── main.ml
├── box.opam
├── lib
│   └── dune
└── test
    ├── box.ml
    └── dune

3 directories, 6 files

Let’s try a compile:

$ dune build @all
Info: Creating file dune-project with this contents:
| (lang dune 2.8)
| (name box)

Running the CLI:

$ dune exec bin/main.exe
Hello, World!

Each of the bin, lib, and test directories contains the source code in the form of *.ml files, along with a dune file that tells Dune how to build the source and which libraries it depends on. The bin/dune file declares an executable with the public name box that depends on the box library.

(executable
 (public_name box)
 (name main)
 (libraries box))

Adding a Dependency

CLI tools require command line parsing, and Cmdliner is a common library that implements it. We need to add it in two places: first in the dune-project file, to get it installed, and then in bin/dune, to say where we’re using it.

One small digression: when generating our project, dune created a box.opam file. This describes our project to Opam, telling it what libraries it requires and what the project does. You need this if you ever publish a package for other people to use. Newer versions of Dune can generate the box.opam file from a dune-project file. Having a single source of information is helpful, so let’s create the dune-project file:

(lang dune 2.8)
(name box)

(generate_opam_files true)

(package
 (name box)
 (depends
  (ocaml (>= 4.13.0))
  (cmdliner (>= 0.9.8)))
 (synopsis "Box cli"))

Remove the existing file with rm box.opam to test the generation. Now run dune build @all to regenerate the Opam file. This file should be checked in, and any further edits should be made in the top-level dune-project file. The generated box.opam should look like this:

$ cat box.opam
# This file is generated by dune, edit dune-project instead
opam-version: "2.0"
synopsis: "Box cli"
depends: [
  "dune" {>= "2.8"}
  "ocaml" {>= "4.13.0"}
  "cmdliner" {>= "0.9.8"}
  "odoc" {with-doc}
]
build: [
  ["dune" "subst"] {dev}
  [
    "dune"
    "build"
    "-p"
    name
    "-j"
    jobs
    "@install"
    "@runtest" {with-test}
    "@doc" {with-doc}
  ]
]

The final step is to actually install the cmdliner library. Run opam install . --deps-only -ty, which will look at the *.opam files present and install just their dependencies with the correct version bounds. The -y says yes to installing the packages. You can drop it if you would rather confirm by pressing Y yourself, or if you want to review what will be installed. The -t runs the package tests, which isn’t always necessary, but it’s sometimes useful for certain packages with native C components.

Alternatively you could run opam install cmdliner, but as this doesn’t look at the version constraints in your *.opam files, you might not get what you expect.

Editor Tooling

Finally, you’ll want to get comfy with your chosen editor. If you have a preference, use the native LSP support in that editor and install the server with opam install ocaml-lsp-server. OCaml is standardising on LSP for editor interaction. If you have no editor preference, then start with VSCode and install the OCaml LSP package from the Marketplace.

Personally, I’m using Emacs with the LSP mode eglot, which works really nicely, along with some customisations to bind certain LSP actions to keys. I highly recommend getting into Emacs as an editor because the customisation via a fully-featured language, like Lisp, is fantastic if you live in your editor like I do.

This post is an update to an earlier post by Adam in 2017, and I hope this short tutorial helps get you started with OCaml!

Hakyll Blog setup

Tim McGilchrist

August 27, 2021

I wanted to port my blog across from an old Jekyll setup to Hakyll. The Jekyll setup was out of date, and keeping the required Ruby tools installed when I swapped machines was a huge pain. I don’t write Ruby much anymore.

Considering my options, I looked at Hugo and Hakyll, discarding Hugo because I don’t want to keep up with the JS churn, even though it has lots of great resources and themes available. So Hakyll seemed like the best option. I already regularly write Haskell, so the tools will be up to date and I can make it do everything I want by digging into the source code.

My requirements are:

Markdown based workflow
support basic pages
individual post with code highlighting
RSS/Atom feed
GitHub action based build and deploy
support old blog URLs (HTML URL redirects to new url structure)
serve js talks/slides directly from Hakyll
generated sitemap.xml
integrate Google Analytics

Getting Hakyll Setup

First things first! I like the following layout when setting up a basic Haskell project:

$ tree -L 1
.
├── CNAME
├── LICENSE
├── README.md
├── css
├── drafts
├── images
├── index.html
├── lambdafoo.cabal
├── main
├── pages
├── posts
├── talks
└── templates

Initially I used cabal init --cabal-version=2.4 --license=BSD3 -p lambdafoo.com to get a skeleton project with a reasonable cabal file. Then I moved things around, making main/site.hs the entry point for running Hakyll and adding a TODO list of features into the README.md

 * ~~basic pages~~
   * ~~about~~
   * ~~talks~~
   * ~~archive~~
 * ~~individual post with code highlighting~~
 * ~~rss/atom feed~~
 * ~~add rss/atom feed to archive page~~
 * ~~github action build and deploy~~
 * ~~html url redirects to new url structure~~
 * ~~serve js talks/slides directly from Hakyll~~
 * configure dependabot for Haskell
 * ~~add generated sitemap.xml~~
 * ~~integrate Google Analytics~~

These directories are used for Hakyll content:

pages - includes various regular pages on the site like talks or about me
css - includes the style sheets for the HTML
images - the static images for the site
drafts - contains the draft posts I’m writing
talks - contains static JS/HTML based slides from presentations that I want to serve directly from the site
templates - site templates in a markup language for doing page layouts
CNAME - tells GitHub Pages hosting the DNS name for the site

The trickiest part was getting a version of the cabal file that worked with GHC 8.10 and a recent version of Hakyll. I ended up needing to pin Hakyll as hakyll ^>= 4.13 and left the other dependencies floating.

executable site
  main-is:             site.hs
  hs-source-dirs:      main
  default-language:    Haskell2010
  build-depends:
                       base      >= 4.6  && < 5
                     , binary    >= 0.5
                     , directory >= 1.2
                     , filepath  >= 1.3
                     , hakyll    ^>= 4.13
                     , blaze-html
                     , lens
                     , time
                     , aeson
                     , lens-aeson
                     , containers
                     , pandoc
                     , process   >= 1.6
                     , text      >= 1.2

At this point, I could have either continued setting up Hakyll or set up CI. I usually prefer setting up CI as early as possible in a project, so I started there. Here is what that looks like:

Hakyll CI

There are a few options for cloud CI, and my requirements were simple: no cost, easy setup, and integration with GitHub Pages, where I host my site. It was a toss up between CircleCI and GitHub Actions. I’ve had good experience with CircleCI, but I decided to try GitHub Actions.

First, create a directory with mkdir -p .github/workflows/ and add a ci.yml file:

name: CI
on:
  push:
    branches:
      - master
  pull_request:
    types:
      - opened
      - synchronize
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        cabal: ["3.4.0.0"]
        ghc: ["8.10.7"]

The matrix section sets up a build for GHC 8.10.7 and cabal 3.4, which is enough for a simple blog, but it is where you’d add extra options for, say, a library. Next, we use some community GitHub Actions to check out the code and set up Haskell.


    steps:
      - uses: actions/checkout@v2
      - uses: haskell/actions/setup@v1
        id: setup-haskell-cabal
        with:
          ghc-version: ${{ matrix.ghc }}
          cabal-version: ${{ matrix.cabal }}

Here we run cabal update to update our Hackage index and then setup some build caching for our dependencies. You can copy this directly and it should work:


      - name: Cabal Update
        run: |
          cabal v2-update
          cabal v2-freeze $CONFIG
      - uses: actions/cache@v2.1.4
        with:
          path: |
            ${{ steps.setup-haskell-cabal.outputs.cabal-store }}
            dist-newstyle
          key: ${{ runner.os }}-${{ matrix.ghc }}-${{ hashFiles('cabal.project.freeze') }}
          restore-keys: |
            ${{ runner.os }}-${{ matrix.ghc }}-

Then we run the cabal build and Hakyll site build.


      - name: Build Site
        run: |
          cabal v2-build $CONFIG
          cabal exec site build

Adding that to your repo’s main branch should yield a working CI. On top of that, I added a dependabot configuration to check that my GitHub Actions config was up to date.

Add a file dependabot.yml to .github:

version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "daily"
    commit-message:
      prefix: "GA"
      include: "scope"
    labels:
      - "CI"

This will check that your GitHub Actions use the latest versions and open a PR to bump them if they don’t. Something like this for Haskell would be super sweet.

Generating the Site

Let’s quickly walk through the contents of main/site.hs, but there are more in-depth tutorials on the main Hakyll site

{-# LANGUAGE OverloadedStrings #-}

import Hakyll

main :: IO ()
main = hakyll $ do

Here we turn on overloaded strings, import Hakyll, and create a main function. The first rules handle static assets:

  match "images/*" $ do
    route idRoute
    compile copyFileCompiler

  match "css/*" $ do
    route idRoute
    compile compressCssCompiler

These rules serve stylesheets and images from the css and images directories, respectively. This is standard code that can be copied directly; it basically copies the files (compressing the CSS along the way) into the final static site directory _site.

Next I wanted to serve some old talk slides written in HTML and JavaScript directly from my site. I couldn’t find any posts talking about how to do this, but after thinking about it, I realized that I just wanted to serve static assets again, like the css and images above. So that’s exactly what was required! Of course, I lie. I had to fix a few hard coded paths in the HTML, but otherwise it worked.

The layout for talks looks like:

talks
├── erl-syd-2012-webmachine
├── fp-syd-freer-2016
├── fp-syd-higher-2015
├── lambda-jam-2014-raft
├── lambda-jam-2015-ocaml-functors
├── lambda-jam-2016-performance
├── roro-2012-riak
└── scala-syd-2015-modules

So I needed an extra wildcard in my match statement:

  match "talks/**/*" $ do
    route idRoute
    compile $ copyFileCompiler

This content then gets served under lambdafoo.com/talks/scala-syd-2015-modules/. In retrospect, this is an obvious solution to serving any static content generated outside of Hakyll, but it did take me a while to realise it.

Next we load the individual blog posts:

  match "posts/*" $ do
    route $ setExtension "html"
    compile $
      pandocCompiler
        >>= loadAndApplyTemplate "templates/post.html" postCtx
        -- Used by the RSS/Atom feed
        >>= saveSnapshot "content"
        >>= loadAndApplyTemplate "templates/default.html" postCtx
        >>= relativizeUrls

Authoring Posts

After getting a few simple things out of the way, the Markdown-based workflow already worked with Hakyll, so there’s nothing really to see there. Creating a Markdown file with the following YAML metadata header and content is enough to get a simple post working.

---
title: Hakyll Blog setup
author: Tim McGilchrist
date: 2021-02-01 00:00
tags: haskell
description: How I setup my blog with Hakyll
---

Content of post

Deploying

I have a domain lambdafoo.com that I use to serve my blog. GitHub Pages has up-to-date information on how to set this up with your DNS provider.

Here is where choosing GitHub Actions really pays off! There is a community action to do it all! Assuming you’ve turned on GitHub Pages in the settings for your repo, add this to the end of the ci.yml:

      - name: Deploy 🚀
        uses: JamesIves/github-pages-deploy-action@4.1.5
        if: github.ref == 'refs/heads/master'
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          branch: gh-pages # The branch the action should deploy to.
          folder: _site # The folder the action should deploy.
          clean: true # Automatically remove deleted files from the deploy branch

This deploys the output of the Build Site step from folder _site to the branch gh-pages on all master builds (controlled via if: github.ref == 'refs/heads/master').

On the first build, there is a bit of lag to deploy. I had issues with my DNS setup and two personal repositories using the same CNAME values. Apart from that, the process was smooth, and I quickly had a new version working. Again, if you set up dependabot, it will check that this action is up-to-date.


OCaml CI with CircleCI

Tim McGilchrist

February 2, 2021

I wanted to share a simple configuration for running OCaml projects in CircleCI. CircleCI is what I’m using at work, plus it supports a killer feature: you can re-run a failing build with an SSH session into the machine. This one feature has saved me loads of time in debugging CI configuration and flaky tests. Most of the other features are similar to other cloud CI solutions, the documentation is solid, and setting up more advanced workflows is easy enough.

Our requirements are simple: build OCaml projects that use OPAM and run their unit tests.

First we add a file .circleci/config.yml with:

version: 2
jobs:
  build-4.10:
    docker:
      - image: ocaml/opam:ubuntu-18.04-ocaml-4.10
    steps:
      - checkout
      - run:
          name: Build
          command: ./bin/ci

workflows:
  version: 2
  build:
    jobs:
      - build-4.10

This creates a job build-4.10 using the docker image ocaml/opam:ubuntu-18.04-ocaml-4.10 published by the OCaml team. The steps section defines the commands to run: we use the built-in checkout command provided by CircleCI and then a run command that executes a shell script ./bin/ci.

You could use your own docker container in place of ocaml/opam:ubuntu-18.04-ocaml-4.10, maybe pre-installing some things or using a different Linux distro. The command could also be inlined rather than being its own file. I chose to make it a file for two reasons: when you SSH in to debug a build you can just re-run ./bin/ci, and you can re-use the steps between local and CI.

Now to the shell script:

#!/bin/sh -eux

WORKING_DIR=$(pwd)

# Install some extras
sudo apt-get install m4 pkg-config -y

# Make sure opam is setup in your environment.
eval `opam config env`
opam update

# Install each package as a dev dependency
find . -type f -name '*.opam' | sort -d | while read -r P; do
  opam pin add -n "$(basename -s .opam "${P}")" . -y --dev
  opam install --deps-only "$(basename -s .opam "${P}")" -y
  eval `opam config env`
done

# Run the build and the tests
dune build
dune runtest

This configuration is from a project with multiple opam files so we have a find to locate all those files. One gotcha with this is it’ll sort the file names which may not match the dependency order, if that is the case you will need to explicitly list them. If you have a single opam file then replace that with the following (replacing project-name with your project name).

opam pin add -n "project-name" . -y --dev
opam install --deps-only "project-name"  -y

Push that into your GitHub main branch, then choose Set up Project in the CircleCI UI and you should be off and building. From here the CircleCI docs can help with setting up different builds based off branches. Adding other OCaml builds is as easy as duplicating the build-4.10 section in YAML, pointing it to another docker container like 4.08 and adding the new build name to workflows under jobs:.

There’s a working setup in my ocaml-bitbucket project. Good luck!

On EitherT

Tim McGilchrist

June 22, 2018

In choosing Haskell as a language you sign up for a certain class of features and behaviours, e.g. lazy evaluation and static typing.

This gives you a general point in the design space for general purpose languages but like all languages you are still left with a number of choices in building software. These choices are broad, diverse and hotly debated, sometimes they get labelled with Best Practices or the Right way. Like any good engineer you should recognise that everything involves trade-offs and that these labels are trying to hide that. There is not always one best way, an approach has positives and negatives. Knowing those trade offs and deliberately choosing an approach based off them is good engineering.

In programming language communities there are always bikeshedding arguments and Haskell is no different. I want to call out a particular point of view around using exceptions vs data types in Haskell when dealing with errors. Both are valid design points in a wider error handling design space. The exception path is widely associated with Snoyman, who has written much software and written extensively about this in Exceptions Best Practices in Haskell and in the Safe Exceptions package.

I’d like to highlight the negatives, as I see them, of that approach and suggest a different set of trade offs around modelling errors as data types using EitherT/ExceptT.

EitherT is a Monad Transformer built on the familiar Either data type.

data Either a b
  = Left a
  | Right b

where typically Left represents some failure case in this context and Right represents success. Another formulation, from the OCaml community, is:

type ('a,'b) result
  = Ok of 'a
  | Error of 'b

which is more explicit about what the two constructors represent.
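To make the idea concrete, here is a minimal sketch of errors as data using ExceptT from the transformers package. The VendError type, the priceOf table and the purchase function are all invented for illustration; they do not come from any library.

```haskell
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE)

-- Hypothetical error type, invented for this sketch.
data VendError
  = NotEnoughCoins Int   -- how many coins are missing
  | UnknownItem String   -- the item name that was not found
  deriving (Eq, Show)

-- Hypothetical price list: item name to price in coins.
priceOf :: String -> Either VendError Int
priceOf "cola"  = Right 2
priceOf "chips" = Right 1
priceOf other   = Left (UnknownItem other)

-- The failure cases are visible in the type: ExceptT VendError IO.
-- Returns the change owed to the customer.
purchase :: Int -> String -> ExceptT VendError IO Int
purchase coins item = do
  price <- either throwE pure (priceOf item)
  if coins < price
    then throwE (NotEnoughCoins (price - coins))
    else pure (coins - price)
```

Running runExceptT (purchase 5 "cola") in IO gives Right 3, while runExceptT (purchase 1 "cola") gives Left (NotEnoughCoins 1). The caller has to pattern match on the result, so the failure cases cannot be silently ignored.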

Async Exceptions

Back in the beginning, actually in 2000/01, asynchronous exceptions were added to Haskell. [2] Quoting Simon Marlow:

Basically it comes down to this: if we want to be able to interrupt purely functional code, asynchronous exceptions are the only way, because polling would be a side-effect.

So Haskell has async exceptions whether you like them or not; that ship has sailed. This means that any code in IO can throw a runtime exception and, further, any thread can receive an async exception.
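As a small illustration of the first half of that claim, the sketch below shows an IO action whose type gives no hint that it can fail. The average and safeAverage names are made up for this example, and the exception here is an ordinary synchronous one rather than an asynchronous one.

```haskell
import Control.Exception (ArithException, evaluate, try)

-- The type says nothing about failure, yet forcing the result
-- throws an ArithException when the list is empty.
average :: [Int] -> IO Int
average xs = evaluate (sum xs `div` length xs)

-- `try` at a specific exception type turns the hidden failure into a value.
safeAverage :: [Int] -> IO (Either ArithException Int)
safeAverage = try . average
```

Here safeAverage [] returns a Left wrapping the divide by zero exception, while safeAverage [2, 4] returns Right 3.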

So, how should we best deal with this reality and structure our code?

Exceptions

We have exceptions; let’s use them.

To start doing that you need to define your own custom exception type.

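import Control.Exception (Exception)
import Data.Text (Text)
import qualified Data.Text as T
import Data.Typeable (Typeable)

data VapourError =
    InsufficientFunds
  | ItemUnavailable Text
  | MachineMalfunction Text
  deriving Typeable

-- Write a reasonable Show instance for each error
instance Show VapourError where
  show a = case a of
    InsufficientFunds -> "Insufficient funds."
    -- The payloads are Text, so unpack them before appending to a String
    ItemUnavailable i -> "Item " ++ T.unpack i ++ " unavailable."
    MachineMalfunction e -> "Hardware malfunction " ++ T.unpack e ++ "."

instance Exception VapourError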

The three steps we need are:

Build the custom error type as a data type
Provide a Show instance; this could be derived, but your error messages would not be great.
Make your custom error an instance of Exception

At this point you can use throw, catch and handle with your custom error type.

runVendingMachine :: VendingMachineState -> Coin
                   -> IO Product
runVendingMachine state coin = do
  unless (coin > 0) $ throw InsufficientFunds
  dispenseItem state coin

dispenseItem :: VendingMachineState -> Coin
             -> IO Product
dispenseItem = ....

Looking at the signature of runVendingMachine you can see that it returns a Product by running a computation in IO. The problem is that the signature gives no indication of how it might fail beyond being in IO, which as we saw earlier can fail with anything. So as a consumer of this function, how are you to know which exceptions to catch? Your options are:

Catch all exceptions - clearly dangerous and wrong
Catch a subset of exceptions - better but tricky to do correctly

The first option is dangerous because catching all exceptions includes asynchronous exceptions like stack/heap overflow, thread killed and user interrupt. The documentation in safe-exceptions is particularly helpful here and I recommend you read it thoroughly; it is well written. The short version is that you should only catch certain exceptions: trying to handle StackOverflow, HeapOverflow and ThreadKilled exceptions could cause your program to crash or behave in unexpected ways.

The second option is error prone. The process for finding the possible exceptions involves reading the source code and reading the haddock docs, with the goal of finding the set of sensible exceptions you need to put into a catch or handle call. Have you found all the places an exception might be thrown? What about if you pull in a new dependency, does it throw exceptions? What about a sub-dependency of a dependency?

What about the functions runVendingMachine calls? And their functions? To me it feels like going back to Javascript or Ruby land and giving up on some of the benefits of a typed language. I want the types to help me find the places I need to consider the errors, just like pattern matching does for data types.
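As a rough sketch of the second option, try from Control.Exception can be pinned to a single exception type so that everything else keeps propagating. The insertCoin and vend functions below are hypothetical stand-ins for the vending machine code, with coins simplified to Int and products to String.

```haskell
import Control.Exception (Exception, throwIO, try)

data VapourError
  = InsufficientFunds
  | ItemUnavailable String
  deriving Show

instance Exception VapourError

-- Hypothetical stand-in for runVendingMachine.
insertCoin :: Int -> IO String
insertCoin coin
  | coin > 0  = pure "cola"
  | otherwise = throwIO InsufficientFunds

-- The annotation on try restricts the handler to VapourError;
-- any other exception, including async ones, is not intercepted.
vend :: Int -> IO String
vend coin = do
  result <- try (insertCoin coin) :: IO (Either VapourError String)
  pure $ case result of
    Left err   -> "refused: " ++ show err
    Right item -> "dispensed: " ++ item

main :: IO ()
main = mapM_ (\c -> vend c >>= putStrLn) [1, 0]
```

Notice that the type of insertCoin still says nothing about VapourError. The caller only knows to write that annotation because they read the source.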

The other less obvious (perhaps) issue is that you force the consumers of your function to know all the gory details of exceptions in Haskell, which ones are safe to catch and what to do. Getting this right is hard and tricky, and really belongs in a library so that it can be written once and reused.

Finally, the behaviour of a Haskell system in production is such that throwing an exception yields exactly what the Show instance for VapourError produces. It wouldn’t give you a classic stack trace (unless you set that up), so you lose the context of where the exception was raised and what was happening around it. At a previous workplace we spent many weeks tracking down SSL and connection reset exceptions that occurred in a base library but bubbled out through multiple layers of application code. It wasn’t fun.

This style is perfect for a quick script to munge some data, or an ICFP programming contest.

If you really need exceptions, use the bracket pattern or a library like safe-exceptions. Keep the complexity contained, and write that code very carefully.
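For reference, a minimal sketch of the bracket pattern using bracket from Control.Exception. The firstLine helper and the file name are made up for the example.

```haskell
import Control.Exception (bracket)
import System.IO (IOMode (ReadMode), hClose, hGetLine, openFile)

-- Acquire the handle, use it, and release it.
-- hClose runs whether hGetLine returns normally or throws.
firstLine :: FilePath -> IO String
firstLine path = bracket (openFile path ReadMode) hClose hGetLine

main :: IO ()
main = do
  writeFile "bracket-demo.txt" "hello\nworld\n"
  firstLine "bracket-demo.txt" >>= putStrLn -- prints "hello"
```

The resource handling lives in one small function, which is the kind of containment I mean.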

Data Types

We mentioned data types earlier; using data types to model your computation is the natural approach in Haskell. You build a data type that accurately reflects the data or states that you want to model. We even did it for the custom VapourError type earlier.

Extending that, we will use a particular data type, EitherT, to model errors. This is a monad transformer that wraps an Either inside another monad, where that monad could be anything.

In context it would look something like:

crankHandle :: Int -> EitherT VapourError IO Product
-- or
crankHandle :: Monad m => Int -> EitherT VapourError m Product
-- or
crankHandle :: MonadIO m => Int -> EitherT VapourError m Product

The type of our error is present in the type of our function, a familiar situation. If the monad m isn’t IO then we have a good degree of confidence that none of the base exceptions will be present.

Solution

-- Build a data type that represents the possible error states
data VapourError =
    InsufficientFunds
  | ItemUnavailable Text
  | MachineMalfunction Text

-- Provide a function for turning errors into text
renderVapourError :: VapourError -> Text
renderVapourError = ...

-- Usage site
runVendingMachine :: VendingMachineState -> Coin -> EitherT VapourError IO Product
runVendingMachine = ...
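To show how the pieces fit together, here is one way the outline above could be filled in so that it runs. It is a sketch under simplifying assumptions, not the real implementation: Coin is an Int, Product is a String, the machine state is dropped, String replaces Text, and EitherT is written as a local alias for ExceptT from the transformers package.

```haskell
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE)

-- Assumption: a local alias standing in for the one from transformers-either.
type EitherT = ExceptT

data VapourError
  = InsufficientFunds
  | ItemUnavailable String
  | MachineMalfunction String

renderVapourError :: VapourError -> String
renderVapourError err = case err of
  InsufficientFunds    -> "Insufficient funds."
  ItemUnavailable i    -> "Item " ++ i ++ " unavailable."
  MachineMalfunction e -> "Hardware malfunction " ++ e ++ "."

-- Hypothetical simplifications of the types used in the post.
type Coin = Int
type Product = String

runVendingMachine :: Coin -> EitherT VapourError IO Product
runVendingMachine coin
  | coin <= 0 = throwE InsufficientFunds
  | otherwise = pure "cola"

main :: IO ()
main = do
  result <- runExceptT (runVendingMachine 0)
  -- The caller receives an Either and decides how to present each case.
  putStrLn (either renderVapourError id result) -- prints "Insufficient funds."
```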

Either - Examples

Examples of substantial pieces of code using EitherT to organise errors.

mafia - https://github.com/haskell-mafia/mafia/search?utf8=✓&q=EitherT&type=
boris - https://github.com/markhibberd/boris/search?utf8=✓&q=EitherT&type=
traction - https://github.com/markhibberd/traction/search?utf8=✓&q=EitherT&type=
mismi - https://github.com/nhibberd/mismi/search?q=EitherT&type=Code&utf8=✓

Either Advantages

function signatures clearly indicate error states
exhaustive pattern matching indicates where errors have/have not been handled
requires explicit composition of error data types

Basically the compiler helps you handle the various states required using the type system.
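A small sketch of what explicit composition of error types can look like, using withExceptT from the transformers package. The PaymentError, StockError and VendError types and the price threshold are invented for the example.

```haskell
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE, withExceptT)

-- Hypothetical error types for two subsystems.
data PaymentError = CardDeclined | InsufficientFunds
data StockError = ItemUnavailable String

-- Composition is explicit: the top-level type wraps each subsystem's errors.
data VendError
  = VendPayment PaymentError
  | VendStock StockError

charge :: Int -> ExceptT PaymentError IO ()
charge cents
  | cents < 150 = throwE InsufficientFunds
  | otherwise   = pure ()

pick :: String -> ExceptT StockError IO String
pick item
  | item == "cola" = pure item
  | otherwise      = throwE (ItemUnavailable item)

-- withExceptT lifts each subsystem error into the combined type.
vend :: Int -> String -> ExceptT VendError IO String
vend cents item = do
  withExceptT VendPayment (charge cents)
  withExceptT VendStock (pick item)

-- With -Wall, GHC warns if a constructor is missing from this case.
renderVendError :: VendError -> String
renderVendError err = case err of
  VendPayment CardDeclined      -> "Card declined."
  VendPayment InsufficientFunds -> "Insufficient funds."
  VendStock (ItemUnavailable i) -> "Item " ++ i ++ " unavailable."

main :: IO ()
main = do
  r1 <- runExceptT (vend 200 "tea")
  putStrLn (either renderVendError id r1) -- prints "Item tea unavailable."
  r2 <- runExceptT (vend 100 "cola")
  putStrLn (either renderVendError id r2) -- prints "Insufficient funds."
```

Adding a new constructor to either subsystem makes the compiler point at renderVendError, which is exactly the help I want.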

Exception - Examples

Example of code using Exceptions to organise errors

http-client - using non-200 response codes as exceptions
stack - internally follows an exception style

Exception Disadvantages

The main downsides as I see it to exception oriented code are:

exception throwing functions compose too easily, so you are not forced to think about what a failure means.
no stack traces by default in Haskell mean you lose context.
handling exceptions requires knowledge about the internals of dependencies and how they use exceptions.

Here the compiler is less helpful in guiding you, giving little or no help with handling particular exceptions or giving compile errors for new exceptions that you might need to consider.

Supporting Libraries

The supporting libraries for this pattern of error handling are:

transformers-either - Provides a type alias type EitherT = ExceptT plus additional operators.
transformers-bifunctor - Provides bifunctors over a monad transformer.

There is nothing revolutionary about transformers-either; you could roll your own version easily or use the ExceptT transformer provided in the transformers package (adding any helper functions you need). The value comes from structured, conscious handling of errors and using the Haskell compiler to help.
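To give a feel for how little is involved in rolling your own, here is a hand-written sketch on top of ExceptT. The helper names runEitherT, left and hoistEither follow what I believe transformers-either exposes, but treat them here as local definitions and an assumption rather than that library's API. parseCoin and insertCoin are made up for the example.

```haskell
import Control.Monad.Trans.Except (ExceptT (..), runExceptT, throwE)

-- Local alias and helpers, defined by hand for illustration.
type EitherT = ExceptT

runEitherT :: EitherT e m a -> m (Either e a)
runEitherT = runExceptT

-- Fail with an error value.
left :: Monad m => e -> EitherT e m a
left = throwE

-- Lift a plain Either into the transformer.
hoistEither :: Monad m => Either e a -> EitherT e m a
hoistEither = ExceptT . pure

-- Hypothetical pure parser used to demonstrate hoistEither.
parseCoin :: String -> Either String Int
parseCoin s = case reads s of
  [(n, "")] -> Right n
  _         -> Left ("not a number: " ++ s)

insertCoin :: String -> EitherT String IO Int
insertCoin raw = do
  coin <- hoistEither (parseCoin raw)
  if coin > 0 then pure coin else left "coin must be positive"

main :: IO ()
main = mapM_ (\s -> runEitherT (insertCoin s) >>= print) ["25", "abc", "0"]
```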

Conclusion

The primary value of avoiding exceptions is that it makes error behavior explicit in the type of the function. If you’re in an environment where everything might fail, being explicit about it is probably a negative. But if most of your function calls are total, then knowing which ones might fail highlights places where you should consider what the correct behavior is in the case of that failure. Remember that the failure of an individual step in your program doesn’t generally mean the overall failure of your code.

It’s a little bit like null-handling in languages without options. If everything might be null, well, option types probably don’t help you. But if most of the values you encounter in your program are guaranteed to be there, then tracking which ones might be null by tagging them as options is enormously helpful, since it draws your attention to the cases where it might be there, and so you get an opportunity to think about what the difference really is.

Yaron Minsky

References

OCaml FFI bindings

Tim McGilchrist — Mon, 17 Aug 2015 00:00:00 UT

OCaml FFI bindings

August 17, 2015

One thing that always comes up with your favourite language is how do you use libraries written in another language. Typically this involves needing to talk to a particular C library, either because it’s faster than a native one or just that it is already written.

For OCaml there is the ctypes library for binding to C libraries using pure OCaml, written by the good people at OCaml Labs http://ocaml.io

The core of ctypes is a set of combinators for describing the structure of C types – numeric types, arrays, pointers, structs, unions and functions. You can use these combinators to describe the types of the functions that you want to call, then bind directly to those functions – all without writing or generating any C!

Let’s go through a simple example binding to libyaml. Here’s a declaration from libyaml to get the version string.

/**
 * Get the library version as a string.
 *
 * @returns The function returns the pointer to a static string of the form
 * @c "X.Y.Z", where @c X is the major version number, @c Y is a minor version
 * number, and @c Z is the patch version number.
 */

YAML_DECLARE(const char *)
yaml_get_version_string(void);

To bind to this we need to declare a compatible signature for our OCaml code.


open Ctypes
open Foreign

let get_version_string =
  foreign "yaml_get_version_string"
    (void @-> returning string)

We’re pulling in Ctypes and Foreign. Then the let binding uses foreign with the name of the C function we want to call plus a type signature for that function.

Next we need some calling code to print out the version string.


open Core.Std

let () =
  let version_string = get_version_string() in
  printf "Version: %s\n" version_string

Assuming you’ve got opam installed, you can get the dependencies with opam install core ctypes and compile the whole thing.


> corebuild -pkg ctypes.foreign -lflags -cclib,-lyaml version_string.native
...
> ./version_string.native
Version: 0.1.6

We’ve got bindings to a native C library without writing any C.

For a more complicated example involving passing an allocated string back from C, let’s look at the proc_pidpath call from OSX. This particular library call takes a process id (PID) and returns back the path of that process’s executable in the supplied buffer.

int
proc_pidpath(int pid, void * buffer, uint32_t  buffersize)

To bind to this call we again define a compatible signature.

let pidpath =
    foreign ~check_errno:true "proc_pidpath"
            (int @-> ptr char @-> int @-> returning int)

The arguments simply mirror those for the C library call, along with a new argument check_errno which indicates that the C library sets errno if it encounters a problem.

http://stackoverflow.com/questions/22651910/returning-a-string-from-a-c-library-to-ocaml-using-ctypes-and-foreign

Ctypes provides native bindings for most things you’ll need. There’s all sorts of pointers and types matching pretty much every native C type you’ll need here.