I wrote up the highlights of ICFP 2022 for the Tarides blog. It was great to get back to in-person conferences and to have the chance to meet people. Thanks to my employer Tarides for covering the cost.
For me personally the OCaml Workshop was fantastic from beginning to end; read the blog post for the full details. Outside of OCaml I spent time in the Haskell Implementors Workshop, hearing about the new features for GHC and getting excited by the progress that Cabal is making.
My take-away research topics are:
I am revisiting my OCaml setup post from 2021 because I needed to set up a new macOS machine. The official OCaml site points newcomers to Visual Studio Code, which is a fine choice to get started. However, I use Emacs and have done so for over 20 years, and I did not find a good description of how to set things up with it. I could digress here into why Emacs, but I will just strongly encourage all developers to invest heavily in learning their editor, with Emacs being a fine choice.
On macOS I use the pre-compiled GUI version of Emacs from emacsformacosx, preferring that over compiling it by hand or using the version in Homebrew, both of which I have done previously. The emacsformacosx version saves me time and effort, plus the GUI version was removed from Homebrew at some point in the past.
Next, I choose to use an Emacs distro over the base Emacs setup. Again, this is a time-saving choice, and it is especially useful if you are new to Emacs. Use Prelude, an enhanced Emacs 25.1+ distribution that should make your experience with Emacs both more pleasant and more powerful. It gives a great modern setup for Emacs with minimal fuss. Once that is cloned and installed, the Lisp config begins.
Prelude provides a base experience of packages available with some configuration. The configuration goes into ~/.emacs.d/tsmc/prelude-modules.el, where tsmc is your macOS user; the same path applies on Linux. A sample prelude-modules.el is provided at https://github.com/bbatsov/prelude/blob/master/sample/prelude-modules.el
I chose the following modules to enable, with prelude-lsp and prelude-ocaml being the core OCaml-related choices. The other bits are optional but useful for editing Lisp or navigating code.
(require 'prelude-ivy) ;; A mighty modern alternative to ido
(require 'prelude-company)
(require 'prelude-emacs-lisp)
(require 'prelude-lisp) ;; Common setup for Lisp-like languages
(require 'prelude-lsp) ;; Base setup for the Language Server Protocol
(require 'prelude-ocaml)
Now for the customisation to get LSP working properly. There are three main pieces:
direnv is a small program to load/unload environment variables based on $PWD (the current working directory). It ensures that when you open an OCaml file, the correct opam switch is chosen and the tools installed in that switch are made available to Emacs. Opam is the OCaml package manager; it manages local sandboxes of packages called switches. Without direnv, Emacs will not find the correct tools and you would need to mess with Emacs' PATH to get it right. I have done that, and it is much simpler with direnv.
So brew install direnv, and create a .envrc file in an OCaml project with eval $(opam env --set-switch) inside. Compared to my previous post, I have been using local opam switches, which exist inside an OCaml project. They are created with opam switch create . 4.14.0 --with-test --deps-only -y and appear as an _opam directory in the project root. Next run direnv allow to tell direnv it is safe to use the .envrc file in this directory. The reason I have switched is that I often need to test different OCaml versions, so removing the _opam directory and recreating it is the simpler option.
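Putting those pieces together, a typical session for starting a new project looks something like this (a sketch; the directory name and OCaml version are examples):

```shell
$ cd my-project                                           # hypothetical project directory
$ opam switch create . 4.14.0 --with-test --deps-only -y  # local switch in _opam/
$ echo 'eval $(opam env --set-switch)' > .envrc
$ direnv allow
```

After this, direnv sets up the switch environment every time you enter the directory.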
The OCaml LSP server needs to be installed in the current switch, so run opam update && opam install ocaml-lsp-server -y; this will make ocaml-lsp-server available to Emacs via direnv.
There is an opportunity here to use Emacs Lisp to install ocaml-lsp-server if it is missing, or to allow lsp-mode to download and install it itself. I would like to have this working in the future. Now, back into Lisp.
Create a file init.el in ~/.emacs.d/tsmc/, substituting your Unix user name for tsmc. Thanks to emacs-prelude, the configuration is very small.
;;; init.el --- @tsmc configuration entry point.
(prelude-require-packages '(use-package direnv))
;; Use direnv to select the correct opam switch and set the path
;; that Emacs will use to run commands like ocamllsp, merlin or dune build.
(use-package lsp-mode
  :hook
  (tuareg-mode . lsp))
;; Attach lsp hook to modes that require it, here we bind to tuareg-mode rather than
;; prelude-ocaml. For unknown reasons the latter does not bind properly and does not
;; start lsp-mode
(provide 'tsmc)
;;; init.el ends here
We require a few packages, use-package and direnv, and then tell Emacs to start lsp-mode when tuareg-mode is started. Tuareg is one of the OCaml modes available for Emacs, the other being caml-mode, which I have not really used. Now quit and restart Emacs. Opening an ml file inside the project you started earlier should start ocaml-lsp.
The types for expressions and modules will display on mouse hover or beside the definition. Hovering the mouse over a function or type will display the type plus its documentation comments. A successful dune build for the project is required to generate the data used by ocaml-lsp-server. At this point in time, prelude relies on merlin, an assistant for editing OCaml code that is used by ocaml-lsp-server internally but is also available as a standalone tool. So I often have both installed; opam install merlin should be enough to get it installed too.
At this point I am mostly happy: the types and documentation display as required. Navigating using M-. shows a preview of the type or function under point, and return takes me to the definition. This is vastly improved in OCaml 4.14 (with the work on Shapes), which I have switched to for everything I can. Switching between ml and mli files is C-c C-a, and there is more; simply run M-x describe-mode to see everything available.
The annoyances are more fundamental to how LSP wants to work. It uses what I am calling a push-based interaction, where it generates the information for types and documentation in the background and pushes it into the Emacs buffer. You never need to ask what the type is; it displays for you. Sometimes I want to ask what a type is inside an expression; with LSP you are encouraged to mouse over something rather than having a key binding for it. So far I haven't found the Lisp function that drives the hover functionality, but when I do, I will bind it to a key. The second issue is also around mouse usage to drive LSP functionality like rename or annotating types; I would strongly prefer a key-chord-driven approach. Again, I will set this up once I find the right lsp functions. For now I use C-c C-t from merlin to summon the types for things.
Overall the experience is solid. Types and docs appear as required, navigation works, and the speed has been good so far. LSP mode is less janky than it was a year ago.
There is a fine alternative LSP mode for Emacs, Eglot. It takes a more minimal approach and uses a pull-based interaction, where you ask for information via key bindings rather than having it pushed at you via UI elements. For example, the type of a function is requested rather than shown by default.
The corresponding configuration I was using previously is:
(use-package eglot
  :config
  (define-key eglot-mode-map
    (kbd "C-c C-t") #'eldoc-print-current-symbol-info)
  :hook
  ((tuareg-mode . eglot-ensure)))
Again we use use-package to configure the mode; the hook triggers Eglot to load when tuareg-mode does, via the eglot-ensure function, which starts an Eglot session for the current buffer if there isn't one. No further configuration is needed in Emacs, as Eglot knows the LSP server is called ocamllsp and will look for it on the Unix PATH.
Getting started with OCaml using Emacs can be a struggle. Emacs is a fine editor, but the documentation can be difficult to navigate. Hopefully following through this setup will yield a working Emacs / LSP setup for OCaml.
In future I want to try binding more things to keys so I use the mouse less, and to streamline the installation of the OCaml LSP server. After that, adding support for more interesting code interactions, like extracting modules or hoisting let bindings, would be nice to have. Happy hacking!
OCaml is an awesome language with many fine features. I enjoy using it immensely!
Unfortunately, it suffers from a perceived weakness in how to get started. Like any new skill, there can be a learning curve. The tools are all there, but combining them for a good developer experience might seem difficult at first.
Often I’ve found that the barrier to getting into a new language is less about the new features of that language and more about learning the tools to become productive in it. The package managers, build tools, and editor integration of a new language can be confusing, making for an awful experience.
Perhaps my opinionated guide to getting started with OCaml in 2021 will help reduce any mental blocks against trying out this excellent language.
First it’s necessary to install OCaml and Opam. Opam is the default package manager for OCaml projects. Ignore the other options for now; once you know more about what you want, you can make an informed choice. For now, if you speak Opam, you’ll get the most out of the community.
On Linux, use your local package manager, e.g., apt-get install opam for Debian and apt install opam for Ubuntu. For macOS, use Homebrew: brew install opam. I’ll assume that if you run something else, you can handle looking up how to install things.
On my Mac I get Opam 2.1.0:
$ opam --version
2.1.0
Once you’ve got Opam installed, you should be able to move on to the next step.
I strongly recommend that you pick a single OCaml version that your project will compile against. Supporting multiple compiler versions is possible and usually not too difficult, but it complicates the process right now.
Running opam switch list-available will show you a long list of every possible OCaml compiler. Choose the latest mainline compiler, identified by Official release X.XX.X, where currently the latest is 4.13.0. Ignore the others.
opam switch list-available
...
ocaml-variants 4.12.0+domains OCaml 4.12.0, with support for multicore domains
ocaml-variants 4.12.0+domains+effects OCaml 4.12.0, with support for multicore domains and effects
ocaml-variants 4.12.0+options Official release of OCaml 4.12.0
ocaml-base-compiler 4.12.1 Official release 4.12.1
ocaml-variants 4.12.1+options Official release of OCaml 4.12.1
ocaml-variants 4.12.2+trunk Latest 4.12 development
ocaml-base-compiler 4.13.0~alpha1 First alpha release of OCaml 4.13.0
ocaml-variants 4.13.0~alpha1+options First alpha release of OCaml 4.13.0
ocaml-base-compiler 4.13.0~alpha2 Second alpha release of OCaml 4.13.0
ocaml-variants 4.13.0~alpha2+options Second alpha release of OCaml 4.13.0
ocaml-base-compiler 4.13.0~beta1 First beta release of OCaml 4.13.0
ocaml-variants 4.13.0~beta1+options First beta release of OCaml 4.13.0
ocaml-base-compiler 4.13.0~rc1 First release candidate of OCaml 4.13.0
ocaml-variants 4.13.0~rc1+options First release candidate of OCaml 4.13.0
ocaml-base-compiler 4.13.0~rc2 Second release candidate of OCaml 4.13.0
ocaml-variants 4.13.0~rc2+options Second release candidate of OCaml 4.13.0
ocaml-base-compiler 4.13.0 Official release 4.13.0
ocaml-variants 4.13.0+options Official release of OCaml 4.13.0
ocaml-variants 4.13.1+trunk Latest 4.13 development
ocaml-variants 4.14.0+trunk Current trunk
...
At this point, install the latest OCaml 4.13.0:
$ opam switch create 4.13.0
<><> Installing new switch packages <><><><><><><><><><><><><><><><><><><><> 🐫
Switch invariant: ["ocaml-base-compiler" {= "4.13.0"} | "ocaml-system" {= "4.13.0"}]
<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><><><><> 🐫
∗ installed base-bigarray.base
∗ installed base-threads.base
∗ installed base-unix.base
∗ installed ocaml-options-vanilla.1
⬇ retrieved ocaml-base-compiler.4.13.0 (https://opam.ocaml.org/cache)
∗ installed ocaml-base-compiler.4.13.0
∗ installed ocaml-config.2
∗ installed ocaml.4.13.0
Done.
You can start using this version by typing the following:
$ opam switch set 4.13.0
And verify which switch you are using:
$ opam switch show
4.13.0
When you work with several OCaml projects, it’s best to create a switch per project, as it keeps each project isolated and prevents issues with installing conflicting versions of libraries. For example, I use a naming scheme of ocaml-version-project-name, e.g., 4.13.0-ocurrent. Then in each project directory, run opam switch link 4.13.0-ocurrent to set up that named switch for that specific directory. Opam will take care of setting that switch in your shell when you change into that directory.
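As a sketch (the switch name and project directory are examples), the per-project flow is: create the named switch once, then link it from inside the project:

```shell
$ opam switch create 4.13.0-ocurrent 4.13.0   # named switch built from the 4.13.0 compiler
$ cd ~/src/ocurrent                           # hypothetical project directory
$ opam switch link 4.13.0-ocurrent
```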
For this step we need the Dune build tool, so go ahead and install it with opam install dune. Dune comes with a simple scaffolding command to create an empty project, which is really useful to get started. I’m calling my project box, so run:
$ dune init proj box
Success: initialized project component named box
In the project generated, we get a library component, a CLI, and a test component, which will all compile out of the box.
$ cd box
$ tree
.
├── bin
│ ├── dune
│ └── main.ml
├── box.opam
├── lib
│ └── dune
└── test
├── box.ml
└── dune
3 directories, 6 files
Let’s try a compile:
$ dune build @all
Info: Creating file dune-project with this contents:
| (lang dune 2.8)
| (name box)
Running the CLI:
$ dune exec bin/main.exe
Hello, World!
Each of the bin, lib, and test directories contains source code in the form of *.ml files, along with a dune file, which tells Dune how to build the source and on what libraries it depends. The bin/dune file declares an executable with the public name box, whose entry module is main and which depends on the box library.
(executable
 (public_name box)
 (name main)
 (libraries box))
CLI tools require command-line parsing, and Cmdliner is a common library that implements it. We need to add it in two places: first in the dune-project file, to get it installed, and then in bin/dune, to say where we’re using it.
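The bin/dune side is a one-line addition to the libraries field; a sketch, based on the generated file shown earlier:

```lisp
(executable
 (public_name box)
 (name main)
 (libraries box cmdliner))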
One small digression: when generating our project, dune created a box.opam file. This describes our project to Opam, telling it what libraries it requires and what the project does. You need this if you ever publish a package for other people to use. Newer versions of Dune can generate the box.opam file from the dune-project file. Having a single source of information is helpful, so let’s create that file:
(lang dune 2.8)
(name box)
(generate_opam_files true)

(package
 (name box)
 (depends
  (ocaml (>= 4.13.0))
  (cmdliner (>= 0.9.8)))
 (synopsis "Box cli"))
Remove the box.opam file (rm box.opam) to test the generation, then run dune build @all to regenerate the Opam file. This file should be checked in, and any further edits should be made in the top-level dune-project file. The generated file should look like this:
$ cat box.opam
# This file is generated by dune, edit dune-project instead
opam-version: "2.0"
synopsis: "Box cli"
depends: [
  "dune" {>= "2.8"}
  "ocaml" {>= "4.13.0"}
  "cmdliner" {>= "0.9.8"}
  "odoc" {with-doc}
]
build: [
  ["dune" "subst"] {dev}
  [
    "dune"
    "build"
    "-p"
    name
    "-j"
    jobs
    "@install"
    "@runtest" {with-test}
    "@doc" {with-doc}
  ]
]
The final step is to actually install the cmdliner library. Run opam install . --deps-only -ty, which will look at the *.opam files present and install just their dependencies with the correct version bounds.
The -y answers yes to installing the packages; drop it if you want to review what will be installed and confirm by pressing Y yourself.
The -t flag runs the package tests, which isn’t always necessary but is sometimes useful for certain packages with native C components.
Alternatively, you could run opam install cmdliner, but as this doesn’t look at the version constraints in the *.opam files, you might not get what you expect.
Finally, you’ll want to get comfy with your chosen editor. If you have a preference, you should use the native LSP support in that editor and install the server with opam install ocaml-lsp-server. OCaml is standardising on the LSP protocol for editor interaction. If you have no editor preference, then start with VSCode and install the OCaml LSP package from the Marketplace.
Personally, I’m using Emacs with the LSP mode eglot
, which works really nicely, along with some customisations to bind certain LSP actions to keys. I highly recommend getting into Emacs as an editor because the customisation via a fully-featured language, like Lisp, is fantastic if you live in your editor like I do.
This post is an update to an earlier post by Adam in 2017, and I hope this short tutorial helps get you started with OCaml!
I wanted to port my blog across from an old Jekyll setup to Hakyll. The Jekyll setup was out of date, and keeping the required Ruby tools installed when I swapped machines was a huge pain; I don’t write Ruby much anymore.
Considering my options, I looked at Hugo and Hakyll, discarding Hugo because I don’t want to keep up with the JS churn, even though it has lots of great resources and themes available. So Hakyll seemed like the best option: I already regularly write Haskell, so the tools will be up to date, and I can make it do everything I want by digging into the source code.
My requirements are:
First things first! I like the following layout when setting up a basic Haskell project:
$ tree -L 1
.
├── CNAME
├── LICENSE
├── README.md
├── css
├── drafts
├── images
├── index.html
├── lambdafoo.cabal
├── main
├── pages
├── posts
├── talks
└── templates
Initially I used cabal init --cabal-version=2.4 --license=BSD3 -p lambdafoo.com to get a skeleton project with a reasonable cabal file. Then I moved things around, making main/site.hs the entry point for running Hakyll and adding a TODO list of features into the README.md:
* ~~basic pages~~
* ~~about~~
* ~~talks~~
* ~~archive~~
* ~~individual post with code highlighting~~
* ~~rss/atom feed~~
* ~~add rss/atom feed to archive page~~
* ~~github action build and deploy~~
* ~~html url redirects to new url structure~~
* ~~serve js talks/slides directly from Hakyll~~
* configure dependabot for Haskell
* ~~add generated sitemap.xml~~
* ~~integrate Google Analytics~~
These directories are used for Hakyll content:
The trickiest part was getting a version of the cabal file that worked with GHC 8.10 and a recent version of Hakyll. I ended up pinning Hakyll as hakyll ^>= 4.13 and leaving the other dependencies floating.
executable site
  main-is: site.hs
  hs-source-dirs: main
  default-language: Haskell2010
  build-depends:
      base >= 4.6 && < 5
    , binary >= 0.5
    , directory >= 1.2
    , filepath >= 1.3
    , hakyll ^>= 4.13
    , blaze-html
    , lens
    , time
    , aeson
    , lens-aeson
    , containers
    , pandoc
    , process >= 1.6
    , text >= 1.2
At this point, I could have either continued setting up Hakyll or set up CI. I usually prefer setting up CI as early as possible in a project, so I started there. Here is what that looks like:
There are a few options for cloud CI, and my requirements were simple: no cost, easy setup, and integration with GitHub Pages, where I host my site. It was a toss-up between CircleCI and GitHub Actions; I’ve had good experience with CircleCI, but I decided to try GitHub Actions.
First, create a directory, mkdir -p .github/workflows/, with a ci.yml file:
name: CI

on:
  push:
    branches:
      - master
  pull_request:
    types:
      - opened
      - synchronize

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        cabal: ["3.4.0.0"]
        ghc: ["8.10.7"]
The matrix section sets up a build for GHC 8.10.7 and cabal 3.4.0.0, which is enough for a simple blog, but it is where you’d add extra options for, say, a library. Next, we use some community GitHub Actions to checkout the repo and set up Haskell.
steps:
  - uses: actions/checkout@v2
  - uses: haskell/actions/setup@v1
    id: setup-haskell-cabal
    with:
      ghc-version: ${{ matrix.ghc }}
      cabal-version: ${{ matrix.cabal }}
Next we run cabal update to refresh our Hackage index and then set up some build caching for our dependencies. You can copy this directly and it should work:
- name: Cabal Update
  run: |
    cabal v2-update
    cabal v2-freeze $CONFIG
- uses: actions/cache@v2.1.4
  with:
    path: |
      ${{ steps.setup-haskell-cabal.outputs.cabal-store }}
      dist-newstyle
    key: ${{ runner.os }}-${{ matrix.ghc }}-${{ hashFiles('cabal.project.freeze') }}
    restore-keys: |
      ${{ runner.os }}-${{ matrix.ghc }}-
Then we run the cabal build and Hakyll site build.
- name: Build Site
  run: |
    cabal v2-build $CONFIG
    cabal exec site build
Adding that to your repo’s main branch should yield a working CI. On top of that, I added a dependabot configuration to check that my GitHub Actions config stays up to date.
Add a file dependabot.yml to .github:
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "daily"
    commit-message:
      prefix: "GA"
      include: "scope"
    labels:
      - "CI"
This will check that your GitHub Actions use the latest versions and open a PR to bump them if they don’t. Something like this for Haskell would be super sweet.
Let’s quickly walk through the contents of main/site.hs; there are more in-depth tutorials on the main Hakyll site.
{-# LANGUAGE OverloadedStrings #-}
import Hakyll
main :: IO ()
main = hakyll $ do
Here we import Hakyll, set up overloaded strings, and create a main function:
match "images/*" $ do
  route idRoute
  compile copyFileCompiler

match "css/*" $ do
  route idRoute
  compile compressCssCompiler
These serve stylesheets and images from the css and images directories, respectively. This is standard code that can be copied directly; it simply copies the files into _site, the final static site directory.
Next I wanted to serve some old talk slides, written in HTML and JavaScript, directly from my site. I couldn’t find any posts about how to do this, but after thinking about it, I realised that I just wanted to serve static assets again, like the css and images above. So that’s exactly what was required!
Of course, I lie: I had to fix a few hard-coded paths in the HTML, but otherwise it worked.
The layout for talks looks like:
talks
├── erl-syd-2012-webmachine
├── fp-syd-freer-2016
├── fp-syd-higher-2015
├── lambda-jam-2014-raft
├── lambda-jam-2015-ocaml-functors
├── lambda-jam-2016-performance
├── roro-2012-riak
└── scala-syd-2015-modules
So I needed an extra wildcard in my match statement:
match "talks/**/*" $ do
  route idRoute
  compile copyFileCompiler
This content then gets served under lambdafoo.com/talks/scala-syd-2015-modules/.
In retrospect, this is an obvious solution to serving any static content generated outside of Hakyll,
but it did take me a while to realise it.
Next we load the individual blog posts:
match "posts/*" $ do
  route $ setExtension "html"
  compile $
    pandocCompiler
      >>= loadAndApplyTemplate "templates/post.html" postCtx
      -- Used by the RSS/Atom feed
      >>= saveSnapshot "content"
      >>= loadAndApplyTemplate "templates/default.html" postCtx
      >>= relativizeUrls
After getting a few simple things out of the way, my Markdown-based workflow already worked with Hakyll, so there’s nothing really to see there. Creating a Markdown file with the following YAML front matter and content is enough to get a simple post working.
---
title: Hakyll Blog setup
author: Tim McGilchrist
date: 2021-02-01 00:00
tags: haskell
description: How I setup my blog with Hakyll
---
Content of post
I have a domain, lambdafoo.com, that I use to serve my blog. GitHub Pages has up-to-date information on how to set this up with your DNS provider.
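For GitHub Pages custom domains, the CNAME file at the repository root (visible in the tree listing earlier) contains just the bare domain:

```text
lambdafoo.com
```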
Here is where choosing GitHub Actions really pays off: there is a community action to do it all! Assuming you’ve turned on GitHub Pages in the settings for your repo, add this to the end of ci.yml:
- name: Deploy 🚀
  uses: JamesIves/github-pages-deploy-action@4.1.5
  if: github.ref == 'refs/heads/master'
  with:
    token: ${{ secrets.GITHUB_TOKEN }}
    branch: gh-pages # The branch the action should deploy to.
    folder: _site # The folder the action should deploy.
    clean: true # Automatically remove deleted files from the deploy branch
This deploys the output of the Build Site step from the _site folder to the gh-pages branch on all master builds (controlled via if: github.ref == 'refs/heads/master').
On the first build, there is a bit of lag before the deploy. I had issues with my DNS setup and two personal repositories using the same CNAME values; apart from that, the process was smooth, and I quickly had a new version working. Again, if you set up dependabot, it will check that this action stays up to date.
I wanted to share a simple configuration for running OCaml projects in CircleCI. CircleCI is what I’m using at work, plus it supports a killer feature: you can re-run a failing build and get an SSH session into the machine. This one feature has saved me loads of time when debugging CI configuration and flaky tests. Most of the other features are similar to other cloud CI solutions; the documentation is solid, and setting up more advanced workflows is easy enough.
Our requirements are simple: build OCaml projects that use opam and have simple test requirements (just running unit tests).
First we add a file, .circleci/config.yml, with:
version: 2
jobs:
  build-4.10:
    docker:
      - image: ocaml/opam:ubuntu-18.04-ocaml-4.10
    steps:
      - checkout
      - run:
          name: Build
          command: ./bin/ci
workflows:
  version: 2
  build:
    jobs:
      - build-4.10
This creates a job, build-4.10, using the Docker image ocaml/opam:ubuntu-18.04-ocaml-4.10 published by the OCaml team. The steps section defines the commands to run: we use the built-in checkout command provided by CircleCI and then a run command that executes a shell script, ./bin/ci.
You could use your own Docker container in place of that image, maybe pre-installing some things or using a different Linux distro. The command could also be inlined rather than being its own file. I chose to make it a file for two reasons: when you SSH in to debug, you can just re-run ./bin/ci, and you can re-use the steps between local runs and CI.
Now to the shell script:
#!/bin/sh -eux
WORKING_DIR=$(pwd)

# Install some extras
sudo apt-get install m4 pkg-config -y

# Make sure opam is set up in your environment.
eval `opam config env`
opam update

# Install each package as a dev dependency
find . -type f -name '*.opam' | sort -d | while read P; do
  opam pin add -n "$(basename -s .opam ${P})" . -y --dev
  opam install --deps-only "$(basename -s .opam ${P})" -y
  eval `opam config env`
done

# Run the build and tests
dune build
dune runtest
This configuration is from a project with multiple opam files, so we use find to locate them all. One gotcha is that this sorts the file names, which may not match the dependency order; if that is the case, you will need to list them explicitly.
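To see the gotcha concretely, here is a small self-contained sketch (the package names app and zcore are hypothetical): file-name order puts app first even if it depends on zcore.

```shell
# Hypothetical project layout: app.opam depends on zcore.opam.
cd "$(mktemp -d)"
touch app.opam zcore.opam

# sort -d orders purely by file name, so app.opam is listed first,
# even though its dependency zcore would need to be pinned first.
find . -type f -name '*.opam' | sort -d
```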
If you have a single opam file, then replace the loop with the following (substituting your project name for project-name):
opam pin add -n "project-name" . -y --dev
opam install --deps-only "project-name" -y
Push that to your GitHub main branch, then Set Up Project in the CircleCI UI, and you should be off and building. From here, the CircleCI docs can help with setting up different builds based off branches. Adding other OCaml builds is as easy as duplicating the build-4.10 section in the YAML, pointing it at another Docker container, such as one for 4.08, and adding the new build name under jobs in workflows.
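As a sketch, a duplicated job for 4.08 might look like this (assuming the OCaml team publishes a matching ubuntu-18.04-ocaml-4.08 image tag):

```yaml
  build-4.08:
    docker:
      - image: ocaml/opam:ubuntu-18.04-ocaml-4.08
    steps:
      - checkout
      - run:
          name: Build
          command: ./bin/ci

# ...and the new job name added to the workflow:
workflows:
  version: 2
  build:
    jobs:
      - build-4.10
      - build-4.08
```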
There’s a working setup in my ocaml-bitbucket project. Good luck!
In choosing Haskell as a language, you sign up for a certain class of features and behaviours, e.g. lazy evaluation and static typing.
This gives you a general point in the design space for general-purpose languages, but like all languages, you are still left with a number of choices in building software. These choices are broad, diverse, and hotly debated; sometimes they get labelled Best Practices or the Right Way. Like any good engineer, you should recognise that everything involves trade-offs and that these labels try to hide that. There is not always one best way; an approach has positives and negatives. Knowing those trade-offs and deliberately choosing an approach based on them is good engineering.
In programming language communities there are always bikeshedding arguments, and Haskell is no different. I want to call out a particular point of view around using exceptions vs data types in Haskell when dealing with errors. Both are valid points in a wider error-handling design space. The exception path is widely associated with Snoyman, who has written much software and written extensively about this in Exceptions Best Practices in Haskell and in the safe-exceptions package.
I’d like to highlight the negatives, as I see them, of that approach and suggest a different set of trade-offs around modelling errors as data types using EitherT/ExceptT.
EitherT is a monad transformer built on the familiar Either data type.
data Either a b
  = Left a
  | Right b
where typically, in this context, Left represents some failure case and Right represents success.
Another formulation, from the OCaml community, is:
type ('a, 'b) result =
  | Ok of 'a
  | Error of 'b
which is more explicit about what the two constructors represent.
Back in the beginning, actually in 2000/01, asynchronous exceptions were added to Haskell. [2] Quoting Simon Marlow:
Basically it comes down to this: if we want to be able to interrupt purely functional code, asynchronous exceptions are the only way, because polling would be a side-effect.
So Haskell has async exceptions whether you like them or not, the ship has sailed.
This means that any code in IO can throw a runtime exception; further, any thread can receive an async exception.
So, how should we best deal with this reality and structure our code?
We have exceptions; let’s use them.
To start doing that you need to define your own custom exception type.
data VapourError
  = InsufficientFunds
  | ItemUnavailable Text
  | MachineMalfunction Text
  deriving Typeable

-- Write a reasonable Show instance for each error
-- (unpack, from Data.Text, converts the Text fields to String)
instance Show VapourError where
  show a = case a of
    InsufficientFunds    -> "Insufficient funds."
    ItemUnavailable i    -> "Item " ++ unpack i ++ " unavailable."
    MachineMalfunction e -> "Hardware malfunction " ++ unpack e ++ "."

instance Exception VapourError
The three steps we need are: define the error data type (deriving Typeable), write a reasonable Show instance, and declare an instance of Exception.
At this point you can use throw, catch, and handle with your custom error type.
runVendingMachine :: VendingMachineState -> Coin -> IO Product
runVendingMachine state coin = do
  unless (coin > 0) $ throw InsufficientFunds
  dispenseItem state coin

dispenseItem :: VendingMachineState -> Coin -> IO Product
dispenseItem = ....
Looking at the signature of runVendingMachine, you can see that it returns a Product by running a computation in IO. The problem when looking at that code is that the signature doesn’t give you any indication that it might fail, beyond being in IO, which we saw earlier can fail with anything.
So, as a consumer of this function, how are you to know what exceptions to catch? Your options are:
The first option is dangerous, as catching all exceptions includes asynchronous exceptions like stack/heap overflow, thread killed, and user interrupt. The documentation in safe-exceptions is particularly helpful here, and I recommend you read it thoroughly; it is well written. The short version is that you should only catch certain exceptions: trying to handle StackOverflow, HeapOverflow, and ThreadKilled could cause your program to crash or behave in unexpected ways.
The second option is error prone. The process for finding the possible exceptions involves reading the source code
and reading the haddock docs, with the goal of finding the set of sensible exceptions you need to put into
a catch
or handle
call. Have you found all the places an exception might be thrown? What about if you
pull in a new dependency, does it throw exceptions? What about a sub-dependency of a dependency?
What about the functions runVendingMachine
calls? And their functions? To me it feels like going
back to JavaScript or Ruby land and giving up on some of the benefits of a typed language. I want the types to
help me find the places I need to consider the errors, just like pattern matching does for data types.
The other, perhaps less obvious, issue is that you force the consumers of your function to know all the gory details of exceptions in Haskell: which ones are safe to catch and what to do with them. Getting this right is hard and tricky, and really belongs in a library so that it can be written once and reused.
Finally, the behaviour of a Haskell system in production is such that throwing an exception would yield you exactly
what the Show instance for VapourError
produces. It wouldn’t give you a classic stack trace (unless you set that up),
so you lose the context of where the exception was raised and what was happening around it. At a previous workplace we
spent many weeks tracking down SSL and connection-reset exceptions that occurred in a base library but bubbled
out through multiple layers of application code. It wasn’t fun.
This style is perfect for a quick script to munge some data, or an ICFP programming contest.
If you really need exceptions, use the bracket
pattern or a safe-exceptions
style library. Keep the
complexity contained; this code needs to be written very carefully.
We mentioned data types earlier, using data types to model your computation is the natural approach in Haskell.
You build a data type that accurately reflects the data or states that you want to model. We even
did it for the custom VapourError
type earlier.
Extending that, we will use a particular data type, EitherT
, to model errors. This is a monad transformer
wrapping an Either
, where the underlying monad could be anything.
In context it would look something like:
crankHandle :: Int -> EitherT VapourError IO Product
-- or
crankHandle :: Monad m => Int -> EitherT VapourError m Product
-- or
crankHandle :: MonadIO m => Int -> EitherT VapourError m Product
The type of our error is present in the type of our function, a familiar situation.
If the monad m
isn’t IO then we have a good degree of confidence that
none of the base exceptions
will be present.
-- Build a data type that represents the possible error states
data VapourError =
    InsufficientFunds
  | ItemUnavailable Text
  | MachineMalfunction Text

-- Provide a function for turning errors into text
renderVapourError :: VapourError -> Text
renderVapourError = ...

-- Usage site
runVendingMachine :: VendingMachineState -> Coin -> EitherT VapourError IO Product
runVendingMachine = ...
Examples of substantial pieces of code using EitherT
to organise errors.
Basically the compiler helps you handle the various states required using the type system.
Example of code using Exceptions
to organise errors
The main downsides, as I see them, to exception-oriented code are:
Here the compiler is less helpful in guiding you, giving little or no help with handling particular exceptions or giving compile errors for new exceptions that you might need to consider.
The supporting libraries for this pattern of error handling are:
type EitherT = ExceptT
plus additional operators. There is nothing revolutionary about transformers-either; you could roll your own version easily or use the ExceptT transformer provided in the transformers package (adding any helper functions you need). The value comes in a structured, conscious handling of errors, using the Haskell compiler to help.
The primary value of avoiding exceptions is that it makes error behavior explicit in the type of the function. If you’re in an environment where everything might fail, being explicit about it is probably a negative. But if most of your function calls are total, then knowing which ones might fail highlights places where you should consider what the correct behavior is in the case of that failure. Remember that the failure of an individual step in your program doesn’t generally mean the overall failure of your code.
It’s a little bit like null-handling in languages without options. If everything might be null, well, option types probably don’t help you. But if most of the values you encounter in your program are guaranteed to be there, then tracking which ones might be null by tagging them as options is enormously helpful, since it draws your attention to the cases where it might not be there, and so you get an opportunity to think about what the difference really is.
- Yaron Minsky
One thing that always comes up with your favourite language is how to use libraries written in another language. Typically this involves needing to talk to a particular C library, either because it’s faster than a native one or just because it is already written.
For OCaml there is the ctypes library for binding to C libraries using pure OCaml, written by the good people at OCaml Labs http://ocaml.io
The core of ctypes is a set of combinators for describing the structure of C types – numeric types, arrays, pointers, structs, unions and functions. You can use these combinators to describe the types of the functions that you want to call, then bind directly to those functions – all without writing or generating any C!
Let’s go through a simple example binding to libyaml. Here’s a declaration from libyaml to get the version string.
/**
* Get the library version as a string.
*
* @returns The function returns the pointer to a static string of the form
* @c "X.Y.Z", where @c X is the major version number, @c Y is a minor version
* number, and @c Z is the patch version number.
*/
YAML_DECLARE(const char *)
yaml_get_version_string(void);
To bind to this we need to declare a compatible signature for our OCaml code.
open Ctypes
open Foreign
let get_version_string =
  foreign "yaml_get_version_string" (void @-> returning string)
We’re pulling in Ctypes and Foreign. Then the let binding uses foreign with the name of the C function we want to call plus a type signature for that function.
Next we need some calling code to print out the version string.
open Core.Std

let () =
  let version_string = get_version_string () in
  printf "Version: %s\n" version_string
Assuming you’ve got opam installed, you can get the dependencies with opam install core ctypes
and compile the whole thing.
> corebuild -pkg ctypes.foreign -lflags -cclib,-lyaml version_string.native
...
./version_string.native
Version: 0.1.6
We’ve got bindings to a native C library without writing any C.
For a more complicated example involving passing an allocated string back from C, let’s
look at the proc_pidpath
call from OSX. This particular library call takes a
process id (PID), fills the supplied buffer with the path of that process’s executable, and returns the number of bytes written:
int proc_pidpath(int pid, void * buffer, uint32_t buffersize);
To bind to this call we again define a compatible signature.
let pidpath =
  foreign ~check_errno:true "proc_pidpath"
    (int @-> ptr char @-> int @-> returning int)
The arguments simply mirror those of the C library call, along with a new
argument check_errno
which indicates that the C library sets errno if it encounters
a problem.
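To round this out, the bound function can be wrapped so callers get back a plain OCaml string. This is a hedged sketch rather than code from the post: the name pid_path is mine, and the 4096-byte buffer size (PROC_PIDPATHINFO_MAXSIZE) is an assumption.

```ocaml
(* Sketch: wrap the raw binding so callers receive an OCaml string.
   pid_path is an illustrative name; the 4096-byte buffer size
   (PROC_PIDPATHINFO_MAXSIZE) is an assumption. *)
let pid_path pid =
  let maxlen = 4096 in
  (* Allocate a C char buffer managed by ctypes. *)
  let buf = allocate_n char ~count:maxlen in
  (* proc_pidpath returns the number of bytes written to the buffer. *)
  let len = pidpath pid buf maxlen in
  string_from_ptr buf ~length:len
```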
http://stackoverflow.com/questions/22651910/returning-a-string-from-a-c-library-to-ocaml-using-ctypes-and-foreign
Ctypes provides native bindings for most things you’ll need. There’s all sorts of pointers and types matching pretty much every native C type you’ll need here.
]]>I’ve been exceptionally fortunate in the past month to accomplish a long-held goal of mine. As of the 13th of April I’ve been employed full time as a functional programmer. In particular I’ve taken the deepest of dives into Haskell. I thought it might be interesting, at least for me, to write up my thoughts after completing a month of Haskell.
First, the depth of the dive has been overwhelming and the learning curve more like a vertical rock climb. But the entire time, no matter the exhaustion, and believe me there was a lot of that, has been extraordinary. When I take a moment to reflect, I’ve had a smile on my face.
First thing: the degree to which types are ingrained in Haskell. That might not seem surprising in itself, Haskell is after all a strongly typed language, but it was surprising to me. I’ve used Erlang, with dialyzer, and OCaml a great deal before starting, and both these languages have reasonable type systems. OCaml is even described as strongly typed. So what am I getting at?
Everything in Haskell feels typed to the nth degree. Every possible abstraction is pulled out into a common place, whether Applicatives, Monads, Bimaps or Monad Transformers. It’s great that you can abstract like that. Using any Haskell library will require you to know about some of these things.
Coming from a background where I’d done Lisp, Erlang and OCaml, and some Haskell I thought I was totally prepared to start working in Haskell full time.
Learning Haskell the language is a good first step, but knowing the syntax and being comfortable reading code is one thing. What really surprised me was that knowing Haskell isn’t sufficient, you need to learn the set of typical Haskell libraries before you can really start making progress and feel at home in Haskell. Of course I’d used things like Monads in OCaml and read
There is no equivalent to the Monad, monad transformer, lenses, applicative, traversable library ecosystem in OCaml. I naively wonder why this hasn’t been built before and whether it’s even a good idea. The parallels to Scala and scalaz are all too apparent to me. Scala is a mixed OO/FP language in a similar way to OCaml. It also doesn’t enforce the same level of strictness with respect to side effects that Haskell does. So in both languages, if you want, you can create a mess of side-effecting code if you’re not careful. Also both languages allow mutation, again another side effect, without tracking this via the type system.
I want to thank Mark Hibberd and Charles O’Farrell for being such great mentors over the last month, may they never grow tired of my endless questions.
]]>Being on the curious side of things I have been interested lately in the dualities between programming languages. Like how one feature, say Type Classes in Haskell, compares to what is available in Scala or OCaml. This has led to me reading a substantial amount of academic papers about the subject.
So with that in mind I would like to give a brief introduction to OCaml style modules. Perhaps another post will go into how you can encode something like rank-n types from Haskell in OCaml, which doesn’t natively support them.
Preface: the use of the word module can be confusing, and it sometimes seems that module is used interchangeably to refer to structures. I’ve tried to avoid that, but it’s helpful to keep in mind for further reading. Look at what’s on the right-hand side of the equals in the code. Let’s start.
OCaml is a member of the ML family of languages, sharing common features like modules, a Hindley-Milner type system and strict evaluation. OCaml as a language can be thought of as two distinct parts: a core language that revolves around values and types, and a module language that revolves around modules and signatures. While OCaml does provide some support for bridging these parts in the form of First Class Modules, I won’t cover them here.
The key parts of the module system in OCaml are:
Structures provide a way of grouping together related declarations, like data types and the functions that operate on them; they also provide the values in the module language. Below is a module for integer Sets:
module IntSet = struct
type t = int
type set = t list
let empty = []
let member i s = List.exists (fun x -> x = i) s
let insert i s = if member i s then s else (i::s)
end
This code defines a new structure using the struct
keyword and binds it to a name
using module. It’s useful to note that OCaml types are written in lowercase (t
,
list
and set
) and type variables are written with a single quote 'a
. Also
type constructors are written differently to Haskell, in Haskell you’d have
List a
while in OCaml the order is reversed: t list
.
Basically a struct is an opening struct
followed by a bunch of type
and
let
bindings, and closed with an end
.
At the call site exposed declarations are referred to by dot notation:
IntSet.t
IntSet.empty
If no module name is defined within a file, say you have a file called set.ml
with:
type t = int
type set = t list
let empty = []
let member i s = List.exists (fun x -> x = i) s
let insert i s = if member i s then s else (i::s)
It will implicitly be given a structure name derived from the file name, Set
, but
as you may have worked out, module names are not bound to file names. Further,
structures can be nested within other structures, leading to more freedom than
just having one file become one module.
module IntSet = struct
module Compare = struct
type t = int
let eql x y = x = y
end
end;;
The values within the nested module are referred to like so:
IntSet.Compare.eql 1 1;;
While it is great to have functions namespaced like so, it would become tedious if you needed to use the longer name to refer to a nested module. OCaml provides a couple of solutions, first local opens.
Rather than having an open
statement at the top of the file and bringing
everything into scope for that file, we can do a local open and restrict the
scope to between the two brackets.
IntSet.Compare.(eql 1 1);;
The other option available is aliasing the module name to something shorter
module X = IntSet.Compare;;
X.eql 1 1;;
I mentioned open
before without saying what it does. Simply, open brings the
contents of one module into scope within another module, so they can be referred to without
the module name prefix.
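A minimal sketch of open in action (the module and function names here are mine, purely for illustration):

```ocaml
(* A small module to open. *)
module Greeting = struct
  let message = "hello"
  let shout s = String.uppercase_ascii s ^ "!"
end

(* After the open, Greeting's bindings are usable unqualified. *)
open Greeting

let () = assert (shout message = "HELLO!")
```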
Signatures are the interfaces for structures; a signature defines what parts of a structure are visible from the outside. A signature can be used to hide components of a structure or to export some definitions with more general types.
A signature is introduced with the sig
keyword
module type Set =
sig
type elt
type t
val empty : t
val member : elt -> t -> bool
val insert : elt-> t -> t
end
As you can see looking at our definition of Set, it lists a type and function
signatures without specifying a concrete implementation. It’s also bound to a
name Set
using module type
.
As I mentioned before, signatures are typically used to hide or change the interface a module exposes. By default all types and functions are exported from a module. This is useful for things like hiding implementation details, or ensuring the data type can only be constructed via the invariant-preserving operations that the module provides.
Typically in OCaml you’ll define your struct
in one file set.ml
and then
create a second file set.mli
which contains the signature for the module set.
Only occasionally will you see the signature and structure defined together.
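As a sketch of that sealing in the inline style, here is the earlier IntSet structure ascribed a signature that hides the list representation (the name SealedIntSet is mine; in practice the signature would usually live in set.mli):

```ocaml
(* Sealing a structure: consumers can no longer see that t is an int list. *)
module SealedIntSet : sig
  type t
  val empty : t
  val insert : int -> t -> t
  val member : int -> t -> bool
end = struct
  type t = int list
  let empty = []
  let member i s = List.exists (fun x -> x = i) s
  let insert i s = if member i s then s else i :: s
end

let () =
  let s = SealedIntSet.(insert 2 (insert 1 empty)) in
  assert (SealedIntSet.member 2 s);
  assert (not (SealedIntSet.member 9 s))
```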
Now to the functors; they’re not exactly like Haskell’s, though they do perform a kind of mapping.
Functors are for lifting functions into the module language; put another way, they are functions from structures to structures. Which brings the abstract idea of functors from category theory back to two concrete examples: where Haskell functors are functions from types to types, OCaml’s functors are functions from structures to structures.
Following our set example, we can make the set operations abstract over both the type inside the set and the ordering comparison.
module type ORDERING =
sig
type t
val compare : t -> t -> int
end;;
module type Set =
sig
type elt
type t
val empty : t
val member : elt -> t -> bool
val insert : elt-> t -> t
end;;
module MkSet (Ord : ORDERING) : (Set with type elt := Ord.t) =
struct
  type elt = Ord.t
  type t = Empty | Node of t * elt * t
  let empty = Empty
  let rec insert x = function
    | Empty -> Node (Empty, x, Empty)
    | Node (a, y, b) when Ord.compare x y < 0 -> Node (insert x a, y, b)
    | Node (a, y, b) when Ord.compare x y > 0 -> Node (a, y, insert x b)
    | Node (_, _, _) as s -> s
  let rec member x = function
    | Empty -> false
    | Node (l, v, r) ->
      let c = Ord.compare x v in
      c = 0 || member x (if c < 0 then l else r)
end;;
module IntOrdering = struct
type t = int
let compare x y = Pervasives.compare x y
end;;
module IntSet' = MkSet(IntOrdering);;
Here we define ORDERING
and Set
as signatures, similar to our previous
definitions. Then a functor is defined, MkSet,
that takes the ORDERING
signature and defines the types and functions for set based off that interface.
So the definition of MkSet
is completely abstracted away from the type used in
the set and the functions used on those types. As long as it implements
ORDERING
.
The last part defines a particular ordering for int
, binding t to int
and compare to Pervasives.compare
.
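To see the payoff, the same functor can be instantiated with a different ordering to get, say, a set of strings for free. This is a self-contained sketch that repeats the MkSet definition from above; StringOrdering and StringSet are names I have chosen for illustration:

```ocaml
(* The signature and functor from above, repeated so this compiles standalone. *)
module type ORDERING = sig
  type t
  val compare : t -> t -> int
end

module MkSet (Ord : ORDERING) = struct
  type elt = Ord.t
  type t = Empty | Node of t * elt * t
  let empty = Empty
  let rec insert x = function
    | Empty -> Node (Empty, x, Empty)
    | Node (a, y, b) when Ord.compare x y < 0 -> Node (insert x a, y, b)
    | Node (a, y, b) when Ord.compare x y > 0 -> Node (a, y, insert x b)
    | Node (_, _, _) as s -> s
  let rec member x = function
    | Empty -> false
    | Node (l, v, r) ->
      let c = Ord.compare x v in
      c = 0 || member x (if c < 0 then l else r)
end

(* A different ordering instantiates a completely different set type. *)
module StringOrdering = struct
  type t = string
  let compare = String.compare
end

module StringSet = MkSet (StringOrdering)

let () =
  let s = StringSet.insert "b" (StringSet.insert "a" StringSet.empty) in
  assert (StringSet.member "a" s);
  assert (not (StringSet.member "z" s))
```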
After covering what is in the OCaml module system, what exactly do we use it for? At the most basic level we collect together types and functions, which is pretty much what all modules do. Beyond that we can:
module type SETFUNCTOR =
functor (O: ORDERING) ->
sig
type t = O.t (* concrete *)
type set (* abstract *)
val empty : set
val add : t -> set -> set
val member : t -> set -> bool
end;;
Here we expose the elements within the set via type t = O.t
so they’re a
concrete type, while set
isn’t given a definition so the consumers of this
module can’t look into that type without using the functions provided in the Set
module. This hiding using abstract types lets us swap out different
implementations for testing purposes or if requirements change.
Namespacing functions and types: all types and functions live within some module.
Extending existing modules in a type-safe way. You may want to extend a
module from a library with extra derived functions. For example, the Core
library from Jane Street extends the built-in OCaml library with a number of
new and different functions, e.g. say List didn’t provide a transpose
function.
Instantiating modules with state. OCaml allows modules to include mutable state; while we may not particularly like mutable things, sometimes it’s necessary, and you may want multiple instances of a particular module, each with their own state. Functors make doing this more succinct.
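A small sketch of this idea, with illustrative names of my own: a generative functor that mints counter modules, where each application produces a module with its own mutable state.

```ocaml
(* A generative functor: each application of MkCounter () creates a
   fresh module with its own ref cell. *)
module MkCounter () = struct
  let count = ref 0
  let incr_count () = incr count
  let current () = !count
end

(* Two independent instances, each with separate state. *)
module A = MkCounter ()
module B = MkCounter ()

let () =
  A.incr_count ();
  A.incr_count ();
  B.incr_count ();
  assert (A.current () = 2);
  assert (B.current () = 1)
```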
Collecting definitions and exporting them as a single module, e.g. Core.Std inside the Jane Street Core library.
The best reference is really Real World OCaml. If you’ve got some Haskell experience and don’t mind reading a paper then “ML Modules and Haskell Type Classes: A Constructive Comparison” by Stefan Wehr and Manuel Chakravarty gives a thorough coverage of how ML modules stack up to Type Classes.
]]>Lenses have been on my mind since encountering them last year in the context of Haskell. Much of the literature on lenses has a very Haskell slant, so I will show how they can be used in OCaml.
The theory of lenses, and their accompanying prisms and traversals, has been better described by other people. This article at FPComplete was a particularly good one. I’m just going to cover how to use ocaml-lens as a minimal lens implementation.
First since ocaml-lens isn’t in opam, clone the repo locally and open up
utop
. Then load the lens.ml
file into utop
.
{%codeblock lang:ocaml%} utop # #use "lens.ml";; .. {% endcodeblock %}
Starting with a few record types for a car, editor and book.
{%codeblock lang:ocaml%} type car = { make : string; model: string; mileage: int; };;
type editor = { name: string; salary: int; car: car; };;
type book = { name: string; author: string; editor: editor; };; {%endcodeblock%}
Creating a new book is as simple as:
{%codeblock lang:ocaml%} let scifi_novel = { name = "Metro 2033"; author = "Dmitry Glukhovsky"; editor = { name = "Vitali Gubarev"; salary = 1300; car = { make = "Lada"; model = "VAZ-2103"; mileage = 310000 } } };;
{% endcodeblock %}
Given our scifi_novel
we can access the editor’s car mileage:
{%codeblock lang:ocaml%} let mileage = scifi_novel.editor.car.mileage;; {% endcodeblock %}
Setting the mileage is a bit trickier, we need to unpack each record:
{%codeblock lang:ocaml%} let second_edition = { scifi_novel with editor = { scifi_novel.editor with car = { scifi_novel.editor.car with mileage = 1000 } } };; {% endcodeblock %}
That’s not really an appealing prospect, can we do better?
Enter lenses. At the most simple level a lens is a pair of functions for getting and setting a property.
{%codeblock lang:ocaml%} (** Lens type definition *) type ('a, 'b) t = { get : 'a -> 'b; (** Functional getter *) set : 'b -> 'a -> 'a (** Functional setter *) } {% endcodeblock %}
With this definition of a lens, modifying the mileage is now:
{%codeblock lang:ocaml%} let a = compose mileage_lens (compose car_lens editor_lens) in _set 10 scifi_novel a;; {% endcodeblock %}
In the background we need to define some lenses for the records above:
{%codeblock lang:ocaml%} let car_lens = { get = (fun x -> x.car); set = (fun v x -> { x with car = v }) };;
let editor_lens = { get = (fun x -> x.editor); set = (fun v x -> { x with editor = v }) };;
let mileage_lens = { get = (fun x -> x.mileage); set = (fun v x -> { x with mileage = v })
};; {% endcodeblock %}
Using these definitions, the original lens version of modifying the editor’s car mileage works.
The compose operator we used allows us to combine two lenses to go from the novel into the editor and then into the car. And compose can be combined with itself to build up arbitrarily deep lenses into a structure.
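To make the mechanics concrete, here is a self-contained sketch of the pair-of-functions lens and a compose, written from the type definition shown earlier (the type name lens and the trimmed record fields are mine, not from ocaml-lens):

```ocaml
(* The pair-of-functions lens from the definition above. *)
type ('a, 'b) lens = {
  get : 'a -> 'b;
  set : 'b -> 'a -> 'a;
}

(* compose inner outer: focus through outer first, then inner. *)
let compose inner outer = {
  get = (fun a -> inner.get (outer.get a));
  set = (fun v a -> outer.set (inner.set v (outer.get a)) a);
}

type car = { model : string; mileage : int }
type editor = { name : string; car : car }

let car_lens =
  { get = (fun e -> e.car); set = (fun v e -> { e with car = v }) }
let mileage_lens =
  { get = (fun c -> c.mileage); set = (fun v c -> { c with mileage = v }) }

let () =
  let ed = { name = "Vitali Gubarev";
             car = { model = "VAZ-2103"; mileage = 310000 } } in
  let l = compose mileage_lens car_lens in
  assert (l.get ed = 310000);
  let ed' = l.set 1000 ed in
  assert (l.get ed' = 1000);
  (* No mutation: the original record is untouched. *)
  assert (ed.car.mileage = 310000)
```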
{%codeblock lang:ocaml%} let editor_car_lens = compose car_lens editor_lens;; {% endcodeblock %}
This way of composing can seem backwards, you supply the inner lens first then
the outer lens. We can fix that by using the infix operators, open the Infix
module and define the same lens:
{%codeblock lang:ocaml%} let editor_car_lens = editor_lens |-- car_lens;; {% endcodeblock %}
This feels more intuitive, reading it left to right. Revisiting our original
_set
mileage example, we can now write it:
{%codeblock lang:ocaml%} _set 10 scifi_novel (editor_lens |-- car_lens |-- mileage_lens);; (* or even *) ((editor_lens |-- car_lens |-- mileage_lens) ^= 10) @@ scifi_novel;; {% endcodeblock %}
The infix module comes with some other helpful operators like
|.
for get and ^=
for set. All these operators avoid mutation so our
code remains pure and referentially transparent.
There are a heap more things that lenses can do, and while this ocaml-lens
package is pretty basic, looking at the hundreds of functions exported by
Control.Lens
in Haskell you can get a good idea of the possibilities. Control.Lens
includes
all the basic lens functions plus things like:
prisms, which are lenses but for sum types
traversals, which are lenses that focus on multiple targets simultaneously
I made use of the following resources to write this and took some of the examples and definitions from the following articles. All mistakes are my own and probably accidental.
]]>