Moshi - pattern matching and algebraic data types for Julia

In this post, I’m excited to announce Moshi (模式), a new package for Julia that brings pattern matching and type-stable generic algebraic data types to the language. It also provides a derive macro similar to Rust’s derive macro for deriving traits (a set of interface functions) for algebraic data types (ADT) or Julia structs. Moshi is a complete rewrite of the MLStyle package.

I have been iterating the design and implementation of Moshi in the past 3 year through various packages (mostly Expronicon). The syntax and implementation is gettign stable. However, the package is still in its early stages (mostly on stabilizing the package and covering more patterns on Julia’s Base types), I would like to announce it here and invite the community to try it out and provide feedback.

Installation

You can install Moshi using the Julia package manager. From the Julia REPL, type ] to enter the Pkg REPL mode and run:

pkg> add Moshi

The Name and Acknowledgement

The name “Moshi” is derived from the Chinese word “模式” (móshì) which means “pattern”. The design of pattern matching is inspired by its predecessor MLStyle, which tries to bring pattern matching and algebraic data types from ML family languages to Julia.

The generic algebraic data type is highly inspired by previous work done in Julia ecosystem:

SumTypes by @MasonProtter.
Expronicon by myself.
this discussion about “generated struct” and how Julia implements Union types by @vjnash.

Quick Example

Here is a quick example of defining a simple algebraic data type:

using Moshi.Data: @data

@data Message begin
    Quit
    struct Move
        x::Int
        y::Int
    end

    Write(String)
    ChangeColor(Int, Int, Int)
end

For pattern matching, if you already used MLStyle, the syntax is very similar:

using Moshi.Match: @match

@match [1.0, 2, 3] begin
    [1, xs::Float64...] => xs
end

@match (1, 2.0, "a") begin
    (1, x::Int, y::String) => x
    (1, x::Real, y::String) => y
end

What is MLStyle? And Why Rewrite?

MLStyle has been one of the “go-to” patterns-matching packages in the Julia ecosystem. It is the most flexible and extensible pattern-matching system in Julia. 77 packages have been built on top of it, receiving 13k monthly downloads.

MLStyle has been a great success, but it has some problems and let me explain why I decided to rewrite it and what Moshi is trying to achieve:

Pattern Matching Syntax Had Correctness Issues

@thautwarm and I have been discussing various designs that were not made correctly back in ~2018, such as:

The main reason here is when MLStyle was designed in 2018, we didn’t have enough experience in designing the pattern matching syntax in Julia. However, as a user of MLStyle in the past 6 years. I have a better understanding of the problems in MLStyle. Some ideas from @thautwarm have also been floating around for MLStyle v0.5. Some syntax design has become problematic when I started integrating my ADT implementation with MLStyle afterwards.

The History of Supporting Algebraic Data Types in Julia

In fact, MLStyle also provides GADTs since 2018, but the implementation is not type-stable and the performance of generated code is not good.

Around 2021, @YingboMa tried to improve the performance of symbolic expression in SymbolicUtils in Unityper by manually generating tagged union types with the same C layout. However, the implementation does not support pattern matching and thus can be very verbose.

After several discussion with @thautwarm, I decided to use a similar approach like Unityper on memory management (how the structs are created) and integrate my implementation with MLStyle’s pattern matching system. I learned about how memory alignment should be done from @MasonProtter later to further improve the performance. This becomes the ADT in Expronicon around early 2023.

The ADT in Expronicon is type-stable and has relatively good performance. However, it has trouble supporting generic ADTs and what was more problematic is I find it is hard to support namespace in Julia. To explain this, let me ask a question: is the following code type piracy?

function Base.getproperty(::MyType, field::Symbol)
    # blabla
end

It should be type-stable and should not be type piracy, right? Because we are overloading the method on our own type MyType. Now, what about the following code?

function Base.getproperty(::Type{MyType}, field::Symbol)
    # blabla
end

This is actually type piracy because this effects the behaviour . operator inside Core.Compiler and methods on Type will need to be invalidated. This is a problem raised by @vjnash. Thus the support of namespace in Expronicon, e.g Message.Move(1, 2) has been an evil implementation causing massive invalidations in packages using Expronicon.

And most importantly, from that discussion, we learned Julia’s Union is doing the tagged Union memory layout in a similar way as Expronicon, Unityper and SumTypes. Thus in principle, one can just use a Union as the storage instead of calculating the memory layout manually. In 2023 summer, @MasonProtter decided to abandon the manual memory management in SumTypes. It turns out not only simplifying the implementation but also improving the performance because now Julia compiler understands the ADT and can optimize the code better via passes like union splitting.

Some has started using @MasonProtter’s work in SumTypes since then. However, the pattern matching system in SumTypes is not as powerful as MLStyle. @MasonProtter asked me a few times about how to integrate with MLStyle and he also complained about how hard it is to integrate with MLStyle, which I fully agree.

The implementation of ADT and pattern matching turns out to be closely related. One need a good ADT implementation to support pattern matching and one need a good pattern matching system to support ADT.

From a more practical perspective, I started realizing the importance of a good pattern matching system in implementing the next-gen of Yao and symbolic engines like SymbolicUtils. The proof of concept ADT in Expronicon finds its usefulness in the refactor of SymbolicUtils combined with MLStyle. Some of the quantum error correction compiler in QuEra was also built on top of Expronicon and MLStyle internally. These practical usage of ADT and pattern matching in Julia pushed me to rewrite MLStyle to support ADT and pattern matching in a more integrated way — this is the born of Moshi.

We don’t like the name

Many has asked about what is “MLStyle”? It seems not everyone knows the name is inspired by “ML family languages” but instead think it is a style for machine learning. The name is also not very easy to remember.

Conlusion

So In summary, we want the following:

Generic Algebraic Data Types (GADT) that are type-stable and performant.
Proper namespace support for (G)ADT in Julia that does not make compiler mad.
A pattern matching system that is as powerful as MLStyle
- all the standard built-in patterns in MLStyle
- zero runtime dependency (only Base and Core)
- expression patterns (this is the nicest feature in MLStyle for meta-programming)
- extensible pattern matching system (you can define your own patterns in MLStyle, just most people don’t know how to do it)
A set of reflection tools for (G)ADT and pattern matching
A formal definition of the pattern matching language so we can at least reason about the correctness in discussions and maybe even automatically.

Moshi is trying to achieve all of these. The package is still in its early stages (v0.3), but I believe the design is getting stable and the implementation is getting mature. I would like to invite the community to try it out and provide feedback.

Future Work

The next step is to stabilize the package and cover more patterns on Julia’s Base types. For example, patterns like range patterns and custom pattern syntax is not yet fully supported. However, they shouldn’t change the main design of Moshi.

Happy coding!