Quick and easy user-defined operators with Plated
Posted on November 17, 2017All infix operators have precedence and associativity. A language that supports user-defined operators should also give the user a way to control these attributes for their custom operators. Modern languages do this in a variety of ways. In Swift, all operators are associated with a precedence group. User-defined operators in F# get their precedence and associativity from the combination of characters that make up the operator. In Haskell-like languages, the user explicitly states the precedence and associativity using special syntax. For example, infixl 5 +
says “the +
operator is left-associative and has precedence level 5”.
I like Haskell-style operators out of all these options — I think they’re a more elegant solution to the problem. However, elegance comes at the cost of implementation difficulty.
Haskell-style infix operators are generally implemented like this:
Define a set of characters that are allowed in operators.
Add syntax for declaring operator precedence and associativity.
Parse all operators right-recursively.
The grammar might look something like this: ``` operator_char ::= ‘~’ | ‘!’ | ‘@’ | ‘#’ | ‘$’ | ‘%’ | ‘^’ | ‘&’ | ’*’ | ‘?’ | ‘>’ | ‘<’ | ‘.’ | ‘|’ | ‘-’ | ‘+’ | ‘=’ operator ::= operator_char+
fixity ::= ‘infixr’ | ‘infixl’ infix_decl ::= fixity [0-9]+ operator declaration ::= infix_decl | …
non_operator_expr ::= ‘(’ expr ‘)’ | … expr ::= non_operator_expr (operator non_operator_expr)* ```
Collect the operator precedences and associativities from the parsed output
Re-write the parsed expressions according to this information
This is done in two parts:
- Associativity correction
- Precedence correction
Re-association
Consider the input 2 - 3 + 4
. We implicitly read this as [2 - 3] + 4
, because -
and +
have the same precedence and are left-associative. According to our grammar this expression will always be parsed as 2 - [3 + 4]
, which has a completely different meaning. Changing the position of the brackets accomplished by changing which operator is at the top of the tree. (-)
is at the top of the tree in the expression 2 - [3 + 4]
, but (+)
is at the top in [2 - 3] + 4
. Notice that the order of the leaves — 2, 3, 4
— stays the same regardless of how the tree is transformed.
The re-association algorithm for a tree of height two looks like this:
Left associative operators:
OP1(prec=X, assoc=Left)
/ \
/ \
A OP2(prec=Y, assoc=Left)
/ \
/ \
B C
if X == Y, becomes
OP2(prec=Y, assoc=Left)
/ \
/ \
OP1(prec=X, assoc=Left) C
/ \
/ \
A B
Right associative operators
OP1(prec=X, assoc=Right)
/ \
/ \
OP2(prec=Y, assoc=Right) C
/ \
/ \
A B
if X == Y, becomes
OP2(prec=Y, assoc=Right)
/ \
/ \
A OP1(prec=X, assoc=Right)
/ \
/ \
B C
If a parent and child node have different precedences and different associativities, then they won’t be re-associated. If they have the same precedence but different associativities, an “ambiguous parse” error is raised.
If we have two operators !
and ^
, with the same precedence and different associativity, any unparenthesised expression that uses them has two valid parenthesisations — 5 ^ 4 ! 3
could be [5 ^ 4] ! 3
or 5 ^ [4 ! 3]
.
Precedence Correction
Consider the input 2 * 3 + 4
. This is read as [2 * 3] + 4
, because *
has higher precedence than +
, but it will be parsed as 2 * [3 + 4]
.
The precedence-correction algorithm looks like this:
OP1(prec=X, assoc=?)
/ \
/ \
A OP2(prec=Y, assoc=??)
/ \
/ \
B C
if X > Y, becomes
OP2(prec=Y, assoc=??)
/ \
/ \
OP1(prec=X, assoc=?) C
/ \
/ \
A B
and
OP1(prec=X, assoc=?)
/ \
/ \
OP2(prec=Y, assoc=??) C
/ \
/ \
A B
if X > Y, becomes
OP2(prec=Y, assoc=??)
/ \
/ \
A OP1(prec=X, assoc=?)
/ \
/ \
B C
Putting it together
This seems fairly straightforward when thinking about trees of height 2, but how do you generalise it to trees of any height?
This is where Plated
comes in. To make your datatype an instance of Plated, you write a Traversal that operates over its immediate self-similar children. rewriteM then allows you to write transformations on trees as deep or as shallow as you wish, interleaving a monadic context, and will recursively apply those transformations from the bottom of a tree upwards until it can no longer be transformed.
Good abstractions reduce boilerplate and help you focus on what’s important. The “recursive bottom-up tree rewriting” algorithm has already been written for us. Using Plated
, we need only consider the simplest case for re-ordering the tree, and then it scales for free.
Let’s write some code.
module Operators where
import Control.Applicative ((<|>), liftA2)
import Control.Lens.Plated (Plated(..), rewriteM)
import Data.Maybe (fromMaybe)
Parens
node is for explicit parenthesisation.
Plated
instance.
instance Plated Expr where
plate f (Parens a) = Parens <$> f a
plate f (BinOp n a b) = BinOp n <$> f a <*> f b
plate _ a = pure a
data Associativity
= AssocL
| AssocR
deriving (Eq, Ord, Show)
data OperatorInfo
= OperatorInfo
{ opAssoc :: Associativity
, opPrecedence :: Int
}
OpTable
is a map from operator names to their OperatorInfo
.
Re-association. This code crashes when an operator is missing from the table.
If the input node is an operator, and:
- It is left-associative:
- Inspect its right child
- If the right child node is an operator and has equal precedence to the input node
- And is also left-associative, then re-order the tree to be left-associative
- And is right-associative, then report an ambiguity
- If the right child node is an operator and has equal precedence to the input node
- Inspect its left child
- If the left child node is an operator, has equal precedence to the input node and is right-associative, then report an ambiguity
- Inspect its right child
- It is right-associative
- Inspect its left child
- If the left child node is an operator and has equal precedence to the input node
- And is also right-associative, then re-order the tree to be right-associative
- And is left-associative, then report an ambiguity
- If the left child node is an operator and has equal precedence to the input node
- Inspect its right child
- If the right child node is an operator, has equal precedence to the input node and is left-associative, then report an ambiguity
- Inspect its left child
- It is left-associative:
Otherwise, do nothing
associativity :: OpTable -> Expr -> Either ParseError (Maybe Expr)
associativity table (BinOp name l r)
| Just entry <- lookup name table =
case opAssoc entry of
AssocL
| BinOp name' l' r' <- r
, Just entry' <- lookup name' table
, opPrecedence entry == opPrecedence entry' ->
case opAssoc entry' of
AssocL -> Right . Just $ BinOp name' (BinOp name l l') r'
AssocR -> Left AmbiguousParse
| BinOp name' _ _ <- l
, Just entry' <- lookup name' table
, opPrecedence entry == opPrecedence entry'
, AssocR <- opAssoc entry' ->
Left AmbiguousParse
| otherwise -> Right Nothing
AssocR
| BinOp name' l' r' <- l
, Just entry' <- lookup name' table
, opPrecedence entry == opPrecedence entry' ->
case opAssoc entry' of
AssocL -> Left AmbiguousParse
AssocR -> Right . Just $ BinOp name' l' (BinOp name r' r)
| BinOp name' _ _ <- r
, Just entry' <- lookup name' table
, opPrecedence entry == opPrecedence entry'
, AssocL <- opAssoc entry' ->
Left AmbiguousParse
| otherwise -> Right Nothing
associativity _ _ = Right Nothing
Precedence correction. This code also crashes when operators are missing from the operator table.
This is broken down into two phases- making sure the left branch is precedence-correct with respect to the input node, and then doing the same for the left branch.
For each branch, if that branch contains an operator with a lower precedence than the input node, re-order the tree so the lower-precedence operator is at the top.precedence :: OpTable -> Expr -> Maybe Expr
precedence table e@(BinOp name _ _)
| Just entry <- lookup name table =
checkR entry $ fromMaybe e (checkL entry e)
where
checkL entry (BinOp name l c) =
case l of
BinOp name' a b
| Just entry' <- lookup name' table
, opPrecedence entry' < opPrecedence entry ->
Just $ BinOp name' a (BinOp name b c)
_ -> Nothing
checkL _ _ = Nothing
checkR entry (BinOp name a r) =
case r of
BinOp name' b c
| Just entry' <- lookup name' table
, opPrecedence entry' < opPrecedence entry ->
Just $ BinOp name' (BinOp name a b) c
_ -> Nothing
checkR _ _ = Nothing
precedence _ _ = Nothing
precedence
and associativity
have type Expr -> Maybe Expr
because eventually the transformations will no longer be applicable. We can use liftA2 (liftA2 (<|>))
to combine the two rewrite rules, and rewriteM
will run until one produces a Left
, or until both always produce Right Nothing
reorder :: OpTable -> Expr -> Either ParseError Expr
reorder table =
rewriteM $
liftA2 (liftA2 (<|>)) (Right . precedence table) (associativity table)
Let’s try it on the expression 5 - 4 + 3 * 2 + 1
. It will be parsed as 5 - [4 + [3 * [2 + 1]]]
, but after re-ordering should become [[5 - 4] + [3 * 2]] + 1
.
ghci> let o = [("+", OperatorInfo AssocL 5), ("-", OperatorInfo AssocL 5), ("*", OperatorInfo AssocL 6)]
ghci> let input = BinOp "-" (Number 5) (BinOp "+" (Number 4) (BinOp "*" (Number 3) (BinOp "+" (Number 2) (Number 1))))
ghci> reorder o input
Right (BinOp "+" (BinOp "+" (BinOp "-" (Number 5) (Number 4)) (BinOp "*" (Number 3) (Number 2))) (Number 1))
We can also use Parens
to explicitly parenthesise the expression. If we input 5 - (4 + (3 * (2 + 1)))
, it will not be re-ordered at all.
ghci> let input = BinOp "-" (Number 5) (Parens $ BinOp "+" (Number 4) (Parens $ BinOp "*" (Number 3) (Parens $ BinOp "+" (Number 2) (Number 1))))
ghci> reorder o input
Right (BinOp "-" (Number 5) (Parens (BinOp "+" (Number 4) (Parens (BinOp "*" (Number 3) (Parens (BinOp "+" (Number 2) (Number 1))))))))
ghci> reorder o input == Right input
True
Ambiguous expressions are reported. Here’s the example from earlier — 5 ^ 4 ! 3
:
ghci> let o = [("^", OperatorInfo AssocL 5), ("!", OperatorInfo AssocR 5)]
ghci> let input = BinOp "^" (Number 5) (BinOp "!" (Number 4) (Number 3))
ghci> reorder o input
Left AmbiguousParse
ghci> let input = BinOp "!" (BinOp "^" (Number 5) (Number 4)) (Number 3)
ghci> reorder o input
Left AmbiguousParse
And are resolved by adding explicit parentheses — 5 ^ (4 ! 3)
and (5 ^ 4) ! 3
respectively:
ghci> let o = [("^", OperatorInfo AssocL 5), ("!", OperatorInfo AssocR 5)]
ghci> let input = BinOp "^" (Number 5) (Parens $ BinOp "!" (Number 4) (Number 3))
ghci> reorder o input
Right (BinOp "!" (Number 5) (Parens (BinOp "^" (Number 4) (Number 3))))
ghci> let input = BinOp "!" (Parens $ BinOp "^" (Number 5) (Number 4)) (Number 3)
ghci> reorder o input
Right (BinOp "!" (Parens (BinOp "^" (Number 5) (Number 4)) (Number 3)))
> Isaac Elliott
Isaac really likes types