Herb tutorial: Abstract syntax trees

In this tutorial, you will learn

  • How to represent a computer program as an abstract syntax tree in Herb.

  • How to replace parts of the tree to modify the program.

Abstract syntax trees

The syntactic structure of a computer program can be represented in a hierarchical tree structure, a so-called Abstract Syntax Tree (AST). The syntax of a programming language is typically defined using a formal grammar, a set of rules on how valid programs can be constructed. ASTs are derived from the grammar, but are abstractions in the sense that they omit details such as parenthesis, semicolons, etc. and only retain what's necessary to capture the program structure.

In the context of program synthesis, ASTs are often used to define the space of all possible programs which is searched to find one that satisfies the given specifications. During the search process, different ASTs, each corresponding to a different program, are generated and evaluated until a suitable one is found.

Each node of the AST represents a construct in the program (e.g., a variable, an operator, a statement, or a function) and this construct corresponds to a rule in the formal grammar. An edge describes the relationship between constructs, and the tree structure captures the nesting of constructs.

A simple example program

We first consider the simple program 5*(x+3). We will define a grammar that is sufficient to represent this program and use it to construct a AST for our program.

Define the grammar

using Herb
grammar = @csgrammar begin
        Number = |(0:9)
        Number = x
        Number = Number + Number
        Number = Number * Number
    end
1: Number = 0
2: Number = 1
3: Number = 2
4: Number = 3
5: Number = 4
6: Number = 5
7: Number = 6
8: Number = 7
9: Number = 8
10: Number = 9
11: Number = x
12: Number = Number + Number
13: Number = Number * Number

Construct the syntax tree

The AST of this program is shown in the diagram below. The number in each node refers to the index of the corresponding rule in our grammar.

Diagram(:mermaid, """
flowchart TD
    id1((13)) ---
    id2((6))
    id1 --- id3((12))
    id4((11))
    id5((4))
    id3 --- id4
    id3 --- id5
""")

In Herb.jl, the HerbCore.RuleNode is used to represent both an individual node, but also entire ASTs or sub-trees. This is achieved by nesting instances of RuleNode. A RuleNode can be instantiated by providing the index of the grammar rule that the node represents and a vector of child nodes.

syntaxtree = HerbGrammar.RuleNode(13, [RuleNode(6), RuleNode(12, [RuleNode(11), RuleNode(4)])])
13{6,12{11,4}}

We can confirm that our AST is correct by displaying it in a more human-readable way, using HerbGrammar.rulenode2expr and by testing it on a few input examples using HerbInterpret.execute_on_input.

rulenode2expr(syntaxtree, grammar)
:(5 * (x + 3))
# test solution on inputs
execute_on_input(grammar, syntaxtree, Dict(:x => 10))
65

Another example: FizzBuzz

Let's look at a more interesting example. The program fizzbuzz() is based on the popular FizzBuzz problem. Given an integer number, the program simply returns a String of that number, but replace numbers divisible by 3 with "Fizz", numbers divisible by 5 with "Buzz", and number divisible by both 3 and 5 with "FizzBuzz".

function fizzbuzz(x)
    if x % 5 == 0 && x % 3 == 0
        return "FizzBuzz"
    else
        if x % 3 == 0
            return  "Fizz"
        else
            if x % 5 == 0
                return "Buzz"
            else
                return string(x)
            end
        end
    end
end
fizzbuzz (generic function with 1 method)

Define the grammar

Let's define a grammar with all the rules that we need.

grammar_fizzbuzz = @csgrammar begin
    Int = input1
    Int = 0 | 3 | 5
    String = "Fizz" | "Buzz" | "FizzBuzz"
    String = string(Int)
    Return = String
    Int = Int % Int
    Bool = Int == Int
    Int = Bool ? Int : Int
    Bool = Bool && Bool
end
1: Int = input1
2: Int = 0
3: Int = 3
4: Int = 5
5: String = Fizz
6: String = Buzz
7: String = FizzBuzz
8: String = string(Int)
9: Return = String
10: Int = Int % Int
11: Bool = Int == Int
12: Int = if Bool
    Int
else
    Int
end
13: Bool = Bool && Bool

Construct the syntax tree

Given the grammar, the AST of fizzbuzz() looks like this:

Diagram(:mermaid, """
flowchart TD
    id1((12)) --- id21((13))
    id1--- id22((9))
    id1--- id23((12))

    id21 --- id31((11))
    id21 --- id32((11))

    id31 --- id41((10))
    id31 --- id42((2))

    id41 --- id51((1))
    id41 --- id52((4))

    id32 --- id43((10)) 
    id32 --- id44((2))

    id43 --- id53((1))
    id43 --- id54((3))

    id22 --- id33((7))
    id23 --- id34((11))

    id34 --- id45((10))
    id34 --- id46((2))

    id45 --- id55((1))
    id45 --- id56((3))

    id23 --- id35((9))
    id35 --- id47((5))

    id23 --- id36((12))
    id36 --- id48((11))
    id48 --- id57((10))
    id57 --- id61((1))
    id57 --- id62((4))
    id48 --- id58((2))

    id36 --- id49((9))
    id49 --- id59((6))

    id36 --- id410((9))
    id410 --- id510((8))
    id510 --- id63((1))
""")

As before, we use nest instanced of RuleNode to implement the AST.

fizzbuzz_syntaxtree = RuleNode(12, [
               RuleNode(13, [
                   RuleNode(11, [
                       RuleNode(10, [
                           RuleNode(1),
                           RuleNode(4)
                       ]),
                       RuleNode(2)
                   ]),
                   RuleNode(11, [
                       RuleNode(10, [
                           RuleNode(1),
                           RuleNode(3)
                       ]),
                       RuleNode(2)
                   ])
               ]),
               RuleNode(9, [
                   RuleNode(7)
               
               ]),
               RuleNode(12, [
                   RuleNode(11, [
                       RuleNode(10, [
                           RuleNode(1),
                           RuleNode(3),
                       ]),
                       RuleNode(2)
                   ]),
                   RuleNode(9, [
                       RuleNode(5)
                   ]),
                   RuleNode(12, [
                       RuleNode(11, [
                           RuleNode(10, [
                               RuleNode(1),
                               RuleNode(4)
                           ]),
                           RuleNode(2)
                       ]),
                       RuleNode(9, [
                           RuleNode(6)
                       ]),
                       RuleNode(9, [
                           RuleNode(8, [
                                RuleNode(1)
                            ])
                       ])
                   ])
               ]) 
    ])
12{13{11{10{1,4},2},11{10{1,3},2}},9{7},12{11{10{1,3},2},9{5},12{11{10{1,4},2},9{6},9{8{1}}}}}

And we check our syntax tree is correct:

rulenode2expr(fizzbuzz_syntaxtree, grammar_fizzbuzz)
:(if input1 % 5 == 0 && input1 % 3 == 0
      "FizzBuzz"
  else
      if input1 % 3 == 0
          "Fizz"
      else
          if input1 % 5 == 0
              "Buzz"
          else
              string(input1)
          end
      end
  end)
begin
    # test solution on inputs
    input = [Dict(:input1 => 3), Dict(:input1 => 5), Dict(:input1 =>15), Dict(:input1 => 22)]
    output1 = execute_on_input(grammar_fizzbuzz, fizzbuzz_syntaxtree, input)
    output1
end
4-element Vector{Any}:
 "Fizz"
 "Buzz"
 "FizzBuzz"
 "22"

Modify the AST/program

There are several ways to modify an AST and hence, a program. You can

  • directly replace a node with HerbCore.swap_node()

  • insert a rule node with insert!

Let's modify our example such that if the input number is divisible by 3, the program returns "Buzz" instead of "Fizz". We use swap_node() to replace the node of the AST that corresponds to rule 5 in the grammar (String = Fizz) with rule 6 (String = Buzz). To do so, swap_node() needs the tree that contains the node we want to modify, the new node we want to replace the node with, and the path to that node.

Note that swap_node() modifies the tree, hence we make a deep copy of it first.

begin
    modified_fizzbuzz_syntaxtree = deepcopy(fizzbuzz_syntaxtree)
    newnode = RuleNode(6)
    path = [3, 2, 1]
    swap_node(modified_fizzbuzz_syntaxtree, newnode, path)
    rulenode2expr(modified_fizzbuzz_syntaxtree, grammar_fizzbuzz)
end
:(if input1 % 5 == 0 && input1 % 3 == 0
      "FizzBuzz"
  else
      if input1 % 3 == 0
          "Buzz"
      else
          if input1 % 5 == 0
              "Buzz"
          else
              string(input1)
          end
      end
  end)

Let's confirm that we modified the AST, and hence the program, correctly:

# test solution on same inputs as before
execute_on_input(grammar_fizzbuzz, modified_fizzbuzz_syntaxtree, input)
4-element Vector{Any}:
 "Buzz"
 "Buzz"
 "FizzBuzz"
 "22"

An alternative way to modify the AST is by using insert!(). This requires to provide the location of the node that we want to as NodeLoc. NodeLoc points to a node in the tree and consists of the parent and the child index of the node. Again, we make a deep copy of the original AST first.

begin
    anothermodified_fizzbuzz_syntaxtree = deepcopy(fizzbuzz_syntaxtree)
    # get the node we want to modify and instantiate a NodeLoc from it.
    node = get_node_at_location(anothermodified_fizzbuzz_syntaxtree, [3, 2, 1])
    nodeloc = NodeLoc(node, 0)
    # replace the node
    insert!(node, nodeloc, newnode)
    rulenode2expr(anothermodified_fizzbuzz_syntaxtree, grammar_fizzbuzz)
end
:(if input1 % 5 == 0 && input1 % 3 == 0
      "FizzBuzz"
  else
      if input1 % 3 == 0
          "Buzz"
      else
          if input1 % 5 == 0
              "Buzz"
          else
              string(input1)
          end
      end
  end)

Again, we check that we modified the program as intended:

# test on same inputs as before
execute_on_input(grammar_fizzbuzz, anothermodified_fizzbuzz_syntaxtree, input)
4-element Vector{Any}:
 "Buzz"
 "Buzz"
 "FizzBuzz"
 "22"