Defining Grammars in Herb.jl using HerbGrammar

The program space in Herb.jl is defined using a grammar. This notebook demonstrates how such a grammar can be created. There are multiple kinds of grammars, but they can all be defined in a very similar way.

Setup

First, we import the necessary Herb packages.

using Herb

Creating a simple grammar

This cell contains a very simple arithmetic grammar. The grammar is defined using the @csgrammar macro. This macro converts the grammar definition in the form of a Julia expression into Herb's internal grammar representation. Macro's are executed during compilation. If you want to load a grammar during execution, have a look at the HerbGrammar.expr2csgrammar function.

g₁ = HerbGrammar.@csgrammar begin
    Int = 1
    Int = 2
    Int = 3
    Int = Int * Int
    Int = Int + Int
end
1: Int = 1
2: Int = 2
3: Int = 3
4: Int = Int * Int
5: Int = Int + Int

Defining every integer one-by-one can be quite tedious. Therefore, it is also possible to use the following syntax that makes use of a Julia iterator:

g₂ = HerbGrammar.@csgrammar begin
    Int = |(0:9)
    Int = Int * Int
    Int = Int + Int
end
1: Int = 0
2: Int = 1
3: Int = 2
4: Int = 3
5: Int = 4
6: Int = 5
7: Int = 6
8: Int = 7
9: Int = 8
10: Int = 9
11: Int = Int * Int
12: Int = Int + Int

You can do the same with lists:

g₃ = HerbGrammar.@csgrammar begin
    Int = |([0, 2, 4, 6, 8])
    Int = Int * Int
    Int = Int + Int
end
1: Int = 0
2: Int = 2
3: Int = 4
4: Int = 6
5: Int = 8
6: Int = Int * Int
7: Int = Int + Int

Variables can also be added to the grammar by just using the variable name:

g₄ = HerbGrammar.@csgrammar begin
    Int = |(0:9)
    Int = Int * Int
    Int = Int + Int
    Int = x
end
1: Int = 0
2: Int = 1
3: Int = 2
4: Int = 3
5: Int = 4
6: Int = 5
7: Int = 6
8: Int = 7
9: Int = 8
10: Int = 9
11: Int = Int * Int
12: Int = Int + Int
13: Int = x

Grammars can also work with functions. After all, + and * are just infix operators for Julia's identically-named functions. You can use functions that are provided by Julia, or functions that you wrote yourself:

begin
    f(a) = a + 1
    
    g₅ = HerbGrammar.@csgrammar begin
        Int = |(0:9)
        Int = Int * Int
        Int = Int + Int
        Int = f(Int)
        Int = x
    end
end
1: Int = 0
2: Int = 1
3: Int = 2
4: Int = 3
5: Int = 4
6: Int = 5
7: Int = 6
8: Int = 7
9: Int = 8
10: Int = 9
11: Int = Int * Int
12: Int = Int + Int
13: Int = f(Int)
14: Int = x

Similarly, we can also define the operator times (x) manually.

begin
    ×(a, b) = a * b
    
    g₆ = HerbGrammar.@csgrammar begin
        Int = |(0:9)
        Int = a
        Int = Int + Int
        Int = Int × Int
    end
end
1: Int = 0
2: Int = 1
3: Int = 2
4: Int = 3
5: Int = 4
6: Int = 5
7: Int = 6
8: Int = 7
9: Int = 8
10: Int = 9
11: Int = a
12: Int = Int + Int
13: Int = Int × Int

Working with grammars

If you want to implement something using these grammars, it is useful to know about the functions that you can use to manipulate grammars and extract information. This section is not complete, but it aims to give an overview of the most important functions.

It is recommended to also read up on Julia metaprogramming if you are not already familiar with the concept.

One of the most important things about grammars is that each rule has an index associated with it:

begin
    g₇ = HerbGrammar.@csgrammar begin
        Int = |(0:9)
        Int = Int + Int
        Int = Int * Int
        Int = x
    end
    
    collect(enumerate(g₇.rules))
end
13-element Vector{Tuple{Int64, Any}}:
 (1, 0)
 (2, 1)
 (3, 2)
 (4, 3)
 (5, 4)
 (6, 5)
 (7, 6)
 (8, 7)
 (9, 8)
 (10, 9)
 (11, :(Int + Int))
 (12, :(Int * Int))
 (13, :x)

We can use this index to extract information from the grammar.

isterminal

isterminal returns true if a rule is terminal, i.e. it cannot be expanded. For example, rule 1 is terminal, but rule 11 is not, since it contains the non-terminal symbol :Int.

HerbGrammar.isterminal(g₇, 1)
true
HerbGrammar.isterminal(g₇, 11)
false

return_type

This function is rather obvious; it returns the non-terminal symbol that corresponds to a certain rule. The return type for all rules in our grammar is :Int.

HerbGrammar.return_type(g₇, 11)
:Int

child_types

child_types returns the types of the nonterminal children of a rule in a vector. If you just want to know how many children a rule has, and not necessarily which types they have, you can use nchildren

HerbGrammar.child_types(g₇, 11)
2-element Vector{Symbol}:
 :Int
 :Int
HerbGrammar.nchildren(g₇, 11)
2

nonterminals

The nonterminals function can be used to obtain a list of all nonterminals in the grammar.

HerbGrammar.nonterminals(g₇)
1-element Vector{Symbol}:
 :Int

Adding rules

It is also possible to add rules to a grammar during execution. This can be done using the add_rule! function. The exclamation mark is a Julia convention and is appended to name if a function modifies its arguments (in our example the grammar).

A rule can be provided in the same syntax as is used in the grammar definition. The rule should be of the Expr type, which is a built-in type for representing expressions. An easy way of creating Expr values in Julia is to encapsulate it in brackets and use a colon as prefix:

HerbGrammar.add_rule!(g₇, :(Int = Int - Int))
1: Int = 0
2: Int = 1
3: Int = 2
4: Int = 3
5: Int = 4
6: Int = 5
7: Int = 6
8: Int = 7
9: Int = 8
10: Int = 9
11: Int = Int + Int
12: Int = Int * Int
13: Int = x
14: Int = Int - Int

Removing rules

It is also possible to remove rules in Herb.jl, however, this is a bit more involved. As said before, rules have an index associated with them. The internal representation of programs that are defined by the grammar makes use of those indices for efficiency. Blindly removing a rule would shift the indices of other rules, and this could mean that existing programs get a different meaning or become invalid.

Therefore, there are two functions for removing rules:

  • remove_rule! removes a rule from the grammar, but fills its place with a placeholder. Therefore, the indices stay the same, and only programs that use the removed rule become invalid.

  • cleanup_removed_rules! removes all placeholders and shifts the indices of the other rules.

HerbGrammar.remove_rule!(g₇, 11)
1: Int = 0
2: Int = 1
3: Int = 2
4: Int = 3
5: Int = 4
6: Int = 5
7: Int = 6
8: Int = 7
9: Int = 8
10: Int = 9
11: nothing = nothing
12: Int = Int * Int
13: Int = x
14: Int = Int - Int
HerbGrammar.cleanup_removed_rules!(g₇)
1: Int = 0
2: Int = 1
3: Int = 2
4: Int = 3
5: Int = 4
6: Int = 5
7: Int = 6
8: Int = 7
9: Int = 8
10: Int = 9
11: Int = Int * Int
12: Int = x
13: Int = Int - Int

Context-sensitive grammars

Context-sensitive grammars introduce additional constraints compared to context-free grammars (like the simple grammar examples above). As before, we use the @csgrammar macro:

g₈ = HerbGrammar.@csgrammar begin
    Int = |(0:9)
    Int = Int + Int
    Int = Int * Int
    Int = x
end
1: Int = 0
2: Int = 1
3: Int = 2
4: Int = 3
5: Int = 4
6: Int = 5
7: Int = 6
8: Int = 7
9: Int = 8
10: Int = 9
11: Int = Int + Int
12: Int = Int * Int
13: Int = x

Constraints can be added using the addconstraint! function, which takes a context-sensitive grammar and a constraint and adds the constraint to the grammar.

For example, we can add a `constraint to enforce that the input symbolx` (rule 13) appears at least once in the program, to avoid programs that are just a constant.

HerbGrammar.addconstraint!(g₈, Contains(13))
1-element Vector{AbstractConstraint}:
 Contains(13)

There is a dedicated tutorial for constraints in Herb.jl and how to work with them.

Probabilistic grammars

Herb.jl also supports probabilistic grammars. These grammars allow the user to assign a probability to each rule in the grammar. A probabilistic grammar can be defined in a very similar way to a standard grammar, but has some slightly different syntax:

begin
    g₉ = HerbGrammar.@pcsgrammar begin
        0.4 : Int = |(0:9)
        0.2 : Int = Int + Int
        0.1 : Int = Int * Int
        0.3 : Int = x
    end
    
    for r ∈ 1:length(g₃.rules)
        p = HerbGrammar.probability(g₈, r)
    
        println("$p : $r")
    end
end

The numbers before each rule represent the probability assigned to that rule. The total probability for each return type should add up to 1.0. If this isn't the case, Herb.jl will normalize the probabilities.

If a single line in the grammar definition represents multiple rules, such as 0.4 : Int = |(0:9), the probability will be evenly divided over all these rules.

File writing

Saving & loading context-free grammars

If you want to store a grammar on the disk, you can use the store_csg, read_csg and functions to store and read grammars respectively. The store_csg grammar can also be used to store probabilistic grammars. To read probabilistic grammars, use read_pcsg. The stored grammar files can also be opened using a text editor to be modified, as long as the contents of the file doesn't violate the syntax for defining grammars.

HerbGrammar.store_csg(g₇, "demo.txt")
HerbGrammar.read_csg("demo.txt")
1: Int = 0
2: Int = 1
3: Int = 2
4: Int = 3
5: Int = 4
6: Int = 5
7: Int = 6
8: Int = 7
9: Int = 8
10: Int = 9
11: Int = Int * Int
12: Int = x
13: Int = Int - Int

Saving & loading context-sensitive grammars

Saving and loading context-sensitive grammars is very similar to how it is done with context-free grammars. The only difference is that an additional file is created for the constraints. The file that contains the grammars can be edited and can also be read using the reader for context-free grammars. The file that contains the constraints cannot be edited.

HerbGrammar.store_csg( g₈, "demo.grammar", "demo.constraints")
g₈, g₈.constraints
(1: Int = 0
2: Int = 1
3: Int = 2
4: Int = 3
5: Int = 4
6: Int = 5
7: Int = 6
8: Int = 7
9: Int = 8
10: Int = 9
11: Int = Int + Int
12: Int = Int * Int
13: Int = x
, AbstractConstraint[Contains(13)])
g₁₀  = HerbGrammar.read_csg("demo.grammar", "demo.constraints")
1: Int = 0
2: Int = 1
3: Int = 2
4: Int = 3
5: Int = 4
6: Int = 5
7: Int = 6
8: Int = 7
9: Int = 8
10: Int = 9
11: Int = Int + Int
12: Int = Int * Int
13: Int = x
g₁₀, g₁₀.constraints
(1: Int = 0
2: Int = 1
3: Int = 2
4: Int = 3
5: Int = 4
6: Int = 5
7: Int = 6
8: Int = 7
9: Int = 8
10: Int = 9
11: Int = Int + Int
12: Int = Int * Int
13: Int = x
, AbstractConstraint[Contains(13)])