Defining Grammars in Herb.jl using HerbGrammar
The program space in Herb.jl is defined using a grammar. This notebook demonstrates how such a grammar can be created. There are multiple kinds of grammars, but they can all be defined in a very similar way.
Setup
First, we import the necessary Herb packages.
using Herb
Creating a simple grammar
This cell contains a very simple arithmetic grammar. The grammar is defined using the @csgrammar
macro. This macro converts the grammar definition in the form of a Julia expression into Herb's internal grammar representation. Macro's are executed during compilation. If you want to load a grammar during execution, have a look at the HerbGrammar.expr2csgrammar
function.
g₁ = HerbGrammar.@csgrammar begin
Int = 1
Int = 2
Int = 3
Int = Int * Int
Int = Int + Int
end
1: Int = 1 2: Int = 2 3: Int = 3 4: Int = Int * Int 5: Int = Int + Int
Defining every integer one-by-one can be quite tedious. Therefore, it is also possible to use the following syntax that makes use of a Julia iterator:
g₂ = HerbGrammar.@csgrammar begin
Int = |(0:9)
Int = Int * Int
Int = Int + Int
end
1: Int = 0 2: Int = 1 3: Int = 2 4: Int = 3 5: Int = 4 6: Int = 5 7: Int = 6 8: Int = 7 9: Int = 8 10: Int = 9 11: Int = Int * Int 12: Int = Int + Int
You can do the same with lists:
g₃ = HerbGrammar.@csgrammar begin
Int = |([0, 2, 4, 6, 8])
Int = Int * Int
Int = Int + Int
end
1: Int = 0 2: Int = 2 3: Int = 4 4: Int = 6 5: Int = 8 6: Int = Int * Int 7: Int = Int + Int
Variables can also be added to the grammar by just using the variable name:
g₄ = HerbGrammar.@csgrammar begin
Int = |(0:9)
Int = Int * Int
Int = Int + Int
Int = x
end
1: Int = 0 2: Int = 1 3: Int = 2 4: Int = 3 5: Int = 4 6: Int = 5 7: Int = 6 8: Int = 7 9: Int = 8 10: Int = 9 11: Int = Int * Int 12: Int = Int + Int 13: Int = x
Grammars can also work with functions. After all, +
and *
are just infix operators for Julia's identically-named functions. You can use functions that are provided by Julia, or functions that you wrote yourself:
begin
f(a) = a + 1
g₅ = HerbGrammar.@csgrammar begin
Int = |(0:9)
Int = Int * Int
Int = Int + Int
Int = f(Int)
Int = x
end
end
1: Int = 0 2: Int = 1 3: Int = 2 4: Int = 3 5: Int = 4 6: Int = 5 7: Int = 6 8: Int = 7 9: Int = 8 10: Int = 9 11: Int = Int * Int 12: Int = Int + Int 13: Int = f(Int) 14: Int = x
Similarly, we can also define the operator times (x) manually.
begin
×(a, b) = a * b
g₆ = HerbGrammar.@csgrammar begin
Int = |(0:9)
Int = a
Int = Int + Int
Int = Int × Int
end
end
1: Int = 0 2: Int = 1 3: Int = 2 4: Int = 3 5: Int = 4 6: Int = 5 7: Int = 6 8: Int = 7 9: Int = 8 10: Int = 9 11: Int = a 12: Int = Int + Int 13: Int = Int × Int
Working with grammars
If you want to implement something using these grammars, it is useful to know about the functions that you can use to manipulate grammars and extract information. This section is not complete, but it aims to give an overview of the most important functions.
It is recommended to also read up on Julia metaprogramming if you are not already familiar with the concept.
One of the most important things about grammars is that each rule has an index associated with it:
begin
g₇ = HerbGrammar.@csgrammar begin
Int = |(0:9)
Int = Int + Int
Int = Int * Int
Int = x
end
collect(enumerate(g₇.rules))
end
13-element Vector{Tuple{Int64, Any}}: (1, 0) (2, 1) (3, 2) (4, 3) (5, 4) (6, 5) (7, 6) (8, 7) (9, 8) (10, 9) (11, :(Int + Int)) (12, :(Int * Int)) (13, :x)
We can use this index to extract information from the grammar.
isterminal
isterminal
returns true
if a rule is terminal, i.e. it cannot be expanded. For example, rule 1 is terminal, but rule 11 is not, since it contains the non-terminal symbol :Int
.
HerbGrammar.isterminal(g₇, 1)
true
HerbGrammar.isterminal(g₇, 11)
false
return_type
This function is rather obvious; it returns the non-terminal symbol that corresponds to a certain rule. The return type for all rules in our grammar is :Int
.
HerbGrammar.return_type(g₇, 11)
:Int
child_types
child_types
returns the types of the nonterminal children of a rule in a vector. If you just want to know how many children a rule has, and not necessarily which types they have, you can use nchildren
HerbGrammar.child_types(g₇, 11)
2-element Vector{Symbol}: :Int :Int
HerbGrammar.nchildren(g₇, 11)
2
nonterminals
The nonterminals
function can be used to obtain a list of all nonterminals in the grammar.
HerbGrammar.nonterminals(g₇)
1-element Vector{Symbol}: :Int
Adding rules
It is also possible to add rules to a grammar during execution. This can be done using the add_rule!
function. The exclamation mark is a Julia convention and is appended to name if a function modifies its arguments (in our example the grammar).
A rule can be provided in the same syntax as is used in the grammar definition. The rule should be of the Expr
type, which is a built-in type for representing expressions. An easy way of creating Expr
values in Julia is to encapsulate it in brackets and use a colon as prefix:
HerbGrammar.add_rule!(g₇, :(Int = Int - Int))
1: Int = 0 2: Int = 1 3: Int = 2 4: Int = 3 5: Int = 4 6: Int = 5 7: Int = 6 8: Int = 7 9: Int = 8 10: Int = 9 11: Int = Int + Int 12: Int = Int * Int 13: Int = x 14: Int = Int - Int
Removing rules
It is also possible to remove rules in Herb.jl, however, this is a bit more involved. As said before, rules have an index associated with them. The internal representation of programs that are defined by the grammar makes use of those indices for efficiency. Blindly removing a rule would shift the indices of other rules, and this could mean that existing programs get a different meaning or become invalid.
Therefore, there are two functions for removing rules:
remove_rule!
removes a rule from the grammar, but fills its place with a placeholder. Therefore, the indices stay the same, and only programs that use the removed rule become invalid.cleanup_removed_rules!
removes all placeholders and shifts the indices of the other rules.
HerbGrammar.remove_rule!(g₇, 11)
1: Int = 0 2: Int = 1 3: Int = 2 4: Int = 3 5: Int = 4 6: Int = 5 7: Int = 6 8: Int = 7 9: Int = 8 10: Int = 9 11: nothing = nothing 12: Int = Int * Int 13: Int = x 14: Int = Int - Int
HerbGrammar.cleanup_removed_rules!(g₇)
1: Int = 0 2: Int = 1 3: Int = 2 4: Int = 3 5: Int = 4 6: Int = 5 7: Int = 6 8: Int = 7 9: Int = 8 10: Int = 9 11: Int = Int * Int 12: Int = x 13: Int = Int - Int
Context-sensitive grammars
Context-sensitive grammars introduce additional constraints compared to context-free grammars (like the simple grammar examples above). As before, we use the @csgrammar
macro:
g₈ = HerbGrammar.@csgrammar begin
Int = |(0:9)
Int = Int + Int
Int = Int * Int
Int = x
end
1: Int = 0 2: Int = 1 3: Int = 2 4: Int = 3 5: Int = 4 6: Int = 5 7: Int = 6 8: Int = 7 9: Int = 8 10: Int = 9 11: Int = Int + Int 12: Int = Int * Int 13: Int = x
Constraints can be added using the addconstraint!
function, which takes a context-sensitive grammar and a constraint and adds the constraint to the grammar.
For example, we can add a `constraint to enforce that the input symbol
x` (rule 13) appears at least once in the program, to avoid programs that are just a constant.
HerbGrammar.addconstraint!(g₈, Contains(13))
1-element Vector{AbstractConstraint}: Contains(13)
There is a dedicated tutorial for constraints in Herb.jl and how to work with them.
Probabilistic grammars
Herb.jl also supports probabilistic grammars. These grammars allow the user to assign a probability to each rule in the grammar. A probabilistic grammar can be defined in a very similar way to a standard grammar, but has some slightly different syntax:
begin
g₉ = HerbGrammar.@pcsgrammar begin
0.4 : Int = |(0:9)
0.2 : Int = Int + Int
0.1 : Int = Int * Int
0.3 : Int = x
end
for r ∈ 1:length(g₃.rules)
p = HerbGrammar.probability(g₈, r)
println("$p : $r")
end
end
The numbers before each rule represent the probability assigned to that rule. The total probability for each return type should add up to 1.0. If this isn't the case, Herb.jl will normalize the probabilities.
If a single line in the grammar definition represents multiple rules, such as 0.4 : Int = |(0:9)
, the probability will be evenly divided over all these rules.
File writing
Saving & loading context-free grammars
If you want to store a grammar on the disk, you can use the store_csg
, read_csg
and functions to store and read grammars respectively. The store_csg
grammar can also be used to store probabilistic grammars. To read probabilistic grammars, use read_pcsg
. The stored grammar files can also be opened using a text editor to be modified, as long as the contents of the file doesn't violate the syntax for defining grammars.
HerbGrammar.store_csg(g₇, "demo.txt")
HerbGrammar.read_csg("demo.txt")
1: Int = 0 2: Int = 1 3: Int = 2 4: Int = 3 5: Int = 4 6: Int = 5 7: Int = 6 8: Int = 7 9: Int = 8 10: Int = 9 11: Int = Int * Int 12: Int = x 13: Int = Int - Int
Saving & loading context-sensitive grammars
Saving and loading context-sensitive grammars is very similar to how it is done with context-free grammars. The only difference is that an additional file is created for the constraints. The file that contains the grammars can be edited and can also be read using the reader for context-free grammars. The file that contains the constraints cannot be edited.
HerbGrammar.store_csg( g₈, "demo.grammar", "demo.constraints")
g₈, g₈.constraints
(1: Int = 0 2: Int = 1 3: Int = 2 4: Int = 3 5: Int = 4 6: Int = 5 7: Int = 6 8: Int = 7 9: Int = 8 10: Int = 9 11: Int = Int + Int 12: Int = Int * Int 13: Int = x , AbstractConstraint[Contains(13)])
g₁₀ = HerbGrammar.read_csg("demo.grammar", "demo.constraints")
1: Int = 0 2: Int = 1 3: Int = 2 4: Int = 3 5: Int = 4 6: Int = 5 7: Int = 6 8: Int = 7 9: Int = 8 10: Int = 9 11: Int = Int + Int 12: Int = Int * Int 13: Int = x
g₁₀, g₁₀.constraints
(1: Int = 0 2: Int = 1 3: Int = 2 4: Int = 3 5: Int = 4 6: Int = 5 7: Int = 6 8: Int = 7 9: Int = 8 10: Int = 9 11: Int = Int + Int 12: Int = Int * Int 13: Int = x , AbstractConstraint[Contains(13)])