An Approach to Internal Domain-Specific Languages in Java
Posted by Alex Ruiz and Jeff Bay on Feb 19, 2008 04:29 PM
Introduction
A domain-specific language (DSL) is commonly described as a
computer language targeted at a particular kind of problem; it is
not intended to solve problems outside of its domain. DSLs have been
formally studied for many years. Until recently, however, internal DSLs
have been written into programs only as a happy accident by programmers
simply trying to solve their problems in the most readable and concise
way possible. Lately, with the advent of Ruby and other dynamic
languages, there has been a growing interest in DSLs amongst
programmers. These loosely structured languages offer an approach to
DSLs that allows a minimum of grammar and therefore the most direct
representation of a particular language. However, discarding the
compiler and the ability to use the most powerful modern IDEs such as
Eclipse is a definite disadvantage of this approach. The authors have
found a successful compromise between the two approaches, and will argue
that it is quite possible and helpful to approach API design from the DSL
orientation in a structured language such as Java. This article
describes how it is possible to write domain-specific languages using
the Java language and suggests some patterns for constructing them.
Is Java suited for creation of internal Domain-Specific Languages?
Before we examine the Java language as a tool for creation of DSLs
we need to introduce the concept of "internal DSLs." An internal DSL is
created with the main language of an application without requiring the
creation (and maintenance) of custom compilers and interpreters. Martin
Fowler has written extensively on the various types of DSL, internal
and external, and has provided some nice examples of each. He addresses
creating a DSL in a language like Java only in passing, however.
It is important to note as well that it is difficult to
differentiate between a DSL and an API. In the case of internal DSLs,
they are essentially the same. When thinking in terms of DSL, we
exploit the host language to create a readable API with a limited
scope. "Internal DSL" is more or less a fancy name for an API that has
been created thinking in terms of readability and focusing on a
particular problem of a specific domain.
Any internal DSL is limited to the syntax and structure of its base
language. In the case of Java, the obligatory use of curly braces,
parentheses and semicolons, and the lack of closures and
meta-programming may lead to a DSL that is more verbose than one
created with a dynamic language.
On the bright side, by using the Java language we can take
advantage of powerful and mature IDEs like Eclipse and IntelliJ IDEA,
which can make creation, usage and maintenance of DSLs easier thanks to
features like "auto-complete," automatic refactoring and debugging. In
addition, new language features in Java 5 (generics, varargs and static
imports) can help us create a more compact API than previous versions
of the language.
In general, a DSL written in Java will not lead to a language that
a business user can create from scratch. It will lead to a language
that is quite readable by a business user, as well as
being very intuitive to read and write from the perspective of the
programmer. It has the advantage over an external DSL or a DSL written
in a dynamic language that the compiler can enforce correctness along
the way and flag inappropriate usage, where Ruby or Perl would happily
accept nonsensical input and fail at run-time. This reduces the
verbosity of testing substantially and can dramatically improve
application quality. Using the compiler to improve quality in this way
is an art, however, and currently many programmers bemoan the
"hard work" of satisfying the compiler instead of using it to build a
language that uses syntax to enforce semantics.
There are advantages and disadvantages to using Java for creation
of DSLs. In the end, your business needs and the environment you work
in will determine whether it is the right choice for you.
Java as a platform for internal DSLs
Dynamically constructing SQL is a great example where building a
"DSL" appropriate to the domain of SQL is a compelling advantage.
Traditional Java code that uses SQL would look something like the following:
    String sql = "select id, name " +
                 "from customers c, order o " +
                 "where " +
                 "c.since >= sysdate - 30 and " +
                 "sum(o.total) > " + significantTotal + " and " +
                 "c.id = o.customer_id and " +
                 "nvl(c.status, 'DROPPED') != 'DROPPED'";
An alternative representation taken from a recent system worked on by the authors:
    Table c = CUSTOMER.alias();
    Table o = ORDER.alias();
    Clause recent = c.SINCE.laterThan(daysEarlier(30));
    Clause hasSignificantOrders = o.TOTAL.sum().isAbove(significantTotal);
    Clause ordersMatch = c.ID.matches(o.CUSTOMER_ID);
    Clause activeCustomer = c.STATUS.isNotNullOr("DROPPED");
    String sql = CUSTOMERS.where(recent.and(hasSignificantOrders)
                                       .and(ordersMatch)
                                       .and(activeCustomer))
                          .select(c.ID, c.NAME)
                          .sql();
The DSL version has several advantages. It was able to accommodate a switch to using PreparedStatements transparently, whereas the String
version requires extensive modification to switch to using bind
variables. The DSL version will not compile if the quoting is incorrect or
an integer parameter is passed to a date column for comparison. The
phrase "nvl(foo, 'X') != 'X'" is a specific form found in
Oracle SQL. It is virtually unreadable to a non-Oracle SQL programmer
or anyone unfamiliar with SQL. That idiom in SQL Server, for example,
would be "(foo is not null and foo != 'X')". By replacing this phrase with the more easily understandable and language-like "isNotNullOr(rejectedValue)",
readability has been enhanced, and the system is protected from a later
need to change the implementation to take advantage of facilities
offered by another database vendor.
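The types behind such a fragment could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' system: the classes `Clause` and `Column`, their constructors, and the way each clause is rendered to text are all hypothetical, and the sketch produces plain SQL text rather than bind variables. It does show the two ideas from the paragraph above: the vendor-specific idiom lives in one method, and a method such as `isAbove(int)` rejects a wrongly typed argument at compile time.

```java
// Hypothetical sketch: class and method names are illustrative assumptions,
// not the API of the system described in the article.
final class Clause {
    private final String sql;

    Clause(String sql) { this.sql = sql; }

    // Returns a new Clause so that calls can be chained.
    Clause and(Clause other) { return new Clause(sql + " and " + other.sql); }

    String sql() { return sql; }
}

final class Column {
    private final String name;

    Column(String name) { this.name = name; }

    Clause matches(Column other) { return new Clause(name + " = " + other.name); }

    // Accepts only an int: passing a String here is a compile-time error.
    Clause isAbove(int value) { return new Clause(name + " > " + value); }

    // Accepts rows whose value is neither null nor the rejected value.
    // The rendering is vendor-neutral; an Oracle variant could emit nvl(...) instead.
    Clause isNotNullOr(String rejected) {
        String literal = "'" + rejected.replace("'", "''") + "'";
        return new Clause("(" + name + " is not null and " + name + " <> " + literal + ")");
    }
}

final class ClauseDemo {
    static String example() {
        Column customerId = new Column("c.id");
        Column orderCustomerId = new Column("o.customer_id");
        Column status = new Column("c.status");
        return customerId.matches(orderCustomerId)
                         .and(status.isNotNullOr("DROPPED"))
                         .sql();
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

Because callers only ever see `isNotNullOr`, switching database vendors in this sketch would mean changing one method body rather than every query string.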
Creating internal DSLs in Java
The best way to create a DSL is by first prototyping the desired
API and then working on implementing it given the constraints of the base
language. Implementation of a DSL will involve testing continuously to
ensure that we are advancing in the right direction. This "prototype
and test" approach is what Test-Driven Development (TDD) advocates.
When using Java to create a DSL, we might want to create the DSL
through a fluent interface. A fluent interface provides a compact and
yet easy-to-read representation of the domain problem we want to model.
Fluent interfaces are implemented using method chaining. It is
important to note that method chaining by itself is not enough to
create a DSL. A good example is Java's StringBuilder, whose append method always returns the same StringBuilder instance. Here is an example:
    StringBuilder b = new StringBuilder();
    b.append("Hello. My name is ")
     .append(name)
     .append(" and my age is ")
     .append(age);
This example does not solve any domain-specific problem.
In addition to method chaining, static factory methods and imports
are a great aid in creating a compact, yet readable DSL. We will cover
these techniques in more detail in the following sections.
1. Method Chaining
There are two approaches to create a DSL using method chaining, and
both are related to the return value of the methods in the chain. Our
options are to return this or to return an intermediate object, depending on what we are trying to do.
1.1 Returning this
We usually return this when calls to the methods in the chain can be:
- optional
- called in any order
- called any number of times
We have found two use cases for this approach:
- chaining of related object behavior
- simple construction/configuration of an object
1.1.1 Chaining related object behavior
Many times, we only want to chain methods of an object to reduce
unnecessary text in our code, by simulating dispatch of "multiple
messages" (or multiple method calls) to the same object. The following
code listing shows an API used to test Swing GUIs. The test verifies
that an error message is displayed if a user tries to log into a system
without entering her password.
    DialogFixture dialog = new DialogFixture(new LoginDialog());
    dialog.show();
    dialog.maximize();
    TextComponentFixture usernameTextBox = dialog.textBox("username");
    usernameTextBox.clear();
    usernameTextBox.enterText("leia.organa");
    dialog.comboBox("role").select("REBEL");
    OptionPaneFixture errorDialog = dialog.optionPane();
    errorDialog.requireError();
    errorDialog.requireMessage("Enter your password");
Although the code is easy to read, it is verbose and requires too much typing.
The following are two methods from TextComponentFixture that were used in our example:
    public void clear() {
        target.setText("");
    }

    public void enterText(String text) {
        robot.enterText(target, text);
    }
We can simplify our testing API by simply returning this, and therefore enable method chaining:
    public TextComponentFixture clear() {
        target.setText("");
        return this;
    }

    public TextComponentFixture enterText(String text) {
        robot.enterText(target, text);
        return this;
    }
After enabling method chaining in all the test fixtures, our testing code is now reduced to:
    DialogFixture dialog = new DialogFixture(new LoginDialog());
    dialog.show().maximize();
    dialog.textBox("username").clear().enterText("leia.organa");
    dialog.comboBox("role").select("REBEL");
    dialog.optionPane().requireError().requireMessage("Enter your password");
The result is more compact and readable code. As previously
mentioned, method chaining by itself does not imply having a DSL. We
need to chain methods that correspond to related behavior of an object
that together solve a domain-specific problem. In our example, the
domain-specific problem was Swing GUI testing.
1.1.2 Simple construction/configuration of an object
This case is similar to the previous one, with the difference that
instead of just chaining related methods of an object, we create a
"builder" to create and/or configure objects using a fluent interface.
The following example illustrates a "dream car" created using setters:
    DreamCar car = new DreamCar();
    car.setColor(RED);
    car.setFuelEfficient(true);
    car.setBrand("Tesla");
The code for the DreamCar class is pretty simple:
    // package declaration and imports
    public class DreamCar {
        private Color color;
        private String brand;
        private boolean leatherSeats;
        private boolean fuelEfficient;
        private int passengerCount = 2;

        // getters and setters for each field
    }
Although creating a DreamCar is easy and the code is quite readable, we can create more compact code using a car builder:
    // package declaration and imports
    public class DreamCarBuilder {
        public static DreamCarBuilder car() {
            return new DreamCarBuilder();
        }

        private final DreamCar car;

        private DreamCarBuilder() {
            car = new DreamCar();
        }

        public DreamCar build() {
            return car;
        }

        public DreamCarBuilder brand(String brand) {
            car.setBrand(brand);
            return this;
        }

        public DreamCarBuilder fuelEfficient() {
            car.setFuelEfficient(true);
            return this;
        }

        // similar methods to set field values
    }
Using the builder we can rewrite the DreamCar creation as follows:
    DreamCar car = car().brand("Tesla")
                        .color(RED)
                        .fuelEfficient()
                        .build();
Using a fluent interface, once again, reduced noise in code, which
resulted in more readable code. It is important to note that, when
returning this, any method in the chain can be called at any time and any number of times. In our example, we can call the method color
as many times as we wish, and each call will override the value set by
the previous call, which, in the context of the application, may be
valid.
Another important observation is that there is no compiler checking
to enforce required field values. A possible solution would be to throw
exceptions at run-time if any object creation and/or configuration rule
is violated (e.g. a required field missing). It is also possible to have
the compiler enforce such rules by returning intermediate objects from
the methods in the chain.
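The run-time option could look something like the sketch below. It is a trimmed-down variant of the builder shown earlier, and the rule that a brand is required is an assumption made purely for illustration; the article does not say which DreamCar fields are mandatory.

```java
// Hypothetical sketch: the "brand is required" rule is an assumption for illustration.
final class DreamCar {
    private String brand;
    private boolean fuelEfficient;

    String getBrand() { return brand; }
    void setBrand(String brand) { this.brand = brand; }
    boolean isFuelEfficient() { return fuelEfficient; }
    void setFuelEfficient(boolean fuelEfficient) { this.fuelEfficient = fuelEfficient; }
}

final class DreamCarBuilder {
    static DreamCarBuilder car() { return new DreamCarBuilder(); }

    private final DreamCar car = new DreamCar();

    private DreamCarBuilder() {}

    DreamCarBuilder brand(String brand) {
        car.setBrand(brand);
        return this;
    }

    DreamCarBuilder fuelEfficient() {
        car.setFuelEfficient(true);
        return this;
    }

    // The compiler cannot tell whether brand(...) was ever called,
    // so the rule can only be checked when the chain is finished.
    DreamCar build() {
        if (car.getBrand() == null) {
            throw new IllegalStateException("brand is required");
        }
        return car;
    }
}
```

With this sketch, `car().fuelEfficient().build()` compiles but throws an IllegalStateException when it runs, which is exactly the weakness that intermediate objects are meant to address.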
1.2 Returning an intermediate object
Returning an intermediate object from methods in a fluent interface has some advantages over returning this:
- we can use the compiler to enforce business rules (e.g. required fields)
- we can guide users of the fluent interface through a specific path by limiting the available options for the next element in the chain
- it gives API creators greater control over which methods a user can (or must) call, as well as the order in which, and how many times, a user of the API can call a method
The following example illustrates a vacation created using constructor arguments:
Vacation vacation = new Vacation("10/09/2007", "10/17/2007", "Paris", "Hilton", "United", "UA-6886");
The benefit of this approach is that it forces our users to specify
all required parameters. Unfortunately, there are too many parameters
and they do not communicate their purpose. Do "Paris" and "Hilton"
refer to the city and hotel of destination? Or do they refer to the
name of our companion? :)
A second approach is to use setters as a way to document each parameter:
    Vacation vacation = new Vacation();
    vacation.setStart("10/09/2007");
    vacation.setEnd("10/17/2007");
    vacation.setCity("Paris");
    vacation.setHotel("Hilton");
    vacation.setAirline("United");
    vacation.setFlight("UA-6886");
Our code is more readable now, but it is also verbose. A third
approach could be to create a fluent interface to build a vacation,
like in the example in the previous section:
    Vacation vacation = vacation().starting("10/09/2007")
                                  .ending("10/17/2007")
                                  .city("Paris")
                                  .hotel("Hilton")
                                  .airline("United")
                                  .flight("UA-6886");
This version is more compact and readable, but we have lost the
compiler checks for missing fields that we had in the first version
(the one using a constructor). In other words, we are not exploiting
the compiler to check for possible mistakes. At this point, the best we
can do with this approach is to throw exceptions at run-time if any of
the required fields was not set.
The following is a fourth, more sophisticated version of the fluent
interface. This time, methods return intermediate objects instead of this:
    Period vacation = from("10/09/2007").to("10/17/2007");
    Booking booking = vacation.book(city("Paris").hotel("Hilton"));
    booking.add(airline("united").flight("UA-6886"));
Here we have introduced the concepts of a Period, a Booking, a Location, a BookableItem (Hotel and Flight) and an Airline. The airline, in this context, is acting as a factory for Flight objects; the Location is acting as a factory for Hotel
items, etc. Each of these objects is implied by the booking syntax we
desired, but will almost certainly grow to have many other important
behaviors in the system as well. The use of intermediate objects allows
us to introduce compiler-checked constraints of what the user can and
cannot do. For example, if a user of the API tries to book a vacation
with a starting date and without an ending date, the code will simply
not compile. As we mentioned previously, we can build a language that
uses syntax to enforce semantics.
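To make the compile-time constraint concrete, here is a minimal sketch of how the Period part of that chain might be built. The intermediate type `PeriodStart` and the use of plain strings for dates are assumptions made for this illustration; the article does not show the actual classes.

```java
// Hypothetical sketch: PeriodStart and the String-based dates are assumptions for illustration.
final class Period {
    final String start;
    final String end;

    private Period(String start, String end) {
        this.start = start;
        this.end = end;
    }

    // Entry point of the chain. It returns a PeriodStart, not a Period,
    // so a Period cannot be obtained without also calling to(...).
    static PeriodStart from(String start) {
        return new PeriodStart(start);
    }

    // Intermediate object: the only operation it offers is the next step in the chain.
    static final class PeriodStart {
        private final String start;

        private PeriodStart(String start) {
            this.start = start;
        }

        Period to(String end) {
            return new Period(start, end);
        }
    }
}
```

With these types, `Period vacation = Period.from("10/09/2007").to("10/17/2007");` compiles, while `Period vacation = Period.from("10/09/2007");` is rejected by the compiler because a PeriodStart is not a Period. The required ending date is enforced by the type system, with no run-time check.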
We have also introduced the usage of static factory methods in the
previous example. Static factory methods, when used with static
imports, can help us create more compact fluent interfaces. For
example, without static imports, the previous example will need to be
coded like this:
    Period vacation = Period.from("10/09/2007").to("10/17/2007");
    Booking booking = vacation.book(Location.city("Paris").hotel("Hilton"));
    booking.add(Flight.airline("united").flight("UA-6886"));
The example above is not as readable as the one using static
imports. We will cover static factory methods and imports in more
detail in the following section.
Here is a second example of a DSL in Java. This time, we are simplifying usage of Java reflection:
    Person person = constructor().withParameterTypes(String.class)
                                 .in(Person.class)
                                 .newInstance("Yoda");

    method("setName").withParameterTypes(String.class)
                     .in(person)
                     .invoke("Luke");

    field("name").ofType(String.class)
                 .in(person)
                 .set("Anakin");
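One of these chains, the one starting with method(...), could be backed by java.lang.reflect along the following lines. This is a sketch under stated assumptions rather than the authors' implementation: the class names `MethodName`, `ParameterTypes` and `Invoker` are hypothetical, only public methods are looked up, and reflection failures are simply rethrown as unchecked exceptions.

```java
import java.lang.reflect.Method;

// Hypothetical sketch: the chain's method names mirror the article's usage,
// but the classes and their implementation are assumptions.
final class MethodName {
    private final String name;

    private MethodName(String name) { this.name = name; }

    // Static factory method: the entry point of the chain.
    static MethodName method(String name) { return new MethodName(name); }

    ParameterTypes withParameterTypes(Class<?>... types) {
        return new ParameterTypes(name, types);
    }

    // Intermediate object: knows the name and parameter types, still needs a target.
    static final class ParameterTypes {
        private final String name;
        private final Class<?>[] types;

        private ParameterTypes(String name, Class<?>[] types) {
            this.name = name;
            this.types = types;
        }

        Invoker in(Object target) {
            try {
                Method m = target.getClass().getMethod(name, types);
                return new Invoker(m, target);
            } catch (NoSuchMethodException e) {
                throw new IllegalArgumentException("No such method: " + name, e);
            }
        }
    }

    // Last link in the chain: everything needed for the call is now known.
    static final class Invoker {
        private final Method method;
        private final Object target;

        private Invoker(Method method, Object target) {
            this.method = method;
            this.target = target;
        }

        Object invoke(Object... args) {
            try {
                return method.invoke(target, args);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Unable to invoke " + method.getName(), e);
            }
        }
    }
}

// Simple target class used to exercise the chain.
final class Person {
    private String name;

    public void setName(String name) { this.name = name; }

    public String getName() { return name; }
}
```

Each step returns a different type, so a caller cannot reach invoke(...) without first supplying the parameter types and the target object.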
We need to be cautious when using method chaining. It is quite easy
to overuse, resulting in "train wrecks" of many calls chained together
in a single line. This can lead to many problems, including a
significant reduction in readability and vagueness in a stack trace
when exceptions arise.
2. Static Factory Methods and Imports
Static factory methods and imports can make an API more compact and
easier to read. We have found that static factory methods are a
convenient way to simulate named parameters in Java, a feature that
many developers wish the language had. For example, consider this code,
whose purpose is to test a GUI by simulating a user selecting a cell in
a JTable:
dialog.table("results").selectCell(6, 8); // row 6, column 8
Without the comment "// row 6, column 8", it would be
easy to misunderstand (or not understand at all) what the purpose of
this code is. We would need to spend some extra time checking
documentation or reading some more lines of code to understand what '6'
and '8' stand for. We could also declare the row and column indices as
variables or, better yet, as constants:
    int row = 6;
    int column = 8;
    dialog.table("results").selectCell(row, column);
We have improved readability of code, at the expense of adding more
code to maintain. To keep code as compact as possible, the ideal
solution would be to write something like this:
dialog.table("results").selectCell(row: 6, column: 8);
Unfortunately, we cannot do that because Java does not support
named parameters. On the bright side, we can simulate them by using a
static factory method and static imports, to get something like:
dialog.table("results").selectCell(row(6).column(8));
We can start by changing the signature of the method, by replacing
all the parameters with one object that will contain them. In our
example, we can change the signature of selectCell(int, int) to:
selectCell(TableCell);
TableCell will contain the values for the row and column indices:
    public final class TableCell {
        public final int row;
        public final int column;

        public TableCell(int row, int column) {
            this.row = row;
            this.column = column;
        }
    }
At this point, we have just moved the problem around: TableCell's constructor still takes two int values. The next step is to introduce a factory of TableCells, which will have one method per parameter in the original version of selectCell. In addition, to force users to use the factory, we need to change TableCell's constructor to private:
    public final class TableCell {
        public static class TableCellBuilder {
            private final int row;

            public TableCellBuilder(int row) {
                this.row = row;
            }

            public TableCell column(int column) {
                return new TableCell(row, column);
            }
        }

        public final int row;
        public final int column;

        private TableCell(int row, int column) {
            this.row = row;
            this.column = column;
        }
    }
By having the factory TableCellBuilder we can create a TableCell having one parameter per method call. Each method in the factory communicates the purpose of its parameter:
selectCell(new TableCellBuilder(6).column(8));
The last step is to introduce a static factory method to replace usage of the TableCellBuilder constructor, which does not communicate what 6 stands for. As we did previously, we need to make the constructor private to force our users to use the factory method:
    public final class TableCell {
        public static class TableCellBuilder {
            public static TableCellBuilder row(int row) {
                return new TableCellBuilder(row);
            }

            private final int row;

            private TableCellBuilder(int row) {
                this.row = row;
            }

            // must be public so that callers outside TableCell can write row(6).column(8)
            public TableCell column(int column) {
                return new TableCell(row, column);
            }
        }

        public final int row;
        public final int column;

        private TableCell(int row, int column) {
            this.row = row;
            this.column = column;
        }
    }
Now all we need to add to the code calling selectCell is a static import for the method row in TableCellBuilder. To refresh our memories, this is what calls to selectCell look like:
dialog.table("results").selectCell(row(6).column(8));
Our example shows that with a little extra work we can overcome
some of the limitations of our host language. As we mentioned before,
this is only one of the multiple ways we can improve code readability
using static factory methods and imports. The following code listing
shows an alternative way to solve the same problem of table indices,
using static factory methods and imports in a different way:
    /**
     * @author Mark Alexandre
     */
    public final class TableCellIndex {
        public static final class RowIndex {
            final int row;

            RowIndex(int row) {
                this.row = row;
            }
        }

        public static final class ColumnIndex {
            final int column;

            ColumnIndex(int column) {
                this.column = column;
            }
        }

        public final int row;
        public final int column;

        private TableCellIndex(RowIndex rowIndex, ColumnIndex columnIndex) {
            this.row = rowIndex.row;
            this.column = columnIndex.column;
        }

        public static TableCellIndex cellAt(RowIndex row, ColumnIndex column) {
            return new TableCellIndex(row, column);
        }

        public static TableCellIndex cellAt(ColumnIndex column, RowIndex row) {
            return new TableCellIndex(row, column);
        }

        public static RowIndex row(int index) {
            return new RowIndex(index);
        }

        public static ColumnIndex column(int index) {
            return new ColumnIndex(index);
        }
    }
The second version of the solution is more flexible than the first
one, because it allows us to specify the row and column indices in
either order:
    dialog.table("results").select(cellAt(row(6), column(8)));
    dialog.table("results").select(cellAt(column(3), row(5)));
Organizing Code
It is a lot easier to organize the code of a fluent interface whose methods return this
than that of one whose methods return intermediate objects. In the case of
the former, we end up with fewer classes that encapsulate the logic of
the fluent interface, allowing us to use the same rules or conventions
we use when organizing non-DSL code.
Organizing code of a fluent interface using intermediate objects as
return type is trickier because we have the logic of the fluent
interface scattered across several small classes. Since these classes,
together, form our fluent interface, it makes sense to keep
them together, and we might not want to mix them with classes
outside of the DSL. We have found two options:
- Create intermediate objects as inner classes
- Have intermediate objects in their own top-level classes, all in the same package
The choice of approach for decomposing our system can
depend on several factors: the syntax we want to achieve, the purpose of
the DSL, the number and size (in lines of code) of intermediate objects
(if any), and how the DSL can fit with the rest of the code base, as
well as any other DSLs.
Documenting Code
As with organizing code, documenting a fluent interface whose methods return this is a lot easier than documenting a fluent interface that returns intermediate objects, especially when documenting with Javadoc.
Javadoc displays documentation of one class at a time, which may
not be the best in a DSL using intermediate objects: the DSL is
composed of a group of classes, not individual ones. Since we cannot
change how Javadoc displays the documentation of our APIs, we have
found that having an example usage of the fluent interface (including
all the participating classes) with links to each of the methods in the
chain, in the package.html file, can minimize the limitations of
Javadoc.
We should be careful and not duplicate documentation, because it
will increase maintenance costs for the API creators. The best approach
is to rely on tests as executable documentation as much as possible.
In Conclusion
Java can be suited to create internal domain-specific languages
that developers can find very intuitive to read and write, and still be
quite readable by business users. DSLs created in Java may be more
verbose than the ones created with dynamic languages. On the bright
side, by using Java we can exploit the compiler to enforce semantics of
a DSL. In addition we can count on mature and powerful Java IDEs that
can make creation, usage and maintenance of DSLs a lot easier.
Creating DSLs in Java also requires more work from API designers.
There is more code and more documentation to create and maintain. The
results can be rewarding though. Users of our APIs will see
improvements in their code bases. Their code will be more compact and
easier to maintain, which can simplify their lives.
There are many different ways to create DSLs in Java, depending on
what we are trying to accomplish. Although there is no "one size fits
all" approach, we have found that combining method chaining and static
factory methods and imports can lead to a clean, compact API that is
both easy to write and read.
In summary, there are advantages and disadvantages when using Java
for creation of DSLs. It is up to us, the developers, to decide whether
it is the right choice based on the needs of our projects.
As a side note, Java 7
may include new language features (such as closures) that may help us
create less verbose DSLs. For a comprehensive list of the proposed
features, please visit Alex Miller's blog.
About the Authors
Alex Ruiz is a Software Engineer in the development tools
organization at Oracle. Alex enjoys reading anything related to Java,
testing, OOP, and AOP and has programming as his first love. Before
joining Oracle, Alex was a consultant for ThoughtWorks. Alex maintains
a blog at http://www./page/alexRuiz.
Jeff Bay is a Senior Software Engineer at a hedge fund in New York.
He has repeatedly built high quality, high velocity XP teams working on
diverse systems such as program enrollment for Onstar, leasing
software, web servers, construction project management, and others. He
approaches software with a passion for removing duplication and
preventing bugs in order to improve developer efficiency and time on
task.