JVM Advent

Five Golden Backticks

 

Five Kinds of Strings

Ever since Java 1.0, we’ve had string literals "like this". What other strings might we want? Other programming languages give us:

Here I use a syntax that is reminiscent of Scala for demonstration. Other languages made different choices. For example, JavaScript uses backticks for interpolation.

Which of these features would I love most to have in Java? For me, it would be compile-time syntax checking. Right now, IDEs can make an educated guess that a particular string is likely to be, say, a regex, and give a warning if it is malformed. But it would be so much nicer if it were a compile-time error.

Of course, that’s a hard problem. There is no mechanism for adding pluggable checks at compile-time other than annotation processing. It is possible to provide annotations that check string content, and indeed the Checker Framework does just that. But you annotate variables, not string literals, so it isn’t the same thing.

It would also be nice if there were a standard way of doing interpolation and formatting. Right now, we have String.format and MessageFormat.format, which are both useful but incompatible.
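To make the incompatibility concrete, here is a minimal sketch that formats the same message both ways. The class name, method names, and sample values are made up for illustration; Locale.ROOT is used only to keep the output independent of the machine's locale.

```java
import java.text.MessageFormat;
import java.util.Locale;

class FormatStyles {
    // printf style: placeholders are % conversions, matched to arguments by order.
    static String withStringFormat(String name, int age) {
        return String.format(Locale.ROOT, "%s is %d years old", name, age);
    }

    // MessageFormat style: placeholders are argument indexes in braces.
    static String withMessageFormat(String name, int age) {
        return new MessageFormat("{0} is {1} years old", Locale.ROOT)
                .format(new Object[] { name, age });
    }

    public static void main(String[] args) {
        System.out.println(withStringFormat("Duke", 25));
        System.out.println(withMessageFormat("Duke", 25));
        // Neither API understands the other's placeholders; they pass through untouched.
        System.out.println(String.format(Locale.ROOT, "{0} is {1} years old", "Duke", 25));
        System.out.println(MessageFormat.format("%s is %d years old", "Duke", 25));
    }
}
```

Both helpers produce the same text, but a pattern written for one API is silently treated as plain text by the other.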

Instead, JEP 326 promises us raw/multiline strings. That’s nice too.

Raw Strings

Consider for example searching for a period with a regex. The regex is \. since you must escape a period in a regex. So in Java, it’s Pattern.compile("\\."). To match a backslash, it’s Pattern.compile("\\\\"). This can get really confusing.
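The two levels of escaping are easier to see when the regex that the engine actually receives is printed. This is a small sketch in today's Java; the class and method names are illustrative only.

```java
import java.util.regex.Pattern;

class RegexEscapes {
    // The Java literal "\\." is the two-character regex \. (a literal period).
    static final Pattern PERIOD = Pattern.compile("\\.");
    // The Java literal "\\\\" is the two-character regex \\ (a literal backslash).
    static final Pattern BACKSLASH = Pattern.compile("\\\\");

    static boolean containsPeriod(String s) {
        return PERIOD.matcher(s).find();
    }

    static boolean containsBackslash(String s) {
        return BACKSLASH.matcher(s).find();
    }

    public static void main(String[] args) {
        // pattern() returns the regex as the engine sees it, after Java's own unescaping.
        System.out.println(PERIOD.pattern() + " has length " + PERIOD.pattern().length());
        System.out.println(BACKSLASH.pattern() + " has length " + BACKSLASH.pattern().length());
        System.out.println(containsPeriod("java.util"));
        System.out.println(containsBackslash("C:\\temp"));
    }
}
```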

In fact, it’s so confusing that the author of the JEP gets it wrong—or maybe has a subtle sense of humor. The author’s example is Pattern.compile("\\\"") to match a ". Of course, you don’t need to escape that in a regex, so Pattern.compile("\"") would work fine. Which confirms the point that all that escaping is a mess.
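A quick check, as a sketch with made-up names, that the escaped and the unescaped pattern behave the same. It relies on the java.util.regex rule that a backslash in front of a non-alphabetic character simply quotes that character.

```java
import java.util.regex.Pattern;

class QuotePatterns {
    // The Java literal "\\\"" is the two-character regex \" (an escaped quotation mark).
    static final Pattern ESCAPED_QUOTE = Pattern.compile("\\\"");
    // The Java literal "\"" is the one-character regex " (a plain quotation mark).
    static final Pattern PLAIN_QUOTE = Pattern.compile("\"");

    // True if both patterns agree on whether the input contains a quotation mark.
    static boolean sameVerdict(String input) {
        return ESCAPED_QUOTE.matcher(input).find() == PLAIN_QUOTE.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(sameVerdict("say \"hi\""));
        System.out.println(sameVerdict("no quotes here"));
    }
}
```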

The proposed remedy is simple. Enclose the string in backticks `...`. Nothing inside the backticks needs to be escaped: Pattern.compile(`\.`)

But what if the string contains backticks?

In Scala and Kotlin, you use """ delimiters, but that only raises the same question: what if the string contains """?

This is where the Java designers came up with a clever idea that I had not seen before. You can use any number of backticks to start a raw string, then use the same number of backticks to end it. For example, if you know that your string doesn’t have five consecutive backticks inside, do this:

String s = `````. . .
. . .
. . .
. . .`````; // Five golden backticks :-)

Everything in the string is taken exactly as it is. If it is some HTML or SQL or whatever that you developed elsewhere, just paste it in.

Actually, the “exactly as it is” has one exception. All line endings are normalized to \n, even if the source file uses Windows-style \r\n line endings.
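The normalization can be pictured as a plain string transformation. This is only a sketch: the helper normalizeLineEndings is hypothetical, and treating a lone \r as a line ending is an assumption that goes beyond what is stated above.

```java
class LineEndings {
    // Hypothetical helper: turns Windows (CR LF) endings, and also lone CR
    // characters (an assumption), into a single LF.
    static String normalizeLineEndings(String s) {
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String windowsStyle = "SELECT *\r\nFROM Books\r\n";
        String normalized = normalizeLineEndings(windowsStyle);
        System.out.println(normalized.equals("SELECT *\nFROM Books\n")); // prints true
    }
}
```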

A Couple of Flies in the Ointment

Stephen Colebourne noted that two backticks could be confused with the empty string. If you have something like

s = ``;
t = ``;

then that doesn’t set s and t to the empty string, but s is set to the string ";\nt = ".

There is a good puzzler in there.
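Since no released compiler accepts backtick literals, the puzzler can only be replayed with a toy scanner. The sketch below is not the javac lexer: the class and method are hypothetical, and it assumes that a run of backticks whose length differs from the opening run belongs to the string body.

```java
class BacktickScanner {
    // Returns the body of the first backtick-delimited raw string in source,
    // or null if no complete raw string is found.
    static String firstRawString(String source) {
        int start = source.indexOf('`');
        if (start < 0) return null;
        int bodyStart = start;
        while (bodyStart < source.length() && source.charAt(bodyStart) == '`') bodyStart++;
        int delimiterLength = bodyStart - start;
        int i = bodyStart;
        while (i < source.length()) {
            if (source.charAt(i) != '`') {
                i++;
                continue;
            }
            int runStart = i;
            while (i < source.length() && source.charAt(i) == '`') i++;
            // Only a run of exactly the opening length closes the string.
            if (i - runStart == delimiterLength) return source.substring(bodyStart, runStart);
        }
        return null; // unterminated
    }

    public static void main(String[] args) {
        String puzzler = "s = ``;\nt = ``;";
        // The body is a semicolon, a newline, and "t = ", not the empty string.
        System.out.println("[" + firstRawString(puzzler) + "]");
        // Five backticks let the body contain three in a row.
        System.out.println("[" + firstRawString("`````a ``` b`````") + "]");
    }
}
```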

Raw strings cannot start or end with backticks. For example, suppose you want to put the following piece of Markdown into a Java string:

```
alert("Hello, World!")
```

You obviously can’t add backticks at the start, so the best thing you can do is add a space or newline before the ```. And the same holds for the end. Java requires that the ending delimiters exactly match the start. (In contrast, in Scala, you can write """Hello, "World"""", and the compiler figures out that one of the terminal quotation marks belongs to the string.)

So, you can write:

String markdown = `````
```
alert("Hello, World!")
```
`````.strip();

The strip call removes the \n at the beginning and the end. Or you can just leave the newlines in place if they don’t matter.

(The strip method is new to Java 11. It is similar to trim, but it strips out leading and trailing Unicode whitespace, whereas trim removes characters ≤ 32, which isn’t the same thing. These days, you should use strip, not trim.)
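The difference shows up with any whitespace character above code point 32, and in the other direction with control characters that are not whitespace. A small sketch for Java 11 or later, with illustrative names:

```java
class StripVsTrim {
    // EM SPACE (U+2003) on both sides: Unicode whitespace, but its code is above 32.
    static final String EM_PADDED = "\u2003Hello\u2003";
    // U+0001 in front: its code is below 32, but it is not Unicode whitespace.
    static final String CONTROL_PREFIXED = "\u0001Hello";

    public static void main(String[] args) {
        System.out.println(EM_PADDED.trim().length());         // 7: trim leaves the EM SPACEs alone
        System.out.println(EM_PADDED.strip().length());        // 5: strip removes them
        System.out.println(CONTROL_PREFIXED.trim().length());  // 5: trim removes U+0001
        System.out.println(CONTROL_PREFIXED.strip().length()); // 6: strip keeps U+0001
    }
}
```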

IDE Support

IntelliJ 2018.3 can convert strings with backslashes into raw strings when you activate the experimental features of JDK 12. (See this blog post for the details.)

I tried converting an old-fashioned multiline string:

   private static final String authorPublisherQuery = "SELECT Books.Price, Books.Title\n"
      + " FROM Books, BooksAuthors, Authors, Publishers\n"
      + " WHERE Authors.Author_Id = BooksAuthors.Author_Id AND BooksAuthors.ISBN = Books.ISBN\n"
      + " AND Books.Publisher_Id = Publishers.Publisher_Id AND Authors.Name = ?\n"
      + " AND Publishers.Name = ?\n";

That didn’t work, but there is no reason why it couldn’t in the future.

Indentation Management

I prefer to line up multiline strings at the leftmost column. For example,

   public static void main(String[] args) {
      String myNameInABox = `
+-----+
| Cay |
+-----+`.strip(); 
      System.out.print(myNameInABox);
   }

It makes the multiline string stand out from the Java code. And it gives you plenty of horizontal room for the whatever-it-is that you are putting into the raw string.

However, quite a few people seem to prefer a style where the contents of the multiline string are aligned with the Java code:

   ...
   String myNameInABox = `
                         +-----+
                         | Cay |
                         +-----+
                         `.align();
   System.out.print(myNameInABox);

The align method (defined in Java 12) removes the common prefix of spaces as well as leading and trailing blank lines.

There is a risk with this approach. If a mixture of tabs and spaces is used, then each tab is counted as a single space. Something may look aligned to you in your IDE but not to the align method. Of course, your IDE could warn you about such a situation. IntelliJ 2018.3 doesn’t currently do that.
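Here is a rough sketch of that behavior, including the tab pitfall. It is a hypothetical stand-in, not the JDK code: the class name, the decision to return the lines without a trailing newline, and the handling of blank lines in the middle are all assumptions. It needs Java 11 for isBlank.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AlignSketch {
    // Counts leading whitespace characters; a tab counts as one, just like a space.
    static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) i++;
        return i;
    }

    // Hypothetical stand-in for align(): drops leading and trailing blank lines,
    // then removes the smallest indentation found on any non-blank line.
    static String align(String text) {
        List<String> lines = new ArrayList<>(Arrays.asList(text.split("\n", -1)));
        while (!lines.isEmpty() && lines.get(0).isBlank()) lines.remove(0);
        while (!lines.isEmpty() && lines.get(lines.size() - 1).isBlank()) lines.remove(lines.size() - 1);
        int indent = Integer.MAX_VALUE;
        for (String line : lines) {
            if (!line.isBlank()) indent = Math.min(indent, leadingWhitespace(line));
        }
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            result.add(line.isBlank() ? "" : line.substring(indent));
        }
        return String.join("\n", result);
    }

    public static void main(String[] args) {
        String box = "\n      +-----+\n      | Cay |\n      +-----+\n      ";
        System.out.println(align(box));
        // One tab versus four spaces: the common prefix is a single character,
        // so three spaces survive on the second line.
        String mixed = "\t+--+\n    |  |";
        System.out.println(align(mixed));
    }
}
```

With the mixed input, an editor that renders a tab as four columns shows both lines flush, yet the second line keeps three leading spaces after alignment.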

The Road Not Taken

Many of the discussions on new features take place on the “Amber Spec” mailing list that you can observe at http://mail.openjdk.java.net/pipermail/amber-spec-observers/, so you can see which alternatives have been considered.

There was a vigorous discussion on whether indentations should automatically be stripped. Predictably, this was not in the end adopted.

What about Unicode escapes inside raw strings? Should a \u0060 be a backtick? Sanity prevailed, and it was decided that “raw means raw”.

Should two backticks be outlawed because `` could be confused with an empty string? No—having a simple rule of “any number of backticks on either side” was deemed more important.

What about a newline following the opening backticks? There was some back and forth on whether it should be stripped. I still think it is a bit sad that more attention wasn’t paid to this issue. Including the newline in the opening delimiter would have solved two issues: initial backticks and alignment at the leftmost column.

I timidly asked why the closing delimiter couldn’t be “at least as many backticks as the opening delimiter” (similar to Scala), so that raw strings can end in backticks. Unfortunately, I got no response.

Just now, the JEP was withdrawn, to my amazement. I have never seen this happen before. Other JEPs with far bigger problems sail through the review process despite vigorous expressions of disgust. The reasons given for withdrawal were mostly sensible:

I thought this feature was eminently fixable. For example, a prefix à la Scala and n ≥ 3 " characters optionally followed by a newline. At any rate, backtick-delimited strings are now the Ghost of Christmas Past.

Author: Cay Horstmann
