Five Kinds of Strings
Ever since Java 1.0, we’ve had string literals "like this". What other strings might we want? Other programming languages give us:
- Expression interpolation:
s"I am ${age - 10} years old."
- Interpolation with formatting:
f"Price: $price%8.2f"
- Strings with internal syntax that is checked at compile time:
r"[0-9]+([.,][0-9]*)?" or xml"<a href='http://java.sun.com'>The Java home page</a>"
- Raw strings in which backslashes are not escapes:
raw"\.*"
- Multiline strings that can contain newlines:
"""
+-----+
| Cay |
+-----+
"""
Here I use a syntax that is reminiscent of Scala for demonstration. Other languages made different choices. For example, JavaScript uses backticks for interpolation.
Which of these features would I love most to have in Java? For me, it would be compile-time syntax checking. Right now, IDEs can make an educated guess that a particular string is likely to be, say, a regex, and give a warning if it is malformed. But it would be so much nicer if it were a compile-time error.
Of course, that’s a hard problem. There is no mechanism for adding pluggable checks at compile-time other than annotation processing. It is possible to provide annotations that check string content, and indeed the Checker Framework does just that. But you annotate variables, not string literals, so it isn’t the same thing.
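To make the gap concrete, here is a minimal sketch of what we get today: a malformed regex in a string literal compiles fine and only fails when Pattern.compile runs. The class name RegexCheckDemo and the helper isValidRegex are made up for this illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexCheckDemo {
    /** Returns true if the source compiles as a java.util.regex pattern. */
    static boolean isValidRegex(String source) {
        try {
            Pattern.compile(source);
            return true;
        } catch (PatternSyntaxException e) {
            // The compiler saw nothing wrong; the error only shows up at runtime.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("[0-9]+([.,][0-9]*)?")); // well-formed
        System.out.println(isValidRegex("[0-9+"));               // unclosed character class
    }
}
```

A checked r"..." literal would move the second case from a runtime exception to a compiler message.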
It would also be nice if there were a standard way of doing interpolation and formatting. Right now, we have String.format and MessageFormat.format, which are both useful but incompatible.
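As a rough illustration of the incompatibility, the sketch below formats the same two values with both APIs. The class name, method names, and sample values are invented for the example. I pass an explicit Locale.US (and use a MessageFormat instance rather than the static MessageFormat.format) so that the output does not depend on the machine's default locale.

```java
import java.text.MessageFormat;
import java.util.Locale;

class FormatDemo {
    /** printf-style: positional arguments, % conversions. */
    static String withStringFormat(String name, double price) {
        return String.format(Locale.US, "%s costs %8.2f", name, price);
    }

    /** MessageFormat-style: numbered placeholders, {index,type,style}. */
    static String withMessageFormat(String name, double price) {
        MessageFormat format = new MessageFormat("{0} costs {1,number,0.00}", Locale.US);
        return format.format(new Object[] { name, price });
    }

    public static void main(String[] args) {
        System.out.println(withStringFormat("Java", 9.5));
        System.out.println(withMessageFormat("Java", 9.5));
    }
}
```

Neither template language is understood by the other API: %8.2f means nothing to MessageFormat, and {0} means nothing to String.format.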
Instead, JEP 326 promises us raw/multiline strings. That’s nice too.
Raw Strings
Consider, for example, searching for a period with a regex. The regex is \. since you must escape a period in a regex. So in Java, it’s Pattern.compile("\\."). To match a backslash, it’s Pattern.compile("\\\\"). This can get really confusing.
In fact, it’s so confusing that the author of the JEP gets it wrong (or maybe has a subtle sense of humor). The author’s example is Pattern.compile("\\\"") to match a ". Of course, you don’t need to escape that in a regex, so Pattern.compile("\"") would work fine. Which confirms the point that all that escaping is a mess.
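If you want to convince yourself of the escaping claims above, here is a small sketch that compiles each pattern and checks what it matches. The class and constant names are mine, chosen only for this example.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Regex \.  : Java needs two backslashes to produce one.
    static final Pattern PERIOD = Pattern.compile("\\.");
    // Regex \\  : four backslashes in Java source.
    static final Pattern BACKSLASH = Pattern.compile("\\\\");
    // Regex \"  : the JEP's example; the regex escape is harmless but unnecessary.
    static final Pattern QUOTE_ESCAPED = Pattern.compile("\\\"");
    // Regex "   : only the Java escape is needed.
    static final Pattern QUOTE_PLAIN = Pattern.compile("\"");

    static boolean matches(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(PERIOD, "."));
        System.out.println(matches(PERIOD, "a"));
        System.out.println(matches(BACKSLASH, "\\"));
        System.out.println(matches(QUOTE_ESCAPED, "\""));
        System.out.println(matches(QUOTE_PLAIN, "\""));
    }
}
```

Both quote patterns accept a single " character, which is why the extra regex-level backslash in the JEP example buys nothing.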
The proposed remedy is simple. Enclose the string in backticks `...`. Nothing inside the backticks needs to be escaped: Pattern.compile(`\.`)
But what if the string contains backticks?
In Scala and Kotlin, you use """ delimiters, but that only raises the same question again: what if the string contains """?
This is where the Java designers came up with a clever idea that I had not seen before. You can use any number of backticks to start a raw string, then use the same number of backticks to end it. For example, if you know that your string doesn’t have five consecutive backticks inside, do this:
String s = `````. . . . . . . . . . . .`````; // Five golden backticks :-)
Everything in the string is taken exactly as it is. If it is some HTML or SQL or whatever that you developed elsewhere, just paste it in.
Actually, the “exactly as it is” has one exception. All line endings are normalized to \n, even if the source file uses Windows-style \r\n line endings.
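The "same number of backticks on both sides" rule suggests an easy recipe for tools that generate such literals: find the longest run of backticks in the content and use one more. The sketch below does just that. It is ordinary Java 11 code, not raw-string syntax, and the names BacktickDelimiter, delimiterLength, and delimiter are hypothetical helpers of my own, not part of any JDK API.

```java
class BacktickDelimiter {
    /** Longest run of consecutive backticks in the content, plus one. */
    static int delimiterLength(String content) {
        int longest = 0;
        int current = 0;
        for (int i = 0; i < content.length(); i++) {
            if (content.charAt(i) == '`') {
                current++;
                longest = Math.max(longest, current);
            } else {
                current = 0;
            }
        }
        return longest + 1;
    }

    /** The backtick sequence to put on either side of the content. */
    static String delimiter(String content) {
        return "`".repeat(delimiterLength(content));
    }

    public static void main(String[] args) {
        System.out.println(delimiterLength("\\."));
        System.out.println(delimiterLength("a```b``c"));
        System.out.println(delimiter("x````y"));
    }
}
```

This only picks a delimiter that cannot occur inside the content. It does nothing for content that itself begins or ends with a backtick.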
A Couple of Flies in the Ointment
Stephen Colebourne noted that two backticks could be confused with the empty string. If you have something like
s = ``;
t = ``;
then that doesn’t set s and t to the empty string, but s is set to the string ";\nt = ".
There is a good puzzler in there.
Raw strings cannot start or end with backticks. For example, suppose you want to put the following piece of Markdown into a Java string:
```
alert("Hello, World!")
```
You obviously can't add backticks at the start, so the best thing you can do is add a space or newline before the ```. And the same holds for the end. Java requires that the ending delimiters exactly match the start. (In contrast, in Scala, you can write """Hello, "World"""", and the compiler figures out that one of the terminal quotation marks belongs to the string.) So, you can write:
String markdown = `````
```
alert("Hello, World!")
```
`````.strip();
The strip call removes the \n at the beginning and the end. Or you can just leave the newlines in place if they don’t matter.
(The strip method is new to Java 11. It is similar to trim, but it strips out leading and trailing Unicode whitespace, whereas trim removes characters ≤ 32, which isn’t the same thing. These days, you should use strip, not trim.)
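Here is a small sketch of where the two methods disagree. The sample strings are my own: one is padded with EM SPACE (U+2003), which is Unicode whitespace but has a code above 32, and the other with the control character U+0001, which is below 32 but is not whitespace.

```java
class StripVsTrim {
    // EM SPACE on both sides: whitespace to strip, invisible to trim.
    static final String EM_SPACED = "\u2003hello\u2003";
    // U+0001 on both sides: removed by trim, left alone by strip.
    static final String CONTROL_PADDED = "\u0001hello\u0001";

    public static void main(String[] args) {
        System.out.println(EM_SPACED.strip().length());      // padding removed
        System.out.println(EM_SPACED.trim().length());       // padding kept
        System.out.println(CONTROL_PADDED.strip().length()); // padding kept
        System.out.println(CONTROL_PADDED.trim().length());  // padding removed
    }
}
```

For ordinary spaces, tabs, and newlines, the two methods give the same result. The differences only show up at these edges.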
IDE Support
IntelliJ 2018.3 can convert strings with backslashes into raw strings when you activate the experimental features of JDK 12. (See this blog post for the details.)
I tried converting an old-fashioned multiline string:
private static final String authorPublisherQuery =
    "SELECT Books.Price, Books.Title\n"
    + " FROM Books, BooksAuthors, Authors, Publishers\n"
    + " WHERE Authors.Author_Id = BooksAuthors.Author_Id AND BooksAuthors.ISBN = Books.ISBN\n"
    + " AND Books.Publisher_Id = Publishers.Publisher_Id AND Authors.Name = ?\n"
    + " AND Publishers.Name = ?\n";
That didn’t work, but there is no reason why it couldn’t in the future.
Indentation Management
I prefer to line up multiline strings at the leftmost column. For example,
public static void main(String[] args) {
    String myNameInABox = `
+-----+
| Cay |
+-----+`.strip();
    System.out.print(myNameInABox);
}
It makes the multiline string stand out from the Java code. And it gives you plenty of horizontal room for the whatever-it-is that you are putting into the raw string.
However, quite a few people seem to prefer a style where the contents of the multiline string are aligned with the Java code:
...
    String myNameInABox = `
        +-----+
        | Cay |
        +-----+
    `.align();
    System.out.print(myNameInABox);
The align method (defined in Java 12) removes the common prefixes of spaces as well as leading and trailing blank lines.
There is a risk with this approach. If a mixture of tabs and spaces is used, then each tab is counted as a single space. Something may look aligned to you in your IDE but not to the align method. Of course, your IDE could warn you about such a situation. IntelliJ 2018.3 doesn’t currently do that.
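To see why tabs are a trap, here is a minimal sketch of an align-like helper, written in plain Java 11. It is emphatically not the JDK implementation: the class AlignSketch and its methods are hypothetical, and details such as whether the real method appends a final newline are not reproduced. It only follows the description above: drop leading and trailing blank lines, then remove the smallest common run of leading whitespace, counting every whitespace character (tab or space) as one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AlignSketch {
    /** Counts leading whitespace characters; a tab counts as one, like a space. */
    static int indentOf(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return i;
    }

    /** Hypothetical stand-in for the proposed align method. */
    static String align(String text) {
        List<String> lines = new ArrayList<>(Arrays.asList(text.split("\n", -1)));
        // Drop leading and trailing blank lines.
        while (!lines.isEmpty() && lines.get(0).isBlank()) {
            lines.remove(0);
        }
        while (!lines.isEmpty() && lines.get(lines.size() - 1).isBlank()) {
            lines.remove(lines.size() - 1);
        }
        // Find the smallest indentation among the non-blank lines.
        int margin = Integer.MAX_VALUE;
        for (String line : lines) {
            if (!line.isBlank()) {
                margin = Math.min(margin, indentOf(line));
            }
        }
        // Remove that many leading characters from every non-blank line.
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            result.add(line.isBlank() ? "" : line.substring(margin));
        }
        return String.join("\n", result);
    }

    public static void main(String[] args) {
        System.out.println(align("\n        +-----+\n        | Cay |\n        +-----+\n    "));
        // One tab versus four spaces: looks aligned with a tab width of 4, but is not.
        System.out.println(align("\tA\n    B"));
    }
}
```

In the second call, the tab counts as a single character, so only one character is removed from each line and B keeps three leading spaces.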
The Road Not Taken
Many of the discussions on new features take place on the “Amber Spec” mailing list that you can observe at http://mail.openjdk.java.net/pipermail/amber-spec-observers/, so you can see which alternatives have been considered.
There was a vigorous discussion on whether indentations should automatically be stripped. Predictably, this was not in the end adopted.
What about Unicode escapes inside raw strings? Should a \u0060 be a backtick? Sanity prevailed, and it was decided that “raw means raw”.
Should two backticks be outlawed because `` could be confused with an empty string? No: having a simple rule of “any number of backticks on either side” was deemed more important.
What about a newline following the opening backticks? There was some back and forth on whether it should be stripped. I still think it is a bit sad that more attention wasn’t paid to this issue. Including the newline in the opening delimiter would have solved two issues: initial backticks and alignment at the leftmost column.
I timidly asked why the closing delimiter couldn’t be “at least as many backticks as the opening delimiter” (similar to Scala), so that raw strings can end in backticks. Unfortunately, I got no response.
Just now, the JEP was withdrawn, to my amazement. I never saw this happen before. Other JEPs with far bigger problems sail through the review process despite vigorous expression of disgust. The reasons given for withdrawal were mostly sensible:
- Two backticks look like an empty string.
- Raw strings can’t start with backticks.
- What if we want other kinds of strings?
- Should we really use up the backtick—the last unused printable ASCII character—for this feature?
I thought this feature was eminently fixable. For example, a prefix à la Scala and n ≥ 3 " characters, optionally followed by a newline. At any rate, backtick-delimited strings are now the Ghost of Christmas Past.
Alex December 16, 2018
> Instead, Java 12 gives us raw/multiline strings
not anymore
Cay Horstmann December 17, 2018 — Post Author
$ jshell --enable-preview
| Welcome to JShell -- Version 12-ea
| For an introduction type: /help intro
jshell> `````
...> five
...> golden
...> backticks
...> `````
$1 ==> "\nfive\ngolden\nbackticks\n"
jshell> Runtime.version()
$2 ==> 12-ea+24
Alex December 17, 2018
Keep this build! They are going to drop raw strings support from J12 🙂
http://openjdk.java.net/jeps/326
Alex December 17, 2018
Here is discussion https://mail.openjdk.java.net/pipermail/jdk-dev/2018-December/002401.html
Cay Horstmann December 17, 2018 — Post Author
Thanks! I missed that during the pre-holiday rush. I updated the article.