JVM Advent

Flaky tests

The major problem of test automation nowadays is the instability of automated tests.

As a developer practicing extreme programming, I write both the code and the automated tests: both unit tests and UI tests. That’s how I gathered a lot of experience in debugging and fixing flaky tests, and I want to share it with you. I hope it will help you to tame your tests as well.

I will show some typical causes of flaky tests and best practices which could help to fix them. We will mostly consider web applications and UI tests that use a browser to open an application and click the buttons.

What is a flaky test?

A flaky test is a test that is sometimes green and sometimes red without any change to the AUT (application under test). This is a really bad thing because it breaks the whole idea of automated testing. People need autotests to quickly check whether the AUT works as expected. Most of the time tests should be green, and it should raise an alarm when they turn red. Developers should then drop all other tasks and fix the bugs as soon as possible.

But what happens if tests fail often?

A really bad thing. People get used to failing tests. A red build doesn’t cause alarm anymore. People cannot live in alarm mode every day, right? People know that most of those failed tests probably failed without a good reason. People lose trust in automated tests.

And what do people do with flaky tests?

It may seem unbelievable, but people do the worst thing possible: they write code to restart failed tests. In other words, they ignore the problem. People waste their resources on writing software that helps them hide the problem. Even Google does it.

How many builds actually fail?

Nobody has good statistics, but I often hear the number 30%: 30% of tests are flaky in an average company. It means that during every build, about 1/3 of all tests fail (while the AUT is ok). Last year Google published an article where they gave their number: 1.5%. Sounds great, right? Much better than average!

But I think that it’s still a disaster. Our industry is in deep trouble.

What does this 1.5% actually mean?

Let’s imagine that you have 1000 automated tests in your project (many companies have many more). With 1.5% of flaky tests, roughly 15 tests (somewhere between 10 and 20) will fail in every build. You never get a green build. After every build, you need to analyze its results manually. You need to re-run the failed tests by hand and make sure that the functionality is still ok. What an irony that they call it automation!

In my current project, we have about 0.1% of flaky tests. Quite unbelievable, right? But it didn’t come for free: we spent a lot of hours, months and even years investigating all these problems. And the resulting 0.1% is still bad: we get at least one red build every day. We are tired of it. That’s why I am sharing my experience, hoping that somebody will teach me how to fix the last ones.

Typical causes of flaky tests

Ten little tests standin’ in a line,
One opened a browser and then there were nine.

 

To begin with, let’s look at the simplest, yet very common, example.

Example 1: Sel Clásico

The following is a primitive Selenium test that opens a browser, loads the Google page and searches for a word there. The question is: which line of the following test can fail?

driver.navigate().to("https://www.google.com/");
driver.findElement(By.name("q")).sendKeys("selenide");
driver.findElement(By.name("btnK")).click();
assertEquals(10, 
       driver.findElements(By.cssSelector("#ires .g")).size());

The answer is:

ANY!

Absolutely, any of these lines can fail at any moment. Here are just a few possible reasons:

WebDriverException:
Element <input value="Google Search" aria-label="Google Search" name="btnK" type="submit" jsaction="sf.chk"> is not clickable at point (448, 411). Other element would receive the click: ...

What happened, you ask?

When the 2nd line started typing “sele”, Google found some results and opened a small “popup” previewing those first results. This popup happened to cover the “btnK” button. Interestingly, this test doesn’t fail every time or in every browser. For me, it fails only about once in every 5 runs, and only in one browser.

So, from my experience, TOP causes of flaky tests are:

The good news is there is a cure that automatically resolves 90% of flaky tests:

Selenide

Selenide is an open-source framework for writing automated tests, which solves most of those annoying issues with ajax requests, timeouts etc. Let’s rewrite the previous test with Selenide:

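import static com.codeborne.selenide.CollectionCondition.size;
import static com.codeborne.selenide.Selenide.*;

import org.junit.Test; // JUnit 4 is assumed here
import org.openqa.selenium.By;

@Test
public void userCanSearchGoogle() {
  open("https://google.com");
  $(By.name("q")).setValue("selenide");
  $(By.name("btnK")).click();
  $$("#ires .g").shouldHave(size(10));
}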

This test will not fail anymore. The way Selenide eliminates those problems is both simple and powerful: every Selenide method retries if needed.

When you write $(By.name("btnK")).click(), Selenide tries to click, and if the click fails, it waits a little and tries again.

When you write $$("#ires .g").shouldHave(size(10)), Selenide checks whether the list already contains 10 elements, and if not, it waits and checks again. And again, and again (by default for up to 4 seconds; this timeout is configurable).

It works pretty well and solves most of the timing issues.
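The retry idea described above can be sketched in a few lines of plain Java. This is only an illustration of "check, wait, check again until a timeout", not Selenide's actual implementation; the class name Polling, the method waitUntil and its parameters are all hypothetical.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper illustrating polling with a timeout.
// This is NOT how Selenide is implemented internally; it only shows the idea.
class Polling {
    /**
     * Re-checks the condition until it holds or the timeout elapses.
     * Returns true if the condition held in time, false otherwise.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMs, long pollIntervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true; // condition is satisfied, stop waiting
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timeout reached, give up
            }
            try {
                Thread.sleep(pollIntervalMs); // wait a little before the next check
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

With such a helper, a check like "the list has 10 elements" could be written as Polling.waitUntil(() -> results.size() == 10, 4000, 100), where results is any list being filled in the background. The 4000 ms matches the default timeout mentioned above; the 100 ms poll interval is an assumption.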

But let’s now look at the remaining 10% of flaky tests.

Example 2: nbob

Nine little tests swingin’ on a Jenkins,
One clicked the wrong button and then there were eight.

Once upon a time, I came to a project which had a strange flaky test. Nobody could understand how it happened. This test wanted to log in as user “bob” by clicking the letters “b”, “o”, “b” on the following screen:

Most of the time this test worked, but sometimes it failed. On the screenshot of the failed test, we saw the word “nbob” (instead of “bob”) in the login field.

I looked through the entire codebase: there was no “nbob” in the code. Nor in the database, Excel spreadsheets etc. Nowhere.

How is it possible?

Fortunately, I found the answer within a few days.

This is the test code (simplified):

@Test
public void loginKiosk() {
  open("http://localhost:9000/kiosk");
  $("body").click();
  $(By.name("username")).sendKeys("bob");
  $("#login").click();
}

Here I used a technique called “binary search”. You comment out half of the code and see if the problem still happens. If it does, you comment out half of the remaining code, and so on. With flaky tests, you probably need to run the test multiple times, until you get a feeling that “it doesn’t break anymore”.
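The "run it multiple times" step can be automated while you investigate. Below is a minimal sketch, assuming the test body can be wrapped in a Runnable; the class FlakyRunner and the method countFailures are hypothetical names, not part of any test framework. It is meant for reproducing a flaky failure during debugging, not for hiding failures in the build.

```java
// Hypothetical helper for reproducing a flaky failure during investigation.
class FlakyRunner {
    /** Runs the test body the given number of times and returns how many runs failed. */
    static int countFailures(Runnable testBody, int runs) {
        int failures = 0;
        for (int i = 0; i < runs; i++) {
            try {
                testBody.run();
            } catch (RuntimeException | AssertionError e) {
                // A failed assertion or an unexpected exception counts as one red run.
                failures++;
            }
        }
        return failures;
    }
}
```

If a test that failed a few times out of 50 runs shows zero failures after you comment out some lines, the commented-out half is a good suspect. How many runs are "enough" depends on how rarely the test fails.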

As a result, I found that this particular line was guilty:

$("body").click();

After executing this line, a letter “n” sometimes appeared in the login form. And the following line just appended “bob” to it.

Wow! That explains why we don’t have “nbob” anywhere in the code.

But why does clicking “body” make a letter “n” appear? You can probably guess. In Selenium WebDriver, “click” works by the following algorithm:

  1. Calculate coordinates of the center of given element
  2. Click this point by coordinates

(The fact that the “click” operation is not atomic causes many problems, especially when the element is moving or resizing.)

As you probably guessed, exactly at the center of the <body>, there is the letter “n”:

You may ask: why didn’t this test fail every time?

I guess it’s because the screen size could differ. We always opened the browser maximized, so its size depended on the screen. On other screens, the central point could land in empty space.
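To make the geometry concrete, here is a small sketch of the "click the center" step for two window sizes. All numbers are made up for illustration: the real layout of the kiosk screen is not known, and the position of the “n” key is assumed to be fixed, which a real page may not guarantee. The class ClickPoint and its methods are hypothetical.

```java
// Illustrative geometry only; the window sizes and the key position are invented.
class ClickPoint {
    /** Center of a box given its top-left corner and size, returned as {x, y}. */
    static int[] center(int left, int top, int width, int height) {
        return new int[] {left + width / 2, top + height / 2};
    }

    /** True if the point (x, y) lies inside the box. */
    static boolean contains(int left, int top, int width, int height, int x, int y) {
        return x >= left && x < left + width && y >= top && y < top + height;
    }

    public static void main(String[] args) {
        // Hypothetical on-screen key "n": a 60x60 px box at a fixed position.
        int keyLeft = 620, keyTop = 370, keySize = 60;
        int[][] windows = {{1280, 800}, {1920, 1080}};
        for (int[] w : windows) {
            int[] c = center(0, 0, w[0], w[1]); // where a click on <body> would land
            boolean hit = contains(keyLeft, keyTop, keySize, keySize, c[0], c[1]);
            System.out.println(w[0] + "x" + w[1] + ": center (" + c[0] + ", " + c[1] + "), hits key: " + hit);
        }
    }
}
```

Under these assumptions the center of the smaller window falls on the key, while the center of the larger one lands in empty space, which is the kind of difference that could make the same test red on one machine and green on another.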

You may ask: why did we need to click <body> at all?

I guess it was done just to move focus out of the previous field. There is a problem in Selenium: you can click a field, but you cannot “unclick” it. There is no “unclick” or “blur” method. If you need an element to lose focus, there is no other way than to click some other element. In this case, there were no other elements on the screen that would be safe to click without side effects. Still, I consider this code harmful: we easily figured out how to write the test without it.

So, let’s add to the TOP of causes:

To overcome those issues, I recommend to:

Example 3: back to the future

Eight little tests gayest under a cloud.
One went to sleep and then there were seven.

Once, in an internet bank, we had a flaky test. It sometimes failed because the payment time was in the future, though it was expected to be “now”. We scanned all the code – it was impossible. The only way to initialize the payment time was the following:

payment.time = new Date();

It just cannot be in the future. It’s impossible.

But it happened.

Fortunately, we found the culprit. This time the investigation took a few months.

Intrigued? Read the answer in part 2 of this article which will be published tomorrow. We will continue with more examples and best practices.

Author: Andrei Solntsev

Software developer at Codeborne (Estonia).

Creator of selenide.org
