How Does TDD Really Work?
How many web tutorials have you watched where someone builds a simple todo list? How many simple todo lists have you been asked to build for money? Not many? No, me neither.
Rather than use a contrived example app, I’m writing a series of posts that follow my work on a real-world production app as I learn and use Test Driven Development practices.
I work alone on an ecommerce/internal administration application for a smallish coaching company that hosts around 1,600 users.
I’m very aware we don’t always have the opportunity to work with other devs, and for that reason I’m writing this series for other devs who might find it beneficial to see my own day-to-day workflow.
The stack
I’m a full stack NodeJS dev. I don’t think it’s super-important to know Node if you’re reading this; I’m not going to go too deeply into the syntax. I’m mainly going to focus on practices and workflow, which could potentially apply to any stack.
My app’s tech stack is a mix of ExpressJS with pug templates and a modern RemixJS/React app. Remix and pug are both server rendered, so I’ve had to use end-to-end testing (or e2e testing); more on this later.
The job
My typical process starts with picking up a ticket and writing a test that reflects the general outline of the ticket/job I’m on.
I have a bugfix today which says:
Test and fix scenarios when a split payment is selected after a default voucher payment is already set
I start by taking a test I already have for this area of the app, that way I can simply strip out the bits I don’t need and add in a few extra steps.
I write the new steps in a way that assumes the feature already exists. This is key to TDD. Tests are supposed to fail first, sure, but how they’re supposed to fail is important.
Obviously it’s no good if the test fails because I’ve written it wrong. For example, the first time I write this test it does fail, but for the wrong reasons. I’m clicking an element that doesn’t exist yet; it will exist when the DOM’s finished loading but my test is trying to click it too early (a common issue with e2e tests).
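Cypress’s built-in retrying is the usual defence against this. Here’s a minimal sketch of the pattern, assuming a hypothetical “Split payment” label: chaining a visibility assertion before the click makes Cypress re-query and retry until the element is actually there, instead of clicking on the first (possibly too-early) lookup.

```javascript
// Hypothetical sketch: the 'Split payment' label is a stand-in.
// Cypress retries both the query and the assertion until they pass
// (or time out), so the click only happens once the DOM is ready.
it('chooses split payment', () => {
  cy.contains('Split payment')
    .should('be.visible')
    .click();
});
```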
It’s tricky to get the hang of this, because sometimes you’re not sure whether it’s the test that’s broken or the code.
My rule of thumb for these kinds of situations is to write dumb code and then make it smarter. If I don’t 100% understand some syntax, even if it would be cleaner, I write it the dirty way that I’m sure works first. If I can get it working, I clean it up later on.
Right now, a ton of my tests have the infamous pyramid problem. I’m using Cypress, which uses promise-like functions to target elements and make assertions on them. I say they’re promise-like because they have a then function, but you can’t async/await them like you can a normal promise.
As a result a lot of my tests look like this…
describe('my messy tests', () => {
  it('looks like pyramid time', () => {
    cy.task('updateSomething').then(() => {
      cy.task('updateSomethingElse').then(() => {
        cy.task('updateSomethingElse').then(() => {
          cy.task('updateSomethingElse').then(() => {
            // Do my test...
          });
        });
      });
    });
  });
});
Icky, I know. But as I said, let’s make dumb code that works first, then smarten it up later.
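For what it’s worth, the smartening-up here is often simpler than it looks. Cypress queues its commands and runs them in order, so the nesting is only needed when a later call depends on a value yielded by an earlier one. A sketch of the flattened version, using the same hypothetical task names as above:

```javascript
// Cypress enqueues commands and runs them sequentially, so tasks that
// don't consume each other's results don't need to be nested.
describe('my tidier tests', () => {
  it('runs the setup tasks in sequence', () => {
    cy.task('updateSomething');
    cy.task('updateSomethingElse');
    cy.task('updateSomethingElse');
    // Do my test...
  });
});
```

You only reach for a single .then when you genuinely need the resolved value of a task.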
Targeting elements
It’s a trend at the moment to test from the user’s perspective. This means writing tests that target elements on the page that matter to the user, usually visible text or aria attributes.
If you’re familiar with React Testing Library, you’ll know what I mean by this. They have some great docs and tutorials that direct your testing in this way, which I highly recommend to anyone interested:
you want your tests to avoid including implementation details of your components. You want your testbase to be maintainable in the long run so [that changes to your components] don’t break your tests
As I said I’m using Cypress, which makes it a little bit trickier to test in this way. I’ll go into a quick example.
In Cypress I might use the cy.contains function to find an element by some text on the page…
it('finds the element', () => {
  cy.contains('Hi there!').should('exist');
});
This way we’re not testing the way the code is written, which allows developers to rewrite it however they want, as long as it doesn’t break functionality. Tests like this care about features, not implementation.
So a test that targets an element by a specific class name is not so good, because now that class name can’t be changed or removed without updating the test. It’s testing implementation.
A test that targets some text the user can see is testing functionality. That’s better: the text on the page is less likely to change, and it will stay the same if another developer decides to refactor my code.
A developer could show some text in a number of different ways.
As a variable…
const text = 'Hi there!';
return <div>{text}</div>
…or returned from a function…
const text = () => 'Hi there!';
return <div>{text()}</div>
By targeting the text “Hi there!” the test doesn’t care how the code is written, just that it works, and by working I mean it shows the text to the user. It gives developers the freedom to code a feature however they want, which is great. Tests aren’t supposed to be bossy.
Unfortunately it’s quite difficult to write tests this way using Cypress at the moment. I will often use cy.contains, which allows you to target an element by its contained text, but it selects elements by the text written in the markup rather than how it appears to the user. This leads to some unexpected failures when, for example, you target something that’s rendered in caps but written in lowercase. The biggest problem with cy.contains is that it can only target one element: if you have more than one element with the same text you can’t iterate through each, you only ever get the first one.
I haven’t really found a way around this so I often have to target class names or attributes to get to the element I want…no system is perfect.
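One partial workaround I’m aware of is filtering a broader query with jQuery’s :contains selector, which yields every match rather than just the first. A sketch, assuming hypothetical markup (the li selector, the text and the count are all stand-ins), and with the same caveat as cy.contains: :contains is case-sensitive and matches markup text, not rendered text.

```javascript
// Filtering with jQuery's :contains yields ALL matching elements,
// unlike cy.contains, which only ever yields the first.
cy.get('li')
  .filter(':contains("Hi there!")')
  .should('have.length', 3) // assert on the whole set...
  .eq(1)                    // ...or pick out a specific one
  .click();
```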
The first test
My payment journey has a checkout page which allows users to pay using part-card and part-vouchers, which we call a split payment method.
My job is to disallow voucher payments if the user has an item in their basket that isn’t allowed to be paid for with vouchers.
I’ve written most of this feature but it has bugs. If a user clicks the terms and conditions checkbox, above the pay button, we’ll give them a message saying:
We are unable to accept voucher payments for these items
This text should hide when the user goes to choose a split payment. Currently it doesn’t, so I write my test to tick the box, choose split payment, then check whether the above message is still visible.
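In outline, the test looks something like this. It’s a sketch only: the checkbox label is a stand-in, not the app’s real markup.

```javascript
// Hypothetical sketch of the bugfix test described above.
it('hides the voucher warning when split payment is chosen', () => {
  // Tick the terms and conditions checkbox above the pay button
  cy.contains('I agree to the terms and conditions').click();
  cy.contains('We are unable to accept voucher payments for these items')
    .should('be.visible');
  // Switching to a split payment should hide the warning
  cy.contains('Split payment').click();
  cy.contains('We are unable to accept voucher payments for these items')
    .should('not.exist');
});
```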
Once the test is working, it fails…but for the right reasons. The message is still showing in the app because I haven’t fixed this bug yet.
Great! That’s the first step done.
The first fix
Now I go into the code where the message is to look at the conditions that control the visibility of the text. [A great tip one of my previous lead devs gave me here, if you want to know where some markup is in the code, rather than searching for the text, search for the class names on that element. Class names are often rendered in the same way they’re written whereas text can often be chopped up or stored in variables, which will take an extra step or two to find.]
I find the text in the code, and fix the conditions that control the visibility of the text. It’s a simple fix: there’s one more variable I need to check for.
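The shape of the fix, sketched in React since part of the app is Remix/React (the variable names are hypothetical): the warning was only checking for restricted items, so the extra variable, whether a split payment is selected, gets added to the condition.

```javascript
// Hypothetical sketch: both prop names are stand-ins.
// The bug was checking only the first condition, so the warning stayed
// visible after the user switched to a split payment.
function VoucherWarning({ hasVoucherRestrictedItems, isSplitPaymentSelected }) {
  if (!hasVoucherRestrictedItems || isSplitPaymentSelected) return null;
  return <p>We are unable to accept voucher payments for these items</p>;
}
```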
Now the testing process gives me a lot more confidence that there aren’t any bugs before I merge code into production but I actually find it can make me over-confident.
Since I’ve begun writing tests I physically test less and less. Why bother testing the app by going through it again and again when you can just write the test and have Cypress automate it?
This is a great argument, and I agree it’s better to write tests so they can run automatically while you do something else, but manual testing is still important. I don’t know if anyone else has this problem, but I have to remind myself to go into the app, click around and explore the user journey. That way I find bugs by accident that I wouldn’t have thought of beforehand.
I’ve had more than one embarrassing demo to clients and bosses where I’ve found bugs in front of them because I’d neglected manual testing. It’s not the end of the world, everyone knows this happens, but I’d rather it happened to me less than it does.
Manual testing
One thing that’s awesome about e2e (end-to-end) testing with Cypress is that you can stop and pause your app in whatever state you want. So, after my test is passing, I can click the url in the automated browser and Cypress will open a new tab with the app in exactly the state it was in at the end of the test. That means the page, database, the cookie session and local storage of the test that was just running.
Love it. No more clicky-clicking around the app, logging in, finding the page, etc.
Also, this means Chrome is sort of becoming more and more like a development environment. I even have browser plugins like Vimium and React Dev Tools which work alongside the Cypress interface. Nice 👍
Test run
So I’ve fixed my bug and run the test to see it pass. Next I’ll run some more tests. Because this is a bugfix I’ve not made a new spec file to add tests to; instead I’ve updated an existing one that contains other tests relating to the same area. These are the tests I’m running next: if there are any problems, they’re most likely to be in or around the same area I’ve been updating.
Two more tests fail. Looks like I’ve made a regression. I find the tests that are breaking and stick an only on the first one, making a note to look at the next one afterwards.
// it.only causes only this test to run...
it.only('my test description', () => {
  // do stuff...
});
I don’t usually save snapshots for more than one test at a time because I find it slows my machine down too much. This means I have to run the test again to inspect it a bit.
Second fix
Looks like I went a bit too far with my code fix: it’s too strict, and there’s some stuff I need to take out so we can keep some of the previous functionality.
The problem’s in a backend controller, so I go to the app, click the button that triggers it, then look at the response in the Network tab of my browser. The response is a generic one found all over the app: “Invalid payment method”. This is fairly helpful but could be better.
I go into the controller it’s coming from and update all my error responses to be more unique. That way I know which part of the code each one is coming from, and any future developer will get a better idea of the problem, which is particularly helpful if they have no access to the backend of the app. Finally I update the code to make it more lenient. Both the tests that were broken now pass.
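To make the “unique error responses” idea concrete, here’s a minimal sketch of a validation helper, with entirely hypothetical field names (method, hasRestrictedItems, cardAmount are stand-ins): each failing branch returns its own message, so the response in the Network tab tells you exactly which check rejected the request.

```javascript
// Hypothetical sketch: one generic "Invalid payment method" replaced
// with a distinct message per failure branch.
function validatePayment(payment) {
  if (!payment.method) {
    return { ok: false, error: 'Checkout: no payment method selected' };
  }
  if (payment.method === 'voucher' && payment.hasRestrictedItems) {
    return { ok: false, error: 'Checkout: vouchers not accepted for items in basket' };
  }
  if (payment.method === 'split' && !(payment.cardAmount > 0)) {
    return { ok: false, error: 'Checkout: split payment needs a card amount' };
  }
  return { ok: true };
}
```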
The more I program, the more I realise I’ve wasted a lot of time and effort rewriting other people’s code. Even if something isn’t written perfectly: if it’s working, it’s often better to leave it; if it’s not working, it’s often better to tweak it.
Second test run
I run the tests again and find another problem. This time it’s a flaky test. My machine is running pretty hot at this point and some of the calls my tests make are taking longer to resolve than usual. This is an annoying problem because it never happens when I run the tests in the pipeline.
I’m using GitHub Actions, which are super-easy to set up and run on fairly powerful machines. My tests take around 30–40 minutes to run in total and I’m still on the free usage band, which is pretty impressive seeing as I run at least three full test runs per day.
Getting Cypress to wait for the right part of the page to load before moving on to the next assertion has been pretty challenging and is a constant issue when running end-to-end tests. I haven’t found a solution that works 100% of the time.
Your best bet is not to use end-to-end tests at all and use unit tests instead. That’s not really an option for me, although there is a partial solution for RemixJS apps I’ll go into in another post.
For now, I add in a couple of waits: calls that get the test to pause for an arbitrary amount of time. They’re a cheat, but they work. I’m mainly concerned about a test passing when it should fail; a test failing when it should pass isn’t so bad. The worst that’ll happen is I’ll waste time looking into it.
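For contrast, here’s both the blunt version and the kinder one, assuming a hypothetical /api/payment endpoint and button label: intercepting the request the page actually makes and waiting on its alias pauses for exactly as long as the call takes, rather than a guessed number of milliseconds. The intercept has to be registered before the click that triggers the request.

```javascript
// Blunt: pause and hope two seconds is enough.
cy.wait(2000);

// Kinder: wait for the specific request to resolve, however long it takes.
cy.intercept('POST', '/api/payment').as('payment'); // register before the click
cy.contains('Pay now').click();
cy.wait('@payment');
```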
My tests work. Now I’m tempted to just merge this work: I’ve got a load of passing tests, I’ve found the main bug and everything is looking good. But I’m going to have another manual run over the feature to see if I can break it this time.
The manual test
You know when something’s working well: usually it’s when you’ve done a load of test runs and read the code inside out. If I suspect, in the back of my mind, that it’s not 100% working, that often means it definitely isn’t.
I go through a few test scenarios and find another case where the feature breaks. As soon as I find it, I write another test.
Tests become like a todo list when I write them this way. If I write a breaking test just before the weekend I find I can pick up my work again on Monday much faster than I would have before. The test is just there waiting for me, it tells me what I need to do, puts my app in the right state to fix the problem and, if the pipeline is set up, blocks me from accidentally merging into production.
Again, I go into my test spec file and copy a similar test to adapt. I find myself doing this a lot. I don’t often refine the code for my tests; once they’re in, they’re in. I sometimes have to make them less flaky and prone to erroring out, or I add to my growing list of database population tasks, but a working test is another feature that’ll never break again, or at least not without me knowing about it.