Is your power meter really telling you what you think it is?
Scientific study reveals significant discrepancies between data output and actual power - but what does it all actually mean?
by James Spragg
The use of power meters, both in the pro peloton and on the club run, has risen significantly in recent years. Nowadays some riders even have power meters on several different bikes, which has led to the need to compare data from one power meter to another.
However, the accuracy of comparing power meter data has been called into question by a recent study (you can find the original study here) in which 54 different power meters were tested. The aim of the study was to see just how good modern day power meters really are, and whether they are giving us power figures that are both accurate and precise (more on that in a second).
The researchers cleverly designed their testing so they were able to calculate the true average power that a rider was producing and then compare that true power value with the value recorded on various power meters.
How accurate is your power meter?
To do this, riders were placed on a treadmill set to a one per cent incline - but they were facing downhill (see figure one). A cord was attached to the seatpost, which in turn was linked to a weight to hold the rider in the same position on the treadmill.
A second weight was added to the cord which, if the rider didn’t pedal, would pull them backwards up the slope. Because the mass of the additional weight was known, the researchers could then calculate power output, while also accounting for the loss of power through the drivetrain.
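As a rough sketch of the physics behind this: the power needed to hold position against a hanging weight is the pull of that weight multiplied by the speed of the treadmill belt. The 5kg mass, 6 m/s belt speed and 2% drivetrain loss below are illustrative assumptions, not figures from the study, and the function names are hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2

def power_at_wheel(mass_kg, belt_speed_ms):
    """Power (watts) needed to hold position against a hanging weight."""
    force_n = mass_kg * G           # pull of the weight on the cord
    return force_n * belt_speed_ms  # power = force x speed

def power_at_crank(wheel_power_w, drivetrain_loss):
    """Scale wheel power up by an assumed fractional drivetrain loss."""
    return wheel_power_w / (1.0 - drivetrain_loss)

# Assumed example values, chosen only to land near 300w.
wheel = power_at_wheel(mass_kg=5.0, belt_speed_ms=6.0)
crank = power_at_crank(wheel, drivetrain_loss=0.02)
print(round(wheel, 1), round(crank, 1))
```

With these assumed numbers the rider has to produce roughly 300w at the crank, and that is the 'true' figure each power meter would be compared against.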
This setup allowed the researchers to compare the average power as recorded on the various power meters to the true value. Each power meter was tested a number of times with a number of different riders.
Accuracy and precision
As I mentioned at the top, throughout this article we are going to look at the accuracy and precision of power meters. Therefore, before we get too much further, it’s a good idea to start with a quick explanation of these two concepts.
The dartboard below can best illustrate accuracy:
All the darts are close to the centre but they are scattered around the bullseye. The darts are accurate in that none of them are far away from the bullseye, but they aren’t very precise as the darts are scattered.
Accuracy is important when it comes to power meters as it allows one person’s power data to be compared to another. It also allows those who are lucky enough to have power meters on multiple bikes to compare their output on one bike to the other.
Precision is best visualised by this second dartboard. All of the darts are very close to one another (precise location) but they are nowhere near the bullseye (not very accurate).
Precision is important as it allows data from one day or one effort to be compared to another. Precision is arguably more important when it comes to power meters as without it they are just generating random numbers.
Finally, if we combine both accuracy and precision we get this:
All of the darts are close to one another and in the bullseye. This is the ideal world when it comes to power meters as it means the data can be compared across multiple power meters on multiple days.
Average power comparison and accuracy
Normally when we compare power outputs, it is over a certain period of time - maybe the length of a climb, a TT or even an interval in training. This is exactly what happened in the research. The average power for a certain period of time was compared with the true power value.
The accuracy of a power meter gives us an understanding of how close it is measuring to the true power value. In our dartboard example this is how close each dart is to the bullseye. In power terms, essentially we are asking if the 300w being measured on a power meter is the same as a true 300w.
We can see the results in the chart below.
What we need to look at in the chart is the mean deviation (%), which represents the average accuracy across all power meters from the specific brand. Essentially it is a comparison of the average position of the darts to the bullseye across all tests of all models of one brand. Let’s use SRM as an example, which shows values of -0.5% ±2.4%. This means that across the test, SRM power meters under-read by 0.5% overall. However, there was a range of accuracies: if we look at how far each individual SRM power meter typically sits from this average (the standard deviation for those who remember GCSE maths!), we get a figure of ±2.4%.
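To make the 'mean ± standard deviation' format concrete, here is a minimal sketch. The four per-unit deviations are made up for illustration and are not values from the study, and I am assuming the sample standard deviation is the one intended.

```python
from statistics import mean, stdev

# Hypothetical per-unit deviations from true power, in per cent.
# Negative means the unit under-reads, positive means it over-reads.
unit_deviations = [-3.0, -1.0, 0.5, 1.5]

brand_mean = mean(unit_deviations)  # average offset across all units
brand_sd = stdev(unit_deviations)   # how widely the units scatter around it

print(f"{brand_mean:+.1f}% ±{brand_sd:.1f}%")
```

The first number tells you whether the brand as a whole reads high or low. The second tells you how much an individual unit is likely to differ from that brand average.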
We’re quite used to seeing power meter manufacturers stating their claimed accuracy – for example, Stages power meters are advertised as ±2%. However, what we can see from the study is that most power meters actually fell outside of their advertised level of accuracy in terms of mean deviation, given here in brackets: Stages (±2%), Power2max (±2%), Polar (±2.5%), Verve (±0.37%) and Rotor (±1%). Only SRM (±1%), Powertap (±1.5%), Quarq (±1.5%) and Garmin (±2%) were within their stated level of accuracy.
I should note here that the Stages discrepancy could be explained by the fact that it only measures on one side. Therefore, if the riders who tested the Stages had a right-leg bias, this may explain why the Stages under-reads slightly.
However, these results alone don’t quite tell us the full story.
When we look at the overall accuracy of all units of a single brand, we get a distorted figure. For example, if two units were tested and one read eight per cent too high and the other eight per cent too low then overall that brand would have a mean deviation of 0%. It would appear the brand in question has perfect accuracy but in reality each individual unit is terribly inaccurate.
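The two-unit example can be checked in a couple of lines. This sketch uses the +8% and -8% figures from the paragraph above. The mean absolute error is my own addition, shown only to illustrate how large the typical per-unit error is.

```python
from statistics import mean

# Two units of one brand: one reads 8% high, the other 8% low.
unit_errors = [8.0, -8.0]

# Brand-level mean deviation: the errors cancel out.
brand_mean_deviation = mean(unit_errors)

# Average size of the error, ignoring its direction.
mean_absolute_error = mean([abs(e) for e in unit_errors])

print(brand_mean_deviation, mean_absolute_error)
```

The brand average comes out at zero even though neither unit is within 8% of the truth.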
For each brand of power meter in the test, apart from the Garmin, Polar and Rotor models, multiple units were tested multiple times, so this allows us to see how accurate each individual power meter is.
From the graph above we can see the accuracy (mean deviation %) of each individual power meter compared with the true power value.
The more grouped the dots, the less variation in accuracy between units. We can see that for the SRM power meters the spread of data is +4% to -4%. This means if you recorded the same session on two power meters, one may measure 8% higher than the other.
The worst performer in the test was the Stages power meter, where we can see an 11% spread in power figures, so if you recorded the same 300w interval on two Stages power meters, one could record as high as 309w and the other as low as 273w - a huge variation in power terms.
The Powertap and the Verve power meters came out of this test well and so, based on this study alone, are a good bet if you want to have multiple power meters on a number of bikes. However, only three Verve power meters were tested, compared with 12 SRMs for example, so they may have just been particularly good examples.
Mean deviation also gives us a fresh perspective on the marketing claims of power meter brands. Let’s use SRM as an example (and that is only one example from a whole range of brands) and a claimed accuracy of ±1%. Most customers may think this means each and every power meter is ±1% accurate, but it is more likely this relates to the overall average.
Therefore, a customer may be buying a power meter thinking it is ±1% accurate, when in reality it is only within a range of approximately ±4% based on the data from this research. However, if they bought ten power meters, the average accuracy across them all would be -0.5%.
Despite all that, the precision of a power meter is arguably of much greater importance than how accurate it is.
In this research, the coefficient of variation (sometimes loosely called the coefficient of variance) can be considered a good measure of precision. Basically what this is measuring in our dartboard example is how far the darts are from one another.
Precision essentially means that the numbers are reliable test after test. So the 300w your power meter is measuring one day is exactly the same as the 300w the next day. To be able to track your performance and train effectively, precision is more important than accuracy.
When considering the coefficient of variation numbers, you need to keep in mind that a lower number is better. For example, overall across the three Verve power meters tested the coefficient of variation was only 0.6%. On average, across those three Verve power meters, if you measure 300w on one day you could expect that to differ by only around 1.8w on another day for the same power.
The only brand that comes out badly in terms of average precision is Stages, which had a coefficient of variation of two per cent. Here, 300w might be 300w on one day and 294w or 306w the next day.
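The watt figures quoted above come from applying the percentage to the target power. A minimal sketch, treating the coefficient as a simple plus-or-minus band in the same way this article does (the function name is hypothetical):

```python
def precision_band(power_w, cv_percent):
    """Return (low, high) watts for one coefficient's worth of day-to-day variation."""
    delta_w = power_w * cv_percent / 100.0
    return power_w - delta_w, power_w + delta_w

verve_low, verve_high = precision_band(300, 0.6)    # Verve average from the study
stages_low, stages_high = precision_band(300, 2.0)  # Stages average from the study

print(round(verve_low, 1), round(verve_high, 1))
print(round(stages_low, 1), round(stages_high, 1))
```

The same function can be used with any of the other percentages in this article to turn them into a watt range at your own target power.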
Now just as with the average accuracy, the average precision doesn’t tell the entire story. We can also look at the precision of the individual power meters across multiple tests (figure three).
This graph shows some interesting results. To interpret them, the closer a dot is to the x-axis, the better the precision of that unit; the higher the dot, the poorer the precision.
A number of things jump out from this.
Firstly, the SRM, Powertap, Rotor, Quarq and Verve power meters all display good precision and, with each of these, the values are under 1.5% – on any given day 300w may range from 295.5 to 304.5w. That’s not bad at all and probably less than the daily variation in your form.
However, we should also look at where those dots fall. With SRM, Powertap and Verve the dots take a pyramid shape, so most power meters are in the 0-1% range rather than the 1-2% range. So, if you were to buy a power meter from these brands, you are more likely to get a power meter with a coefficient of variation of less than one per cent, as opposed to one with between one and two per cent.
The worst performer in the test was the Stages power meter. One unit showed a coefficient of variation of 6% - that means your 300w could be anywhere from 282w to 318w, which is well outside of what is useful when it comes to trying to ride to power or monitor your performance. What is also strange about the Stages power meters is that while some units were very precise, others were very imprecise.
Let me put this into context. With the least accurate power meter with the worst precision, your true 300w may be measured as 256-298w on any given day. On the other hand, with the most accurate and precise power meter a true 300w could be measured as 297-300w.
So far we have looked at the average power over a longer period of time and how accurate and precise that measurement is. However, when we ride we don’t maintain a perfectly smooth power output. Instead, power output on the road is what we call ‘stochastic’ - it varies greatly from second to second.
To put this into context, a 300w average power output for 10 seconds may look something like this on a second-to-second basis:
300w, 305w, 295w, 298w, 307w, 295w, 302w, 300w, 294w, 304w
What we have to consider is that the precision and accuracy of a power meter affect the recorded power numbers on a measurement-by-measurement basis. So for the same 300w power output for 10 seconds the recorded power might look something like this:
307w, 298w, 304w, 293w, 305w, 291w, 295w, 306w, 311w, 290w
Overall the average power for the 10 seconds is identical, but if we were looking for a peak one-second power within the effort we would get big differences between the true value and the measured value.
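The two ten-second series above can be compared directly. This short sketch uses exactly those numbers; only the variable names are my own.

```python
from statistics import mean

# Second-by-second values from the two examples above.
true_w = [300, 305, 295, 298, 307, 295, 302, 300, 294, 304]
measured_w = [307, 298, 304, 293, 305, 291, 295, 306, 311, 290]

true_avg = mean(true_w)
measured_avg = mean(measured_w)

true_peak = max(true_w)
measured_peak = max(measured_w)

# Largest gap between measured and true power in any single second.
worst_second = max(abs(m - t) for m, t in zip(measured_w, true_w))

print(true_avg, measured_avg)    # 300 300
print(true_peak, measured_peak)  # 307 311
print(worst_second)              # 17
```

Both series average 300w, yet the peak one-second value differs by 4w and one individual second is out by 17w.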
I have personally done some research on this to follow up on the study. I had a coaching client ride on a laboratory-quality stationary trainer, a Cyclus 2, which replaces the rear wheel much like a Wahoo Kickr or Tacx Neo smart trainer. The Cyclus 2 is a highly accurate piece of kit calibrated to laboratory standards for use in scientific research.
We recorded power output on both the laboratory trainer and the rider’s power meter for three and 12 minutes, before comparing both the differences in average power and second-by-second power output. The results showed that while the difference in average power was relatively consistent - two different power meters consistently read 14w and 19w below the laboratory trainer - there were massive differences in the second-to-second power, ranging from +60w to -48w.
This is really important when it comes to looking for max power, so the point to take away from this is that, in my experience as a coach, a power meter only becomes useful for periods of five seconds or more. Any shorter and there is too much scope for bad data to creep in and affect the power scores.
What have we learnt from this?
The quick conclusion to draw is that your power meter might not be telling you exactly what you think it is. Indeed, the 300w you are seeing on your computer might not be quite the same as the 300w your ride partner is apparently producing. On top of this, with some power meters it appears that 300w today might not even be the same as 300w tomorrow.
In short, some power meters are more accurate and precise than others. When choosing a power meter, I would urge you to look at the data presented in this article about the accuracy and precision of various power meters, as well as considering other factors that may affect your purchasing decision, like budget, bike compatibility and aesthetics. At the end of the day your power meter is a training aid, but to be useful as one it needs to provide you with, at the very least, precise data.
One final thing to mention is that all power meters in this study were calibrated via a zero offset before each test (where possible), so the levels of accuracy and precision we see here represent the best performance you are going to get from a power meter. It is likely that once you add in real-world conditions, the levels of error will increase.
Therefore, to maximise the accuracy and precision of any power meter, make sure you look after it. That means calibrating it where possible and making sure it is well charged before each ride. Another tip is to let it acclimatise to the outside temperature before starting your training. Finally, a great pro tip, and something I get the riders I coach to do, is to recalibrate your power meter before the first effort of the day. That means you know you are getting the most accurate data you can, when it really matters.