How to Measure Learnability of a User Interface

by Alita Joyce on October 20, 2019

Summary: To measure learnability, determine your metric, gather your data, and plot the averages on a line graph. Analyze the learning curve by looking at its slope and its plateau.


What Is Learnability?

Learnability is one of the five quality components of usability (the others being efficiency, memorability, errors, and satisfaction). Testing learnability is especially valuable for complex applications and systems that users access frequently, though knowing how quickly users can acclimate to your interface is valuable even for objectively simple systems.

Learnability considers how easy it is for users to accomplish a task the first time they encounter the interface and how many repetitions it takes for them to become efficient at that task.

In a learnability study, we want to produce a learning curve, which reveals longitudinal changes of a quantified aspect of human behavior. With the data from the learning curve, we can identify how long it takes users to reach saturation — a plateau in our charted data which tells us that users have learned the interface as much as possible.

For example, let’s say we are redesigning an enterprise file-backup application intended to be run by IT administrators on a regular basis. We assume users will use the application frequently enough that they will progress up that learning curve. For such an application, it is crucial that users be able to complete their work as fast as possible. In this scenario, a learnability study will determine how fast administrators learn to run a backup efficiently. We recruit several representative users and invite them to the lab. Then we ask them to perform the backup and measure how long they take to do so for the first time. Next, we ask them to come back into the lab and do the task for a second time — again, measuring their task-completion time. This process repeats several more times. The result of our study will be a learning curve that plots task time over a set number of trials.

Learning curve with average time on task decreasing across six trials and a saturation point reached at the fourth trial.
This learning curve shows the hypothetical completion time for a backup as a function of the number of task repetitions (or trials). Notice that the time for the first repetition is longest, and then the completion time decreases — by trial 4, it levels off, reaching the saturation plateau. Although details such as how many repetitions are needed to reach saturation will vary from case to case, this learning curve is representative of all human learning.

Learnability vs Efficiency

There are 3 different aspects of learnability, each of which is important to different kinds of users:

  • First-use learnability: How easy is it to use the design the first time you try? This aspect of learnability is of interest to those users who will only perform the task once. These users won’t progress up the learning curve, so the shape of the rest of the curve doesn’t matter to them.
  • Steepness of the learning curve: How quickly do people get better with repeated use of the design? This facet of learnability is particularly important for users who will use the design multiple times, even if they won’t use it intensively. If people feel that they are progressing and getting better and better at using your system, they’ll be motivated to stick with it. (Conversely, if people feel that their performance is hardly improving, no matter how hard they try, they’ll start looking for a better solution.)
  • Efficiency of the ultimate plateau: How high is the productivity that users can reach with this interface, once they have fully learned how to use it? This aspect is particularly important for people with a frequent and long-lasting need to use the system — for example, when it’s the main tool for important everyday tasks.

Ideally, of course, your system should fare well on all 3 aspects. But, in the real world, design tradeoffs are often necessary, and you should shape the learning curve to cater mostly to those users who have the highest business value.

The relative importance of these dimensions also depends on where users are in their lifecycle with the system. New users want to be able to learn the system quickly and get to the point of optimal (plateau) performance as soon as possible, whereas expert users want the plateau to be as low (i.e., the optimal task time as short) as possible.

Sometimes these different attributes of learnability may pull the design in different directions. For instance, a learnable system is not always efficient. Coming back to our file-backup example, let’s assume that the backup was performed through a step-by-step wizard workflow with a lot of instructions and explanations. This system may be highly learnable: users may be able to perform the task about as fast as possible even the first time they complete it. But the curve would be pretty much flat: they would not be able to perform the task much faster the second time, as they would need to go through the same screens and answer the same questions. As users become well-versed in the interface, this design will feel like hand-holding and will be inefficient for repeated use. (It is for this reason that we recommend implementing accelerators, or process shortcuts, for expert users.) Designers must carefully balance learnability and efficiency.

Learning curve with average time on task consistent across all six trials.
This learning curve shows the hypothetical completion time for a backup application with a wizard flow as a function of the number of task repetitions (or trials). Notice that, even as the number of trials increases, the task time stays steady at around 16 minutes. This system is learnable but not efficient.

Why Measure Learnability?

High learnability contributes to usability. It results in quick system onboarding, which translates to lower training costs. Additionally, good learnability can result in high satisfaction because users will feel confident in their abilities.

If your system and corresponding tasks are complex and accessed frequently, your product may be a good candidate for a learnability study. Learnability studies consume significant time and budget, so don’t pitch them haphazardly to stakeholders. It wouldn’t make sense to measure learnability for tasks which users complete infrequently or only once (for example, signing up for a service or filing annual taxes), because users will most likely behave like new users each time they encounter the task. In these cases, a standard usability test would be better suited and more cost-effective than a learnability study.

Running a Learnability Study

In learnability studies, we’re focused on gathering metrics, which is why we turn to quantitative research methods. This sort of study requires focused tasks and controlled experiments, so quantitative usability testing is the best fit for studying system learnability.

Participants

In running this type of study, we’re trying to determine how easily people learn our interfaces. Therefore, it is important to gather participants with little to no experience using the system that they’ll be testing.

One consideration when it comes to testing learnability is prior experience with similar systems. Prior experience may help users (for example, because they may already be familiar with domain conventions) or may slow them down (for example, because they may suffer from change aversion). However, data from experienced users is still valuable, especially when launching a new product with the goal of stealing customers away from existing products. When applicable, recruit both participants with no similar-system experience and participants with some similar-system experience, and plan to compare the corresponding data from the two groups.

As for any quantitative study, we recommend that you recruit a fairly large number of participants (usually at least 30–40). The exact number will depend on the complexity of your task, with highly complex tasks requiring more participants to account for the inherently higher data variability, and simpler tasks requiring fewer participants.

Step 1: Determine the Metric

Time on task is the most commonly collected metric for learnability studies. The reason is the power law of learning, which says that the time it takes to complete a task decreases with the number of repetitions of that task. The rest of this article will assume you’re collecting time on task as the primary metric.
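Concretely, the power law of practice models the time for the nth repetition as roughly T(n) = T(1) × n^(−a), where a is a learning exponent that varies by task and person. As a minimal sketch (the exponent and the 30-minute first trial below are made-up illustration values, not benchmarks), here is how predicted task times fall off across repetitions:

```typescript
// Sketch of the power law of practice: T(n) = T1 * n^(-a).
// Both the exponent `a` and the 30-minute first trial are illustrative values.
function predictedTaskTime(firstTrialMinutes: number, trial: number, a = 0.4): number {
  return firstTrialMinutes * Math.pow(trial, -a);
}

for (let trial = 1; trial <= 6; trial++) {
  console.log(`Trial ${trial}: ~${predictedTaskTime(30, trial).toFixed(1)} min`);
}
```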

Depending on your system, time on task might not be relevant, and you will need a different metric. In that situation, consider collecting the number of errors users make for a given task.

Step 2: Determine the Number of Trials

The next step consists of deciding how often to collect these metrics — each instance of data collection is known as a trial.

Remember, we’re trying to plot this metric over time, so we need to have the same participants complete the same task multiple times. We recommend you repeat the trials until a plateau is reached. A flattened curve indicates that our participants have learned the system (at least for this task) as much as possible.

When considering trials, there are two questions you may be asking: how many trials should I run? And how far apart should the trials be? The answer to both of these questions depends on your circumstances.

To predict the number of trials needed for a user to reach saturation, consider your system complexity. As a starting point, consider 5–10 trials, but when in doubt, plan for more trials than you think you need, for two reasons: (1) you want to be sure that you’ve reached stable performance and (2) once you’ve reached a point of stable performance, it’s generally easier to cancel usability sessions than to schedule more.

If you’re wondering how much time is necessary between trials, consider how often you expect your customers to use the product and match that interval as closely as possible. For a task that users perform daily or a few times a week, you can have trials on consecutive days. But for tasks done once a month, you may want to leave 4 weeks between trials.

Step 3: Gather and Plot the Data

Remember to recruit the same participants for each trial and have them complete the same task(s) in each trial. (This is different from the normal case, where you want different test users to study different iterations of a design.) You may want to test multiple tasks in your learnability study. If this is the case, be sure to randomize the task order to avoid biasing your results. In research studies, users take what they know from one task and apply it to future tasks; task randomization helps to mitigate this effect.

For each task, calculate the metric averages for each trial and plot them on a line graph with labeled axes. By plotting the data for each trial, you will obtain the learning curve for that task.
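As a minimal sketch of this step (the numbers are hypothetical task times in minutes, one row per participant and one column per trial), the per-trial averages that form the learning curve can be computed like this:

```typescript
// Hypothetical raw data: one row per participant, one column per trial (minutes).
const taskTimes: number[][] = [
  [32, 21, 15, 11, 10, 10],
  [28, 19, 14, 12, 11, 10],
  [35, 24, 16, 12, 11, 11],
];

// Average time on task for each trial; these points are what you plot as the learning curve.
const trialCount = taskTimes[0].length;
const averages = Array.from({ length: trialCount }, (_, trial) => {
  const total = taskTimes.reduce((sum, participant) => sum + participant[trial], 0);
  return total / taskTimes.length;
});

averages.forEach((avg, i) => console.log(`Trial ${i + 1}: ${avg.toFixed(1)} min`));
```

In a real study you would plot these averages (ideally with confidence intervals) rather than just print them.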

Step 4: Analyze the Curve

As with any quantitative study, you will want to analyze the data for statistical significance. In other words, you will have to investigate whether the trial effect was indeed significant — namely, whether the drop that you see in your learning curve is real or is just the result of noise in the data. Usually, the statistical method involved will be fairly simple — a one-way repeated-measures ANOVA with trial as the factor.
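You would normally run this test in a statistics package or spreadsheet, but as a rough sketch of what the test computes, here is the F statistic for a one-way repeated-measures ANOVA with trial as the within-subjects factor (the data layout matches the earlier sketch: rows are participants, columns are trials). You would then compare F against an F distribution with (k − 1) and (n − 1)(k − 1) degrees of freedom to get the p-value.

```typescript
// Sketch: F statistic for a one-way repeated-measures ANOVA (trial as the within-subjects factor).
// Rows = participants, columns = trials. Compare F to an F distribution to get a p-value.
function repeatedMeasuresF(data: number[][]): { F: number; dfTrials: number; dfError: number } {
  const n = data.length;       // participants
  const k = data[0].length;    // trials
  const all = data.flat();
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const grandMean = mean(all);

  const subjectMeans = data.map(mean);
  const trialMeans = Array.from({ length: k }, (_, j) => mean(data.map(row => row[j])));

  const ssTotal = all.reduce((acc, x) => acc + (x - grandMean) ** 2, 0);
  const ssSubjects = k * subjectMeans.reduce((acc, m) => acc + (m - grandMean) ** 2, 0);
  const ssTrials = n * trialMeans.reduce((acc, m) => acc + (m - grandMean) ** 2, 0);
  const ssError = ssTotal - ssSubjects - ssTrials;

  const dfTrials = k - 1;
  const dfError = (n - 1) * (k - 1);
  const F = (ssTrials / dfTrials) / (ssError / dfError);
  return { F, dfTrials, dfError };
}

// Hypothetical data from the plotting sketch above.
console.log(repeatedMeasuresF([
  [32, 21, 15, 11, 10, 10],
  [28, 19, 14, 12, 11, 10],
  [35, 24, 16, 12, 11, 11],
]));
```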

Once you’ve done your analysis (and presumably found that the trial effect was significant), consider the big picture: What is the slope of your learning curve? Less-learnable interfaces have relatively small drops in the curve and take many trials to reach a point of saturation. In contrast, highly learnable systems have steep curves that drop quickly and reach the saturation point after fewer repetitions.

For example, in our original file-backup example, it took users 4 trials to reach the saturation plateau and become efficient. That may seem acceptable. On the other hand, if it took them 30 trials to reach that same point, the learnability would likely be too low.

Also, consider the final efficiency: is it acceptable that, once users have learned how to perform the task, it will take them 10 minutes? The answer may depend on what that number is for competitor products. If a competitive analysis isn’t viable, you can also compare the findings to costs and ROI. If an administrator spends 10 minutes a day completing a backup task in an optimal way and performs the task daily for a year, this amounts to 3,650 minutes, or approximately 61 hours. At a cost of $100 per hour, the company will spend roughly $6,000 a year completing backups. Whether that amount is acceptable or needs to be lowered (by improving the design) will depend on the specifics of each product.

Conclusion

The learnability of a product tells us how fast users reach optimal behavior with that product. It is important to measure learnability for UIs that get used relatively frequently. A learnability study involves repeated measurements of the same participants completing the same task. The result of a learnability study is a learning curve that will uncover how many repetitions are needed in order for users to complete the task efficiently.

Even if you don’t conduct a complete learnability research project to plot the full learning curve, thinking about these concepts will help you make the trade-off decisions to design products that target your most important customers.

For more on design tradeoffs, like learnability versus efficiency, check out our course, Design Tradeoffs and UX Decision Frameworks.

References

Tom Tullis and Bill Albert (2013). Measuring the User Experience: Collecting, Analyzing, and Presenting Usability Metrics. Morgan Kaufmann.

Allen Newell and Paul Rosenbloom (1980). Mechanisms of skill acquisition and the law of practice. Technical report, School of Computer Science, Carnegie Mellon University.



Measure, Monitor, Repeat: Web Performance Tools

At Viget, we’re dedicated to building sites that load fast and stay fast. This begins by following performance best practices on day one, but that’s only half the challenge — how do you identify and fix performance issues that arise as a site grows? My last article on the topic was a company-level overview: Measurement, best practices, and how to target high-value areas of your site for improvement.

This post is a deeper look at one of those topics: How to measure and monitor site performance. There are numerous techniques and tools to explore, but it’s not always clear which one is the best for your problem — I’ll go over a few options and help narrow down the choices. But first…


What are we measuring?

Performance measurement begins by tracking readily-available, universal values that matter on any site. The deeper into measurement you go, the more custom this tracking becomes:

  • How much data do users load on the first visit? (Easy, tracked in nearly any tool)
  • How long do users wait until key content is interactive?
  • How much faster is a user’s second visit?
  • How long do users wait for key API requests?
  • How quickly can users complete a key flow? (Hard, requires custom measurement)

You’re looking for a tool with a mix of characteristics: Good insights, strong customizability, options for deeper integrations down the road. The three big ones:

Start here: Lighthouse

Lighthouse is Google’s automated tool for evaluating web performance. It’s rock-solid, and provides both immediate value and opportunities for long-term improvements. If you’re just starting your measurement journey, start with Lighthouse. A few ways to use it:

1. On the web via PageSpeed Insights

A hosted version of Lighthouse that runs one test on mobile, one test on desktop. This method is great for testing marketing pages, and since it tests from a central location, you may get more consistent/realistic results than you would running a Lighthouse test locally. The biggest drawback is that you can’t use it for any pages that aren’t publicly accessible.

2. In the Chrome developer console

Still really accessible, but runs inside your console, so you can use it for any site accessible via Chrome. (Note that this will use your local network — work with the throttling feature to get consistent results when testing on different networks/devices).

3. On the command line

Using the node CLI, Lighthouse can be run anywhere and export data as a JSON file. The node CLI can run headless Chrome, enforce performance budgets, and even test on a USB-connected Android device. Learn more on GitHub.

Using Lighthouse this way opens a few doors: You can write scripts that test performance on an entire flow, incorporate Lighthouse checks into your CI, or even roll your own monitoring service (more on this later).
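As one sketch of that kind of scripting (based on the lighthouse and chrome-launcher npm packages; option names and return shapes can change between versions, so treat this as a starting point and check the project’s docs):

```typescript
// Sketch: run a headless Lighthouse audit and log the performance score.
// Verify the current lighthouse / chrome-launcher APIs on GitHub before relying on this.
import lighthouse from 'lighthouse';
import * as chromeLauncher from 'chrome-launcher';

async function audit(url: string) {
  const chrome = await chromeLauncher.launch({ chromeFlags: ['--headless'] });
  const result = await lighthouse(url, {
    port: chrome.port,
    onlyCategories: ['performance'],
    output: 'json',
  });
  await chrome.kill();

  const score = result?.lhr.categories.performance.score; // 0–1 scale
  console.log(`${url}: performance score ${score}`);
  return result;
}

audit('https://example.com').catch(console.error);
```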

Pros:

  • Fast, accessible, and lightweight.
  • Nicely summarizes data, and does a good job of prioritizing issues.
  • Goes beyond basic load time stats you’ll see elsewhere, exposing derived stats like Speed Index and Time to Interactive.
  • Educates developers about some of the complex aspects of web performance.

Cons:

  • Missing some high-powered features from other services: Built-in scripting, multi-page crawling, asset blocking.
  • Obscures some deeper information (like waterfall views) in favor of a user-friendly UI.
  • Requires choosing to either use your own local network (which may be inconsistent) or sacrificing some power to test from a central server.

More flexibility: Sitespeed.io

Sitespeed.io is an open source suite of performance measurement tools that run directly from the command line. While it’s designed for monitoring, it’s also a great tool for generating detailed performance reports. Sitespeed’s docs are A+; I suggest skimming them even if you end up picking another solution.

Try a quick demo with npm install -g sitespeed.io, then sitespeed.io https://amazon.com --html.showAllWaterfallSummary -n 1 --outputFolder results. Open the index.html file in ./results to check out the output.

Pros:

  • Fast, feature-rich, flexible.
  • Easy to configure features like screenshot recording and automated crawling.
  • HTML reports are great for sharing and reviewing performance data.
  • Built to work with modern tools like Slack, Grafana, and even other performance tools like Lighthouse and WebPageTest.
  • Solid, well-documented features like scripting.
  • Built with monitoring in mind, and when you get to that step, has a simple setup for hosting your own dashboard (example).

Cons:

  • With no web-hosted option, Sitespeed isn’t as simple to run a quick test with — using it requires some technical knowledge and setup.
  • Lighthouse and WebPageTest can test from a central, consistent network. Sitespeed can’t do this right out of the box, but has some great docs on mitigating this issue with throttling.

The gold standard: WebPageTest

If you’re looking for the gold standard in performance measurement, check out WebPageTest. It may not have the slickest UI, but it makes up for it:

Pros:

  • Easily runs multiple tests and averages results; can test “repeat views” to verify caching.
  • Provides a huge amount of data: Main thread breakdowns, request waterfalls, per-request data, and Last Painted Hero.
  • Automatically stores test runs at URLs for later review.
  • Huge number of locations to test from, ensuring consistent results between tests.
  • Can test multiple browsers, including IE11, Edge, and Firefox.
  • Server location, browser, asset blocking, and scripting are all configurable from the web app — no local installation required.
  • Provides the option to set up a private instance for more advanced monitoring.

Cons:

  • Typically slower than other web-based tools.
  • Test locations/devices require that you select the most realistic option for your test, and remember to use the same location on future tests.
  • Test output is much more technical, and possibly more confusing. You need to parse some of it yourself to identify meaningful metrics.
  • Much trickier to run tests against a page on a local network.
  • Big, long-running library with a lot of history and quirks to learn.

WebPageTest is my go-to “what’s up with this page?” tool — it’s a great mix of configurability, power, and convenience. Being able to script, test across browsers, and block assets is invaluable for hunting down tricky performance problems.


Monitoring

Now that you’ve measured your performance with a few tests, it’s time to monitor that performance. There are two types of data to think about collecting:

Lab data

So far, I’ve been talking about “lab data” — data you collect in a controlled environment, the same way, every time. This means regularly running measurements on your site and recording the output. For granular, day-by-day data, you’ll want to run measurements on some kind of hosted service, and record it somewhere you can visualize it.

All three tools mentioned above can be configured for monitoring (sitespeed.io especially), but come with the overhead of more work/more services/more costs to worry about. Some hosted services, like Calibre, SpeedCurve, and Pingdom, can provide monitoring with less setup — but the data they collect may not fit your exact needs.

If you’re just starting out, automation can be overkill — if weekly data collection is good enough for you, you can write a basic local script that dumps data into a spreadsheet or visualizer. This will give you insight into performance over time with much lower setup overhead.
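For example, a small Node script (the file name, columns, and the audit helper from the Lighthouse sketch above are all illustrative) could append one row per run to a CSV you chart in any spreadsheet:

```typescript
// Sketch: append one row of lab data per measurement run to a CSV for later charting.
// File name and columns are illustrative; pair this with a runner like the Lighthouse sketch above.
import { appendFileSync, existsSync, writeFileSync } from 'node:fs';

const CSV_PATH = 'perf-log.csv';

function logRun(url: string, performanceScore: number, ttiMs: number): void {
  if (!existsSync(CSV_PATH)) {
    writeFileSync(CSV_PATH, 'date,url,performance_score,tti_ms\n');
  }
  appendFileSync(CSV_PATH, `${new Date().toISOString()},${url},${performanceScore},${ttiMs}\n`);
}

// Hypothetical values from a single run:
logRun('https://example.com', 0.92, 3400);
```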

Field data

AKA “real user monitoring” (or RUM), field data comes from the actual users on your website (Google has a good doc on the distinction). This data is a double-edged sword — it can expose major holes in your performance, but can also create noise and distraction for a team.

Example: 

If a roomful of conference attendees hears about your product and opens the site on the slow wifi, your data may show a scary-looking performance spike, even though a) your site is no slower than usual, and b) the exposure you’re getting is a good thing, not a crisis. Remember to view field data in aggregate, not little slices.

For typical websites, baseline RUM may be already enabled in your tools — GA automatically collects this data on the Behavior > Site Speed screen, and New Relic records it in the Browser section (Note: For GA, you may want to increase the 1% sample rate). You can also look for field data in the Chrome UX Report, Google’s public dataset of performance timing across Chrome users.

If your site has more SPA-like features, auto-collected field data won’t cut it — users load data and perceive speed differently in an SPA, so initial load time can be much less important to your business. To tune your monitoring, you’ll need to track user timing on key interactions.

Example:

If you have a key graph on your dashboard, you might want to know how long it takes most users to actually see the data. Use performance.mark to mark the first dashboard view, then performance.measure when the chart finishes rendering. Once you have timing details, save them somewhere (example docs from GA).
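A rough sketch of that instrumentation (the mark and measure names, and the analytics endpoint, are made up for the example; performance.mark, performance.measure, and navigator.sendBeacon are standard browser APIs):

```typescript
// Sketch: measure how long users wait between seeing the dashboard and seeing the chart's data.
// Mark/measure names and the /analytics/timing endpoint are illustrative.

// 1. When the dashboard view first renders:
performance.mark('dashboard-visible');

// 2. When the chart finishes rendering its data:
performance.mark('chart-rendered');
performance.measure('dashboard-to-chart', 'dashboard-visible', 'chart-rendered');

// 3. Read the measurement and send it somewhere you can aggregate it.
const [entry] = performance.getEntriesByName('dashboard-to-chart');
navigator.sendBeacon(
  '/analytics/timing',
  JSON.stringify({ name: entry.name, duration: entry.duration })
);
```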

A measurement like the one in my example may seem silly — can’t you just monitor the API response time for the graph? — but exposes other issues that may be affecting performance. Maybe the API is fast, but other work on the screen causes a 2-second delay before the user sees the data. Maybe the graph library is simply too slow for your needs, requiring a replacement. These timings can be difficult to get right, and costly to debug after a release, so start small by picking key interactions to instrument.

Go forth and measure!

Now that you have the tools, you’re ready to start measuring performance in development, monitoring it regularly, and tracking real user metrics for further insight. For a higher-level look at retooling performance for your web app, check out our post on the process.

If you’re looking for a partner to help speed up your site, let us know! Viget’s been building fast websites for almost 20 years, and we’d love to answer your performance questions.