
The quest for developing a reliable rating system - lessons learned from monitoring our suppliers' quality

Trust plays a critical role when purchasing products online, but trusting the brand is not enough when booking a vacation or even ordering a massage at home, as the supplier in these cases is not the brand itself.

93% of consumers use online reviews to support their purchasing decisions; they seek reassurance, social proof that the product or service they plan to order is indeed the quality they expect it to be.

Rating and review systems are expected to be accurate because they are based on high volumes. However, as we learned over the years of operating our marketplace for beauty and lifestyle services, this is not always the case.

The problem

When we started Missbeez, we knew we were dealing with a sensitive business.


Haircuts, makeup, massages, even nail treatments – these are all personal treatments, and therefore our customers were concerned about who’s coming over, the quality of their work, and their experience.

The more expensive or personal a product is – the more sensitive users become to the quality question.

To address these concerns, we designed a great business card for every service provider that included a personal photo, a few portfolio photos, years of experience, areas of expertise, ratings, reviews, and more.


The app provided all of this information and in addition, encouraged customers to rate their experience and leave a review at the end of the treatment.


Take 01: The ratings are too high

Over time we started seeing anomalies between the ratings and the actual quality of our service providers.


We knew our service providers were very professional because we had a very strict onboarding process, but the scores were just overwhelmingly high (like 4.99 out of 5) and we barely had negative reviews or low ratings.


We did, however, receive enough complaints to conclude that our rating system didn’t highlight the underperforming service providers. In a few extreme cases, we even saw customers who rated the service 5 stars and then blacklisted the service provider…

There was no correlation between underperforming service providers and their ratings or reviews.

Here’s what we found:

  • 41% of our customers rated their treatments (For Uber it’s ±70% of users, in Fiverr it’s ±65%, based on forums).
  • Only 15% wrote a review (Amazon sellers mention 10% as their benchmark). 
  • 97% of the ratings were 5 stars (the highest possible).
  • Only 2% of the ratings were negative (below 3 stars).
  • That means only about 1% of all users reported bad service. 

Now, you may look at those figures and think: “what a perfect business they’re in!” but going back to the sensitivity level of our services, combined with the number of complaints we got through our Intercom live chat, our retention rates, and our other KPIs – we just knew those numbers couldn’t represent the real quality of the supply side.

Our rating system was not reliable.


Take 02: Understanding why this is happening

Through customer interviews and frequent product iterations we came up with 3 main reasons:

1. The majority of customers are silent. These users mean the world to the business, but they never communicate with it: they don’t leave reviews, they don’t provide feedback; instead, they just act. If they are unhappy with the product they’ll simply stop using it without saying anything. Poof… gone.

2. Complaining about someone publicly is usually an unpleasant act. In our case, it’s even harder: many of our service providers belong to a relatively low socioeconomic status: they travel from one customer to another carrying their equipment around, working hard to grow their business. Now, show me a customer who feels comfortable leaving a bad review for a beautician who just spent 2 hours working on her nails, just because the pedicure wasn’t good enough.

3. It’s a question of timing. In our case, many of the problems were discovered a few days after the treatment took place (e.g. the manicure chipped or the hair got all curly again). By that time, it was much easier for users to launch the app and contact us through the Intercom chat than to navigate to their order history, enter the latest one, click the review button and leave a bad review. The chat option was not only more accessible, but it usually generated an immediate response from our customer success team.

We realized that due to the nature of our business (and customers), positive feedback flowed through the rating system, while negative feedback arrived as complaints sent directly to our customer success team, indirectly skewing our rating system.



Take 03: Tackling this problem

Users seek transparency and our users were no different. If we wanted our rating system to be trusted by our customers, we had to make it more reliable.


We wanted to collect more reviews (and ratings) and more importantly: encourage customers to criticize us by writing negative reviews.

We added 2 small features/modifications to the product:

1. A prompt to rate the latest treatment no matter when users returned to the app: Since users didn’t always pay attention to the rating option, we made sure they’d see it as an annoying pop-up whenever they returned to the app – even if it happened 10 days after the treatment took place.


By doing so we wanted to increase the chances that users would rate their service, and if enough time had passed – then maybe they would feel more comfortable criticizing it.


This change generated some minor improvements:

  • 53% of customers rated their treatments (up from 41%).
  • 92% of the ratings were 5 stars (down from 97%).
  • 19% of customers wrote a review (up from 15%). 
  • Still, only 2% of reviews were negative. 

2. A single-click satisfaction survey: We added a single-click survey that was sent to customers 24 hours after the treatment ended. The survey asked something like: “how do you feel about your treatment?” and had 3 answers designed as smileys: a happy face, an indifferent face, and an upset one (that were later replaced by a different design).


The idea was to shift the focus away from the service providers and focus on the overall experience, hoping that the users will feel more comfortable to criticize “us” (the big bad company) vs. shaming the poor service provider.


Whenever a bad satisfaction experience was reported – a support ticket was created automatically so that our customer success team could contact the customer, understand what went wrong, and handle the case.
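The auto-ticketing rule itself is simple; here is a minimal Python sketch of it (the smiley names, scores, and the `open_ticket` callback are illustrative assumptions, not our actual code):

```python
# Map each smiley answer to a score; an upset face opens a support ticket.
# All names here are illustrative, not the real Missbeez implementation.

SMILEY_SCORES = {"happy": 1, "indifferent": 0, "upset": -1}

def handle_survey_response(smiley: str, open_ticket) -> bool:
    """Record a single-click response; return True if a ticket was opened."""
    score = SMILEY_SCORES[smiley]
    if score < 0:  # only a bad experience triggers immediate follow-up
        open_ticket()
        return True
    return False

tickets = []
handle_survey_response("upset", lambda: tickets.append("follow-up"))
handle_survey_response("happy", lambda: tickets.append("follow-up"))
```

The point of routing only the negative answers into tickets is that the customer success team sees every unhappy customer without having to scan the positive majority.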


We tried a few designs and channels for this single-click survey (email, SMS, push, different text, different images) but as we learned from a similar use case by Fiverr – the timing had a big impact as well: sending the survey a few hours after the transaction vs. a few days can dramatically increase conversion rates. In our case, we started with a small panel embedded in the payment confirmation email, and added an SMS that was sent 24 hours after the treatment ended to all the users who missed the first chance to vote.

The single-click satisfaction survey generated the following results:

  • 16% of customers rated their experience through the single-click survey.
  • 80% of them selected the highest score.
  • Only 5% of them selected the lowest score (i.e. “Bad”)
  • This is where it gets interesting: out of that 5% who were extremely unhappy with their experience – 96% didn’t rate the service providers at all. 

So the changes we’ve made didn’t make a dramatic impact, but they did prove that users who were extremely unsatisfied with the service expressed their “unhappiness” through the single-click survey instead of placing a bad review or a low rating for the service providers.



Take 04: Switching to quantitative data

At this stage, we still had 3 serious problems:

  1. Our rating system was still over-positive and perceived to be unreliable.
  2. The satisfaction survey was only partially effective in tracking unhappy customers due to low engagement.
  3. Once again, there wasn’t always a correlation between customer complaints, ratings and survey satisfaction rates. 

We started working on 2 smart indexes designed to measure the quality of each treatment without relying on human feedback.

1. Customer satisfaction score 


How do you calculate the satisfaction level of a customer without asking her?


You analyze the behavior of this customer after each treatment, compare it with previous behavioral patterns, and estimate the satisfaction level according to the delta.


For us, there were 8 types of actions customers could perform that could teach us about their satisfaction level (e.g. marking a service provider as a favorite, pre-booking another treatment, leaving a nice tip, and more). We monitored those actions and created behavioral patterns in order to compare new instances with existing ones and identify exceptions.

The result? Given a set of actions performed by a given customer, we could estimate (with some heuristics of course) how satisfied this customer was with a given treatment.


Next, we took the score gaps (the exceptions, i.e. the deltas) and started aggregating them for each service provider.


So if, for example, certain service providers frequently generated lower satisfaction rates – their average satisfaction delta was negative, while service providers who drove high satisfaction rates – started showing positive deltas.
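To make the mechanics concrete, here is a toy Python sketch of the idea. The action names and weights are made-up assumptions; the real model relied on more actions and heuristics:

```python
# Toy sketch of the behavioral satisfaction score. Action names and weights
# are assumptions for illustration, not production values.

ACTION_WEIGHTS = {"favorited": 3, "rebooked": 4, "tipped": 2, "shared": 1}

def treatment_score(actions):
    """Score one treatment from the post-treatment actions observed."""
    return sum(ACTION_WEIGHTS.get(a, 0) for a in actions)

def satisfaction_delta(actions, customer_history):
    """Compare this treatment against the customer's own baseline behavior.

    `customer_history` is a non-empty list of action lists from past treatments.
    """
    baseline = sum(treatment_score(a) for a in customer_history) / len(customer_history)
    return treatment_score(actions) - baseline

def provider_avg_delta(deltas):
    """Aggregate the per-treatment deltas of one service provider."""
    return sum(deltas) / len(deltas)
```

A customer who usually just tips but suddenly rebooks and favorites produces a positive delta; aggregating those deltas per provider surfaces the consistently strong (or weak) performers.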

2. A personal drop rate (churn) index per each service provider 


I’ve written about this here: minimizing leakage in a marketplace for offline services.


We originally invented the personal drop rate in order to track offline leakage, but as we discovered later on, this magic number was also very useful as a quality indicator.


In a nutshell: this index calculates the churn rate every service provider drives by measuring the drop rate of repeat customers who stopped using the product after being served by this specific service provider.


If the average drop rate in the system is 20% and a certain service provider has a 50% drop rate, it means that this individual causes serious churn – which is very bad.


A high drop rate could point to one of the following:

  1. This individual provides a bad service quality (= bad).
  2. This individual is taking customers offline (= bad bad bad).

Either way – this individual is damaging our business and should either improve or be removed from the system.
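The index boils down to simple arithmetic. A hedged Python sketch (the field names and the outlier threshold are hypothetical):

```python
# Sketch of the personal drop-rate index. Names and the outlier factor are
# assumptions for illustration.

def drop_rate(served_customers, churned_customers):
    """Share of a provider's customers who later stopped using the product."""
    served = set(served_customers)
    if not served:
        return 0.0
    churned = set(churned_customers) & served
    return len(churned) / len(served)

def is_outlier(provider_rate, system_avg, factor=2.0):
    """Flag providers whose drop rate is well above the marketplace average."""
    return provider_rate > system_avg * factor
```

Comparing each provider against the system-wide average (rather than an absolute cutoff) keeps the index meaningful as overall churn fluctuates.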

Boom! Now you’re talking!


With these 2 indexes in place, we could now see a correlation between our satisfaction surveys, the satisfaction score, and the drop rate. And the beauty of those indexes was that they were always “on” and didn’t rely on customer engagement or conversion rates. Muhahaha!

At this point, we felt we had enough quality indicators to monitor our service providers community. 

But all of this beauty still didn’t solve the original rating reliability problem…

Take 05: Sharing some sensitive numbers with our customers

So the indexes were great for internal quality monitoring, but we still needed a way to increase the reliability of our providers’ business cards and fix our over-positive rating system.


Instead of trying to lower the ratings, we decided to keep them as is and add additional information based on some internal, bias-free KPIs.

The goal was to promote the top performing service providers based on authentic data, without urging customers to criticize their service providers.

In a bold move (at least this is how we felt about it), we added the following KPIs:

1. Punctuality: we knew late arrivals frustrated our customers, so why not share this information with them in advance?


By tightening our service procedures and logging late arrivals – we were able to maintain a punctuality score for every service provider. We exposed this number to our customers and by doing so we achieved 2 things: 1 – we encouraged the service providers to try harder. 2 – we were transparent with our “time-sensitive” customers who were happy to know this information before selecting a service provider.

2. Popularity (favorites index): so customers didn’t feel comfortable placing bad reviews, but we had something more valuable than their words: their actions. We used our ‘add to favorites’ functionality to measure the number of customers who favorited each service provider and shared this number with our customers.


You could find service providers with an average score of 4.95 stars but a favorites index of only 10%, while other service providers with 4.83 stars had a 70% favorites rate. The favorites index felt more authentic since it was based on customer behavior instead of votes or words.

3. Rank: remember the drop rate? This brilliant index was not something we could share with our customers, but we could definitely promote some service providers who had the lowest drop rate (and highest retention) by giving them a special rank.


You see, service providers with low drop rates are contributing to the retention of the marketplace – they are good for the business so why not promote them by giving them a special badge? We created 4 rank levels and shared those badges as part of the business card.
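For illustration, here is how the three business-card KPIs above could be computed. The function names, the 10-minute grace period, and the rank cutoffs are all assumptions, not our actual values:

```python
# Illustrative sketches of the three business-card KPIs: punctuality,
# favorites index, and rank badge. Thresholds and names are assumptions.

def punctuality_score(late_minutes, grace=10):
    """Percent of appointments where the provider arrived within the grace period.

    `late_minutes` holds lateness per appointment in minutes (negative = early).
    """
    if not late_minutes:
        return 100.0
    on_time = sum(1 for m in late_minutes if m <= grace)
    return round(100.0 * on_time / len(late_minutes), 1)

def favorites_index(customers_served, customers_who_favorited):
    """Share of a provider's customers who added them to favorites."""
    served = set(customers_served)
    if not served:
        return 0.0
    return round(len(set(customers_who_favorited) & served) / len(served), 2)

# Lower churn earns a higher rank; badge names and cutoffs are made up.
RANK_CUTOFFS = [(0.10, "platinum"), (0.20, "gold"), (0.35, "silver")]

def rank_badge(drop_rate):
    """Map a provider's drop rate to one of 4 rank levels."""
    for cutoff, badge in RANK_CUTOFFS:
        if drop_rate <= cutoff:
            return badge
    return "bronze"
```

All three share the same property as the indexes above: they are computed from logged behavior, so they stay "on" regardless of whether customers engage with surveys or reviews.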


So there you have it.


A very long journey, as you now realize.

I believe we are just scratching the surface with this one, though; there’s still a lot to do in order to improve the reliability and transparency of our marketplace.

  • We didn’t solve the over-positive rating problem; I believe many products suffer from a similar offset, but for us it felt extreme. 
  • We found some creative ways to get some more feedback from our customers (through our single-click surveys).
  • We developed some sophisticated quality indexes to automatically measure satisfaction rates and drop rates of individuals on both sides of the marketplace. 
  • We exposed some bias-free numbers. This helped us establish trust with our customers and become more transparent about the pros and cons of each service provider. 

Finally, we were able to cross-reference the above list and monitor our supply quality from different angles, using various data points: quantitative data, qualitative data, customer feedback, behavioral patterns – each one of them might fall short in some cases, but when used together they are extremely powerful.

Following these enhancements, we developed a quality dashboard that showed the above list in a few graphs so our operations team could keep track of quality trends, both in general and on an individual level. It’s a very powerful tool, and I truly recommend that each and every product manager dealing with similar issues think about how to tackle the quality challenges internally and externally.

