March 31, 2022

Precision vs Recall Trade-off

Explaining using a real-world scenario

One of the key questions data scientists should ask end-users is: how much importance do they give to the accuracy of the prediction? For instance, when I work from home, I am not too worried about the afternoon snowfall prediction, whereas when I go to work, it is very important to me.

A few years back, when I installed my first doorbell camera, I was very excited to see motion alerts on my phone. Gradually, the alerts started to annoy me.

All I was interested in was alerts about human movement near my house. But I got alerts even for leaves swaying in the wind. So I explored the camera's motion settings and lowered the sensitivity. Though that stopped a lot of unnecessary alerts, it also blocked many real alerts, especially after dusk or at other times with low visibility.

Then I decided to change the setting twice a day, once in the morning and once in the evening. The following images show how my doorbell camera responded to movements during the day and at night.

Image by author.

Image by author.

That gave me the level of alerts I wanted, but changing the settings twice a day became a hassle. So finally, I settled on one permanent setting in the middle, which was neither too good nor too bad. I now get more alerts than I want during the day and miss some at night. The following image shows how my doorbell camera responded to the current setting.

Image by author.

What I really wanted was a good level of alert accuracy in the morning: the doorbell should alert me only to human movements, such as delivery drivers or the postman arriving. At night, I was willing to tolerate lower alert accuracy to avoid missing alerts.

Analyzing the alert counts

To reach this middle setting, I had actually quantified the accuracy of the settings in my mind without even realizing it. What I was really doing was adjusting the trade-off between Precision and Recall.

Precision measures, for a given setting, what share of the alerts I received were for real human movements. Every non-human movement that was falsely alerted as a human movement (a false positive) lowers the precision.

Mathematically:

Precision = True Positive count / (True positive count + False positive count)

Now take a look at the alerts from my old morning setting.

Image by author.

Based on the above formula for precision:

Precision = True Positive count / (True positive count + False positive count)

               = 2 / (2 + 0)

               = 2 / 2

               = 1.0 or 100%!!!!

This suggested that the setting was highly accurate, but in reality the doorbell missed 3 of the 5 human movements.

That is where Recall comes into the picture. It measures the true positive rate, which is a way to determine whether all 5 human movements were alerted. Every missed human movement (a false negative) lowers the recall.

Mathematically:

Recall = True Positive count / (True positive count + False negative count)

           = 2 / (2 + 3)

           = 2 / 5 

           = 0.4 or 40%     
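The two formulas can be sketched in a few lines of Python. This is only a minimal illustration: the function names `precision` and `recall` are my own, and returning 0.0 when there is nothing to divide by is a convention I picked for the sketch, not something the formulas dictate.

```python
def precision(tp: int, fp: int) -> float:
    """Share of alerts that were real human movements."""
    alerts = tp + fp
    # With no alerts at all, precision is undefined; 0.0 is a convention chosen here.
    return tp / alerts if alerts else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of real human movements that triggered an alert."""
    humans = tp + fn
    # With no human movements at all, recall is undefined; 0.0 is a convention chosen here.
    return tp / humans if humans else 0.0


# Counts from the old morning setting worked out above.
morning_tp, morning_fp, morning_fn = 2, 0, 3

print(f"Precision: {precision(morning_tp, morning_fp):.0%}")  # Precision: 100%
print(f"Recall: {recall(morning_tp, morning_fn):.0%}")        # Recall: 40%
```

The same two functions work for any other setting once the true positive, false positive and false negative counts are known.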

Now, let's look at the evening setting.

Image by author.

Precision = True Positive count / (True positive count + False positive count)

               = 5 / (5 + 2)

               = 5 / 7

               = 0.71 or 71%

Recall = True Positive count / (True positive count + False negative count)

           = 5 / (5 + 0)

           = 5 / 5 

           = 1.0 or 100% !!!

Here, since all human-movements were alerted, Recall became 100%. But the Precision calculation was penalized for alerting non-human movements as human.

Finally, let's look at the current setting.

Image by author.

Precision = True Positive count / (True positive count + False positive count)

               = 4 / (4 + 1)

               = 4 / 5

               = 0.8 or 80%

Recall = True Positive count / (True positive count + False negative count)

           = 4 / (4 + 1)

           = 4 / 5 

           = 0.8 or 80% 

I had no idea that Precision and Recall would balance out mathematically when I applied the final setting! But it happened.

Analyzing the alert counts

A quick way to look at the counts is to use a confusion matrix. Here, Precision and Recall are shown alongside the confusion matrix. As you will notice across the three settings, when Precision goes up, Recall goes down, and vice versa. The value you aim for depends on the problem you are trying to solve, as illustrated by the morning, evening and current settings.


Image by author.
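To make the trade-off concrete, here is a rough Python sketch that treats the sensitivity setting as a score threshold. Everything in it is an assumption for illustration: the motion scores are made up (the camera does not expose such numbers to me), the helper `confusion_counts` is hypothetical, and the true negative counts are not taken from my alerts. The scores are chosen so that the three thresholds reproduce the true positive, false positive and false negative counts used in the calculations above.

```python
# Hypothetical motion scores (0 to 1), made up for illustration; not data from the camera.
# Each event is (score, is_human).
events = [
    (0.9, True), (0.8, True), (0.6, True), (0.5, True), (0.3, True),
    (0.55, False), (0.25, False),
]


def confusion_counts(events, threshold):
    """Count TP, FP, FN, TN when an alert fires for every score >= threshold."""
    tp = sum(1 for score, is_human in events if score >= threshold and is_human)
    fp = sum(1 for score, is_human in events if score >= threshold and not is_human)
    fn = sum(1 for score, is_human in events if score < threshold and is_human)
    tn = sum(1 for score, is_human in events if score < threshold and not is_human)
    return tp, fp, fn, tn


# A high threshold stands in for low sensitivity, a low threshold for high sensitivity.
for threshold in (0.7, 0.4, 0.2):
    tp, fp, fn, tn = confusion_counts(events, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    print(f"threshold={threshold}: TP={tp} FP={fp} FN={fn} TN={tn} "
          f"precision={precision:.2f} recall={recall:.2f}")
```

With these made-up scores, the highest threshold gives perfect precision but low recall, the lowest gives perfect recall but lower precision, and the middle one lands in between on both.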

F1 Score

In most modern cars, the dashboard shows a miles per gallon (mpg) metric. You would see the mpg increase up to a certain speed and then go down. Say the mpg decreases above 55 miles per hour. The downside is that driving at 55 mph increases the time to your destination when you are allowed to go up to 75 mph on a highway.

I really wished there was a single metric on the dashboard that could give me the optimum speed / mpg combo. For example, 100 would mean I am at an optimum speed and mpg, whereas 35 would mean I am making a poor choice.

Luckily, in the case of accuracy calculation, there is a metric called the F1 score which combines both Precision and Recall. The F1 score is the harmonic mean of Precision and Recall. In simpler terms, if either value is low, the F1 score is low. It gives more weight to low values than a regular mean, which gives equal weight to both low and high values.

The equation for F1 is:

F1 score  =    2 / (1 / Precision  + 1 / Recall)

Computing the F1 score for the three scenarios:

Morning-

F1 score  =    2 / (1 / 1  + 1 / 0.4)
   
               =    2 / (1 + 2.5)

               =    0.57

Evening-

F1 score  =    2 / (1 / 0.71  + 1 / 1)
   
               =    2 / (1.4 + 1)

               =    0.83

Current-

F1 score  =    2 / (1 / 0.8  + 1 / 0.8)
   
               =    2 / (1.25 + 1.25)

               =    0.8
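The same arithmetic can be checked with a short Python sketch. The function name `f1_score` and the `settings` dictionary are my own naming for this illustration, and returning 0.0 when Precision or Recall is zero is a convention I chose to avoid dividing by zero. The plain (arithmetic) mean is printed next to F1 to show how the harmonic mean pulls the score toward the lower of the two values.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, computed from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 or recall == 0.0:
        return 0.0  # convention chosen here to avoid dividing by zero
    return 2 / (1 / precision + 1 / recall)


# (TP, FP, FN) counts for each setting, taken from the calculations above.
settings = {
    "Morning": (2, 0, 3),
    "Evening": (5, 2, 0),
    "Current": (4, 1, 1),
}

for name, (tp, fp, fn) in settings.items():
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    print(f"{name}: F1={f1_score(tp, fp, fn):.2f}  plain mean={(p + r) / 2:.2f}")
```

For the morning setting the plain mean comes out at 0.70 while F1 is 0.57, which is the penalty for the low Recall.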

These numbers show that the Evening and Current settings were better balanced than the Morning one. But between Evening and Current, which one should I choose? That makes me think again: what problem am I trying to solve?