March 31, 2022

Precision vs Recall Trade-off

Explaining using a real-world scenario

One of the key questions that data scientists should ask end users is: how much importance do they give to the accuracy of the prediction? For instance, when I work from home, I am not very worried about the afternoon snowfall prediction, whereas when I go to work, it is very important to me.

A few years back, when I installed my first doorbell camera, I was very excited to see motion alerts on my phone. Gradually, the alerts started to annoy me.

All I was interested in was alerts about human movement near my house, but I got alerts even for leaves swaying in the wind. So I explored the camera's motion settings and lowered the sensitivity. Though that stopped a lot of unnecessary alerts, it also blocked many real alerts, especially after dusk or at times of low visibility.

Then I decided to change the setting twice a day, once in the morning and once in the evening. The following images show how my doorbell camera responded to movements during the day and at night.

Image by author.

Image by author.

That gave me the level of alerts I wanted, but changing the settings twice a day became a hassle. So finally, I settled on one permanent setting in the middle, which was neither too good nor too bad. I now get more alerts than I want during the day and miss some at night. The following image shows how my doorbell camera responded with the current setting.

Image by author.

What I really wanted was a good level of alert accuracy in the morning: I wanted the doorbell to alert me only about human movements, such as delivery drivers or the postman coming in. At night, on the other hand, I was willing to tolerate a lower alert accuracy to avoid missing alerts.

Analyzing the alert counts

To reach this middle setting, I had actually quantified the accuracy of the settings in my mind, without even realizing it. What I was really doing was adjusting the trade-off between Precision and Recall.

Precision measures, for a given setting, what fraction of the alerts I received were actually human movements. Every non-human movement that was falsely alerted as a human movement (a false positive) lowers the precision.

Mathematically:

Precision = True Positive count / (True positive count + False positive count)

Now take a look at the alerts from my old morning setting.

Image by author.

Based on the above formula for precision:

Precision = True Positive count / (True positive count + False positive count)

               = 2 / (2 + 0)

               = 2 / 2

               = 1.0 or 100%!!!!

This implied that I had a high accuracy, but in reality the Doorbell missed alerting 3 human movements.

That is where Recall comes into the picture. It measures the true positive rate: the fraction of the 5 actual human movements that were alerted. Every missed human movement (a false negative) lowers the recall.

Mathematically:

Recall = True Positive count / (True positive count + False negative count)

           = 2 / (2 + 3)

           = 2 / 5 

           = 0.4 or 40%     

Now, let's look at the evening setting.

Image by author.

Precision = True Positive count / (True positive count + False positive count)

               = 5 / (5 + 2)

               = 5 / 7

               = 0.71 or 71%

Recall = True Positive count / (True positive count + False negative count)

           = 5 / (5 + 0)

           = 5 / 5 

           = 1.0 or 100% !!!

Here, since all human-movements were alerted, Recall became 100%. But the Precision calculation was penalized for alerting non-human movements as human.

Finally, let's look at the current setting.

Image by author.

Precision = True Positive count / (True positive count + False positive count)

               = 4 / (4 + 1)

               = 4 / 5

               = 0.8 or 80%

Recall = True Positive count / (True positive count + False negative count)

           = 4 / (4 + 1)

           = 4 / 5 

           = 0.8 or 80% 

I had no idea that Precision and Recall would balance out mathematically when I applied the final setting, but they did.
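The three calculations above can be reproduced with a short sketch. The function names and the `settings` dictionary are my own, chosen for illustration; the counts are the ones read off the morning, evening and current settings.

```python
def precision(tp: int, fp: int) -> float:
    """Share of alerts that were real human movements."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of real human movements that triggered an alert."""
    return tp / (tp + fn)


# (true positives, false positives, false negatives) for each setting
settings = {
    "morning": (2, 0, 3),
    "evening": (5, 2, 0),
    "current": (4, 1, 1),
}

for name, (tp, fp, fn) in settings.items():
    print(f"{name}: precision={precision(tp, fp):.2f}, recall={recall(tp, fn):.2f}")
```

Note that both functions divide by zero if a setting produces no alerts at all (or if there were no human movements), so a real implementation would need to decide what to report in that case.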

Analyzing the alert counts

A quick way to look at the counts is to use a confusion matrix. Here, Precision and Recall are shown alongside the confusion matrix. As you can see, when Precision goes up, Recall goes down, and vice versa. The value you want depends on the problem you are trying to solve, as illustrated by the morning, evening and current settings.


Image by author.
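As a rough sketch of how the four confusion matrix cells are counted, the snippet below tallies them from a list of events. The event log is hypothetical: it mirrors the current setting (5 human movements, 4 alerted, 1 false alert), and the number of non-human movements that were correctly ignored is an assumption made only so the example has a true negative cell.

```python
from collections import Counter


def confusion_counts(actual, alerted):
    """Count TP, FP, FN and TN from paired booleans (was human, got alert)."""
    counts = Counter()
    for is_human, got_alert in zip(actual, alerted):
        if is_human and got_alert:
            counts["TP"] += 1
        elif not is_human and got_alert:
            counts["FP"] += 1
        elif is_human and not got_alert:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Hypothetical log: 5 human movements followed by 5 non-human movements
actual = [True] * 5 + [False] * 5
alerted = [True, True, True, True, False, True, False, False, False, False]

counts = confusion_counts(actual, alerted)
print(dict(counts))
```

Precision and Recall use only the TP, FP and FN cells, which is why the assumed TN count does not affect them.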

F1 Score

In most modern cars, the dashboard shows a miles per gallon (mpg) metric. You will see the mpg increase up to a certain speed and then go down; say, above 55 mph, the mpg decreases. The downside is that driving at 55 mph increases the time to your destination when you are allowed to go up to 75 mph on a highway.

I really wish there were one single metric on the dashboard that gave me the optimum speed / mpg combo. Say, 100 means I am at an optimum speed and mpg, whereas 35 means I am making a poor choice in terms of optimizing.

Luckily, in the case of accuracy calculation, there is a metric called the F1 score, which combines Precision and Recall. The F1 score is the harmonic mean of Precision and Recall. In simpler terms, if either value is low, the F1 score is low. It gives more weight to low values than a regular (arithmetic) mean, which weighs low and high values equally.

The equation for F1 is:

F1 score  =    2 / (1 / Precision  + 1 / Recall)

Computing the F1 score for the three scenarios:

Morning-

F1 score  =    2 / (1 / 1  + 1 / 0.4)
   
               =    2 / (1 + 2.5)

               =    0.57

Evening-

F1 score  =    2 / (1 / 0.71  + 1 / 1)
   
               =    2 / (1.4 + 1)

               =    0.83

Current-

F1 score  =    2 / (1 / 0.8  + 1 / 0.8)
   
               =    2 / (1.25 + 1.25)

               =    0.8
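The same harmonic mean can be computed in a few lines. This is a minimal sketch; the function name and the zero guard are my own additions, and the inputs are the Precision and Recall values of the three settings.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision == 0 or recall == 0:
        # Assumption: treat F1 as 0 when either input is 0 to avoid dividing by zero
        return 0.0
    return 2 / (1 / precision + 1 / recall)


scores = {
    "morning": f1_score(2 / 2, 2 / 5),
    "evening": f1_score(5 / 7, 5 / 5),
    "current": f1_score(4 / 5, 4 / 5),
}

for name, score in scores.items():
    print(f"{name}: F1={score:.2f}")
```

Compare the morning setting with a regular mean: (1.0 + 0.4) / 2 = 0.7, whereas the F1 score is about 0.57, because the harmonic mean is pulled toward the lower value.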

These numbers show that the Evening and Current settings were closer to optimal than the Morning setting. But between Evening and Current, which one should I choose? That makes me think again: what problem am I trying to solve?


     


February 14, 2021

Pivot view of key value table

select
  userid,
  -- no ELSE branch: rows for other keys yield NULL, which max() ignores
  max(case when name = 'EMAIL' then val end) as email,
  max(case when name = 'PHONE' then val end) as phone,
  max(case when name = 'DOB' then val end) as dob
from key_value_table
group by userid;
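To try the pivot without a database server, here is a small sketch using Python's built-in sqlite3 module. The sample rows and column types are made up for illustration, and this version of the query leaves out the ELSE branch so that a missing attribute comes back as NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table key_value_table (userid integer, name text, val text)")

# Hypothetical sample data: user 2 has only an email on record
conn.executemany(
    "insert into key_value_table values (?, ?, ?)",
    [
        (1, "EMAIL", "a@example.com"),
        (1, "PHONE", "555-0100"),
        (1, "DOB", "1990-01-01"),
        (2, "EMAIL", "b@example.com"),
    ],
)

pivot_sql = """
select
  userid,
  max(case when name = 'EMAIL' then val end) as email,
  max(case when name = 'PHONE' then val end) as phone,
  max(case when name = 'DOB' then val end) as dob
from key_value_table
group by userid
order by userid
"""

rows = conn.execute(pivot_sql).fetchall()
for row in rows:
    print(row)
```

Each user collapses to one row, with one column per key.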

March 20, 2020

Quadruple Witching

Quadruple Witching refers to the 3rd Friday of the last month of every quarter, i.e. March, June, September and December.

This is the trading day when:

  • market index futures, 
  • market index options, 
  • stock options and 
  • stock futures
expire.

Trading volume and volatility tend to increase on this day.

The last hour of these trading days (3-4pm EST) is referred to as the Quadruple Witching Hour.
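The dates can be worked out with the standard library. This is a minimal sketch with function names of my own choosing; it only finds the third Friday and does not account for exchange holidays.

```python
from datetime import date, timedelta


def third_friday(year: int, month: int) -> date:
    """Return the third Friday of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0, Friday is 4
    days_to_first_friday = (4 - first.weekday()) % 7
    return first + timedelta(days=days_to_first_friday + 14)


def quadruple_witching_days(year: int) -> list:
    """Third Fridays of March, June, September and December."""
    return [third_friday(year, month) for month in (3, 6, 9, 12)]


for day in quadruple_witching_days(2020):
    print(day.isoformat())
```

For 2020 this gives March 20, June 19, September 18 and December 18.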

Ref: https://investinganswers.com/dictionary/q/quadruple-witching

February 12, 2020

Passwordless authentication with Ionic and Firebase


  • Create an ionic project
> ionic start myproject 
  • Give a package id in file >myproject/config.xml

<widget id="com.mycompany.tasks" version="0.0.8" xmlns="http://www.w3.org/ns/widgets" xmlns:cdv="http://cordova.apache.org/ns/1.0">

  • Add firebase credentials into >myproject/src/environments/environment.ts

export const environment = {
  production: false,
  firebase: {
    apiKey: "....",
    authDomain: "....",
    databaseURL: "....",
    projectId: "....",
    storageBucket: "....",
    messagingSenderId: "....",
    appId: "....",
    measurementId: "...."
  }
};

  • Add an App by going to Firebase console > Your project > Project settings


Download the GoogleService-Info.plist file produced in this step.
  • Enable Passwordless authentication
  • Build your project
> ionic build
(This will load contents in www folder)

  • Setup Firebase hosting

> npm install -g firebase-tools
> firebase login
> firebase init hosting
(In this step, provide www as the public folder. You need to be in your project base folder for this.)
> firebase deploy


Next, go to Firebase > Develop > Hosting and note down the hosting domain.
It will be something like yourproject.firebaseapp.com

  • Setup Firebase Dynamiclink domain. It's in the Grow section

It will be something like yourproject.page.link

  • Add Firebase Dynamiclink plugin
As a prep step, add yourproject.firebaseapp.com and yourproject.page.link to the Authorized Domains in Firebase: Project settings > Authentication > Sign-in method > Authorized Domains (scroll down)

> cordova plugin add cordova-plugin-firebase-dynamiclinks --variable APP_DOMAIN="yourproject.firebaseapp.com" --variable PAGE_LINK_DOMAIN="yourproject.page.link"

  • Install ionic native libraries
> npm install @ionic-native/firebase-dynamic-links

  • Build for iOS
> ionic cordova build ios

  • Preparing
 Copy the downloaded GoogleService-Info.plist to platforms/ios/resources/ios/GoogleService-Info.plist

Add the following to the <platform name="ios"> section of config.xml:

<platform name="ios">
    <resource-file src="resources/ios/GoogleService-Info.plist" />
    <preference name="GoogleIOSClientId" value="client id in GoogleService-Info.plist" />
</platform>

  • Create Login page

> ionic g page login

In login.page.html

...
<ion-content>
  <ion-item>
    <ion-label>Email</ion-label>
    <ion-input [(ngModel)]="email"></ion-input>
  </ion-item>
  <ion-button (click)="sendEmailLink()">Send</ion-button>
</ion-content>

In login.page.ts

import { Component } from "@angular/core";
import { AngularFireAuth } from "@angular/fire/auth";
import { FirebaseDynamicLinks } from "@ionic-native/firebase-dynamic-links/ngx";
import { Platform } from "@ionic/angular";
import { Router } from '@angular/router';

@Component({
  selector: "app-login",
  templateUrl: "login.page.html",
})
export class LoginPage {
  // Bound to the email input in login.page.html
  email = "";

  constructor(
    private firebaseDynamicLinks: FirebaseDynamicLinks,
    private firebaseAuth: AngularFireAuth,
    private platform: Platform,
    private router: Router
  ) {
    this.platform.ready().then(() => {
      // Fires when the app is opened through the emailed sign-in link
      this.firebaseDynamicLinks.onDynamicLink().subscribe(
        (resp: any) => {
          this.firebaseAuth.auth
            .signInWithEmailLink(this.email, resp["deepLink"])
            .then(() => {
              this.router.navigate(["/home"]);
            })
            .catch(err => {
              console.error(err);
            });
        },
        (err: any) => {
          console.error(err);
        }
      );
    });
  }

  // Sends the passwordless sign-in link to the address entered on the page
  async sendEmailLink() {
    const actionCodeSettings = {
      // Redirect url
      url: "https://yourproject.firebaseapp.com/login",
      handleCodeInApp: true,
      iOS: {
        bundleId: "com.mycompany.tasks"
      }
    };
    try {
      await this.firebaseAuth.auth.sendSignInLinkToEmail(
        this.email,
        actionCodeSettings
      );
    } catch (err) {
      console.error(err);
    }
  }
}

  • Next
. Deploy the iOS app using Xcode
. In the Login page that you see in the app, enter your email and click Send
. You will then get an email from Firebase with this sort of content.

Hello,

We received a request to sign in to project-xxxxxxx using this email address. If you want to sign in with your youremail@someemail.com account, click this link:

Sign in to project-xxxxxx

If you did not request this link, you can safely ignore this email.

Thanks,

Your project-xxxxxxx team

Click on the sign-in link from your mobile. This should open the app again, and the home page will be loaded.







July 25, 2019

Teradata SQL Regular Expression to compare date format

Verify whether a date is in the format dd-MMM-yyyy (note that the pattern below also accepts a single-digit day)

select
 regexp_similar(a_date,'^(([0-9])|([0-2][0-9])|([3][0-1]))\-(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)\-\d{4}$') = 1
from table1
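To experiment with the pattern outside Teradata, here is a sketch using Python's re module. It illustrates the regular expression only, not Teradata's regexp_similar function; the pattern is copied from the query with the hyphens left unescaped, and the function name is my own.

```python
import re

DATE_PATTERN = re.compile(
    r"^(([0-9])|([0-2][0-9])|([3][0-1]))-"
    r"(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)-"
    r"\d{4}$"
)


def is_dd_mmm_yyyy(value: str) -> bool:
    """True when value has the shape day-MON-year described by the pattern."""
    return DATE_PATTERN.match(value) is not None


for sample in ["25-JUL-2019", "5-JUL-2019", "32-JUL-2019", "2019-07-25", "25-JUL-19"]:
    print(sample, is_dd_mmm_yyyy(sample))
```

The pattern checks the shape only, not the calendar: values such as 31-FEB-2019 or 00-JAN-2019 still match, so an actual date conversion is needed if validity matters.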

June 21, 2019

WITH clause in Teradata SQL

WITH CTE1 AS (select current_date as today)
select today
from CTE1;

Ref: http://dwgeek.com/teradata-with-clause-syntax-usage-and-examples.html/

April 17, 2019

Apple Qualcomm settlement

Interesting read: https://9to5mac.com/2019/04/17/apple-qualcomm-and-intel/

Graphical diagram explaining players involved and their roles.


Semantic Triple in RDF

Composed of:

Subject -> Predicate -> Object

A subject: which is a URI (e.g., a "web address") that represents something.
A predicate: which is another URI that represents a certain property of the subject.
An object: which can be a URI or a literal (a string) that is related to the subject

Example:
Bob knows John
http://example.name#BobSmith12 http://xmlns.com/foaf/0.1/knows http://example.name#JohnDoe34

Ref: https://data-gov.tw.rpi.edu/wiki/A_crash_course_in_SPARQL
https://en.wikipedia.org/wiki/Semantic_triple

John's age 70
http://example.name#JohnDoe34 http://xmlns.com/foaf/0.1/age 70

We can infer from the above 2 statements that Bob knows somebody who is 70 years old
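The inference can be sketched with plain Python tuples standing in for triples. This is only an illustration of the subject / predicate / object idea; it does not use an RDF library, and the helper names are hypothetical.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
BOB = "http://example.name#BobSmith12"
JOHN = "http://example.name#JohnDoe34"

# Each triple is (subject, predicate, object)
triples = [
    (BOB, FOAF + "knows", JOHN),
    (JOHN, FOAF + "age", 70),
]


def objects(subject, predicate):
    """All objects linked to the subject by the predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def knows_someone_aged(subject, age):
    """True if the subject knows at least one resource with the given age."""
    return any(
        age in objects(friend, FOAF + "age")
        for friend in objects(subject, FOAF + "knows")
    )


print(knows_someone_aged(BOB, 70))
```

The check follows the "knows" link from Bob to John and then reads John's "age", which is the same two-step chain as the inference in the text.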


April 16, 2019

Apache Jena RDF API

Abstractions

Resource: representing an RDF resource (whether named with a URI or anonymous)
Literal: for data values (numbers, strings, dates, etc)
Statement: representing an RDF triple and
Model: representing the whole graph

Ref: https://jena.apache.org/about_jena/architecture.html

April 10, 2019

Spread


Offer/Ask price: $100.50
Bid price: $99.50
Spread = 100.50 - 99.50 = $1.00

FIX protocol and FIXatdl

Financial Information eXchange (FIX) is a messaging protocol used to exchange trade related messages between trading systems.

FIXatdl (FIX Algorithmic Trading Definition Language) is a language for describing algorithmic trading strategies and their parameters that relies on the FIX protocol.

March 29, 2019

Basis Points

1 basis point = 1/100 of a percentage point (0.01%). Used mainly in finance, for example to describe changes in interest rates.
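The conversion is simple enough to sketch in two small functions. The function names are my own, and the 1.50% starting rate is just an illustrative input.

```python
def bps_to_percent(bps: float) -> float:
    """Convert basis points to percentage points (100 bps = 1%)."""
    return bps / 100


def apply_bps(rate_percent: float, bps: float) -> float:
    """Return the rate, in percent, after a change of the given basis points."""
    return rate_percent + bps_to_percent(bps)


print(bps_to_percent(25))   # 25 basis points expressed in percentage points
print(apply_bps(1.50, 25))  # a 1.50% rate after a 25 basis point increase
```

Expressing a change in basis points avoids the ambiguity of saying a rate "rose by 0.25%", which could mean either percentage points or a relative change.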
Example:
"The Federal Open Market Committee unanimously voted to increase the federal funds rate by 25 basis points to 1.75% to 2%" https://www.housingwire.com/articles/43672-fed-raises-rates-for-second-time-in-2018


Unicorn

With many unicorn IPOs getting listed in 2019, one question is "what is a unicorn?"

Wikipedia states:
Unicorn is a privately held company valued over $1 billion. The term was coined after the mythical animal due to the statistical rarity of such successful ventures.

NYTimes in an article on March 28, 2019 (day before the Lyft IPO) https://www.nytimes.com/2019/03/28/business/startups-ipo.html?action=click&module=Top%20Stories&pgtype=Homepage has an observation about why some startups with high valuations stay private for long.

"Some industry groups and investors who urge fewer regulations say the emphasis on the private markets is an outgrowth of the Sarbanes-Oxley Act, the federal law passed in 2002 that tightened accounting rules for public companies after the accounting scandals of the early 2000s."


March 25, 2019

Spider - SPDR

Stands for Standard & Poor’s Depositary Receipt. It is an S&P 500 index based ETF maintained by State Street Global Advisors. Each share of the ETF represents a 10th of the S&P 500 index and trades at roughly a 10th of the dollar value of the S&P 500.

Ref: investopedia

Leading and lagging economic indicators

Leading indicators 
1. Stock market 
2. Manufacturing activity 
3. Inventory levels
4. Retail sales
5. Building permits
6. Housing market
7. Level of new business startups

Lagging indicators 
1. Change in GDP
2. Income and Wages
3. Unemployment rate
4. Consumer Price Index for Inflation
5. Currency strength
6. Interest rates
7. Corporate profits 
8. Balance of trade - trade surplus level
9. Value of Commodity substitutes to US Dollar - against gold and silver

March 14, 2019

Evaluating bank stocks

Ref: Motley Fool youtube. https://www.youtube.com/watch?v=AfT5FaaqNxI&list=PLXIJDn8_-fyEsbjwBto2GzIS1JPq5I2iR&index=8

Objective of banks: Borrow money at a lower interest rate and lend at a higher rate, and make profit from the spread.

3 key metrics:

1. Annual Return on Equity for 10 years. Find the lowest number. If that number is negative, the bank had big losses during the 2008 financial crisis, so avoid it. Return on Equity is available in the SEC 10-K or 10-Q filing.

Return on Equity = Annual Income / Shareholder Equity = Annual Income / (Total Assets - Total Liabilities)

Ideally Return on Equity should be greater than 10%, and for good banks it should be 15%.

The Return on Equity may be very high for a while during good economic conditions if the bank underwrites risky loans that turn bad later. That's why we need to look at many years.

2. Discipline of the bank measured by Efficiency Ratio.

Efficiency Ratio = Operating Expenses / Total Revenue

Typically this ratio will be between 50-60%. A lower ratio is better: if the ratio is greater than 60%, it means the financial discipline of the bank is not ideal.

3. Sales Profitability

Sales Profitability = Top-line Revenue (Gross Sales) / Total Assets, which should be greater than 4.5%
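The three metrics can be sketched as plain functions. The bank figures below are hypothetical (in millions) and serve only to exercise the formulas; the function names are my own.

```python
def return_on_equity(annual_income: float, total_assets: float, total_liabilities: float) -> float:
    """Annual income divided by shareholder equity (assets minus liabilities)."""
    return annual_income / (total_assets - total_liabilities)


def efficiency_ratio(operating_expenses: float, total_revenue: float) -> float:
    """Operating expenses as a share of total revenue."""
    return operating_expenses / total_revenue


def sales_profitability(gross_sales: float, total_assets: float) -> float:
    """Top-line revenue as a share of total assets."""
    return gross_sales / total_assets


# Hypothetical bank, figures in millions
roe = return_on_equity(annual_income=12, total_assets=1000, total_liabilities=900)
eff = efficiency_ratio(operating_expenses=27, total_revenue=50)
sales = sales_profitability(gross_sales=50, total_assets=1000)

print(f"Return on equity: {roe:.1%} (greater than 10%: {roe > 0.10})")
print(f"Efficiency ratio: {eff:.1%}")
print(f"Sales profitability: {sales:.1%} (greater than 4.5%: {sales > 0.045})")
```

For the 10-year check described in the first metric, the same return_on_equity function would be applied to each year's figures and the minimum taken.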





Shareholder's Equity

Suppose a company is liquidated today, i.e. all assets are sold off and all liabilities are paid off. The amount of money left for the shareholders is the Shareholder's Equity.

Shareholder's Equity = Total Assets - Total Liabilities

Ref: investopedia

Where to find SEC filings?


They are available on the EDGAR (Electronic Data Gathering, Analysis, and Retrieval) website.

Can retired shares be reissued?

No, whereas treasury stock can be.