Rex W. Douglass
Director Machine Learning for Social Science Lab (MSSL)
Center for Peace and Security Studies (cPASS)
Department of Political Science
University of California San Diego
rexdouglass@gmail.com
www.rexdouglass.com
@rexdouglass

1 Course Overview

Please bring a two-sided coin (or several) and scratch paper to class; we will use them for passing notes during the demonstrations.

This is a six-hour introduction to machine learning, spread across two three-hour lectures. The goal of this very short course is narrow: to give you enough of an overview, vocabulary, and intuition that you can identify machine learning problems in the wild and begin your own research into the relevant literatures and possible approaches. The goal is not to train you to execute a particular machine learning solution: there are far too many approaches available, they may not cover whatever problem you find, and the state of the art will be different in a year or two anyway. Instead, we will learn how to think about and classify problems into broad types, how to define and measure the efficacy of different solutions to a problem, how to avoid some common and subtle mistakes, and how to think about a full machine learning pipeline from start to finish.

1.2 Readings Policy

Math and programming are not things you learn; they are things you get used to. The readings for this course are, with a few exceptions, voluntary and intended for self-study. They are meant to point you in the right direction when you realize you need to brush up on a particular set of tools in order to tackle a particular problem.

1.3 Textbooks

Do not purchase any books: each of these should be available for free online at the link given. Any one of them would provide a decent background to the field of machine learning. For this course, I’ve picked select chapters where I thought they did a good job reviewing a specific subtopic.

1.6 Software and Programming

Students are not expected to know any particular language or set of software. We will be demonstrating best practices as used in the Machine Learning for Social Science Lab at the Center for Peace and Security Studies, UCSD. In that lab, our software stack consists of Python and R for data preparation and analysis, Spark for database management, Keras/TensorFlow for deep learning, GitHub for revision control, and Ubuntu for our operating system and command-line tools.

1.7 Applications

1.8 Notes on this Guide

This guide is written as an R notebook using RStudio. It renders its output as static HTML that you should be able to view in a regular web browser.

#install.packages("pacman")
library(pacman)
p_load(infotheo)
p_load(tidyverse)
p_load(ggplot2)
p_load(cowplot)
p_load(mlbench)
p_load(Metrics)
set.seed(123)

2 Introduction

2.1 What is machine learning?

  • (CIML) “Chapter 1”
  • (WMLW 1.0) “Introduction”
  • (ISL) “Chapter 2 Statistical Learning”

2.2 What isn’t machine learning?

2.2.1 Statistics

3 Information Theory

4 Information Sources \(Y\)

4.1 Random Variables and Distributions

4.2 Entropy

4.3 Zero-Bit source

  • Zero-Bit information sources are constants that don’t vary.
  • Zero-Bit source, Zero-Bit message: No variation and no measurement.
  • Zero-Bit information source, One-Bit message: No variation, but a single measurement with variation between two states.
  • Zero-Bit information source, N-Bit message: No variation, and an arbitrary number of measurements with arbitrary number of states

  • “Design, Inference, and the Strategic Logic of Suicide Terrorism”, Scott Ashworth, Joshua D. Clinton, Adam Meirowitz, and Kristopher W. Ramsay, 2008, APSR

4.4 Binary Source (\(\le 1\) bit)

One-Bit information sources are variables that can take on two different states, e.g. a coin flip. Call \(Y\) the true state at the source, and \(\hat{Y}\) the mental model of the state at the destination.

  • (IntroMachineLearningWithR) “Chapter 3 Example datasets”
  • Binary classification

library(infotheo)
N <- 699 #Flip a coin N times (Matched to Breast Cancer Dataset Below)
sample_space <- c(1,0) #Heads and Tails

4.4.1 Fair Coin

A fair coin has equal likelihood of both heads and tails. Estimated entropy is close to the true value of 1 bit.

p <- 0.5 #Fair Coin
Y_coin_fair <- sample(sample_space, size = N,  replace = TRUE,  prob = c(p, 1 - p))  
print(table(Y_coin_fair))
Y_coin_fair
  0   1 
358 341 
print(natstobits(entropy(Y_coin_fair, method="emp")))
[1] 0.9995733
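
The estimate above can be reproduced without infotheo. The following is a minimal sketch, assuming the "emp" method is the plug-in estimator, that is, observed frequencies substituted into \(H(Y) = -\sum_y p(y) \log_2 p(y)\). The helper name `entropy_bits` is introduced here for illustration and is not part of any package.

```r
# Plug-in (empirical) Shannon entropy in bits, base R only.
# `entropy_bits` is a hypothetical helper, not an infotheo function.
entropy_bits <- function(y) {
  p <- table(y) / length(y)  # observed relative frequencies
  p <- p[p > 0]              # treat 0 * log(0) as 0
  -sum(p * log2(p))
}

# Rebuild the fair-coin sample from the counts printed above:
# 358 tails (0) and 341 heads (1).
y <- rep(c(0, 1), times = c(358, 341))
print(entropy_bits(y))
```

Because the two counts are not exactly equal, the estimate falls slightly below the true value of 1 bit.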

4.4.2 Unfair Coin (p=0.8)

An unfair coin is weighted so that one side comes up more often than the other; here heads (1) has probability p = 0.8. The estimated entropy is less than one bit, because each flip is less surprising than a flip of a fair coin.

p <- 0.8 #Unfair Coin
Y_coin_unfair <- sample(sample_space, size = N,  replace = TRUE,  prob = c(p, 1 - p))  
print(table(Y_coin_unfair))
Y_coin_unfair
  0   1 
126 573 
print(natstobits(entropy(Y_coin_unfair, method="emp")))
[1] 0.68064
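
For a coin with heads probability \(p\), the true entropy is \(h(p) = -p \log_2 p - (1-p) \log_2 (1-p)\). The sketch below compares the true value at \(p = 0.8\) with the plug-in value implied by the counts printed above. The function name `binary_entropy` is an assumption made for this example.

```r
# Binary entropy in bits for a coin with heads probability p.
# `binary_entropy` is a hypothetical helper introduced for this example.
binary_entropy <- function(p) {
  q <- c(p, 1 - p)
  q <- q[q > 0]        # treat 0 * log(0) as 0
  -sum(q * log2(q))
}

h_fair     <- binary_entropy(0.5)        # fair coin
h_true     <- binary_entropy(0.8)        # true entropy of the unfair coin
h_estimate <- binary_entropy(573 / 699)  # observed share of heads above
h_twohead  <- binary_entropy(1)          # two-headed coin
print(c(fair = h_fair, true = h_true, estimate = h_estimate, twoheaded = h_twohead))
```

The true entropy at \(p = 0.8\) is about 0.722 bits. The sample estimate is lower because this particular sample happened to contain slightly more than 80% heads.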

4.4.3 Two Headed Coin

A two-headed coin only ever lands one way, so its entropy is zero bits. The tiny nonzero estimate below is floating-point rounding error.

p <- 1 #Two headed coin
Y_coin_twoheaded <- sample(sample_space, size = N,  replace = TRUE,  prob = c(p, 1 - p))  
print(table(Y_coin_twoheaded))
Y_coin_twoheaded
  1 
699 
print(natstobits(entropy(Y_coin_twoheaded, method="emp")))
[1] 1.281371e-15

4.4.4 UCI Breast Cancer Dataset

“Breast Cancer”, Raul Eulogio, January 26, 2018

data(BreastCancer)
glimpse(BreastCancer)
Observations: 699
Variables: 11
$ Id              <chr> "1000025", "1002945", "1015425", "1016277", "1017023", "1017122", "101...
$ Cl.thickness    <ord> 5, 5, 3, 6, 4, 8, 1, 2, 2, 4, 1, 2, 5, 1, 8, 7, 4, 4, 10, 6, 7, 10, 3,...
$ Cell.size       <ord> 1, 4, 1, 8, 1, 10, 1, 1, 1, 2, 1, 1, 3, 1, 7, 4, 1, 1, 7, 1, 3, 5, 1, ...
$ Cell.shape      <ord> 1, 4, 1, 8, 1, 10, 1, 2, 1, 1, 1, 1, 3, 1, 5, 6, 1, 1, 7, 1, 2, 5, 1, ...
$ Marg.adhesion   <ord> 1, 5, 1, 1, 3, 8, 1, 1, 1, 1, 1, 1, 3, 1, 10, 4, 1, 1, 6, 1, 10, 3, 1,...
$ Epith.c.size    <ord> 2, 7, 2, 3, 2, 7, 2, 2, 2, 2, 1, 2, 2, 2, 7, 6, 2, 2, 4, 2, 5, 6, 2, 2...
$ Bare.nuclei     <fct> 1, 10, 2, 4, 1, 10, 10, 1, 1, 1, 1, 1, 3, 3, 9, 1, 1, 1, 10, 1, 10, 7,...
$ Bl.cromatin     <fct> 3, 3, 3, 3, 3, 9, 3, 3, 1, 2, 3, 2, 4, 3, 5, 4, 2, 3, 4, 3, 5, 7, 2, 7...
$ Normal.nucleoli <fct> 1, 2, 1, 7, 1, 7, 1, 1, 1, 1, 1, 1, 4, 1, 5, 3, 1, 1, 1, 1, 4, 10, 1, ...
$ Mitoses         <fct> 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 1, 1, 1, 1, 4, 1, 1, 1, 2, 1, 4, 1, 1, 1...
$ Class           <fct> benign, benign, benign, benign, benign, malignant, benign, benign, ben...
summary(BreastCancer$Class)
   benign malignant 
      458       241 
print(natstobits(entropy(BreastCancer$Class, method="emp")))
[1] 0.9293179

4.5 Multi-class Sources

4.5.1 Iris Dataset

data(iris)
glimpse(iris)
Observations: 150
Variables: 5
$ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.8, 4.8, 4.3, 5.8...
$ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.4, 3.0, 3.0, 4.0...
$ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.6, 1.4, 1.1, 1.2...
$ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.2, 0.1, 0.1, 0.2...
$ Species      <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, setosa, setosa, s...
summary(iris$Species)
    setosa versicolor  virginica 
        50         50         50 
print(natstobits(entropy(iris$Species, method="emp")))
[1] 1.584962
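
With three equally frequent species, the entropy sits at its maximum for a three-class source, \(\log_2 3\) bits. A small base-R check, using the `iris` data that ships with R:

```r
# Empirical entropy of a balanced three-class variable versus its upper bound.
counts <- table(iris$Species)
p <- counts / sum(counts)       # each class has share 1/3
H <- -sum(p * log2(p))          # plug-in entropy in bits
H_max <- log2(nlevels(iris$Species))
print(c(entropy = H, upper_bound = H_max))
```

Any imbalance between the classes would push the entropy below this bound, as with the breast cancer labels above.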

4.6 Real Valued Sources

4.6.1 Boston Housing Data

data(BostonHousing)
glimpse(BostonHousing)
Observations: 506
Variables: 14
$ crim    <dbl> 0.00632, 0.02731, 0.02729, 0.03237, 0.06905, 0.02985, 0.08829, 0.14455, 0.2112...
$ zn      <dbl> 18.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 0.0, ...
$ indus   <dbl> 2.31, 7.07, 7.07, 2.18, 2.18, 2.18, 7.87, 7.87, 7.87, 7.87, 7.87, 7.87, 7.87, ...
$ chas    <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ nox     <dbl> 0.538, 0.469, 0.469, 0.458, 0.458, 0.458, 0.524, 0.524, 0.524, 0.524, 0.524, 0...
$ rm      <dbl> 6.575, 6.421, 7.185, 6.998, 7.147, 6.430, 6.012, 6.172, 5.631, 6.004, 6.377, 6...
$ age     <dbl> 65.2, 78.9, 61.1, 45.8, 54.2, 58.7, 66.6, 96.1, 100.0, 85.9, 94.3, 82.9, 39.0,...
$ dis     <dbl> 4.0900, 4.9671, 4.9671, 6.0622, 6.0622, 6.0622, 5.5605, 5.9505, 6.0821, 6.5921...
$ rad     <dbl> 1, 2, 2, 3, 3, 3, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, ...
$ tax     <dbl> 296, 242, 242, 222, 222, 222, 311, 311, 311, 311, 311, 311, 311, 307, 307, 307...
$ ptratio <dbl> 15.3, 17.8, 17.8, 18.7, 18.7, 18.7, 15.2, 15.2, 15.2, 15.2, 15.2, 15.2, 15.2, ...
$ b       <dbl> 396.90, 396.90, 392.83, 394.63, 396.90, 394.12, 395.60, 396.90, 386.63, 386.71...
$ lstat   <dbl> 4.98, 9.14, 4.03, 2.94, 5.33, 5.21, 12.43, 19.15, 29.93, 17.10, 20.45, 13.27, ...
$ medv    <dbl> 24.0, 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15.0, 18.9, 21.7, ...
summary(BostonHousing$medv)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   5.00   17.02   21.20   22.53   25.00   50.00 
print(natstobits(entropy(discretize(BostonHousing$medv), method="emp")))
[1] 2.80701
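
The entropy of a real-valued variable can only be estimated this way after binning, so the number above depends on how `discretize` chose its bins. As far as we can tell, the default is equal-frequency binning with roughly \(N^{1/3}\) bins, which would explain why the result is so close to \(\log_2 7 \approx 2.807\). The sketch below illustrates the effect on simulated data (a stand-in, not the housing data), using base R only; `entropy_bits` is a hypothetical helper.

```r
# Entropy after equal-frequency binning mostly reflects the number of bins.
entropy_bits <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]
  -sum(p * log2(p))
}

set.seed(1)
x <- rnorm(506)                 # simulated stand-in with the same N as BostonHousing
bins <- c(2, 4, 7, 16)
H_bits <- sapply(bins, function(k) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = k + 1))
  entropy_bits(cut(x, breaks = breaks, include.lowest = TRUE))
})
print(data.frame(bins = bins, H_bits = H_bits, max_bits = log2(bins)))
```

With equal-frequency bins every bin holds about the same share of the data, so the estimate lands near \(\log_2\) of the bin count whatever the shape of the underlying distribution.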

5 Comparing \(Y\) and \(\hat{Y}\)

(IntroMachineLearningWithR) 5.4 Classification performance

5.1 Binary

How should we compare the true reality \(Y\) to our mental model of it \(\hat{Y}\)?

5.1.1 Confusion Matrix

table(BreastCancer$Class, BreastCancer$Class)
           
            benign malignant
  benign       458         0
  malignant      0       241
table(BreastCancer$Class, Y_coin_fair)
           Y_coin_fair
              0   1
  benign    239 219
  malignant 119 122
table(BreastCancer$Class, Y_coin_unfair)
           Y_coin_unfair
              0   1
  benign     84 374
  malignant  42 199
table(BreastCancer$Class, Y_coin_twoheaded)
           Y_coin_twoheaded
              1
  benign    458
  malignant 241

5.1.2 Accuracy

p_load(Metrics)
BreastCancer$Class_binary <- as.numeric(BreastCancer$Class=="malignant")
accuracy(BreastCancer$Class_binary, BreastCancer$Class_binary)
[1] 1
accuracy(BreastCancer$Class_binary, Y_coin_fair)
[1] 0.5164521
accuracy(BreastCancer$Class_binary, Y_coin_unfair)
[1] 0.4048641
accuracy(BreastCancer$Class_binary, Y_coin_twoheaded)
[1] 0.3447783

5.1.3 Precision

p_load(Metrics)
Metrics::precision(BreastCancer$Class_binary,
         BreastCancer$Class_binary)
[1] 1
Metrics::precision(BreastCancer$Class_binary, Y_coin_fair)
[1] 0.3577713
Metrics::precision(BreastCancer$Class_binary, Y_coin_unfair)
[1] 0.3472949
Metrics::precision(BreastCancer$Class_binary, Y_coin_twoheaded)
[1] 0.3447783

5.1.4 Recall

p_load(Metrics)
Metrics::recall(BreastCancer$Class_binary,
         BreastCancer$Class_binary)
[1] 1
Metrics::recall(BreastCancer$Class_binary, Y_coin_fair)
[1] 0.5062241
Metrics::recall(BreastCancer$Class_binary, Y_coin_unfair)
[1] 0.8257261
Metrics::recall(BreastCancer$Class_binary, Y_coin_twoheaded) 
[1] 1
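
Accuracy, precision, and recall can all be read off the confusion matrix. The sketch below recomputes them by hand for the fair coin, using the counts printed in 5.1.1 and treating malignant (1) as the positive class.

```r
# Counts from table(BreastCancer$Class, Y_coin_fair) in 5.1.1,
# with malignant (1) as the positive class.
TN <- 239; FP <- 219   # benign cases predicted 0 / predicted 1
FN <- 119; TP <- 122   # malignant cases predicted 0 / predicted 1

accuracy  <- (TP + TN) / (TP + TN + FP + FN)  # share of all cases guessed correctly
precision <- TP / (TP + FP)                   # share of predicted positives that are real
recall    <- TP / (TP + FN)                   # share of real positives that are found
print(c(accuracy = accuracy, precision = precision, recall = recall))
```

These match the values returned by the Metrics functions above. They also show why the two-headed coin reaches a recall of 1: it never predicts 0, so it can never miss a malignant case.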

5.1.5 F1

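#Note: Metrics::f1 is defined for information-retrieval sets rather than binary labels, so we use MLmetrics::F1_Score instead
#Caution: without a `positive` argument, F1_Score appears to score the first class (0, benign); the values below are consistent with that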
p_load(MLmetrics)
F1_Score(BreastCancer$Class_binary,
         BreastCancer$Class_binary)
[1] 1
F1_Score(BreastCancer$Class_binary, Y_coin_fair)
[1] 0.5857843
F1_Score(BreastCancer$Class_binary, Y_coin_unfair)
[1] 0.2876712
#F1_Score(BreastCancer$Class_binary, Y_coin_twoheaded)
#Fails when every prediction is 1, so calculate F1 by hand from precision (0.3447783) and recall (1), with malignant as the positive class
print( 2 * (0.3447783 * 1) / (0.3447783 + 1) )
[1] 0.512766
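
F1 is the harmonic mean of precision and recall, so it depends on which class is treated as positive. The sketch below computes it from the confusion counts in 5.1.1 under both choices; `f1_from_counts` is a hypothetical helper, not a function from Metrics or MLmetrics.

```r
# F1 from confusion counts: harmonic mean of precision and recall.
# `f1_from_counts` is a hypothetical helper introduced for this example.
f1_from_counts <- function(tp, fp, fn) {
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

# Fair coin, malignant (1) as the positive class
f1_fair_malignant <- f1_from_counts(tp = 122, fp = 219, fn = 119)
# Fair coin, benign (0) as the positive class
f1_fair_benign    <- f1_from_counts(tp = 239, fp = 119, fn = 219)
# Two-headed coin, malignant (1) as the positive class
f1_twoheaded      <- f1_from_counts(tp = 241, fp = 458, fn = 0)
print(c(f1_fair_malignant, f1_fair_benign, f1_twoheaded))
```

The benign-positive value (about 0.586) reproduces what F1_Score printed for the fair coin, while the malignant-positive value (about 0.419) is the one consistent with the precision and recall in 5.1.3 and 5.1.4. Passing an explicit positive class avoids the ambiguity.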

5.2 Probabilistic Predictions

5.2.1 Log Loss


MLmetrics::LogLoss(BreastCancer$Class_binary,
         BreastCancer$Class_binary)
[1] 9.992007e-16
MLmetrics::LogLoss(BreastCancer$Class_binary, Y_coin_fair)
[1] 16.70129
MLmetrics::LogLoss(BreastCancer$Class_binary, Y_coin_unfair)
[1] 20.55531
MLmetrics::LogLoss(BreastCancer$Class_binary, Y_coin_twoheaded) 
[1] 22.63056
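
Log loss is designed for predicted probabilities, not hard 0/1 guesses. A confident wrong guess has an infinite penalty in principle, so implementations clip predictions away from 0 and 1. The values above are consistent with a clip at about 1e-15, which makes every error cost roughly \(-\log(10^{-15}) \approx 34.5\). The sketch below is a base-R version under that assumption; `log_loss` and its `eps` argument are hypothetical names, not the MLmetrics implementation.

```r
# Binary log loss with clipping, base R only.
# `log_loss` and `eps` are assumptions for illustration.
log_loss <- function(y_true, p_hat, eps = 1e-15) {
  p_hat <- pmin(pmax(p_hat, eps), 1 - eps)  # keep log() finite
  -mean(y_true * log(p_hat) + (1 - y_true) * log(1 - p_hat))
}

y <- c(1, 0, 1, 0)
ll_hard   <- log_loss(y, c(1, 0, 0, 1))          # hard guesses, half of them wrong
ll_unsure <- log_loss(y, rep(0.5, 4))            # always answer "50/50"
ll_soft   <- log_loss(y, c(0.9, 0.1, 0.8, 0.2))  # confident and correct
print(c(hard = ll_hard, unsure = ll_unsure, soft = ll_soft))
```

Hard guesses that are wrong half the time score about 17.3, far worse than the \(\log 2 \approx 0.69\) earned by simply answering 0.5 every time. This is why the coin-flip predictions above receive such large losses.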

5.2.2 Area Under the Curve

Measuring classifier performance: a coherent alternative to the area under the ROC curve, David J. Hand, Mach Learn (2009) 77: 103–123

Generate ROC Curve Charts for Print and Interactive Use, Michael C Sachs, 2018-06-23

Illustrated Guide to ROC and AUC, Raffael Vogler, June 23, 2015

#Simulate a probabilistic prediction: draw Uniform(0, 0.5) noise and add 0.5 wherever the hard prediction is 1
noised_prediction <- function(prediction) {
  noised <- runif(N, 0, 0.5)
  noised[prediction == 1] <- noised[prediction == 1] + 0.5
  return(noised)
}
AUC(noised_prediction(BreastCancer$Class_binary), BreastCancer$Class_binary)
[1] 1
AUC(noised_prediction(Y_coin_fair), BreastCancer$Class_binary)
[1] 0.5230118
AUC(noised_prediction(Y_coin_unfair), BreastCancer$Class_binary)
[1] 0.5179384
AUC(noised_prediction(Y_coin_twoheaded), BreastCancer$Class_binary)
[1] 0.4781388
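
AUC has a direct interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it from ranks (the Mann-Whitney form) in base R; `auc_rank` is a hypothetical helper and is not claimed to be how MLmetrics::AUC is implemented.

```r
# AUC from ranks (Mann-Whitney U statistic), base R only.
# `auc_rank` is a hypothetical helper introduced for this example.
auc_rank <- function(score, y_true) {
  r <- rank(score)               # midranks, so tied scores share a rank
  n_pos <- sum(y_true == 1)
  n_neg <- sum(y_true == 0)
  (sum(r[y_true == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

y <- c(0, 0, 1, 1)
auc_mixed   <- auc_rank(c(0.1, 0.4, 0.35, 0.8), y)  # one positive ranked below a negative
auc_perfect <- auc_rank(c(0.1, 0.2, 0.8, 0.9), y)   # every positive above every negative
auc_tied    <- auc_rank(rep(0.5, 4), y)             # uninformative constant score
print(c(mixed = auc_mixed, perfect = auc_perfect, tied = auc_tied))
```

Scores that carry no information about the label, like the noised coin flips above, land near 0.5.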
p_load(plotROC)
set.seed(2529)
D.ex <- rbinom(200, size = 1, prob = .5)
M1 <- rnorm(200, mean = D.ex, sd = .65)
M2 <- rnorm(200, mean = D.ex, sd = 1.5)
test <- data.frame(D = D.ex, D.str = c("Healthy", "Ill")[D.ex + 1], 
                   M1 = M1, M2 = M2, stringsAsFactors = FALSE)
basicplot <- ggplot(test, aes(d = D, m = M1)) + geom_roc(labels = FALSE)

5.2.4 Others

5.2.5 Imbalanced Data

5.3 Multiclass

5.4 Real Valued

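#Baseline: predict the sample mean for every observation
y_hat <- rep(mean(BostonHousing$medv), nrow(BostonHousing))
MAE(y_hat, BostonHousing$medv) #MLmetrics argument order is (y_pred, y_true)
MSE(y_hat, BostonHousing$medv)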
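
Both measures are simple averages of the prediction error, and each has a natural constant baseline: the mean minimizes MSE, while the median minimizes MAE. The sketch below works this out in base R on the first 13 `medv` values printed by `glimpse(BostonHousing)` above, used here only as a small worked example.

```r
# First 13 medv values as printed by glimpse(BostonHousing) above.
medv <- c(24.0, 21.6, 34.7, 33.4, 36.2, 28.7, 22.9,
          27.1, 16.5, 18.9, 15.0, 18.9, 21.7)
n <- length(medv)

y_hat <- mean(medv)                  # constant prediction
mae  <- mean(abs(medv - y_hat))      # mean absolute error
mse  <- mean((medv - y_hat)^2)       # mean squared error
rmse <- sqrt(mse)                    # back on the scale of medv

mae_median <- mean(abs(medv - median(medv)))  # MAE of the median baseline
print(c(mae = mae, mse = mse, rmse = rmse, mae_median = mae_median))
```

The MSE of the mean predictor is the variance of the outcome (with divisor n rather than n - 1), so any model worth using should achieve an MSE below that figure.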

6 Transmitters and Receivers

  • Function (mathematics)
  • Inverse function

6.1 What makes a good receiver?

6.1.1 Risk Analysis

  • PRML 1.5 “Decision Theory”
  • ESL 2.4 Statistical Decision Theory

6.1.2 Bias-Variance Tradeoff

6.1.4 No Free Lunch

6.2 Omitted Variable Bias

6.3 Feature Selection / Included Variable Bias

6.4 Feature Engineering

6.5 Bandwidth / Underdetermination

6.6 Curse of Dimensionality

