Overview

In this tutorial, we will learn to classify spectral data using the Support Vector Machine (SVM) method.

Objectives

After completing this tutorial, you will be able to:

  • Classify spectral remote sensing data using Support Vector Machine (SVM).

Install Python Packages

  • numpy
  • gdal
  • matplotlib
  • scipy
  • scikit-learn (sklearn)

Download Data

Download the spectral classification teaching data subset

Additional Materials

This tutorial was prepared in conjunction with a presentation on spectral classification that can be downloaded.

Download Dr. Paul Gader’s Classification 1 PPT

Download Dr. Paul Gader’s Classification 2 PPT

Download Dr. Paul Gader’s Classification 3 PPT

import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from scipy import linalg
from scipy import io
from sklearn import linear_model as lmd
InFile1          = 'LinSepC1.mat'
InFile2          = 'LinSepC2.mat'
C1Dict           = io.loadmat(InFile1)
C2Dict           = io.loadmat(InFile2)
C1               = C1Dict['LinSepC1']
C2               = C2Dict['LinSepC2']
NSampsClass    = 200
NSamps         = 2*NSampsClass
### Set Target Outputs ###
TargetOutputs                     =  np.ones((NSamps,1))
TargetOutputs[NSampsClass:NSamps] = -TargetOutputs[NSampsClass:NSamps]
AllSamps     = np.concatenate((C1,C2),axis=0)
AllSamps.shape
(400, 2)
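
Before fitting a model it can help to look at the data. Below is a quick scatter plot of the two classes (a small sketch, assuming the 200 x 2 arrays C1 and C2 loaded above).

### Scatter plot of the two classes
plt.figure()
plt.scatter(C1[:,0], C1[:,1], c='b', label='Class 1 (+1)')
plt.scatter(C2[:,0], C2[:,1], c='r', label='Class 2 (-1)')
plt.legend()
plt.show()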
#import sklearn
#sklearn.__version__
#LinMod = lmd.LinearRegression.fit?   # the trailing ? shows the docstring in IPython/Jupyter only
LinMod = lmd.LinearRegression.fit
M = lmd.LinearRegression()
print(M)
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
### Fit the model and compute R^2 on the training samples
### (equivalent to M.fit(AllSamps, TargetOutputs) and M.score(AllSamps, TargetOutputs))
LinMod = lmd.LinearRegression.fit(M, AllSamps, TargetOutputs, sample_weight=None)
R = lmd.LinearRegression.score(LinMod, AllSamps, TargetOutputs, sample_weight=None)
print(R)
0.911269176982
LinMod
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
w = LinMod.coef_
w
array([[ 0.81592447,  0.94178188]])
w0 = LinMod.intercept_
w0
array([-0.01663028])
### Question:  How would we compute the outputs of the regression model?
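
One possible answer (a minimal sketch): call the fitted model's predict method, or apply the learned weights directly.

### Two equivalent ways to compute the regression outputs
Outputs       = LinMod.predict(AllSamps)
OutputsManual = np.dot(AllSamps, w.T) + w0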

Kernels

Now we'll use support vector machines (SVM) for classification.

from sklearn.svm import SVC
### SVC wants a 1d array, not a column vector
Targets = np.ravel(TargetOutputs)
InitSVM = SVC()
InitSVM
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
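
Note that SVC defaults to the radial basis function (RBF) kernel, as the parameters above show. Other kernels can be selected explicitly; for example, a linear kernel is a reasonable choice for this linearly separable 2-D data set (a small sketch, not part of the original lesson).

### Specify the kernel explicitly, e.g. 'linear', 'poly', or 'rbf'
LinearSVM = SVC(kernel='linear', C=1.0)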
TrainedSVM = InitSVM.fit(AllSamps, Targets)
y = TrainedSVM.predict(AllSamps)
plt.figure(1)
plt.plot(y)
plt.show()

d = TrainedSVM.decision_function(AllSamps)
plt.figure(1)
plt.plot(d)
plt.show()
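
The first plot shows the predicted label for each sample and the second shows each sample's decision value (its signed position relative to the decision boundary). To quantify how well the SVM separates the training data, compare the predictions to the targets (a minimal sketch):

### Fraction of training samples classified correctly
TrainAcc = np.mean(y == Targets)
print(TrainAcc)
### equivalently: TrainedSVM.score(AllSamps, Targets)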

Including Outliers

We can also try the SVM on real spectral data, which contain outliers.

Let’s start by looking at some spectra.

### Look at some Pine and Oak spectra from
### NEON Site D03 Ordway-Swisher Biological Station
### at UF
### Pinus palustris
### Quercus virginiana
InFile1 = 'Pines.mat'
InFile2 = 'Oaks.mat'
C1Dict  = io.loadmat(InFile1)
C2Dict  = io.loadmat(InFile2)
Pines   = C1Dict['Pines']
Oaks    = C2Dict['Oaks']
WvFile  = 'NEONWvsNBB.mat'
WvDict  = io.loadmat(WvFile)
Wv      = WvDict['NEONWvsNBB']
Pines.shape
(809, 346)
Oaks.shape
(1731, 346)
NBands=Wv.shape[0]
print(NBands)
346

Notice that these data sets are unbalanced: there are 809 Pine spectra but 1,731 Oak spectra. Below we handle this by drawing 600 training spectra from each class.
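
If you instead train on all of the available spectra, one way to account for the imbalance is scikit-learn's class_weight option (a hedged sketch, not part of the original lesson).

### Weight each class inversely to its frequency so the larger Oak class
### does not dominate the fit
BalancedSVM = SVC(class_weight='balanced')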

NTrainSampsClass = 600
NTestSampsClass  = 200
### Labels for the 1200 training spectra: -1 for the 600 Pines, +1 for the 600 Oaks
Targets          = np.ones((1200,1))
Targets[range(600)] = -Targets[range(600)]
Targets             = np.ravel(Targets)
print(Targets.shape)
(1200,)
plt.figure(111)
plt.plot(Targets)
plt.show()

TrainPines = Pines[0:600,:]
TrainOaks  = Oaks[0:600,:]
#TrainSet   = np.concatenate?
TrainSet   = np.concatenate((TrainPines, TrainOaks), axis=0)
print(TrainSet.shape)
(1200, 346)
plt.figure(3)
### Plot Pine Training Spectra ###
plt.subplot(121)
plt.plot(Wv, TrainPines.T)
plt.ylim((0.0,0.8))
plt.xlim((Wv[1], Wv[NBands-1]))
### Plot Oak Training Spectra ###
plt.subplot(122)
plt.plot(Wv, TrainOaks.T)
plt.ylim((0.0,0.8))
plt.xlim((Wv[1], Wv[NBands-1]))
plt.show()
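
To compare the two classes more directly, it can also help to overlay the class-mean spectra (a small sketch using the training arrays above, not part of the original lesson).

### Mean training spectrum of each class
plt.figure()
plt.plot(Wv, np.mean(TrainPines, axis=0), label='Pine mean')
plt.plot(Wv, np.mean(TrainOaks, axis=0), label='Oak mean')
plt.legend()
plt.show()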

InitSVM = SVC()
TrainedSVM = InitSVM.fit(TrainSet, Targets)

d = TrainedSVM.decision_function(TrainSet)

plt.figure(4)
plt.plot(d)
plt.show()

Does this seem to be too good to be true?

TestPines = Pines[600:800,:]
TestOaks  = Oaks[600:800,:]
TestSet = np.concatenate((TestPines, TestOaks), axis=0)
print(TestSet.shape)
(400, 346)
dtest = TrainedSVM.decision_function(TestSet)
plt.figure(5)
plt.plot(dtest)
plt.show()
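
To put a number on it, we can build test labels the same way the training labels were built and compare them to the classifier's predictions (a minimal sketch using the arrays defined above).

### Test labels: -1 for the 200 Pine test spectra, +1 for the 200 Oak test spectra
TestTargets        = np.ones(400)
TestTargets[0:200] = -1
TestPredictions    = TrainedSVM.predict(TestSet)
print(np.mean(TestPredictions == TestTargets))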

Yeah, too good to be true… What can we do?

Error Analysis

Error analysis can be used to identify the characteristics of the errors a classifier makes: which samples it gets wrong, and why.
Try different hyperparameters (the "magic numbers" such as C and gamma) using cross-validation, as sketched below.
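
Here is a minimal sketch of a cross-validated hyperparameter search with scikit-learn's GridSearchCV; the grid values are illustrative assumptions, not part of the original lesson.

### In older scikit-learn versions GridSearchCV lives in sklearn.grid_search
from sklearn.model_selection import GridSearchCV

### Candidate values for the SVC hyperparameters (illustrative choices)
ParamGrid = {'C': [0.1, 1.0, 10.0, 100.0],
             'gamma': [0.001, 0.01, 0.1, 1.0]}

### 5-fold cross-validation over the training spectra
Search = GridSearchCV(SVC(kernel='rbf'), ParamGrid, cv=5)
Search.fit(TrainSet, Targets)
print(Search.best_params_)
print(Search.best_score_)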


Get Lesson Code
