Devoted to open data and open source in science and education.

View All Tutorials

This tutorial is a part of a series!

Click below to view all lessons in the series!


LiDAR (9)
R programming (70)
Remote Sensing (12)
Data Visualization (4)
Hyperspectral Remote Sensing (7)
Hierarchical Data Formats (HDF5) (24)
Spatial Data & GIS (18)
Time Series (15)
Phenology (7)
Raster Data (8)
Vector Data (6)
Metadata (1)
Git & GitHub (6)
(1) (1) (1)

Tutorial by R Package

dplyr (8)
ggplot2 (17)
h5py (1)
lubridate (time series) (6)
maps (1)
maptools (3)
plyr (2)
raster (32)
rasterVis (raster time series) (3)
rgdal (GIS) (23)
rgeos (5)
rhdf5 (21)
sp (7)
scales (4)
gridExtra (4)
ggtheme (0)
grid (2)
reshape2 (3)
plotly (6)

View ALL Tutorial Series

Twitter Youtube Github


R Bloggers

First, let’s load the required libraries.

# load libraries

# set working directory
# setwd("C:/Users/kdahlin/Dropbox/NEON_WWDI_2016/20160602")
# setwd("~/Documents/data/NEONDI-2016") # Mac
# setwd("~/data/NEONDI-2016")  # Windows

The first thing that we can do is load the functions that we want to use into our environment. This makes it easy to quickly access these functions without having to retype the function code into our script. This also makes it easy to maintain function code that we use regularly in ONE PLACE.

# import NEON aop R package

# library(devtools)
## install from github
# install_github("lwasser/neon-aop-package/neonAOP")

Import NEON Lidar Data Products

# import digital surface model (dsm) (top of the surface - includes trees and buildings)
dsm <- raster("NEONdata/D17-California/TEAK/2013/lidar/TEAK_lidarDSM.tif")
# import  digital terrain model (dtm), elevation
dtm <- raster("NEONdata/D17-California/TEAK/2013/lidar/TEAK_lidarDTM.tif") 

# import canopy height model (height of vegetation) 
chm <- raster("NEONdata/D17-California/TEAK/2013/lidar/TEAK_lidarCHM.tif")

Explore CHM

Next, let’s explore our CHM data.

# Check out the height distribution - do the values seem reasonable?
     main="Canopy Height\n Lower Teakettle, California") 

     main="Distribution of Canopy Height \nTeakettle, California",
     xlab="Tree Height (m)", 

## Warning in .hist1(x, maxpixels = maxpixels, main = main, plot = plot, ...):
## 32% of the raster cells were used. 100000 values used.

Valid Data Range

The valid range of data for a NEON canopy height models (CHM) is >= 2m. This is because the LiDAR system is not sensitive enough to distinguish objects that are closer than ~2m apart vertically.

Explore Veg Height data

Let’s have a close look at the vegetation height values. Do they seem reasonable?

# view chm mean and max
cellStats(chm, max)

## [1] 55.68

cellStats(chm, mean)

## [1] 5.62654

Create LiDAR Raster Brick

Next, we can stack the rasters together to create a brick.

# for simplicity later let's stack these rasters together
lidar.brick <- brick(dsm, dtm, chm)

Read Hyperspectral Data

Next, let’s read in HSI data.

We could use the NDVI data product, however, let’s calculate NDVI ourselves. Note, that there are many bands in HSI data within the red and near-infrared (NIR) region. Thus, simply selecting one band in each region is not always the most robust way to go.

# first identify the file of interest
f <- "NEONdata/D17-California/TEAK/2013/spectrometer/reflectance/Subset3NIS1_20130614_100459_atmcor.h5"

# then id the projection code
# define the CRS definition by EPSG code
epsg <- 32611

# create a list of bands
bands <- c(60,83)

# Let's read in a few spectral bands as a stack using a function
ndvi.stack <- create_stack(f, 
                           bands = bands,

# calculate ndvi
ndvi <- (ndvi.stack[[2]]-ndvi.stack[[1]]) / (ndvi.stack[[2]]+ndvi.stack[[1]])
names(ndvi) <- "TEAK_hsiNDVI"

# plot ndvi
     main="NDVI \nNEON Lower Teakettle Field Site")

Create Brick of LiDAR and NDVI

# Create a brick from the data <- brick(ndvi, lidar.brick)

## Error in compareRaster(x): different extent

Heterogeneous Data - Varying Extents

Looks like we have an error. Why didn’t the brick function work? Let’s check out the extents of both R objects - chm and ndvi.

# view extents

## class       : Extent 
## xmin        : 325963 
## xmax        : 326506 
## ymin        : 4102905 
## ymax        : 4103482


## class       : Extent 
## xmin        : 325963 
## xmax        : 326507 
## ymin        : 4102904 
## ymax        : 4103482

Dealing with Different Extents

The extents are slightly different. Let’s write a if statment that checks the extents and crops them in case they are different.

Note: this could become a function that you use over and over! If you used it that way you’d want to implement a crop of BOTH datasets just in case neither are perfectly within the overlap region.

# check the extents of the two raster layers -- if they are different
# crop the data 

if (extent(chm) == extent(ndvi)){
 } else {
    print("Extents are different, cropping data")
 overlap <- intersect(extent(ndvi), extent(lidar.brick))
  # now let's crop the lidar data to the HSI data
 lidar.brick <- crop(lidar.brick, overlap)
 ndvi <- crop(ndvi, overlap)

## [1] "Extents are different, cropping data"

Now let’s try to create a brick again.

# Create a brick from the data <- brick(ndvi, lidar.brick)
# make names nice!
all.names <- c("NDVI", "DSM", "DTM", "CHM" )
names( <- all.names

Import NEON NDVI Data Product

We can import the NEON NDVI data product next to use in our analysis. Let’s then compare that to NDVI that we calculated. Are they the same?

# import NEON NDVI product
ndvi2 <- raster("NEONdata/D17-California/TEAK/2013/spectrometer/veg_index/TEAK_NDVI.tif")

# compare the two products
ndvi.diff <- ndvi-ndvi2

# plot difference
     main="NDVI DIFFERENCE \nLower Teakettle Field Site")

Different Extents - Optional

Ok - now we’ve run in to this extent issue twice. Maybe it’s time to create a crop extent function.

# check the extents of the two raster layers -- if they are different
# crop the data 
same_extent <- function(raster1,raster2){
  if (extent(raster1) == extent(raster2)){
    print("Both rasters have the same extent.")
   } else {
   overlap <- intersect(extent(raster1), extent(raster2))
    # now let's crop both rasters
   # note it wold be better to figure out which raster is outside of
   # the overlap just in case the crop is time intensive
   raster1 <- crop(raster1, overlap)
   raster2 <- crop(raster2, overlap)
   # create a stack of the two rasters
   raster.stack <- stack(raster1, raster2)
   print("Extents are different. Cropping data")

Compare NEON NDVI to Calculated NDVI

Next, let’s compare the NEON NDVI data product to NDVI that we calculated from the same HDF5 file.

# compare NEON data product to our calculated NDVI

ndvi.diff <- ndvi-ndvi2
     main="Difference - NEON NDVI Product vs Our Calculated NDVI",

# view hist of differences
     main="histogram of differences")

Consider Slope & Aspect

Next, let’s test a simple hypothesis.

Because central California is:

  • Dry and
  • In the northern hemisphere,

we may expect to find taller, greener vegetation on north facing slopes than on south facing slopes. To test this, we need to:

  1. Import the NEON aspect data product.
  2. Isolate north and south facing slopes.
  3. Decide what we mean by ‘tall’ and ‘green’.
  4. Isolate tall, green pixels on north & south facing slopes.
  5. Examine the percent of pixels for tall green pixels on north vs south facing slopes.
  6. Run a t-test to compare all pixels.

Let’s get started.

Step 1. Import Aspect data product

# 1. Import aspect data product (derived from the DTM)
aspect <- raster("NEONdata/D17-California/TEAK/2013/lidar/TEAK_lidarAspect.tif")
# crop the data to the extent of the other rasters we are working with
aspect <- crop(aspect, extent(chm))

Data Tip: You can create an aspect layer from a DEM / DTM using the terrain function: terrain([[3]], opt = "aspect", unit = "degrees", neighbors = 8)

2. Create Aspect Mask

Next we will create a mask using the aspect data product. Values are as follows:

  • South Facing: 135-225 degrees
  • North Facing: 315-360 and 0-45 degrees

We can do this by reclassifying the aspect data product using the reclassify function in the raster package.

First we need to create matrix that has 3 columns. the first two columns represent the data values within a range that we want to classify. The third column contains the new value that we will assign that range of values to. For example:

  • 0 to 45 degrees should be classified as 1 (North Facing)
  • 135 to 225 degrees should be classified as 2 (South Facing)
  • Greater than 315 should be classified as 1 (North Facing)

Now we can create the mask.

# Create a classified aspect intermediate output 
# first create a matrix of values that represent the classification ranges
# North face = 1
# South face = 2
# classify classes start to the RIGHT of the beginning value. So we start at -99
# to capture zeros
class.m <- c(-.99, 45, 1, 
             45, 135, NA, 
             135, 225, 2,  
             225 , 315, NA, 
             315, 360, 1)
# reshape into a matrix
rcl.m <- matrix(class.m, 

##        [,1] [,2] [,3]
## [1,]  -0.99   45    1
## [2,]  45.00  135   NA
## [3,] 135.00  225    2
## [4,] 225.00  315   NA
## [5,] 315.00  360    1

# classify the aspect product using the classification matrix
asp.ns <- reclassify(aspect, rcl.m)
# set 0 values to NA
asp.ns[asp.ns==0] <- NA

Plot aspect

# define the extetn of the map -
# this is used to place the legend on the plot.
ns.extent <- extent(asp.ns)

# plot data
     main="North and South Facing Slopes \nNEON Lower Teakettle Field Site",

# force a border
plot(extent(asp.ns), add=T) 
# allow legend to plot outside of bounds

legend((par()$usr[2] + 20), ns.extent@ymax-100, # set xy legend location
       legend = c("North", "South"),
       fill = c("blue", "green"), 
       bty="n") # turn off border

North / South Facing Slopes

Next, we can create a north and south facing mask object. A mask is a layer where the pixels that you want to exclude are set to NA. The pixels that you wish to include in your analysis have a value. In this case, that value is 1 if north facing and 2 if south facing.

# create north facing mask object
north.facing <- asp.ns==1
north.facing[north.facing == 0] <- NA

# Create south facing mask object
south.facing <- asp.ns==2
south.facing[south.facing == 0] <- NA

Export North/South Aspect GeoTIFF

Before we go any further, let’s export a GeoTIFF. This could be useful for another analysis.

# export geotiff 
            overwrite = TRUE,
            NAflag = -9999)

3. Identify Vegetation Metrics

Now we want to determine what defines “tall” and “green”. We can explore histograms of our data and use descriptive statistics to determine what values might make the most sense.

# histogram of tree ht
     main="Distribution of Canopy Height Model (CHM) values \nNEON Lower Teakettle Field Site",

# get mean, min max stats for all layers <- data.frame(t(summary(, 
                                       na.rm=T)))$mean <- ht.mean <- cellStats(, 
                                            na.rm=T)$sd <- ht.mean <- cellStats(, 

row.names( <- all.names

# view data.frame

##              Min.     X1st.Qu.       Median     X3rd.Qu.         Max. NA.s
## NDVI   -0.1495505    0.1341227    0.4074749    0.6767036    0.9049774    0
## DSM  2172.8298340 2283.6398926 2310.2099609 2328.3000488 2391.8298340    0
## DTM  2172.8298340 2277.0100098 2306.5600586 2322.7900391 2385.2299805    0
## CHM     0.0000000    0.0000000    0.0000000    8.2100000   55.6800003  855
##              mean         sd
## NDVI    0.4126061  0.2745163
## DSM  2305.6755206 37.6958661
## DTM  2301.1770607 39.1300683
## CHM     5.6265399 10.0842808

Calculate Tall Trees Threshold

Note that the data aren’t normally distributed - something to consider when you are determining what your thresholds are.

Uncertainty discussion: selecting thresholds.

# create threshold dataframe
thresholds <- data.frame(id=1)

# let's be semi-robust and call 'tall' trees those with mean + 1 sd
thresholds$height <-["CHM","mean"] +["CHM", "sd"]

## [1] 15.71082

Next, look at NDVI.

# now let's look at ndvi
     main="Distribution of NDVI values\n Teakettle",

# this is a nice bimodal data set, so let's just take the top 1/3 of the data
# or manually calculate the top third
thresholds$greenRange <-["NDVI","Max."] -["NDVI","Min."]
thresholds$greenThresh <-["NDVI","Max."] - (thresholds$greenRange/3)

# or manually calculate mean + 1 sd
# thresholds$greenThresh <-["NDVI","mean"] +["NDVI","sd"]

4. Calculate Percent of Tall, Green Pixels

Next, let’s calculate the percent of tall, green pixels that occur on north and south facing slopes respectively. Our pixels are exactly 1 x 1 m in size, thus we can use the % of pixels as a proxy for % area.

Remember that 1 = North Facing and 2 = South Facing in our classified aspect object asp.ns.

# North = 1 and South facing = 2, calculate total pixels
north.count <- freq(asp.ns, value =1)
south.count <- freq(asp.ns, value =2)

# note there's more south facing area in this image than north facing

# create a new layer with pixels that are north facing, above the green threshold and
# above the CHM height threshold <- asp.ns == 1  & 
          [[1]] >= thresholds$greenThresh & 
          [[4]] >= thresholds$height

# assign values of 0 to NA so this becomes a mask[ == 0] <- NA

# how many pixels fit the "north, tall green" criteria? <- freq(, value =1)

# repeat the same steps for south facing slopes. Note
# we are repeating code - this could become a nice function! <- asp.ns == 2 & 
          [[1]] >= thresholds$greenThresh & 
          [[4]] >= thresholds$height <- freq(, value=1)[ == 0] <- NA

# divide the number of pixels that are green by the total north facing pixels <-, value=1) <-, value=2)

# if we look at these fractions, >11% of the pixels on north facing slopes should
# meet our tall and green criteria, while <6% of the pixels on south facing
# slopes do. So that's reassuring. (using original data set)

Plot Color Infrared (CIR) Image

Next, let’s have a look at the site that we are working with. We can use the hyperspectral remote sensing data to plot a color infrared image.

We will use the following bands:

Color Band Number Wavelength    
Blue 35 ~ 550nm    
Green 60 ~ 550nm    
Near-Infrared 83 ~ 550nm    

We can use the create_stack function that is a part of the NEON AOP R package of functions to quickly import the three bands. Then we can use plotRGB to plot the bands as an RGB image.

f <- "NEONdata/D17-California/TEAK/2013/spectrometer/reflectance/Subset3NIS1_20130614_100459_atmcor.h5"

# define the CRS definition by EPSG code
epsg <- 32611

# create a list of bands
bands <- c(83, 60, 35)

# Let's read in a few spectral bands as a stack using a function
cir.stack <- create_stack(file=f,
                          bands = bands,

# ignore reflectance values > 1
cir.stack[cir.stack > 1] <- NA

# plot cir image
        scale = 1, 
        stretch = "lin")

     col = "cyan", 
     add = T, 
     legend = F)

     col = "blue", 
     add = T, 
     legend = F)

# (5) let's do some stats! t-test and boxplots of veg height and greenness 
# distributions in north versus south facing parts of scene.

# let's start with NDVI - isolate NDVI on north and south facing slopes
north.NDVI <- mask([[1]], north.facing)
south.NDVI <- mask([[1]], south.facing)

Grab Pixel Metrics

Next, let’s extract pixel values for our masked areas of interest.

## get values and coerce to north values to dataframe
north.ndvi.df <- na.omit(
north.ndvi.df$aspect <- rep("north", length(north.ndvi.df[,1]))
names(north.ndvi.df) <- c("NDVI","aspect")

south.ndvi.df <- na.omit(
south.ndvi.df$aspect <- rep("south", length(south.ndvi.df[,1]))
names(south.ndvi.df) <- c("NDVI","aspect")

ndvi.df <- rbind(north.ndvi.df, south.ndvi.df)
# convert aspect to factor - NOTE you don't have to do this
ndvi.df$aspect <- as.factor(ndvi.df$aspect)

boxplot(NDVI ~ aspect, 
        data = ndvi.df, 
        col = "cornflowerblue", 
        main = "NDVI on North versus South facing slopes")

# and now a t-test - note that since these aren't normally distributed, this
# might not be the best approach, but ok for a quick assessment.
NDVI.ttest <- t.test(north.ndvi.df$NDVI, 
                     alternative = "greater")

Challenge Activity: Vegetation Height

Your turn! Using the technique that we used above, run the same analysis but this time, use tree height instead of NDVI as the variable of interest. Create a boxplot of treeheight compared to aspect. Then run a t-test. Are NDVI and tree height related? Why might this be?

## 	Welch Two Sample t-test
## data:  north.veght.df$veght and south.veght.df$veght
## t = 75.357, df = 41602, p-value < 2.2e-16
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  4.316169      Inf
## sample estimates:
## mean of x mean of y 
##  6.069046  1.656562

Notice, once again we are repeating code. This would make for a nice function! If it’s a set of functions, we could have changed the methods in one place from NDVI to tree height and then simply re-run the code for the challenge!

Get Lesson Code

(some browsers may require you to right click.)