NEON EDUCATION bio photo

NEON EDUCATION

Devoted to open data and open source in science and education.

View All Tutorials

This tutorial is a part of a series!

Click below to view all lessons in the series!

Tags

R programming (50)
Hierarchical Data Formats (HDF5) (11)
Spatial Data & GIS (19)
LiDAR (5)
Remote Sensing (9)
Data Visualization (4)
Hyperspectral Remote Sensing (7)
Time Series (15)
Phenology (7)
Raster Data (8)
Vector Data (6)
Metadata (1)
Git & GitHub (6)
(1) (1)

Tutorial by R Package

dplyr (7)
ggplot2 (17)
h5py (2)
lubridate (time series) (6)
maps (1)
maptools (1)
plyr (2)
raster (25)
rasterVis (raster time series) (3)
rgdal (GIS) (22)
rgeos (2)
rhdf5 (11)
sp (5)
scales (4)
gridExtra (4)
ggtheme (0)
grid (2)
reshape2 (3)
plotly (5)

View ALL Tutorial Series




Twitter Youtube Github


Blog.Roll

R Bloggers

Overview

R Skill Level: intermediate

Goals / Objectives

After completing this activity, you will:
  1. Understand how HDF5 files can be created and structured in R using the rhdf5 libraries.
  2. Understand the 3 key HDF5 elements: the HDF5 file itself and groups and datasets.
  3. Understand how to add and read attributes from an HDF5 file.

What You'll Need

  • R or R studio installed.
  • Recommended Background: Consider reviewing the documentation for the RHDF5 libraries

Data to Download

We will use the file below in the optional challenge activity at the end of this tutorial.
Download NEON Teaching Data Subset: Field Site Spatial Data
These remote sensing data files provide information on the vegetation at the National Ecological Observatory Network's San Joaquin Experimental Range and Soaproot Saddle field sites. This data is intended for educational purposes, for access to all the data for research purposes visit the NEON Data Portal.

A Brief Review - About HDF5

The HDF5 file can store large, heterogeneous datasets that include metadata. It also supports efficient data slicing, or extraction of particular subsets of a dataset which means that you don’t have to read large files read into the computers memory / RAM in their entirety in order work with them.

Read more about HDF5 here.

HDF5 in R

To access HDF5 files in R, we will use the rhdf5 library which is part of the Bioconductor suite of R libraries. It might also be useful to install the free HDF5 viewer which will allow you to explore the contents of an HDF5 file using a graphic interface.

More about working with HDFview and a hands-on activity here.

First, let’s get R setup. We will use the RHDF5 library.

# To access HDF5 files in R, we will use the rhdf5 library which is part of the 
#Bioconductor suite of R libraries.

#install rhdf5 package
#source("http://bioconductor.org/biocLite.R")
#biocLite("rhdf5")

#Call the R HDF5 Library
library("rhdf5")

Read more about the rhdf5 package here.

Create an HDF5 File in R

First, let’s create a basic H5 file with one group and one dataset in it.

# Create hdf5 file
h5createFile("vegData.h5")

## [1] TRUE

h5createFile()

## Error in h5createFile(): argument "file" is missing, with no default

#create a group called aNEONSite within the H5 file
h5createGroup("vegData.h5", "aNEONSite")

## [1] TRUE

#view the structure of the h5 we've created
h5ls("vegData.h5")

##   group      name     otype dclass dim
## 0     / aNEONSite H5I_GROUP

Next, let’s create some dummy data to add to our H5 file.

# create some sample, numeric data 
a <- rnorm(n=40, m=1, sd=1) 
someData <- matrix(a,nrow=20,ncol=2)

Write the sample data to the H5 file.

# add some sample data to the H5 file located in the aNEONSite group
# we'll call the dataset "temperature"
h5write(someData, file = "vegData.h5", name="aNEONSite/temperature")

# let's check out the H5 structure again
h5ls("vegData.h5")

##        group        name       otype dclass    dim
## 0          /   aNEONSite   H5I_GROUP              
## 1 /aNEONSite temperature H5I_DATASET  FLOAT 20 x 2

View a “dump” of the entire HDF5 file. NOTE: use this command with CAUTION if you are working with larger datasets!

# we can look at everything too 
# but be cautious using this command!
h5dump("vegData.h5")

## $aNEONSite
## $aNEONSite$temperature
##              [,1]        [,2]
##  [1,]  0.57002015  1.65021857
##  [2,] -0.11734323  0.67155051
##  [3,]  1.09386684  1.50329628
##  [4,]  0.67163631  0.30794922
##  [5,]  0.95862965 -0.78184862
##  [6,] -0.12896460  0.59268695
##  [7,]  0.05685202  2.28120086
##  [8,]  3.26047306  0.88577061
##  [9,] -0.30819170  0.47880539
## [10,]  0.96746933  1.82340641
## [11,] -0.54678513  2.22194159
## [12,]  1.29111825  1.05592524
## [13,]  0.16009552  0.41711324
## [14,]  1.05293700  0.07952516
## [15,]  1.92990794 -0.41059841
## [16,]  1.37178810  1.29995404
## [17,]  0.43722480  1.64796389
## [18,]  2.28480893  1.72972411
## [19,]  1.33004885 -0.61370131
## [20,]  1.99261923  1.64787132

#Close the file. This is good practice.
H5close()

Add Metadata (attributes)

Let’s add some metadata (called attributes in HDF5 land) to our dummy temperature data. First, open up the file.

#open the file, create a class
fid <- H5Fopen("vegData.h5")
#open up the dataset to add attributes to, as a class
did <- H5Dopen(fid, "aNEONSite/temperature")

# Provide the NAME and the ATTR (what the attribute says) 
# for the attribute.
h5writeAttribute(did, attr="Here is a description of the data",
                 name="Description")
h5writeAttribute(did, attr="Meters",
                 name="Units")

#let's add some attributes to the group
did2 <- H5Gopen(fid, "aNEONSite/")
h5writeAttribute(did2, attr="San Joaquin Experimental Range",
                 name="SiteName")
h5writeAttribute(did2, attr="Southern California",
                 name="Location")

#close the files, groups and the dataset when you're done writing to them!
H5Dclose(did)
H5Gclose(did2)
H5Fclose(fid)

Working with an HDF5 File in R

Now that we’ve created our H5 file, let’s use it! First, let’s have a look at the attributes of the dataset and group in the file.

#look at the attributes of the precip_data dataset
h5readAttributes(file = "vegData.h5", 
                 name = "aNEONSite/temperature")

## $Description
## [1] "Here is a description of the data"
## 
## $Units
## [1] "Meters"

#look at the attributes of the aNEONsite group
h5readAttributes(file = "vegData.h5", 
                 name = "aNEONSite")

## $Location
## [1] "Southern California"
## 
## $SiteName
## [1] "San Joaquin Experimental Range"

# let's grab some data from the H5 file
testSubset <- h5read(file = "vegData.h5", 
                 name = "aNEONSite/temperature")

testSubset2 <- h5read(file = "vegData.h5", 
                 name = "aNEONSite/temperature",
                 index=list(NULL,1))
H5close() Once we've extracted data from our H5 file, we can work with it in R. 


#create a quick plot of the data
hist(testSubset2)

Challenge –

Time to test your skills. Open up the D17_2013_SJER_vegStr.csv in R.

  • Create a new HDF5 file called vegStructure.
  • Add a group in your HDF5 file called SJER.
  • Add the veg structure data to that folder.
  • Add some attributes the SJER group and to the data.
  • Now, repeat the above with the D17_2013_SOAP_vegStr csv.
  • Name your second group SOAP

Some code is below to remind you how to import a CSV into R.

#options(stringsAsFactors = FALSE)
#newData <- read.csv("D17_2013_SJER_vegStr.csv")

Get Lesson Code

(some browsers may require you to right click.)