A NEON #WorkWithData Event
Date: 13 August 2015 - Ecological Society of America Meeting
Location: Baltimore, Maryland - Baltimore Convention Center Rm 311
Scientists often need to create continuous datasets, in raster or gridded format, of biomass, carbon, vegetation height and other metrics from points sampled on the landscape. However, when converting points to pixels, there are many processing choices that can impact the uncertainty of derived raster datasets. Incomplete understanding of the uncertainty in derived products, in turn, impacts downstream analytical and model results and can lead to erroneous conclusions drawn from the data. This lunchtime brown-bag workshop will explore how different gridding methods and associated settings can impact rasters derived from sample points. We will use a LiDAR point cloud, which represents canopy height values, to create several raster grids using different point-to-pixel conversion methods. We will then quantify and assess differences in height values derived using these different methods.
Participants will leave the workshop with a better understanding of various point-to-pixel conversion methods (interpolators and other gridding methods), how to interpret the resulting pixel values, how to perform basic raster math, and some of the key questions we should ask ourselves before creating a seamless grid from a point-based dataset. ArcGIS will be the primary demonstration tool used in this workshop; however, all concepts can be applied using any program with gridding capabilities.
Things to Do Before the Workshop
This is a demonstration-based brown-bag, so nothing is required to attend this workshop! However, the data used in the workshop are available to download for further exploration.
Download The Data
Data to be posted soon...
Useful Background Materials:
Workshop Instructor
- Leah Wasser @leahawasser, Supervising Scientist, NEON, Inc
Workshop Fearless Instruction Assistants
- Natalie Robinson, Staff Scientist, NEON, Inc
- Claire Lunch @dr_lunch, Staff Scientist
- Kate Thibault @fluby, Senior Staff Scientist
Please tweet using the hashtag #WorkWithData during this workshop!
Also, you can tweet at <a href="https://twitter.com/neon_sci" target="_blank">@NEON_Sci</a>!
Please note that we are still developing the agenda for this workshop. The schedule below is subject to change.
| Time  | Topic |
|-------|-------|
| 11:30 | Spatial Gridding vs. Interpolation |
| 1:00  | Wrap Up! |
NOTE: we will not cover geostatistical methods in this brown bag!
About the Data
The LiDAR data that we will use for this brown-bag were collected over the NEON San Joaquin field site located in California. The data are natively in a point cloud format with X, Y and Z (elevation) values associated with each point. In this case, most points above the ground represent vegetation.
When gridded, the data create a Digital Surface Model (DSM), which represents elevation values at the top of the canopy (or buildings and other objects).
- More about Lidar Data.
- More about Digital Surface Models, Canopy Height Models and Digital Terrain Models here.
A Short Video - How LiDAR Works
## Workshop Approach
In this brown-bag we will work with a decimated (thinned) version of the lidar point cloud. We will compare a lidar dataset gridded at its native resolution, with all of the available points present, to the thinned dataset gridded using various interpolation methods.
Creating Surfaces From Points
Triangulated Irregular Network (TIN)
The Triangulated Irregular Network (TIN) is a vector-based surface in which sample points (nodes) are connected by a series of edges to create a triangulated surface. The TIN format remains the most true to the point distribution, density and spacing of a dataset. It may also yield the largest file size!
More on the TIN Format
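To make the TIN idea concrete, here is a minimal, hypothetical sketch (assuming NumPy; not from the workshop materials): within each triangle the TIN surface is planar, so a query point's value is a barycentric (area-weighted) average of the heights at the triangle's three nodes.

```python
import numpy as np

def tin_interpolate(tri_xy, tri_z, query):
    """Linear (planar) interpolation inside one TIN triangle.

    Hypothetical helper: solves for barycentric weights w such that
    the weighted sum of the vertices equals the query point and the
    weights sum to 1, then applies those weights to the node heights.
    """
    A = np.vstack([np.asarray(tri_xy, float).T, np.ones(3)])  # columns: [x, y, 1]
    b = np.array([query[0], query[1], 1.0])
    w = np.linalg.solve(A, b)  # barycentric weights, sum to 1
    return float(w @ np.asarray(tri_z, float))

# One triangle with a canopy-height value at each node (made-up numbers)
xy = [(0, 0), (2, 0), (0, 2)]
z = [10.0, 20.0, 30.0]
h = tin_interpolate(xy, z, (0.5, 0.5))
```

Because the surface is planar per triangle, the TIN never estimates above or below the node values of the triangle containing the query point.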
A raster is a dataset made up of cells or pixels. Each pixel represents a value associated with a region on the earth’s surface. We can create a raster from points through a process sometimes called gridding. Gridding is the process of taking a set of points and using them to create a surface composed of a regular grid.
When creating a raster, you may choose to perform a direct gridding of the data. This means that you calculate one value for every cell in the raster where there are sample points. This value may be the mean of all points, the max, min, or some other mathematical function. All other cells will then have no-data values associated with them. This means you may have gaps in your data if the points are not well distributed, with at least one data point within the spatial coverage of each raster cell.
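Direct gridding can be sketched in a few lines; this is a minimal, hypothetical example (assuming NumPy), with made-up point coordinates, not the workshop dataset:

```python
import numpy as np

def direct_grid(x, y, z, cell_size, stat=np.max):
    """Directly grid points: one summary value per cell that contains points.

    Cells with no points receive NaN (no-data), so sparse or uneven
    point coverage leaves gaps in the output raster.
    """
    x, y, z = np.asarray(x), np.asarray(y), np.asarray(z)
    # Which cell (row, col) each point falls in
    col = ((x - x.min()) // cell_size).astype(int)
    row = ((y - y.min()) // cell_size).astype(int)
    grid = np.full((row.max() + 1, col.max() + 1), np.nan)
    for r, c in set(zip(row, col)):
        mask = (row == r) & (col == c)
        grid[r, c] = stat(z[mask])  # e.g. max height per cell
    return grid

# Three points in a 2x2 grid: two cells get values, two stay no-data
x = [0.5, 1.5, 0.6]
y = [0.5, 1.5, 0.7]
z = [10.0, 20.0, 12.0]
grid = direct_grid(x, y, z, cell_size=1.0)
```

Swapping `stat` for `np.mean` or `np.min` changes the summary function without changing the gaps: any cell with no points stays NaN.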
Spatial interpolation involves calculating the value for a query point (or a raster cell) with an unknown value from a set of known sample point values that are distributed across an area. There is a general assumption that points closer to the query point are more strongly related to that cell than those further away. However, this assumption is applied differently across different interpolation functions.
Deterministic vs. Probabilistic Interpolators
There are two main types of interpolation approaches:
- Deterministic: create surfaces from measured points using a weighted distance or area function.
- Probabilistic (Geostatistical): utilize the statistical properties of the measured points. Probabilistic techniques quantify the spatial auto-correlation among measured points and account for the spatial configuration of the sample points around the prediction location.
We will focus on deterministic methods in this workshop.
Deterministic Interpolation Methods
Let’s look at a few different deterministic interpolation methods to understand how different methods can affect an output raster.
Inverse Distance Weighted (IDW)
Inverse distance weighted interpolation (IDW) calculates the values of a query point (a cell with an unknown value) using a linearly weighted combination of values from nearby points.
Key Attributes of IDW Interpolation:
- Raster is derived based upon an assumed linear relationship between the location of interest and the distance to surrounding sample points.
- Sample points closest to the cell of interest are assumed to be more related to its value than those further away.
- Exact: the interpolated surface passes through the sample points, and values cannot be estimated beyond the min/max range of the data point values.
- Can only estimate within the range of EXISTING sample point values; this can yield "flattened" peaks and valleys, especially if the data didn't capture those high and low points.
- Cell values are weighted averages of the surrounding sample point values.
- Good for data that are equally distributed and dense. Assumes a consistent relationship between points and does not accommodate trends within the data (e.g., east to west).
The power setting in IDW interpolation specifies how strongly points further away from the cell of interest impact the calculated value for that cell. Power values range from 0 to 3+, with the default setting generally being 2. A larger power value produces a more localized result: values further away from the cell have less impact on its calculated value, while values closer to the cell have more. A smaller power value produces a more averaged result, where sample points further away from the cell have a greater impact on the cell's calculated value.
The Impacts of Power
- Lower power values: a more averaged result, and potentially a smoother surface. As power decreases, the influence of distant sample points grows, yielding a smoother, more averaged surface.
- Higher power values: a more localized result, with potentially more peaks around sample point locations. As power increases, the influence of sample points falls off more rapidly with distance, so output cell values become more localized and less averaged.
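The power behavior described above can be sketched in a few lines. This is a minimal, hypothetical implementation (assuming NumPy), not the ArcGIS tool; the sample points and values are made up:

```python
import numpy as np

def idw(xy_known, z_known, xy_query, power=2.0):
    """Inverse distance weighted estimate at query points.

    Higher `power` localizes the result so nearby samples dominate;
    lower `power` averages in more distant samples.
    """
    xy_known = np.asarray(xy_known, float)
    z_known = np.asarray(z_known, float)
    xy_query = np.asarray(xy_query, float)
    # Distance from every query point to every known sample point
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    # Guard against zero distance so a query at a sample returns that sample
    w = 1.0 / np.maximum(d, 1e-12) ** power
    return (w * z_known).sum(axis=1) / w.sum(axis=1)

# Two samples 10 units apart; query a cell 1 unit from the low-value sample
pts, vals, query = [[0, 0], [10, 0]], [0.0, 100.0], [[1.0, 0.0]]
local = idw(pts, vals, query, power=3.0)   # nearby point dominates
smooth = idw(pts, vals, query, power=0.5)  # far point still contributes
```

With the high power the estimate stays close to the nearby sample's value of 0; with the low power the distant sample pulls the estimate well up toward the average. In both cases the result stays within the min/max of the sample values, as IDW is an exact interpolator.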
IDW Summary Take Home Points
**GOOD FOR:**
- Data whose distribution is strongly (and linearly) correlated with distance. For example, noise falls off very predictably with distance.
- Provides explicit control over the influence of distance (compared to spline or kriging).
NOT AS GOOD FOR:
- Data whose distribution depends on more complex sets of variables, because IDW can account only for the effects of distance.
- You can create a smoother surface by decreasing the power, increasing the number of sample points used, or increasing the search radius.
- Create a surface that more closely represents the peaks and dips of your sample points by decreasing the number of sample points used, decreasing the search radius, or increasing the power.
- Increase IDW surface accuracy by adding breaklines to the interpolation process that serve as barriers. Breaklines represent abrupt changes in elevation, such as cliffs.
Spline Interpolation
Spline interpolation fits a curved surface through the sample points of your dataset. Imagine stretching a rubber sheet across your points and gluing it to each sample point along the way. Unlike IDW, spline can estimate values above and below the min and max values of your sample points. Thus, it is good for estimating high and low values not already represented in your data.
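The over/undershoot behavior is easy to demonstrate in one dimension. Below is a minimal, hypothetical sketch (assuming NumPy): a 1-D natural cubic spline rather than the 2-D surface splines used in GIS tools, fit through a single spike of sample values. The sample values never go below 0, but the spline does.

```python
import numpy as np

def natural_cubic_spline(y):
    """Evaluate a natural cubic spline through (i, y[i]), unit spacing.

    Hypothetical helper: solves the standard tridiagonal system for the
    second derivatives M at the knots (natural ends: M[0] = M[-1] = 0),
    then evaluates the piecewise cubic on a fine grid.
    """
    y = np.asarray(y, float)
    n = len(y)
    A = np.zeros((n, n))
    b = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0
    for i in range(1, n - 1):
        # M[i-1] + 4*M[i] + M[i+1] = 6*(y[i-1] - 2*y[i] + y[i+1])
        A[i, i - 1], A[i, i], A[i, i + 1] = 1.0, 4.0, 1.0
        b[i] = 6.0 * (y[i - 1] - 2.0 * y[i] + y[i + 1])
    M = np.linalg.solve(A, b)
    xs = np.linspace(0, n - 1, 50 * (n - 1) + 1)
    i = np.minimum(xs.astype(int), n - 2)  # segment index for each x
    t = xs - i                             # position within segment (h = 1)
    vals = (M[i] * (1 - t) ** 3 + M[i + 1] * t ** 3) / 6.0 \
        + (y[i] - M[i] / 6.0) * (1 - t) + (y[i + 1] - M[i + 1] / 6.0) * t
    return xs, vals

# A single spike: sample values span [0, 10], yet the fitted spline
# dips below 0 on either side of the spike.
xs, vals = natural_cubic_spline([0.0, 0.0, 0.0, 10.0, 0.0, 0.0])
```

The dips below zero next to the spike are exactly the "elastic sheet" behavior described above, and also hint at the failure mode listed below: close points with large value differences make splines oscillate.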
Regularized vs Tension Spline
There are two types of curved surfaces that can be fit when using spline interpolation:
Tension spline: a flatter surface that forces estimated values to stay closer to the known sample points.
Regularized spline: a more elastic surface that is more likely to estimate above and below the known sample points.
SPLINE IS GOOD FOR:
- Estimating values outside of the range of sample input data.
- Creating a smooth continuous surface.
NOT AS GOOD FOR:
- Points that are close together and have large value differences. Slope calculations can yield over- and underestimation.
- Data with large, sudden changes in values that need to be represented (e.g., fault lines, extreme vertical topographic changes). IDW allows for breaklines; NOTE: some tools, like ArcGIS, have introduced a Spline with Barriers function in recent years.
More on Spline
Natural Neighbor Interpolation
Natural neighbor interpolation finds the closest subset of sample points to the query point of interest. It then applies weights to those points based on their proportionate areas, derived from the corresponding Voronoi polygons, to calculate an average estimated value. The natural neighbor interpolator adapts locally to the input data, using only points surrounding the query point of interest. Thus, there are no radius, number-of-points, or other settings needed when using this approach.
This interpolation method works equally well on regularly and irregularly spaced data.
From the Esri help:
> By comparison, a distance-based interpolator tool such as IDW (inverse distance weighted) would assign similar weights to the northernmost point and to the northeastern point based on their similar distance from the interpolation point. Natural neighbor interpolation, however, assigns weights of 19.12 percent and 0.38 percent, respectively, which is based on the percentage of overlap.
Notes about Natural Neighbor
- Local Interpolator
- Interpolated values fall within the range of values of the sample data
- Surface passes through input samples
- Supports breaklines
Natural Neighbor is good for:
- Data whose spatial distribution is variable (and also data that are equally distributed).
- Categorical data
- Providing a smoother output raster.
Natural Neighbor is not as good for:
- Data where the interpolation needs to be spatially constrained (to a particular number of points or distance).
- Data where sample points further away from or beyond the immediate “neighbor points” need to be considered in the estimation.
Please find some additional resources associated with gridding in GRASS GIS and QGIS:
- Interpolation plugin
- Rasterize (Vector to Raster)
The QGIS processing toolbox provides easy access to Grass commands.
- v.surf.idw - Surface interpolation from vector point data by Inverse Distance Squared Weighting
- v.surf.bspline - Bicubic or bilinear spline interpolation with Tykhonov regularization
- v.surf.rst - Spatial approximation and topographic analysis using regularized spline with tension
- v.to.rast - Converts (rasterizes) a vector layer into a raster layer
- v.delaunay - Creates a Delaunay triangulation from an input vector map containing points or centroids
- v.voronoi - Creates a Voronoi diagram from an input vector layer containing points
The above commands are just a few that support gridding in GRASS. You can also access GRASS via the command line.
The commands below can be used to create grids in R.