Extending the geographic extent of existing land cover data using active machine learning and covariate shift corrective sampling

Date and Time: 
Monday, April 4th, 2016
Center Green
Galen Maclaurin

Consistent land cover data provided at national and regional scales are increasingly relevant for a wide range of research topics, from landscape ecology to population dynamics. For example, the National Land Cover Database (NLCD) is a valuable resource for research conducted at broad geographic scales across the U.S. where survey- or field-based land cover data are not available. However, the national extent of the NLCD (and of similar databases produced in other countries) prevents studies from reaching across borders and thus limits potential applications at broader (e.g., multinational) scales. This paper presents a framework for the automated spatial extrapolation of a national land cover database such as the NLCD using Landsat imagery alone. Extrapolating high-quality land cover data offers a unique opportunity to efficiently generate data of similar quality for regions not originally covered. Extending the NLCD in the spatial domain based on remote sensing imagery alone poses a domain adaptation challenge known as covariate shift, in which the distribution of spectral information in the target data does not follow that of the source data. To overcome this problem, the algorithm implements a novel corrective sampling technique that facilitates the spatial extrapolation of land cover data. Using the corrected sample, an active machine learning routine with a maximum entropy classifier was implemented to replicate the NLCD for a different geographic extent. The framework was tested at three study sites to assess its stability under different landscape conditions and the overall generalizability of the approach. When compared against reference datasets, the extrapolated land cover data showed levels of overall agreement similar to those of the NLCD itself, indicating that the NLCD can be effectively extended to new geographic extents using the proposed framework.
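The abstract does not spell out how the corrective sampling or the active learning routine work. As a rough illustration only, the sketch below pairs two generic, textbook techniques aimed at the same problems. The first is importance-weighted resampling, where the density ratio p_target(x)/p_source(x) is estimated with a source-versus-target logistic classifier. The second is entropy-based uncertainty sampling with a multinomial logistic regression (i.e., maximum entropy) classifier. This is not the author's novel corrective sampling technique. All function names, hyperparameters (learning rate, sample sizes, number of rounds) and the synthetic 4-band, 3-class data are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def add_bias(X):
    return np.hstack([X, np.ones((len(X), 1))])

def fit_maxent(X, y, n_classes, lr=0.5, epochs=300, l2=1e-3):
    """Multinomial logistic regression (a maximum entropy classifier) via gradient descent."""
    Xb = add_bias(X)
    W = np.zeros((Xb.shape[1], n_classes))
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        grad = Xb.T @ (softmax(Xb @ W) - Y) / len(X) + l2 * W
        W -= lr * grad
    return W

def predict_proba(W, X):
    return softmax(add_bias(X) @ W)

def importance_weights(X_src, X_tgt):
    """Estimate p_tgt(x) / p_src(x) at each source pixel with a source-vs-target classifier."""
    X = np.vstack([X_src, X_tgt])
    d = np.r_[np.zeros(len(X_src), int), np.ones(len(X_tgt), int)]
    p = np.clip(predict_proba(fit_maxent(X, d, 2), X_src)[:, 1], 1e-6, 1 - 1e-6)
    # Posterior odds times the source/target size ratio approximates the density ratio.
    return p / (1 - p) * len(X_src) / len(X_tgt)

def corrective_sample(X_src, y_src, w, n, rng):
    """Draw n source pixels without replacement, with probability proportional to w."""
    idx = rng.choice(len(X_src), size=n, replace=False, p=w / w.sum())
    return X_src[idx], y_src[idx]

def entropy(P):
    return -(P * np.log(np.clip(P, 1e-12, 1.0))).sum(axis=1)

def active_learn(X, y, n_classes, n_init, n_query, n_rounds, rng):
    """Uncertainty sampling: each round, 'label' the n_query highest-entropy pool samples.

    y acts as the oracle (here, the existing labels of the source region).
    """
    labeled = list(rng.choice(len(X), size=n_init, replace=False))
    for _ in range(n_rounds):
        W = fit_maxent(X[labeled], y[labeled], n_classes)
        pool = np.setdiff1d(np.arange(len(X)), labeled)
        H = entropy(predict_proba(W, X[pool]))
        labeled += list(pool[np.argsort(H)[-n_query:]])
    return fit_maxent(X[labeled], y[labeled], n_classes), labeled

# Synthetic stand-in: 4 "bands", 3 classes from one fixed linear rule, so p(y|x) is
# shared and only p(x) differs between regions (pure covariate shift).
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))
X_src = rng.normal(size=(2000, 4))
X_tgt = rng.normal(loc=0.8, size=(1000, 4))
y_src, y_tgt = (X_src @ A).argmax(1), (X_tgt @ A).argmax(1)  # y_tgt used only to evaluate

w = importance_weights(X_src, X_tgt)
X_c, y_c = corrective_sample(X_src, y_src, w, 500, rng)
W, labeled = active_learn(X_c, y_c, 3, n_init=20, n_query=10, n_rounds=5, rng=rng)
acc = (predict_proba(W, X_tgt).argmax(1) == y_tgt).mean()
print(f"agreement on target: {acc:.2f}")
```

Both steps reuse the same softmax model so the sketch needs only NumPy. With real data, the rows of `X_src` and `X_tgt` would presumably be Landsat pixel features, and an established library classifier would likely replace the hand-written gradient descent.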

Speaker Description: 

I am a geospatial data scientist at the National Renewable Energy Laboratory (NREL) in Golden, CO, where I work on diverse problems in renewable energy involving spatiotemporal data. My recently completed PhD research in the Department of Geography at the University of Colorado-Boulder focused on image-based machine learning for spatial and temporal replication of land cover data.
