Fundamentals of GeoAI: Deep Learning for Geospatial Analysis

Rating: 4.5 (29 reviews)

Build U-Net models in PyTorch for satellite imagery, crop mapping, change detection, building segmentation and LiDAR

Created by Milan Janosov

Last updated 6/2026

English

What you'll learn

Build and train U-Net deep learning models in PyTorch for pixel-wise segmentation of satellite and aerial imagery from scratch.
Apply GeoAI to real-world use cases: crop mapping, temporal change detection, building segmentation, and LiDAR urban analysis.
Download and preprocess real satellite data from AWS Sentinel-2 and government LiDAR sources for deep learning pipelines.
Evaluate spatial deep learning models correctly using geographic train/test splits to prevent data leakage and ensure real-world generalization.
Create interactive geospatial maps with Folium to visualize and compare deep learning predictions across crop fields, buildings, and urban areas.

Course content

7 sections • 34 lectures • 3h 25m total length

Introduction (0:31)
Welcome to Fundamentals of GeoAI: Deep Learning for Geospatial Analysis.
This course is designed for anyone who wants to move beyond traditional geospatial analysis and into the world of deep learning applied to real spatial data. Whether you work in GIS, remote sensing, data science, or environmental analysis, GeoAI is rapidly becoming one of the most in-demand skill sets in the field — and this course gives you the practical foundation to work with it confidently.
We cover four real-world use cases from start to finish: crop mapping from Sentinel-2 satellite imagery, temporal change detection using a Siamese U-Net, building segmentation from ultra-high-resolution aerial imagery, and multi-class urban segmentation from LiDAR elevation data. Every module uses real, freely available datasets (from AWS and from Dutch and Scottish government portals), and every model is built from scratch in PyTorch.
No prior deep learning experience is required. Module 1 builds the complete foundation — from the basics of neural network architecture to a fully functional U-Net — before we apply it to increasingly complex geospatial problems across the remaining modules.
By the end of this course you will not just understand GeoAI conceptually — you will have built it, trained it, and applied it to data that matters.
Let's get started.
Setting Up (3:42)
Before starting the learning materials, we set up a brand-new Python environment with Anaconda Navigator and use a series of command-line prompts to install all the libraries needed for this course.
Course Data Sources (0:17)
All datasets used throughout this course are available for download below. For now, simply download and unzip the archive into your working directory — no further action is needed at this stage.
We will explore each data file in detail during its corresponding module, where you will also find step-by-step download instructions. The version provided here is identical to what is used in the recordings, ensuring full reproducibility of all results.
Course Source Code (0:13)
All Jupyter notebooks used throughout this course are available for download below. Simply download and unzip the archive into your working directory.
Each notebook corresponds to a specific module and is identical to the version used in the recordings, so you can follow along step by step or revisit any section at your own pace.

Module Introduction (0:15)
This module builds the foundation you need to work with GeoAI in practice. We start from first principles — how spatial data becomes tensors, how neural networks process information through weighted connections and activation functions, and what convolution actually means mathematically and visually.
From there, we move step by step from a single convolution operation to a full encoder-decoder architecture, arriving at U-Net — the backbone model used throughout this entire course. Every component is built hands-on in PyTorch: single filters, multiple filters, encoder blocks, decoder blocks, and finally the complete U-Net assembled from scratch.
By the end of this module, you will have trained your first U-Net model, evaluated its ability to generalize to unseen data, and previewed the four real-world use cases we will tackle in the modules ahead: crop mapping, temporal change detection, building segmentation, and LiDAR urban analysis.
No prior deep learning or PyTorch experience is required — everything is built from the ground up.
Foundations (20:09)
In this section, we introduce the core concepts behind GeoAI and neural networks. You will learn how spatial data becomes tensors, how neural networks process numerical inputs through weighted connections and activation functions, and what convolution actually is.
From Convolution to U-Net (11:18)
We move from the theory of basic convolution operations to full encoder-decoder architectures. By combining convolution, pooling, and upsampling, we arrive at U-Net, a model designed for pixel-wise segmentation of spatial data.
Neural Networks with PyTorch (24:54)
In this lecture, we progressively construct a complete convolutional neural network architecture — starting from first principles and culminating in a fully functional U-Net model for image segmentation.
We begin with the fundamentals of convolution by manually implementing a single filter in PyTorch and applying it to a small test image to understand how edge detection emerges from basic operations. We then extend this intuition to multiple filters, exploring how convolutional layers extract richer feature representations.
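The single-filter step can be sketched without any deep learning framework. The snippet below is an illustrative NumPy version, not the course notebook: the function name, the 6×6 test image, and the Sobel-style kernel are all assumptions. Like PyTorch's convolution layers, it computes a sliding weighted sum (cross-correlation) without flipping the kernel.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one kernel over a 2D image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the window currently under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 6x6 test image: dark left half, bright right half
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Sobel-style filter that responds to vertical edges
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])

edges = conv2d_valid(image, kernel)
print(edges)  # every row is [0, 4, 4, 0]: the response peaks at the edge
```

The output is non-zero only where the window straddles the dark-to-bright boundary, which is how edge detection emerges from a plain weighted sum.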
Building on this foundation, we develop an encoder block by combining convolution, activation, and pooling operations into a reusable module. We then stack multiple encoder blocks to create a deeper feature extractor using PyTorch’s Sequential class.
Next, we shift focus to the decoder, introducing transposed convolutions and implementing upsampling to reconstruct spatial resolution from compressed feature maps. With both encoder and decoder components in place, we assemble the complete U-Net architecture for image segmentation.
Finally, we implement the full training pipeline, train the model across multiple epochs while monitoring loss, and evaluate its ability to generalize by testing it on previously unseen images.
By the end of this lecture, you will have built, trained, and validated a complete U-Net model from scratch — gaining both architectural intuition and practical PyTorch implementation skills.
Use-case summaries (3:06)

Module Introduction (0:21)
This module takes you through a complete crop mapping workflow using real Sentinel-2 satellite imagery over an agricultural region in Hungary. We work with two time points — April and July 2025 — capturing how crops evolve through the growing season, and use that temporal difference as the foundation for classification.
We explore three approaches of increasing sophistication. First, simple color thresholding using hand-crafted spectral rules. Second, SLIC superpixel segmentation that groups pixels into field-like parcels, producing significantly cleaner boundaries than pixel-level classification. Third, a full U-Net deep learning model trained on 4-channel input combining RGB and NDVI, with a proper spatial train/test split to ensure honest evaluation on unseen geographic areas.
Satellite data is downloaded directly from the free AWS Earth Search STAC catalog — no accounts or manual downloads required. By the end of this module, you will have built an end-to-end deep learning pipeline for crop detection and compared all three methods side by side on an interactive Folium map.
Define Study Area (3:40)
For this module, we select an agricultural region in Hungary with diverse crop fields as our study area. Using latitude/longitude coordinates, we define a bounding box that encompasses several square kilometers of farmland for analysis.
Then, we create an interactive Folium map with satellite imagery as the basemap and add a red bounding box as an overlay. This lets you explore the study area visually and verify we've selected a region with clear agricultural patterns.
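As a rough illustration of the bounding-box step, the sketch below turns four corner coordinates into a GeoJSON polygon and estimates its area. The coordinates are hypothetical placeholders (not the course's actual study area), the helper names are invented for this example, and the area formula is a flat-earth approximation that is only reasonable for small boxes.

```python
import math

def bbox_to_geojson(min_lon, min_lat, max_lon, max_lat):
    """Return a GeoJSON Polygon (closed ring, lon/lat order) for a bounding box."""
    ring = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],  # repeat the first vertex to close the ring
    ]
    return {"type": "Polygon", "coordinates": [ring]}

def bbox_area_km2(min_lon, min_lat, max_lon, max_lat):
    """Approximate area using ~111.32 km per degree, scaled by cos(latitude)."""
    km_per_deg = 111.32
    mid_lat = math.radians((min_lat + max_lat) / 2)
    width_km = (max_lon - min_lon) * km_per_deg * math.cos(mid_lat)
    height_km = (max_lat - min_lat) * km_per_deg
    return width_km * height_km

# Hypothetical coordinates somewhere in Hungary, not the course's actual box
bbox = (20.00, 47.00, 20.05, 47.03)
polygon = bbox_to_geojson(*bbox)
print(round(bbox_area_km2(*bbox), 1), "km^2")
```

A polygon in this form can be passed to most mapping libraries as an overlay, and the area check is a quick way to confirm the box really covers "several square kilometers".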
Download and Prepare Sentinel-2 Data (8:40)
In this lecture, we prepare multi-temporal satellite imagery to analyze crop development across different stages of the growing season.
We begin by configuring two agricultural time periods — April 2025 (early growing season) and July 2025 (mid growing season). These two time points allow us to compare how crops evolve over time and capture different spectral signatures that are useful for classification.
Next, we connect to the AWS Earth Search STAC API to query cloud-free Sentinel-2 scenes over our study area. This open catalog provides direct access to large volumes of satellite imagery without requiring manual downloads or account setup.
We then download the Red, Green, and Blue spectral bands from the selected scenes, crop them to our bounding box, and reproject them into a common coordinate system. Since Sentinel-2 stores each band separately, we stack them into standard RGB images suitable for visualization and further analysis.
After that, we normalize the raw Sentinel-2 pixel values (0–10,000) to physical reflectance (0–1) by dividing by 10,000. This preprocessing step ensures that pixel values represent actual surface reflectance, providing stable and physically meaningful inputs for machine learning models.
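A minimal sketch of this scaling step, assuming the band has already been read into a NumPy array. The function name and the sample values are illustrative; the clipping of values above 10,000 is an added assumption, and any product-specific offset is ignored here.

```python
import numpy as np

def to_reflectance(digital_numbers, scale=10_000.0):
    """Convert raw integer pixel values to reflectance in [0, 1]."""
    reflectance = digital_numbers.astype(np.float32) / scale
    # Very bright surfaces can exceed the nominal range, so clip (assumption)
    return np.clip(reflectance, 0.0, 1.0)

# Hypothetical 2x2 single-band tile of raw values
raw_band = np.array([[0, 2500], [10_000, 12_000]], dtype=np.uint16)
print(to_reflectance(raw_band))  # 0.0, 0.25, 1.0 and 1.0 (clipped)
```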
Finally, we create side-by-side visualizations comparing April and July imagery with contrast enhancement. You will clearly observe the tonal differences of crop fields at different maturity stages — for example, greener crops in April transitioning toward yellow by July. This temporal variation is precisely what makes crop mapping and classification possible.
Color Thresholding (6:02)
In this lecture, we move from data preparation to simple, interpretable classification using spectral information derived from satellite imagery.
We begin by extracting the individual RGB channels and computing a vegetation index proxy by comparing green versus red reflectance. In addition, we calculate overall brightness to help distinguish bright bare soil from darker surfaces such as water or shadows.
Next, we implement a set of hand-crafted classification rules. A high green-to-red ratio indicates likely crop areas, high brightness without strong vegetation signals suggests bare soil, and very dark pixels are classified as water or infrastructure. These thresholds are manually tuned through trial and error to achieve reasonable performance on the scene.
Finally, we visualize the results using a three-panel comparison: the original RGB image, the classified map with color-coded classes, and an overlay combining both. This allows us to clearly see where the rule-based approach performs well — such as in homogeneous crop fields — and where it struggles, particularly around edges, shadows, and mixed pixels.
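The rule set described in this lecture might look roughly like the following. All threshold values, class codes, and the function name are hypothetical and would need tuning per scene; the vegetation proxy is written here as a normalized green-red difference, which is one possible reading of "comparing green versus red".

```python
import numpy as np

CROP, SOIL, DARK, OTHER = 1, 2, 3, 0

def classify_pixels(rgb, veg_thresh=0.05, bright_thresh=0.5, dark_thresh=0.15):
    """rgb: float array (H, W, 3) in [0, 1]. Returns an (H, W) class map."""
    red, green = rgb[..., 0], rgb[..., 1]
    # Normalized green-red difference as a vegetation proxy
    veg = (green - red) / (green + red + 1e-6)
    brightness = rgb.mean(axis=-1)

    classes = np.full(rgb.shape[:2], OTHER, dtype=np.uint8)
    classes[brightness < dark_thresh] = DARK
    classes[(brightness > bright_thresh) & (veg <= veg_thresh)] = SOIL
    classes[(veg > veg_thresh) & (brightness >= dark_thresh)] = CROP
    return classes

pixels = np.array([[[0.20, 0.40, 0.15],    # greenish -> crop
                    [0.70, 0.65, 0.60]],   # bright, not green -> bare soil
                   [[0.05, 0.05, 0.08],    # very dark -> water/infrastructure
                    [0.30, 0.30, 0.30]]])  # mid-gray -> other
print(classify_pixels(pixels))  # [[1 2] [3 0]]
```

Because every pixel is judged in isolation, a single noisy or mixed pixel flips class on its own, which is the weakness the next lecture addresses.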
Simple Linear Iterative Clustering (SLIC) Superpixel Segmentation (8:18)
In this lecture, we move beyond pixel-based classification and introduce spatial segmentation to improve crop mapping results.
We begin by applying the SLIC (Simple Linear Iterative Clustering) algorithm with 150 target segments. This groups similar neighboring pixels into superpixels that approximate real-world field boundaries, reducing noise and capturing spatial structure.
Next, we visualize the segmentation results by overlaying the detected superpixel clusters on the processed April snapshot. This helps us assess how well the algorithm aligns with visible agricultural parcels.
For use in the next module, we export the detected cluster boundaries as a GeoJSON vector dataset, converting the segmentation output into reusable field parcels.
We then extract parcel-level features such as mean NDVI proxy, mean RGB values, and parcel size. Using the same classification rules introduced earlier — but now applied at the parcel level — we classify each segment as cropland or non-cropland. This approach produces significantly cleaner results, particularly when distinguishing green and yellow crop fields, because entire fields are classified instead of individual noisy pixels.
Finally, we visualize the classified parcels, colored by NDVI and overlaid on the RGB imagery. Comparing these results to pixel-based thresholding clearly demonstrates the advantage of spatial segmentation in producing coherent field boundaries and more robust classifications.
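To illustrate why parcel-level classification is cleaner, the sketch below averages a vegetation proxy inside each segment and classifies the whole segment at once. It assumes a segment-id array already exists (for example from a SLIC implementation); the toy arrays, the 0.15 threshold, and the function name are illustrative only.

```python
import numpy as np

def parcel_means(values, segments):
    """Mean of `values` inside each segment id (ids must be 0..n-1)."""
    ids = segments.ravel()
    sums = np.bincount(ids, weights=values.ravel())
    counts = np.bincount(ids)
    return sums / counts, counts

# Hypothetical 4x4 vegetation proxy and a 2-parcel segment map
veg = np.array([[0.30, 0.40, -0.10,  0.00],
                [0.50, 0.20,  0.00, -0.10],
                [0.40, 0.30, -0.20,  0.10],
                [0.30, 0.50,  0.00,  0.30]])  # bottom-right pixel is noisy
segments = np.repeat(np.array([[0, 0, 1, 1]]), 4, axis=0)

means, sizes = parcel_means(veg, segments)
is_crop_parcel = means > 0.15           # illustrative threshold
parcel_map = is_crop_parcel[segments]   # paint each parcel's decision onto its pixels
print(means, sizes)
```

The noisy bottom-right pixel would pass a per-pixel threshold, but its parcel mean stays low, so the whole right-hand parcel is labeled non-cropland.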
U-Net Deep Learning (17:39)
In this lecture, we transition from rule-based classification to a fully data-driven deep learning approach using a U-Net architecture in PyTorch.
We begin by preparing a 4-channel input tensor, stacking the RGB bands together with normalized NDVI in PyTorch’s (C, H, W) format. This multi-channel representation provides richer spectral information than RGB alone, improving the model’s ability to distinguish crops from non-crops.
Next, we generate training labels using color thresholding, automatically creating crop masks to serve as synthetic ground truth. To ensure proper evaluation, we apply spatial splitting by using only the left half of the dataset for training, reserving unseen regions for later testing. Within the training half, 80% is used for training and 20% for validation during each epoch.
We then implement a custom PyTorch Dataset class that extracts random 128×128 pixel patches using balanced sampling. This patch-based approach improves computational efficiency and follows standard practices in image segmentation workflows.
With the data pipeline in place, we define a simplified U-Net architecture consisting of an encoder (feature extraction through downsampling), a bottleneck (compressed representation), and a decoder (upsampling to restore spatial resolution). The network accepts a 4-channel input and outputs a single-channel crop probability for each pixel.
We train the model for 15 epochs using Binary Cross-Entropy loss and the Adam optimizer, monitoring both training and validation performance. In addition to loss, we track Intersection over Union (IoU) to quantify how well predicted crop regions overlap with ground truth labels.
Finally, we apply the trained U-Net to the full image — not just the training patches — generating a complete crop probability map. By thresholding at 0.5, we produce a binary classification mask and compare it to the original labels to evaluate how well the model generalizes beyond the data it was explicitly trained on.
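The thresholding and IoU evaluation can be sketched as follows. This is a NumPy illustration rather than the course's PyTorch code; the function name and the handling of two empty masks are assumptions.

```python
import numpy as np

def binary_iou(probabilities, labels, threshold=0.5):
    """IoU between a thresholded probability map and a binary label mask."""
    pred = probabilities >= threshold
    truth = labels.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    # Convention chosen here: two empty masks count as a perfect match
    return 1.0 if union == 0 else intersection / union

probs = np.array([[0.9, 0.8, 0.2],
                  [0.6, 0.4, 0.1]])
labels = np.array([[1, 1, 0],
                   [0, 1, 0]])
print(binary_iou(probs, labels))  # 2 shared pixels / 4 in the union = 0.5
```

Note that when the labels themselves come from color thresholding, a high IoU measures agreement with those rules, not agreement with independently surveyed ground truth.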
Compare All Methods (4:20)
We create an interactive Folium map with layer controls that let you toggle between all three crop detection methods. You can show/hide color thresholding, SLIC parcels, and U-Net predictions to visually compare where methods agree and differ at any zoom level.

Module Introduction (0:21)
This module tackles one of the most powerful applications of satellite imagery — detecting how landscapes change over time. Building directly on the Sentinel-2 data and field boundaries from Module 2, we analyze seasonal transformation across Hungarian agricultural fields between April and July 2025.
We work through three levels of analysis. We start with NDVI-based change detection as a transparent baseline, computing vegetation difference maps that reveal which fields were planted, harvested, or remained stable. We then move to field-level analysis, aggregating pixel statistics across the SLIC parcel boundaries from Module 2 to produce actionable, polygon-based change maps that agricultural managers can actually use.
Finally, we build and train a Siamese U-Net — a specialized architecture designed specifically for change detection that processes two images simultaneously and learns to identify meaningful differences between them. We apply the trained model across the full scene using sliding window prediction, producing a raster-wide change probability map and binary mask.
By the end of this module, you will understand the full spectrum from simple index-based change detection to deep learning approaches, and know when each method is appropriate for real-world applications.
NDVI Change Detection (Baseline) (6:19)
In this lecture, we analyze seasonal crop dynamics by comparing multi-temporal satellite imagery.
We begin by loading the processed Sentinel-2 images from Module 2 — April 2025 (early growing season) and July 2025 (mid growing season). Both images cover the same agricultural region in Hungary, allowing us to detect seasonal changes within identical field boundaries.
Next, we display the two dates side-by-side to visually observe the transformation from fresh green vegetation in April to more mature, harvest-ready fields in July. This visual inspection clarifies the type of change we aim to detect automatically.
We then compute NDVI for both dates using our RGB-based proxy (green–red difference). This produces vegetation index maps where higher values indicate dense vegetation and lower values correspond to bare soil or water.
To quantify change, we subtract April NDVI from July NDVI, creating a difference map. Positive values indicate vegetation gain (e.g., newly planted or growing crops), while negative values indicate vegetation loss (e.g., harvested fields). We apply a ±0.15 threshold to isolate significant changes and suppress minor fluctuations and noise.
Finally, we generate a comprehensive visualization showing NDVI for both dates, a continuous diverging change map (red = vegetation loss, blue = vegetation gain), and discrete change categories. This allows us to clearly identify which fields experienced planting or harvesting between the two time periods.
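The difference-and-threshold logic can be written in a few lines. The sketch below assumes the two vegetation index maps are already aligned NumPy arrays; the class codes and function name are illustrative, while the ±0.15 threshold follows the description above.

```python
import numpy as np

LOSS, STABLE, GAIN = -1, 0, 1

def classify_change(index_april, index_july, threshold=0.15):
    """Label each pixel as vegetation loss, stable, or gain."""
    diff = index_july - index_april
    change = np.full(diff.shape, STABLE, dtype=np.int8)
    change[diff > threshold] = GAIN
    change[diff < -threshold] = LOSS
    return diff, change

april = np.array([[0.10, 0.60], [0.40, 0.30]])
july = np.array([[0.50, 0.20], [0.45, 0.30]])
diff, change = classify_change(april, july)
print(change)  # [[ 1 -1] [ 0  0]]
```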
Field-Level Change Analysis (5:09)
In this lecture, we shift from pixel-level analysis to field-level change detection using vector boundaries and zonal statistics.
We begin by loading the SLIC-derived field parcel boundaries exported from Module 2 as a GeoJSON file. These polygons approximate real agricultural field boundaries and serve as meaningful analysis units instead of treating individual pixels independently.
Next, we compute zonal statistics for each field parcel. Using raster masking, we calculate the mean NDVI in April, the mean NDVI in July, and the difference between them. This process aggregates millions of pixels into hundreds of structured, field-level statistics, including parcel area in hectares.
Based on the magnitude and direction of NDVI change, we classify each field as “vegetation increase,” “vegetation decrease,” or “stable.” This transforms raw spectral differences into actionable information about which fields were likely planted, harvested, or unchanged between the two dates.
Finally, we visualize the results using an interactive Folium map, allowing dynamic exploration of field-level seasonal changes across the study area.
Siamese U-Net for Deep Learning Change Detection (12:20)
In this lecture, we move from rule-based and index-based change detection toward a deep learning approach using a Siamese U-Net architecture.
We begin by preparing the training data, creating paired image patches from the April and July snapshots along with their corresponding change labels generated in Section 3.1. These paired inputs allow the model to learn differences between two temporal observations of the same location.
Next, we define the Siamese U-Net architecture. This model processes the two satellite images through parallel encoder branches that share weights, enabling the network to extract comparable feature representations before combining them to detect change patterns.
We then train the Siamese U-Net to identify meaningful differences between the two raster snapshots. During training, the model learns to distinguish real vegetation changes from noise or illumination differences.
Finally, we apply the trained model to the full image using a sliding window prediction approach. This produces a raster-wide change probability map and a corresponding binary mask, completing the transition from handcrafted rules to learned temporal change detection.
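The weight-sharing idea behind a Siamese network can be shown with a deliberately tiny stand-in. The sketch below is not a U-Net: the "encoder" is a per-pixel linear map with ReLU, and the weights and images are made-up values. It only illustrates that both dates pass through the same parameters, so identical inputs produce identical features and any feature difference points to a real difference in the inputs.

```python
import numpy as np

# One set of weights, reused by both branches (this is the "Siamese" part)
shared_weights = np.array([[ 1.0, -1.0],
                           [ 0.5,  0.5],
                           [-1.0,  1.0]])   # 3 input channels -> 2 features

def encode(image, weights):
    """Per-pixel linear map plus ReLU, standing in for a convolutional encoder."""
    return np.maximum(image @ weights, 0.0)

def change_features(image_a, image_b, weights):
    """Encode both dates with the same weights, then compare the feature maps."""
    return np.abs(encode(image_a, weights) - encode(image_b, weights))

april = np.full((4, 4, 3), 0.2)
july = april.copy()
july[0, 0] = [0.8, 0.2, 0.2]   # alter a single pixel between the two dates

features = change_features(april, july, shared_weights)
changed = features.sum(axis=-1) > 0
print(changed.astype(int))
```

In a trained model, the comparison step is learned rather than a fixed absolute difference, which is what lets the network ignore changes such as illumination shifts.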

Module Introduction (0:22)
This module shifts from satellite to aerial imagery, working with ultra-high-resolution 7.5 cm data from a Dutch government geoportal over Amsterdam, roughly 100 times more detailed than Sentinel-2. At this resolution, individual buildings, cars, and rooftop features are clearly visible, opening up a completely different class of urban analysis problems.
A key challenge in deep learning is obtaining training labels. Here we solve it elegantly without any manual annotation — using Canny edge detection combined with size and shape filtering to automatically generate building masks directly from the imagery itself. This practical technique is highly transferable to other datasets and contexts.
We then build a full training pipeline with a proper spatial split — left half for training, right half for validation — ensuring the model is evaluated on a geographically distinct area it has never seen. The U-Net is trained for 20 epochs with data augmentation, learning rate scheduling, and IoU-based model selection.
Finally, we run sliding window inference across the full 6250×6250 pixel image, averaging overlapping predictions for smooth results, and visualize everything on an interactive Folium map with toggleable layers comparing ground truth against U-Net predictions across Amsterdam.
RGB Aerial Image Data from Amsterdam (7:14)
In this lecture, we introduce ultra-high-resolution aerial imagery as the foundation for detailed building extraction.
We begin by obtaining 7.5 cm resolution imagery from Beeldmateriaal.nl, a Dutch government geoportal. This dataset is approximately 100× more detailed than Sentinel-2 imagery, allowing us to clearly observe individual buildings, cars, rooftop structures, streets, and vegetation patterns.
Next, we load the GeoTIFF file using rasterio and visualize dense neighborhoods in Amsterdam, where building footprints and urban structures are clearly distinguishable.
To improve computational efficiency, we crop the imagery to the lower-right quadrant of the study area. While reducing the spatial extent, we preserve georeferencing by maintaining the transform metadata, ensuring that all building polygons retain accurate geographic coordinates.
We then examine the image metadata, including pixel dimensions, ground resolution, geographic coverage, and coordinate reference system. Understanding these spatial properties is essential before performing any further analysis.
Finally, we apply contrast enhancement using 2–98 percentile stretching. This improves visual clarity by enhancing subtle differences in brightness, making buildings, roads, and vegetation more distinct — a crucial preprocessing step for subsequent edge detection and segmentation tasks.
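A 2–98 percentile stretch can be sketched as below. The function name and the synthetic band are illustrative; the handling of a completely flat image is an added assumption.

```python
import numpy as np

def percentile_stretch(band, low=2, high=98):
    """Linearly rescale so the low/high percentiles map to 0 and 1."""
    p_low, p_high = np.percentile(band, [low, high])
    if p_high == p_low:          # flat image: nothing to stretch
        return np.zeros_like(band, dtype=np.float32)
    stretched = (band - p_low) / (p_high - p_low)
    return np.clip(stretched, 0.0, 1.0).astype(np.float32)

band = np.linspace(0.0, 1.0, 101).reshape(1, 101)
band[0, -1] = 50.0               # one extreme outlier
out = percentile_stretch(band)
print(out.min(), out.max())
```

Using percentiles instead of the raw minimum and maximum keeps a single outlier, such as a specular reflection, from compressing the rest of the image into a narrow gray range.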
Create Training Labels (5:18)
In this lecture, we implement an automated building extraction workflow using classical computer vision techniques.
We begin by applying Canny edge detection to grayscale aerial imagery to identify sharp structural boundaries. The process includes Gaussian blurring to reduce noise, edge detection to highlight building contours, and morphological operations to close small gaps in detected edges. This allows us to generate building masks without requiring manual annotation.
Next, we apply size and shape filtering to remove false positives. By restricting detected objects to realistic building sizes and aspect ratios, we eliminate smaller artifacts such as cars, trees, and elongated structures like streets.
Finally, we visualize the results through a three-panel comparison: the original RGB image, the binary building mask, and an overlay highlighting the detected building outlines produced by the Canny-based workflow. This enables us to assess both the strengths and limitations of classical edge-based building detection.
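The size and shape filtering step can be illustrated in pure Python. The sketch below assumes a binary mask already exists (for instance from filled edge contours) and keeps only connected blobs that are large enough and not too elongated. The thresholds, the 4-connectivity, and the bounding-box aspect ratio are assumptions made for this example, not the course's exact criteria.

```python
from collections import deque

def filter_components(mask, min_area=4, max_aspect=3.0):
    """Keep 4-connected blobs that are large enough and not too elongated."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    kept = [[0] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Flood fill to collect one connected component
            queue, pixels = deque([(r0, c0)]), []
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            # Bounding-box dimensions give a simple elongation measure
            height = max(p[0] for p in pixels) - min(p[0] for p in pixels) + 1
            width = max(p[1] for p in pixels) - min(p[1] for p in pixels) + 1
            aspect = max(height, width) / min(height, width)
            if len(pixels) >= min_area and aspect <= max_aspect:
                for r, c in pixels:
                    kept[r][c] = 1
    return kept

mask = [
    [1, 1, 0, 0, 0, 0, 0],   # 2x2 block: plausible "building"
    [1, 1, 0, 1, 0, 0, 0],   # single pixel at (1, 3): too small
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],   # 1x7 strip: too elongated ("street")
]
buildings = filter_components(mask)
for row in buildings:
    print(row)
```

On real imagery, an image processing library's connected-component routine would replace the hand-written flood fill; the filtering logic stays the same.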
Prepare Training Data with Spatial Split (8:07)
In this lecture, we prepare the dataset for deep learning by implementing a spatially robust training and validation pipeline.
We begin by performing a vertical spatial split of the image — using the left half for training and the right half for validation. This prevents data leakage and ensures the model is evaluated on a completely different geographic area rather than on neighboring pixels.
Next, we extract 256×256 pixel patches with a 128-pixel stride (50% overlap) from each region separately. To avoid training on empty scenes, we retain only patches containing more than 5% building coverage, ensuring the dataset focuses on meaningful examples.
We then implement custom PyTorch BuildingDataset classes. The training dataset includes random flip augmentation to artificially expand the data and improve generalization, while the validation dataset remains unaltered to provide an unbiased performance estimate.
Finally, we create DataLoaders with a batch size of 16 and visualize sample patches with red building overlays. This confirms that the extracted patches contain diverse building densities across both training and validation regions, completing the dataset preparation stage for model training.
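The split-then-patch logic can be sketched with NumPy alone. The scene below is synthetic and much smaller than the real image, and the function name is hypothetical; the 256-pixel patch size, 128-pixel stride, and 5% coverage filter follow the description above.

```python
import numpy as np

def extract_patches(image, mask, patch=256, stride=128, min_cover=0.05):
    """Return (patches, masks) for windows whose building share exceeds min_cover."""
    kept_images, kept_masks = [], []
    height, width = mask.shape
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            window = mask[top:top + patch, left:left + patch]
            if window.mean() > min_cover:
                kept_images.append(image[top:top + patch, left:left + patch])
                kept_masks.append(window)
    return kept_images, kept_masks

# Hypothetical 512x1024 scene with one "building" block on each side
image = np.zeros((512, 1024, 3), dtype=np.float32)
mask = np.zeros((512, 1024), dtype=np.uint8)
mask[0:128, 0:128] = 1
mask[300:500, 700:900] = 1

# Vertical spatial split first, then patching inside each half separately
mid = mask.shape[1] // 2
train_x, train_y = extract_patches(image[:, :mid], mask[:, :mid])
val_x, val_y = extract_patches(image[:, mid:], mask[:, mid:])
print(len(train_x), "training patches,", len(val_x), "validation patches")
```

Splitting before patching matters: with a 50% overlap, patching first and splitting afterwards would let neighboring windows that share pixels land on both sides of the split.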
Train U-Net Model (5:10)
In this lecture, we define and train a convolutional neural network to detect buildings in aerial imagery.
We begin by implementing a simplified U-Net architecture (similar to the one developed earlier), using a 3-channel RGB input and an encoder–decoder structure. The model contains approximately 100,000 parameters and produces pixel-wise building probability masks.
Next, we define the core training components, including the loss function and optimizer required to guide model learning.
To evaluate performance, we implement the Intersection over Union (IoU) metric, which measures the overlap between predicted building masks and ground truth labels. IoU provides a robust and interpretable measure of segmentation quality.
We then build the full training loop, running for 20 epochs with separate training and validation phases. During training, we track both loss and IoU, saving the best-performing model based on validation IoU to ensure optimal generalization.
Finally, we visualize the training progress by examining the evolution of loss and IoU across epochs, allowing us to assess convergence, stability, and overall model performance.
Test on Full Image (7:07)
In this lecture, we apply the trained model to the full aerial image and evaluate its performance across both seen and unseen regions.
We begin by loading the saved model that achieved the best validation IoU and preparing it for inference on the complete image, covering both the training and previously unseen areas.
To handle the large 6250×6250 raster efficiently, we implement a sliding window approach using overlapping 256×256 patches with a 128-pixel stride. Predictions from overlapping tiles are averaged to generate smooth and consistent probability maps across the entire scene.
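A sketch of sliding window inference with overlap averaging is shown below. The `fake_model` function is a stand-in for the trained network, and the 512×512 test image is synthetic. This simplified version assumes the image size fits the stride exactly; an image such as 6250×6250 would additionally need padding or an extra final row and column of windows.

```python
import numpy as np

def sliding_window_predict(image, predict_fn, patch=256, stride=128):
    """Average overlapping patch predictions into one (H, W) probability map.

    Assumes (H - patch) and (W - patch) are multiples of stride; otherwise
    the image needs padding so the last rows and columns are covered.
    """
    height, width = image.shape[:2]
    prob_sum = np.zeros((height, width), dtype=np.float64)
    hits = np.zeros((height, width), dtype=np.float64)
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            tile = image[top:top + patch, left:left + patch]
            prob_sum[top:top + patch, left:left + patch] += predict_fn(tile)
            hits[top:top + patch, left:left + patch] += 1
    return prob_sum / hits, hits

# Stand-in for the trained network: brightness as a fake "probability"
def fake_model(tile):
    return tile.mean(axis=-1)

image = np.random.default_rng(0).random((512, 512, 3))
probs, hits = sliding_window_predict(image, fake_model)
print(probs.shape, hits.min(), hits.max())
```

Keeping a separate hit counter makes the averaging correct at the borders, where pixels are covered by fewer windows than in the interior.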
Next, we calculate IoU separately for the training (left-hand side) and unseen (right-hand side) regions. This comparison allows us to quantify generalization performance and confirm that the model is not simply memorizing spatial patterns from the training area.
Finally, we create an interactive Folium map with toggleable layers, including the input aerial imagery, red ground truth masks derived from edge detection, and blue U-Net building predictions. This enables detailed spatial exploration of segmentation results across Amsterdam.

Module Introduction (0:30)
This final module introduces a fundamentally different data type — LiDAR elevation data — and the most complex segmentation task in the course: distinguishing buildings, trees, and ground simultaneously across a dense urban area in Edinburgh.
We work with freely available Scottish Government LiDAR tiles, merging DSM and DTM products to derive normalized surface heights (nDSM) that isolate objects above ground. A key insight here is that height alone is insufficient to separate buildings from trees, since both can reach similar elevations. We solve this through feature engineering, calculating surface texture as the local standard deviation of height, which helps separate smooth building rooftops from rough tree canopies.
Training labels are created through a combination of interactive polygon annotation and physics-based height thresholds — a practical hybrid approach that brings human spatial reasoning together with elevation verification. The full U-Net in this module is the most advanced architecture in the course, incorporating real skip connections, batch normalization, class-weighted loss, and an ignore index for unlabeled pixels.
We train with early stopping, evaluate per-class IoU separately for ground, buildings, and trees on a geographically unseen test region, and close with a 4-layer interactive Folium map over Esri satellite imagery — the most comprehensive visualization in the entire course.
Load and Prepare LiDAR Data (6:37)
In this lecture, we prepare LiDAR-derived elevation data for multi-class urban segmentation analysis.
We begin by defining a dense urban study area in Edinburgh using bounding box coordinates. The selected region contains buildings, trees, and open ground, providing a suitable environment for height-based classification.
Scottish LiDAR data is distributed in 1 km tiles. We load two adjacent tiles for both the Digital Surface Model (DSM), which includes buildings and vegetation, and the Digital Terrain Model (DTM), which represents bare earth. These tiles are then merged into seamless mosaics covering the entire study area.
Next, we compute the normalized Digital Surface Model (nDSM) by subtracting the DTM from the DSM (nDSM = DSM − DTM). This removes terrain elevation and isolates object heights above ground level, such as buildings, trees, and other structures. The nDSM becomes our primary input channel for segmentation.
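The subtraction itself is a one-liner; the sketch below adds two practical details that are assumptions rather than steps confirmed by the course: propagating a nodata value as NaN, and clamping small negative heights to zero.

```python
import numpy as np

def compute_ndsm(dsm, dtm, nodata=None):
    """Object height above ground: nDSM = DSM - DTM."""
    ndsm = dsm.astype(np.float32) - dtm.astype(np.float32)
    if nodata is not None:
        # Propagate missing values instead of producing bogus heights
        ndsm[(dsm == nodata) | (dtm == nodata)] = np.nan
    # Small negative values are treated as ground level (assumption)
    return np.where(ndsm < 0, 0.0, ndsm)

# Hypothetical elevations in meters; -9999 marks a missing DSM cell
dsm = np.array([[105.0, 120.0], [99.8, -9999.0]])
dtm = np.array([[100.0, 100.0], [100.0, 100.0]])
ndsm = compute_ndsm(dsm, dtm, nodata=-9999.0)
print(ndsm)  # 5 m and 20 m objects, one ground cell, one NaN
```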
To focus the analysis, we clip the merged mosaic to the defined Edinburgh study area. The clipped dataset is saved as a georeferenced GeoTIFF, reducing file size while preserving spatial accuracy.
We then visualize the DSM, DTM, and nDSM side-by-side. While DSM shows total elevation and DTM displays terrain only, the nDSM clearly highlights buildings and trees as elevated structures above ground.
Finally, we generate a hillshade visualization to simulate sunlight and enhance 3D perception. Buildings appear as sharp rectangular blocks, while trees display more organic, irregular shapes — confirming the quality and interpretability of the elevation data.
Engineer Features (1:59)
In this lecture, we enhance our elevation-based representation by incorporating surface texture to better distinguish buildings from vegetation.
Height alone is often insufficient for separating buildings and trees, as both can reach heights of 10–20 meters. To address this, we compute a texture measure by calculating the local standard deviation of height values within a defined pixel window. Buildings typically exhibit smoother surfaces with lower texture variation, while trees display irregular canopy structures with higher local variability.
Next, we visualize the two input channels side-by-side: normalized height (nDSM) and the derived texture layer. This comparison highlights how texture complements height by emphasizing structural differences between man-made and natural objects.
Finally, we stack height and texture into a 2-channel array in (C, H, W) format, preparing the data as input for a U-Net model. This multi-channel representation captures both elevation and surface characteristics, forming a richer feature space for multi-class segmentation.
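A minimal sketch of the texture channel and the 2-channel stack, using NumPy's `sliding_window_view`. The 5×5 window, the reflect padding, and the synthetic "roof" and "canopy" blocks are illustrative choices, not values taken from the course.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_std(height, window=5):
    """Standard deviation of height in a window x window neighborhood (odd window)."""
    pad = window // 2
    padded = np.pad(height, pad, mode="reflect")
    windows = sliding_window_view(padded, (window, window))
    return windows.std(axis=(-2, -1))

rng = np.random.default_rng(42)
ndsm = np.zeros((20, 40), dtype=np.float32)
ndsm[5:15, 5:15] = 12.0                                  # flat "roof"
ndsm[5:15, 25:35] = 12.0 + rng.normal(0, 3, (10, 10))    # rough "canopy"

texture = local_std(ndsm)
features = np.stack([ndsm, texture], axis=0)   # (C, H, W) with C = 2
print(features.shape, texture[10, 10], texture[10, 30])
```

Both blocks have a similar mean height, but the texture value is zero inside the flat roof and clearly positive inside the rough canopy, which is the separation the second channel is meant to provide. Roof edges also show high texture, so the two channels work best in combination.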
Create Training Labels (8:53)
In this lecture, we create high-quality ground truth labels by combining manual annotation with height-based validation.
We begin by loading the nDSM and visualizing the full study area alongside height and texture layers. This provides the spatial context needed to prepare accurate annotations for buildings and trees.
Next, we demonstrate an interactive polygon labeling workflow, where users outline buildings (red) and trees (green) by clicking vertices. For recording efficiency, we use pre-made labels that were previously created for the Edinburgh study area.
To improve label reliability, we refine the manually drawn polygons using physics-based height thresholds. Buildings, trees, and ground pixels are validated against expected elevation ranges, while all pixels outside the polygons are assigned an unlabeled value (255) and excluded from training. This approach combines human spatial interpretation with objective elevation verification.
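A rough sketch of this refinement step follows. It assumes the polygons have already been rasterized into boolean masks. The class ids (0, 1, 2), the threshold values and the rule that low pixels inside a polygon become ground are all hypothetical choices made for this example; only the unlabeled value 255 comes from the lecture description.

```python
import numpy as np

GROUND, BUILDING, TREE, UNLABELED = 0, 1, 2, 255  # class ids 0-2 are an assumption

def refine_labels(ndsm, building_mask, tree_mask, ground_max=0.5, object_min=2.0):
    """Keep a polygon pixel only if its height is plausible for its class."""
    labels = np.full(ndsm.shape, UNLABELED, dtype=np.uint8)
    inside = building_mask | tree_mask
    labels[inside & (ndsm <= ground_max)] = GROUND        # low pixels inside a polygon
    labels[building_mask & (ndsm >= object_min)] = BUILDING
    labels[tree_mask & (ndsm >= object_min)] = TREE
    return labels  # everything else stays 255 and is ignored in training

ndsm = np.array([[0.1, 8.0, 9.0, 0.2],
                 [0.0, 7.5, 1.0, 6.0]])
building_mask = np.array([[True, True, False, False],
                          [False, True, False, False]])
tree_mask = np.array([[False, False, True, True],
                      [False, False, True, True]])
labels = refine_labels(ndsm, building_mask, tree_mask)
```

Pixels whose height falls between the two thresholds stay unlabeled, so ambiguous cases never reach the loss function.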
We then visualize the cleaned labels alongside height data, clearly showing ground (brown), buildings (red), trees (green), and unlabeled areas (black) that will be ignored during model training.
Finally, we save the cleaned, physics-verified labels as a GeoTIFF file, producing a training-ready dataset with three classes and masked unlabeled regions for use in segmentation loss calculation.
Prepare Training Data with Spatial Split 7:16
In this lecture, we prepare the finalized dataset for multi-class segmentation using height and texture inputs.
We begin by loading and visualizing the cleaned ground truth labels. The dataset contains three clearly separated classes — buildings, trees, and ground — derived from physics-based height rules, along with unlabeled areas shown in black that will be excluded from training.
To prevent data leakage, we perform a vertical spatial split, assigning the left half of the study area for training and the right half for testing. This split is applied consistently to height, texture, and label layers, ensuring that model evaluation occurs on a completely different geographic region.
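The split itself is plain array slicing. A minimal sketch, assuming all three layers are aligned arrays with the same width (the helper name `vertical_split` is hypothetical):

```python
import numpy as np

def vertical_split(layer):
    """Split a raster at its middle column into a left and a right half."""
    mid = layer.shape[-1] // 2
    return layer[..., :mid], layer[..., mid:]

# Small stand-in rasters; the real layers cover the full study area.
height = np.arange(24, dtype=np.float32).reshape(4, 6)
texture = height * 0.1
labels = np.zeros((4, 6), dtype=np.uint8)

(train_h, test_h), (train_t, test_t), (train_y, test_y) = (
    vertical_split(a) for a in (height, texture, labels)
)
```

Because the same column index is used for every layer, inputs and labels stay aligned and no pixel appears in both halves.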
We visualize the split to confirm that both training (left, blue) and test (right, red) regions contain all three classes. This guarantees that the model is evaluated on familiar object types but in unseen spatial contexts.
Next, we extract 128×128 pixel patches from each region using balanced sampling to ensure adequate representation of all classes. Each patch is then normalized independently to zero mean and unit variance, promoting stable training regardless of absolute elevation differences across the scene.
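The per-patch normalization can be sketched as follows. The helper names and the small `eps` guard against division by zero on perfectly flat patches are assumptions, and the balanced sampling logic is not reproduced here.

```python
import numpy as np

def extract_patch(layer, row, col, size=128):
    """Cut a size x size window whose top-left corner is (row, col)."""
    return layer[row:row + size, col:col + size]

def normalize_patch(patch, eps=1e-6):
    """Scale one patch to zero mean and unit variance."""
    return (patch - patch.mean()) / (patch.std() + eps)

rng = np.random.default_rng(42)
height = rng.uniform(0.0, 20.0, (256, 256)).astype(np.float32)  # made-up heights
patch = normalize_patch(extract_patch(height, 64, 32))
flat = normalize_patch(np.full((128, 128), 3.0, dtype=np.float32))
```

Normalizing each patch on its own removes absolute elevation offsets, so a building on a hill and a building in a valley look alike to the model.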
We visualize sample patches, displaying height on the left and texture with label overlays on the right. This confirms that the training data captures diverse building densities and class distributions.
Finally, we wrap the patches into custom PyTorch datasets, stacking height and texture as 2-channel tensors and converting labels to LongTensor format for use with CrossEntropyLoss. Training patches are drawn exclusively from the left region, while testing patches come strictly from the right — ensuring complete spatial separation between training and evaluation data.
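A dataset wrapper along these lines might look like the sketch below. The class name `PatchDataset` and the random placeholder arrays are assumptions; the real course code may organize the patches differently.

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class PatchDataset(Dataset):
    """Pairs 2-channel (height, texture) inputs with integer class masks."""

    def __init__(self, heights, textures, labels):
        self.heights, self.textures, self.labels = heights, textures, labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        x = np.stack([self.heights[idx], self.textures[idx]], axis=0)  # (2, H, W)
        y = self.labels[idx].astype(np.int64)  # unlabeled pixels keep the value 255
        return torch.from_numpy(x).float(), torch.from_numpy(y).long()

# Random placeholder patches standing in for the left-half training patches.
heights = np.random.rand(4, 128, 128).astype(np.float32)
textures = np.random.rand(4, 128, 128).astype(np.float32)
labels = np.random.choice([0, 1, 2, 255], size=(4, 128, 128)).astype(np.uint8)
train_ds = PatchDataset(heights, textures, labels)
x, y = train_ds[0]
```

CrossEntropyLoss expects integer class indices of type long, which is why the labels are converted rather than left as uint8.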
Train U-Net Model 5:43
In this lecture, we build and train a production-level U-Net model for multi-class segmentation.
We begin by defining an enhanced U-Net architecture with three key upgrades: proper skip connections that concatenate encoder features into the decoder, BatchNorm2d layers within each convolutional block for training stability, and a three-class output layer for ground, buildings, and trees.
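The three upgrades can be illustrated with a deliberately small U-Net. This is a sketch under assumptions, not the course's architecture: the depth (two encoder levels), the channel widths and the names `conv_block` and `SmallUNet` are chosen only to keep the example short.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions, each followed by BatchNorm2d and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class SmallUNet(nn.Module):
    def __init__(self, in_channels=2, num_classes=3, base=16):
        super().__init__()
        self.enc1 = conv_block(in_channels, base)
        self.enc2 = conv_block(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)  # input = upsampled + skip features
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, num_classes, 1)  # ground, building, tree logits

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)  # raw logits, shape (B, num_classes, H, W)

model = SmallUNet()
logits = model(torch.randn(2, 2, 128, 128))
```

Concatenating encoder features into the decoder is what lets the network recover sharp object boundaries that pooling would otherwise blur.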
Next, we configure the training setup. We use CrossEntropyLoss with class weights to compensate for class imbalance between ground, building, and tree pixels, and set ignore_index=255 to exclude unlabeled pixels from loss calculation. The model is optimized using Adam with learning rate scheduling to improve convergence.
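The training setup might be configured roughly as follows. The class weight values, the learning rate and the choice of `ReduceLROnPlateau` as the scheduler are assumptions (the lecture only states that class weights and learning rate scheduling are used), and a 1x1 convolution stands in for the U-Net to keep the sketch short.

```python
import torch
import torch.nn as nn

# Placeholder weights for ground, building, tree; real values depend on pixel counts.
class_weights = torch.tensor([0.5, 1.5, 1.0])
criterion = nn.CrossEntropyLoss(weight=class_weights, ignore_index=255)

model = nn.Conv2d(2, 3, kernel_size=1)  # stand-in for the U-Net
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=3
)

inputs = torch.randn(4, 2, 32, 32)
targets = torch.randint(0, 3, (4, 32, 32))
targets[:, :8, :] = 255  # unlabeled pixels are skipped by the loss

loss = criterion(model(inputs), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
scheduler.step(loss.item())  # in a real loop this would be the validation loss
```

With `ignore_index=255`, unlabeled pixels contribute neither to the loss value nor to the gradients.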
To evaluate performance, we implement per-class Intersection over Union (IoU), measuring segmentation quality separately for each class while ignoring unlabeled pixels. This ensures fair and interpretable multi-class evaluation.
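Per-class IoU with an ignore value can be sketched in NumPy as below. The function name and the choice to return NaN for a class that appears in neither the prediction nor the labels are assumptions.

```python
import numpy as np

def per_class_iou(pred, target, num_classes=3, ignore_index=255):
    """IoU per class over labeled pixels; NaN when a class is absent from both."""
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]
    ious = []
    for c in range(num_classes):
        intersection = np.sum((pred == c) & (target == c))
        union = np.sum((pred == c) | (target == c))
        ious.append(intersection / union if union > 0 else float("nan"))
    return ious

# Made-up 2x4 example: the last two pixels are unlabeled and do not count.
target = np.array([[0, 0, 1, 1], [2, 2, 255, 255]])
pred = np.array([[0, 1, 1, 1], [2, 0, 1, 2]])
ious = per_class_iou(pred, target)
```

Here the three classes score 1/3, 2/3 and 1/2 respectively, regardless of what the model predicted on the two unlabeled pixels.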
We then build the full training loop with early stopping. Each epoch trains exclusively on left-half patches and validates on right-half unseen patches, allowing us to monitor spatial generalization. Loss and accuracy metrics are tracked throughout.
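The early stopping logic can be isolated into a small helper. This is one possible sketch; the class name, the patience value and the validation losses are made up for illustration.

```python
class EarlyStopping:
    """Signal a stop after `patience` epochs without validation-loss improvement."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            # Improvement: a real loop would also save the model weights here.
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=2)
val_losses = [0.9, 0.7, 0.72, 0.71, 0.69]  # made-up values
stopped_at = None
for epoch, val_loss in enumerate(val_losses):
    if stopper.step(val_loss):
        stopped_at = epoch
        break
```

In this run training stops at epoch 3, after two consecutive epochs fail to beat the best loss of 0.7.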
Finally, we visualize training progress through loss curves (training vs. validation) and accuracy trends, and conclude with a summarized prediction report that captures overall segmentation performance across classes and regions.
Evaluate and Visualize Results 3:36
In this lecture, we evaluate the trained multi-class U-Net model and apply it to the full study area.
We begin by loading the best-performing saved model and running inference on all test patches extracted from the right-hand (unseen) region. The model outputs three-channel class probabilities, which we convert to discrete class predictions using the argmax operation.
Next, we compute Intersection over Union (IoU) separately for each class — ground, buildings, and trees — on the unseen test region. This per-class evaluation demonstrates the model’s ability to generalize multi-class segmentation to completely new geographic areas.
We then perform full-image inference using a sliding window approach across both the training (left) and test (right) halves. Using 128-pixel windows with 50% overlap (a 64-pixel stride), we stitch predictions together into seamless coverage suitable for visualization and spatial analysis.
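Sliding window inference with overlap can be sketched as follows, using 128-pixel windows moved in 64-pixel steps. Averaging the overlapping class probabilities before the argmax is one common stitching strategy and an assumption here, as are the helper names. `dummy_predict` is a stand-in for the trained U-Net, and the image is assumed to be at least one window in size.

```python
import numpy as np

def window_starts(length, window, stride):
    """Start offsets covering [0, length), with a final window flush to the edge."""
    starts = list(range(0, max(length - window, 0) + 1, stride))
    if starts[-1] + window < length:
        starts.append(length - window)
    return starts

def sliding_window_predict(image, predict_fn, num_classes=3, window=128, stride=64):
    """Average overlapping per-window class probabilities, then take the argmax."""
    _, h, w = image.shape
    prob_sum = np.zeros((num_classes, h, w), dtype=np.float32)
    counts = np.zeros((h, w), dtype=np.float32)
    for r in window_starts(h, window, stride):
        for c in window_starts(w, window, stride):
            tile = image[:, r:r + window, c:c + window]
            prob_sum[:, r:r + window, c:c + window] += predict_fn(tile)
            counts[r:r + window, c:c + window] += 1
    return np.argmax(prob_sum / counts, axis=0)

def dummy_predict(patch):
    """Stand-in model: class 1 where height > 0, otherwise class 0."""
    probs = np.zeros((3,) + patch.shape[1:], dtype=np.float32)
    elevated = patch[0] > 0
    probs[1][elevated] = 1.0
    probs[0][~elevated] = 1.0
    return probs

image = np.zeros((2, 300, 200), dtype=np.float32)  # (channels, H, W)
image[0, 100:150, 50:120] = 10.0                   # one made-up elevated block
prediction = sliding_window_predict(image, dummy_predict)
```

Overlapping windows mean every pixel is predicted with context on all sides at least once, which reduces visible seams at patch borders.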
Finally, we create an interactive Folium map over Esri satellite imagery with four toggleable layers: height (nDSM), texture (surface roughness), ground truth labels (brown/red/green), and U-Net predictions in matching colors. This allows full spatial exploration of segmentation performance across the entire study area.

Course Outro 0:28
Congratulations on completing Fundamentals of GeoAI: Deep Learning for Geospatial Analysis.
Over the course of five modules you have built U-Net architectures from scratch, processed real satellite and aerial imagery, worked with LiDAR elevation data, and trained deep learning models on genuine geospatial problems. More importantly, you have done it correctly — with proper spatial train/test splits, honest evaluation on unseen geographic areas, and interactive visualizations that communicate results clearly.
These are not toy examples. The workflows you have built here — crop mapping, change detection, building segmentation, urban classification — are the same workflows being used in agriculture, urban planning, environmental monitoring, and infrastructure analysis around the world.
From here, the natural next steps are to apply these techniques to your own data and your own study areas. The tools are in your hands.
Thank you for being part of this course. If you found it valuable, a review on Udemy would mean a great deal and helps others in the geospatial community find this content.
I will see you in the next one.

Requirements

Basic Python programming experience — loops, functions, and working with libraries like NumPy and Matplotlib.
Familiarity with geospatial raster data concepts (what pixels, bands, and coordinate systems are).
A computer with Anaconda installed or the ability to set up a Python environment — setup instructions are provided in the course.
No prior deep learning or PyTorch experience needed — neural network foundations are built from scratch in Module 1.

Description

Whether you work in GIS, remote sensing, environmental science, or data science, deep learning is rapidly transforming how we analyze the world from above. This course gives you the practical foundation to work with GeoAI confidently — building real models on real data, from scratch.

Across five hands-on modules, you will tackle the most important use cases in geospatial deep learning today: crop mapping from Sentinel-2 satellite imagery, temporal change detection using a Siamese U-Net, building segmentation from ultra-high-resolution aerial imagery, and multi-class urban segmentation from LiDAR elevation data.

Every dataset in this course is real and freely available. Sentinel-2 imagery is downloaded directly from the AWS Earth Search STAC catalog. Aerial imagery comes from a Dutch government geoportal at 7.5 cm resolution. LiDAR tiles are sourced from the Scottish Government open data portal. No synthetic data, no toy examples.

Every model is built from scratch in PyTorch. You will implement single convolutional filters, build encoder and decoder blocks step by step, assemble complete U-Net architectures, and train them on genuine geospatial problems. The course also covers a Siamese U-Net — a specialized architecture designed specifically for change detection that processes two images simultaneously.

A key methodological focus throughout is doing things correctly. Every module uses proper spatial train/test splits to prevent data leakage, ensuring models are evaluated on geographically distinct areas they have never seen. This is how professional geospatial deep learning is done in the real world — and it is what separates this course from generic image segmentation tutorials.

By the end of this course you will have:

Built and trained U-Net models in PyTorch for pixel-wise segmentation
Processed real satellite, aerial, and LiDAR data end to end
Implemented spatial train/test splits for honest model evaluation
Created interactive Folium maps to visualize and compare model predictions
Applied deep learning to crop mapping, change detection, building extraction, and urban classification

No prior deep learning or PyTorch experience is required. Module 1 builds the complete foundation from first principles before applying it to increasingly complex geospatial problems across the remaining modules. Basic Python experience and familiarity with geospatial raster data concepts are recommended.

This is the GeoAI course built for people who want to do real work — not just understand the theory.

Who this course is for:

GIS professionals and geospatial analysts who want to move beyond traditional analysis into deep learning and AI-powered spatial workflows.
Data scientists and machine learning practitioners who want to apply their skills to satellite imagery, aerial data, and real-world geospatial problems.
Remote sensing specialists and earth observation researchers looking to modernize their workflows with PyTorch and neural network architectures.
Students and academics in geography, environmental science, or urban planning who want hands-on AI skills applicable to real spatial datasets.

Course content

Getting Started • 4 lectures • 5min

Module 1 - Understanding Neural Networks • 5 lectures • 1hr

Module 2 - Crop Mapping with Deep Learning • 7 lectures • 49min

Module 3 - Temporal Change Detection with Siamese U-Net • 4 lectures • 24min

Module 4 - Building Segmentation from Aerial Imagery • 6 lectures • 33min

Module 5 - LiDAR-Based Urban Segmentation • 7 lectures • 35min

Course Outro • 1 lecture • 1min
