
Before starting geostatistical modeling, it’s important to build a solid foundation by understanding key concepts and definitions. These core ideas will guide you through every step of the modeling process, from data analysis to spatial prediction.
In this section, you will learn about:
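BLUE (Best Linear Unbiased Estimator): The principle behind kriging. The estimate is a weighted linear combination of the observations, with weights chosen so that it is unbiased and has the smallest possible estimation variance.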
Exploratory Data Analysis (EDA): Basic statistical techniques and visual tools to identify trends, outliers, and data patterns before modeling.
Normality and Data Transformation: Why a normal distribution is important and how to apply transformations to prepare your data.
Theory of Regionalized Variables: The backbone of geostatistics. This theory treats spatial variables (like groundwater levels) as the sum of a structured component (trend) and a spatially correlated random component, which lets us apply statistical methods to spatial data.
Negative Kriging Values: Why negative values sometimes appear in kriging results, what they mean, and how to deal with them when interpreting groundwater data.
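The regionalized-variable decomposition and the BLUE conditions listed above are often written as follows. This is a standard textbook formulation offered as a sketch; the symbols (Z, m, ε, λ_i, s_0) are notation chosen here and are not taken from the attached guide.

```latex
% Regionalized variable: structured trend plus spatially correlated random part
Z(s) = m(s) + \varepsilon(s)

% Linear: the estimate at an unsampled location s_0 is a weighted sum of n observations
\hat{Z}(s_0) = \sum_{i=1}^{n} \lambda_i \, Z(s_i)

% Unbiased: for ordinary kriging (constant but unknown mean) the weights sum to one
\sum_{i=1}^{n} \lambda_i = 1

% Best: the weights minimise the estimation variance
\min_{\lambda_1, \dots, \lambda_n} \operatorname{Var}\left[ \hat{Z}(s_0) - Z(s_0) \right]
```

Because the weights only have to sum to one, individual weights can be negative. That is one way kriging can return a negative estimate for a quantity that is never negative in reality.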
To support your learning, we’ve attached:
A PDF guide containing clear definitions of each concept, and
A research article showing how these theories are applied in real-world groundwater studies
By mastering these definitions and theories, you’ll avoid common mistakes and gain the clarity needed to confidently move forward in geostatistical modeling. Think of this section as your toolkit before diving into spatial analysis.
In Lecture 2 of this section, we introduce the dataset that will be used for geostatistical modeling. This dataset consists of environmental data, focusing on groundwater level fluctuations during pre-monsoon and post-monsoon periods, which are influenced by India’s cropping and climatic patterns.
The modeling work will be carried out using data from the Bundelkhand region of India, known for its distinct agricultural and groundwater conditions.
You will also learn to generate grid point data for the study area using R programming, which is essential for spatial estimation and kriging.
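As a minimal sketch of what generating grid point data can look like in base R, the block below builds cell-centre coordinates on a regular 1 km grid. The bounding box values are hypothetical UTM coordinates chosen for illustration, not the real extent of the study area, and the lecture's own code may use a different approach.

```r
# Hypothetical bounding box in UTM metres (NOT the real study area extent)
xmin <- 400000;  xmax <- 410000
ymin <- 2800000; ymax <- 2805000
cell <- 1000  # grid spacing: 1 km

# Cell-centre coordinates along each axis
xs <- seq(xmin + cell / 2, xmax - cell / 2, by = cell)
ys <- seq(ymin + cell / 2, ymax - cell / 2, by = cell)

# Every combination of x and y gives one grid point
grid <- expand.grid(x = xs, y = ys)

nrow(grid)   # number of grid points
head(grid)   # first few coordinates
```

In practice the rectangular grid would usually be clipped to the study-area shapefile so that predictions are made only inside the boundary.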
To help you understand the Indian cropping patterns and the background of the dataset, it is highly recommended that you read the following research paper:
Akhtar, S. (2023). Spatial-temporal trends mapping and geostatistical modelling of groundwater level depth over northern parts of Indo-Gangetic Basin, India. Journal of Geography, Environment and Earth Science International, 27(10), 96–112.
The PDF of this research paper has also been attached for your reference.
All three types of data required for modeling are attached:
The shapefile of the study area
The input variable: groundwater fluctuation data
The grid data generated for spatial modeling
In this exciting lecture, we’ll take the first step into the world of geostatistical modeling using the powerful R programming language and its friendly interface, RStudio. Whether you're a beginner or brushing up your skills, this session will make it super easy and fun to follow along.
Step 1: How to Install R and RStudio
Before we start coding, let’s get the tools ready:
Download R
Go to https://cran.r-project.org
Choose your operating system (Windows, macOS, Linux)
Download and install it like any other software
Download RStudio
Go to https://posit.co/download/rstudio-desktop/
Download the free RStudio Desktop version
Install it and open — this is where you’ll write your R code
We will walk through the R code used in geostatistical modeling step by step, especially as it applies to groundwater studies. You’ll not only write code; you’ll understand what every line does.
Some key things you'll do:
Set your working folder
Load and clean your dataset
Visualize data using bubble plots
Build and plot variograms (core of spatial analysis)
Fit a variogram model and evaluate it
Perform kriging cross-validation
Create beautiful, interactive plots using ggplot2 and plotly
Build a complete Shiny App to explore your model interactively!
Why Is This Important?
You’ll gain real-world skills in spatial data analysis
You’ll see how R can turn raw data into smart decisions
You’ll be ready to handle geological, environmental, and mining datasets with confidence!
We’ll use popular and powerful libraries like:
gstat for geostatistics
sp and raster for spatial data
ggplot2 and plotly for plots
shiny for building interactive dashboards
Metrics and caret for model evaluation
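One possible way to check which of these libraries are already installed before the lecture is sketched below. The package names are the ones listed above; the variable names are illustrative, and the install line is left commented out because it needs internet access.

```r
# Packages named in this lecture
pkgs <- c("gstat", "sp", "raster", "ggplot2", "plotly",
          "shiny", "Metrics", "caret")

# Which of them are not installed yet?
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
print(missing_pkgs)

# Install only what is missing (uncomment to run; needs internet access)
# install.packages(missing_pkgs)

# Then load the ones you need for the session, for example:
# library(gstat)
# library(sp)
```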
By the end of this lecture, you will not only be able to write code — but also explain it clearly and apply it to real-life spatial problems.
So let’s open RStudio, load our dataset, and begin our journey into geostatistics — one line of code at a time!
Before we analyze spatial relationships using a semivariogram, we must first understand the basic nature of the data. That’s where EDA and statistical modeling help us.
To clean the data:
We need to remove missing or wrong values, fix errors, and prepare the dataset so it doesn't mislead our spatial analysis.
To understand the distribution:
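We check whether the data are normally distributed or skewed. Kriging (used after the semivariogram) does not strictly require normal data, but it works best when the data are normal or nearly normal, and strong skewness can distort both the semivariogram and the kriging variance.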
To check for outliers:
Outliers (extreme values) can affect the variogram badly. We need to spot and handle them.
To check variability:
We look at the mean, variance, standard deviation, etc., to understand how much the values change overall.
To confirm the stationarity assumption:
Semivariogram modeling assumes that the statistical properties of the data (like the mean and variance) stay constant over space. We check this using statistical tools, for example by looking for a trend in the values against the coordinates.
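The checks above can be sketched in base R as follows. The data are synthetic (a skewed sample with one missing value, one error code, and one extreme value added on purpose), and the -999 error code and the 1.5 × IQR outlier rule are assumptions made for illustration rather than the lecture's own choices.

```r
set.seed(42)

# Synthetic, right-skewed "groundwater fluctuation" values in metres (not real data)
gw <- data.frame(
  well  = 1:60,
  fluct = c(rlnorm(57, meanlog = 1, sdlog = 0.5), NA, -999, 60)
)

# 1. Clean: drop missing values and the assumed error code -999
clean <- gw[!is.na(gw$fluct) & gw$fluct != -999, ]

# 2. Variability: summary statistics
summary(clean$fluct)
c(mean = mean(clean$fluct), var = var(clean$fluct), sd = sd(clean$fluct))

# 3. Outliers: values outside the 1.5 * IQR fences
q        <- unname(quantile(clean$fluct, c(0.25, 0.75)))
lower    <- q[1] - 1.5 * IQR(clean$fluct)
upper    <- q[2] + 1.5 * IQR(clean$fluct)
outliers <- clean[clean$fluct < lower | clean$fluct > upper, ]
print(outliers)

# 4. Distribution: Shapiro-Wilk test on raw and log-transformed values
#    (a small p-value suggests departure from normality)
p_raw <- shapiro.test(clean$fluct)$p.value
p_log <- shapiro.test(log(clean$fluct))$p.value
c(raw = p_raw, log = p_log)

# Visual checks, shown only in an interactive session
if (interactive()) {
  hist(clean$fluct, main = "Fluctuation", xlab = "m")
  boxplot(clean$fluct, horizontal = TRUE)
}
```

Whether a flagged value is removed, corrected, or kept is a judgment call that depends on what is known about the well, so the sketch only lists the candidates.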
The semivariogram tells us how much groundwater levels differ as the distance between wells increases.
If two wells are close, their groundwater levels are usually similar. As distance increases, the difference also increases. Beyond a certain distance (the range), the difference levels off at a constant value (the sill), meaning the wells are no longer spatially related.
So the semivariogram shows how groundwater values vary over space and helps us understand how far the influence of one well extends.
The internal or experimental semivariogram tells us the actual variation we observe between all pairs of wells at different distances. It is based directly on the real data and gives us the raw spatial relationship before fitting any model.
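To make the idea concrete, here is a minimal base-R sketch that computes an experimental semivariogram by hand: for every pair of wells it takes half the squared difference in level, then averages those values within distance classes. The well locations and levels are synthetic, and the 1 km lag width and 5 km cutoff are arbitrary choices for illustration. In the lecture this calculation is more likely done with a package function such as `variogram()` from gstat.

```r
set.seed(1)
n <- 80

# Synthetic wells: coordinates in km, level = smooth spatial pattern + noise
wells   <- data.frame(x = runif(n, 0, 10), y = runif(n, 0, 10))
wells$z <- sin(wells$x / 3) + cos(wells$y / 4) + rnorm(n, sd = 0.2)

# Separation distance and half squared difference for every pair of wells
d  <- as.matrix(dist(wells[, c("x", "y")]))
g  <- 0.5 * outer(wells$z, wells$z, "-")^2
up <- upper.tri(d)   # use each pair only once

# Group pairs into lag classes of width 1 km, up to 5 km
breaks <- seq(0, 5, by = 1)
lag    <- cut(d[up], breaks = breaks)

# Average semivariance and number of pairs in each lag class
semivar <- data.frame(
  lag_mid = breaks[-1] - 0.5,
  n_pairs = as.vector(table(lag)),
  gamma   = as.vector(tapply(g[up], lag, mean))
)
print(semivar)
```

Plotting `gamma` against `lag_mid` gives the familiar picture of raw points to which a model (spherical, exponential, and so on) is fitted afterwards.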
Point Kriging Cross-Validation (PKCV) is a method used to check how well the fitted semivariogram model performs. It works by removing one data point (e.g., one well) at a time and predicting its groundwater level using kriging based on the remaining data. This predicted value is then compared to the actual observed value. The process is repeated for every point in the dataset. From this, we get statistical values like kriging mean error (should be close to zero), R² (goodness of fit), the ratio of kriging variance to estimation variance (ideally between 0.95 and 1.05), and the mean absolute error. These indicators help us decide whether the semivariogram model is accurate and reliable. PKCV ensures the model is not only mathematically fitted but also practically valid for making predictions in areas without data.
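The leave-one-out procedure described above can be sketched in base R with a hand-written ordinary kriging solver. Everything here is an assumption for illustration: the wells are synthetic, the spherical model parameters (nugget, partial sill, range) are made up rather than fitted, and the variance ratio is computed as mean kriging variance divided by mean squared error. The gstat package provides `krige.cv()` for the same purpose on real data.

```r
set.seed(7)
n <- 40

# Synthetic wells (coordinates in km) with a smooth pattern plus noise
wells   <- data.frame(x = runif(n, 0, 10), y = runif(n, 0, 10))
wells$z <- 5 + sin(wells$x / 3) + cos(wells$y / 4) + rnorm(n, sd = 0.1)

# Assumed spherical semivariogram model (illustrative parameters, not fitted)
sph <- function(h, nugget = 0.02, psill = 0.6, rng = 6) {
  g <- nugget + psill * (1.5 * h / rng - 0.5 * (h / rng)^3)
  g[h >= rng] <- nugget + psill   # flat at the sill beyond the range
  g[h == 0]   <- 0                # semivariance is zero at zero distance
  g
}

# Ordinary kriging prediction and kriging variance at one target location
ok_predict <- function(dat, target) {
  m  <- nrow(dat)
  D  <- as.matrix(dist(dat[, c("x", "y")]))
  A  <- rbind(cbind(sph(D), 1), c(rep(1, m), 0))  # kriging system + unbiasedness row
  d0 <- sqrt((dat$x - target$x)^2 + (dat$y - target$y)^2)
  b  <- c(sph(d0), 1)
  sol <- solve(A, b)                              # weights and Lagrange multiplier
  c(pred = sum(sol[1:m] * dat$z), var = sum(sol * b))
}

# Leave one well out at a time and predict it from the others
cv  <- t(sapply(seq_len(n), function(i) ok_predict(wells[-i, ], wells[i, ])))
err <- cv[, "pred"] - wells$z

# Cross-validation indicators
stats <- c(
  mean_error = mean(err),                        # should be close to zero
  mae        = mean(abs(err)),                   # mean absolute error
  r2         = cor(cv[, "pred"], wells$z)^2,     # squared correlation
  var_ratio  = mean(cv[, "var"]) / mean(err^2)   # assumed definition of the ratio
)
print(round(stats, 3))
```

Since the model parameters here are not fitted to the data, the variance ratio should not be expected to fall in the 0.95 to 1.05 band; with a well-fitted model it is the indicator that tells you whether the kriging variance is a realistic measure of the actual error.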
Shiny is an R package that lets you create interactive web applications directly from your R code. Users can interact with your data and visualizations without needing to know any R themselves.
With Shiny, you can build apps that:
Show interactive plots, maps, and tables
Have sliders, dropdowns, buttons for input
React to user input in real time
It’s useful when you want to:
Present your data or model in a user-friendly way
Share your analysis with others on a website
Let others explore different scenarios without writing code
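A very small Shiny app might look like the sketch below: one slider as input and one plot that reacts to it. The data are synthetic and the input and output names (`n`, `hist`) are illustrative; the app built in the course is more complete than this.

```r
library(shiny)

# Synthetic groundwater levels in metres (not real data)
set.seed(1)
levels_m <- rnorm(50, mean = 8, sd = 2)

# User interface: a title, a slider, and a plot area
ui <- fluidPage(
  titlePanel("Groundwater level explorer (sketch)"),
  sliderInput("n", "Number of wells to show", min = 5, max = 50, value = 20),
  plotOutput("hist")
)

# Server logic: redraw the histogram whenever the slider changes
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(head(levels_m, input$n),
         main = "Groundwater level", xlab = "Level (m)")
  })
}

# Build the app object; run it from an interactive session
app <- shinyApp(ui, server)
# runApp(app)
```

Moving the slider changes `input$n`, and Shiny reruns only the code inside `renderPlot()` that depends on it, which is what "reacting to user input in real time" means in practice.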
Block kriging needs a grid that covers the unmeasured parts of the study area. The grid divides the area into small blocks, and kriging predicts the average value within each block from nearby measured points. The result is a smooth, complete map of groundwater levels, even in areas without wells.
Overestimation happens when we predict more than actually exists (in mining, more ore or a higher grade). This leads to wrong planning, extra costs, and poor investment decisions.
Underestimation means we predict less than actually exists (less ore or a lower grade). This causes missed profit, poor mine planning, and underused resources.
Misclassified tonnage means we mistake ore for waste or waste for ore. In mining, this reduces recovery and increases processing cost.
Block kriging is used to predict the average ore grade in blocks. It is often preferred to point estimation because each estimate represents the average over a whole block rather than the value at a single point.
Changes in groundwater level affect grade and density, so block kriging needs extra care:
Use separate models for above and below the water table.
Include seasonal data (monsoon vs summer).
Account for groundwater flow direction (anisotropy).
Use well log and piezometer data.
Use cokriging if moisture or saturation data is available.
Perform uncertainty analysis with simulation.
Validate the model using cross-validation.
Use dynamic models if groundwater changes over time.
Divide the area into hydrogeological units (aquifers, aquitards).
This course is designed to help students, researchers, and professionals understand and apply geostatistical methods for groundwater modeling. It covers the entire workflow—from data preparation to final visualization—using simple, practical steps.
Through this course, learners will gain essential skills in data analysis, spatial modeling, and geostatistics, which are important for groundwater assessment and environmental planning. Whether you're a beginner or someone with basic GIS knowledge, this course will guide you through each stage with clear explanations and examples.
What You Will Learn:
Groundwater Data Cleaning and Preparation:
Learn how to clean, organize, and prepare groundwater datasets using R programming. You will also convert spatial coordinates to UTM format for accurate spatial analysis.
Exploratory Data Analysis (EDA):
Use plots and summary statistics in R (such as histograms, box plots, and scatter plots) to explore patterns, trends, and outliers in groundwater data.
Normality Check and Data Transformation:
Understand the importance of data normality in modeling. Learn how to test for normality and apply transformations (e.g., a log transform to reduce skewness, or z-scores to standardize the scale) to make data suitable for analysis.
Polynomial Trend Surface Analysis:
Detect large-scale spatial trends using polynomial regression in R. Remove the trend, perform analysis on residuals, and then add the trend back for accurate final outputs.
Variogram and Semivariogram Modeling:
Learn to construct variograms and semivariograms to study spatial correlation. Use this knowledge to understand how groundwater levels vary with distance and direction.
Grid Creation and Block Kriging:
Prepare spatial grids (e.g., 1 km × 1 km) and perform block kriging in R to estimate groundwater levels and uncertainty across the study area.
Cross-Validation Techniques:
Apply Point Kriging Cross Validation (PKCV) to evaluate model accuracy and choose the most reliable semivariogram model.
Statistical Modeling and Interpretation:
Develop and apply statistical models (like regression and kriging) using R. Learn how to interpret model parameters and results, understand uncertainty, and extract meaningful insights.
Visualization and Interpretation:
Use QGIS to create professional maps showing estimated groundwater levels and kriging variance. Learn how to interpret spatial patterns for research, planning, and decision-making.
Integration of R and QGIS:
Combine the power of R for analysis and QGIS for mapping. Learn a complete, real-world geostatistical modeling workflow using both tools together.
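As an illustration of the polynomial trend surface step listed above, the sketch below fits first- and second-order surfaces with `lm()`, keeps the residuals for variogram analysis, and adds the trend back to a prediction. The wells are synthetic with a built-in linear trend, and `kriged_resid` is a hypothetical placeholder for the value that kriging of the residuals would return.

```r
set.seed(3)
n <- 100

# Synthetic wells (coordinates in km) with a linear trend plus noise
wells   <- data.frame(x = runif(n, 0, 50), y = runif(n, 0, 50))
wells$z <- 10 + 0.08 * wells$x - 0.03 * wells$y + rnorm(n, sd = 0.5)

# First- and second-order polynomial trend surfaces
fit1 <- lm(z ~ x + y, data = wells)
fit2 <- lm(z ~ x + y + I(x^2) + I(y^2) + I(x * y), data = wells)

# Compare the two: does the second-order surface explain noticeably more?
print(anova(fit1, fit2))

# Remove the trend: residuals are what the semivariogram is built on
wells$resid <- residuals(fit1)

# Add the trend back at a new location after kriging the residuals
new_loc      <- data.frame(x = 25, y = 25)
kriged_resid <- 0   # hypothetical placeholder for the kriged residual
final        <- predict(fit1, newdata = new_loc) + kriged_resid
print(final)
```

Choosing the lowest polynomial order that removes the visible trend keeps the residuals closer to stationary without overfitting the surface.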
Course Benefits:
Easy to understand: Concepts explained in simple and clear language
Practical focus: Real-world workflow followed by professionals
Comprehensive learning: Covers all major steps in geostatistical groundwater modeling
Decision-making support: Learn how to interpret maps and make informed decisions
Job-ready skills: Useful for careers in hydrogeology, environmental science, GIS, and civil engineering