Global Deforestation: Classifying Forest Loss Drivers from Satellite Imagery
Overview
Developed under the Stanford AI for Climate Change (AICC) initiative, this system classifies the drivers of global deforestation (commodity agriculture, shifting cultivation, forestry, wildfire, and urbanization) from multi-temporal Landsat satellite imagery (2000–2018) at 15 m/pixel resolution. Understanding why forests are lost, not just where or when, is essential for targeted policy interventions: commodity-driven deforestation requires supply-chain regulation, wildfire demands fire management, and shifting agriculture needs sustainable farming support.

The system implements a complete pipeline: satellite image acquisition (Descartes Labs API with multi-sensor fallback across Landsat 8/7/5), cloud-masked median compositing, 82-feature geospatial enrichment, and deep learning classification via 14 CNN backbone architectures, an LRCN temporal model (CNN + LSTM), and a FusionNet multimodal late-fusion architecture that combines visual features with geographic coordinates, fire radiative power, population density, and forest loss metrics. It achieves ~80% classification accuracy across the 5 driver categories on a global test set.
Five Deforestation Driver Categories
| Class | Definition | Training Distribution |
|---|---|---|
| Commodity-Driven | Permanent conversion to agriculture/mining | 18.7% |
| Shifting Agriculture | Small-scale rotational cultivation | 25.7% |
| Forestry | Large-scale timber harvesting (expected regrowth) | 34.2% |
| Wildfire | Natural or anthropogenic fire-driven loss | 16.5% |
| Urbanization | Settlement and infrastructure expansion | 4.9% |
Ground-truth labels from Curtis et al. (2018, Science); loss-year detection from Hansen Global Forest Change (Hansen et al., 2013, Science).
Satellite Image Acquisition Pipeline
The TileDownloader creates 10km × 10km tiles (666 × 666 pixels at 15m) centered on deforestation coordinates via dl.scenes.DLTile.from_latlon(), with a multi-sensor fallback chain: Landsat 8 → Landsat 7 → Landsat 5 — ensuring coverage across the full 2000–2018 period despite sensor availability gaps.
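The fallback logic can be sketched generically; the `search` callables below are hypothetical stand-ins for the actual Descartes Labs scene queries, which are not reproduced here:

```python
def first_available(searchers, tile, year):
    """Try each sensor's search function in priority order and return
    the first non-empty scene list (hypothetical interface)."""
    for name, search in searchers:
        scenes = search(tile, year)
        if scenes:
            return name, scenes
    return None, []


# Usage: Landsat 8 first, then 7, then 5 (search functions are fakes here).
searchers = [
    ("landsat-8", lambda tile, year: []),        # e.g. no L8 before 2013
    ("landsat-7", lambda tile, year: ["scene"]),
    ("landsat-5", lambda tile, year: ["scene"]),
]
sensor, scenes = first_available(searchers, tile=None, year=2005)
```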
Band acquisition: RGB + NIR + SWIR1 + SWIR2 + cloud/quality bands. Cloud filtering applies a three-stage pipeline: scene-level cloud fraction threshold (0.5), per-pixel cloud masking using cloud-mask AND bright-mask bands, and NDVI quality control for Landsat 7 (mean NDVI > 48,000). Clean pixels are composited via masked median of the top-5 lowest-cloud scenes into annual composites for temporal sequences.
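The masked-median compositing step can be sketched as follows, assuming a stack of co-registered scenes and boolean cloud masks as NumPy arrays (array layout and function name are illustrative):

```python
import numpy as np

def masked_median_composite(scenes, cloud_masks):
    """Per-pixel median across scenes, ignoring cloudy pixels.

    scenes:      (S, H, W, B) float reflectance stack
    cloud_masks: (S, H, W) bool, True where a pixel is cloud-contaminated
    """
    # Replace cloudy pixels with NaN so the median skips them.
    stack = np.where(cloud_masks[..., None], np.nan, scenes.astype(float))
    composite = np.nanmedian(stack, axis=0)          # (H, W, B)
    # Pixels cloudy in every scene stay NaN; fill them with 0 here.
    return np.nan_to_num(composite, nan=0.0)
```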
Forest loss polygon extraction: Hansen GFC product is queried for loss-year masks (tree cover > 30% threshold), then OpenCV contour detection (cv2.findContours) extracts polygon shapes, filtered by minimum size (> 2 pixels), with areas normalized by MAX_LOSS_AREA = 331,836.0.
82 Auxiliary Geospatial Features
Beyond satellite imagery, each sample is enriched with structured geospatial data:
- Fire metrics (21 features): brightness, count, and fire radiative power (FRP) at 1 km and 10 km scales with max/mean/sum aggregations
- Forest loss metrics (14 features): fire-associated loss, total loss, and loss-year differences at multiple scales
- Forest gain metrics (7 features): gain at the 10 km scale with various aggregations
- Land cover types (4 features): deciduous broadleaf, evergreen broadleaf, mixed, needleleaf
- Population metrics (21 features): density in 2000 and 2015, and population change, at multiple scales
- Tree cover metrics (7 features): cover percentage in year 2000 at various spatial scales
- Temporal patterns (10 features): loss-year difference features at 1 km and 10 km
All auxiliary features are Z-score normalized using training set statistics, with values below −1e30 imputed as 0 (following Curtis et al.).
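The normalization step can be sketched as follows (the sentinel threshold and training-set statistics follow the description above; the function name is illustrative):

```python
import numpy as np

SENTINEL = -1e30  # values below this are treated as missing

def normalize_aux(train_feats, feats):
    """Z-score features using training-set statistics, imputing
    sentinel missing values as 0 before computing the statistics."""
    train = np.where(train_feats < SENTINEL, 0.0, train_feats)
    x = np.where(feats < SENTINEL, 0.0, feats)
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    sigma[sigma == 0] = 1.0   # guard against constant features
    return (x - mu) / sigma
```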
Model Architectures
14 CNN Backbones
DenseNet121/161/201, ResNet18/34/101/152, Inceptionv3/v4, ResNeXt101, NASNetA, MNASNet, SENet154, SEResNeXt101 — sourced from both TorchVision and Cadene’s pretrained model zoo. Multi-temporal channel expansion: for temporal sequences, the first convolutional layer is expanded from 3 to 3 × 4 = 12 channels, with pretrained weights replicated across additional channels.
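The channel-expansion trick can be sketched in PyTorch. The replication scheme follows the description above (pretrained RGB weights tiled across temporal frames); whether the weights are additionally rescaled is not specified, so this sketch replicates them verbatim:

```python
import torch
import torch.nn as nn

def expand_first_conv(conv, n_frames=4):
    """Expand a pretrained 3-channel first conv layer to 3*n_frames
    input channels, replicating the RGB weights across frames."""
    new_conv = nn.Conv2d(3 * n_frames, conv.out_channels,
                         kernel_size=conv.kernel_size, stride=conv.stride,
                         padding=conv.padding, bias=conv.bias is not None)
    with torch.no_grad():
        # Tile (out, 3, k, k) -> (out, 3*n_frames, k, k)
        new_conv.weight.copy_(conv.weight.repeat(1, n_frames, 1, 1))
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv
```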
LRCN — Long-term Recurrent Convolutional Network
Based on Donahue et al. (2015), processes temporal satellite image sequences:
```
Input: [B, C, S, H, W] → reshape to [B×S, C, H, W]
→ shared-weight CNN backbone (feature extraction)
→ reshape to [B, S, num_features]
→ LSTM (hidden=128, 1 layer, batch_first=True)
→ last hidden state → linear classifier → 5 classes
```
Captures temporal dynamics of deforestation — distinguishing between permanent conversion (commodity) and cyclical patterns (shifting agriculture, forestry with regrowth).
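A minimal runnable sketch of the architecture (the backbone here is a tiny stand-in conv net, not one of the 14 pretrained backbones; dimensions follow the diagram above):

```python
import torch
import torch.nn as nn

class LRCN(nn.Module):
    """Shared CNN applied per frame, LSTM over the temporal sequence."""
    def __init__(self, num_features=64, hidden=128, num_classes=5):
        super().__init__()
        self.backbone = nn.Sequential(      # stand-in for a pretrained CNN
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, num_features))
        self.lstm = nn.LSTM(num_features, hidden, num_layers=1,
                            batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, x):                   # x: [B, C, S, H, W]
        b, c, s, h, w = x.shape
        x = x.permute(0, 2, 1, 3, 4).reshape(b * s, c, h, w)
        feats = self.backbone(x).view(b, s, -1)   # [B, S, num_features]
        out, _ = self.lstm(feats)
        return self.classifier(out[:, -1])  # last hidden state -> 5 classes
```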
FusionNet — Multimodal Late Fusion
Combines CNN visual features with structured geospatial data:
```
CNN backbone → features → Linear(1024→128)            [128-dim]
+ sinusoidal lat/lon encoding (Vaswani et al.)        [20-dim]
  OR one-hot region embedding                         [7-dim]
+ polygon loss area                                   [1-dim]
+ 82 auxiliary features                               [82-dim]
→ concatenate → MLP(→128, ReLU, Dropout(0.2), →128, ReLU, Dropout(0.2), →5)
```
Positional encoding follows Vaswani et al. (2017): sin(v / 10^(i/N)) and cos(v / 10^(i/N)) for normalized lat/lon coordinates, producing 20-dimensional geographic embeddings. This enables the model to learn spatial patterns of deforestation drivers (e.g., commodity-driven deforestation clusters in South America and Southeast Asia).
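A sketch of the encoding under one reading of that formula; the number of frequencies, the constant `N`, and the lat/lon normalization are assumptions chosen to yield 10 dimensions per coordinate:

```python
import numpy as np

def encode_coord(v, n_freq=5, N=5):
    """Sin/cos pairs at geometrically spaced scales: sin(v / 10^(i/N)),
    cos(v / 10^(i/N)) for i = 0..n_freq-1."""
    enc = []
    for i in range(n_freq):
        scale = 10 ** (i / N)
        enc += [np.sin(v / scale), np.cos(v / scale)]
    return np.array(enc)

def geo_encoding(lat, lon):
    """20-dim geographic embedding: 10 dims per normalized coordinate
    (normalization to [-1, 1] is an assumption of this sketch)."""
    return np.concatenate([encode_coord(lat / 90.0),
                           encode_coord(lon / 180.0)])
```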
RegionModel — Per-Continent Ensemble
Routes samples to 7 continent-specific model instances (NA, LA, EU, AF, AS, SEA, OC), exploiting the strong geographic clustering of deforestation drivers.
SeCo — Self-Supervised Pretraining
Seasonal Contrast (Manas et al., ICCV 2021) — ResNet-18/50 encoders pretrained on unlabeled satellite imagery using seasonal contrast learning, addressing limited labeled data in remote sensing. Four pretrained checkpoints (100K and 1M samples for each backbone).
Baselines
Random Forest and Logistic Regression with grid search (n_estimators, max_depth, regularization strength) and 5-fold cross-validation, plus a nearest-neighbor fallback: when classifier confidence is below 0.5, predict the label of the geographically closest training sample, exploiting the spatial autocorrelation of deforestation drivers.
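The fallback rule can be sketched with scikit-learn (function name and Euclidean distance over coordinates are assumptions of this sketch; the real system may use geodesic distance):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def predict_with_fallback(clf, X, coords, train_coords, train_labels,
                          threshold=0.5):
    """Predict with clf; where max class probability < threshold,
    substitute the label of the nearest training sample by location."""
    proba = clf.predict_proba(X)
    preds = clf.classes_[proba.argmax(axis=1)]
    conf = proba.max(axis=1)
    nn = NearestNeighbors(n_neighbors=1).fit(train_coords)
    _, idx = nn.kneighbors(coords)
    nearest = np.asarray(train_labels)[idx[:, 0]]
    return np.where(conf < threshold, nearest, preds)
```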
Training Configuration
| Parameter | Value |
|---|---|
| Optimizer | Adam (lr=3e-5, weight_decay=1e-2) |
| LR scheduler | ReduceLROnPlateau (patience=4, monitoring val accuracy) |
| Batch size | 10 (train), 1 (val/test) |
| Max epochs | 100 |
| Early stopping | Patience 25, monitoring val accuracy |
| Gradient clipping | 0.5 |
| Loss | Cross-entropy (optional class weighting inversely proportional to frequency) |
| Data parallel | distributed_backend="dp" |
| Augmentation | 7 presets; production uses aggressive: salt-and-pepper (0–10%), flips, scale, translate, rotation, shear, elastic deformation |
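The core settings in the table can be sketched in plain PyTorch (the linear model is a stand-in for the CNN; the actual system wires these through PyTorch Lightning):

```python
import torch

model = torch.nn.Linear(82, 5)                  # stand-in for the CNN
optimizer = torch.optim.Adam(model.parameters(), lr=3e-5, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", patience=4)          # mode="max": val accuracy

# Optional class weighting, inversely proportional to training frequency
freqs = torch.tensor([0.187, 0.257, 0.342, 0.165, 0.049])
criterion = torch.nn.CrossEntropyLoss(weight=1.0 / freqs)

# One training step with gradient clipping at 0.5
x, y = torch.randn(10, 82), torch.randint(0, 5, (10,))
loss = criterion(model(x), y)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5)
optimizer.step()
optimizer.zero_grad()

# After each validation epoch, step the scheduler on val accuracy
scheduler.step(0.75)                            # placeholder accuracy value
```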
Hyperparameter optimization: Grid search via Launchpad (LR × weight decay × 20 samples), Bayesian optimization via NNI (TPE tuner, 50 max trials, 12 concurrent), and FusionNet-specific sweeps.
Evaluation
- Per-class precision, recall, F1 via sklearn.classification_report
- Loss-area-weighted evaluation: metrics weighted by forest loss polygon area, prioritizing classification accuracy on large-scale deforestation events with greater environmental impact
- Per-region breakdown: separate reports for each of the 7 continental regions
- Normalized confusion matrices: visualized as seaborn heatmaps
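The loss-area weighting can be reproduced with scikit-learn's sample_weight argument; the labels and areas below are toy values for illustration:

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report

y_true = np.array([0, 1, 2, 2, 3])
y_pred = np.array([0, 1, 2, 1, 3])
areas  = np.array([0.9, 0.1, 0.5, 0.05, 0.3])   # normalized polygon areas

# Unweighted vs. loss-area-weighted: the one misclassified event is
# small, so area weighting raises the aggregate scores.
print(classification_report(y_true, y_pred, zero_division=0))
print(classification_report(y_true, y_pred, sample_weight=areas,
                            zero_division=0))
print(accuracy_score(y_true, y_pred))                      # 0.8
print(accuracy_score(y_true, y_pred, sample_weight=areas))
```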
Results
~80% classification accuracy across 5 deforestation driver categories on a global test set spanning all 7 continental regions. Per-region convergence epochs: OC=7, NA=7, EU=13, AS=14, SA=19, AF=25 — reflecting varying regional complexity and data availability.
Tech Stack
Python (3.7+), PyTorch, PyTorch Lightning (0.9), Descartes Labs API (satellite imagery), scikit-learn (baselines, metrics), pretrainedmodels (Cadene CNN zoo), imgaug (7 augmentation presets), NNI (Bayesian optimization), OpenCV (contour detection), SymPy, TensorBoard, CircleCI, Travis CI