Technical course: Machine learning on satellite imagery

About
Machine Learning (ML) skills are essential for 21st-century statisticians and data scientists. This course introduces the field by covering the most popular tasks in supervised and unsupervised ML.

Remote-sensing imagery is already being used to address development issues, e.g. revealing changes in soil quality or water availability, informing agricultural interventions, and even measuring poverty. We therefore structured this course around ML methods that can be applied to satellite imagery, aiming to help statistical teams leverage this modern and widely available data source.

Subject matter
The use of Big Data is accelerating within development and humanitarian practice. If used well, it can foster inclusion, improve efficiency, and lower project costs, which may benefit public and private organizations involved in development programs. Our technical courses therefore cover different aspects of data science and data engineering relevant to official statistics and sustainable development.

Methodology
All the programming material is provided in Python, using the conventional open-source libraries for data science. Most of the sessions are interactive and run in a Jupyter Notebook (.ipynb). A practical exercise is completed at the end of each session.

Format and instructors
This course is offered face-to-face (or via videoconference if necessary). It lasts 18 hours, ideally spread over 3 days, and is designed for 20 participants. Each course is delivered by a team of 2 training specialists.

Requirements
Some programming experience is required; experience with Python is preferable but not necessary.

Syllabus

A. Introduction to supervised Machine Learning

Supervised machine learning methods are those that have progressed the most in both academic and industrial environments. We cover the task of classification: training a model to categorize the observations we give it. We examine algorithms such as Logistic Regression, Support Vector Machines, Gradient Boosted Trees and Neural Networks, going through their theoretical basis and applying them in practice.
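
As an illustration of the classification task described above, here is a minimal sketch of logistic regression trained by batch gradient descent. It is deliberately library-free so the mechanics are visible; it is not the course's own material, and in practice an open-source library would normally be used instead. The function names, the toy data, the learning rate and the number of epochs are all assumptions made for this example.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, learning_rate=0.5, epochs=5000):
    """Fit weights and a bias by batch gradient descent on the log-loss."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(X, y):
            score = sum(w * x for w, x in zip(weights, features)) + bias
            error = sigmoid(score) - label  # predicted probability minus truth
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        # Step against the average gradient over the training set.
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(X, weights, bias):
    """Assign class 1 when the predicted probability is at least 0.5."""
    predictions = []
    for features in X:
        score = sum(w * x for w, x in zip(weights, features)) + bias
        predictions.append(1 if sigmoid(score) >= 0.5 else 0)
    return predictions


# Hypothetical toy data: two made-up features per observation, label 0 or 1.
X_train = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
           [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
y_train = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(X_train, y_train)
print(predict([[0.15, 0.15], [0.85, 0.85]], weights, bias))
```

The model learns one weight per feature plus a bias, and a new observation is categorized by thresholding the predicted probability at 0.5. The other algorithms listed above follow the same fit-then-predict pattern but learn different decision rules.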

B. Introduction to unsupervised Machine Learning

Unsupervised machine learning is currently the second most popular area of the field. We cover the task of clustering: forming groups of observations that are similar according to a previously defined measure. We go through the theoretical principles and put into practice different algorithms, such as k-means, Gaussian Mixture Models and others, which are popular in the field of computer vision.
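
To make the clustering task above concrete, here is a minimal sketch of k-means in plain Python, using squared Euclidean distance as the similarity measure. It is an illustration under stated assumptions rather than the course's own code: the function names and the toy points are hypothetical, and the centroids are initialised from the first k points purely to keep the example deterministic (library implementations use more robust initialisation).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_means(points, k, max_iterations=100):
    """Alternate between assigning points and updating centroids."""
    # Simplification for this sketch: start from the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(column) / len(members) for column in zip(*members)
                )
    return labels, centroids


# Hypothetical toy data: two visibly separated groups of 2-D points.
points = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.2),
          (0.9, 0.9), (0.8, 0.9), (0.9, 0.8)]

labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

Unlike the supervised case, no labels are supplied: the groups emerge only from the distance measure chosen in advance, which is the "previously defined" notion of similarity.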

C. Case study: Satellite imagery for measuring urban extent (SDG 11.3.1)

This case study is developed with Landsat 8 satellite images, which are free and accessible to everyone. Both the supervised and unsupervised machine learning methods presented previously are used to infer urban extent from open satellite images, as part of the calculation of SDG indicator 11.3.1 (Tier 2).
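
The step from a classified image to the indicator can be sketched as follows. This is a minimal illustration, not the course's own material, and it rests on several assumptions: that the classifier output is a binary mask (1 = urban pixel), that each pixel covers 30 m x 30 m (the resolution of Landsat 8's multispectral bands), and that the indicator is computed with the logarithmic formulation, i.e. land consumption rate ln(Urb_end / Urb_start) / years divided by population growth rate ln(Pop_end / Pop_start) / years. The official indicator metadata should be checked for the formulation currently in use. The function names, masks and population figures are hypothetical.

```python
import math

# Assumption: one pixel covers 30 m x 30 m, expressed here in km².
PIXEL_AREA_KM2 = (30 * 30) / 1_000_000


def urban_area_km2(mask, pixel_area_km2=PIXEL_AREA_KM2):
    """Urban area implied by a binary mask (rows of 0/1 values)."""
    urban_pixels = sum(sum(row) for row in mask)
    return urban_pixels * pixel_area_km2


def land_consumption_to_population_growth_ratio(
        urban_start, urban_end, pop_start, pop_end, years):
    """Ratio of land consumption rate to population growth rate.

    Assumes the logarithmic formulation for both rates; this formula is
    not taken from the course description.
    """
    land_consumption_rate = math.log(urban_end / urban_start) / years
    population_growth_rate = math.log(pop_end / pop_start) / years
    return land_consumption_rate / population_growth_rate


# Hypothetical classified masks for the same area at two dates.
mask_start = [[0, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 0]]
mask_end = [[0, 1, 1, 1],
            [0, 1, 1, 1],
            [0, 0, 0, 0]]

ratio = land_consumption_to_population_growth_ratio(
    urban_area_km2(mask_start),
    urban_area_km2(mask_end),
    pop_start=1000,   # made-up population figures
    pop_end=1200,
    years=5,
)
print(round(ratio, 3))
```

A ratio above 1 would indicate that urban land is expanding faster than the population is growing. The population figures come from a separate source, such as a census; only the urban extent is derived from the imagery.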

This exercise focuses on using satellite imagery to bridge data gaps in measuring this Tier 2 indicator. Tier 2 means that the indicator is “conceptually clear, has an internationally established methodology and standards are available, but for which data is not regularly produced by countries”.

Target Audience

This course is aimed at professionals for whom programming is part of their daily activities or who lead a technical team.

Learning Objectives

Upon completion of the course you will be able to:

Understand the basic concepts of supervised and unsupervised machine learning.
Put into practice machine learning techniques for calculating an SDG indicator (SDG 11.3.1) using free satellite imagery.