Predicting the Weather

Predictions using AI are getting easier.

Generative AI is making predictions more accessible.

You still need good, clean data, but newer technology makes getting from data to a prediction a quicker process.

The extra speed allows you to assess opportunities without the need to first invest in large infrastructure.

Take the weather in London, for example. It is consistent – normally cloudy with rain. Occasionally the sun shines and, sometimes, it even snows.

So how can you programmatically predict the weather in London with a level of accuracy?

You need clean data to start, then machine learning (ML) for the predictions. Simple? Yes, perhaps as simple as Pareto’s 80-20 principle.

It is simple to get up and running (~20% of your time), but tuning for accuracy takes longer (~80% of your time).

Working out the quality level (what is enough?) takes time and user feedback. My experiment looks at the 20%, so quality hasn’t taken priority.

So, what is the chance of snow in London?

Based on historic snowfall, the chance of snow is just 2% in November, peaking at 4% in January, and going down to 1% in March. These figures come from historic data, rather than from ML predicting when snow is most likely to occur.
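
A historic figure like this is essentially a frequency count: for each calendar month, the share of recorded days that had snow. Here is a minimal sketch of that calculation. The records are made up rather than taken from the Kaggle file, and both the (date, snow depth) layout and the rule that any depth above zero counts as a snow day are assumptions, not necessarily how the figures above were produced.

```python
from collections import defaultdict
from datetime import date

def snow_chance_by_month(records):
    """Return {month: share of days with snow} from (date, snow_depth) pairs."""
    days = defaultdict(int)
    snow_days = defaultdict(int)
    for day, snow_depth in records:
        days[day.month] += 1
        # Assumption: any positive snow depth counts as a snow day.
        if snow_depth is not None and snow_depth > 0:
            snow_days[day.month] += 1
    return {month: snow_days[month] / days[month] for month in days}

# Made-up sample, not real Heathrow data.
sample = [
    (date(2019, 1, 1), 0.0),
    (date(2019, 1, 2), 2.0),
    (date(2019, 1, 3), 0.0),
    (date(2019, 1, 4), 0.0),
    (date(2019, 3, 1), 0.0),
    (date(2019, 3, 2), None),  # missing reading, treated here as no snow
]

chances = snow_chance_by_month(sample)
for month in sorted(chances):
    print(f"month {month}: {chances[month]:.0%} of days had snow")
```

With 40 years of daily records, each month has more than a thousand days behind it, which is why a plain count can already give a usable answer for something as rare as snow.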

Rain in London is easier to predict, since it happens more frequently than snow. So, I built a prediction model using ML and a dataset with 15,340 daily Heathrow weather results from 1979-2020.

The model predicted rain for each week of the year, and I then compared the predictions to the actuals.

The model got it right for 83% of the weeks, although it did predict a chance of rain above 40% for every week. No wonder weather forecasters have a hard time.
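
The model itself is not shown here, so the following is only a sketch of the general approach: fit a classifier on daily features, then read off a probability of rain. It uses Scikit-learn's `LogisticRegression` on synthetic data. The choice of algorithm, the features (day of year encoded as sine and cosine) and the data are all assumptions for illustration, not the setup used in the experiment.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for daily records: NOT the Heathrow dataset.
n_days = 2000
day_of_year = rng.integers(1, 366, size=n_days)  # values 1..365
# Pretend rain is somewhat more likely in winter than in summer.
p_rain = 0.5 + 0.2 * np.cos(2 * np.pi * day_of_year / 365)
rained = rng.random(n_days) < p_rain

# Encode the day of year cyclically so 31 Dec sits next to 1 Jan.
X = np.column_stack([
    np.sin(2 * np.pi * day_of_year / 365),
    np.cos(2 * np.pi * day_of_year / 365),
])
y = rained.astype(int)

# Hold back a quarter of the days to check the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression()
model.fit(X_train, y_train)

# Column 1 of predict_proba is the probability of class 1 (rain).
rain_probability = model.predict_proba(X_test)[:, 1]
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
print(f"predicted chance of rain ranges from "
      f"{rain_probability.min():.0%} to {rain_probability.max():.0%}")
```

Holding back part of the data matters: accuracy measured on the same days the model was trained on would flatter the result.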

This is without any real tuning of the algorithm or much experimentation with the code libraries. But it’s not production ready either, since quality assurance is needed.

In the week-by-week view, 39 of the 47 weeks were correct in predicting rain, and 8 were incorrect.
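
The 83% figure follows directly from these counts (39 / 47 ≈ 0.83). Below is a small sketch of how such a weekly comparison could be scored. The 50% decision threshold and the four-week sample are assumptions, since the write-up does not state how a weekly prediction was judged correct.

```python
def weekly_accuracy(predicted_probs, actual_rain, threshold=0.5):
    """Share of weeks where 'probability >= threshold' matches what happened."""
    if len(predicted_probs) != len(actual_rain):
        raise ValueError("need one prediction and one outcome per week")
    correct = sum(
        (prob >= threshold) == rained
        for prob, rained in zip(predicted_probs, actual_rain)
    )
    return correct / len(actual_rain)

# The tally reported above: 39 correct weeks out of 47.
print(f"{39 / 47:.0%}")  # prints 83%

# Made-up four-week example: three calls right, one wrong.
probs = [0.9, 0.6, 0.45, 0.7]
actual = [True, True, False, False]
print(weekly_accuracy(probs, actual))  # prints 0.75
```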

Can anything else be done to tune the solution for better predictions?

Variables like air pressure, cloud cover and temperature could help, since more actual data typically means better predictions. Data that is near real-time is better for accuracy.

The historic dataset could be expanded to areas with comparable weather patterns (Paris? Toronto?). Reporting a percentage probability and a confidence level, rather than just Right or Wrong, can also make the predictions more useful.

This is a brief experiment in predicting something that has historically been very hard to predict. Who would want to be a weather forecaster?
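
One common way to score probabilities instead of Right or Wrong calls is the Brier score: the mean squared difference between the forecast probability and the outcome (1 for rain, 0 for none), where lower is better. This metric was not part of the experiment; it is offered here as one possible option, with made-up numbers.

```python
def brier_score(forecast_probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(forecast_probs) != len(outcomes):
        raise ValueError("need one forecast per outcome")
    return sum(
        (p - float(o)) ** 2 for p, o in zip(forecast_probs, outcomes)
    ) / len(outcomes)

outcomes = [1, 1, 0, 1]            # made-up: did it rain that week?
confident = [0.9, 0.8, 0.1, 0.9]   # forecasts that commit to a view
hedged = [0.5, 0.5, 0.5, 0.5]      # always "50% chance of rain"

print(brier_score(confident, outcomes))
print(brier_score(hedged, outcomes))
```

In this example the hedged forecaster scores worse than the confident one, even though a forecast of exactly 50% can never be called plainly Right or Wrong.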

The technology is improving, and importantly it is making prototyping so much easier.

Dataset used:

London Weather, 1979-2020, Kaggle

Technologies used:

Python: programming language for building and running the weather prediction model
Pandas: Data processing and manipulation
NumPy: Numerical operations
Scikit-learn: Machine learning algorithms and evaluation
SMOTE (Synthetic Minority Over-sampling Technique): Balancing the dataset
Datetime: Handling time series data
CSV: Reading and writing weather data