Climate Data: Issues, Systems, and Opportunities

Presentation: pdf (25 MB)

The aim of this talk was to introduce students at the Mathematics for Planet Earth summer school in Exeter to some of the issues in data science. I knew I would be following Wilco Hazeleger who was talking on Climate Science and Big Data, so I didn’t have to hit all the big data messages.

My aim was to cover some of the usual isues around heterogeneity and tools, but really cover some of the upcoming volume issues, using as many real numbers as I could. One of the key points I was making was that as we go forward in time, we are moving from a science dominated by the difficulty of simulation to one that will be dominated by the difficulty of data handling - and that problem is really here now, although clearly the transition to exascale will involve problems with both data and simulation. I also wanted to get across some o fthe difficulties associated with next generation model intercomparison - interdependency and data handling - and how those apply to both the modellers themselves as well as the putative users of the simulation data.

The 1km model future is interesting in terms of data handling. I made a slightly preposterous extrapolation (an exercise for the reader is to work out what is preposterous) … but only to show the potential scale of the problem, and the many opportunities for doing something about it.

The latter part of the talk covered some of the same ground as my data science talk from March, to give the students a taste of some of the things that can done with modern techniques and (often) old data.

The last slide was aimed at reminding folks that climate science has always been a data science, and actually, always a big data science! Climate data is always large compared to what others are actually handling … and that we have always managed to address the upcoming data challenges. I hope now will be no different.