Assignment #04
Downloading TAWES data from the GeoSphere Austria Data Hub
Many datasets of the Austrian weather service GeoSphere Austria can be downloaded via their Data Hub, either through a web interface or through an API. In this assignment, we are going to use the API to access data from the TAWES station network.
The API documentation provides information on how the URL (or API endpoint) is constructed for data access:
<host>/<version>/<type>/<mode>/<resource_id>, where
<host> is https://dataset.api.hub.geosphere.at
<version> is v1
<type> is either grid, timeseries or station (we are only using station in our assignment)
<mode> is either historical, current or forecast (we are using current and historical)
<resource_id> is the dataset (we are going to use tawes-v1-10min)
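As a minimal sketch, the endpoint could be assembled from these components as shown here. The helper name build_url is hypothetical; the component values are the ones listed above.

```python
HOST = "https://dataset.api.hub.geosphere.at"
VERSION = "v1"
RESOURCE_ID = "tawes-v1-10min"


def build_url(mode, data_type="station"):
    """Join the endpoint components with slashes (hypothetical helper)."""
    return "/".join([HOST, VERSION, data_type, mode, RESOURCE_ID])


print(build_url("current"))
print(build_url("historical"))
```

Keeping the fixed parts as constants means that only <mode> has to change when switching between current and historical data.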
A. Write a script download_geosphere_tawes.py that, given a single station ID (e.g., 11320 for Innsbruck University) and parameter (e.g., TL for air temperature), downloads the current data point (<mode> is current) and prints the result on the command line. Format the output nicely by including, e.g., the units, the timestamp, and the station number. You can find a list of stations and parameters in the interactive web download portal.
Hint A1: Unlike Assignment #03, where we asked for user input, here we are asking for a script with command-line arguments. The command %run download_geosphere_tawes.py TL 11320 (in an IPython interpreter) should download the current air temperature for station 11320. Although argparse is more robust (and much better) than parsing sys.argv yourself, for the purpose of learning you should use sys.argv today and handle the command-line arguments yourselves.
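One possible way to read the two arguments of part A from sys.argv is sketched below. The function name parse_args and the usage message are assumptions for illustration. The example passes an explicit list so that it runs anywhere; in the actual script you would pass sys.argv instead.

```python
import sys


def parse_args(argv):
    """Return (parameter, station_id) from a list shaped like sys.argv."""
    # argv[0] is the script name, so two arguments mean len(argv) == 3.
    if len(argv) != 3:
        print("Usage: download_geosphere_tawes.py <parameter> <station_id>")
        sys.exit(1)
    parameter, station_id = argv[1], argv[2]
    return parameter, station_id


# Stand-in for sys.argv after: %run download_geosphere_tawes.py TL 11320
example_argv = ["download_geosphere_tawes.py", "TL", "11320"]
print(parse_args(example_argv))
```

Note that all elements of sys.argv are strings, so the station ID arrives as "11320", not as an integer.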
Hint A2: To access the data through the API you can use response = requests.get(url, params={key: value}), where the dictionary params includes the keys station_ids and parameters with the values taken from the input provided by the user. Make sure to install and import the requests package first.
Hint A3: What you are downloading is a JSON object, which is similar to a Python dictionary and can (and in our case does) consist of multiple nested dictionaries and arrays that contain the data and metadata (e.g., units and timestamp). Have a look at this short example on how to retrieve the JSON data from response.
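The sketch below illustrates how nested dictionaries and lists can be navigated once the JSON has been converted to Python objects. The dictionary sample is a hand-written stand-in, and its keys and values are assumptions for illustration, not an actual API response. Inspect the real result of response.json() (e.g., by printing its keys) before relying on any of these names.

```python
# Hand-written stand-in for response.json(); keys and values are
# illustrative assumptions, not an actual API response.
sample = {
    "timestamps": ["2025-10-01T12:00+00:00"],
    "features": [
        {
            "properties": {
                "station": "11320",
                "parameters": {
                    "TL": {"name": "air temperature", "unit": "°C", "data": [14.2]}
                },
            }
        }
    ],
}


def format_current(data, parameter):
    """Build one output line from the (assumed) nested structure."""
    props = data["features"][0]["properties"]  # first (only) station
    entry = props["parameters"][parameter]     # metadata and values
    timestamp = data["timestamps"][-1]         # most recent time
    value = entry["data"][-1]                  # most recent value
    return (
        f"Station {props['station']} at {timestamp}: "
        f"{parameter} = {value} {entry['unit']}"
    )


print(format_current(sample, "TL"))
```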
B. Extend the script to check if the data download was successful before proceeding with printing the output. If not, print an error message and exit the script.
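A minimal sketch of such a check is given below, assuming that success means an HTTP status code in the 2xx range. The helper name exit_on_failure is hypothetical. With requests, the two values would come from response.status_code and response.reason.

```python
import sys


def exit_on_failure(status_code, reason=""):
    """Stop the script unless the HTTP status code signals success (2xx)."""
    if not 200 <= status_code < 300:
        message = f"Error: download failed with status {status_code} {reason}"
        print(message.rstrip())
        sys.exit(1)


exit_on_failure(200, "OK")  # a successful download passes silently
```

Exiting with a non-zero code lets other programs detect that the script did not finish normally.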
C. Extend your script so that, if two additional command-line arguments (start and end time) are provided, it downloads the historical timeseries (<mode> is historical) instead of just the current value and saves the timeseries to a CSV file, e.g., %run download_geosphere_tawes.py TL 11320 2025-08-01T00:00 2025-10-01T00:00. Note that the TAWES dataset that we are using contains data only up to 3 months into the past (and the data are not quality-controlled).
Hint C1: You can use
import pandas as pd
pd.Series(index=time, data=value).to_csv(datafile, header=False)
to save the data to a file datafile, where time is a list of all timestamps and value is a list of all the data values.
C1. Include a check to make sure that the start and end times provided by the user are formatted correctly (YYYY-mm-ddTHH:MM). Print an error message and exit the script if that is not the case.
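One way to test the format is to let datetime.strptime try to parse the string, as in the following sketch. The function name is_valid_time is hypothetical. Be aware that strptime is somewhat lenient (for example, it also accepts months and days without a leading zero), so a stricter check may be needed if exact padding matters.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M"  # corresponds to YYYY-mm-ddTHH:MM


def is_valid_time(text):
    """Return True if text can be parsed with TIME_FORMAT."""
    try:
        datetime.strptime(text, TIME_FORMAT)
    except ValueError:
        return False
    return True


print(is_valid_time("2025-08-01T00:00"))  # True
print(is_valid_time("2025-08-01 00:00"))  # False (space instead of T)
```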
C2. Allow the user to specify the folder where the data should be stored by providing an optional command-line argument (--output-dir /path/to/a/dir). If the directory does not exist, print an error message and exit the script. If the user does not specify an output directory, use a default value.
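Handling an optional argument by hand could look like the sketch below, which removes the option and its value from the argument list so that the remaining positional arguments can be processed as before. The function name pop_output_dir and the default "." (the current directory) are assumptions for illustration.

```python
import os
import sys


def pop_output_dir(args, default="."):
    """Remove '--output-dir <path>' from args; return (path, remaining args)."""
    args = list(args)  # work on a copy, leave the caller's list untouched
    output_dir = default
    if "--output-dir" in args:
        i = args.index("--output-dir")
        if i + 1 >= len(args):
            print("Error: --output-dir needs a path")
            sys.exit(1)
        output_dir = args[i + 1]
        del args[i:i + 2]  # drop the option and its value
    if not os.path.isdir(output_dir):
        print(f"Error: directory {output_dir!r} does not exist")
        sys.exit(1)
    return output_dir, args


# In the script, args would be sys.argv[1:].
print(pop_output_dir(["TL", "11320", "--output-dir", "."]))
```

The full file name can then be built with os.path.join(output_dir, filename).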
C3 (optional). Plot the timeseries and add another optional command-line argument (--create_plot) that allows the user to choose whether they want a plot to be created or not.
D. Structure your code into several functions (think about which parts of the code could make up meaningful functions) and use an
if __name__ == '__main__':
block in your code.
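A bare-bones skeleton of this layout is sketched below. The function names and their placeholder bodies are hypothetical. Which functions you define, and what they do, is part of the exercise.

```python
import sys


def parse_args(argv):
    """Placeholder: return the arguments after the script name."""
    return argv[1:]


def main(argv):
    """Placeholder entry point that ties the individual functions together."""
    args = parse_args(argv)
    print(f"Received {len(args)} argument(s): {args}")
    return 0


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when imported.
    main(sys.argv)
```

With this structure, the functions can be imported and tested from another module or notebook without triggering a download.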