Assignment #07

Exercise #07-01: Using ChatGPT for programming

A Galton Board is a vertical board with staggered rows of pins or other obstacles. As beads are dropped from the top, they bounce off these pins, either to the left or right, finding a path between the pins until they drop into one of the collecting bins at the bottom. The resulting distribution of beads in the bins will follow a binomial distribution.
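As a reference for checking simulated results, the binomial distribution mentioned above can be written out explicitly. This is a sketch under the idealized assumption (not stated in the exercise) that a bead goes left or right with equal probability at every pin; the symbols n (number of rows), k (bin index, counted from one side), and N (number of beads) are introduced here for illustration only.

```latex
P(k) = \binom{n}{k} \left(\frac{1}{2}\right)^{n}, \qquad k = 0, 1, \dots, n
\qquad\Longrightarrow\qquad
\mathbb{E}[\text{beads in bin } k] = N \, P(k)
```

Under this assumption, a board with n rows has n + 1 bins, and the expected counts are symmetric about the middle bin.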

In this exercise, we are going to explore how we can use ChatGPT to potentially improve our program and our programming style and what some of the drawbacks may be. To interact with ChatGPT, you will need to create an account if you don’t have one yet.

A. Write a program that simulates a Galton Board using only the standard library. The user should be able to specify the number of rows of pins on the Galton Board and the number of beads to simulate as command-line arguments. For this assignment, you are asked to use the argparse module to parse the command-line arguments. The end result of your program should be a list containing the number of beads in each bin. It is important that you write this code yourself without the help of ChatGPT.
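As a starting point for the argument parsing only (the simulation itself is left to you), a minimal argparse sketch could look like the following. The function name parse_args, the argument names rows and beads, and the script name galton.py are illustrative assumptions, not prescribed by the assignment.

```python
import argparse


def parse_args(argv=None):
    """Parse the number of rows and beads from the command line."""
    parser = argparse.ArgumentParser(description="Simulate a Galton Board.")
    # Hypothetical positional argument names; choose whatever suits your program.
    parser.add_argument("rows", type=int, help="number of rows of pins")
    parser.add_argument("beads", type=int, help="number of beads to drop")
    # With argv=None, argparse reads the arguments from sys.argv.
    return parser.parse_args(argv)


# Passing an explicit list mimics running: python galton.py 10 1000
args = parse_args(["10", "1000"])
print(args.rows, args.beads)
```

Because type=int is set, argparse rejects non-integer input with a usage message before your simulation code ever runs.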

B. Now ask ChatGPT to document the code for you. Does it provide accurate and sufficient documentation (e.g., docstrings and meaningful comment lines)?

C. Ask ChatGPT to suggest improvements to your code, making sure that the code still follows the instructions in this assignment. Are the suggestions actually improvements?

C1. Hit regenerate and see if you get different suggestions this time.

D. Ask ChatGPT to write a function for you to plot the distribution of beads using matplotlib. It should also modify your program accordingly, that is, include calls to the plot function and store the resulting plot in a directory that the user can specify via command-line arguments. Does the code work as expected?

E. Now start a fresh conversation with ChatGPT and actually ask it to produce the entire code for you. Is the code doing what it is supposed to be doing? How does it compare to your code? Think about what it takes to make sure that you can rely on the code being correct.

F. Which of the above interactions with ChatGPT provided useful input to you, and which did not?

Exercise #07-02: ACINN meteorological data

This exercise will give you a glimpse of working with pandas. We are going to analyze a dataset downloaded from the ACINN department database. The one-month-long dataset is from the automatic weather station TAWES UIBK. Here you can find a description of the variable names.

The data is shared by ACINN under a Creative Commons Attribution-ShareAlike 4.0 International License.

You can access the dataset at the URL shown below. Data downloaded from the department database are formatted as csv files, which we can read in easily using pandas. You may want to read the documentation of read_csv to see what all the arguments do.

from urllib.request import Request, urlopen
from io import BytesIO
import pandas as pd

url = 'https://raw.githubusercontent.com/manuelalehner/scientific_programming/master/data/data_Ibk_Sep2024.csv'
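# Download the raw bytes of the csv file
req = urlopen(Request(url)).read()
# Read the data: ';'-separated, column names on the second line,
# first column parsed as the datetime index
data = pd.read_csv(BytesIO(req), sep=';', header=1, index_col=0, parse_dates=True)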

The data are read into a so-called DataFrame, which can be very useful for time series analysis, e.g., from weather stations. Let’s explore this DataFrame somewhat to get you started.

data.columns # list all the column headers
Index(['tl', 'tl2', 'ts', 'tb1', 'tb2', 'tb3', 'tp', 'rf', 'rf2', 'rr', 'rrm',
       'p', 'som', 'glom', 'ffamm', 'ffm', 'ddm', 'ffxm', 'ddxm'],
      dtype='object')
data['som'] # access the data column 'som' (sunshine duration)
rawdate
2024-09-01 00:00:00    0.0
2024-09-01 00:01:00    0.0
2024-09-01 00:02:00    0.0
2024-09-01 00:03:00    0.0
2024-09-01 00:04:00    0.0
                      ... 
2024-09-30 23:55:00    0.0
2024-09-30 23:56:00    0.0
2024-09-30 23:57:00    0.0
2024-09-30 23:58:00    0.0
2024-09-30 23:59:00    0.0
Name: som, Length: 43200, dtype: float64
data.index # access the datetime index
DatetimeIndex(['2024-09-01 00:00:00', '2024-09-01 00:01:00',
               '2024-09-01 00:02:00', '2024-09-01 00:03:00',
               '2024-09-01 00:04:00', '2024-09-01 00:05:00',
               '2024-09-01 00:06:00', '2024-09-01 00:07:00',
               '2024-09-01 00:08:00', '2024-09-01 00:09:00',
               ...
               '2024-09-30 23:50:00', '2024-09-30 23:51:00',
               '2024-09-30 23:52:00', '2024-09-30 23:53:00',
               '2024-09-30 23:54:00', '2024-09-30 23:55:00',
               '2024-09-30 23:56:00', '2024-09-30 23:57:00',
               '2024-09-30 23:58:00', '2024-09-30 23:59:00'],
              dtype='datetime64[ns]', name='rawdate', length=43200, freq=None)

Write a script that lets the user choose the variable (e.g., air temperature, wind speed, …), either as a command-line argument or via input(), and that prints the following information in the terminal.
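One possible way to support both input routes is sketched below. The function name get_variable is hypothetical, and the sketch assumes that the user types a column name from data.columns (such as 'ffm') rather than a descriptive name; mapping descriptive names to columns is left open.

```python
def get_variable(argv):
    """Return the variable name from the argument list, or ask for it."""
    # argv[0] is the script name, so a user-supplied variable is argv[1].
    if len(argv) > 1:
        return argv[1].strip().lower()
    # No command-line argument given: fall back to an interactive prompt.
    return input("Variable to analyze (column name): ").strip().lower()


# Explicit list standing in for sys.argv (script name is a placeholder).
print(get_variable(["script.py", "FFM"]))
```

In a script, you would call get_variable(sys.argv) after importing sys, and then check that the returned name is actually one of the DataFrame columns.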

If the variable is wind direction:

The dominant wind direction was {XX} ({XX}% of the time). The least dominant wind direction was {XX} ({XX}% of the time).

If it is any other variable:

The maximum {VARIABLE} was {XX} {UNITS} ({DATE/TIME}), while the strongest {VARIABLE} averaged over an hour was {XX} {UNITS} ({DATE/TIME}).

Hint 1: To determine, for example, the maximum values, you can either use numpy or work directly with the DataFrame. To convert a column of the DataFrame to a numpy array, you can use

temp = data['som'].to_numpy()

Hint 2: Calculating time averages is easy using the pandas resample method.
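To illustrate Hint 2, here is a small sketch of hourly averaging with resample. It uses a synthetic one-minute series rather than the ACINN data, and the variable names are illustrative.

```python
import pandas as pd

# Synthetic one-minute series for illustration only (not the ACINN data):
# three hours of values 0.0, 1.0, ..., 179.0.
index = pd.date_range("2024-09-01 00:00", periods=180, freq="min")
series = pd.Series([float(i) for i in range(180)], index=index, name="demo")

# Average over each clock hour; each result is labeled with the start of its hour.
hourly = series.resample("1h").mean()

print(hourly.max())     # largest hourly mean
print(hourly.idxmax())  # timestamp (start of the hour) of that mean
```

The same pattern should apply to a column of the DataFrame, e.g., data['ffm'].resample('1h').mean(), assuming the index is a DatetimeIndex as shown above.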

Hint 3: For wind direction, use the following eight wind direction classes: N, NW, W, SW, S, SE, E, NE.
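One way to assign the eight classes from Hint 3 is sketched below. It assumes that wind direction is given in degrees measured clockwise from north (0 or 360 = N, 90 = E) and that each class is a 45-degree sector centered on its compass point; neither assumption is stated in the exercise, so check the variable description. The names SECTORS, direction_class, and class_shares are hypothetical.

```python
from collections import Counter

# The eight classes, ordered clockwise starting from north.
SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def direction_class(degrees):
    """Map a wind direction in degrees (0 = north, clockwise) to a class."""
    # Each sector spans 45 degrees and is centered on its compass point,
    # so shift by half a sector (22.5 degrees) before dividing.
    return SECTORS[int(((degrees % 360) + 22.5) // 45) % 8]


def class_shares(directions):
    """Return the percentage of samples that fall into each class."""
    # Missing values (NaN) are not handled here; drop them beforehand.
    counts = Counter(direction_class(d) for d in directions)
    total = sum(counts.values())
    return {name: 100 * counts[name] / total for name in SECTORS}


shares = class_shares([0, 10, 90, 180])
print(max(shares, key=shares.get), shares["N"])
```

The dominant class is then the key with the largest share, and the least dominant the key with the smallest.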

Hint 4: To output the datetime index in a specific format, you can use the strftime method.
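As a small illustration of Hint 4, the sketch below formats a standard-library datetime object; pandas Timestamp objects (the elements of the datetime index) provide a strftime method that accepts the same format codes. The chosen format string is only an example.

```python
from datetime import datetime

stamp = datetime(2024, 9, 30, 23, 59)
# %d = day, %m = month, %Y = four-digit year, %H:%M = 24-hour time
formatted = stamp.strftime("%d.%m.%Y %H:%M")
print(formatted)  # 30.09.2024 23:59
```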