{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "UxDZW841tkSi" }, "source": [ "# **Weather type reconstruction with neural networks**" ] }, { "cell_type": "markdown", "metadata": { "id": "_Nha4NG_tkSl" }, "source": [ "created by: Lucas Pfister, 2024\n", "\n", "### ***Description:***\n", "\n", "This notebook contains the code used for weather type reconstruction in Pfister et al. (2024). For details see the description in this paper. As the original station observations are not all publicly available (yet), a dummy dataset is available for demonstration purposes. The classification method is designed for 9 weather types (similar to the CAP9 classification (Weusthoff, 2011) used in the paper).\n", "\n", "\n", "The notebook contains code to 1) read in 2) and preprocess the model input data, to 3) evaluate the model (independent validation) and to 4) create WT reconstructions. For this purpose, the numpy, pandas, matplotlib, tensorflow and sklearn libraries have to be installed.\n" ] }, { "cell_type": "markdown", "metadata": { "id": "Dit404ghtkSr" }, "source": [ "## **0) Load libraries**" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "YbRY_dIutkSr", "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2024-04-04 11:50:08.693263: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: FMA\n", "To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", "2024-04-04 11:50:10.404215: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory\n", "2024-04-04 11:50:10.404275: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.\n", "2024-04-04 11:50:15.358188: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory\n", "2024-04-04 11:50:15.358820: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory\n", "2024-04-04 11:50:15.358849: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.\n" ] } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "import tensorflow as tf\n", "from tensorflow import keras\n", "\n", "import sklearn\n", "from sklearn.metrics import confusion_matrix\n", "from sklearn.linear_model import LinearRegression\n", "from sklearn.preprocessing import PolynomialFeatures\n", "\n", "from sklearn.model_selection import train_test_split, KFold" ] }, { "cell_type": "markdown", "metadata": { "id": "TE_HXLPjfRBt" }, "source": [ "## **1) Read data**\n", "\n", "For demonstration purposes, a dummy dataset is read with four pressure series (pp) and three temperature series (ta), as well as the weather types (WT) in the last column. 
Note that the dummy weather types (9 classes) loosely match the patterns in the pressure and temperature series, so model training is possible." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AAA_ppBBB_ppCCC_ppDDD_ppEEE_ppAAA_taBBB_taCCC_taWT
1957-09-011020.91013.91019.11012.41014.810.915.312.93
1957-09-021011.11009.11019.11004.41006.314.820.618.92
1957-09-031026.11015.71024.01020.11010.810.317.615.93
1957-09-041012.51024.61029.11021.31021.712.916.312.95
1957-09-051017.11016.71022.31012.21011.88.815.912.04
..............................
2020-12-271006.91013.51020.01006.91015.04.25.45.72
2020-12-281010.21024.61029.31015.71019.4-1.33.72.85
2020-12-29989.5989.41015.0983.7983.20.35.65.97
2020-12-301014.71005.91023.61012.0998.3-1.02.67.23
2020-12-311026.61020.11031.61029.21015.2-4.6-0.7-7.55
\n", "

23133 rows × 9 columns

\n", "
" ], "text/plain": [ " AAA_pp BBB_pp CCC_pp DDD_pp EEE_pp AAA_ta BBB_ta CCC_ta WT\n", "1957-09-01 1020.9 1013.9 1019.1 1012.4 1014.8 10.9 15.3 12.9 3\n", "1957-09-02 1011.1 1009.1 1019.1 1004.4 1006.3 14.8 20.6 18.9 2\n", "1957-09-03 1026.1 1015.7 1024.0 1020.1 1010.8 10.3 17.6 15.9 3\n", "1957-09-04 1012.5 1024.6 1029.1 1021.3 1021.7 12.9 16.3 12.9 5\n", "1957-09-05 1017.1 1016.7 1022.3 1012.2 1011.8 8.8 15.9 12.0 4\n", "... ... ... ... ... ... ... ... ... ..\n", "2020-12-27 1006.9 1013.5 1020.0 1006.9 1015.0 4.2 5.4 5.7 2\n", "2020-12-28 1010.2 1024.6 1029.3 1015.7 1019.4 -1.3 3.7 2.8 5\n", "2020-12-29 989.5 989.4 1015.0 983.7 983.2 0.3 5.6 5.9 7\n", "2020-12-30 1014.7 1005.9 1023.6 1012.0 998.3 -1.0 2.6 7.2 3\n", "2020-12-31 1026.6 1020.1 1031.6 1029.2 1015.2 -4.6 -0.7 -7.5 5\n", "\n", "[23133 rows x 9 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## read dataset\n", "training_data = pd.read_csv(\"WTrec_DummyTrainingData.csv\", index_col=0)\n", "training_data.index = pd.to_datetime(training_data.index)\n", "training_data" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "## separate WT series from station data\n", "WT_series = training_data.WT.copy()\n", "data = training_data.drop(\"WT\", axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## **2) Preprocessing**\n", "\n", "Seasonality correction (fitting first two harmonics) and trend correction (3rd order polynomial) for temperature series. Standardization of model input data." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "seascor = True # whether to correct temperature seasonality\n", "detrend = True # whether to correct temperature trend" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **2.1) Seasonality correction**" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def calcseas(t_series):\n", " '''Takes temperature series (Pandas Series object) and returns date array, fitted values (average seasonality) and residuals (anomalies from average seasonality)'''\n", " \n", " # get day of year and number of days\n", " doy = t_series.index.dayofyear\n", " ndoy = t_series.index.year.map(lambda x: pd.Timestamp(x, 12, 31).dayofyear)\n", " \n", " # array with 1st & 2nd harmonics (transposed)\n", " x = np.array([np.cos(2*np.pi*doy/ndoy),np.sin(2*np.pi*doy/ndoy),np.cos(4*np.pi*doy/ndoy),np.sin(4*np.pi*doy/ndoy)]).T\n", "\n", " # get temperature data\n", " y=t_series.values\n", " \n", " # get rid of na values for fit\n", " nonan_idx = np.where(~np.isnan(y))\n", " x_=x[nonan_idx]\n", " y_=y[nonan_idx]\n", "\n", " if y_.size == 0:\n", " print(\"observation vector is empty\")\n", " ynew = res = np.zeros_like(y)*np.nan\n", " \n", " else:\n", " ## 2nd harmonics fit\n", " reg = LinearRegression().fit(x_, y_)\n", " \n", " # fitted values\n", " ynew = reg.predict(x)\n", " \n", " # residuals\n", " res = y-ynew\n", " \n", " return(t_series.index, ynew, res)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "apply seasonality correction\n", "AAA_ta\n", "BBB_ta\n", "CCC_ta\n" ] } ], "source": [ "if seascor:\n", " print(\"apply seasonality correction\")\n", " for x in data.filter(regex=r'ta').columns:\n", " print(x)\n", " tseries = data[x]\n", " t, n, r = calcseas(tseries)\n", " data[x] = r" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"### **2.2) Trend correction**" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def detr_poly(t_series, poly_degree = 3):\n", " '''Takes temperature series (Pandas Series object) and returns detrended time series'''\n", " XX = np.reshape(t_series.index, (len(t_series.index), 1))\n", " YY = t_series\n", " pf = PolynomialFeatures(degree=poly_degree)\n", " Xp = pf.fit_transform(XX)\n", " md2 = LinearRegression()\n", " md2.fit(Xp, YY)\n", " trendp = md2.predict(Xp)\n", " \n", " series_detr = YY-trendp\n", " return series_detr" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "detrend temperature data\n", "AAA_ta\n", "BBB_ta\n", "CCC_ta\n" ] } ], "source": [ "if detrend:\n", " print(\"detrend temperature data\")\n", " for x in data.filter(regex=r'ta').columns:\n", " print(x)\n", " tseries = data[x]\n", " tseries_detr = detr_poly(tseries)\n", " data[x] = tseries_detr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **2.3) Standardization**" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "standardize data\n" ] } ], "source": [ "print(\"standardize data\")\n", "## normalize station data (column-wise)\n", "statdata_norm = (data-data.mean())/data.std()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AAA_ppBBB_ppCCC_ppDDD_ppEEE_ppAAA_taBBB_taCCC_ta
1957-09-010.861675-0.1181710.0658970.2809970.221356-0.903708-1.544809-0.721915
1957-09-02-0.164191-0.6381980.065897-0.582752-0.5843700.1211210.1819120.918599
1957-09-031.4060120.0768400.6931611.112355-0.157809-0.985966-0.7356940.133359
1957-09-04-0.0176391.0410561.3460271.2419170.875417-0.290144-1.110568-0.651333
1957-09-050.4638900.1851780.4755390.259403-0.063018-1.294703-1.197688-0.869345
...........................
2020-12-27-0.603848-0.1615060.181109-0.3128310.2403151.7550740.8858010.834880
2020-12-28-0.2584031.0410561.3716300.6372930.6573970.3772220.3559130.064347
2020-12-29-2.425284-2.772475-0.458957-2.817702-2.7740500.8012160.9723390.910779
2020-12-300.212658-0.9848820.6419550.237809-1.3427010.4884470.0260721.271371
2020-12-311.4583530.5535311.6660602.0948690.259273-0.408775-1.016832-2.682053
\n", "

23133 rows × 8 columns

\n", "
" ], "text/plain": [ " AAA_pp BBB_pp CCC_pp DDD_pp EEE_pp AAA_ta \\\n", "1957-09-01 0.861675 -0.118171 0.065897 0.280997 0.221356 -0.903708 \n", "1957-09-02 -0.164191 -0.638198 0.065897 -0.582752 -0.584370 0.121121 \n", "1957-09-03 1.406012 0.076840 0.693161 1.112355 -0.157809 -0.985966 \n", "1957-09-04 -0.017639 1.041056 1.346027 1.241917 0.875417 -0.290144 \n", "1957-09-05 0.463890 0.185178 0.475539 0.259403 -0.063018 -1.294703 \n", "... ... ... ... ... ... ... \n", "2020-12-27 -0.603848 -0.161506 0.181109 -0.312831 0.240315 1.755074 \n", "2020-12-28 -0.258403 1.041056 1.371630 0.637293 0.657397 0.377222 \n", "2020-12-29 -2.425284 -2.772475 -0.458957 -2.817702 -2.774050 0.801216 \n", "2020-12-30 0.212658 -0.984882 0.641955 0.237809 -1.342701 0.488447 \n", "2020-12-31 1.458353 0.553531 1.666060 2.094869 0.259273 -0.408775 \n", "\n", " BBB_ta CCC_ta \n", "1957-09-01 -1.544809 -0.721915 \n", "1957-09-02 0.181912 0.918599 \n", "1957-09-03 -0.735694 0.133359 \n", "1957-09-04 -1.110568 -0.651333 \n", "1957-09-05 -1.197688 -0.869345 \n", "... ... ... \n", "2020-12-27 0.885801 0.834880 \n", "2020-12-28 0.355913 0.064347 \n", "2020-12-29 0.972339 0.910779 \n", "2020-12-30 0.026072 1.271371 \n", "2020-12-31 -1.016832 -2.682053 \n", "\n", "[23133 rows x 8 columns]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "statdata_norm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## **3) Model tuning**" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "## define predictor and predictand data\n", "dates = statdata_norm.index\n", "x_data = statdata_norm.to_numpy()\n", "y_data = WT_series.to_numpy()\n", "\n", "n_data = x_data.shape[0]\n", "\n", "y_data_1_hot = np.zeros((n_data, y_data.max()))\n", "y_data_1_hot[np.arange(n_data),y_data-1] = 1\n", "\n", "n_input = x_data.shape[1]\n", "n_output = y_data_1_hot.shape[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **3.1) Model setup**\n", "\n", "As the time series in our dummy dataset correspond to the 1738 station set (5 pressure and 4 temperature series), either this model can be loaded or a new one can be created (un/comment corresponding lines)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "### create NN model (from scratch)\n", "\n", "#model = keras.Sequential()\n", " \n", "## input layer\n", "#model.add(tf.keras.layers.Input(name='input', dtype=tf.float32, shape=[n_input]))\n", "\n", "## hidden layers\n", "#model.add(tf.keras.layers.Dense(units=256, activation='relu'))\n", "#model.add(tf.keras.layers.Dense(units=128, activation='relu'))\n", " \n", "## dropout layer \n", "#model.add(tf.keras.layers.Dropout(rate = 0.1))\n", " \n", "## output layer\n", "#model.add(tf.keras.layers.Dense(units=n_output, name='output', activation='softmax'))\n", "\n", "## compile model\n", "#model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss=\"categorical_crossentropy\", metrics=['accuracy'])\n", "\n", "#model.summary()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2024-04-04 11:50:52.023935: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory\n", "2024-04-04 11:50:52.025388: W tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265] 
failed call to cuInit: UNKNOWN ERROR (303)\n", "2024-04-04 11:50:52.025495: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (climcal4.giub.unibe.ch): /proc/driver/nvidia/version does not exist\n", "2024-04-04 11:50:52.027658: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: FMA\n", "To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Model: \"sequential\"\n", "_________________________________________________________________\n", " Layer (type) Output Shape Param # \n", "=================================================================\n", " dense (Dense) (None, 256) 2304 \n", " \n", " dense_1 (Dense) (None, 128) 32896 \n", " \n", " dropout (Dropout) (None, 128) 0 \n", " \n", " output (Dense) (None, 9) 1161 \n", " \n", "=================================================================\n", "Total params: 36,361\n", "Trainable params: 36,361\n", "Non-trainable params: 0\n", "_________________________________________________________________\n" ] } ], "source": [ "### read pre-trained model\n", "model = tf.keras.models.load_model('NN_models/NN_hypermodel_stat_1738_tot.keras')\n", "model.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **3.2) Model validation**\n", "\n", "k-fold cross-validation. Note that this code is merely a regular cross-validation for a single model and not a nested cross-validation for hyperparameter tuning purposes." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "scrolled": true, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "outer_fold = 1\n", "[ 2892 2893 2894 ... 23130 23131 23132] [ 0 1 2 ... 
2889 2890 2891]\n", "Epoch 1/40\n", "87/87 [==============================] - 2s 13ms/step - loss: 0.6697 - accuracy: 0.7288 - val_loss: 0.6126 - val_accuracy: 0.7448\n", "Epoch 2/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5768 - accuracy: 0.7603 - val_loss: 0.5650 - val_accuracy: 0.7697\n", "Epoch 3/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5559 - accuracy: 0.7697 - val_loss: 0.5535 - val_accuracy: 0.7690\n", "Epoch 4/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5498 - accuracy: 0.7704 - val_loss: 0.5641 - val_accuracy: 0.7580\n", "Epoch 5/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5401 - accuracy: 0.7753 - val_loss: 0.5466 - val_accuracy: 0.7714\n", "Epoch 6/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5352 - accuracy: 0.7770 - val_loss: 0.5556 - val_accuracy: 0.7659\n", "Epoch 7/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5343 - accuracy: 0.7772 - val_loss: 0.5513 - val_accuracy: 0.7690\n", "Epoch 8/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5290 - accuracy: 0.7783 - val_loss: 0.5391 - val_accuracy: 0.7714\n", "Epoch 9/40\n", "87/87 [==============================] - 1s 9ms/step - loss: 0.5325 - accuracy: 0.7762 - val_loss: 0.5399 - val_accuracy: 0.7701\n", "Epoch 10/40\n", "87/87 [==============================] - 1s 11ms/step - loss: 0.5250 - accuracy: 0.7815 - val_loss: 0.5474 - val_accuracy: 0.7697\n", "Epoch 11/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5222 - accuracy: 0.7814 - val_loss: 0.5444 - val_accuracy: 0.7697\n", "Epoch 12/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5223 - accuracy: 0.7821 - val_loss: 0.5448 - val_accuracy: 0.7732\n", "Epoch 13/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5186 - accuracy: 0.7844 - val_loss: 0.5402 - val_accuracy: 0.7749\n", "91/91 [==============================] - 0s 2ms/step\n", "outer_fold = 2\n", "[ 0 1 2 ... 23130 23131 23132] [2892 2893 2894 ... 
5781 5782 5783]\n", "Epoch 1/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5185 - accuracy: 0.7838 - val_loss: 0.5455 - val_accuracy: 0.7687\n", "Epoch 2/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5179 - accuracy: 0.7824 - val_loss: 0.5425 - val_accuracy: 0.7766\n", "Epoch 3/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5176 - accuracy: 0.7833 - val_loss: 0.5456 - val_accuracy: 0.7690\n", "Epoch 4/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5138 - accuracy: 0.7854 - val_loss: 0.5401 - val_accuracy: 0.7746\n", "Epoch 5/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5117 - accuracy: 0.7874 - val_loss: 0.5416 - val_accuracy: 0.7718\n", "Epoch 6/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5117 - accuracy: 0.7839 - val_loss: 0.5407 - val_accuracy: 0.7763\n", "Epoch 7/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5088 - accuracy: 0.7890 - val_loss: 0.5427 - val_accuracy: 0.7701\n", "Epoch 8/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5070 - accuracy: 0.7856 - val_loss: 0.5446 - val_accuracy: 0.7690\n", "Epoch 9/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5056 - accuracy: 0.7884 - val_loss: 0.5403 - val_accuracy: 0.7728\n", "91/91 [==============================] - 0s 3ms/step\n", "outer_fold = 3\n", "[ 0 1 2 ... 23130 23131 23132] [5784 5785 5786 ... 8673 8674 8675]\n", "Epoch 1/40\n", "87/87 [==============================] - 1s 11ms/step - loss: 0.5113 - accuracy: 0.7869 - val_loss: 0.5467 - val_accuracy: 0.7680\n", "Epoch 2/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5088 - accuracy: 0.7882 - val_loss: 0.5468 - val_accuracy: 0.7694\n", "Epoch 3/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5081 - accuracy: 0.7871 - val_loss: 0.5443 - val_accuracy: 0.7690\n", "Epoch 4/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5032 - accuracy: 0.7896 - val_loss: 0.5380 - val_accuracy: 0.7797\n", "Epoch 5/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5039 - accuracy: 0.7901 - val_loss: 0.5460 - val_accuracy: 0.7725\n", "Epoch 6/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5012 - accuracy: 0.7922 - val_loss: 0.5444 - val_accuracy: 0.7742\n", "Epoch 7/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4989 - accuracy: 0.7904 - val_loss: 0.5473 - val_accuracy: 0.7704\n", "Epoch 8/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4989 - accuracy: 0.7902 - val_loss: 0.5479 - val_accuracy: 0.7725\n", "Epoch 9/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4969 - accuracy: 0.7925 - val_loss: 0.5427 - val_accuracy: 0.7714\n", "91/91 [==============================] - 0s 3ms/step\n", "outer_fold = 4\n", "[ 0 1 2 ... 23130 23131 23132] [ 8676 8677 8678 ... 
11565 11566 11567]\n", "Epoch 1/40\n", "87/87 [==============================] - 1s 11ms/step - loss: 0.5016 - accuracy: 0.7927 - val_loss: 0.5505 - val_accuracy: 0.7656\n", "Epoch 2/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.5004 - accuracy: 0.7904 - val_loss: 0.5451 - val_accuracy: 0.7714\n", "Epoch 3/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4985 - accuracy: 0.7904 - val_loss: 0.5484 - val_accuracy: 0.7707\n", "Epoch 4/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4966 - accuracy: 0.7909 - val_loss: 0.5584 - val_accuracy: 0.7683\n", "Epoch 5/40\n", "87/87 [==============================] - 1s 9ms/step - loss: 0.4950 - accuracy: 0.7905 - val_loss: 0.5601 - val_accuracy: 0.7669\n", "Epoch 6/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4934 - accuracy: 0.7927 - val_loss: 0.5487 - val_accuracy: 0.7697\n", "Epoch 7/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4914 - accuracy: 0.7953 - val_loss: 0.5525 - val_accuracy: 0.7739\n", "91/91 [==============================] - 0s 2ms/step\n", "outer_fold = 5\n", "[ 0 1 2 ... 23130 23131 23132] [11568 11569 11570 ... 14457 14458 14459]\n", "Epoch 1/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4966 - accuracy: 0.7952 - val_loss: 0.5494 - val_accuracy: 0.7690\n", "Epoch 2/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4884 - accuracy: 0.7976 - val_loss: 0.5589 - val_accuracy: 0.7697\n", "Epoch 3/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4888 - accuracy: 0.7964 - val_loss: 0.5468 - val_accuracy: 0.7746\n", "Epoch 4/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4887 - accuracy: 0.7956 - val_loss: 0.5534 - val_accuracy: 0.7694\n", "Epoch 5/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4863 - accuracy: 0.7968 - val_loss: 0.5526 - val_accuracy: 0.7746\n", "Epoch 6/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4873 - accuracy: 0.7949 - val_loss: 0.5614 - val_accuracy: 0.7687\n", "Epoch 7/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4857 - accuracy: 0.7982 - val_loss: 0.5559 - val_accuracy: 0.7732\n", "Epoch 8/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4861 - accuracy: 0.7987 - val_loss: 0.5599 - val_accuracy: 0.7701\n", "91/91 [==============================] - 0s 2ms/step\n", "outer_fold = 6\n", "[ 0 1 2 ... 23130 23131 23132] [14460 14461 14462 ... 
17348 17349 17350]\n", "Epoch 1/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4870 - accuracy: 0.7937 - val_loss: 0.5503 - val_accuracy: 0.7742\n", "Epoch 2/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4865 - accuracy: 0.7963 - val_loss: 0.5517 - val_accuracy: 0.7687\n", "Epoch 3/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4840 - accuracy: 0.7985 - val_loss: 0.5520 - val_accuracy: 0.7701\n", "Epoch 4/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4820 - accuracy: 0.7978 - val_loss: 0.5479 - val_accuracy: 0.7687\n", "Epoch 5/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4808 - accuracy: 0.7960 - val_loss: 0.5544 - val_accuracy: 0.7714\n", "Epoch 6/40\n", "87/87 [==============================] - 1s 9ms/step - loss: 0.4791 - accuracy: 0.7993 - val_loss: 0.5601 - val_accuracy: 0.7676\n", "Epoch 7/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4766 - accuracy: 0.8020 - val_loss: 0.5499 - val_accuracy: 0.7735\n", "Epoch 8/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4786 - accuracy: 0.8002 - val_loss: 0.5518 - val_accuracy: 0.7742\n", "Epoch 9/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4773 - accuracy: 0.8018 - val_loss: 0.5581 - val_accuracy: 0.7701\n", "91/91 [==============================] - 0s 3ms/step\n", "outer_fold = 7\n", "[ 0 1 2 ... 23130 23131 23132] [17351 17352 17353 ... 20239 20240 20241]\n", "Epoch 1/40\n", "87/87 [==============================] - 1s 11ms/step - loss: 0.4865 - accuracy: 0.7976 - val_loss: 0.5527 - val_accuracy: 0.7714\n", "Epoch 2/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4872 - accuracy: 0.7954 - val_loss: 0.5562 - val_accuracy: 0.7676\n", "Epoch 3/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4853 - accuracy: 0.7956 - val_loss: 0.5538 - val_accuracy: 0.7687\n", "Epoch 4/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4824 - accuracy: 0.7967 - val_loss: 0.5637 - val_accuracy: 0.7697\n", "Epoch 5/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4819 - accuracy: 0.7976 - val_loss: 0.5535 - val_accuracy: 0.7714\n", "Epoch 6/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4782 - accuracy: 0.7984 - val_loss: 0.5590 - val_accuracy: 0.7704\n", "91/91 [==============================] - 0s 3ms/step\n", "outer_fold = 8\n", "[ 0 1 2 ... 20239 20240 20241] [20242 20243 20244 ... 
23130 23131 23132]\n", "Epoch 1/40\n", "87/87 [==============================] - 1s 11ms/step - loss: 0.4768 - accuracy: 0.8000 - val_loss: 0.4616 - val_accuracy: 0.8050\n", "Epoch 2/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4811 - accuracy: 0.7951 - val_loss: 0.4626 - val_accuracy: 0.8046\n", "Epoch 3/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4771 - accuracy: 0.7986 - val_loss: 0.4614 - val_accuracy: 0.8053\n", "Epoch 4/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4771 - accuracy: 0.7982 - val_loss: 0.4743 - val_accuracy: 0.8001\n", "Epoch 5/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4774 - accuracy: 0.8012 - val_loss: 0.4664 - val_accuracy: 0.8029\n", "Epoch 6/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4746 - accuracy: 0.8003 - val_loss: 0.4705 - val_accuracy: 0.8012\n", "Epoch 7/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4739 - accuracy: 0.7986 - val_loss: 0.4653 - val_accuracy: 0.7960\n", "Epoch 8/40\n", "87/87 [==============================] - 1s 10ms/step - loss: 0.4762 - accuracy: 0.8006 - val_loss: 0.4682 - val_accuracy: 0.8012\n", "91/91 [==============================] - 0s 3ms/step\n" ] } ], "source": [ "## define input data\n", "Xdata = x_data\n", "ydata = y_data_1_hot\n", "\n", "## define the K-fold cross validator\n", "nfold_outer = 8\n", "cv_outer = KFold(n_splits=nfold_outer, shuffle=False)#, random_state=42)\n", "\n", "## define early stopping conditions for model tuning\n", "stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, mode=\"min\")\n", "\n", "## validation output table\n", "accuracy_outer_post = pd.DataFrame(index=np.arange(1,nfold_outer+1), columns=[\"ANN\",\"DJF\",\"MAM\",\"JJA\",\"SON\"])\n", "\n", "\n", "## loop over folds\n", "fold_outer = 1\n", "\n", "for train, test in cv_outer.split(Xdata, ydata):\n", " print(\"outer_fold = \"+str(fold_outer))\n", " print(train, test)\n", " \n", " ## define training and test data\n", " X_train = Xdata[train, :]\n", " y_train = ydata[train]\n", " dates_train = dates[train]\n", " \n", " X_test = Xdata[test, :]\n", " y_test = ydata[test]\n", " dates_test = dates[test]\n", " \n", " ## create model and tune it (with 1/7 of training data used for validation)\n", " NN_model = model\n", " NN_model.fit(X_train, y_train, validation_split=1/7, epochs=40, batch_size=200, callbacks=[stop_early])\n", " \n", " ## evaluate model\n", " y_pred = np.argmax(NN_model.predict(X_test), axis=1)+1\n", " y_tst = np.argmax(y_test, axis=1)+1\n", " \n", " ## overall accuracy\n", " acc_outer = sklearn.metrics.accuracy_score(y_tst, y_pred)\n", " accuracy_outer_post.loc[fold_outer, \"ANN\"] = acc_outer\n", " \n", " ## seasonal accuracy\n", " for i in [[12,1,2],[3,4,5],[6,7,8],[9,10,11]]:\n", " acc = sklearn.metrics.accuracy_score(y_tst[dates_test.month.isin(i)], y_pred[dates_test.month.isin(i)], normalize=True)\n", " if i == [12,1,2]:\n", " cc = \"DJF\"\n", " elif i == [3,4,5]:\n", " cc = \"MAM\"\n", " elif i == [6,7,8]:\n", " cc = \"JJA\"\n", " elif i == [9,10,11]:\n", " cc = \"SON\"\n", " accuracy_outer_post.loc[fold_outer, cc] = acc\n", " \n", " # Increase fold number\n", " fold_outer += 1\n", "\n", " " ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ANNDJFMAMJJASON
10.7901110.7853190.7785330.8271950.770604
20.7842320.7936290.8029890.7507080.788462
30.787690.7839340.7989130.781870.785714
40.7859610.7950140.7895480.767030.792582
50.772130.8088640.7719550.7228260.785714
60.7813910.7714680.7914890.7798910.782967
70.8069870.8328530.8021830.792120.802198
80.7672090.7945010.739130.7771740.759615
\n", "
" ], "text/plain": [ " ANN DJF MAM JJA SON\n", "1 0.790111 0.785319 0.778533 0.827195 0.770604\n", "2 0.784232 0.793629 0.802989 0.750708 0.788462\n", "3 0.78769 0.783934 0.798913 0.78187 0.785714\n", "4 0.785961 0.795014 0.789548 0.76703 0.792582\n", "5 0.77213 0.808864 0.771955 0.722826 0.785714\n", "6 0.781391 0.771468 0.791489 0.779891 0.782967\n", "7 0.806987 0.832853 0.802183 0.79212 0.802198\n", "8 0.767209 0.794501 0.73913 0.777174 0.759615" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## accuracy for all folds (overall and seasons)\n", "accuracy_outer_post" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ANN 0.784464\n", "DJF 0.795698\n", "MAM 0.784343\n", "JJA 0.774852\n", "SON 0.783482\n", "dtype: float64\n" ] } ], "source": [ "## average accuracy (overall and seasons)\n", "print(accuracy_outer_post.mean(axis=0))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## **4) Weather type reconstructions**\n", "\n", "reconstruct WTs from our dummy station variables and evaluate the reconstructions." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "723/723 [==============================] - 2s 3ms/step\n" ] } ], "source": [ "## create predictions with pre-trained model (section 3)\n", "preds = NN_model.predict(Xdata)\n", "preds_class = np.argmax(preds, axis=1)+1\n", "true_class = np.argmax(ydata, axis=1)+1" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
predictedtrue
1957-09-0113
1957-09-0222
1957-09-0333
1957-09-0455
1957-09-0534
.........
2020-12-2712
2020-12-2855
2020-12-2977
2020-12-3023
2020-12-3155
\n", "

23133 rows × 2 columns

\n", "
" ], "text/plain": [ " predicted true\n", "1957-09-01 1 3\n", "1957-09-02 2 2\n", "1957-09-03 3 3\n", "1957-09-04 5 5\n", "1957-09-05 3 4\n", "... ... ...\n", "2020-12-27 1 2\n", "2020-12-28 5 5\n", "2020-12-29 7 7\n", "2020-12-30 2 3\n", "2020-12-31 5 5\n", "\n", "[23133 rows x 2 columns]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## create data frame with predicted and true WT time series\n", "WT_rec = pd.DataFrame([preds_class, true_class], columns=dates, index=[\"predicted\",\"true\"]).T\n", "WT_rec" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.8026196342886786" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## get overall accuracy\n", "sklearn.metrics.accuracy_score(WT_rec.true, WT_rec.predicted)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Confusion matrix (row-normalized, in %): true weather type (rows) vs. predicted weather type (columns)

           1          2          3          4          5          6          7          8          9
1  81.999106   3.331843   5.612701   6.127013   0.000000   2.929338   0.000000   0.000000   0.000000
2   4.678530  82.864524   3.932262   0.000000   0.000000   5.396096   3.128588   0.000000   0.000000
3  14.065450   6.010069  71.806167   5.286344   2.769037   0.000000   0.000000   0.062933   0.000000
4   6.715006   0.000000   3.247163  83.890290   6.116015   0.000000   0.000000   0.031526   0.000000
5   0.000000   0.038124   3.583683  13.381624  75.447960   0.000000   0.000000   7.548608   0.000000
6   8.706572  11.929678   0.083717   0.000000   0.000000  74.257011   4.730013   0.000000   0.293010
7   0.000000   8.613218   0.000000   0.000000   0.000000   3.737811  85.319610   0.000000   2.329361
8   0.000000   0.089445   0.089445   0.000000   5.187835   0.000000   0.000000  94.633274   0.000000
9   0.000000   0.000000   0.000000   0.000000   0.000000   1.762632  17.861340   0.000000  80.376028
\n" ], "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## confusion matrix\n", "confmat = sklearn.metrics.confusion_matrix(WT_rec.true, WT_rec.predicted, labels = [1, 2, 3, 4, 5, 6, 7, 8, 9], normalize = \"true\")\n", "df = pd.DataFrame(confmat)*100\n", "df.index = df.columns = [1,2,3,4,5,6,7,8,9]\n", "df.style.background_gradient(cmap ='Blues').set_properties(**{'font-size': '20px'})" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "colab": { "collapsed_sections": [ "Dit404ghtkSr", "K2DojogoQv4l", "rcuT4JYRtkS4", "YzpETsFotkTF", "agJ3gkg0tkTJ", "cgHlUGwDtkTW", "hh6I4wJLtkTa", "kcLUPxNwtkTd", "TE_HXLPjfRBt", "I3v3Qp2WfMoE" ], "name": "Tutorial_III_tf2_Fully_connected_NNs.ipynb", "provenance": [] }, "kernelspec": { "display_name": "Python [conda env:.conda-ML]", "language": "python", "name": "conda-env-.conda-ML-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 4 }