5. Exploratory Data Analysis (EDA) for Time Series

Before modeling a time series, it is essential to understand the characteristics of the data. That is the role of Exploratory Data Analysis (EDA). For time series data, EDA helps with:

  1. Understanding underlying patterns such as trends and seasonality.

  2. Identifying outliers or any unusual data points.

  3. Validating assumptions related to time series forecasting models.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Generate sample time series data: linear trend + one sine cycle + Gaussian noise
np.random.seed(42)  # Arbitrary seed so the example is reproducible
time = np.linspace(0, 2 * np.pi, 100)
trend = time * 0.5
seasonality = np.sin(time)
noise = np.random.normal(0, 0.5, 100)
timeseries = trend + seasonality + noise

plt.figure(figsize=(8, 3))  # Adjust the figure size as per your preference
plt.plot(time, timeseries)
plt.xlabel('Time', fontsize=10)  # Adjust x-axis label font size
plt.ylabel('Value', fontsize=10)  # Adjust y-axis label font size
plt.title('Time Series Data', fontsize=12)  # Adjust title font size

# Adjust tick label font size for both x and y axes
plt.xticks(fontsize=8)
plt.yticks(fontsize=8)

plt.grid(True)
plt.tight_layout()  # Ensure all elements fit within the figure
plt.show()

5.1 Visualizing Time Series Data

The simplest and often the most informative EDA technique is visualization.

  • Line Plots: The most common technique, with time on the x-axis and the measured value on the y-axis. Trends, seasonal cycles, and abrupt outliers are often visible at a glance.
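
As a minimal sketch of a line plot on calendar-indexed data, the example below rebuilds the sample series, attaches hypothetical daily timestamps (the original data has no dates), and overlays a centered rolling mean to make the trend easier to see. The start date, the seed, and the 15-point window are illustrative assumptions, not values from the text.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Rebuild the sample series with a fixed (arbitrary) seed for reproducibility
rng = np.random.default_rng(0)
time = np.linspace(0, 2 * np.pi, 100)
values = 0.5 * time + np.sin(time) + rng.normal(0, 0.5, 100)

# Hypothetical daily timestamps; chosen only for illustration
index = pd.date_range("2023-01-01", periods=100, freq="D")
series = pd.Series(values, index=index, name="value")

# Centered rolling mean as a rough smoother (window size is an assumption)
window = 15
smoothed = series.rolling(window=window, center=True).mean()

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(series.index, series, alpha=0.6, label="Observed")
ax.plot(smoothed.index, smoothed, label=f"{window}-point rolling mean")
ax.set_xlabel("Date")
ax.set_ylabel("Value")
ax.set_title("Line Plot with Rolling Mean")
ax.legend()
ax.grid(True)
fig.tight_layout()
plt.show()
```

The rolling mean is undefined for the first and last 7 points because a centered 15-point window needs neighbors on both sides.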

5.3 ACF and PACF plots

These are crucial plots when it comes to understanding the auto-correlation characteristics of a time series:

  • ACF (Auto-Correlation Function): The correlation of the series with its own lagged values, computed lag by lag. It helps determine the order of the moving average (MA) component in an ARIMA model.

  • PACF (Partial Auto-Correlation Function): The correlation between the series and its value at lag k after removing the effect of the intermediate lags 1 to k-1. It helps determine the order of the autoregressive (AR) component in an ARIMA model.

Here’s how to generate both plots with statsmodels, using the sample series defined earlier:

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt

# Create a figure with subplots for autocorrelation and partial autocorrelation plots
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(8, 8))  # Adjust the figure size as per your preference

# Plot the autocorrelation function (ACF)
plot_acf(timeseries, lags=20, ax=ax1)
ax1.set_title('Autocorrelation Function (ACF)', fontsize=9)  # Adjust title font size
ax1.set_xlabel('Lags', fontsize=9)  # Adjust x-axis label font size
ax1.set_ylabel('ACF', fontsize=9)  # Adjust y-axis label font size
ax1.grid(True)  # Add grid

# Plot the partial autocorrelation function (PACF)
plot_pacf(timeseries, lags=20, ax=ax2)
ax2.set_title('Partial Autocorrelation Function (PACF)', fontsize=9)  # Adjust title font size
ax2.set_xlabel('Lags', fontsize=9)  # Adjust x-axis label font size
ax2.set_ylabel('PACF', fontsize=9)  # Adjust y-axis label font size
ax2.grid(True)  # Add grid

# Adjust tick label font size for both subplots
for ax in [ax1, ax2]:
    ax.tick_params(axis='both', labelsize=9)

plt.tight_layout()  # Ensure all elements fit within the figure
plt.show()

Understanding ACF and PACF:

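  • A slow decay in the ACF suggests an autoregressive component or, when the decay is very slow, a non-stationary series (for example, one with a trend). A sharp cutoff in the ACF after lag q instead suggests a moving average component of order q.

  • A sharp drop in the PACF after a certain lag k suggests an autoregressive component of order k.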

Conclusion:

Exploratory Data Analysis (EDA) is an essential first step in time series analysis, just as in any other data analysis task. Visualization helps to get a feel for the time series data. Decomposition helps in understanding its components, and ACF & PACF plots assist in understanding the correlation characteristics, which are vital for models like ARIMA.