Analyzing Climatic Trends with the pymannkendall Python Package

Harnessing Python for Vital Climate Statistics: Your Guide to Analytical Insights.

In climate science, analyzing long-term trends is crucial. Python, paired with a few specific packages, makes this analysis more accessible and insightful. This post walks through a Python script that assesses climatic trends, covering its components, its logic, and its use of the Mann-Kendall test.

The Mann-Kendall Test

The Mann-Kendall test is a non-parametric method for detecting a monotonic trend in a series of values. It is widely used in climatological and hydrological studies because it does not assume normally distributed data and is robust to abrupt breaks caused by inhomogeneous time series.

Packages Used

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import linregress
import pymannkendall as mk

pandas (pd): A powerful data manipulation and analysis library for Python. It provides data structures like DataFrames, making it easy to handle structured data efficiently.

numpy (np): A fundamental package for scientific computing in Python. It offers support for large, multi-dimensional arrays and matrices, along with a wide range of mathematical functions to operate on these arrays.

matplotlib.pyplot (plt): A plotting library in Python that provides a MATLAB-like interface. It's used for creating static, interactive, and animated visualizations in Python.

scipy.stats (linregress): Part of the SciPy library, this module provides a linear regression function to fit a line to data points, useful for detecting trends in datasets.

pymannkendall (mk): A Python package specifically designed for the Mann-Kendall trend test. This non-parametric test is used to assess the presence of a trend in time-series data, common in environmental and climatological studies.

Data Preparation

We start by loading an Excel file from a specified file path into a DataFrame using pandas. Typically, such a file contains over 10,000 rows of climatic data for a specific city.

The data is then filtered to focus on a specific station (in this case, with the identifier 42971) and for a specific time period, the Post-Monsoon months (October and November for the Indian subcontinent). This step narrows down the dataset to the relevant subset for analysis.

Finally, the filtered data is grouped by year. For each year, the code calculates the mean of each climatic parameter over the Post-Monsoon months. This gives one value per parameter per year, which is the series the trend analysis runs on.

# Load the dataset
file_path = 'your path main.xlsx'
data = pd.read_excel(file_path)

# Filter the data for station 42971 and Post-Monsoon months (October, November)
station_data = data[(data['Index'] == 42971) & (data['Month'].isin([10, 11]))]

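# Group by Year and calculate mean for Post-Monsoon months
# (numeric_only=True skips text columns, which would otherwise raise an error in recent pandas)
annual_data = station_data.groupby('Year').mean(numeric_only=True)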
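
The filter-then-group step can be checked on a tiny, hand-made table before running it on the real file. The sketch below uses hypothetical values and a single parameter column; the column names `Index`, `Year`, and `Month` follow the script above, while `Tmax` as a column name is an assumption about the dataset.

```python
import pandas as pd

# Hypothetical miniature dataset, for illustration only
toy = pd.DataFrame({
    'Index': [42971, 42971, 42971, 42971, 42971, 99999],
    'Year':  [2000, 2000, 2000, 2001, 2001, 2000],
    'Month': [9, 10, 11, 10, 11, 10],
    'Tmax':  [33.0, 31.0, 29.0, 32.0, 30.0, 25.0],
})

# Keep only station 42971 and the months October and November
subset = toy[(toy['Index'] == 42971) & (toy['Month'].isin([10, 11]))]

# One row per year, holding the mean of each numeric column
yearly = subset.groupby('Year').mean(numeric_only=True)
print(yearly['Tmax'])
```

The September row and the row from the other station drop out, so each year's value is the average of its October and November readings only.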

Next, we turn to plotting and statistical analysis of the annual data. The script creates one plot for each specified parameter (such as DTR, Tmax, Tmin, and Rainfall). For each parameter, it fits a linear regression to draw a trend line and applies the Mann-Kendall test to assess the trend. The linear regression yields the slope, R², and p-value of the trend line, while the Mann-Kendall test provides Kendall's Tau and Sen's slope and evaluates whether the trend over time is statistically significant. The output is a set of graphs, each showing the trend in one climatological parameter for the Post-Monsoon months at station 42971.
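
The two slope estimates can disagree when the data contain outliers. Sen's slope is commonly defined as the median of the slopes between all pairs of points, which makes it less sensitive to a single extreme year than a least-squares fit. The sketch below illustrates this on a made-up series; `sens_slope_sketch` is a hypothetical helper written for this example, not the pymannkendall implementation.

```python
import numpy as np
from scipy.stats import linregress

def sens_slope_sketch(x, y):
    """Median of the slopes between every pair of points (Theil-Sen estimator)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(x), k=1)  # all index pairs with i < j
    return float(np.median((y[j] - y[i]) / (x[j] - x[i])))

years = np.arange(2000, 2010)
values = 0.5 * (years - 2000) + 20.0   # made-up series rising 0.5 units per year
values[4] += 10.0                      # one artificial outlier

ols = linregress(years, values)
sen = sens_slope_sketch(years, values)
print(f"OLS slope: {ols.slope:.3f}, Sen's slope: {sen:.3f}")
```

In this example the outlier pulls the least-squares slope away from 0.5, while the median of pairwise slopes stays at the underlying rate.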

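# Climatic parameters to analyze (column names are assumed to match the dataset)
parameters = ['DTR', 'Tmax', 'Tmin', 'Rainfall']
table_data = []  # collects one summary row per parameter

# Plotting and analysis
plt.figure(figsize=(15, 10))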

for i, param in enumerate(parameters, 1):
    plt.subplot(2, 2, i)

    # Linear regression for trend line
    slope, intercept, r_value, p_value, std_err = linregress(annual_data.index, annual_data[param])
    line = slope * annual_data.index + intercept

    # Kendall Tau and Sen's Slope
    mk_result = mk.original_test(annual_data[param])
    kendall_tau = mk_result.Tau
    sen_slope = mk_result.slope

The script outputs a series of plots for each parameter, showing the data points, trend line, and calculated statistics. This visual representation aids in understanding the changing climatic patterns over the years for the given station. Additionally, a summary table is generated, encapsulating the trend analysis results, which can be saved for further reference.

    # Plotting
    plt.scatter(annual_data.index, annual_data[param], label='Data')
    plt.plot(annual_data.index, line, color='red', label=f'Trend line (slope: {slope:.2f}, R²: {r_value**2:.2f})')
    plt.title(f'Post-Monsoon {param} for Station 42971')
    plt.xlabel('Year')
    plt.legend()  # display the labels defined above

    # Add one summary row for this parameter to the table data
    table_data.append({
        'Parameter': param,
        'Slope': round(slope, 2),
        'R²': round(r_value**2, 2),
        'P-value': round(p_value, 3),
        "Kendall's Tau": round(kendall_tau, 3),
        "Sen's Slope": round(sen_slope, 3),
        'Trend': 'Positive' if slope > 0 else 'Negative' if slope < 0 else 'No Trend'
    })
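
Note that the 'Trend' entry is decided by the sign of the regression slope alone, so a very small slope with a large p-value is still labelled 'Positive' or 'Negative'. One possible refinement, shown here as a sketch, is to report a direction only when the trend is statistically significant. The helper `classify_trend` and the 0.05 threshold are assumptions for illustration and are not part of the original script.

```python
def classify_trend(slope, p_value, alpha=0.05):
    """Label a trend by its direction only when the p-value is below alpha."""
    if p_value >= alpha or slope == 0:
        return 'No significant trend'
    return 'Positive' if slope > 0 else 'Negative'

print(classify_trend(0.12, 0.003))   # significant and rising
print(classify_trend(0.12, 0.40))    # rising, but not significant
print(classify_trend(-0.08, 0.01))   # significant and falling
```

The same function could be fed either the regression p-value or the p-value from the Mann-Kendall result, depending on which test you want the label to reflect.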

Use and adapt as needed


# Adjust subplot spacing and display the figure
plt.tight_layout()
plt.show()

# Convert the table data to DataFrame
results_table = pd.DataFrame(table_data)

# Print the table
print("Trend Analysis Table for Station 42971 (Post-Monsoon)")
print(results_table)

# Optionally, save the table to a CSV file
results_table.to_csv('outputfilename.csv', index=False)

Good luck.

Please visit the official website of the Global Climate Association for more information on climate science and on literacy tools and initiatives.