About This Project

I created this project to explore what drives engagement on Medium — using public data and hands-on Python analysis.

I analyzed ~1,800 Medium articles to uncover patterns in claps, reading time, and responses. This page walks through the full process, including:

• Cleaning and transforming the data
• Generating summary statistics and visualizations
• Extracting insights useful for content planning

All visuals and analysis are now fully integrated below using Python and Quarto — no external dashboards required.


Tools & Skills


Dataset


Key Findings


Why I Did This

This project simulates a real-world editorial question:

“What types of stories perform best on Medium, and how can writers use data to plan better content?”

By combining public data, Python, and lightweight reporting tools, I aimed to show how a modern content strategist might use data.


Analysis Summary (Python)

To start the analysis, I loaded the dataset and imported key libraries used for data wrangling and visualization:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# set_theme is seaborn's preferred name for this call; sns.set is an alias for it
sns.set_theme(style="whitegrid")

df = pd.read_csv("data/medium_data.csv")

The data includes article metadata such as publication, reading time, claps, and responses. Next, I cleaned and prepared the dataset for analysis.

Data Cleaning

Before analyzing the data, I parsed the date column, dropped articles with zero claps, and added a claps-per-minute metric:

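# Parse dates written day-first (e.g. 01/02/2019 is 1 February)
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
# Keep only articles with at least one clap; .copy() avoids chained-assignment warnings
df = df[df['claps'] > 0].copy()
# Normalize engagement by article length
df['claps_per_min'] = df['claps'] / df['reading_time']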
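
The three cleaning steps above can be tried on a small made-up frame before running them on the full file. This is a minimal sketch: the column names match the snippet above, but the rows are invented for illustration, and the guard against a zero reading time is an added assumption rather than part of the original pipeline.

```python
import pandas as pd

# Toy rows invented for illustration; only the column names come from the analysis.
toy = pd.DataFrame({
    "date": ["01/02/2019", "15/03/2019", "28/03/2019"],
    "claps": [120, 0, 45],
    "reading_time": [6, 4, 9],
})

# dayfirst=True reads "01/02/2019" as 1 February, not 2 January.
toy["date"] = pd.to_datetime(toy["date"], dayfirst=True)

# Keep only articles with at least one clap.
cleaned = toy[toy["claps"] > 0].copy()

# Assumed guard (not in the original): drop rows whose reading time is zero,
# which would otherwise produce an infinite claps_per_min.
cleaned = cleaned[cleaned["reading_time"] > 0].copy()
cleaned["claps_per_min"] = cleaned["claps"] / cleaned["reading_time"]

print(cleaned)
```

In the real dataset the minimum reading time is 1 minute, so the guard is a safety net rather than a necessity.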

Descriptive Stats

df[['claps', 'reading_time', 'responses']].describe()
              claps  reading_time    responses
count   2420.000000   2420.000000  2420.000000
mean     367.808678      7.495041     5.722727
std      679.184481      3.703285    12.958243
min        1.000000      1.000000     0.000000
25%       62.000000      5.000000     0.000000
50%      155.000000      7.000000     2.000000
75%      382.250000      9.000000     5.000000
max    11100.000000     43.000000   207.000000
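
The table already hints at right skew: mean claps (about 368) are more than double the median (155). One way to put a number on that gap is sketched below. The helper name `skew_summary` is hypothetical, and the sample values are invented to stand in for `df['claps']`.

```python
import pandas as pd

def skew_summary(series: pd.Series) -> dict:
    """Return mean, median, their ratio, and sample skewness for one column.

    Assumes the median is non-zero; a zero median would make the ratio infinite.
    """
    mean = series.mean()
    median = series.median()
    return {
        "mean": mean,
        "median": median,
        "mean_to_median": mean / median,
        "skewness": series.skew(),
    }

# Invented sample with one viral outlier, standing in for df['claps'].
sample = pd.Series([10, 20, 30, 40, 900])
summary = skew_summary(sample)
print(summary)
```

A mean-to-median ratio well above 1, together with positive skewness, suggests that a few high performers are pulling the average up, so the median is likely the better description of a typical article.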

Visualizing Distributions

To understand the general spread and outliers in the dataset, I visualized the distributions of claps, reading time, and responses using boxplots. This helped surface skewed engagement patterns and typical article behavior.

plt.figure(figsize=(15, 5))

# Claps
plt.subplot(1, 3, 1)
sns.boxplot(x=df['claps'])
plt.title('Claps')

# Reading Time
plt.subplot(1, 3, 2)
sns.boxplot(x=df['reading_time'])
plt.title('Reading Time')

# Responses
plt.subplot(1, 3, 3)
sns.boxplot(x=df['responses'])
plt.title('Responses')

plt.tight_layout(pad=2)
plt.show()
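
The points a boxplot draws beyond its whiskers can also be counted directly. By default, seaborn and matplotlib flag values more than 1.5 × IQR outside the quartiles; using the quartiles from the summary table, that puts the upper fence for claps at 382.25 + 1.5 × 320.25 ≈ 863. The sketch below applies the same rule to an invented sample; the helper name `iqr_outliers` is hypothetical.

```python
import pandas as pd

def iqr_outliers(series: pd.Series, whis: float = 1.5) -> pd.Series:
    """Return the values outside [Q1 - whis*IQR, Q3 + whis*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - whis * iqr
    upper = q3 + whis * iqr
    return series[(series < lower) | (series > upper)]

# Invented values standing in for df['claps'].
sample = pd.Series([50, 60, 150, 160, 380, 400, 5000])
outliers = iqr_outliers(sample)
print(outliers)
```

Calling `iqr_outliers(df['claps'])` on the cleaned data would list the articles that the first boxplot panel draws as individual points.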

Boxplots show that: