Day 11 - EDA Completed

Building my way into corporate life.

It Was Not the Graph That Was Confusing

Today's small discovery was this. Sometimes a graph looks confusing not because it is complex but because the brain is looking at it the wrong way.

Two visualizations came up while going deep into the network dataset. A boxplot and a heatmap. At first glance they looked like abstract art. But after slowing down and reading them like a story, they started saying something real about network behavior.

The Boxplot — Where Normal Lives

Imagine looking at thousands of network connections happening across a system. Most of them are quick and routine. They start, exchange a little data, and end.

A boxplot answers one simple question. What does normal look like?

Most connections cluster very close to zero seconds in duration. That dense region on the left side of the plot is just typical network behavior. Short, fast interactions.

Then you notice something else. A long line stretching to the right. And scattered dots sitting far away from the cluster.

Those dots are outliers.

Outliers are not automatically attacks, but they are unusual. They represent connections lasting far longer than everything else. And that raises an obvious question. If most connections last a few seconds, why do some last tens of thousands of seconds?

Could be persistent automated traffic. Could be abnormal system behavior. Could be potential attack activity.

The boxplot does not accuse anything of being malicious. It just highlights behavior that deserves a second look. And that alone is actually powerful.
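To make "deserves a second look" concrete, here is a minimal sketch of the 1.5 × IQR rule that boxplots conventionally use to decide which points become dots. The durations are made up for illustration and the function name `iqr_outliers` is hypothetical. Nothing here comes from the actual dataset.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule."""
    # Quartiles: the box in a boxplot spans q1 to q3.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Anything outside the fences is what the plot draws as a lone dot.
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Made-up connection durations in seconds (an assumption, not real data).
durations = [0, 0, 0, 1, 1, 2, 2, 3, 4, 5, 40000]
lower, upper, outliers = iqr_outliers(durations)
print(f"fences: [{lower}, {upper}]  outliers: {outliers}")
```

The rule only flags the 40000-second connection as unusual. Deciding whether it is an attack is still a separate step.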

The Heatmap — How Features Move Together

Then came the heatmap. The first impression was a colorful grid that meant nothing. But it is really a map of relationships between features in the dataset.

Each square compares two variables. The color shows how strongly they move together.

The scale runs from positive 1, which means a perfect positive linear relationship, through zero, which means no linear relationship, to negative 1, which means a perfect negative linear relationship.

The first thing that stands out is the bright diagonal line running through the middle. That just means each feature perfectly correlates with itself. It always happens in a correlation matrix, so there is nothing unusual there.
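The number behind each square can be sketched in a few lines. This assumes the heatmap shows the Pearson correlation coefficient, which is the usual default but is not confirmed for this particular plot. The helper name `pearson` and the sample values are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # How much the two features deviate from their means together.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

a = [1, 2, 3, 4, 5]
print(pearson(a, a))                 # a feature against itself: 1.0 (the diagonal)
print(pearson(a, [10, 8, 6, 4, 2]))  # perfectly opposite movement: -1.0
print(pearson(a, [1, 0, -2, 0, 1]))  # no linear trend: 0.0
```

The last pair is worth a look. The two lists are clearly related, just not in a straight line, and the coefficient still comes out as zero. A dark square means no linear relationship, not no relationship at all.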

The interesting part is everything else.

Certain patches glow brighter than others. Those bright regions mean some features rise and fall together. If the number of bytes sent increases, the number of bytes received might increase too. When connection duration grows longer, packet counts might grow with it.

These relationships are patterns inside the network behavior. And patterns are exactly what machine learning models learn from.

Why Both of These Actually Matter

Together they tell a simple but meaningful story.

The boxplot shows what stands out. The heatmap shows what moves together.

One reveals unusual observations. The other reveals relationships between features. And once both of those things are visible, something becomes clear. The dataset is not just a random collection of numbers. It contains structure. It contains behavior. It contains patterns that a model can actually learn from.

The Small Realization

Sometimes the hardest part of data analysis is not the algorithms. It is learning how to see what the data is already trying to say.

The boxplot whispers that most things behave like this except for these few strange cases. The heatmap quietly adds that some of these variables move together.

Once those messages land, the graphs stop looking like noise. They start looking like insight.