Data Analytics for Cyber Security · Chapter 01

Introduction to Cybersecurity Analytics

Run Python live in your browser · Dataset embedded · SISTMR Australia · 2026

Dr. Pritam Gajkumar Shah

Cybersecurity · Data Analytics · AusJournal 2026

📦

Download Lab Files

The dataset is embedded in this page — no extra files needed to run the lab here. Download only if you want to use Jupyter on your own computer.

📓

Python Notebook

Full .ipynb for local Jupyter use

.ipynb · ~110 KB

📊

Dataset (CSV)

500 rows · 10 columns · Embedded

.csv · Embedded

ⓘ The dataset is fully embedded in this HTML file. Open the file in a browser and click Run; nothing else is needed.

🎯 Learning Objectives

Load and inspect a SIEM-style security event log
Count and visualise event types and severity levels
Identify suspicious source IPs using frequency analysis
Filter events by action type and severity
Create your first security bar chart

⚡

Live Python in your browser! Wait for ✅ Python ready (bottom-right), then click any ▶ Run button. Charts appear inline.

SECTION 1

Library Check

In [1]

import pandas as pd, matplotlib, numpy as np
print('pandas    :', pd.__version__)
print('matplotlib:', matplotlib.__version__)
print('numpy     :', np.__version__)
print()
print('All libraries ready!')

Out [1]

SECTION 2

Load & Explore Dataset

In [2]

df = pd.read_csv('ch01_security_event_log.csv')
print('Shape:', df.shape[0], 'rows x', df.shape[1], 'columns')
print()
print('Columns:', list(df.columns))
print()
print(df.head().to_string())

Out [2]
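
For local Jupyter use, it can help to parse the timestamp column while loading, so that later time-based filtering works on real datetimes rather than strings. The following is a minimal sketch of that idea. The helper name load_events and the sample rows are illustrative assumptions, and the real file's timestamp format may need an explicit format argument.

```python
import io
import pandas as pd

def load_events(source):
    """Read an event log and parse the 'timestamp' column as datetimes.

    `source` can be a file path or any file-like object.
    """
    events = pd.read_csv(source, parse_dates=["timestamp"])
    # Sort chronologically so head() shows the earliest events first.
    return events.sort_values("timestamp").reset_index(drop=True)

# Made-up sample rows so the sketch runs without the lab file.
sample_csv = io.StringIO(
    "timestamp,source_ip,event_type\n"
    "2026-01-02 10:00:00,10.0.0.5,Login Failure\n"
    "2026-01-01 09:30:00,10.0.0.7,Port Scan\n"
)
events = load_events(sample_csv)
print(events.dtypes)
```

With the lab file downloaded, the same helper could be called as load_events('ch01_security_event_log.csv').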

In [3]

import io
info_buf = io.StringIO()
# Write the summary into a buffer instead of redirecting sys.stdout
df.info(buf=info_buf)
print(info_buf.getvalue())

Out [3]

In [4]

m = df.isnull().sum()
if m.sum() > 0:
    print('Missing values found:')
    print(m[m > 0])
else:
    print('No missing values - clean dataset!')

Out [4]

In [5]

print(df.describe().round(2).to_string())

Out [5]

SECTION 3

Event Type Analysis

In [6]

ec = df['event_type'].value_counts()
print('Event Type Distribution:')
print(ec)
print()
print('Unique event types:', df['event_type'].nunique())

Out [6]

In [7]

import base64, io
import matplotlib.pyplot as plt
ec = df['event_type'].value_counts()
fig, ax = plt.subplots(figsize=(9, 4))
ec.plot(kind='bar', ax=ax, color='#1556A4', edgecolor='white', linewidth=0.8)
ax.set_title('Security Event Type Distribution', fontsize=13, fontweight='bold', pad=12)
ax.set_xlabel('Event Type', fontsize=11)
ax.set_ylabel('Count', fontsize=11)
ax.tick_params(axis='x', rotation=30)
ax.yaxis.grid(True, alpha=0.3)
ax.set_axisbelow(True)
# Label each bar with its count
for b in ax.patches:
    ax.text(b.get_x() + b.get_width() / 2, b.get_height() + 1, str(int(b.get_height())), ha='center', fontsize=9)
plt.tight_layout()
# Encode the figure as a base64 PNG; the 'CHART:' prefix is used here for inline display
buf = io.BytesIO()
plt.savefig(buf, format='png', dpi=120, bbox_inches='tight', facecolor='white')
buf.seek(0)
print('CHART:' + base64.b64encode(buf.read()).decode())
plt.close()

Out [7]

SECTION 4

Severity Analysis

In [8]

sc = df['severity'].value_counts()
print('Severity Distribution:')
print(sc)
crit = df[df['severity'] == 'Critical']
print(f'\nCritical events: {len(crit)} ({len(crit)/len(df)*100:.1f}% of total)')

Out [8]

In [9]

import base64, io
import matplotlib.pyplot as plt
sc = df['severity'].value_counts()
# One colour per severity level; unknown levels fall back to grey
colours = {'Critical': '#C0392B', 'High': '#E8734A', 'Medium': '#F39C12', 'Low': '#27AE60', 'Info': '#2E6DA4'}
fig, ax = plt.subplots(figsize=(6, 5))
sc.plot(kind='pie', ax=ax, colors=[colours.get(s, '#999') for s in sc.index], autopct='%1.1f%%', startangle=90, wedgeprops={'edgecolor': 'white', 'linewidth': 2})
ax.set_title('Event Severity Breakdown', fontsize=13, fontweight='bold')
ax.set_ylabel('')
# Encode the figure as a base64 PNG; the 'CHART:' prefix is used here for inline display
buf = io.BytesIO()
plt.savefig(buf, format='png', dpi=120, bbox_inches='tight', facecolor='white')
buf.seek(0)
print('CHART:' + base64.b64encode(buf.read()).decode())
plt.close()

Out [9]

SECTION 5

Suspicious IP Analysis

In [10]

top = df['source_ip'].value_counts().head(10)
print('Top 10 Source IPs:')
print(top)

Out [10]
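
The top-10 list shows the busiest addresses, but it does not say which of them are unusually busy. One simple heuristic, sketched below, flags any IP whose event count exceeds the mean count plus a multiple of the standard deviation. The function name flag_noisy_ips, the multiplier z=2.0, and the toy addresses are assumptions for illustration and are not part of the lab dataset.

```python
import pandas as pd

def flag_noisy_ips(events, z=2.0):
    """Return source IPs whose event count exceeds mean + z * std of all per-IP counts."""
    counts = events["source_ip"].value_counts()
    threshold = counts.mean() + z * counts.std(ddof=0)
    return counts[counts > threshold]

# Toy data: one address with 20 events, eight addresses with 2 events each.
ips = ["10.0.0.9"] * 20 + [f"10.0.0.{i}" for i in range(1, 9)] * 2
toy = pd.DataFrame({"source_ip": ips})

flagged = flag_noisy_ips(toy)
print(flagged)
```

A threshold like this is only a starting point: on a small or very skewed log a fixed cut-off (as in Exercise 1) may be easier to justify.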

In [11]

import base64, io
import matplotlib.pyplot as plt
top = df['source_ip'].value_counts().head(10)
fig, ax = plt.subplots(figsize=(9, 4))
top.plot(kind='barh', ax=ax, color='#E8734A', edgecolor='white')
ax.set_title('Top 10 Source IPs by Event Volume', fontsize=13, fontweight='bold')
ax.set_xlabel('Number of Events')
# Put the busiest IP at the top of the chart
ax.invert_yaxis()
ax.xaxis.grid(True, alpha=0.3)
ax.set_axisbelow(True)
# Encode the figure as a base64 PNG; the 'CHART:' prefix is used here for inline display
buf = io.BytesIO()
plt.savefig(buf, format='png', dpi=120, bbox_inches='tight', facecolor='white')
buf.seek(0)
print('CHART:' + base64.b64encode(buf.read()).decode())
plt.close()

Out [11]

SECTION 6

Action Taken Analysis

In [12]

print('Action Taken Distribution:')
print(df['action_taken'].value_counts())
bl = df[df['action_taken'] == 'Blocked']
print(f'\nBlocked events: {len(bl)}')
print('\nSample blocked events:')
print(bl[['timestamp', 'source_ip', 'event_type', 'severity']].head(5).to_string())

Out [12]

In [13]

cross = pd.crosstab(df['event_type'], df['action_taken'])
print('Event Type x Action Taken:')
print(cross.to_string())

Out [13]
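
Raw counts in a cross-tabulation are hard to compare when event types differ in volume. A possible follow-up is to ask pd.crosstab for row percentages with normalize='index'. The rows below are made up for the sketch: only 'Login Failure' and 'Blocked' are values known from the lab, while 'Port Scan', 'Malware', and 'Allowed' are placeholders.

```python
import pandas as pd

toy = pd.DataFrame({
    "event_type": ["Port Scan", "Port Scan", "Login Failure",
                   "Login Failure", "Login Failure", "Malware"],
    "action_taken": ["Blocked", "Allowed", "Blocked",
                     "Blocked", "Allowed", "Blocked"],
})

# normalize='index' divides each row by its row total, giving per-event-type shares.
rates = pd.crosstab(toy["event_type"], toy["action_taken"], normalize="index") * 100
print(rates.round(1))
```

Each row then sums to 100, so the share of each action can be compared directly across event types.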

SECTION 7

🎯 Challenge Exercises

🧪

Type your answer in the editable cell and click ▶ Run to test it live. Reveal the model answer only after you try!

🔴 Exercise 1 — Suspicious Login Failures

Find every source IP with 5 or more Login Failure events. Print each IP and its count.

Hint: Filter event_type == 'Login Failure', use value_counts() on source_ip, then keep counts ≥ 5.

In [ex1] — Your Answer

# Write your code here

Out [ex1]

Model Answer

lf = df[df['event_type'] == 'Login Failure']
ic = lf['source_ip'].value_counts()
susp = ic[ic >= 5]
print('IPs with 5+ Login Failure events:')
print(susp)
print(f'\nTotal suspicious IPs: {len(susp)}')

🔍 Real-world: These IPs are potential brute-force sources. Repeated login failures from one address are a classic indicator of password guessing or credential stuffing.
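
A total count per IP does not distinguish a burst of failures from the same number spread over a whole day. One way to separate the two, sketched here under assumptions, is to count failures per IP in fixed time windows. The function name failures_per_window, the 10-minute window, and the toy rows are hypothetical; the sketch also assumes the timestamp column has already been converted to datetimes.

```python
import pandas as pd

def failures_per_window(events, window="10min", min_failures=5):
    """Count Login Failure events per source IP in fixed time windows.

    Returns only the (source_ip, window start) pairs with at least
    `min_failures` failures.
    """
    fails = events[events["event_type"] == "Login Failure"]
    counts = fails.groupby(
        ["source_ip", pd.Grouper(key="timestamp", freq=window)]
    ).size()
    return counts[counts >= min_failures]

burst = [f"2026-01-01 09:0{m}" for m in range(5)]         # five failures in five minutes
slow = [f"2026-01-01 {h:02d}:00" for h in range(10, 15)]  # five failures, one per hour
toy = pd.DataFrame({
    "timestamp": pd.to_datetime(burst + slow),
    "source_ip": ["10.0.0.9"] * 5 + ["10.0.0.3"] * 5,
    "event_type": "Login Failure",
})

bursts = failures_per_window(toy)
print(bursts)
```

Both toy addresses have five failures in total, but only the one with five failures inside a single window is returned. Fixed windows can split a burst that straddles a window boundary, so a rolling window may suit production use better.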

🟠 Exercise 2 — Critical Event Response Rate

What percentage of Critical severity events resulted in a Blocked action?

Hint: Filter severity == 'Critical', then filter action_taken == 'Blocked'. Divide and multiply by 100.

In [ex2] — Your Answer

# Write your code here

Out [ex2]

Model Answer

crit = df[df['severity'] == 'Critical']
blk  = crit[crit['action_taken'] == 'Blocked']
pct  = len(blk)/len(crit)*100
print(f'Total Critical events : {len(crit)}')
print(f'Blocked among Critical: {len(blk)}')
print(f'\nPercentage Blocked    : {pct:.1f}%')

🔍 Real-world: A 74% block rate is reasonable, but the 26% of Critical events that were not blocked need immediate investigation.
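
To act on that observation, an analyst would typically pull the unblocked Critical events into a worklist. The sketch below combines two conditions with & to do so. The function name unblocked_critical and the toy rows are assumptions, and 'Allowed' and 'Logged' are placeholder action values that may not match the lab dataset.

```python
import pandas as pd

def unblocked_critical(events):
    """Return Critical events whose action was anything other than 'Blocked', oldest first."""
    mask = (events["severity"] == "Critical") & (events["action_taken"] != "Blocked")
    return events.loc[mask].sort_values("timestamp")

toy = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2026-01-01 12:00", "2026-01-01 08:00",
        "2026-01-01 09:00", "2026-01-01 10:00",
    ]),
    "severity": ["Critical", "Critical", "High", "Critical"],
    "action_taken": ["Allowed", "Blocked", "Allowed", "Logged"],
})

open_items = unblocked_critical(toy)
print(open_items.to_string())
```

Each condition needs its own parentheses, because & binds more tightly than == and != in Python.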

🟢 Exercise 3 — Protocol Distribution Chart

Create a bar chart of events per protocol with different colours and count labels on top.

Hint: Use df['protocol'].value_counts(), then .plot(kind='bar', color=[...]). Loop over ax.patches to add labels.

In [ex3] — Your Answer

# Write your code here

Out [ex3]

Model Answer

import base64, io
import matplotlib.pyplot as plt
pc = df['protocol'].value_counts()
print('Protocol Distribution:')
print(pc)
print()
fig, ax = plt.subplots(figsize=(7, 4))
pc.plot(kind='bar', ax=ax, color=['#27AE60', '#0A9B83', '#FFB347'][:len(pc)], edgecolor='white', width=0.5)
ax.set_title('Events per Protocol', fontsize=13, fontweight='bold')
ax.set_xlabel('Protocol')
ax.set_ylabel('Count')
ax.tick_params(axis='x', rotation=0)
# Label each bar with its count
for b in ax.patches:
    ax.text(b.get_x() + b.get_width() / 2, b.get_height() + 1, str(int(b.get_height())), ha='center', fontsize=11, fontweight='bold')
plt.tight_layout()
# Encode the figure as a base64 PNG; the 'CHART:' prefix is used here for inline display
buf = io.BytesIO()
plt.savefig(buf, format='png', dpi=120, bbox_inches='tight', facecolor='white')
buf.seek(0)
print('CHART:' + base64.b64encode(buf.read()).decode())
plt.close()

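🔍 Real-world: A high ICMP count can indicate ping sweeps or other reconnaissance. TCP usually dominates because most network services, and therefore most attacks against them, run over TCP.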
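
Whether a protocol count is "high" is easier to judge as a share of all events than as a raw number. A minimal sketch, using made-up rows (the UDP entry and all counts are assumptions), is to pass normalize=True to value_counts():

```python
import pandas as pd

toy = pd.DataFrame({"protocol": ["TCP"] * 6 + ["UDP"] * 3 + ["ICMP"]})

# normalize=True returns fractions of the total instead of counts.
share = toy["protocol"].value_counts(normalize=True).mul(100).round(1)
print(share)
```

On the lab data the same line could be run against df['protocol'] and compared with a baseline from a quiet period, if one is available.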

SUMMARY

✓ What You Learned

df.shape               Rows × columns
df.head()              Preview first 5 rows
df.info()              Data types and nulls
value_counts()         Frequency per category
df[df['col']=='x']     Filter rows by condition
pd.crosstab()          Cross-tabulate two columns
.plot(kind='bar')      Bar chart from a Series
.plot(kind='pie')      Pie chart from a Series

🏆

Real-world application: This is what a SOC analyst does first when triaging a new SIEM feed — understand the shape, find anomalies, flag high-risk IPs.
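
The first-look steps from this chapter can be bundled into a single helper for reuse on a new feed. The following is only a sketch: the function name triage_summary, its top_n parameter, and the toy rows are hypothetical, and it assumes the log has severity and source_ip columns like the lab dataset.

```python
import pandas as pd

def triage_summary(events, top_n=3):
    """Collect first-look facts about an event log into a plain dict."""
    return {
        "rows": events.shape[0],
        "columns": events.shape[1],
        "missing_values": int(events.isnull().sum().sum()),
        "severity_counts": events["severity"].value_counts().to_dict(),
        "top_source_ips": events["source_ip"].value_counts().head(top_n).to_dict(),
    }

toy = pd.DataFrame({
    "source_ip": ["10.0.0.9", "10.0.0.9", "10.0.0.9", "10.0.0.3", "10.0.0.3", "10.0.0.7"],
    "severity": ["Critical", "High", "High", "Low", "Low", "Low"],
})

summary = triage_summary(toy, top_n=2)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Returning a dict rather than printing inside the function keeps the summary easy to log or compare between two feeds.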

Course Progress: 1 / 12 Complete

▶ Next: Chapter 2 — Foundations of Data Analytics