AusJournal Teaching & Learning Resources
Live Lab · Beginner · Cybersecurity
Data Analytics for Cyber Security · Chapter 01

Introduction to Cybersecurity Analytics

Run Python live in your browser · Dataset embedded · SISTMR Australia · 2026
Dr. Pritam Gajkumar Shah


SISTMR Australia  ·  wsnpgs@gmail.com

Cybersecurity · Data Analytics · AusJournal 2026
📦

Download Lab Files

The dataset is embedded in this page — no extra files needed to run the lab here. Download only if you want to use Jupyter on your own computer.

📓

Python Notebook

Full .ipynb for local Jupyter use

.ipynb · ~110 KB
📊

Dataset (CSV)

500 rows · 10 columns · Embedded

.csv · Embedded
Dataset is fully embedded in this HTML file. Just open the file in a browser and click Run — nothing else needed.
🎯 Learning Objectives
  1. Load and inspect a SIEM-style security event log
  2. Count and visualise event types and severity levels
  3. Identify suspicious source IPs using frequency analysis
  4. Filter events by action type and severity
  5. Create your first security bar chart
Live Python in your browser! Wait for ✅ Python ready (bottom-right), then click any ▶ Run button. Charts appear inline.
SECTION 1

Library Check

In [1]
import pandas as pd, matplotlib, numpy as np
import matplotlib.pyplot as plt   # plt is used by the chart cells below
import io, base64                 # used to encode charts as inline PNGs
print('pandas    :', pd.__version__)
print('matplotlib:', matplotlib.__version__)
print('numpy     :', np.__version__)
print()
print('All libraries ready!')
Out [1]
SECTION 2

Load & Explore Dataset

In [2]
df = pd.read_csv('ch01_security_event_log.csv')
print('Shape:', df.shape[0], 'rows x', df.shape[1], 'columns')
print()
print('Columns:', list(df.columns))
print()
print(df.head().to_string())
Out [2]
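Because the lab's CSV is embedded in the page, it may help to see that `pd.read_csv` can read from in-memory text as well as from a file. The sketch below is an illustration only: the three rows are made up, the `Port Scan` and `Allowed` values are assumed for the example, and only the six column names that appear elsewhere in this chapter are used (the real dataset has ten).

```python
import io
import pandas as pd

# Hypothetical sample rows; NOT taken from ch01_security_event_log.csv
csv_text = """timestamp,source_ip,event_type,severity,action_taken,protocol
2026-01-05 09:14:02,203.0.113.5,Login Failure,High,Blocked,TCP
2026-01-05 09:14:09,198.51.100.7,Port Scan,Medium,Allowed,ICMP
2026-01-05 09:15:40,203.0.113.5,Login Failure,Critical,Blocked,TCP
"""

# read_csv accepts any file-like object, so StringIO stands in for a file on disk
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

print("Shape:", sample.shape[0], "rows x", sample.shape[1], "columns")
print(sample.dtypes)
```

Passing `parse_dates` converts the timestamp column to a datetime type at load time, which is usually more convenient than converting it later.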
In [3]
import io as _io3
# df.info() writes to stdout by default; passing a buffer captures the text
# without having to swap sys.stdout in and out
_b = _io3.StringIO()
df.info(buf=_b)
print(_b.getvalue())
Out [3]
In [4]
m = df.isnull().sum()
if m.sum() > 0:
    print('Missing values found:')
    print(m[m > 0])
else:
    print('No missing values - clean dataset!')
Out [4]
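The lab dataset may well come back clean, so here is a small sketch of what the same check looks like when values are missing. The frame and the clean-up policy (drop rows without a source IP, label unknown severities) are assumptions made for illustration, not part of the lab.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two deliberately missing values
demo = pd.DataFrame({
    "source_ip": ["203.0.113.5", None, "198.51.100.7", "192.0.2.44"],
    "severity": ["High", "Low", np.nan, "Critical"],
    "protocol": ["TCP", "UDP", "TCP", "ICMP"],
})

missing = demo.isnull().sum()        # number of nulls per column
pct = demo.isnull().mean() * 100     # share of nulls per column, in percent
report = pd.DataFrame({"missing": missing, "percent": pct.round(1)})
print(report[report["missing"] > 0])

# One possible policy: drop rows with no source IP, label unknown severity
cleaned = demo.dropna(subset=["source_ip"]).fillna({"severity": "Unknown"})
print(cleaned)
```

Which policy is appropriate depends on the analysis; dropping rows is simple but can hide the very events an analyst needs to see.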
In [5]
print(df.describe().round(2).to_string())
Out [5]
SECTION 3

Event Type Analysis

In [6]
ec = df['event_type'].value_counts()
print('Event Type Distribution:')
print(ec)
print()
print('Unique event types:', df['event_type'].nunique())
Out [6]
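Raw counts are easier to compare when shown next to their share of the total. As a minimal sketch, `value_counts(normalize=True)` returns proportions instead of counts. The five events below are invented for the example, and `Port Scan` and `Malware` are assumed event type names.

```python
import pandas as pd

# Hypothetical events, for illustration only
events = pd.Series(
    ["Login Failure", "Port Scan", "Login Failure", "Malware", "Login Failure"],
    name="event_type",
)

counts = events.value_counts()                                  # absolute frequency
share = events.value_counts(normalize=True).mul(100).round(1)   # percent of total

summary = pd.DataFrame({"count": counts, "percent": share})
print(summary)
```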
In [7]
ec = df['event_type'].value_counts()
# plt, io and base64 must already be imported; the browser lab appears to preload them
fig, ax = plt.subplots(figsize=(9,4))
ec.plot(kind='bar', ax=ax, color='#1556A4', edgecolor='white', linewidth=0.8)
ax.set_title('Security Event Type Distribution', fontsize=13, fontweight='bold', pad=12)
ax.set_xlabel('Event Type', fontsize=11)
ax.set_ylabel('Count', fontsize=11)
ax.tick_params(axis='x', rotation=30)
ax.yaxis.grid(True, alpha=0.3)
ax.set_axisbelow(True)
# Write the count above each bar
for b in ax.patches:
    ax.text(b.get_x()+b.get_width()/2, b.get_height()+1, int(b.get_height()), ha='center', fontsize=9)
plt.tight_layout()
# Save the figure to an in-memory PNG and print it base64-encoded so the page can show it inline
buf = io.BytesIO()
plt.savefig(buf, format='png', dpi=120, bbox_inches='tight', facecolor='white')
buf.seek(0)
print('CHART:' + base64.b64encode(buf.read()).decode())
plt.close()
Out [7]
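The last lines of the chart cell render the figure to PNG bytes in memory and print them as base64 text. The `CHART:` prefix is presumably how this page's runner recognises an image; that is an assumption about the page, not something Matplotlib requires. The sketch below isolates the pattern with made-up counts, and uses `ax.bar_label` (available in Matplotlib 3.4 and later) as an alternative to looping over `ax.patches`.

```python
import base64
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, so no window is needed
import matplotlib.pyplot as plt

# Hypothetical counts, for illustration only
labels = ["Login Failure", "Port Scan", "Malware"]
values = [12, 7, 3]

fig, ax = plt.subplots(figsize=(6, 3))
bars = ax.bar(labels, values, color="#1556A4", edgecolor="white")
ax.bar_label(bars)  # annotate each bar with its height
ax.set_ylabel("Count")
fig.tight_layout()

# Render to PNG bytes in memory, then base64-encode to get plain text
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=120)
plt.close(fig)
png_bytes = buf.getvalue()
encoded = base64.b64encode(png_bytes).decode()
print("PNG size:", len(png_bytes), "bytes; base64 length:", len(encoded))
```

In a local Jupyter notebook the encoding step is normally unnecessary; calling `plt.show()` (or simply ending the cell) displays the figure.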
SECTION 4

Severity Analysis

In [8]
sc = df['severity'].value_counts()
print('Severity Distribution:')
print(sc)
crit = df[df['severity'] == 'Critical']
print(f'\nCritical events: {len(crit)} ({len(crit)/len(df)*100:.1f}% of total)')
Out [8]
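`value_counts()` sorts by frequency, which is not always the most useful order for severity. One option, sketched below with invented data, is to reindex the counts into rank order. The five level names match the colour map used in the pie chart cell; the sample values themselves are hypothetical.

```python
import pandas as pd

# Hypothetical severity labels, for illustration only
severity = pd.Series(["Low", "High", "Low", "Critical", "Medium", "Low", "High"])

# Rank order from most to least severe
order = ["Critical", "High", "Medium", "Low", "Info"]

# reindex puts counts in rank order and fills levels that never occur with 0
by_rank = severity.value_counts().reindex(order, fill_value=0)
print(by_rank)

elevated = by_rank[["Critical", "High"]].sum()
print(f"Critical or High: {elevated} of {len(severity)} "
      f"({elevated / len(severity) * 100:.1f}%)")
```

Keeping zero-count levels visible makes it obvious when a severity level is absent from the log rather than silently dropped from the table.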
In [9]
sc = df['severity'].value_counts()
# Fixed colour per severity level; any unexpected level falls back to grey
colours = {'Critical': '#C0392B', 'High': '#E8734A', 'Medium': '#F39C12', 'Low': '#27AE60', 'Info': '#2E6DA4'}
fig, ax = plt.subplots(figsize=(6,5))
sc.plot(kind='pie', ax=ax, colors=[colours.get(s, '#999') for s in sc.index], autopct='%1.1f%%', startangle=90, wedgeprops={'edgecolor': 'white', 'linewidth': 2})
ax.set_title('Event Severity Breakdown', fontsize=13, fontweight='bold')
ax.set_ylabel('')
# Save the figure to an in-memory PNG and print it base64-encoded so the page can show it inline
buf = io.BytesIO()
plt.savefig(buf, format='png', dpi=120, bbox_inches='tight', facecolor='white')
buf.seek(0)
print('CHART:' + base64.b64encode(buf.read()).decode())
plt.close()
Out [9]
SECTION 5

Suspicious IP Analysis

In [10]
top = df['source_ip'].value_counts().head(10)
print('Top 10 Source IPs:')
print(top)
Out [10]
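A top-10 list always returns ten IPs, whether or not any of them is unusual. A simple way to go from "most frequent" to "suspiciously frequent" is to compare each count with a baseline. The sketch below uses three times the median count as the cut-off; that multiplier is an arbitrary assumption for illustration, not a recommended detection rule, and the addresses are made-up documentation-range IPs.

```python
import pandas as pd

# Hypothetical source IPs: one noisy address and several quiet ones
ips = pd.Series(
    ["203.0.113.5"] * 10
    + ["198.51.100.7"] * 2
    + ["192.0.2.44"] * 2
    + ["192.0.2.10", "192.0.2.11"],
    name="source_ip",
)

counts = ips.value_counts()

# Hypothetical rule: flag any IP seen more than 3x the median count
threshold = 3 * counts.median()
flagged = counts[counts > threshold]

print(f"Median events per IP: {counts.median()}, threshold: {threshold}")
print(flagged)
```

The median is used rather than the mean because a single very noisy IP would pull the mean upward and could hide itself.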
In [11]
top = df['source_ip'].value_counts().head(10)
fig, ax = plt.subplots(figsize=(9,4))
top.plot(kind='barh', ax=ax, color='#E8734A', edgecolor='white')
ax.set_title('Top 10 Source IPs by Event Volume', fontsize=13, fontweight='bold')
ax.set_xlabel('Number of Events')
ax.invert_yaxis()  # put the busiest IP at the top of the chart
ax.xaxis.grid(True, alpha=0.3)
ax.set_axisbelow(True)
# Save the figure to an in-memory PNG and print it base64-encoded so the page can show it inline
buf = io.BytesIO()
plt.savefig(buf, format='png', dpi=120, bbox_inches='tight', facecolor='white')
buf.seek(0)
print('CHART:' + base64.b64encode(buf.read()).decode())
plt.close()
Out [11]
SECTION 6

Action Taken Analysis

In [12]
print('Action Taken Distribution:')
print(df['action_taken'].value_counts())
bl = df[df['action_taken'] == 'Blocked']
print(f'\nBlocked events: {len(bl)}')
print('\nSample blocked events:')
print(bl[['timestamp', 'source_ip', 'event_type', 'severity']].head(5).to_string())
Out [12]
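Filtering on a single column extends naturally to several conditions. As a hedged sketch, the example below combines two boolean masks with `&` to find high-severity events that were not blocked. The rows are invented, and `Allowed` is an assumed value for `action_taken`.

```python
import pandas as pd

# Hypothetical log rows, for illustration only
log = pd.DataFrame({
    "source_ip": ["203.0.113.5", "198.51.100.7", "192.0.2.44", "203.0.113.5"],
    "severity": ["Critical", "Low", "High", "High"],
    "action_taken": ["Blocked", "Allowed", "Allowed", "Blocked"],
})

# Each condition is a boolean Series; & keeps rows where both are True
is_elevated = log["severity"].isin(["Critical", "High"])
not_blocked = log["action_taken"] != "Blocked"

needs_review = log[is_elevated & not_blocked]
print(needs_review)
```

Naming each mask keeps the filter readable. When conditions are written inline, each one needs its own parentheses because `&` binds more tightly than `==` and `!=`.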
In [13]
cross = pd.crosstab(df['event_type'], df['action_taken'])
print('Event Type x Action Taken:')
print(cross.to_string())
Out [13]
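A crosstab of raw counts can be hard to compare across event types with very different volumes. `pd.crosstab` accepts `normalize='index'`, which turns each row into proportions that sum to 1. The sketch below uses invented rows and assumed category names to show the idea.

```python
import pandas as pd

# Hypothetical log rows, for illustration only
log = pd.DataFrame({
    "event_type": ["Login Failure", "Login Failure", "Port Scan",
                   "Port Scan", "Port Scan", "Malware"],
    "action_taken": ["Blocked", "Allowed", "Blocked",
                     "Blocked", "Allowed", "Blocked"],
})

# normalize="index" divides each row by its row total
rates = (
    pd.crosstab(log["event_type"], log["action_taken"], normalize="index")
    .mul(100)
    .round(1)
)
print(rates.to_string())
```

Read across a row to see how one event type was handled, for example what percentage of port scans were blocked.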
SECTION 7

🎯 Challenge Exercises

🧪
Type your answer in the editable cell and click ▶ Run to test it live. Reveal the model answer only after you try!

🔴 Exercise 1 — Suspicious Login Failures

Find every source IP that appears in 5 or more Login Failure events. Print each IP and its count.

Hint: Filter event_type == 'Login Failure', use value_counts() on source_ip, then keep counts ≥ 5.
In [ex1] — Your Answer
# Write your code here
Out [ex1]
Model Answer
lf = df[df['event_type'] == 'Login Failure']
ic = lf['source_ip'].value_counts()
susp = ic[ic >= 5]
print('IPs with 5+ Login Failure events:')
print(susp)
print(f'\nTotal suspicious IPs: {len(susp)}')
🔍 Real-world: These IPs are potential brute-force attackers. Repeated login failures from one source are a classic indicator of password guessing or credential stuffing.
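The same counting idea works without pandas, which can be handy in a quick script. The sketch below uses `collections.Counter` from the standard library on a made-up list of (IP, event type) pairs; the data and the threshold of 5 simply mirror the exercise and are not drawn from the lab dataset.

```python
from collections import Counter

# Hypothetical (source_ip, event_type) pairs, for illustration only
events = (
    [("203.0.113.5", "Login Failure")] * 6
    + [("198.51.100.7", "Login Failure")] * 2
    + [("203.0.113.5", "Port Scan")] * 3
    + [("192.0.2.44", "Login Failure")] * 5
)

# Count only the Login Failure events, keyed by source IP
failures = Counter(ip for ip, kind in events if kind == "Login Failure")

# Keep IPs at or above the threshold, most frequent first
suspicious = {ip: n for ip, n in failures.most_common() if n >= 5}

for ip, n in suspicious.items():
    print(f"{ip}: {n} login failures")
print("Total suspicious IPs:", len(suspicious))
```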

🟠 Exercise 2 — Critical Event Response Rate

What percentage of Critical severity events resulted in a Blocked action?

Hint: Filter severity == 'Critical', then filter action_taken == 'Blocked'. Divide and multiply by 100.
In [ex2] — Your Answer
# Write your code here
Out [ex2]
Model Answer
crit = df[df['severity'] == 'Critical']
blk  = crit[crit['action_taken'] == 'Blocked']
pct  = len(blk)/len(crit)*100
print(f'Total Critical events : {len(crit)}')
print(f'Blocked among Critical: {len(blk)}')
print(f'\nPercentage Blocked    : {pct:.1f}%')
🔍 Real-world: A 74% block rate is reasonable, but the 26% of critical events that were not blocked need immediate investigation.
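The same calculation can be done for every severity level at once. As a minimal sketch, comparing `action_taken` with `'Blocked'` gives a True/False Series, and the mean of that Series within each group is the block rate. The rows below are invented, and `Allowed` is an assumed action value.

```python
import pandas as pd

# Hypothetical log rows, for illustration only
log = pd.DataFrame({
    "severity": ["Critical", "Critical", "Critical", "Critical",
                 "High", "High", "Low"],
    "action_taken": ["Blocked", "Blocked", "Blocked", "Allowed",
                     "Blocked", "Allowed", "Allowed"],
})

# True where the event was blocked; the mean of booleans is the fraction of True
blocked = log["action_taken"].eq("Blocked")
block_rate = blocked.groupby(log["severity"]).mean().mul(100).round(1)

print(block_rate)
```

A table like this makes it easy to spot a severity level whose block rate is out of line with the others.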

🟢 Exercise 3 — Protocol Distribution Chart

Create a bar chart of events per protocol with different colours and count labels on top.

Hint: Use df['protocol'].value_counts(), then .plot(kind='bar', color=[...]). Loop over ax.patches to add labels.
In [ex3] — Your Answer
# Write your code here
Out [ex3]
Model Answer
pc = df['protocol'].value_counts()
print('Protocol Distribution:')
print(pc)
print()
fig, ax = plt.subplots(figsize=(7,4))
# One colour per protocol; the slice trims the list if there are fewer than three protocols
pc.plot(kind='bar', ax=ax, color=['#27AE60', '#0A9B83', '#FFB347'][:len(pc)], edgecolor='white', width=0.5)
ax.set_title('Events per Protocol', fontsize=13, fontweight='bold')
ax.set_xlabel('Protocol')
ax.set_ylabel('Count')
ax.tick_params(axis='x', rotation=0)
# Write the count above each bar
for b in ax.patches:
    ax.text(b.get_x()+b.get_width()/2, b.get_height()+1, int(b.get_height()), ha='center', fontsize=11, fontweight='bold')
plt.tight_layout()
buf = io.BytesIO()
plt.savefig(buf, format='png', dpi=120, bbox_inches='tight', facecolor='white')
buf.seek(0)
print('CHART:' + base64.b64encode(buf.read()).decode())
plt.close()
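🔍 Real-world: A high ICMP count can indicate ping sweeps or other reconnaissance. TCP usually dominates because most network services, and therefore most attacks against them, run over TCP.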
SUMMARY

✓ What You Learned

df.shape  ·  Rows × columns
df.head()  ·  Preview first 5 rows
df.info()  ·  Data types and nulls
value_counts()  ·  Frequency per category
df[df['col']=='x']  ·  Filter rows by condition
pd.crosstab()  ·  Cross-tabulate two columns
.plot(kind='bar')  ·  Bar chart from a Series
.plot(kind='pie')  ·  Pie chart from a Series
🏆
Real-world application: This is what a SOC analyst does first when triaging a new SIEM feed: understand the shape, find anomalies, and flag high-risk IPs.
Course Progress · 1 / 12 Complete

▶ Next: Chapter 2 — Foundations of Data Analytics


Published on AusJournal · All rights reserved · For educational use only