Data Analytics for Cyber Security · Chapter 01
Introduction to Cybersecurity Analytics
Run Python live in your browser · Dataset embedded · SISTMR Australia · 2026
Dr. Pritam Gajkumar Shah
SISTMR Australia · wsnpgs@gmail.com
Cybersecurity Data Analytics AusJournal 2026
📦
Download Lab Files
The dataset is embedded in this page, so no extra files are needed to run the lab here. Download the lab files only if you want to use Jupyter on your own computer.
ⓘ Just open this HTML file in a browser and click Run.
🎯 Learning Objectives
Load and inspect a SIEM-style security event log
Count and visualise event types and severity levels
Identify suspicious source IPs using frequency analysis
Filter events by action type and severity
Create your first security bar chart
⚡ Live Python in your browser! Wait for ✅ Python ready (bottom-right), then click any ▶ Run button. Charts appear inline.
SECTION 1
Library Check
In [1]
import io, base64                  # used to embed charts in this page as PNG text
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt    # plt is used by every chart cell below
print('pandas    :', pd.__version__)
print('matplotlib:', matplotlib.__version__)
print('numpy     :', np.__version__)
print()
print('All libraries ready!')
SECTION 2
Load & Explore Dataset
In [2]
df = pd.read_csv('ch01_security_event_log.csv')
print('Shape:', df.shape[0], 'rows x', df.shape[1], 'columns')
print()
print('Columns:', list(df.columns))
print()
print(df.head().to_string())
In [3]
import io
# Capture the df.info() report in a text buffer via buf= instead of swapping
# sys.stdout (restoring sys.__stdout__ breaks printing inside Jupyter)
info_buf = io.StringIO()
df.info(buf=info_buf)
print(info_buf.getvalue())
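`df.info()` reports each column's dtype, and a timestamp column read from a CSV is typically stored as text, so time-based operations will not work on it until it is converted. A minimal sketch, assuming the `timestamp` values are strings that `pd.to_datetime` can parse; the three rows are made-up toy data, not the lab dataset:

```python
import pandas as pd

# Hypothetical toy rows standing in for the lab log (not the real dataset)
df = pd.DataFrame({
    'timestamp': ['2026-01-05 08:15:00', '2026-01-05 08:16:30', '2026-01-05 09:02:10'],
    'event_type': ['Login Failure', 'Login Failure', 'Port Scan'],
})
print(df.dtypes)  # timestamp is text (object or str, depending on pandas version)

# Convert text to datetime64; errors='coerce' turns unparseable values into NaT
df['timestamp'] = pd.to_datetime(df['timestamp'], errors='coerce')
print(df.dtypes)

# Datetime columns unlock the .dt accessor, e.g. events per hour of day
print(df['timestamp'].dt.hour.value_counts().sort_index())
```

With `read_csv`, passing `parse_dates=['timestamp']` performs the same conversion at load time.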
In [4]
m = df.isnull().sum()          # missing-value count per column
if m.sum() > 0:
    print('Missing values found:')
    print(m[m > 0])
else:
    print('No missing values - clean dataset!')
In [5]
print(df.describe().round(2).to_string())
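By default `describe()` summarises only numeric columns, so text columns such as `event_type`, `severity` and `source_ip` do not appear in its output. A sketch of how to summarise them as well, on made-up toy rows; the `bytes` column is hypothetical and may not exist in the lab file:

```python
import pandas as pd

# Hypothetical toy log; 'bytes' is an illustrative numeric column
df = pd.DataFrame({
    'source_ip': ['10.0.0.5', '10.0.0.5', '192.168.1.9', '10.0.0.5'],
    'severity': ['High', 'Low', 'High', 'Critical'],
    'bytes': [120, 560, 80, 1024],
})

# Default: numeric columns only (count, mean, std, min, quartiles, max)
print(df.describe().round(2).to_string())

# Non-numeric columns: count, unique (distinct values), top (most common), freq (its count)
summary = df.describe(exclude=['number'])
print(summary.to_string())
```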
SECTION 3
Event Type Analysis
In [6]
ec = df['event_type'].value_counts()
print('Event Type Distribution:')
print(ec)
print()
print('Unique event types:', df['event_type'].nunique())
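`value_counts()` returns absolute counts. Passing `normalize=True` returns each category's share instead, which is easier to compare across logs of different sizes. A short sketch on made-up event types (not the lab data):

```python
import pandas as pd

# Hypothetical toy event types (not the lab dataset)
events = pd.Series(['Login Failure', 'Port Scan', 'Login Failure',
                    'Malware Alert', 'Login Failure', 'Port Scan'], name='event_type')

counts = events.value_counts()                                  # absolute counts
shares = events.value_counts(normalize=True).mul(100).round(1)  # % of all events

summary = pd.DataFrame({'count': counts, 'percent': shares})
print(summary)
print('Unique event types:', events.nunique())
```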
In [7]
ec = df['event_type'].value_counts()
fig, ax = plt.subplots(figsize=(9, 4))
ec.plot(kind='bar', ax=ax, color='#1556A4', edgecolor='white', linewidth=0.8)
ax.set_title('Security Event Type Distribution', fontsize=13, fontweight='bold', pad=12)
ax.set_xlabel('Event Type', fontsize=11)
ax.set_ylabel('Count', fontsize=11)
ax.tick_params(axis='x', rotation=30)
ax.yaxis.grid(True, alpha=0.3)
ax.set_axisbelow(True)                 # draw grid lines behind the bars
# Write each bar's count just above it
for b in ax.patches:
    ax.text(b.get_x() + b.get_width() / 2, b.get_height() + 1, int(b.get_height()), ha='center', fontsize=9)
plt.tight_layout()
# Encode the figure as base64 PNG text; this page's runner displays printed 'CHART:' lines
# (in your own Jupyter, replace the next five lines with plt.show())
buf = io.BytesIO()
plt.savefig(buf, format='png', dpi=120, bbox_inches='tight', facecolor='white')
buf.seek(0)
print('CHART:' + base64.b64encode(buf.read()).decode())
plt.close()
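The save, encode and print steps at the end of this cell are repeated in every chart cell. A sketch of how they could be factored into one function, assuming (as the cells suggest) that the page's in-browser runner renders any printed line starting with `CHART:` as a PNG image; the name `show_chart` is hypothetical:

```python
import base64
import io

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to memory, no window
import matplotlib.pyplot as plt


def show_chart(fig):
    """Hypothetical helper: encode a figure as base64 PNG, print it with the
    'CHART:' prefix, close the figure, and return the encoded text."""
    buf = io.BytesIO()
    fig.savefig(buf, format='png', dpi=120, bbox_inches='tight', facecolor='white')
    plt.close(fig)  # free the figure's memory
    encoded = base64.b64encode(buf.getvalue()).decode()
    print('CHART:' + encoded)
    return encoded


fig, ax = plt.subplots(figsize=(4, 3))
ax.bar(['A', 'B'], [3, 5])
png_text = show_chart(fig)
```

Each chart cell could then end with `show_chart(fig)` instead of the five-line block.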
SECTION 4
Severity Analysis
In [8]
sc = df['severity'].value_counts()
print('Severity Distribution:')
print(sc)
crit = df[df['severity'] == 'Critical']
print(f'\nCritical events: {len(crit)} ({len(crit) / len(df) * 100:.1f}% of total)')
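`value_counts()` orders levels by frequency, not by severity, and omits any level with zero events. A sketch that fixes the order, assuming the five-level scale used in the colour map of the pie-chart cell (Critical, High, Medium, Low, Info); the severities below are toy values:

```python
import pandas as pd

# Assumed severity scale, most to least severe
ORDER = ['Critical', 'High', 'Medium', 'Low', 'Info']

# Hypothetical toy severities (not the lab dataset)
sev = pd.Series(['High', 'Low', 'Critical', 'High', 'Info', 'High', 'Low', 'Critical'])

# reindex puts levels in severity order and shows missing levels as 0
counts = sev.value_counts().reindex(ORDER, fill_value=0)
percent = (counts / counts.sum() * 100).round(1)
print(pd.DataFrame({'count': counts, 'percent': percent}))
```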
In [9]
sc = df['severity'].value_counts()
colours = {'Critical': '#C0392B', 'High': '#E8734A', 'Medium': '#F39C12', 'Low': '#27AE60', 'Info': '#2E6DA4'}
fig, ax = plt.subplots(figsize=(6, 5))
# One colour per slice (grey for any unexpected level); autopct labels each slice with its % to 1 decimal
sc.plot(kind='pie', ax=ax, colors=[colours.get(s, '#999999') for s in sc.index],
        autopct='%1.1f%%', startangle=90, wedgeprops={'edgecolor': 'white', 'linewidth': 2})
ax.set_title('Event Severity Breakdown', fontsize=13, fontweight='bold')
ax.set_ylabel('')                      # hide the default y-axis label
buf = io.BytesIO()
plt.savefig(buf, format='png', dpi=120, bbox_inches='tight', facecolor='white')
buf.seek(0)
print('CHART:' + base64.b64encode(buf.read()).decode())
plt.close()
SECTION 5
Suspicious IP Analysis
In [10]
top = df['source_ip'].value_counts().head(10)
print('Top 10 Source IPs:')
print(top)
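A top-10 list shows the busiest IPs but does not say whether any of them is unusually busy. One simple frequency rule, used here as an assumption rather than a standard, flags IPs whose event count is more than two standard deviations above the mean count per IP. The toy data below is made up. With only a handful of IPs, a single extreme IP inflates the standard deviation and can hide itself, so median-based thresholds are often more robust.

```python
import pandas as pd

# Hypothetical toy data: one very busy IP plus twenty IPs with 3 events each
ips = pd.Series(['10.0.0.99'] * 40 + [f'10.0.1.{i}' for i in range(20) for _ in range(3)])

counts = ips.value_counts()

# Assumed rule: flag counts more than 2 standard deviations above the mean
threshold = counts.mean() + 2 * counts.std()
flagged = counts[counts > threshold]

print(f'Mean events per IP: {counts.mean():.1f}, threshold: {threshold:.1f}')
print(flagged)
```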
In [11]
top = df['source_ip'].value_counts().head(10)
fig, ax = plt.subplots(figsize=(9, 4))
top.plot(kind='barh', ax=ax, color='#E8734A', edgecolor='white')
ax.set_title('Top 10 Source IPs by Event Volume', fontsize=13, fontweight='bold')
ax.set_xlabel('Number of Events')
ax.invert_yaxis()                      # put the busiest IP at the top
ax.xaxis.grid(True, alpha=0.3)
ax.set_axisbelow(True)
buf = io.BytesIO()
plt.savefig(buf, format='png', dpi=120, bbox_inches='tight', facecolor='white')
buf.seek(0)
print('CHART:' + base64.b64encode(buf.read()).decode())
plt.close()
SECTION 6
Action Taken Analysis
In [12]
print('Action Taken Distribution:')
print(df['action_taken'].value_counts())
bl = df[df['action_taken'] == 'Blocked']
print(f'\nBlocked events: {len(bl)}')
print('\nSample blocked events:')
print(bl[['timestamp', 'source_ip', 'event_type', 'severity']].head(5).to_string())
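The cell above filters on a single condition. Filtering by action type and severity at once needs `&` (and) or `|` (or), with each condition in its own parentheses, because these operators bind more tightly than `==` in Python. A sketch on toy rows; the `'Allowed'` action value is an assumption, since only `'Blocked'` is confirmed by this chapter:

```python
import pandas as pd

# Hypothetical toy log (not the lab dataset)
df = pd.DataFrame({
    'source_ip': ['10.0.0.5', '10.0.0.7', '10.0.0.5', '192.168.1.9'],
    'severity': ['Critical', 'Critical', 'Low', 'High'],
    'action_taken': ['Blocked', 'Allowed', 'Allowed', 'Blocked'],
})

# Critical events that were NOT blocked: both conditions must hold
unblocked_critical = df[(df['severity'] == 'Critical') & (df['action_taken'] != 'Blocked')]
print(unblocked_critical)

# isin() keeps rows whose value is in a list
high_or_critical = df[df['severity'].isin(['Critical', 'High'])]
print(len(high_or_critical), 'High/Critical events')
```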
In [13]
cross = pd.crosstab(df['event_type'], df['action_taken'])
print('Event Type x Action Taken:')
print(cross.to_string())
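Raw counts in a cross-tabulation are hard to compare when event types occur at very different rates. `pd.crosstab` can normalise each row so that it shows the percentage of each event type handled by each action, and `margins=True` adds totals. A sketch on toy rows; the `'Allowed'` and `'Alerted'` action values are hypothetical:

```python
import pandas as pd

# Hypothetical toy log (not the lab dataset)
df = pd.DataFrame({
    'event_type': ['Login Failure', 'Login Failure', 'Login Failure', 'Port Scan', 'Port Scan'],
    'action_taken': ['Blocked', 'Blocked', 'Allowed', 'Blocked', 'Alerted'],
})

# normalize='index' divides each row by its total, so each row sums to 1 (100%)
pct = pd.crosstab(df['event_type'], df['action_taken'], normalize='index').mul(100).round(1)
print(pct.to_string())

# margins=True appends an 'All' row and column with totals
print(pd.crosstab(df['event_type'], df['action_taken'], margins=True).to_string())
```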
SECTION 7
🎯 Challenge Exercises
🧪 Type your answer in the editable cell and click ▶ Run to test it live. Reveal the model answer only after you try!
🔴 Exercise 1 — Suspicious Login Failures
Find every source IP that generated 5 or more Login Failure events. Print each IP and its count.
Hint Filter event_type == 'Login Failure', use value_counts() on source_ip, filter count ≥ 5.
In [ex1] — Your Answer
# Write your code here
Model Answer
lf = df[df['event_type'] == 'Login Failure']   # keep only failed logins
ic = lf['source_ip'].value_counts()            # failures per source IP
susp = ic[ic >= 5]                             # IPs with 5 or more failures
print('IPs with 5+ Login Failure events:')
print(susp)
print(f'\nTotal suspicious IPs: {len(susp)}')
🔍 Real-world: These IPs are potential brute-force attackers. Repeated login failures from a single source are a classic indicator of password guessing or credential stuffing.
🟠 Exercise 2 — Critical Event Response Rate
What percentage of Critical severity events resulted in a Blocked action?
Hint Filter severity == 'Critical', then filter action_taken == 'Blocked'. Divide the blocked count by the Critical count and multiply by 100.
In [ex2] — Your Answer
# Write your code here
Model Answer
crit = df[df['severity'] == 'Critical']
blk = crit[crit['action_taken'] == 'Blocked']
pct = len(blk) / len(crit) * 100
print(f'Total Critical events : {len(crit)}')
print(f'Blocked among Critical: {len(blk)}')
print(f'\nPercentage Blocked    : {pct:.1f}%')
🔍 Real-world: A 74% block rate is reasonable, but the remaining 26% of Critical events were not blocked and need immediate investigation.
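Exercise 2 computes the block rate for one severity level. A sketch that computes it for every level in one expression, using toy rows: comparing `action_taken` with `'Blocked'` gives a True/False Series, and the mean of True/False values within each severity group is the fraction blocked. Unlike `len(blk)/len(crit)`, this cannot raise `ZeroDivisionError`, because only severities that actually occur form groups. `'Allowed'` is a hypothetical action value.

```python
import pandas as pd

# Hypothetical toy log (not the lab dataset)
df = pd.DataFrame({
    'severity': ['Critical', 'Critical', 'Critical', 'Critical', 'High', 'High'],
    'action_taken': ['Blocked', 'Blocked', 'Blocked', 'Allowed', 'Blocked', 'Allowed'],
})

# True (1) when blocked, False (0) otherwise; the group mean is the block rate
block_rate = (
    df['action_taken'].eq('Blocked')
    .groupby(df['severity'])
    .mean()
    .mul(100)
    .round(1)
)
print(block_rate)
```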
🟢 Exercise 3 — Protocol Distribution Chart
Create a bar chart of events per protocol, with a different colour for each bar and a count label on top of each bar.
Hint Use df['protocol'].value_counts(), then .plot(kind='bar', color=[...]). Loop ax.patches to add labels.
In [ex3] — Your Answer
# Write your code here
Model Answer
pc = df['protocol'].value_counts()
print('Protocol Distribution:')
print(pc)
print()
fig, ax = plt.subplots(figsize=(7, 4))
# One colour per bar (list trimmed to the number of protocols)
pc.plot(kind='bar', ax=ax, color=['#27AE60', '#0A9B83', '#FFB347'][:len(pc)], edgecolor='white', width=0.5)
ax.set_title('Events per Protocol', fontsize=13, fontweight='bold')
ax.set_xlabel('Protocol')
ax.set_ylabel('Count')
ax.tick_params(axis='x', rotation=0)
# Write each bar's count just above it
for b in ax.patches:
    ax.text(b.get_x() + b.get_width() / 2, b.get_height() + 1, int(b.get_height()), ha='center', fontsize=11, fontweight='bold')
plt.tight_layout()
buf = io.BytesIO()
plt.savefig(buf, format='png', dpi=120, bbox_inches='tight', facecolor='white')
buf.seek(0)
print('CHART:' + base64.b64encode(buf.read()).decode())
plt.close()
🔍 Real-world: A high ICMP count can indicate ping sweeps or other reconnaissance. TCP usually dominates because most services, and therefore most attacks, run over TCP.
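matplotlib 3.4 and later provide `Axes.bar_label`, which can replace the manual `ax.patches` loop in the model answer. A sketch with made-up protocol counts: `padding` is measured in points, so the gap above each bar stays the same whatever the y-axis range, whereas the `+ 1` offset in the loop is in data units and can look too large or too small depending on the counts.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; in Jupyter use plt.show() instead of closing
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy protocol counts (not the lab dataset)
pc = pd.Series({'TCP': 120, 'UDP': 45, 'ICMP': 18})

fig, ax = plt.subplots(figsize=(7, 4))
pc.plot(kind='bar', ax=ax, color=['#27AE60', '#0A9B83', '#FFB347'], edgecolor='white', width=0.5)
ax.tick_params(axis='x', rotation=0)

# Label every bar in the first (only) bar container with its height
labels = ax.bar_label(ax.containers[0], padding=3, fontsize=11, fontweight='bold')
print([t.get_text() for t in labels])
plt.close(fig)
```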
SUMMARY
✓ What You Learned
df.head() · Preview the first 5 rows
df.info() · Column data types and non-null counts
value_counts() · Frequency of each category
df[df['col'] == 'x'] · Filter rows by a condition
pd.crosstab() · Cross-tabulate two columns
.plot(kind='bar') · Bar chart from a Series
.plot(kind='pie') · Pie chart from a Series
🏆 Real-world application: These are the first steps a SOC analyst takes when triaging a new SIEM feed: understand its shape, look for anomalies, and flag high-risk IPs.
Course Progress 1 / 12 Complete
▶ Next: Chapter 2 — Foundations of Data Analytics