In today’s cyber threat landscape, data exfiltration is one of the most damaging tactics employed by threat actors, and it is often carried out low and slow. Attackers extract sensitive information over long periods and mimic legitimate network behaviour to avoid notice. As defenders, our goal is to detect such activity before significant damage is done.
What makes exfiltration particularly dangerous is its stealth. Adversaries move data out in small amounts over extended periods, imitate legitimate traffic patterns, leverage trusted services, and exploit weak monitoring practices, all of which make traditional detection methods less effective.
The consequences of a successful data exfiltration incident can be devastating. Organisations face risks such as:
- Severe regulatory penalties and legal action (especially under frameworks like GDPR or HIPAA),
- Long-term reputational damage and erosion of public trust,
- Competitive disadvantage due to the loss of proprietary data, and
- Operational disruption and the high cost of incident response.
To mitigate these threats, we have developed machine learning (ML)-based models that analyse network traffic logs and identify unusual data movement patterns in near real time. In this post, we present three AI/ML-driven use cases designed to detect potential exfiltration attempts, using a combination of session-based and time-windowed traffic analysis.
1. Detecting Unusual Data Transfers Between Host Pairs Within Active Sessions
This use case identifies abnormal volumes of data transferred between specific internal and external IP pairs within a single session. By establishing behavioural baselines for communication between known host pairs, we can flag sessions where the data transferred significantly exceeds typical values for that pair.
Such detections can highlight:
- Sudden spikes in session size from internal workstations,
- Lateral movement followed by data offloading,
- One-off sessions to previously unseen external IPs with large outbound payloads.
This approach helps detect targeted, session-based exfiltration, often used by advanced persistent threats (APTs) or insiders.
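The per-pair baseline idea can be sketched in a few lines of Python. This is a minimal illustration, not the production model: the record fields (`src`, `dst`, `bytes_out`), the use of a median/MAD modified z-score, the 3.5 cutoff, and the fixed byte limit for pairs with little history are all assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import median


def robust_z(value, history):
    """Modified z-score of `value` relative to `history` (median and MAD)."""
    med = median(history)
    mad = median([abs(x - med) for x in history])
    if mad == 0:
        # Past sessions were all the same size; treat any increase as extreme.
        return float("inf") if value > med else 0.0
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return 0.6745 * (value - med) / mad


def flag_sessions(baseline, new_sessions, z_limit=3.5,
                  min_history=5, unseen_limit=5_000_000):
    """Return (session, reason) pairs for sessions that look unusual."""
    history = defaultdict(list)
    for s in baseline:
        history[(s["src"], s["dst"])].append(s["bytes_out"])

    alerts = []
    for s in new_sessions:
        past = history.get((s["src"], s["dst"]), [])
        if len(past) < min_history:
            # Too little history for a baseline: fall back to a fixed limit.
            if s["bytes_out"] > unseen_limit:
                alerts.append((s, "large_transfer_to_unseen_pair"))
        elif robust_z(s["bytes_out"], past) > z_limit:
            alerts.append((s, "exceeds_pair_baseline"))
    return alerts


# Hypothetical data: six normal sessions for one pair, then three new sessions.
pair = {"src": "10.0.0.5", "dst": "203.0.113.10"}
baseline = [{**pair, "bytes_out": b} for b in (1000, 1200, 900, 1100, 1050, 950)]
new_sessions = [
    {**pair, "bytes_out": 1100},                                       # typical
    {**pair, "bytes_out": 50_000},                                     # spike
    {"src": "10.0.0.5", "dst": "198.51.100.7", "bytes_out": 10_000_000},  # new pair
]
alerts = flag_sessions(baseline, new_sessions)
```

The median and MAD are used here instead of the mean and standard deviation because a few very large legitimate sessions would otherwise inflate the baseline and hide later spikes.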
2. Monitoring Data Movement to External Organisations Across Sessions
Adversaries often use infrastructure hosted by cloud providers, VPS (Virtual Private Server) services, or even compromised legitimate domains to blend in. This use case aims to track data being sent to broader destination categories, such as an Autonomous System Number (ASN) or hosting organisation, instead of single IPs.
By doing this, we enhance our ability to:
- Identify distributed exfiltration to multiple IPs within the same ASN (e.g., Amazon, Microsoft),
- Catch abuse of whitelisted infrastructure that is generally considered safe,
- Monitor persistent low-volume data flows to unknown or risky organisations.
This organisational-level visibility gives security teams a higher-level view of how data might be leaking out, even if no single IP raises a red flag.
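To illustrate why grouping by organisation matters, the sketch below rolls sessions up from destination IP to ASN and sums outbound bytes per internal host. The prefix table, ASN labels, and byte limit are hypothetical (documentation address ranges and reserved AS numbers); a real deployment would use an ASN or GeoIP database for the lookup.

```python
import ipaddress
from collections import defaultdict

# Hypothetical prefix-to-organisation table for the example.
ASN_TABLE = [
    (ipaddress.ip_network("203.0.113.0/24"), "AS64500 ExampleCloud"),
    (ipaddress.ip_network("198.51.100.0/24"), "AS64501 ExampleVPS"),
]


def lookup_org(ip):
    """Map a destination IP to its organisation label, or 'unknown'."""
    addr = ipaddress.ip_address(ip)
    for network, org in ASN_TABLE:
        if addr in network:
            return org
    return "unknown"


def aggregate_by_org(sessions):
    """Sum outbound bytes per (internal host, destination organisation)."""
    totals = defaultdict(lambda: {"bytes_out": 0, "dst_ips": set(), "sessions": 0})
    for s in sessions:
        entry = totals[(s["src"], lookup_org(s["dst"]))]
        entry["bytes_out"] += s["bytes_out"]
        entry["dst_ips"].add(s["dst"])
        entry["sessions"] += 1
    return totals


def flag_orgs(totals, byte_limit):
    """Return the (host, organisation) keys whose total exceeds byte_limit."""
    return sorted(key for key, t in totals.items() if t["bytes_out"] > byte_limit)


# Three 40 MB sessions to three different IPs in the same ASN.
sessions = [
    {"src": "10.0.0.5", "dst": "203.0.113.10", "bytes_out": 40_000_000},
    {"src": "10.0.0.5", "dst": "203.0.113.11", "bytes_out": 40_000_000},
    {"src": "10.0.0.5", "dst": "203.0.113.12", "bytes_out": 40_000_000},
    {"src": "10.0.0.5", "dst": "198.51.100.7", "bytes_out": 1_000_000},
]
totals = aggregate_by_org(sessions)
flagged = flag_orgs(totals, byte_limit=100_000_000)
```

In this example no single destination IP crosses the 100 MB limit, but the organisation-level total of 120 MB does, which is the distributed pattern this use case targets.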
3. Monitoring Sustained Outbound Transfers to Reveal Potential P2P Communications
Peer-to-peer (P2P) communication channels are often used by malware for command-and-control (C2) operations, file synchronisation, or hidden data exfiltration. This use case focuses on identifying internal hosts with consistent outbound traffic behaviour that doesn’t match their usual activity patterns.
These behaviours may include:
- Regular communication intervals to multiple unknown IPs,
- Small but steady amounts of data leaving the network over time,
- Use of non-standard ports or protocols associated with P2P tools.
By monitoring hourly aggregated traffic volumes and analysing long-term trends, we can detect subtle, sustained exfiltration patterns often associated with P2P-based malware families.
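A simplified sketch of the hourly aggregation step follows. It flags hosts that send outbound data in almost every hour of a window at a steady rate, measured by the coefficient of variation. The field names, the 24-hour window, and both thresholds are assumptions for illustration, and this version does not compare a host against its own historical profile, which a fuller model would do.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev


def hourly_totals(records):
    """Sum outbound bytes per source host per clock hour."""
    totals = defaultdict(lambda: defaultdict(int))
    for r in records:
        hour = r["ts"].replace(minute=0, second=0, microsecond=0)
        totals[r["src"]][hour] += r["bytes_out"]
    return totals


def sustained_hosts(totals, window_hours=24, min_active_ratio=0.9, max_cv=0.5):
    """Return hosts active in most hours of the window with steady volume.

    Assumes the caller has already limited `records` to the window.
    """
    flagged = []
    for src, by_hour in totals.items():
        volumes = list(by_hour.values())
        if len(volumes) / window_hours < min_active_ratio:
            continue  # bursty or occasional traffic, not sustained
        avg = mean(volumes)
        cv = pstdev(volumes) / avg if avg else 0.0
        if cv <= max_cv:
            flagged.append(src)
    return sorted(flagged)


# Hypothetical traffic: one host trickles 2 KB every hour, another bursts twice.
start = datetime(2024, 1, 1)
records = [
    {"ts": start + timedelta(hours=h, minutes=15), "src": "10.0.0.7", "bytes_out": 2000}
    for h in range(24)
]
records += [
    {"ts": start + timedelta(hours=3), "src": "10.0.0.9", "bytes_out": 500_000},
    {"ts": start + timedelta(hours=4), "src": "10.0.0.9", "bytes_out": 500_000},
]
flagged = sustained_hosts(hourly_totals(records))
```

The second host moves far more data in total but is not flagged here, because this check looks for persistence rather than volume; large bursts are the concern of the session-level use case.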
ML Techniques and Tools Used
To support these use cases, our detection framework incorporates:
- Anomaly Detection Models: To learn typical traffic patterns and highlight statistically significant deviations.
- Statistical Profiling: To calculate baselines and detect outliers in session size and transfer frequency.
- Contextual Enrichment: To tag IPs with ASN, reputation, and cloud provider metadata, improving the quality of detections.
These models are embedded into our log analysis pipeline, which ingests traffic data, applies transformations, and triggers alerts when suspicious patterns are observed.
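The ingest, transform, enrich, and alert flow described above could be wired together roughly as follows. This is only a sketch under stated assumptions: the comma-separated log format (`timestamp,src,dst,bytes_out`), the dictionary used for enrichment, and the single fixed-limit detector are hypothetical stand-ins for the real parsers, enrichment sources, and models.

```python
import csv
from datetime import datetime

FIELDS = ["ts", "src", "dst", "bytes_out"]  # assumed log column order


def ingest(lines):
    """Parse raw comma-separated log lines into dictionaries of strings."""
    yield from csv.DictReader(lines, fieldnames=FIELDS)


def transform(rows):
    """Convert string fields into typed values."""
    for row in rows:
        yield {
            "ts": datetime.fromisoformat(row["ts"]),
            "src": row["src"],
            "dst": row["dst"],
            "bytes_out": int(row["bytes_out"]),
        }


def enrich(records, org_lookup):
    """Tag each record with the destination organisation, if known."""
    for r in records:
        yield {**r, "org": org_lookup.get(r["dst"], "unknown")}


def detect(records, byte_limit):
    """Placeholder detector: emit an alert for any oversized transfer."""
    for r in records:
        if r["bytes_out"] > byte_limit:
            yield {"reason": "large_outbound", **r}


def run_pipeline(lines, org_lookup, byte_limit):
    """Chain the stages lazily and collect the resulting alerts."""
    return list(detect(enrich(transform(ingest(lines)), org_lookup), byte_limit))


# Hypothetical input and enrichment data.
log_lines = [
    "2024-01-01T10:00:00,10.0.0.5,203.0.113.10,1200",
    "2024-01-01T10:05:00,10.0.0.5,203.0.113.11,9000000",
]
org_lookup = {"203.0.113.11": "AS64500 ExampleCloud"}
alerts = run_pipeline(log_lines, org_lookup, byte_limit=5_000_000)
```

Writing each stage as a generator keeps the stages independent, so a statistical or anomaly-detection model can replace the placeholder `detect` step without changing the rest of the chain.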