Splunk Behavioral Detection

Below are tangible, actionable examples of how to implement a behavioral/baseline-based detection strategy for identifying potential brute force attempts on VPN portals in Splunk. These approaches often leverage Splunk Enterprise Security (ES) features such as correlation searches, Risk-Based Alerting (RBA), and sometimes the Splunk Machine Learning Toolkit (MLTK) for more sophisticated analysis.


1. Behavior/Baseline Example: Per-User Historical Baseline

Goal: Detect when a specific user’s authentication failures exceed their normal historical pattern.

A. Building a Baseline

  1. Aggregate Historical Data

    • Create a scheduled search (or a summary index) to calculate the average (and possibly standard deviation) of daily or hourly authentication failures for each user over a historical period (e.g., 30 days).
    • Example search to populate a summary index once a day:
      | tstats count as fail_count
        where index=<your_vpn_index> sourcetype=<vpn_sourcetype> action="failed"
        by _time span=1d user
      | stats avg(fail_count) as daily_avg stdev(fail_count) as daily_stdev by user
      | collect index=baseline_vpn_fails source="daily_vpn_fail_baseline"
        marker="baseline_data"
      
    • This stores daily average failures (daily_avg) and standard deviation (daily_stdev) per user in a separate index (e.g., baseline_vpn_fails). Note that tstats only works against indexed fields or accelerated data models; if action and user are search-time fields in your environment, use a standard search with | bin _time span=1d | stats count by _time user instead.
  2. Refine the Baseline
    • Depending on environment size and activity, you might want to gather more data points (e.g., multiple weeks or months).
    • You could also store aggregated metrics by hour of day (peak vs. off hours) if you notice time-of-day usage patterns.
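    • For example, here is a minimal sketch of an hour-of-day baseline (same placeholder index and sourcetype as above; the hourly_vpn_fail_baseline source name is only an illustration):
      | tstats count as fail_count
        where index=<your_vpn_index> sourcetype=<vpn_sourcetype> action="failed"
        by _time span=1h user
      | eval hour_of_day=strftime(_time, "%H")
      | stats avg(fail_count) as hourly_avg stdev(fail_count) as hourly_stdev by user hour_of_day
      | collect index=baseline_vpn_fails source="hourly_vpn_fail_baseline"
    • A correlation search can then join on both user and hour_of_day so that off-hours activity is judged against an off-hours baseline.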

B. Comparing Real-Time Data Against Baseline

  1. Create a Correlation Search (or Scheduled Search) in Splunk ES:
    • Runs periodically (e.g., every 5 minutes) to check real-time failures against the user’s baseline.
      | tstats count as fail_count
        where index=<your_vpn_index> sourcetype=<vpn_sourcetype> action="failed"
        by _time span=5m user
      | join user [
          search index=baseline_vpn_fails source="daily_vpn_fail_baseline"
          | fields user daily_avg daily_stdev
        ]
      | eval alert_threshold = daily_avg + (3 * daily_stdev)
      | where fail_count > alert_threshold
      
  2. Set Alert or Create a Notable
    • If the user’s 5-minute failure count is greater than (average + 3 × std dev), you can create an alert or a notable event. Because the baseline is computed per day, a 5-minute window will only exceed it during a sharp burst; baseline at a finer span (or scale the threshold down) if you want more sensitivity.
  3. Risk-Based Alerting (RBA)
    • Instead of immediately triggering a high-priority alert, you can assign a “risk score” to the user if they exceed their baseline. If that risk score pushes them above a certain threshold within a larger time window, create a high-priority notable in ES.
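    • As a rough sketch of that idea, the correlation search above can be extended to write a risk event directly. In practice the Risk Analysis adaptive response action on the correlation search does this for you; the direct collect into the ES risk index and the 20-point score shown here are only illustrations:
      | tstats count as fail_count
        where index=<your_vpn_index> sourcetype=<vpn_sourcetype> action="failed"
        by _time span=5m user
      | join user [
          search index=baseline_vpn_fails source="daily_vpn_fail_baseline"
          | fields user daily_avg daily_stdev
        ]
      | where fail_count > daily_avg + (3 * daily_stdev)
      | eval risk_object=user, risk_object_type="user", risk_score=20,
             risk_message="VPN authentication failures above per-user baseline"
      | collect index=risk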

2. IP Range Baseline (Geo-Based or Network Segment)

Goal: Detect anomalies for specific IP ranges (e.g., a known office network or typical user geo-locations).

  1. Identify Normal Access Patterns
    • For each distinct src_ip (or netblock like 10.0.0.0/8), gather historical metrics for the number of authentication failures and successes (a per-netblock sketch appears in step 4 below).
    • If you use geo-IP data, you can baseline how many attempts typically come from certain countries or regions.
  2. Baseline Search
    • Similar to the user-based approach, create a summary of average failures from each IP range or geo-location.
      | tstats count as fail_count
        where index=<your_vpn_index> sourcetype=<vpn_sourcetype> action="failed"
        by _time span=1d src_ip
      | iplocation src_ip
      | eval country=Country
      | stats avg(fail_count) as daily_avg stdev(fail_count) as daily_stdev by country
      | collect index=baseline_geo_fails source="daily_geo_fail_baseline"
      
  3. Real-Time Comparison
    • In a subsequent correlation search, if an IP from a normally quiet location suddenly has a spike in failures, generate a notable.
      | tstats count as fail_count
        where index=<your_vpn_index> sourcetype=<vpn_sourcetype> action="failed"
        by _time span=5m src_ip
      | iplocation src_ip
      | eval country=Country
      | join country [
          search index=baseline_geo_fails source="daily_geo_fail_baseline"
          | fields country daily_avg daily_stdev
        ]
      | eval alert_threshold = daily_avg + (3 * daily_stdev)
      | where fail_count > alert_threshold
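
  4. Netblock Variant (Sketch)
    • For the netblock approach mentioned in step 1, one simple sketch is to truncate src_ip to a /24 before baselining. The rex pattern assumes IPv4, and the baseline_netblock_fails index name is only an example:
      | tstats count as fail_count
        where index=<your_vpn_index> sourcetype=<vpn_sourcetype> action="failed"
        by _time span=1d src_ip
      | rex field=src_ip "^(?<netblock>\d+\.\d+\.\d+)\."
      | stats avg(fail_count) as daily_avg stdev(fail_count) as daily_stdev by netblock
      | collect index=baseline_netblock_fails source="daily_netblock_fail_baseline"
    • The real-time comparison then mirrors the geo search above, joining on netblock instead of country.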
      

3. Machine Learning Toolkit Example

Goal: Use a fully data-driven approach to detect outliers in authentication events without manually configuring thresholds.

  1. Enable the Machine Learning Toolkit (MLTK)
    • Install MLTK from Splunkbase if it isn’t already.
  2. Build a MLTK Experiment
    • In Splunk Web, go to Machine Learning Toolkit > Experiments.
    • Use the “Detect Numeric Outliers” or “Anomalous Behavior” experiment type.
    • Feed it your historical authentication failures, grouped by user/IP over time.
  3. Train Your Model
    • Select your fields (e.g., user, fail_count) and timeframe to build the training set.
    • The MLTK will generate a model that flags unusual spikes in fail_count per user (a DensityFunction example appears in step 6 below).
  4. Apply the Model in Real-Time

    • After training, you can schedule a search that uses the apply command to run the model on new data and look for outliers.
    • Example:
      | tstats count as fail_count
        where index=<your_vpn_index> sourcetype=<vpn_sourcetype> action="failed"
        by _time span=5m user
      | apply your_ML_model
      | where isOutlier == 1
      
    • If isOutlier = 1, it means the count of failures for that user is anomalous relative to historical patterns.
  5. Generate Alerts / Notables
    • Convert this search into an alert or correlation search in Splunk ES.
    • Add risk-based logic: for each outlier event, add X points of risk to that user. If the user’s risk exceeds a certain threshold, generate a notable event.
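  6. Example: Fit and Apply with DensityFunction
    • As a concrete but hedged illustration of steps 3–4, the MLTK DensityFunction algorithm can be trained on the same counts. The vpn_fail_density model name, 30-day training window, and 0.01 threshold are placeholders:
      | tstats count as fail_count
        where index=<your_vpn_index> sourcetype=<vpn_sourcetype> action="failed" earliest=-30d
        by _time span=5m user
      | fit DensityFunction fail_count by "user" into vpn_fail_density threshold=0.01
    • Applying that model in the scheduled search above yields an IsOutlier(fail_count) column (rather than the generic isOutlier), so the final filter becomes:
      | apply vpn_fail_density
      | where 'IsOutlier(fail_count)' == 1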

4. Risk-Based Alerting (RBA) Workflows in Splunk ES

  1. Create a Risk Object
    • In Splunk ES, each user or IP can be a “risk object.”
    • For example, if a user’s current login failure count is above baseline, add 20 risk points to that user. If it’s significantly above baseline (e.g., 5× normal), add 50 points.
  2. Use a Risk Threshold
    • A separate correlation search monitors risk scores across all risk objects.
    • If a user’s risk score exceeds 100 in the last 24 hours, create a notable event: “Possible Compromise of [User]” (a sketch of this threshold search follows this list).
  3. Advantages
    • Single brute force spike might not trigger a critical alert if it’s borderline.
    • Multiple suspicious events (e.g., large spike in failures + logins from unusual country) quickly elevate the user’s risk, triggering a more severe notable.
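  4. Example Threshold Search (Sketch)
    • A minimal sketch of the step-2 threshold search, assuming the default ES risk index and field names (the 100-point threshold and 24-hour window are examples):
      index=risk risk_object_type="user" earliest=-24h
      | stats sum(risk_score) as total_risk values(risk_message) as reasons by risk_object
      | where total_risk > 100
    • In practice this is usually implemented as an ES correlation search against the Risk data model rather than a raw search of the risk index.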

5. Actionable Next Steps

  1. Start Simple
    • Begin by collecting historical data and building summary indexes for daily or weekly authentication failures per user/IP.
  2. Set Up a Scheduled Search or Correlation Search
    • Compare real-time data to your baselines.
    • Generate notables if significantly above normal levels.
  3. Iterate and Refine
    • Add contextual data (GeoIP, user groups, known business hours) to reduce false positives (an off-hours sketch follows this list).
    • Move from static thresholds (e.g., average + 3×stdev) to a machine learning model if you have enough data and want more precision.
  4. Adopt RBA
    • If you have Splunk Enterprise Security, leverage risk-based alerting to avoid “alert fatigue.”
    • Combine multiple suspicious behaviors into a single high-severity notable.
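  5. Example: Off-Hours Context (Sketch)
    • One simple way to use business-hours context is to append an off-hours flag to any of the searches above and weight it into the risk score. The 06:00–20:00 window and the extra 20 points are illustrative:
      | eval hour=tonumber(strftime(_time, "%H"))
      | eval off_hours=if(hour < 6 OR hour > 20, 1, 0)
      | eval risk_score=if(off_hours == 1, risk_score + 20, risk_score)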

Conclusion

By comparing current user/IP activity to a known historical baseline, you gain stronger detection coverage for brute force attempts than simple threshold-based rules provide. Splunk offers multiple ways to achieve this, from correlation searches in Enterprise Security to fully data-driven anomaly detection using the Machine Learning Toolkit. Over time, you can integrate these detections with Risk-Based Alerting to prioritize your most critical security events.
