Explore detailed experimental results, setup information, and supplementary materials related to Provexa.
Attacks based on commonly used exploits.
Multi-host multi-step intrusive attacks
The attacker used an external host (a Command & Control (C2) server) to perform initial penetration, distribute malware, and exfiltrate data. The first host the attacker compromised, named Host 1, served as the starting point for lateral movement and other malicious actions to compromise more hosts.
Real-world malware cases
We obtained a dataset of free Windows malware samples from VirusSign. We focused on the 5 largest categories (i.e., Trojan.Autorun, Trojan.Danger, Virus.Hijack, Virus.Infector, Virus.Sysbot) and randomly selected 1 malware sample from each category.
DARPA TC attack cases
We selected 3 attack cases from the DARPA Transparent Computing (TC) Engagement #5 data release. The dataset consists of system audit logs captured on six performer systems (ClearScope, FiveDirections, THEIA, TRACE, CADETS, and MARPLE) while a red team attacked them using different attack strategies. The audit logs include both benign system activities and attack activities, and the dataset also includes a ground-truth report with attack descriptions and setups for the cases. We first retrieved the logs for the five performers with desktop OSes (excluding ClearScope, which runs Android). After examining the logs, we excluded three more performers: the CADETS logs lack key attributes (e.g., file name), so we could not confirm the attack ground truth needed for evaluation; in MARPLE, the attack failed; and in TRACE, a single step of forward tracking already reveals the attack sequence. Similar attacks were performed against the other performer systems, so these attacks remain covered by the logs we kept. For the remaining two performer systems (FiveDirections and THEIA), the attack cases within the same system are largely similar, so we selected at least one attack case from each system for our evaluation benchmark. To use the logs for evaluation, we developed a tool that parses the released logs and loads the parsed system entities and system events into Provexa's databases.
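The loading step can be illustrated with a minimal Python sketch. Provexa's actual parser and database schema are not described here, so everything below is an assumption: the record layout (dicts with a `kind` key) is hypothetical, the table and column names are borrowed from the SQL query examples on this page, only three of the tables are modeled, and an in-memory SQLite database stands in for Provexa's databases.

```python
import sqlite3

# Assumed schema: table and column names follow the SQL examples on this page.
SCHEMA = """
CREATE TABLE process (id INTEGER PRIMARY KEY, pid INTEGER, exename TEXT,
                      exepath TEXT, cmdline TEXT);
CREATE TABLE file (id INTEGER PRIMARY KEY, name TEXT, path TEXT);
CREATE TABLE fileevent (id INTEGER PRIMARY KEY, srcid INTEGER, dstid INTEGER,
                        starttime INTEGER, endtime INTEGER, hostname TEXT,
                        optype TEXT, amount INTEGER);
"""

# One INSERT statement per hypothetical record kind.
INSERTS = {
    "process": "INSERT INTO process VALUES (:id, :pid, :exename, :exepath, :cmdline)",
    "file": "INSERT INTO file VALUES (:id, :name, :path)",
    "fileevent": "INSERT INTO fileevent VALUES (:id, :srcid, :dstid, :starttime, "
                 ":endtime, :hostname, :optype, :amount)",
}


def load_records(conn, records):
    """Insert parsed entities and events; return the number of rows loaded."""
    loaded = 0
    with conn:  # commits on success, rolls back on error
        for rec in records:
            sql = INSERTS.get(rec["kind"])
            if sql is None:
                continue  # record kinds this sketch does not model are skipped
            conn.execute(sql, rec)  # extra dict keys such as "kind" are ignored
            loaded += 1
    return loaded


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up records standing in for the output of a log parser.
records = [
    {"kind": "process", "id": 1, "pid": 4242, "exename": "sshd",
     "exepath": "/usr/sbin/sshd", "cmdline": "sshd -D"},
    {"kind": "file", "id": 2, "name": "authorized_keys",
     "path": "/home/user/.ssh/authorized_keys"},
    {"kind": "fileevent", "id": 1, "srcid": 1, "dstid": 2, "starttime": 100,
     "endtime": 105, "hostname": "host1", "optype": "write", "amount": 64},
    {"kind": "registry", "id": 3},  # unmodeled kind, skipped
]
n_loaded = load_records(conn, records)
```

Keeping entities and events in separate tables mirrors how the SQL examples on this page join `nodes` and `edges`; a real loader would also need the network tables and whatever format-specific decoding the released logs require.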
ATLASv2 attack cases
We selected 4 attack cases from the ATLASv2 Attack Engagement dataset. This dataset builds on the first version of the ATLAS dataset and includes audit logs recorded from various sources, which increases the amount of benign activity so that the malicious system events are accompanied by a realistic level of noise. The dataset also includes a ground-truth report with attack descriptions and setups for the cases. In this dataset, two researchers used two host machines for their daily work for 4 days to record benign activities. On the fifth day, the attacks were carried out while the victim hosts continued to serve the researchers' daily use. This yields system logs that closely resemble real-world attack logs. We selected 4 attack cases that involve different vulnerabilities in commonly used software: Adobe Flash and Microsoft Word.
We provide query examples written in different languages: ProvQL, SQL, Cypher, and Splunk SPL (search query only).
We use malicious_ssh_theft as an example. Two queries (a backward tracking query and a search query) are shown.
Query 1: Backward Tracking From the POI Events
In the first query, the security analyst performs backward tracking from the POI event whose cmdline attribute contains authorized_keys.
ProvQL:
bg_query1 = back track where (cmdline like 'authorized_keys', type=process) from db(malicious_ssh_theft);
SQL:
WITH RECURSIVE allnodes (type, id, name, path, dstip, dstport, srcip, srcport, pid, exename, exepath, cmdline) AS (
SELECT 'file', id, name, path, NULL::text, NULL::int, NULL::text, NULL::int, NULL::int, NULL::text, NULL::text, NULL::text FROM file UNION
SELECT 'network', id, NULL::text, NULL::text, CAST (dstip AS text), dstport, CAST (srcip AS text), srcport, NULL::int, NULL::text, NULL::text, NULL::text FROM network UNION
SELECT 'process', id, NULL::text, NULL::text, NULL::text, NULL::int, NULL::text, NULL::int, pid, exename, exepath, cmdline FROM process),
nodes AS (SELECT * FROM allnodes),
alledges AS (
SELECT id, srcid, dstid, starttime, endtime, hostname, optype, amount FROM fileevent UNION
SELECT id, srcid, dstid, starttime, endtime, hostname, optype, amount FROM networkevent UNION
SELECT id, srcid, dstid, starttime, endtime, hostname, optype, 0 AS amount FROM processevent),
edges AS (SELECT e.* FROM alledges e INNER JOIN nodes n1 ON e.srcid = n1.id
INNER JOIN nodes n2 ON e.dstid = n2.id),
graph (id, srcid, dstid, starttime, endtime, hostname, optype, amount, threshold, step) AS (
SELECT *, edges.endtime, 1 FROM edges
WHERE dstid IN (SELECT id FROM nodes WHERE (cmdline LIKE '%authorized_keys%') AND (type = 'process'))
UNION SELECT edges.*, LEAST(edges.endtime, graph.threshold), graph.step
FROM edges JOIN graph ON edges.dstid = graph.srcid
WHERE edges.starttime <= graph.threshold)
SELECT DISTINCT id, srcid, dstid, starttime, endtime, hostname, optype, amount FROM graph;
Cypher:
MATCH ()-[r]->(root)
WHERE (root.cmdline =~ '.*authorized_keys.*') AND (root.type = 'process')
SET r.threshold = r.endtime, r.marked=true
WITH root
MATCH p = ()-[*..3]->(root)
WITH DISTINCT relationships(p) as r
FOREACH (i IN reverse(range(0, size(r)-2))
| FOREACH (n1 IN [r[i]]
| FOREACH (n2 IN [r[i+1]]
| FOREACH (edge IN
CASE
WHEN n1.starttime <= n2.threshold THEN [n1]
ELSE []
END | SET edge.marked=true
SET n1.threshold=CASE
WHEN n1.endtime > n2.threshold THEN n2.threshold
ELSE n1.endtime END))))
WITH DISTINCT r
MATCH (sn)-[rr]->(en)
WHERE (rr IN r AND rr.marked=true)
RETURN DISTINCT sn, rr, en
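The backward tracking expressed by the queries above can also be sketched in plain Python. This is an illustrative reading of the recursive SQL version, not Provexa's implementation: an event is kept if it points into an already-tracked event's source node and started no later than that event's threshold, and the threshold shrinks to the smaller of the two end times at every step. The `Edge` tuple, the `backward_track` function, and the sample events are hypothetical.

```python
from collections import namedtuple

# Hypothetical minimal event record; the field names follow the SQL example.
Edge = namedtuple("Edge", "id srcid dstid starttime endtime")


def backward_track(edges, poi_ids):
    """Return the ids of events that may have causally influenced the POI nodes."""
    # Index events by destination node so predecessors are cheap to find.
    by_dst = {}
    for e in edges:
        by_dst.setdefault(e.dstid, []).append(e)

    best = {}  # event id -> largest threshold the event was reached with
    stack = [(e, e.endtime) for poi in poi_ids for e in by_dst.get(poi, [])]
    while stack:
        edge, threshold = stack.pop()
        if edge.id in best and best[edge.id] >= threshold:
            continue  # already explored with an equal or looser time bound
        best[edge.id] = threshold
        for prev in by_dst.get(edge.srcid, []):
            # A predecessor event matters only if it started before the bound.
            if prev.starttime <= threshold:
                stack.append((prev, min(prev.endtime, threshold)))
    return set(best)


# Made-up events: node 3 is the POI process.
edges = [
    Edge("e1", 1, 2, 10, 20),  # happened before e2, kept
    Edge("e2", 2, 3, 30, 40),  # points directly into the POI, kept
    Edge("e3", 4, 2, 50, 60),  # started after e2 ended, dropped
    Edge("e4", 5, 1, 5, 8),    # happened before e1, kept
]
tracked = backward_track(edges, {3})
```

Remembering the largest threshold per event keeps the traversal finite on cyclic graphs, which plays the same role as the duplicate elimination performed by `UNION` in the recursive SQL query.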
Query 2: Search for the Entry Nodes on the Backward Dependency Graph
In the second query, the security analyst searches for the attack entry nodes on the backward dependency graph.
ProvQL:
search from bg_query1 where e1{exename like "ssh",type=process},
e2{type=process}
with e1->e2
return * as entry;
SQL:
WITH allnodes (type, id, name, path, dstip, dstport, srcip, srcport, pid, exename, exepath, cmdline) AS (
SELECT 'file', id, name, path, NULL::text, NULL::int, NULL::text, NULL::int, NULL::int, NULL::text, NULL::text, NULL::text FROM file UNION
SELECT 'network', id, NULL::text, NULL::text, CAST (dstip AS text), dstport, CAST (srcip AS text), srcport, NULL::int, NULL::text, NULL::text, NULL::text FROM network UNION
SELECT 'process', id, NULL::text, NULL::text, NULL::text, NULL::int, NULL::text, NULL::int, pid, exename, exepath, cmdline FROM process),
nodes AS (SELECT * FROM allnodes), alledges AS (
SELECT id, srcid, dstid, starttime, endtime, hostname, optype, amount FROM fileevent UNION
SELECT id, srcid, dstid, starttime, endtime, hostname, optype, amount FROM networkevent UNION
SELECT id, srcid, dstid, starttime, endtime, hostname, optype, 0 AS amount FROM processevent),
edges AS (SELECT e.* FROM alledges e),
event1 (id, srcid, dstid, starttime, endtime, hostname, optype, amount) AS (
SELECT edges.* FROM
nodes n1 INNER JOIN edges ON n1.id = edges.srcid
INNER JOIN nodes n2 ON edges.dstid = n2.id
WHERE ((n1.exename LIKE '%ssh%') AND (n1.type = 'process')) AND (n2.type = 'process') AND (edges.optype IS NOT NULL)),
result AS (SELECT event1.* FROM event1 WHERE true)
SELECT * FROM result;
Cypher:
MATCH (e1)-[event0]->(e2)
WHERE (e1.exename =~ '.*ssh.*') AND
(e1.type = 'process') AND
e2.type = 'process' AND true
RETURN e1, e2, event0
Splunk SPL:
| search index=process exename="*ssh*" type="process"
| fields id
| rename id as srcid
| join type=inner srcid
[search index=processevent
| fields srcid, dstid, *]
| join type=inner dstid
[search index=process type="process"
| fields id, *
| rename id as dstid]
| table *
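The event pattern that the search queries above look for can be sketched in Python as a filter over events. This is an illustrative sketch under assumptions, not Provexa's implementation: `like` is read as a substring match, nodes are plain dicts keyed by id, and the names `search_events`, `is_ssh_process`, and `is_process`, as well as the sample data, are hypothetical.

```python
def is_ssh_process(node):
    """e1 constraint: a process whose exename contains "ssh"."""
    return node["type"] == "process" and "ssh" in (node.get("exename") or "")


def is_process(node):
    """e2 constraint: any process."""
    return node["type"] == "process"


def search_events(nodes, edges, src_match, dst_match):
    """Return events whose source and destination nodes satisfy the constraints."""
    return [
        e for e in edges
        if src_match(nodes[e["srcid"]]) and dst_match(nodes[e["dstid"]])
    ]


# Made-up entities and events.
nodes = {
    1: {"type": "process", "exename": "sshd"},
    2: {"type": "process", "exename": "bash"},
    3: {"type": "file", "name": "authorized_keys"},
}
edges = [
    {"id": "a", "srcid": 1, "dstid": 2, "optype": "start"},  # sshd -> bash
    {"id": "b", "srcid": 2, "dstid": 3, "optype": "write"},  # bash -> file
    {"id": "c", "srcid": 1, "dstid": 3, "optype": "read"},   # sshd -> file
]
entry = search_events(nodes, edges, is_ssh_process, is_process)
```

To mirror `search from bg_query1`, the `edges` argument would be restricted to the events returned by the backward tracking query rather than the full event set.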
Detailed survey questions and results are presented below.
Detailed experiment setup and results are presented below.
Explanation of multi-query splitting, illustrative examples, and analysis of its advantages.