SPECIAL Transparency and Consent Benchmark (STC-bench)
Abstract
This document specifies STC-bench, a benchmark for GDPR-based transparency and consent that we developed in the context of the SPECIAL project.
Outline
The development of the benchmark is driven by so-called "choke points": important technical challenges that a query workload must exercise, forcing systems onto a path of technological innovation. This methodology relies on the identification of such a workload by technical experts in the architecture of the system under test.
Accordingly, we analysed the SPECIAL platform together with the technical experts involved in the SPECIAL policy vocabulary and in the transparency and compliance components. Following this study, we identified the transparency and compliance choke points described below. Further information can be found in Deliverable 3.3 - Scalability and Robustness testing report V1.
Transparency choke points
- CP1 - Concurrent access. The benchmark should test the ability of the system to efficiently handle concurrent transparency requests as the number of users grows (see the load-generation sketch after this list).
- CP2 - Increasing data volume. The system should provide mechanisms to efficiently serve the transparency needs of the users, even when the number of events in the system (i.e. consents, data processing and sharing events) grows.
- CP3 - Ingestion time in a streaming scenario. The benchmark should test that the transparency needs are efficiently served in a streaming scenario, i.e. the user should be able to access the information of an event (and the result of its compliance check) shortly after the event arrives in the system.
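To make CP1 concrete, below is a minimal load-generation sketch in Go that fires one transparency request per simulated user and reports the mean latency. The endpoint URL and query path are assumptions for illustration, not the actual SPECIAL transparency API.

```go
// Minimal CP1 load-generation sketch (hypothetical endpoint and path;
// the real SPECIAL transparency API may differ).
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

func main() {
	const users = 100 // scale up (1K, 10K, ...) to probe CP1
	base := "http://localhost:8080/transparency/events?user=%d" // assumed endpoint

	var wg sync.WaitGroup
	latencies := make([]time.Duration, users)

	for u := 0; u < users; u++ {
		wg.Add(1)
		go func(u int) {
			defer wg.Done()
			start := time.Now()
			resp, err := http.Get(fmt.Sprintf(base, u)) // one transparency request per simulated user
			if err == nil {
				resp.Body.Close()
			}
			latencies[u] = time.Since(start)
		}(u)
	}
	wg.Wait()

	var total time.Duration
	for _, l := range latencies {
		total += l
	}
	fmt.Printf("mean latency over %d concurrent requests: %v\n", users, total/users)
}
```

Scaling the users constant through the values used in the transparency tasks (100 up to 1M) turns this sketch into a basic CP1 probe.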
Compliance choke points
- CP4 - Different complexities of policies. In general, policies can be arbitrarily complex, which affects the overall performance of any compliance checking process. Thus, the benchmark must consider policies of different complexities, reflecting a realistic scenario (see the compliance-check sketch after this list).
- CP5 - Increasing number of users. The benchmark should test the ability of the system to efficiently scale and perform as an increasing number of users, and hence of data processing and sharing events, is managed.
- CP6 - Expected pass/fail ratio. In general, the benchmark must consider a realistic scenario where policies are updated, some consents are revoked, and others are updated. The benchmark should provide the means to validate whether the performance of the system depends on the ratio of passed to failed compliance checks in the workload.
- CP7 - Data generation rates. The system should cope with consents and data processing and sharing events generated at increasing rates, addressing the "velocity" requirement of most big data scenarios.
- CP8 - Performant streaming processing. The benchmark should be able to test the system in a streaming scenario, where the compliance checking should fulfil the aforementioned requirements of performance and responsiveness (latency).
- CP9 - Performant batch processing. In addition to streaming, the system must support performant compliance checking in batch mode.
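To illustrate CP4 and CP6, the following is a greatly simplified compliance-check sketch in Go. A consent is modelled as a UNION of basic policies, and an event complies if at least one basic policy covers all of its attributes; the flat string attributes and equality test are stand-ins for the hierarchical SPECIAL policy vocabulary and its subsumption checks.

```go
// Simplified compliance-check sketch for CP4/CP6. A consent is the UNION of
// basic policies; an event complies if at least one basic policy covers all
// of its attributes. Plain string equality stands in for vocabulary
// subsumption in the actual SPECIAL platform.
package main

import "fmt"

// BasicPolicy captures the attributes of a usage policy.
type BasicPolicy struct {
	Data, Processing, Purpose, Recipient, Storage string
}

// Consent is the UNION of basic policies given by a data subject.
type Consent []BasicPolicy

// Event is a data processing/sharing event logged by the platform.
type Event struct {
	Data, Processing, Purpose, Recipient, Storage string
}

// covers is a placeholder for vocabulary subsumption; here plain equality.
func covers(policy, event string) bool { return policy == event }

// Complies checks the event against each basic policy in the union.
func (c Consent) Complies(e Event) bool {
	for _, p := range c {
		if covers(p.Data, e.Data) && covers(p.Processing, e.Processing) &&
			covers(p.Purpose, e.Purpose) && covers(p.Recipient, e.Recipient) &&
			covers(p.Storage, e.Storage) {
			return true // one matching basic policy suffices
		}
	}
	return false
}

func main() {
	consent := Consent{{"location", "analysis", "marketing", "ours", "eu"}}
	event := Event{"location", "analysis", "marketing", "ours", "eu"}
	fmt.Println(consent.Complies(event)) // true
}
```

The cost of Complies grows with the size of the union, which is exactly what the varying policy complexities of CP4 exercise; flipping individual event attributes controls the pass/fail ratio of CP6.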
Data Generation
The STC-bench data generator is designed to test the compliance and transparency performance of the SPECIAL platform; hence it produces synthetic data for two related concepts: controllers' policies, and the data sharing and processing events that are potentially compliant with the user consent.
The STC-bench data generator is available at the SPECIAL GitHub repository.
The following parameters can be set (see the README of the STC-bench generator for more information; a sketch of the generation loop follows the list):
- Generation rate: The rate at which the generator outputs events. This parameter accepts Go (golang) duration syntax, e.g. 1s or 10ms.
- Number of events: The total number of events that will be generated. When this parameter is <= 0, the generator produces an infinite stream.
- Format: The serialization format used to write the events (json or ttl).
- Type: The type of event to be generated: log, which stands for data sharing and processing events, or consent, which generates new user consents.
- Number of policies: The maximum number of policies to be used in a single consent.
- Number of users: The number of UserID attribute values to generate.
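As a sketch of how such a rate-limited generation loop can be realised in Go: time.ParseDuration parses the generation rate, a ticker paces the output, and a non-positive event count yields an infinite stream. The logEvent schema below is a hypothetical stand-in; the real STC-bench generator defines its own event format.

```go
// Minimal sketch of a rate-limited event generation loop, assuming a
// hypothetical event schema for illustration.
package main

import (
	"encoding/json"
	"fmt"
	"math/rand"
	"os"
	"time"
)

type logEvent struct { // hypothetical schema, for illustration only
	UserID    int       `json:"userID"`
	Timestamp time.Time `json:"timestamp"`
}

func main() {
	rate, err := time.ParseDuration("10ms") // generation rate, Go duration syntax
	if err != nil {
		panic(err)
	}
	const numEvents = 5   // <= 0 would mean an infinite stream
	const numUsers = 1000 // pool of UserID values

	enc := json.NewEncoder(os.Stdout)
	ticker := time.NewTicker(rate)
	defer ticker.Stop()

	for i := 0; numEvents <= 0 || i < numEvents; i++ {
		<-ticker.C // emit one event per tick
		enc.Encode(logEvent{UserID: rand.Intn(numUsers), Timestamp: time.Now()})
	}
	fmt.Fprintln(os.Stderr, "done")
}
```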
Benchmark Tasks
In the following we summarize the set of concrete benchmark tasks for the SPECIAL compliance and transparency components. Further information can be found in Deliverable 3.3 - Scalability and Robustness testing report V1.
For transparency, we consider a minimal subset of queries, described in Table 1, with which the system resolves transparency queries for both data subjects and data controllers; a sketch of the query semantics follows the table. The transparency tasks are listed in Table 2.
Table 1: Transparency queries.

| ID  | User            | Query                                                 |
|-----|-----------------|-------------------------------------------------------|
| Q1  | Data subject    | All events of the user                                |
| Q2  | Data subject    | Percentage of the user's events that passed           |
| Q3  | Data subject    | Percentage of the user's events that failed           |
| Q4  | Data subject    | All events of the user that passed                    |
| Q5  | Data subject    | All events of the user that failed                    |
| Q6  | Data subject    | Last 100 events of the user                           |
| Q7  | Data subject    | All events of the user from a particular application  |
| Q8  | Data controller | All events                                            |
| Q9  | Data controller | Percentage of events that passed                      |
| Q10 | Data controller | Percentage of events that failed                      |
| Q11 | Data controller | All events that passed                                |
| Q12 | Data controller | All events that failed                                |
| Q13 | Data controller | Last 100 events                                       |
| Q14 | Data controller | All events from a particular application              |
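The following minimal in-memory Go sketch pins down the intended semantics of Q1, Q2 and Q6 (the Event record is a hypothetical stand-in; in the platform these queries run against the actual event store). The controller queries Q8, Q9 and Q13 are the same computations without the user filter.

```go
// In-memory illustration of the transparency query semantics in Table 1,
// using a hypothetical event record.
package main

import "fmt"

type Event struct {
	UserID int
	App    string
	Passed bool // result of the compliance check
}

// q1: all events of a given user.
func q1(log []Event, user int) []Event {
	var out []Event
	for _, e := range log {
		if e.UserID == user {
			out = append(out, e)
		}
	}
	return out
}

// q2: percentage of the user's events that passed the compliance check.
func q2(log []Event, user int) float64 {
	events := q1(log, user)
	if len(events) == 0 {
		return 0
	}
	passed := 0
	for _, e := range events {
		if e.Passed {
			passed++
		}
	}
	return 100 * float64(passed) / float64(len(events))
}

// q6: last n (e.g. 100) events of the user, assuming log is in arrival order.
func q6(log []Event, user int, n int) []Event {
	events := q1(log, user)
	if len(events) > n {
		events = events[len(events)-n:]
	}
	return events
}

func main() {
	log := []Event{{1, "appA", true}, {1, "appB", false}, {2, "appA", true}}
	fmt.Println(len(q1(log, 1)), q2(log, 1), len(q6(log, 1, 100))) // 2 50 2
}
```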
Table 2: Transparency benchmark tasks.

| Task | #Users | Event rate | Policies      | #Events | Pass ratio | Choke point |
|------|--------|------------|---------------|---------|------------|-------------|
| T-T1 | 100    | none       | UNION of 5 p. | 500M    | Random     | CP1         |
| T-T1 | 1K     | none       | UNION of 5 p. | 500M    | Random     | CP1         |
| T-T1 | 10K    | none       | UNION of 5 p. | 500M    | Random     | CP1         |
| T-T1 | 100K   | none       | UNION of 5 p. | 500M    | Random     | CP1         |
| T-T1 | 1M     | none       | UNION of 5 p. | 500M    | Random     | CP1         |
| T-T2 | 1000   | none       | UNION of 5 p. | 1M      | Random     | CP2         |
| T-T2 | 1000   | none       | UNION of 5 p. | 50M     | Random     | CP2         |
| T-T2 | 1000   | none       | UNION of 5 p. | 100M    | Random     | CP2         |
| T-T2 | 1000   | none       | UNION of 5 p. | 1B      | Random     | CP2         |
| T-T2 | 1000   | none       | UNION of 5 p. | 10B     | Random     | CP2         |
| T-T3 | 1000   | 1 ev./60s  | UNION of 5 p. | 500M    | Random     | CP3         |
| T-T3 | 1000   | 1 ev./30s  | UNION of 5 p. | 500M    | Random     | CP3         |
| T-T3 | 1000   | 1 ev./10s  | UNION of 5 p. | 500M    | Random     | CP3         |
| T-T3 | 1000   | 1 ev./s    | UNION of 5 p. | 500M    | Random     | CP3         |
| T-T3 | 1000   | 10 ev./s   | UNION of 5 p. | 500M    | Random     | CP3         |
Table 3 shows the tasks to be performed by the SPECIAL compliance component in order to cover all of the choke points identified above. Each task fixes the parameters involved, such as the scenario (streaming or batch processing), the number of users, etc. These parameters follow the choke points, and their values are estimated based on consultation with the SPECIAL pilot partners. Note that all streaming tests set a test time of 30 minutes, which, together with the number of users and the event generation rate, determines the number of events generated: for instance, in subtask C-T1-1, 1000 users each producing one event every 10 seconds yield 1000 × 180 = 180,000 events over the 30-minute run. The batch tasks (C-T5) fix the number of events directly instead.
Table 3: Compliance benchmark tasks.

| Task | Subtask | Scenario  | #Users | Event rate | Policies       | Test time / #Events | Pass ratio | Choke points |
|------|---------|-----------|--------|------------|----------------|---------------------|------------|--------------|
| C-T1 | C-T1-1  | Streaming | 1000   | 1 ev./10s  | 1 policy       | 30 minutes          | Random     | CP4, CP8     |
| C-T1 | C-T1-2  | Streaming | 1000   | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | Random     | CP4, CP8     |
| C-T1 | C-T1-3  | Streaming | 1000   | 1 ev./10s  | UNION of 10 p. | 30 minutes          | Random     | CP4, CP8     |
| C-T1 | C-T1-4  | Streaming | 1000   | 1 ev./10s  | UNION of 20 p. | 30 minutes          | Random     | CP4, CP8     |
| C-T1 | C-T1-5  | Streaming | 1000   | 1 ev./10s  | UNION of 30 p. | 30 minutes          | Random     | CP4, CP8     |
| C-T2 | C-T2-1  | Streaming | 100    | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | Random     | CP5, CP8     |
| C-T2 | C-T2-2  | Streaming | 1K     | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | Random     | CP5, CP8     |
| C-T2 | C-T2-3  | Streaming | 10K    | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | Random     | CP5, CP8     |
| C-T2 | C-T2-4  | Streaming | 100K   | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | Random     | CP5, CP8     |
| C-T2 | C-T2-5  | Streaming | 1M     | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | Random     | CP5, CP8     |
| C-T3 | C-T3-1  | Streaming | 1000   | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | 0%         | CP6, CP8     |
| C-T3 | C-T3-2  | Streaming | 1000   | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | 25%        | CP6, CP8     |
| C-T3 | C-T3-3  | Streaming | 1000   | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | 50%        | CP6, CP8     |
| C-T3 | C-T3-4  | Streaming | 1000   | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | 75%        | CP6, CP8     |
| C-T3 | C-T3-5  | Streaming | 1000   | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | 100%       | CP6, CP8     |
| C-T4 | C-T4-1  | Streaming | 1000   | 1 ev./60s  | UNION of 5 p.  | 30 minutes          | Random     | CP7, CP8     |
| C-T4 | C-T4-2  | Streaming | 1000   | 1 ev./30s  | UNION of 5 p.  | 30 minutes          | Random     | CP7, CP8     |
| C-T4 | C-T4-3  | Streaming | 1000   | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | Random     | CP7, CP8     |
| C-T4 | C-T4-4  | Streaming | 1000   | 1 ev./s    | UNION of 5 p.  | 30 minutes          | Random     | CP7, CP8     |
| C-T4 | C-T4-5  | Streaming | 1000   | 10 ev./s   | UNION of 5 p.  | 30 minutes          | Random     | CP7, CP8     |
| C-T5 | C-T5-1  | Batch     | 100    | -          | UNION of 5 p.  | 100K events         | Random     | CP9          |
| C-T5 | C-T5-2  | Batch     | 1K     | -          | UNION of 5 p.  | 1M events           | Random     | CP9          |
| C-T5 | C-T5-3  | Batch     | 10K    | -          | UNION of 5 p.  | 10M events          | Random     | CP9          |
| C-T5 | C-T5-4  | Batch     | 100K   | -          | UNION of 5 p.  | 100M events         | Random     | CP9          |
| C-T5 | C-T5-5  | Batch     | 1M     | -          | UNION of 5 p.  | 1B events           | Random     | CP9          |
Results
This section provides continuously updated results of the evaluation of STC-bench on the SPECIAL platform.
- Update on 30 June 2018: The results of the evaluation are published in Deliverable 3.3 - Scalability and Robustness testing report V1.