SPECIAL Transparency and Consent Benchmark (STC-bench)

Abstract

This document specifies STC-bench, the benchmark for GDPR-based transparency and consent that we developed in the context of the SPECIAL project.

Outline

The development of the benchmark is driven by so-called "choke points": important technical challenges that the query workload must evaluate, forcing systems onto a path of technological innovation. This methodology relies on technical experts in the architecture of the system under test to identify such a workload.

Thus, we analysed the SPECIAL platform together with the technical experts involved in the SPECIAL policy vocabulary and in the transparency and compliance components. Following this study, we identified the transparency and compliance choke points described below. Further information can be found in Deliverable 3.3 - Scalability and Robustness testing report V1.

Transparency choke points

  • CP1 - Concurrent access. The benchmark should test the ability of the system to efficiently handle concurrent transparency requests as the number of users grows (a minimal load-driver sketch follows this list).
  • CP2 - Increasing data volume. The system should provide mechanisms to efficiently serve the transparency needs of the users, even as the number of events in the system (i.e. consents and data processing and sharing events) grows.
  • CP3 - Ingestion time in a streaming scenario. The benchmark should test that transparency needs are efficiently served in a streaming scenario, i.e. the user should be able to access the information about an event (and the result of its compliance check) shortly after the event arrives at the system.
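
As an illustration of what exercising CP1 could look like, the sketch below fires transparency requests for many users in parallel and reports the mean latency. It is a minimal sketch in Go: the endpoint URL and the userID query parameter are illustrative placeholders, not part of the SPECIAL platform API.

    package main

    import (
        "fmt"
        "net/http"
        "sync"
        "time"
    )

    func main() {
        const users = 1000                               // grow this to probe CP1
        endpoint := "http://localhost:8080/transparency" // illustrative URL

        var wg sync.WaitGroup
        latencies := make([]time.Duration, users)

        for i := 0; i < users; i++ {
            wg.Add(1)
            go func(u int) {
                defer wg.Done()
                start := time.Now()
                resp, err := http.Get(fmt.Sprintf("%s?userID=user-%d", endpoint, u))
                if err != nil {
                    return // failed requests keep a zero latency
                }
                resp.Body.Close()
                latencies[u] = time.Since(start)
            }(i)
        }
        wg.Wait()

        var total time.Duration
        for _, l := range latencies {
            total += l
        }
        fmt.Printf("mean latency over %d concurrent requests: %v\n", users, total/users)
    }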

Compliance choke points

  • CP4 - Different complexities of policies. In general, policies can be arbitrarily complex, affecting the overall performance of any compliance checking process. Thus, the benchmark must consider policies of different complexities, reflecting a realistic scenario.
  • CP5 - Increasing number of users. The benchmark should test the ability of the system to scale and perform efficiently as an increasing number of users, i.e. of data processing and sharing events, is managed.
  • CP6 - Expected passed/failed tests. In general, the benchmark must consider a realistic scenario where policies are updated, some consents are revoked, and others are updated. The benchmark should provide the means to validate whether the performance of the system depends on the ratio of passed/failed tests in the workload.
  • CP7 - Data generation rates. The system should cope with consents and data processing and sharing events generated at increasing rates, addressing the "velocity" requirement of most big data scenarios.
  • CP8 - Performant streaming processing. The benchmark should be able to test the system in a streaming scenario, where compliance checking must fulfil the aforementioned requirements of performance and responsiveness (latency); a minimal latency-measurement sketch follows this list.
  • CP9 - Performant batch processing. In addition to streaming, the system must also perform compliance checking efficiently in batch mode.
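
To make the latency requirement of CP8 concrete, the following minimal sketch in Go times each compliance check against an incoming event stream. The Event type and the checkCompliance stub are illustrative stand-ins, not SPECIAL components.

    package main

    import (
        "fmt"
        "time"
    )

    // Event is an illustrative stand-in for a data processing/sharing event.
    type Event struct {
        UserID  string
        Arrived time.Time
    }

    // checkCompliance is a stub; a real system would evaluate the event
    // against the stored user consents here.
    func checkCompliance(e Event) bool { return true }

    func main() {
        events := make(chan Event, 100)

        // Producer: one event per second (vary this rate for CP7).
        go func() {
            ticker := time.NewTicker(time.Second)
            defer ticker.Stop()
            for i := 0; ; i++ {
                <-ticker.C
                events <- Event{UserID: fmt.Sprintf("user-%d", i), Arrived: time.Now()}
            }
        }()

        // Consumer: measure arrival-to-verdict latency (CP8).
        for e := range events {
            verdict := checkCompliance(e)
            fmt.Printf("user=%s verdict=%v latency=%v\n", e.UserID, verdict, time.Since(e.Arrived))
        }
    }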

Data Generation

The STC-bench data generator is designed to test the compliance and transparency performance of the SPECIAL platform; hence, it produces synthetic data covering two related concepts: the controllers' policies, and the data sharing and processing events that are potentially compliant with the user's consent.

The STC-bench data generator is available at the SPECIAL GitHub repository.

The following parameters can be set (more information can be found in the README of the STC-bench generator; a minimal pacing sketch in Go follows the list):

  • Generation rate: The rate at which the generator outputs events. This parameter accepts Go (golang) duration syntax, e.g. 1s or 10ms.
  • Number of events: The total number of events that will be generated. When this parameter is <= 0, an infinite stream is created.
  • Format: The serialization format used to write the events (json or ttl).
  • Type: The type of event to be generated: log, which stands for data sharing and processing events, or consent, which generates new user consents.
  • Number of policies: The maximum number of policies to be used in a single consent.
  • Number of users: The number of UserID attribute values to generate.
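
As an illustration of how the generation rate paces the event stream (a minimal sketch in Go, mirroring the generator's duration syntax; the event shape below is illustrative, not the generator's actual schema):

    package main

    import (
        "encoding/json"
        "fmt"
        "math/rand"
        "os"
        "time"
    )

    func main() {
        // Generation rate in Go duration syntax, as the parameter expects.
        rate, err := time.ParseDuration("10ms")
        if err != nil {
            panic(err)
        }
        const numEvents = 5  // a value <= 0 would mean an infinite stream
        const numUsers = 100 // pool of UserID attribute values

        enc := json.NewEncoder(os.Stdout) // "json" format; "ttl" is the other option
        ticker := time.NewTicker(rate)
        defer ticker.Stop()

        for i := 0; numEvents <= 0 || i < numEvents; i++ {
            <-ticker.C
            // Illustrative event shape, not the generator's schema.
            enc.Encode(map[string]string{
                "type":   "log", // "log" = processing/sharing event; "consent" = new consent
                "userID": fmt.Sprintf("user-%d", rand.Intn(numUsers)),
            })
        }
    }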

Benchmark Tasks

In the following we summarize the set of concrete benchmark tasks for the SPECIAL compliance and transparency components. Further information can be found in Deliverable 3.3 - Scalability and Robustness testing report V1.

For transparency, we consider a minimal subset of queries, described in Table 1, through which the system resolves both data subject and data controller transparency queries. The transparency tasks are listed in Table 2.

Table 1. Transparency queries for the data subject and the data controller

ID    User              Query
Q1    Data subject      All events of the user
Q2    Data subject      Percentage of the user's events that passed
Q3    Data subject      Percentage of the user's events that failed
Q4    Data subject      All events of the user that passed
Q5    Data subject      All events of the user that failed
Q6    Data subject      Last 100 events of the user
Q7    Data subject      All events of the user from a particular application
Q8    Data controller   All events
Q9    Data controller   Percentage of events that passed
Q10   Data controller   Percentage of events that failed
Q11   Data controller   All events that passed
Q12   Data controller   All events that failed
Q13   Data controller   Last 100 events
Q14   Data controller   All events from a particular application

Table 2. Transparency tasks, all referring to user and controller transparency queries. Each task comprises five subtasks, one per value of the varied parameter.

Task   #Users                   Event rate                                           Policies        #Events                  Pass ratio   Choke point
T-T1   100, 1K, 10K, 100K, 1M   none                                                 UNION of 5 p.   500M                     Random       CP1
T-T2   1000                     none                                                 UNION of 5 p.   1M, 50M, 100M, 1B, 10B   Random       CP2
T-T3   1000                     1 ev./60s, 1 ev./30s, 1 ev./10s, 1 ev./s, 10 ev./s   UNION of 5 p.   500M                     Random       CP3

Table 3 shows the tasks to be performed by the SPECIAL compliance component in order to cover all the choke points identified above. Each task fixes the parameters involved, such as the scenario (streaming or batch processing), the number of users, etc. These parameters follow from the choke points, and their values were estimated in consultation with the SPECIAL pilot partners. Note that all streaming tests set a test time of 30 minutes, which delimits the number of events generated given the number of users and the event generation rate in each case; the batch tasks (C-T5) instead fix the total number of events (a batch-mode throughput sketch follows Table 3).
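
For instance, assuming the event rate applies per user, a streaming task with 1000 users at 1 ev./10s over 30 minutes generates 1000 × (1800 s / 10 s) = 180,000 events.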

Table 3. Compliance tasks.

Task   Subtask   Scenario    #Users   Event rate   Policies         Test time     Pass ratio   Choke point
C-T1   C-T1-1    Streaming   1000     1 ev./10s    1 policy         30 minutes    Random       CP4, CP8
       C-T1-2                                      UNION of 5 p.
       C-T1-3                                      UNION of 10 p.
       C-T1-4                                      UNION of 20 p.
       C-T1-5                                      UNION of 30 p.

C-T2   C-T2-1    Streaming   100      1 ev./10s    UNION of 5 p.    30 minutes    Random       CP5, CP8
       C-T2-2                1K
       C-T2-3                10K
       C-T2-4                100K
       C-T2-5                1M

C-T3   C-T3-1    Streaming   1000     1 ev./10s    UNION of 5 p.    30 minutes    0%           CP6, CP8
       C-T3-2                                                                     25%
       C-T3-3                                                                     50%
       C-T3-4                                                                     75%
       C-T3-5                                                                     100%

C-T4   C-T4-1    Streaming   1000     1 ev./60s    UNION of 5 p.    30 minutes    Random       CP7, CP8
       C-T4-2                         1 ev./30s
       C-T4-3                         1 ev./10s
       C-T4-4                         1 ev./s
       C-T4-5                         10 ev./s

C-T5   C-T5-1    Batch       100      -            UNION of 5 p.    100K events   Random       CP9
       C-T5-2                1K                                     1M events
       C-T5-3                10K                                    10M events
       C-T5-4                100K                                   100M events
       C-T5-5                1M                                     1B events

(Empty cells repeat the value from the first subtask of the same task.)
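
For the batch tasks (C-T5), throughput rather than per-event latency is the figure of interest. The following minimal sketch in Go measures it over a file of events, assuming one JSON event per line; the file name and the elided compliance check are illustrative placeholders.

    package main

    import (
        "bufio"
        "encoding/json"
        "fmt"
        "os"
        "time"
    )

    func main() {
        // Illustrative input: one JSON event per line; "events.jsonl" is a
        // placeholder name, not something the benchmark prescribes.
        f, err := os.Open("events.jsonl")
        if err != nil {
            panic(err)
        }
        defer f.Close()

        start := time.Now()
        checked := 0
        scanner := bufio.NewScanner(f)
        for scanner.Scan() {
            var event map[string]any
            if err := json.Unmarshal(scanner.Bytes(), &event); err != nil {
                continue // skip malformed lines
            }
            // A real run would check the event against the stored consents here.
            checked++
        }
        elapsed := time.Since(start)
        fmt.Printf("checked %d events in %v (%.0f ev./s)\n",
            checked, elapsed, float64(checked)/elapsed.Seconds())
    }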

Results

This section provides continuous updates on the results of evaluating the SPECIAL platform with STC-bench.

This project receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 731601
