SPECIAL Transparency and Consent Benchmark (STC-bench)
Abstract
This document specifies STC-bench, a benchmark for GDPR-based transparency and consent that we developed in the context of the SPECIAL project.
Outline
The development of the benchmark is driven by so-called "choke points": important technical challenges that a query workload must exercise, forcing systems onto a path of technological innovation. This methodology relies on the identification of such a workload by technical experts in the architecture of the system under test.
Accordingly, we analysed the SPECIAL platform together with the technical experts involved in the SPECIAL policy vocabulary and in the transparency and compliance components. Following this study, we identified the transparency and compliance choke points described below. Further information can be found in Deliverable 3.3 - Scalability and Robustness testing report V1.
Transparency choke points
- CP1 - Concurrent access. The benchmark should test the ability of the system to efficiently handle concurrent transparency requests as the number of users grows (see the load-generation sketch after this list).
- CP2 - Increasing data volume. The system should provide mechanisms to efficiently serve the transparency needs of the users, even when the number of events in the system (i.e. consents, data processing and sharing events) grows.
- CP3 - Ingestion time in a streaming scenario. The benchmark should test that the transparency needs are efficiently served in a streaming scenario, i.e. the user should be able to access the information of an event (and the result of its compliance check) shortly after the event arrives in the system.
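To make CP1 concrete, below is a minimal load-generation sketch in Go that fires one transparency request per simulated user and reports the mean latency. The endpoint URL and query path are assumptions for illustration, not the actual SPECIAL transparency API.

```go
// Minimal CP1 load-generation sketch (hypothetical endpoint and path;
// the real SPECIAL transparency API may differ).
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

func main() {
	const users = 100 // scale up (1K, 10K, ...) to probe CP1
	base := "http://localhost:8080/transparency/events?user=%d" // assumed endpoint

	var wg sync.WaitGroup
	latencies := make([]time.Duration, users)

	for u := 0; u < users; u++ {
		wg.Add(1)
		go func(u int) {
			defer wg.Done()
			start := time.Now()
			resp, err := http.Get(fmt.Sprintf(base, u)) // one transparency request per simulated user
			if err == nil {
				resp.Body.Close()
			}
			latencies[u] = time.Since(start)
		}(u)
	}
	wg.Wait()

	var total time.Duration
	for _, l := range latencies {
		total += l
	}
	fmt.Printf("mean latency over %d concurrent requests: %v\n", users, total/users)
}
```

Scaling the users constant through the values used in the transparency tasks (100 up to 1M) turns this sketch into a basic CP1 probe.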
Compliance choke points
- CP4 - Different complexities of policies. In general, policies can be arbitrarily complex, which affects the overall performance of any compliance checking process. Thus, the benchmark must consider policies of different complexities, reflecting a realistic scenario (see the compliance-check sketch after this list).
- CP5 - Increasing number of users. The benchmark should test the ability of the system to efficiently scale and perform as an increasing number of users, and hence of data processing and sharing events, is managed.
- CP6 - Expected pass/fail ratio. In general, the benchmark must consider a realistic scenario where policies are updated, some consents are revoked, and others are updated. The benchmark should provide the means to validate whether the performance of the system depends on the ratio of passed to failed compliance checks in the workload.
- CP7 - Data generation rates. The system should cope with consents and data processing and sharing events generated at increasing rates, addressing the "velocity" requirement of most big data scenarios.
- CP8 - Performant streaming processing. The benchmark should be able to test the system in a streaming scenario, where the compliance checking should fulfil the aforementioned requirements of performance and responsiveness (latency).
- CP9 - Performant batch processing. In addition to streaming, the system must support performant compliance checking in batch mode.
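To illustrate CP4 and CP6, the following is a greatly simplified compliance-check sketch in Go. A consent is modelled as a UNION of basic policies, and an event complies if at least one basic policy covers all of its attributes; the flat string attributes and equality test are stand-ins for the hierarchical SPECIAL policy vocabulary and its subsumption checks.

```go
// Simplified compliance-check sketch for CP4/CP6. A consent is the UNION of
// basic policies; an event complies if at least one basic policy covers all
// of its attributes. Plain string equality stands in for vocabulary
// subsumption in the actual SPECIAL platform.
package main

import "fmt"

// BasicPolicy captures the attributes of a usage policy.
type BasicPolicy struct {
	Data, Processing, Purpose, Recipient, Storage string
}

// Consent is the UNION of basic policies given by a data subject.
type Consent []BasicPolicy

// Event is a data processing/sharing event logged by the platform.
type Event struct {
	Data, Processing, Purpose, Recipient, Storage string
}

// covers is a placeholder for vocabulary subsumption; here plain equality.
func covers(policy, event string) bool { return policy == event }

// Complies checks the event against each basic policy in the union.
func (c Consent) Complies(e Event) bool {
	for _, p := range c {
		if covers(p.Data, e.Data) && covers(p.Processing, e.Processing) &&
			covers(p.Purpose, e.Purpose) && covers(p.Recipient, e.Recipient) &&
			covers(p.Storage, e.Storage) {
			return true // one matching basic policy suffices
		}
	}
	return false
}

func main() {
	consent := Consent{{"location", "analysis", "marketing", "ours", "eu"}}
	event := Event{"location", "analysis", "marketing", "ours", "eu"}
	fmt.Println(consent.Complies(event)) // true
}
```

The cost of Complies grows with the size of the union, which is exactly what the varying policy complexities of CP4 exercise; flipping individual event attributes controls the pass/fail ratio of CP6.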
Data Generation
The STC-bench data generator is designed to test the compliance and transparency performance of the SPECIAL platform; hence it produces synthetic data for two related concepts: controllers' policies, and the data sharing and processing events that are potentially compliant with the user consent.
The STC-bench data generator is available at the SPECIAL GitHub repository.
The following parameters can be set (see the README of the STC-bench generator for more information; a sketch of the generation loop follows the list):
- Generation rate: The rate at which the generator outputs events. This parameter accepts Go (golang) duration syntax, e.g. 1s or 10ms.
- Number of events: The total number of events that will be generated. When this parameter is <= 0, the generator produces an infinite stream.
- Format: The serialization format used to write the events (json or ttl).
- Type: The type of event to be generated: log, which stands for data sharing and processing events, or consent, which generates new user consents.
- Number of policies: The maximum number of policies to be used in a single consent.
- Number of users: The number of UserID attribute values to generate.
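As a sketch of how such a rate-limited generation loop can be realised in Go: time.ParseDuration parses the generation rate, a ticker paces the output, and a non-positive event count yields an infinite stream. The logEvent schema below is a hypothetical stand-in; the real STC-bench generator defines its own event format.

```go
// Minimal sketch of a rate-limited event generation loop, assuming a
// hypothetical event schema for illustration.
package main

import (
	"encoding/json"
	"fmt"
	"math/rand"
	"os"
	"time"
)

type logEvent struct { // hypothetical schema, for illustration only
	UserID    int       `json:"userID"`
	Timestamp time.Time `json:"timestamp"`
}

func main() {
	rate, err := time.ParseDuration("10ms") // generation rate, Go duration syntax
	if err != nil {
		panic(err)
	}
	const numEvents = 5   // <= 0 would mean an infinite stream
	const numUsers = 1000 // pool of UserID values

	enc := json.NewEncoder(os.Stdout)
	ticker := time.NewTicker(rate)
	defer ticker.Stop()

	for i := 0; numEvents <= 0 || i < numEvents; i++ {
		<-ticker.C // emit one event per tick
		enc.Encode(logEvent{UserID: rand.Intn(numUsers), Timestamp: time.Now()})
	}
	fmt.Fprintln(os.Stderr, "done")
}
```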
Benchmark Tasks
In the following we summarize the set of concrete benchmark tasks for the SPECIAL compliance and transparency components. Further information can be found in Deliverable 3.3 - Scalability and Robustness testing report V1.
For transparency, we consider a minimal subset of queries, described in Table 1, with which the system resolves transparency queries for both data subjects and data controllers; a sketch of the query semantics follows the table. The transparency tasks are listed in Table 2.
Table 1: Transparency queries.

| ID  | User            | Query                                                 |
|-----|-----------------|-------------------------------------------------------|
| Q1  | Data subject    | All events of the user                                |
| Q2  | Data subject    | Percentage of the user's events that passed           |
| Q3  | Data subject    | Percentage of the user's events that failed           |
| Q4  | Data subject    | All events of the user that passed                    |
| Q5  | Data subject    | All events of the user that failed                    |
| Q6  | Data subject    | Last 100 events of the user                           |
| Q7  | Data subject    | All events of the user from a particular application  |
| Q8  | Data controller | All events                                            |
| Q9  | Data controller | Percentage of events that passed                      |
| Q10 | Data controller | Percentage of events that failed                      |
| Q11 | Data controller | All events that passed                                |
| Q12 | Data controller | All events that failed                                |
| Q13 | Data controller | Last 100 events                                       |
| Q14 | Data controller | All events from a particular application              |
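The following minimal in-memory Go sketch pins down the intended semantics of Q1, Q2 and Q6 (the Event record is a hypothetical stand-in; in the platform these queries run against the actual event store). The controller queries Q8, Q9 and Q13 are the same computations without the user filter.

```go
// In-memory illustration of the transparency query semantics in Table 1,
// using a hypothetical event record.
package main

import "fmt"

type Event struct {
	UserID int
	App    string
	Passed bool // result of the compliance check
}

// q1: all events of a given user.
func q1(log []Event, user int) []Event {
	var out []Event
	for _, e := range log {
		if e.UserID == user {
			out = append(out, e)
		}
	}
	return out
}

// q2: percentage of the user's events that passed the compliance check.
func q2(log []Event, user int) float64 {
	events := q1(log, user)
	if len(events) == 0 {
		return 0
	}
	passed := 0
	for _, e := range events {
		if e.Passed {
			passed++
		}
	}
	return 100 * float64(passed) / float64(len(events))
}

// q6: last n (e.g. 100) events of the user, assuming log is in arrival order.
func q6(log []Event, user int, n int) []Event {
	events := q1(log, user)
	if len(events) > n {
		events = events[len(events)-n:]
	}
	return events
}

func main() {
	log := []Event{{1, "appA", true}, {1, "appB", false}, {2, "appA", true}}
	fmt.Println(len(q1(log, 1)), q2(log, 1), len(q6(log, 1, 100))) // 2 50 2
}
```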
Table 2: Transparency benchmark tasks.

| Task | #Users | Event rate | Policies      | #Events | Pass ratio | Choke point |
|------|--------|------------|---------------|---------|------------|-------------|
| T-T1 | 100    | none       | UNION of 5 p. | 500M    | Random     | CP1         |
| T-T1 | 1K     | none       | UNION of 5 p. | 500M    | Random     | CP1         |
| T-T1 | 10K    | none       | UNION of 5 p. | 500M    | Random     | CP1         |
| T-T1 | 100K   | none       | UNION of 5 p. | 500M    | Random     | CP1         |
| T-T1 | 1M     | none       | UNION of 5 p. | 500M    | Random     | CP1         |
| T-T2 | 1000   | none       | UNION of 5 p. | 1M      | Random     | CP2         |
| T-T2 | 1000   | none       | UNION of 5 p. | 50M     | Random     | CP2         |
| T-T2 | 1000   | none       | UNION of 5 p. | 100M    | Random     | CP2         |
| T-T2 | 1000   | none       | UNION of 5 p. | 1B      | Random     | CP2         |
| T-T2 | 1000   | none       | UNION of 5 p. | 10B     | Random     | CP2         |
| T-T3 | 1000   | 1 ev./60s  | UNION of 5 p. | 500M    | Random     | CP3         |
| T-T3 | 1000   | 1 ev./30s  | UNION of 5 p. | 500M    | Random     | CP3         |
| T-T3 | 1000   | 1 ev./10s  | UNION of 5 p. | 500M    | Random     | CP3         |
| T-T3 | 1000   | 1 ev./s    | UNION of 5 p. | 500M    | Random     | CP3         |
| T-T3 | 1000   | 10 ev./s   | UNION of 5 p. | 500M    | Random     | CP3         |
Table 3 shows the tasks to be performed by the SPECIAL compliance component in order to cover all of the choke points identified above. Each task fixes the parameters involved, such as the scenario (streaming or batch processing), the number of users, etc. These parameters follow the choke points, and their values are estimated based on consultation with the SPECIAL pilot partners. Note that all streaming tests set a test time of 30 minutes, which, together with the number of users and the event generation rate, determines the number of events generated: for instance, in subtask C-T1-1, 1000 users each producing one event every 10 seconds yield 1000 × 180 = 180,000 events over the 30-minute run. The batch tasks (C-T5) fix the number of events directly instead.
Table 3: Compliance benchmark tasks.

| Task | Subtask | Scenario  | #Users | Event rate | Policies       | Test time / #Events | Pass ratio | Choke points |
|------|---------|-----------|--------|------------|----------------|---------------------|------------|--------------|
| C-T1 | C-T1-1  | Streaming | 1000   | 1 ev./10s  | 1 policy       | 30 minutes          | Random     | CP4, CP8     |
| C-T1 | C-T1-2  | Streaming | 1000   | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | Random     | CP4, CP8     |
| C-T1 | C-T1-3  | Streaming | 1000   | 1 ev./10s  | UNION of 10 p. | 30 minutes          | Random     | CP4, CP8     |
| C-T1 | C-T1-4  | Streaming | 1000   | 1 ev./10s  | UNION of 20 p. | 30 minutes          | Random     | CP4, CP8     |
| C-T1 | C-T1-5  | Streaming | 1000   | 1 ev./10s  | UNION of 30 p. | 30 minutes          | Random     | CP4, CP8     |
| C-T2 | C-T2-1  | Streaming | 100    | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | Random     | CP5, CP8     |
| C-T2 | C-T2-2  | Streaming | 1K     | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | Random     | CP5, CP8     |
| C-T2 | C-T2-3  | Streaming | 10K    | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | Random     | CP5, CP8     |
| C-T2 | C-T2-4  | Streaming | 100K   | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | Random     | CP5, CP8     |
| C-T2 | C-T2-5  | Streaming | 1M     | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | Random     | CP5, CP8     |
| C-T3 | C-T3-1  | Streaming | 1000   | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | 0%         | CP6, CP8     |
| C-T3 | C-T3-2  | Streaming | 1000   | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | 25%        | CP6, CP8     |
| C-T3 | C-T3-3  | Streaming | 1000   | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | 50%        | CP6, CP8     |
| C-T3 | C-T3-4  | Streaming | 1000   | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | 75%        | CP6, CP8     |
| C-T3 | C-T3-5  | Streaming | 1000   | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | 100%       | CP6, CP8     |
| C-T4 | C-T4-1  | Streaming | 1000   | 1 ev./60s  | UNION of 5 p.  | 30 minutes          | Random     | CP7, CP8     |
| C-T4 | C-T4-2  | Streaming | 1000   | 1 ev./30s  | UNION of 5 p.  | 30 minutes          | Random     | CP7, CP8     |
| C-T4 | C-T4-3  | Streaming | 1000   | 1 ev./10s  | UNION of 5 p.  | 30 minutes          | Random     | CP7, CP8     |
| C-T4 | C-T4-4  | Streaming | 1000   | 1 ev./s    | UNION of 5 p.  | 30 minutes          | Random     | CP7, CP8     |
| C-T4 | C-T4-5  | Streaming | 1000   | 10 ev./s   | UNION of 5 p.  | 30 minutes          | Random     | CP7, CP8     |
| C-T5 | C-T5-1  | Batch     | 100    | -          | UNION of 5 p.  | 100K events         | Random     | CP9          |
| C-T5 | C-T5-2  | Batch     | 1K     | -          | UNION of 5 p.  | 1M events           | Random     | CP9          |
| C-T5 | C-T5-3  | Batch     | 10K    | -          | UNION of 5 p.  | 10M events          | Random     | CP9          |
| C-T5 | C-T5-4  | Batch     | 100K   | -          | UNION of 5 p.  | 100M events         | Random     | CP9          |
| C-T5 | C-T5-5  | Batch     | 1M     | -          | UNION of 5 p.  | 1B events           | Random     | CP9          |
Results
This section provides continuously updated results of the evaluation of STC-bench on the SPECIAL platform.
- Update on 30 June 2018: The results of the evaluation are published in Deliverable 3.3 - Scalability and Robustness testing report V1.