Date(s) - 24/03/2017
12:00 pm - 1:00 pm
Syntactic data anonymization strives to (i) ensure that an adversary cannot identify an individual’s record from published attributes with high probability, and (ii) provide high data utility. These mutually conflicting goals can be expressed as an optimization problem with privacy as the constraint and utility as the objective function.
Conventional research using the k-anonymity model has resorted to publishing data in homogeneous generalized groups. A recently proposed alternative does not create such cliques; instead, it recasts data values in a heterogeneous manner, aiming for higher utility.
Nevertheless, such works never defined the problem in the most general terms; thus, the utility gains they achieve are limited. In this paper, we propose a methodology that achieves the full potential of heterogeneity and gains higher utility while providing the same privacy guarantee. We formulate the problem of maximal-utility k-anonymization by freeform generalization as a network flow problem. We develop an optimal solution therefor using Mixed Integer Programming. Given the non-scalability of this solution, we develop an O(k n^2) Greedy algorithm that has no time-complexity disadvantage vis-á-vis previous approaches, an O(k n^2 log n) enhanced version thereof, and an O(k n^3) adaptation of the Hungarian algorithm; these algorithms build a set of k perfect matchings from original to anonymized data, a novel approach to the problem. Moreover, our techniques can resist adversaries who may know the employed algorithms. Our experiments with real-world data verify that our schemes achieve near-optimal utility (with gains of up to 41%), while they can exploit parallelism and data partitioning, gaining an efficiency advantage over simpler methods.