MTM2009-08747

Very large scale optimization for data privacy.

Funding organization: Spanish Ministry of Science and Inovation. January 2010 - December 2012.

Members and partners

Members
  • Jordi Castro
  • José Antonio González
  • Jordi Cuesta
  • Daniel Baena
  • Antonio Frangioni
  • Claudio Gentile
Universities
  • Universitat Politècnica de Catalunya (UPC)
  • Universitat Rovira i Virgili
  • Universita di Pisa
  • Consiglio Nazionale delle Ricerche, Roma, Italia
National Agencies
  • Statistics Germany
  • Statistics Netherlands
  • Statistics Italy
  • Statistics Catalonia
Top

Introduction and Objectives

National Statistical Agencies are forced by law to control the disclosure of individual confidential information from published data. Prior to dissemination, data must be processed by some protection method. This project is about tabular data protection methods, i.e., methods that deal with the trade-off between utility and disclosure risk of aggregated data. One of these methods, minimum distance Controlled Tabular Adjustment (CTA), suggested by the research group, has shown to be more efficient than alternative tools. This technique formulates a linear (for L1 distance) or quadratic (for L2) optimization problem. They become mixed integer linear or quadratic problems if binary decisions about sensitive cells are included in the model.

Previous results showed that polynomial-time interior-point methods are very effective at solving the resulting linear or quadratic optimization problems. For some structured tabular data, specialized methods developed by the group outperformed state-of-the-art commercial solvers by a factor of 500. However, for general unstructured and massive data (resulting in optimization problems with several million variables and a few million constraints), general interior-point solvers based on Cholesky factorizations may show a poor performance. The above only applies to a single table. If one wants to jointly protect the full set of tables from, for example, a particular census, (which is not done, but it would be the safest approach), the resulting problem is within the limits (if not beyond) of the capabilities of current optimization technology. One of the project´s goals is to fill this gap by developing and testing interior-point algorithms based on iterative solvers for these massive optimization problems.

In addition, for a recently finished project we participated in, the mixed integer version of CTA was used by Eurostat (Statistical Agency of the European Commission) within a wider scheme for the protection of structural bussiness statistics of European companies. Though the dimension of instances was not very large, the inclusion of binary variables in the model exhausted the branch-and-cut of current state-of-the-art optimization solvers. The project will also consider the solution of these discrete optimization problems by way of Benders reformulation, Lagrangian relaxation, and specific cuts generation (e.g., perspective cuts for quadratic objectives).

An alternative to mixed integer CTA could be the interval protection method. Though this approach formulates a continuous optimization problem, the number of variables is  of order square of cells in the table, and the number of constraints is of order number of cells times number of table relations. For a moderate table it may result in optimization problems of about 1e+8 variables and 1e+7 constraints. The constraints matrix exhibits a dual block-angular structure. We will approach it by means of interior-point methods for structured problems.

The main objectives of the project are thus: 1) Development of interior-point algorithms based on iterative solvers for very large general unstructured problems. 2) Development of interior-point algorithms for massive dual block-angular problems. 3) Development and application of improving strategies for the above approaches: preconditioners, regularizations. 4) Solution of the mixed integer CTA problem by procedures like Benders reformulation, Lagrangian relaxation, perspective cuts, coordinate descent.

Top

Results

Publications

  • J. Castro, A. Frangioni, C. Gentile, Perspective Reformulations of the CTA Problem With $L_2$ Distances. Submitted to Operations Research, 2012, first revision in progress.
  • J. Castro, On assessing the disclosure risk of controlled adjustment methods for statistical tabular data, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 20 (2012) 921-941 .
  • J. Castro, J. Cuesta, Solving $L_1$-CTA in 3D tables by an interior-point method for primal block-angular problems, TOP , doi:10.1007/s11750-011-0247-z.
  • J. Castro, J. Cuesta, Improving an interior-point algorithm for multicommodity flows by quadratic regularizations, Networks, 59 (2012) 117-131.
  • J. Castro, Recent advances in optimization techniques for statistical tabular data protection, European Journal of Operational Research, 216 (2012) 257-269.
  • D. Baena, J. Castro, Using the analytic center in the feasibility pump, Operations Research Letters, 39 (2011) 310-317.
  • J. Castro, J. Cuesta, Quadratic regularizations in an interior-point method for primal block-angular problems, Mathematical Programming, 130 (2011) 415-445.
  • J. A. González, J. Castro, A heuristic block coordinate descent approach for controlled tabular adjustment, Computers & Operations Research, 38 (2011) 1826-1835.
  • J. Castro, J. Cuesta, Existence, uniqueness and convergence of the regularized primal-dual central path, Operations Research Letters, 38 (2010) 366-371.
  • J. Castro, Statistical disclosure control in tabular data, Privacy and Anonymity in Information Management Systems: New Techniques for New Practical Problem, Springer (2010), 113-131. ISBN 978-1849962377.

Top


Conferences and Seminar Presentations

  • J. Castro, Improving iterative solvers in IPMs: regularizations and hybrid preconditioners, Edinburgh Research Group in Optimization Seminars, School of Mathematics, University of Edinburgh, 10 October 2012. Invited presentation.
  • J. Castro, A computational evaluation of optimization solvers for CTA, Privacy in Statistical Databases 2012, Palermo (Italia), September 2012.
  • J. Castro, Comparing $L_1$ and $L_2$ distances for CTA, Privacy in Statistical Databases 2012, Palermo (Italia), September 2012.
  • J.A. González, J. Castro, Improving the solution of CTA through valid inequalities, 25th European Conference on Operational Research-EURO 2012, Vilnius University, Vilnius (Lithuania), July 2012. Invited presentation.
  • D. Baena, J. Castro, A fix and relax heuristic for controlled tabular adjustment, 25th European Conference on Operational Research-EURO 2012, Vilnius University, Vilnius (Lithuania), July 2012. Invited presentation.
  • J. Castro, A. Frangioni, C. Gentile, Solving $L_2$-CTA by perspective reformulations, 25th European Conference on Operational Research-EURO 2012, Vilnius University, Vilnius (Lithuania), July 2012. Invited presentation.
  • J. Cuesta, J. Castro, Solving $L_1$-CTA in 3D tables by an interior-point method for block-angular problems, 25th European Conference on Operational Research-EURO 2012, Vilnius University, Vilnius (Lithuania), July 2012. Invited presentation.
  • J. Castro, Organization of Stream on Data Confidentiality, 25th European Conference on Operational Research-EURO 2012, Vilnius University, Vilnius (Lithuania), July 2012.
  • J. Castro. Workshop on Techniques for tabular data protection, IDESCAT Workshop , IDESCAT, the National Statistical Institute of Catalonia, Barcelona, May 23 2012. Invited presentation.
  • D. Baena, J. Castro, The analytic center feasibility pump, XXXIII Congreso Nacional de Estadística e Investigación Operativa, Madrid, Spain, April 2012.
  • J. Castro (in collaboration with A. Frangioni, C. Gentile, and J.A. González), MILP and MIQP optimization problems from the statistical disclosure control field. Optimization, Theory, Algorithms and Applications in Economics. Workshop in applications of Optimization in Engineering, Centre de Recerca Matemàtica (CRM), Bellaterra (Catalonia), October 2011. Invited presentation.
  • J. Castro, J.A. González, Present and future research on controlled tabular adjustment. Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality, Universitat Rovira i Virgili, Tarragona (Catalonia), October 2011. Invited presentation.
  • C. Gentile, J. Castro, A. Frangioni, Minimum Euclidean distance controlled adjustment problems by perspective reformulations, 42th Annual Conference of the Italian Operational Research Society - AIRO 2011 , Brescia (Italia), September 2011.
  • J. Castro, Panel discussion Challenges and Applications of MINLP, Exploratory Workshop on Mixed Integer Non-Linear Programming, Mathematical Research Institute, Universidad de Sevilla (Spain), December 2010. Invited participant to panel discussion.
  • J. Castro, A class of interior-point methods for structured problems: theory and applications, Conference on Numerical Optimization and Applications in Engineering, Mathematical Research Center, Universitat Autonoma de Barcelona, Bellaterra (Catalonia), October 2010. Invited presentation.
  • J. Castro, J.A. González, A tool for analyzing and fixing infeasible RCTA instances, Privacy in Statistical Databases 2010, Corfu (Greece), September 2010.
  • J. Castro, J. Cuesta, Existence, uniqueness and convergence of the regularized primal-dual central path, XXXII Congreso Nacional de Estadística e Investigación Operativa, A Coruña, Spain, September 2010.
  • J. A. González, J. Castro, Solving tough instances of the controlled tabular adjustment problem, 24th European Conference on Operational Research-EURO 2010, Session on Large-scale Mixed Optimization Problems, Universidade de Lisboa, Lisboa (Portugal), July 2010. Invited presentation.
  • J. Castro, J. Cuesta, Improving a class of PCG-based interior-point methods by quadratic regularizations, Joint SIMAI/SEMA Conference on Applied and Industrial Mathematics, Minisimposium of Numerical Solution of Large Linear Systems in Numerical Optimization, University of Cagliari, Cagliari (Italia), June 2010. Invited presentation.
  • J. Castro, Statistical disclosure control in tabular data, Research seminar, Department of Mathematics, Universitat de Lleida, Lleida (Catalonia), 1 June 2010. Invited presentation.

Technology transfer contracts and related projects

  • Contract C08517 with IDESCAT (National Statistical Institute of Catalonia) for safe configuration of tabular data  using CTA. 2011.
  • Contract C08516 with Statistics Netherlands (to work on a project funded by Eurostat) for improving an implementation of a CTA code for structural business statistics (EURO SBS3). 2011.
  • 2011-2014."Data without Boundaries (DwB)" Project FP7 INFRA-2010-262608, granted by the European Union.
  • Contract C08182 with IDESCAT (National Statistical Institute of Catalonia) for safe configuration of tabular data. October 2010-January 2011.
  • Contract C07809 with Statistics Netherlands (in collaboration with Statistics Germany and to work on a project funded by Eurostat) for extension of a CTA code for statistical confidentiality of European aggregates for animal production statistics. September 1 2009- February 28 2010.
Top


PhD thesis

  • D. Baena, Optimal and heuristic approaches for controlled tabular adjustment, advisor J. Castro, in progress.

MSc thesis

  • M. Matute, Methods for finding initial points in controlled tabular adjustment, advisor J.A. González, 4 November 2010.
  • X. Jiménez, An optimization and modeling environment for large scale block-angular problems, directed by J. Castro, 24 January 2012

Top