MTM200908747
Funding organization: Spanish Ministry of Science and Inovation. January 2010  December 2012.
Members and partners
Members

Universities

National Agencies

Introduction and Objectives
National Statistical Agencies are forced by law to control the disclosure of individual confidential information from published data. Prior to dissemination, data must be processed by some protection method. This project is about tabular data protection methods, i.e., methods that deal with the tradeoff between utility and disclosure risk of aggregated data. One of these methods, minimum distance Controlled Tabular Adjustment (CTA), suggested by the research group, has shown to be more efficient than alternative tools. This technique formulates a linear (for L1 distance) or quadratic (for L2) optimization problem. They become mixed integer linear or quadratic problems if binary decisions about sensitive cells are included in the model.
Previous results showed that polynomialtime interiorpoint methods are very effective at solving the resulting linear or quadratic optimization problems. For some structured tabular data, specialized methods developed by the group outperformed stateoftheart commercial solvers by a factor of 500. However, for general unstructured and massive data (resulting in optimization problems with several million variables and a few million constraints), general interiorpoint solvers based on Cholesky factorizations may show a poor performance. The above only applies to a single table. If one wants to jointly protect the full set of tables from, for example, a particular census, (which is not done, but it would be the safest approach), the resulting problem is within the limits (if not beyond) of the capabilities of current optimization technology. One of the project´s goals is to fill this gap by developing and testing interiorpoint algorithms based on iterative solvers for these massive optimization problems.
In addition, for a recently finished project we participated in, the mixed integer version of CTA was used by Eurostat (Statistical Agency of the European Commission) within a wider scheme for the protection of structural bussiness statistics of European companies. Though the dimension of instances was not very large, the inclusion of binary variables in the model exhausted the branchandcut of current stateoftheart optimization solvers. The project will also consider the solution of these discrete optimization problems by way of Benders reformulation, Lagrangian relaxation, and specific cuts generation (e.g., perspective cuts for quadratic objectives).
An alternative to mixed integer CTA could be the interval protection method. Though this approach formulates a continuous optimization problem, the number of variables is of order square of cells in the table, and the number of constraints is of order number of cells times number of table relations. For a moderate table it may result in optimization problems of about 1e+8 variables and 1e+7 constraints. The constraints matrix exhibits a dual blockangular structure. We will approach it by means of interiorpoint methods for structured problems.
Previous results showed that polynomialtime interiorpoint methods are very effective at solving the resulting linear or quadratic optimization problems. For some structured tabular data, specialized methods developed by the group outperformed stateoftheart commercial solvers by a factor of 500. However, for general unstructured and massive data (resulting in optimization problems with several million variables and a few million constraints), general interiorpoint solvers based on Cholesky factorizations may show a poor performance. The above only applies to a single table. If one wants to jointly protect the full set of tables from, for example, a particular census, (which is not done, but it would be the safest approach), the resulting problem is within the limits (if not beyond) of the capabilities of current optimization technology. One of the project´s goals is to fill this gap by developing and testing interiorpoint algorithms based on iterative solvers for these massive optimization problems.
In addition, for a recently finished project we participated in, the mixed integer version of CTA was used by Eurostat (Statistical Agency of the European Commission) within a wider scheme for the protection of structural bussiness statistics of European companies. Though the dimension of instances was not very large, the inclusion of binary variables in the model exhausted the branchandcut of current stateoftheart optimization solvers. The project will also consider the solution of these discrete optimization problems by way of Benders reformulation, Lagrangian relaxation, and specific cuts generation (e.g., perspective cuts for quadratic objectives).
An alternative to mixed integer CTA could be the interval protection method. Though this approach formulates a continuous optimization problem, the number of variables is of order square of cells in the table, and the number of constraints is of order number of cells times number of table relations. For a moderate table it may result in optimization problems of about 1e+8 variables and 1e+7 constraints. The constraints matrix exhibits a dual blockangular structure. We will approach it by means of interiorpoint methods for structured problems.
The main objectives of the project are thus: 1) Development of interiorpoint algorithms based on iterative solvers for very large general unstructured problems. 2) Development of interiorpoint algorithms for massive dual blockangular problems. 3) Development and application of improving strategies for the above approaches: preconditioners, regularizations. 4) Solution of the mixed integer CTA problem by procedures like Benders reformulation, Lagrangian relaxation, perspective cuts, coordinate descent.
Results
 J. Castro, A. Frangioni, C. Gentile, Perspective Reformulations of the CTA Problem With $L_2$ Distances. Submitted to Operations Research, 2012, first revision in progress.
 S. Bocanegra, J. Castro, A.R.L. Oliveira, Improving an interiorpoint approach for large blockangular problems by hybrid preconditioners , Technical Report DR 2012/06, Dept. of Statistics and Operations Research, Universitat Politècnica de Catalunya, 2012. Submitted to European Journal of Operational Research, first revision in progress.
 J. Castro, On assessing the disclosure risk of controlled adjustment methods for statistical tabular data, International Journal of Uncertainty, Fuzziness and KnowledgeBased Systems, 20 (2012) 921941 .
 J. Castro, A computational evaluation of optimization solvers for CTA, Lecture Notes in Computer Science, 7556 (2012) 1121, eds. J. DomingoFerrer and I. Tinnirello, © Springer.
 J. Castro, Comparing $L_1$ and $L_2$ distances for CTA, Lecture Notes in Computer Science, 7556 (2012) 3546, eds. J. DomingoFerrer and I. Tinnirello, © Springer.
 J. Castro, J. Cuesta, Solving $L_1$CTA in 3D tables by an interiorpoint method for primal blockangular problems, TOP , doi:10.1007/s117500110247z.
 J. Castro, J. Cuesta, Improving an interiorpoint algorithm for multicommodity flows by quadratic regularizations, Networks, 59 (2012) 117131.
 J. Castro, Recent advances in optimization techniques for statistical tabular data protection, European Journal of Operational Research, 216 (2012) 257269.
 D. Baena, J. Castro, Using the analytic center in the feasibility pump, Operations Research Letters, 39 (2011) 310317.
 J. Castro, J. Cuesta, Quadratic regularizations in an interiorpoint method for primal blockangular problems, Mathematical Programming, 130 (2011) 415445.
 J. Castro, Extending controlled tabular adjustment for nonadditive tabular data with negative protection levels, Statistics and Operations Research Transactions  SORT, 35 (2011) 320.
 J. A. González, J. Castro, A heuristic block coordinate descent approach for controlled tabular adjustment, Computers & Operations Research, 38 (2011) 18261835.
 J. Castro, J. Cuesta, Existence, uniqueness and convergence of the regularized primaldual central path, Operations Research Letters, 38 (2010) 366371.
 J. Castro, J.A. González, A tool for analyzing and fixing infeasible RCTA instances, Lecture Notes in Computer Science, 6344 (2010) 1728, eds. J. DomingoFerrer and E. Magkos, © Springer.
 J. Castro, Statistical disclosure control in tabular data, Privacy and Anonymity in Information Management Systems: New Techniques for New Practical Problem, Springer (2010), 113131. ISBN 9781849962377.
Conferences and Seminar Presentations
 J. Castro, Improving iterative solvers in IPMs: regularizations and hybrid preconditioners, Edinburgh Research Group in Optimization Seminars, School of Mathematics, University of Edinburgh, 10 October 2012. Invited presentation.
 J. Castro, A computational evaluation of optimization solvers for CTA, Privacy in Statistical Databases 2012, Palermo (Italia), September 2012.
 J. Castro, Comparing $L_1$ and $L_2$ distances for CTA, Privacy in Statistical Databases 2012, Palermo (Italia), September 2012.
 J.A. González, J. Castro, Improving the solution of CTA through valid inequalities, 25th European Conference on Operational ResearchEURO 2012, Vilnius University, Vilnius (Lithuania), July 2012. Invited presentation.
 D. Baena, J. Castro, A fix and relax heuristic for controlled tabular adjustment, 25th European Conference on Operational ResearchEURO 2012, Vilnius University, Vilnius (Lithuania), July 2012. Invited presentation.
 J. Castro, A. Frangioni, C. Gentile, Solving $L_2$CTA by perspective reformulations, 25th European Conference on Operational ResearchEURO 2012, Vilnius University, Vilnius (Lithuania), July 2012. Invited presentation.
 J. Cuesta, J. Castro, Solving $L_1$CTA in 3D tables by an interiorpoint method for blockangular problems, 25th European Conference on Operational ResearchEURO 2012, Vilnius University, Vilnius (Lithuania), July 2012. Invited presentation.
 J. Castro, Organization of Stream on Data Confidentiality, 25th European Conference on Operational ResearchEURO 2012, Vilnius University, Vilnius (Lithuania), July 2012.
 J. Castro. Workshop on Techniques for tabular data protection, IDESCAT Workshop , IDESCAT, the National Statistical Institute of Catalonia, Barcelona, May 23 2012. Invited presentation.
 D. Baena, J. Castro, The analytic center feasibility pump, XXXIII Congreso Nacional de Estadística e Investigación Operativa, Madrid, Spain, April 2012.
 J. Castro (in collaboration with A. Frangioni, C. Gentile, and J.A. González), MILP and MIQP optimization problems from the statistical disclosure control field. Optimization, Theory, Algorithms and Applications in Economics. Workshop in applications of Optimization in Engineering, Centre de Recerca Matemàtica (CRM), Bellaterra (Catalonia), October 2011. Invited presentation.
 J. Castro, J.A. González, Present and future research on controlled tabular adjustment. Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality, Universitat Rovira i Virgili, Tarragona (Catalonia), October 2011. Invited presentation.
 C. Gentile, J. Castro, A. Frangioni, Minimum Euclidean distance controlled adjustment problems by perspective reformulations, 42th Annual Conference of the Italian Operational Research Society  AIRO 2011 , Brescia (Italia), September 2011.
 J. Castro, Panel discussion Challenges and Applications of MINLP, Exploratory Workshop on Mixed Integer NonLinear Programming, Mathematical Research Institute, Universidad de Sevilla (Spain), December 2010. Invited participant to panel discussion.
 J. Castro, A class of interiorpoint methods for structured problems: theory and applications, Conference on Numerical Optimization and Applications in Engineering, Mathematical Research Center, Universitat Autonoma de Barcelona, Bellaterra (Catalonia), October 2010. Invited presentation.
 J. Castro, J.A. González, A tool for analyzing and fixing infeasible RCTA instances, Privacy in Statistical Databases 2010, Corfu (Greece), September 2010.
 J. Castro, J. Cuesta, Existence, uniqueness and convergence of the regularized primaldual central path, XXXII Congreso Nacional de Estadística e Investigación Operativa, A Coruña, Spain, September 2010.
 J. A. González, J. Castro, Solving tough instances of the controlled tabular adjustment problem, 24th European Conference on Operational ResearchEURO 2010, Session on Largescale Mixed Optimization Problems, Universidade de Lisboa, Lisboa (Portugal), July 2010. Invited presentation.
 J. Castro, J. Cuesta, Improving a class of PCGbased interiorpoint methods by quadratic regularizations, Joint SIMAI/SEMA Conference on Applied and Industrial Mathematics, Minisimposium of Numerical Solution of Large Linear Systems in Numerical Optimization, University of Cagliari, Cagliari (Italia), June 2010. Invited presentation.
 J. Castro, Statistical disclosure control in tabular data, Research seminar, Department of Mathematics, Universitat de Lleida, Lleida (Catalonia), 1 June 2010. Invited presentation.
Technology transfer contracts and related projects
 Contract C08517 with IDESCAT (National Statistical Institute of Catalonia) for safe configuration of tabular data using CTA. 2011.
 Contract C08516 with Statistics Netherlands (to work on a project funded by Eurostat) for improving an implementation of a CTA code for structural business statistics (EURO SBS3). 2011.
 20112014."Data without Boundaries (DwB)" Project FP7 INFRA2010262608, granted by the European Union.
 Contract C08182 with IDESCAT (National Statistical Institute of Catalonia) for safe configuration of tabular data. October 2010January 2011.
 Contract C07809 with Statistics Netherlands (in collaboration with Statistics Germany and to work on a project funded by Eurostat) for extension of a CTA code for statistical confidentiality of European aggregates for animal production statistics. September 1 2009 February 28 2010.
PhD thesis
 D. Baena, Optimal and heuristic approaches for controlled tabular adjustment, advisor J. Castro, in progress.
MSc thesis
 M. Matute, Methods for finding initial points in controlled tabular adjustment, advisor J.A. González, 4 November 2010.
 X. Jiménez, An optimization and modeling environment for large scale blockangular problems, directed by J. Castro, 24 January 2012