NOML
Nonlinear Optimization and Machine Learning III

Invited Session
Time Slot: Thursday Morning
Room: 003
Chair: Marco Sciandrone

Model Extraction based on Counterfactual Explanations

Time: 11:30

Cecilia Salvatore (Department of Civil Engineering and Computer Science, University of Rome Tor Vergata, Rome), Piccialli Veronica

Automated decision-making classification systems based on Machine Learning algorithms are often used in many real-life scenarios such as healthcare, credit, or criminal justice. There is thus increasing interest in making Machine Learning systems trustworthy: interpretability, robustness, and fairness are often essential requirements for the deployment of these systems. In particular, according to the European Union's General Data Protection Regulation (GDPR), automated decision-making systems should guarantee the "right to explanation", meaning that those affected by a decision may require an explanation of it. Counterfactual Explanations are becoming a de facto standard for post-hoc explanations. Given an instance of a classification problem assigned to a class, its counterfactual explanation is a small perturbation of that instance that changes the classification outcome. The objective of this work is to exploit the information revealed by a small set of examples, together with their counterfactual explanations, to build a surrogate model of the classification system. The idea is to define an optimization problem whose output is a Forest of Optimal Trees as close as possible to the original classification model, given the information derived from the counterfactual points. This tool can be used either to attack the original model or to improve it, depending on the application context. Preliminary results show the viability of this approach.
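
For context, a standard way to formalize a counterfactual explanation (a common formulation from the literature; the exact definition used in this work may differ) is the following: given a classifier f and an instance x, a counterfactual x' solves

    \min_{x'} \, d(x, x') \quad \text{s.t.} \quad f(x') \neq f(x),

where d measures the size of the perturbation; the pair (x, x') thus reveals a point lying close to the decision boundary of f, which is the information the surrogate model can exploit.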

A Robust Initialization of Residual Blocks for Effective ResNet Training without Batch Normalization

Time: 11:50

Enrico Civitelli (Università degli Studi di Firenze), Sortino Alessio, Lapucci Matteo, Bagattini Francesco, Galvan Giulio

Batch Normalization is an essential component of most state-of-the-art neural network architectures. However, since it introduces many practical issues, much recent research has been devoted to designing normalization-free architectures. In this paper, we show that weight initialization is key to training ResNet-like normalization-free networks. In particular, we propose a slight modification of the operation that sums the block output to the skip-connection branch, so that the whole network is correctly initialized. We show that this modified architecture achieves competitive results on CIFAR-10 without additional regularization or algorithmic modifications.
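
As a rough illustration of what a modified summation can look like, the sketch below scales the residual branch by a scalar initialized to zero, so that each block acts as the identity at initialization and signal variance does not grow with depth. This specific scheme, and the PyTorch block structure, are assumptions made for illustration, not necessarily the initialization proposed in the talk.

    import torch
    import torch.nn as nn

    class ScaledResidualBlock(nn.Module):
        # Normalization-free residual block: output = skip + alpha * f(x).
        # alpha starts at zero, so the block is the identity at initialization.
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
            self.alpha = nn.Parameter(torch.zeros(1))  # residual-branch scale

        def forward(self, x):
            out = torch.relu(self.conv1(x))
            out = self.conv2(out)
            return x + self.alpha * out  # modified summation onto the skip branch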

Semi-supervised learning in multilayer hypergraphs

Time: 12:10

Sara Venturini (University of Padua), Rinaldi Francesco, Tudisco Francesco, Cristofari Andrea

A variety of complex systems have been successfully described as networks whose interacting pairs of nodes are connected by links. However, in real-world applications, we need to describe interactions in more detailed and varied ways. Many real systems display collective actions of groups of nodes, and simplicial complexes or hypergraphs are the natural candidates to describe these group interactions. Moreover, we can often measure different types of relationships between the same nodes, and multilayer networks provide the mathematical formalism to describe these multi-channel interactions. Evidence shows that each of these tools can improve modelling capacity with respect to standard graphs. However, relatively few studies have so far considered multilayer and higher-order structures in complex networks simultaneously. This is also due to the fact that such a sophisticated structure comes at the cost of additional complexity and harder tractability. In this work we take a first step in the study of semi-supervised learning over multilayer hypergraphs, aiming to deal with this additional complexity.
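
To make the setting concrete, a toy sketch of semi-supervised label propagation on a multilayer hypergraph is given below; the incidence-matrix representation, the per-layer hypergraph random walk, and the uniform averaging across layers are all illustrative assumptions and not the method presented in the talk.

    import numpy as np

    # Multilayer hypergraph: one incidence matrix (nodes x hyperedges) per layer,
    # H[i, e] = 1 if node i belongs to hyperedge e in that layer.
    layers = [
        np.array([[1, 0], [1, 1], [0, 1], [0, 1]], dtype=float),
        np.array([[1, 1], [0, 1], [1, 0], [0, 1]], dtype=float),
    ]

    # Semi-supervised setting: only nodes 0 and 3 are labelled (classes 0 and 1).
    n, k = 4, 2
    Y = np.zeros((n, k))
    Y[0, 0] = 1.0
    Y[3, 1] = 1.0
    labelled = np.array([True, False, False, True])

    # Diffuse labels with the hypergraph random walk of each layer,
    # average the layers uniformly, and clamp the labelled nodes.
    F = Y.copy()
    for _ in range(50):
        per_layer = []
        for H in layers:
            Dv = H.sum(axis=1)                  # node degrees
            De = H.sum(axis=0)                  # hyperedge sizes
            P = (H / Dv[:, None]) @ (H / De).T  # row-stochastic node-to-node walk
            per_layer.append(P @ F)
        F = np.mean(per_layer, axis=0)
        F[labelled] = Y[labelled]

    print(F.argmax(axis=1))  # predicted class for every node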

Robustness and Generalization in Training Deep Neural Networks

Time: 12:30

Leonardo Galli (RWTH Aachen), Rauhut Holger, Schmidt Mark

Deep Learning (DL) models are nowadays the key to achieving state-of-the-art performance in a wide range of real problems. Despite their extensive use, the understanding of DL is limited and the theoretical guarantees are few. Moreover, the performance of DL models is very sensitive to various algorithmic choices made during training, above all the learning rate. The robust selection of the learning rate is indeed a central topic in DL and it is still one of the main concerns of optimization researchers working in this area. In this project, we will study nonmonotone Stochastic Gradient Descent (SGD) algorithms to improve the robustness of the training method with respect to the learning rate. Although these methods are a very natural choice for highly nonlinear models, this is the first time they are applied on top of an SGD-like algorithm. A new theory has been developed to adapt the convergence rates of deterministic nonmonotone methods to the stochastic setting. Numerical results show that the intrinsic properties of nonmonotone methods are extremely well suited to large-scale, highly nonlinear stochastic problems.
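
For reference, the classical deterministic nonmonotone acceptance rule (of Grippo-Lampariello-Lucidi type) accepts a step size \alpha_k along a descent direction d_k when the new objective value is below the maximum over a memory window, rather than below the last value:

    f(x_k + \alpha_k d_k) \le \max_{0 \le j \le \min(k, M)} f(x_{k-j}) + \gamma \, \alpha_k \, \nabla f(x_k)^\top d_k,

with memory length M and constant \gamma \in (0, 1). How this condition is adapted to stochastic gradient and loss estimates is the subject of the talk and is not reproduced here.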