This is a computational study of bottlenecks on algebraic varieties. The bottlenecks of a smooth variety X ⊆ C^n are the lines in C^n which are normal to X at two distinct points. The main result is a numerical homotopy that can be used to approximate all isolated bottlenecks. This homotopy has the optimal number of paths under certain genericity assumptions. In the process we prove bounds on the number of bottlenecks in terms of the Euclidean distance degree. Applications include the optimization problem of computing the distance between two real varieties. Also, computing bottlenecks may be seen as part of the problem of computing the reach of a smooth real variety, and efficient methods to compute the reach are still to be developed. Relations to triangulation of real varieties and meshing algorithms used in computer graphics are discussed in the paper. The resulting algorithms have been implemented with Bertini [4] and Macaulay2 [17].
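In symbols, a standard way to state the bottleneck condition (this restates the definition above for smooth points x and y, not the paper's exact formulation or its genericity assumptions):

```latex
% A pair of distinct smooth points x, y on X spans a bottleneck line when
% the segment joining them is orthogonal to both tangent spaces:
x, y \in X, \quad x \neq y, \qquad
(y - x) \perp T_x X \quad\text{and}\quad (y - x) \perp T_y X.
```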
We introduce low complexity bounds on mutual information for efficient privacy-preserving feature selection with secure multi-party computation (MPC). Considering a discrete feature with N possible values and a discrete label with M possible values, our approach requires O(N) multiplications as opposed to O(NM) in a direct MPC implementation of mutual information. Our experimental results show that for regression tasks, we achieve a computational speed-up of over 1,000× compared to a straightforward MPC implementation of mutual information, while achieving similar accuracy for the downstream machine learning model.
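As a plaintext reference for the quantity the MPC protocol approximates, mutual information between a discrete feature and a discrete label follows directly from co-occurrence counts. The sketch below (hypothetical helper `mutual_information`, plain Python, no MPC) implements the standard formula, not the paper's O(N) protocol:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) in nats from paired samples of a discrete feature and label.
    Standard plug-in estimate: sum over observed pairs of
    p(x,y) * log( p(x,y) / (p(x) * p(y)) )."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))  # joint counts
    px = Counter(xs)            # marginal counts of the feature
    py = Counter(ys)            # marginal counts of the label
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        # p_joint / (p(x) * p(y)) == c * n / (px[x] * py[y])
        mi += p_joint * math.log(c * n / (px[x] * py[y]))
    return mi
```

For perfectly dependent variables this returns the shared entropy (e.g. log 2 for two balanced binary variables), and for independent variables it returns 0.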
Variational Autoencoders (VAEs) represent the given data in a low-dimensional latent space, which is generally assumed to be Euclidean. This assumption naturally leads to the common choice of a standard Gaussian prior over continuous latent variables. Recent work has, however, shown that this prior has a detrimental effect on model capacity, leading to subpar performance. We propose that the Euclidean assumption lies at the heart of this failure mode. To counter this, we assume a Riemannian structure over the latent space, which constitutes a more principled geometric view of the latent codes, and replace the standard Gaussian prior with a Riemannian Brownian motion prior. We propose an efficient inference scheme that does not rely on the unknown normalizing factor of this prior. Finally, we demonstrate that this prior significantly increases model capacity using only one additional scalar parameter.
Drowsiness impairs the driver's cognitive abilities, all of which are important for safe driving. Fatigue detection is therefore a critical technique for avoiding traffic accidents. Data sharing among vehicles can be used to optimize fatigue detection models and ensure driving safety. However, data privacy issues hinder the sharing process. To tackle these challenges, we propose a Federated Learning (FL) approach for monitoring fatigue-driving behavior. Even in an FL system, however, the drivers' private information might be leaked. In this paper, we propose to combine the concept of differential privacy (DP) with Federated Learning for the fatigue detection application, in which artificial noise is added to the parameters on the drivers' side before aggregation. This approach ensures both the privacy of drivers' data and the convergence of the federated learning algorithm. The privacy level in the system is chosen so as to achieve a balance between the noise scale and the model's accuracy. In addition, we evaluate our model's resistance against a model inversion attack, measuring the effectiveness of the attack by the Mean Squared Error (MSE) between the reconstructed data point and the training data. Compared to the non-DP case, the proposed approach incurs a 6% accuracy loss while decreasing the effectiveness of the attacks, increasing the MSE from 5.0 to 7.0, so a balance between accuracy and noise scale is achieved.
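The noise-before-aggregation step described above can be sketched as follows. This is a minimal Gaussian-mechanism illustration with hypothetical names (`clip_and_noise`, `aggregate`); the paper's actual noise calibration and privacy accounting are not reproduced:

```python
import random

def clip_and_noise(update, clip_norm=1.0, sigma=0.5):
    """Client side: clip the local parameter update to a bounded L2 norm,
    then add Gaussian noise before it leaves the driver's device."""
    norm = sum(u * u for u in update) ** 0.5
    scale = min(1.0, clip_norm / max(norm, 1e-12))
    clipped = [u * scale for u in update]
    return [c + random.gauss(0.0, sigma * clip_norm) for c in clipped]

def aggregate(updates):
    """Server side: plain FedAvg over the already-noised client updates."""
    k = len(updates)
    return [sum(col) / k for col in zip(*updates)]
```

With `sigma=0.0` the function reduces to pure norm clipping, which makes the clipping bound easy to check in isolation.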
We introduce numerical algebraic geometry methods for computing lower bounds on the reach, local feature size, and weak feature size of the real part of an equidimensional and smooth algebraic variety using the variety’s defining polynomials as input. For the weak feature size, we also show that nonquadratic complete intersections generically have finitely many geometric bottlenecks, and we describe how to compute the weak feature size directly rather than a lower bound in this case. In all other cases, we describe additional computations that can be used to determine feature size values rather than lower bounds.
Unidentified devices in a network can result in devastating consequences. It is, therefore, necessary to fingerprint and identify IoT devices connected to private or critical networks. With the proliferation of massive but heterogeneous IoT devices, it is getting challenging to detect vulnerable devices connected to networks. Current machine learning-based techniques for fingerprinting and identifying devices necessitate a significant amount of data gathered from IoT networks that must be transmitted to a central cloud. Nevertheless, private IoT data cannot be shared with the central cloud in numerous sensitive scenarios. Federated learning (FL) has been regarded as a promising paradigm for decentralized learning and has been applied in many different use cases. It enables machine learning models to be trained in a privacy-preserving way. In this article, we propose a privacy-preserving IoT device fingerprinting and identification mechanism using FL, which we call FL4IoT. FL4IoT is a two-phased system combining unsupervised-learning-based device fingerprinting and supervised-learning-based device identification. FL4IoT shows its practicality across different performance metrics in both federated and centralized setups. For instance, in the best cases, empirical results show that FL4IoT achieves ≈99% accuracy and F1-Score in identifying IoT devices using a federated setup without exposing any private data to a centralized cloud entity. In addition, FL4IoT can detect spoofed devices with over 99% accuracy.
The increase in computational power of edge devices has enabled the spread of distributed machine learning technologies such as federated learning, which builds collaborative models by performing the training locally on the edge devices; this improves both the efficiency and the privacy of model training, as the data remains on the devices. However, in some IoT networks the connectivity between devices and system components can be limited, which prevents the use of federated learning, as it requires a central node to orchestrate the training of the model. To sidestep this, peer-to-peer learning appears as a promising solution, as it does not require such an orchestrator. On the other hand, the security challenges in IoT deployments have fostered the use of machine learning for attack and anomaly detection. In these problems, under supervised learning approaches, the training datasets are typically imbalanced, i.e. the number of anomalies is very small compared to the number of benign data points, which requires the use of re-balancing techniques to improve the algorithms' performance. In this paper, we propose a novel peer-to-peer algorithm, P2PK-SMOTE, to train supervised anomaly detection machine learning models in non-IID scenarios, including mechanisms to locally re-balance the training datasets via synthetic generation of data points from the minority class. To improve the performance in non-IID scenarios, we also include a mechanism for sharing a small fraction of synthetic data from the minority class across devices, aiming to reduce the risk of data re-identification. Our experimental evaluation on real datasets for IoT anomaly detection across a different set of scenarios validates the benefits of our proposed approach.
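The local re-balancing via synthetic minority samples can be illustrated with a minimal SMOTE-style sketch (hypothetical function `smote`; the paper's exact neighbor count and cross-device sharing mechanism are not reproduced). Each synthetic point is an interpolation between a minority sample and its nearest minority neighbor:

```python
import random

def smote(minority, n_new, seed=0):
    """Minimal SMOTE-style oversampling: each synthetic point lies on the
    segment between a minority sample and its nearest minority neighbor."""
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)                       # random minority point
        nn = min((p for p in minority if p is not x),  # its nearest neighbor
                 key=lambda p: dist2(x, p))
        lam = rng.random()                             # interpolation factor in [0, 1]
        synthetic.append([xi + lam * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic
```

Because every synthetic point is a convex combination of two existing minority points, the generated data stays inside the convex hull of the minority class.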
Federated Learning (FL) has emerged as a powerful paradigm to train collaborative machine learning (ML) models while preserving the privacy of the participants' datasets. However, standard FL approaches present some limitations that can hinder their applicability. For instance, the need for a server or aggregator to orchestrate the learning process may be infeasible in scenarios with limited connectivity, as in some IoT applications, and offers less flexibility to personalize the ML models for the different participants. To sidestep these limitations, peer-to-peer FL (P2PFL) provides more flexibility, allowing participants to train their own models in collaboration with their neighbors. However, given the huge number of parameters of typical Deep Neural Network architectures, the communication burden can also be very high. On the other hand, it has been shown that standard aggregation schemes for FL are very brittle against data and model poisoning attacks. In this paper, we propose SparSFA, an algorithm for P2PFL capable of reducing the communication costs. We show that our method outperforms competing sparsification methods in P2P scenarios, speeding up convergence and enhancing stability during training. SparSFA also includes a mechanism to mitigate poisoning attacks for each participant in any random network topology. Our empirical evaluation on real datasets for intrusion detection in IoT, considering both balanced and imbalanced-dataset scenarios, shows that SparSFA is robust to different indiscriminate poisoning attacks launched by one or multiple adversaries, outperforming other robust aggregation methods while reducing the communication costs through sparsification.
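As an illustration of the kind of sparsification such methods build on, a generic top-k sparsifier for a parameter update might look like this (a hypothetical sketch, not SparSFA's actual mechanism): only the k largest-magnitude entries are transmitted and the rest are zeroed out, cutting communication roughly by a factor of len(update)/k.

```python
def sparsify_topk(update, k):
    """Keep the k largest-magnitude entries of a parameter update;
    zero the rest. Zeros need not be transmitted to the peers."""
    if k >= len(update):
        return list(update)
    # indices of the k entries with the largest absolute value
    idx = sorted(range(len(update)),
                 key=lambda i: abs(update[i]), reverse=True)[:k]
    keep = set(idx)
    return [u if i in keep else 0.0 for i, u in enumerate(update)]
```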