Advestis

Integrated sources model: A new space-learning model for heterogeneous multi-view data reduction, visualization, and clustering

P. Fogel, C. Geissler, F. Augé, G. Boldina, and G. Luta. In "Artificial Intelligence in Health", July 2024.



Abstract: In machine learning, multi-view data involve multiple distinct sets of attributes (“views”) for a common set of observations. When each view has the same attributes considered in different contexts, the data are said to contain multiple views of homogeneous format, which can be conceptualized as a tensor. In this article, we describe a novel approach for integrating multiple views of heterogeneous format into a common latent space using a workflow that involves non-negative matrix and tensor factorization (NMF/NTF). This approach, which we refer to as the integrated sources model (ISM), consists of two main steps: embedding and analysis. In the embedding step, the views are transformed into matrices with common non-negative components. In the analysis step, the transformed views are combined into a tensor and decomposed using NTF. We also present a variant of ISM, the integrated latent sources model (ILSM), which offers significant advantages over ISM in terms of computational cost and in cases where the views are highly unbalanced with regard to the number of attributes per view. Notably, ISM can be extended to process multi-omic and multi-view datasets even in the presence of missing views. We provide a proof-of-concept analysis using five examples, including the UCI Digits (the University of California Irvine Pen-Based Recognition of Handwritten Digits) dataset, a public cell-type gene signatures dataset, and a multi-omic single-cell dataset. These examples demonstrate that, in most cases, multi-view clustering is better achieved with ISM or its variant ILSM than with other latent space approaches. We also show how the non-negativity and sparsity of the ISM model components enable straightforward interpretations, in contrast to other approaches that involve latent factors of mixed signs.
Finally, we present potential applications to single-cell multi-omics and spatial mapping, including spatial imaging, spatial transcriptomics, and computational biology, which are currently under evaluation. ISM relies on state-of-the-art algorithms invoked through a simple workflow implemented in Python.
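
The two-step workflow described in the abstract (embedding, then analysis) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `nmf`, `embed_views`, and `ntf` are hypothetical, the data are synthetic, and the embedding step is a deliberate simplification that assumes common components can be obtained by running NMF on the concatenated views and then refitting non-negative scores per view. The analysis step uses a plain non-negative CP decomposition with multiplicative updates as a stand-in for NTF.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Basic NMF via multiplicative updates: X ~ W @ H, with W, H >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps
    H = rng.random((k, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def embed_views(views, k, n_iter=200, eps=1e-9):
    """Embedding step (simplified assumption): map every view to an
    n x k non-negative matrix expressed on the same k components."""
    X = np.hstack(views)              # observations x all attributes
    W, H = nmf(X, k, n_iter)
    embedded, start = [], 0
    for Xv in views:
        Hv = H[:, start:start + Xv.shape[1]]   # loadings for this view
        start += Xv.shape[1]
        Wv = W.copy()
        for _ in range(n_iter):       # refit scores with Hv held fixed
            Wv *= (Xv @ Hv.T) / (Wv @ Hv @ Hv.T + eps)
        embedded.append(Wv)
    return embedded

def ntf(T, rank, n_iter=200, seed=0, eps=1e-9):
    """Analysis step: non-negative CP decomposition of a 3-way tensor."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.random((dim, rank)) + eps for dim in T.shape)
    for _ in range(n_iter):
        A *= np.einsum('ijk,jr,kr->ir', T, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum('ijk,ir,kr->jr', T, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum('ijk,ir,jr->kr', T, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C

# Synthetic example: two heterogeneous views of the same 60 observations,
# with different numbers of attributes (8 and 20).
rng = np.random.default_rng(42)
n, k = 60, 3
S = rng.random((n, k))                # shared latent sources (made up)
views = [S @ rng.random((k, 8)), S @ rng.random((k, 20))]

embedded = embed_views(views, k)      # list of n x k matrices
T = np.stack(embedded, axis=2)        # tensor: observations x components x views
A, B, C = ntf(T, rank=k)              # A: observation scores, C: view loadings
T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
rel_err = np.linalg.norm(T - T_hat) / np.linalg.norm(T)
```

In this sketch, the rows of `A` give each observation's coordinates in the shared latent space and could be passed to any clustering routine, while `C` indicates how strongly each view contributes to each component. Because all factors are non-negative, each component reads as an additive part rather than a contrast of mixed signs.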



