News

  • 03-11-2023

    Talk @CSE, University of Ioannina

    I will be giving a talk at the Department of Computer Science and Engineering of the University of Ioannina on LLMs for Greek.
  • 15-09-2023

    Paper accepted at CIKM 2023

    Our paper "On the Trade-off between Over-smoothing and Over-squashing in Deep Graph Neural Networks" has been accepted for publication at CIKM 2023. You can check out the paper here.
  • 15-05-2023

    Paper accepted at IAICT 2023

    Our paper "Digital Twin for Automated Industrial Optimization: Intelligent Machine Selection via Process Modelling" has been accepted for publication at IAICT 2023. You can check out the paper here.
  • 30-03-2023

    Paper accepted at Nature Scientific Reports Journal

    Our paper "Predicting COVID-19 positivity and hospitalization with multi-scale graph neural networks" has been accepted for publication in the prestigious Nature Scientific Reports journal. You can check out the paper here.

Education

© Konstantinos Skianis, powered by Bootstrap, last updated:

Topics

  • Natural Language Processing, Text Mining
  • Machine Learning, Deep Learning
  • Graph Mining, Network Science
  • Big Data, Data Science

Reviews

  • 2023: ACL
  • 2019: LLD @ICLR, NAACL
  • 2018: NIPS, WWW, EMNLP, ECML-PKDD
  • 2017: AAAI, LLD @NIPS, EMNLP
  • 2016: AAAI, WSDM, IC2S2

Code

On the Trade-off between Over-smoothing and Over-squashing in Deep Graph Neural Networks

Jhony Giraldo, Konstantinos Skianis, Thierry Bouwmans, Fragkiskos Malliaros
Conference CIKM 2023

Abstract

Graph Neural Networks (GNNs) have succeeded in various computer science applications, yet deep GNNs underperform their shallow counterparts despite deep learning's success in other domains. Over-smoothing and over-squashing are key challenges when stacking graph convolutional layers, hindering deep representation learning and information propagation from distant nodes. Our work reveals that over-smoothing and over-squashing are intrinsically related to the spectral gap of the graph Laplacian, resulting in an inevitable trade-off between these two issues, as they cannot be alleviated simultaneously. To achieve a suitable compromise, we propose adding and removing edges as a viable approach. We introduce the Stochastic Jost and Liu Curvature Rewiring (SJLR) algorithm, which is computationally efficient and preserves fundamental properties compared to previous curvature-based methods. Unlike existing approaches, SJLR performs edge addition and removal during GNN training while maintaining the graph unchanged during testing. Comprehensive comparisons demonstrate SJLR's competitive performance in addressing over-smoothing and over-squashing.
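The trade-off above is stated in terms of the spectral gap of the graph Laplacian. As a minimal illustration (it does not implement SJLR), the sketch below computes the gap, taken here as the second-smallest eigenvalue of the symmetric normalized Laplacian, for a toy 6-node path before and after adding a single edge. The graphs and helper names are hypothetical.

```python
import numpy as np


def normalized_laplacian(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2} (assumes no isolated nodes)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return np.eye(adj.shape[0]) - d_inv_sqrt @ adj @ d_inv_sqrt


def spectral_gap(adj: np.ndarray) -> float:
    """Second-smallest eigenvalue of the normalized Laplacian (nonzero iff the graph is connected)."""
    eigvals = np.linalg.eigvalsh(normalized_laplacian(adj))  # ascending order
    return float(eigvals[1])


def path_graph(n: int) -> np.ndarray:
    """Adjacency matrix of the path 0 - 1 - ... - (n-1)."""
    adj = np.zeros((n, n))
    for i in range(n - 1):
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    return adj


path = path_graph(6)
cycle = path.copy()
cycle[0, 5] = cycle[5, 0] = 1.0  # add one edge: the path becomes a 6-cycle

gap_path = spectral_gap(path)    # 1 - cos(pi/5), about 0.191
gap_cycle = spectral_gap(cycle)  # 1 - cos(pi/3) = 0.5
print(f"path: {gap_path:.3f}, cycle: {gap_cycle:.3f}")
```

Adding the edge widens the gap from about 0.19 to 0.5. In this spectral view, a wider gap means a better-connected graph (less over-squashing) but also faster mixing of node features (more over-smoothing), which is why adding or removing edges can only trade one problem against the other.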

Digital Twin for Automated Industrial Optimization: Intelligent Machine Selection via Process Modelling

Konstantinos Skianis, Anastasios Giannopoulos, Alexandros Kalafatelis, Panagiotis Trakadas
Conference IAICT 2023

Abstract

Digital Twin (DT) is an emerging paradigm that enables a virtual model to effectively represent a physical process. In this paper, we present the adoption of the DT scheme by an offset printing company towards industrial optimization. The considered DT model is a virtual representation that serves as the digital copy of the physical printing process within an industrial unit. A virtual model for selecting the optimal machine line was developed to ensure cost-efficient printing. The machine line selection process was modeled as a decision process and then analyzed through simulations in a safe and cost-efficient digital environment, provided by the DT. Moreover, Machine Learning (ML) models were exploited to extract knowledge for the machine selection task, taking full advantage of the DT experiment. Based on real data and selection policies of a printing enterprise, the results revealed an improvement during the selection process, followed by a 5% cost reduction on the examined dataset.

SWIRL: Statistical downscaling for Wind Pattern Reconstruction using Machine Learning

Konstantinos Skianis, Anastasios Giannopoulos, Sotiris Spantideas, Maria Hatzaki, Aikaterini Karditsa, Panagiotis Trakadas
Conference CEST 2023

Abstract

Ports are critical infrastructures for global supply chains: they are crucial hubs and strategic to future trade. However, they are particularly exposed to Climate Change (CC) impacts, which are estimated to have broad implications for the economy and human welfare. Therefore, the timely introduction of adaptation measures addressing CC impacts on ports becomes a major priority, and these measures can be proactive if based on the projected climate. Yet, this challenge requires high-spatial-resolution timeseries for the present and the projected climate, which are frequently missing. Moreover, the downscaling procedures employed are not always skillful, particularly for extremely complex wind fields. The scope of this study is the development of reliable high-resolution wind speed/direction timeseries through the application of Machine Learning (ML) techniques. The employed ML regression schemes exploit ECMWF-ERA5 Reanalysis data as the input training dataset (10931 instances) for the Heraclion port area (Crete, Greece), covering 1 site of interest and 4 peripheral ones (period 1975-2004). Analytical simulations were conducted to evaluate the regression accuracy on test data in terms of the Mean Absolute Error (MAE). The study outcomes revealed that ML techniques can efficiently reconstruct wind speed/direction timeseries, contributing to the wind downscaling and reconstruction problem and supporting stakeholders' needs at port scale regarding CC adaptation.

Predicting COVID-19 positivity and hospitalization with multi-scale graph neural networks

Konstantinos Skianis, Giannis Nikolentzos, Benoit Gallix, Rodolphe Thiebaut, Georgios Exarchakis
Journal Nature Scientific Reports 2023

Abstract

The pandemic of COVID‑19 is undoubtedly one of the biggest challenges for modern healthcare. In order to analyze the spatio‑temporal aspects of the spread of COVID‑19, technology has helped us to track, identify and store information regarding positivity and hospitalization, across different levels of municipal entities. In this work, we present a method for predicting the number of positive and hospitalized cases via a novel multi‑scale graph neural network, integrating information from fine‑scale geographical zones of a few thousand inhabitants. By leveraging population mobility data and other features, the model utilizes message passing to model interaction between areas. Our proposed model manages to outperform baselines and deep learning models, presenting low errors in both prediction tasks. We specifically point out the importance of our contribution in predicting hospitalization since hospitals became critical infrastructure during the pandemic. To the best of our knowledge, this is the first work to exploit high‑resolution spatio‑temporal data in a multi‑scale manner, incorporating additional knowledge, such as vaccination rates and population mobility data. We believe that our method may improve future estimations of positivity and hospitalization, which is crucial for healthcare planning.

Towards Fine-Dining Recipe Generation with Generative Pre-trained Transformers

Konstantinos Katserelis, Konstantinos Skianis
Journal arXiv 2022

Abstract

Food is essential to human survival. So much so that we have developed different recipes to suit our taste needs. In this work, we propose a novel way of creating new, fine-dining recipes from scratch using Transformers, specifically auto-regressive language models. Given a small dataset of food recipes, we try to train models to identify cooking techniques, propose novel recipes, and test the power of fine-tuning with minimal data. Code and data can be found here.

Moderate COVID-19: Clinical Trajectories and Predictors of Progression and Outcomes

Apostolos G. Pappas, Andreas Panagopoulos, Artemis Rodopoulou, Michaella Alexandrou, Anna-Louiza Chaliasou, Konstantinos Skianis, Eleftheria Kranidioti, Eleftheria Chaini, Ilias Papanikolaou, Ioannis Kalomenidis
Journal Journal of Personalized Medicine 2022

Abstract

Background: Patients with COVID-19 commonly present at healthcare facilities with moderate disease, i.e., pneumonia without a need for oxygen therapy. Aim: To identify clinical/laboratory characteristics of patients with moderate COVID-19 that could predict disease progression. Methods: 384 adult patients who presented with moderate COVID-19 and were admitted to two hospitals were retrospectively evaluated. In a multivariate analysis, gender, age, BMI, Charlson Comorbidity Index (CCI) and National Early Warning Score 2 were treated as co-variates. The development of hypoxemic respiratory failure, intubation rate and risk of death were considered as dependent variables. Estimated values are presented as odds ratios (OR) with 95% confidence intervals (CI). Results: Most of the patients were male (63.28%) with a mean (standard deviation) age of 59 (16.04) years. Median (interquartile range) CCI was 2 (1–4). A total of 58.85% of the patients developed respiratory failure; 6.51% were intubated, and 8.85% died. The extent of pneumonia in chest X-ray (involvement of all four quartiles) [OR 3.96 (1.18–13.27), p = 0.026], respiratory rate [OR 1.17 (1.05–1.3), p = 0.004], SatO2 [OR 0.72 (0.58–0.88), p = 0.002], systolic blood pressure [OR 1.02 (1–1.04), p = 0.041] and lymphocyte count [OR 0.9993 (0.9986–0.9999), p = 0.026] at presentation were associated with the development of respiratory failure. The extent of pneumonia [OR 26.49 (1.81–387.18), p = 0.017] was associated with intubation risk. Age [OR 1.14 (1.03–1.26), p = 0.014] and the extent of pneumonia [OR 22.47 (1.59–316.97), p = 0.021] were associated with increased risk of death. Conclusion: Older age, the extent of pneumonia, tachypnea, lower SatO2, higher systolic blood pressure and lymphopenia are associated with dismal outcomes in patients presenting with moderate COVID-19.

Evaluation of Machine Translation Methods applied to Medical Terminologies

Konstantinos Skianis, Florent Desgrippes, Yann Briand
Workshop LOUHI 2020

Abstract

Medical terminology resources and standards play vital roles in clinical data exchange, significantly enabling service interoperability within national healthcare information networks. Health and medical science are constantly evolving, which requires the terminologies' editions to advance accordingly. In this paper, we present our evaluation of the latest machine translation techniques applied to medical terminologies. Experiments have been conducted using statistical and neural machine translation methods. The devised procedure is tested on a validated sample of ICD-11 and ICF terminologies from English to French, with promising results.

Boosting Tricks for Word Mover’s Distance

Konstantinos Skianis, Fragkiskos D. Malliaros, Nikolaos Tziortziotis, Michalis Vazirgiannis
Conference ICANN 2020

Abstract

Word embeddings have opened a new path in creating novel approaches for addressing traditional problems in the natural language processing (NLP) domain. However, using word embeddings to compare text documents remains a relatively unexplored topic, with Word Mover’s Distance (WMD) being the prominent tool used so far. In this paper, we present a variety of tools that can further improve the computation of distances between documents based on WMD. We demonstrate that alternative stopwords, cross document-topic comparison, deep contextualized word vectors and convex metric learning constitute powerful tools that can boost WMD.
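For readers new to WMD, the sketch below computes it exactly in one special case, as an illustration rather than the paper's code: two documents with the same number of tokens, each token carrying mass 1/n. There an optimal transport plan can always be taken to be a one-to-one matching (Birkhoff-von Neumann), so brute force over permutations is exact for tiny inputs. The 2-D embeddings are hypothetical toy values.

```python
import itertools
import math

Vector = tuple[float, ...]


def euclidean(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def wmd_uniform(doc_a: list[str], doc_b: list[str], emb: dict[str, Vector]) -> float:
    """Exact WMD for equal-length documents with uniform token weights 1/n.
    Brute force over all matchings: only suitable for a handful of tokens."""
    if len(doc_a) != len(doc_b):
        raise ValueError("this sketch only handles documents of equal length")
    n = len(doc_a)
    best = math.inf
    for perm in itertools.permutations(range(n)):
        # Token i of doc_a sends its whole mass 1/n to token perm[i] of doc_b.
        cost = sum(euclidean(emb[doc_a[i]], emb[doc_b[j]]) for i, j in enumerate(perm))
        best = min(best, cost / n)
    return best


# Hypothetical toy embeddings: semantically close words are close in space.
toy_emb = {
    "obama": (0.0, 0.0),
    "president": (0.0, 1.0),
    "speaks": (5.0, 0.0),
    "talks": (5.0, 1.0),
}
print(wmd_uniform(["obama", "speaks"], ["president", "talks"], toy_emb))  # 1.0
```

The boosting tricks in the paper change what goes into this computation (which words are kept, which vectors and which metric are used); documents of different lengths need a general optimal-transport solver instead of the permutation shortcut.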

Rep the Set: Neural Networks for Learning Set Representations

Konstantinos Skianis, Giannis Nikolentzos, Stratis Limnios, Michalis Vazirgiannis
Conference AISTATS 2020, Palermo, Italy

Abstract

In several domains, data objects can be decomposed into sets of simpler objects. It is then natural to represent each object as the set of its components or parts. Many conventional machine learning algorithms are unable to process this kind of representation, since sets may vary in cardinality and their elements lack a meaningful ordering. In this paper, we present a new neural network architecture, called RepSet, that can handle examples represented as sets of vectors. The proposed model computes the correspondences between an input set and some hidden sets by solving a series of network flow problems. This representation is then fed to a standard neural network architecture to produce the output. The architecture allows end-to-end gradient-based learning. We demonstrate RepSet on classification tasks, including text categorization and graph classification, and we show that the proposed neural network achieves performance better than or comparable to state-of-the-art algorithms.
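To make "correspondences between an input set and hidden sets" concrete, here is a heavily simplified sketch. In RepSet the hidden sets are learned and the matching is solved as a network-flow problem inside a trainable model. Here the hidden sets are fixed toy vectors, the max-weight bipartite matching is brute-forced, and both the ReLU-clipped inner-product edge weights and the division by the smaller set's size are assumptions of this sketch.

```python
import itertools


def relu_dot(u, v):
    """Edge weight between two set elements (assumed: ReLU of their inner product)."""
    return max(0.0, sum(a * b for a, b in zip(u, v)))


def matching_score(input_set, hidden_set):
    """Max-weight bipartite matching between two small sets, found by brute force.
    Each element is matched at most once; normalization by the smaller size is assumed."""
    small, large = sorted((input_set, hidden_set), key=len)
    best = 0.0
    for chosen in itertools.permutations(range(len(large)), len(small)):
        total = sum(relu_dot(small[i], large[j]) for i, j in enumerate(chosen))
        best = max(best, total)
    return best / len(small)


def repset_features(input_set, hidden_sets):
    """One permutation-invariant feature per hidden set; in RepSet such a vector
    is passed on to a standard neural network."""
    return [matching_score(input_set, h) for h in hidden_sets]


x = [(1.0, 0.0), (0.0, 1.0)]                 # toy input set
hidden = [[(1.0, 0.0), (0.0, 1.0)],          # hypothetical hidden set aligned with x
          [(-1.0, 0.0), (0.0, -1.0)]]        # hypothetical hidden set opposed to x
print(repset_features(x, hidden))            # [1.0, 0.0]
```

Because the score depends only on the best matching, reordering the elements of the input set leaves the features unchanged, which is the property that lets sets of varying size and order be fed to a standard network.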

Evaluation of Greek Word Embeddings

Stamatis Outsios, Christos Karatsalos, Konstantinos Skianis, Michalis Vazirgiannis
Conference LREC 2020, Marseille, France

Abstract

Since word embeddings have been the most popular input for many NLP tasks, evaluating their quality is of critical importance. Most research efforts focus on English word embeddings. This paper addresses the problem of constructing and evaluating such models for the Greek language. We created a new word analogy corpus based on the original English Word2vec word analogy corpus and on some specific linguistic aspects of the Greek language. Moreover, we created a Greek version of the WordSim353 corpus for a basic evaluation of word similarities. We tested seven word vector models, and our evaluation showed that we are able to create meaningful representations. Finally, we found that the morphological complexity and polysemy of the Greek language can influence the quality of the resulting word embeddings.
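Word analogy evaluation asks questions of the form "a is to b as c is to ?". A common way to answer them is 3CosAdd: pick the vocabulary word whose vector is most cosine-similar to b - a + c, excluding the three query words. The sketch below assumes this rule and skips questions containing out-of-vocabulary words; whether the paper used exactly these conventions is an assumption, and the toy vectors are hypothetical (English words for readability).

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def solve_analogy(a, b, c, vectors):
    """3CosAdd: argmax over candidate words d of cos(d, b - a + c), query words excluded."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def analogy_accuracy(questions, vectors):
    """Share of (a, b, c, expected) questions answered correctly; questions with
    out-of-vocabulary words are skipped (one common convention, assumed here)."""
    covered = [q for q in questions if all(w in vectors for w in q)]
    correct = sum(solve_analogy(a, b, c, vectors) == d for a, b, c, d in covered)
    return correct / len(covered) if covered else 0.0


toy_vectors = {
    "man": (1.0, 0.0, 0.0),
    "woman": (1.0, 1.0, 0.0),
    "king": (1.0, 0.0, 1.0),
    "queen": (1.0, 1.0, 1.0),
    "apple": (0.0, 0.0, -1.0),
}
print(solve_analogy("man", "woman", "king", toy_vectors))  # queen
```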

GraKeL: A Graph Kernel Library in Python

Giannis Siglidis, Giannis Nikolentzos, Stratis Limnios, Christos Giatsidis, Konstantinos Skianis, Michalis Vazirgiannis
Journal JMLR 2019

Abstract

The problem of accurately measuring the similarity between graphs is at the core of many applications in a variety of disciplines. Graph kernels have recently emerged as a promising approach to this problem. There are now many kernels, each focusing on different structural aspects of graphs. Here, we present GraKeL, a library that unifies several graph kernels into a common framework. The library is written in Python and is built on top of scikit-learn. It is simple to use and can be naturally combined with scikit-learn's modules to build a complete machine learning pipeline for tasks such as graph classification and clustering. The code is BSD licensed and is available here.
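To show what a graph kernel computes, here is a from-scratch sketch of the Weisfeiler-Lehman subtree kernel, one classic kernel of the kind such libraries unify. It is not GraKeL's implementation or API. Labels are kept as explicit strings rather than compressed to integers, which is fine only for tiny graphs; the toy graphs are hypothetical.

```python
from collections import Counter


def wl_features(adj, labels, iterations=2):
    """Weisfeiler-Lehman subtree features: counts of node labels over `iterations`
    rounds of relabeling. `adj` maps node -> neighbors, `labels` maps node -> str."""
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        # New label = old label followed by the sorted multiset of neighbor labels.
        current = {
            v: current[v] + "(" + ",".join(sorted(current[u] for u in adj[v])) + ")"
            for v in adj
        }
        features.update(current.values())
    return features


def wl_kernel(g1, g2, iterations=2):
    """Kernel value: dot product of the two graphs' WL feature-count vectors."""
    f1 = wl_features(*g1, iterations)
    f2 = wl_features(*g2, iterations)
    return sum(count * f2[label] for label, count in f1.items())


# Hypothetical toy graphs with identical node labels: a triangle and a 3-node path.
triangle = ({0: [1, 2], 1: [0, 2], 2: [0, 1]}, {0: "C", 1: "C", 2: "C"})
path = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "C", 2: "C"})
print(wl_kernel(triangle, triangle), wl_kernel(triangle, path))  # 27 12
```

The kernel matrix built from such values can then be handed to any kernel method (for example an SVM with a precomputed kernel), which is the kind of pipeline the library is designed to plug into.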

Scientometrics for Success and Influence in the Microsoft Academic Graph

George Panagopoulos, Christos Xypolopoulos, Konstantinos Skianis, Christos Giatsidis, Jie Tang, Michalis Vazirgiannis
Conference Complex Networks 2019

Abstract

Measuring and evaluating an author’s impact has been a long-standing challenge in the academic world, with profound effects on society. Apart from its practical usage for academic evaluation, it enhances transparency and reinforces scientific excellence. In this demo paper, we present our efforts to address this problem, capitalizing on the field-based citations and the author-oriented citation network extracted from the Microsoft Academic Graph, to our knowledge the largest network of its kind. We separate impact into two dimensions: success and influence over the network, and provide two novel scientometrics to quantify some of their aspects: (i) the distribution of the h-index for specific scientific fields and a search engine to visualize an author’s position in it as well as the top percentile she belongs to, (ii) recomputing our previously introduced D-core influence metric on this huge network and presenting the authority/integration of the authors in the form of D-core frontiers. In addition, we present interesting insights on the densest scientific domains and the most influential authors. We believe the proposed analytics highlight under-examined aspects in the area of scientific evaluation and pave the way for more involved scientometrics.
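The first scientometric above places an author within a field's h-index distribution. A minimal sketch of both steps follows; the exact definition of "top percentile" used in the demo is not given here, so the share of field authors with an h-index at least as high is an assumption, and the citation counts are toy values.

```python
def h_index(citations):
    """Largest h such that the author has h papers with at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def top_percentile(author_h, field_h_values):
    """Share (in %) of the field's authors whose h-index is at least `author_h`
    (assumed definition); smaller values mean a higher position in the field."""
    at_least = sum(1 for h in field_h_values if h >= author_h)
    return 100.0 * at_least / len(field_h_values)


print(h_index([10, 8, 5, 4, 3]))         # 4
print(top_percentile(4, [1, 2, 4, 7]))   # 50.0
```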

Word Embeddings from Large-scale Greek Web Content

Stamatis Outsios, Konstantinos Skianis, Polykarpos Meladianos, Christos Xypolopoulos, Michalis Vazirgiannis
Resources/Demo arXiv 2018

Abstract

Word embeddings are undoubtedly very useful components in many NLP tasks. In this paper, we present word embeddings and other linguistic resources trained on the largest digital Greek language corpus to date. We also present a live web tool for testing the Greek word embeddings, offering “analogy”, “similarity score” and “most similar words” functions. Through our explorer, one can interact with the Greek word vectors.

Orthogonal Matching Pursuit for Text Classification

Konstantinos Skianis, Nikolaos Tziortziotis, Michalis Vazirgiannis
Workshop W-NUT, EMNLP 2018, Brussels, Belgium

Abstract

In text classification, the problem of overfitting arises due to the high dimensionality, making regularization essential. Although classic regularizers provide sparsity, they fail to return highly accurate models. On the contrary, state-of-the-art group-lasso regularizers provide better results at the expense of low sparsity. In this paper, we apply a greedy variable selection algorithm, called Orthogonal Matching Pursuit, for the text classification task. We also extend standard Group OMP by introducing overlapping group OMP to handle overlapping groups of features. Empirical analysis verifies that both OMP and overlapping GOMP constitute powerful regularizers, able to produce effective and very sparse models.
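The greedy selection behind OMP fits in a few lines. The sketch below is plain OMP with a least-squares fit on a hypothetical toy design matrix; the paper applies OMP and overlapping group OMP to text classification, and the loss and stopping rule used there may differ.

```python
import numpy as np


def omp(X: np.ndarray, y: np.ndarray, n_nonzero: int) -> np.ndarray:
    """Orthogonal Matching Pursuit (least squares): greedily add the feature (column
    of X) most correlated with the current residual, then refit all selected features."""
    n_features = X.shape[1]
    coef = np.zeros(n_features)
    residual = y.astype(float)
    selected: list[int] = []
    for _ in range(min(n_nonzero, n_features)):
        scores = np.abs(X.T @ residual)
        if selected:
            scores[selected] = -np.inf  # a feature is selected at most once
        selected.append(int(np.argmax(scores)))
        # Orthogonal step: least-squares refit on the selected columns only.
        sub_coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        coef = np.zeros(n_features)
        coef[selected] = sub_coef
        residual = y - X @ coef
    return coef


# Hypothetical toy design: y is built from columns 0 and 2 only.
X = np.array([[1.0, 0.0, 0.0, 0.5],
              [0.0, 1.0, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.5],
              [0.0, 0.0, 0.0, 0.5]])
y = 3.0 * X[:, 0] - 2.0 * X[:, 2]
print(omp(X, y, n_nonzero=2))  # nonzero weights on columns 0 and 2 only
```

Sparsity is controlled directly by `n_nonzero` (a hypothetical parameter name), which is what makes OMP attractive when very sparse text models are wanted.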

Fusing Document, Collection and Label Graph-based Representations with Word Embeddings for Text Classification

Konstantinos Skianis, Fragkiskos Malliaros, Michalis Vazirgiannis
Workshop TextGraphs, NAACL 2018, New Orleans, USA [Best Paper Award]

Abstract

Contrary to the traditional Bag-of-Words approach, we consider the Graph-of-Words (GoW) model in which each document is represented by a graph that encodes relationships between the different terms. Based on this formulation, the importance of a term is determined by weighting the corresponding node in the document, collection and label graphs, using node centrality criteria. We also introduce novel graph-based weighting schemes by enriching graphs with word-embedding similarities, in order to reward or penalize semantic relationships. Our methods produce more discriminative feature weights for text categorization, outperforming existing frequency-based criteria.

Kernel Graph Convolutional Neural Networks

Giannis Nikolentzos, Polykarpos Meladianos, Antoine J.-P. Tixier, Konstantinos Skianis, Michalis Vazirgiannis
Conference ICANN 2018, Rhodes, Greece

Abstract

Graph kernels have been successfully applied to many graph classification problems. Typically, a kernel is first designed, and then an SVM classifier is trained based on the features defined implicitly by this kernel. This two-stage approach decouples data representation from learning, which is suboptimal. On the other hand, Convolutional Neural Networks (CNNs) have the capability to learn their own features directly from the raw data during training. Unfortunately, they cannot handle irregular data such as graphs. We address this challenge by using graph kernels to embed meaningful local neighborhoods of the graphs in a continuous vector space. A set of filters is then convolved with these patches, pooled, and the output is then passed to a feedforward network. With limited parameter tuning, our approach outperforms strong baselines on 7 out of 10 benchmark datasets.

Data Management for the World Wealth & Income Database

Christos Giatsidis, Antonis Skandalis, Konstantinos Skianis, Michalis Vazirgiannis
Facundo Alvaredo, Lucas Chancel, Thomas Piketty, Emmanuel Saez, Gabriel Zucman
Ben Grillet, Francois Prosper, Brice Terdjman, Anthony Veyssiere
Conference ParisBD 2017, Paris, France

Abstract

The World Wealth & Income Database (http://wid.world) is a project that publishes data related to inequality around the world. It is an open portal that distributes time series about economic concepts such as wealth, income, etc., where users can select time series of interest based on multi-attribute queries. Most of the components of the project are hosted on the cloud using Amazon Web Services. The data management functionality of the project consists of two major parts: a) a modern relational database that uses state-of-the-art indexing techniques along with JSON features and b) a web API through which the database can be accessed and where some data transformations take place. To reduce latency across the globe, the project is currently deployed in two sites (EU and US). Currently, the database holds data for 319 geographical regions (countries, continents, states), with about 150 combinations of attributes for each region over 50 years on average. The project is a joint collaboration between the World Inequality Lab at Paris School of Economics, DaSciM team at Laboratoire d’Informatique de l’X and WEDODATA.

SpreadViz: Analytics and Visualization of Spreading Processes in Social Networks

Konstantinos Skianis, Maria Evgenia G. Rossi, Fragkiskos D. Malliaros, Michalis Vazirgiannis
Demo ICDM 2016, Barcelona, Spain

Abstract

In this paper, we propose SpreadViz, a web tool for exploration and visualization of spreading properties in social networks. SpreadViz consists of three main modules, namely graph exploration and analytics, detection of influential nodes, and interactive visualization. More precisely, SpreadViz offers the following functionalities: (i) It computes and visualizes various centrality criteria towards understanding how the position of a node in the network affects its spreading properties; (ii) It offers a wide range of criteria for the detection of single and multiple influential nodes and comparison among them; (iii) It effectively visualizes the spread of influence in the network as well as the performance of each method. In our demonstration, we invite the audience to interact with SpreadViz, exploring, analyzing, and visualizing the spreading processes over various real-world social networks.
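Comparing influential-node criteria usually means simulating a spreading process from the selected seeds. The sketch below assumes the Independent Cascade (IC) model, one of the models listed in the course syllabus on this page, and is not necessarily what SpreadViz runs; the toy graph, the activation probability `p` and the number of runs are hypothetical.

```python
import random


def independent_cascade(graph, seeds, p, rng):
    """One run of the Independent Cascade model: each newly activated node gets a
    single chance to activate each inactive neighbor, with probability p."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        new_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    new_frontier.append(v)
        frontier = new_frontier
    return active


def expected_spread(graph, seeds, p, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)
    return sum(len(independent_cascade(graph, seeds, p, rng)) for _ in range(runs)) / runs


# Hypothetical toy network: a star centered on node 0, plus a separate pair 4-5.
toy_graph = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0], 4: [5], 5: [4]}
print(expected_spread(toy_graph, [0], p=0.5))  # the hub reaches more nodes on average
print(expected_spread(toy_graph, [4], p=0.5))
```

A seed set chosen by a centrality criterion can then be scored by its estimated spread, which gives a common yardstick for comparing criteria.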

Regularizing Text Categorization with Clusters of Words

Konstantinos Skianis, Francois Rousseau, Michalis Vazirgiannis
Conference EMNLP 2016, Austin, USA

Abstract

Regularization is a critical step in any supervised learning problem, crucial not only for addressing overfitting but also for taking into account any prior knowledge we may have about the problem features and their relationships. In this paper, we explore state-of-the-art structured regularizers for textual data and propose novel ones based on topics from LSI and clusters from word2vec and graph-of-words document representations. We show that for text categorization our proposed regularizers are faster than the state-of-the-art ones while improving classification accuracy.
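Structured regularizers of this kind penalize groups of features, such as clusters of words, as blocks. Assuming a group-lasso penalty over non-overlapping clusters, proximal gradient methods apply the block soft-thresholding step below after each gradient step. The clusters and λ (`lam`) are hypothetical, and the paper's regularizers, which may involve overlapping groups, may be optimized differently.

```python
import math


def group_soft_threshold(weights, groups, lam):
    """Proximal operator of lam * sum_g ||w_g||_2 for non-overlapping groups.
    Each group is shrunk toward zero as a block and set exactly to zero when its
    norm is at most lam. Features outside every group are left unchanged."""
    out = list(weights)
    for group in groups:
        norm = math.sqrt(sum(weights[i] ** 2 for i in group))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in group:
            out[i] = scale * weights[i]
    return out


# Hypothetical word clusters: features 0-1 form one cluster, features 2-3 another.
w = [3.0, 4.0, 0.1, 0.2, 1.0]
print(group_soft_threshold(w, [[0, 1], [2, 3]], lam=1.0))
# roughly [2.4, 3.2, 0.0, 0.0, 1.0]: the weak cluster is dropped as a whole
```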

GoWvis: A web application for Graph-of-Words-based text visualization and summarization

Antoine J.-P. Tixier, Konstantinos Skianis, Michalis Vazirgiannis
Demo ACL 2016, Berlin, Germany

Abstract

We introduce GoWvis, an interactive web application that represents any piece of text inputted by the user as a Graph-of-Words and leverages graph degeneracy and community detection to generate an extractive summary (keyphrases and paragraph) of the inputted text in an unsupervised fashion. The entire analysis can be fully customized via the tuning of many text preprocessing, graph building, and graph mining parameters. Our system is thus well suited to educational purposes, exploration and early research experiments. The new summarization strategy we propose also shows promise.
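Graph degeneracy here refers to the k-core decomposition: the k-core is the largest subgraph in which every node has degree at least k. A minimal sketch, assuming an undirected, unweighted graph-of-words given as a hand-made adjacency dict (the toy words are hypothetical), computes core numbers and keeps the main core as keywords. GoWvis exposes many more parameters than this.

```python
def core_numbers(adj):
    """k-core decomposition by repeatedly removing a minimum-degree node
    (quadratic time, fine for small graphs). Returns each node's core number."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=lambda u: degree[u])
        k = max(k, degree[v])  # core numbers never decrease along the removal order
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def main_core_keywords(adj):
    """Terms in the k-core with the largest k (the 'main core')."""
    core = core_numbers(adj)
    k_max = max(core.values())
    return sorted(v for v, c in core.items() if c == k_max)


# Hypothetical graph-of-words for a short text.
toy_gow = {
    "graph": {"mining", "words", "degeneracy"},
    "mining": {"graph", "words", "degeneracy"},
    "words": {"graph", "mining", "degeneracy"},
    "degeneracy": {"graph", "mining", "words", "summary"},
    "summary": {"degeneracy"},
}
print(main_core_keywords(toy_gow))  # ['degeneracy', 'graph', 'mining', 'words']
```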

Graph-Based Term Weighting for Text Categorization

Fragkiskos Malliaros, Konstantinos Skianis
Workshop Someris, ASONAM 2015, Paris, France

Abstract

Text categorization is an important task with plenty of applications, ranging from sentiment analysis to automated news classification. In this paper, we introduce a novel graph-based approach for text categorization. Contrary to the traditional Bag-of-Words model for document representation, we consider a model in which each document is represented by a graph that encodes relationships between the different terms. The importance of a term to a document is indicated using graph-theoretic node centrality criteria. The proposed weighting scheme is able to meaningfully capture the relationships between the terms that co-occur in a document, creating feature vectors that can improve the categorization task. We perform experiments on well-known document collections, applying popular classification algorithms. Our preliminary results indicate that the proposed graph-based weighting mechanism is able to outperform existing frequency-based term weighting criteria, under appropriate parameter settings.
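The weighting idea can be sketched as follows: build a graph-of-words per document by linking terms that co-occur within a sliding window, use a node centrality as the term weight, and multiply by an inverse document frequency (TW-IDF). This sketch assumes degree centrality (one of the possible centrality criteria), a window of consecutive tokens, idf = log((N + 1) / df) and no length normalization; all of these are choices for illustration, and the toy documents are hypothetical.

```python
import math
from collections import defaultdict


def graph_of_words(tokens, window=3):
    """Undirected graph-of-words: link terms co-occurring within `window` consecutive tokens."""
    adj = defaultdict(set)
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != term:
                adj[term].add(other)
                adj[other].add(term)
    for term in tokens:
        adj.setdefault(term, set())  # keep isolated terms as nodes
    return adj


def tw_idf(docs, window=3):
    """Term weight = degree in the document's graph-of-words, times log((N + 1) / df)."""
    n_docs = len(docs)
    df = defaultdict(int)
    for doc in docs:
        for term in set(doc):
            df[term] += 1
    vectors = []
    for doc in docs:
        adj = graph_of_words(doc, window)
        vectors.append({t: len(nbrs) * math.log((n_docs + 1) / df[t]) for t, nbrs in adj.items()})
    return vectors


docs = [["graph", "based", "term", "weighting"], ["term", "weighting", "text"]]
for vec in tw_idf(docs, window=2):
    print({t: round(w, 3) for t, w in vec.items()})
```

Unlike TF, a term's degree rewards words that co-occur with many distinct neighbors, which is how the graph captures relationships between terms rather than raw counts.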

Data Mining and Science - Full course - 2022

Undergraduate course, Athens University of Economics and Business

This course provides a concise introduction to data science, an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data, and to apply that knowledge across a wide range of application domains. The course is accompanied by hands-on problem solving in Python and aims to familiarize students with data set analysis techniques/algorithms and software tools for knowledge mining, machine learning and deep learning, and to prepare them for participating in data challenges.


Learning for Text and Graph Data - Labs - (2016-2017)

Licence, Master M2, ENS Paris-Saclay (Labs)

The courses aim to provide an introduction to advanced machine learning and combinatorial methods for large-scale text and graph data. The syllabus included:

  • Advanced graph kernels and classification, clustering / community mining (Louvain, modularity, degeneracy)
  • Influence maximization models (SIR/SIS, LT, IC,…), degeneracy-based spreader selection
  • Graph-of-words advanced topics: TW-ICW, graph kernels for document similarity, graph-based regularization for text classification
  • Word embeddings, unsupervised document classification with the Word Mover’s Distance, WMD vs cosine similarity
  • Deep learning for NLP, supervised document classification (TF-IDF vs TW-IDF)
  • Keyword extraction for summarization: graph-based keyword extraction, summarization (offline, online), Filippova’s word graph for multi-sentence fusion

  • April 2019

    6th place, National Bank of Greece Race 2019

    National Bank of Greece

    Training Greek word embeddings from legal texts. You can find more info here.

  • April 2016

    2nd place, Fintech Crowdhackathon 2016

    National Bank of Greece

    Our team (RSK project) won 2nd place in the Fintech Crowdhackathon organized by the National Bank of Greece. We built a platform to detect fraudulent e-transactions based on Deep Learning. You can find more info here.

  • January 2015

    2nd place, Dreem challenge 2015

    Inclass Kaggle

    The challenge targeted sleep analysis. During deep sleep, crucial mechanisms occur: memory consolidation, cellular regeneration, growth hormone release and biological clock reset. A lack of deep sleep impairs memory, focus and judgment at work. DREEM introduces a way to increase the duration and quality of deep sleep to ensure optimal performance.

  • May 2013

    6th place, Data Mining Cup 2013

    Prudsys

    Our team from the Department of Informatics, consisting of undergraduate students G. Papoutsakis, G. Zografos, G. Theofilis and myself, and PhD students M. Karkali and S. Thomaidou, took 6th place in the Data Mining Cup 2013 competition.

    There were 99 participating teams from 77 universities around the world.

  • April 2013

    Top 25%, Employee Access Challenge 2013

    Kaggle

    The objective of this competition is to build a model, learned using historical data, that will determine an employee's access needs, such that manual access transactions (grants and revokes) are minimized as the employee's attributes change over time. The model will take an employee's role information and a resource code and will return whether or not access should be granted.

Department of Computer Science and Engineering
Ioannina
45110 Epirus, Greece
  • kskianis _at_ cse.uoi.gr
  • skianis.konstantinos _at_ gmail.com
  • kostas.skianis
  • #kskianis
  • My LinkedIn profile