  • 01-10-2020

    Paper accepted at LOUHI 2020

    Our paper "Evaluation of Machine Translation Methods applied to Medical Terminologies" has been accepted for presentation at the 11th International Workshop on Health Text Mining and Information Analysis (LOUHI), co-located with EMNLP 2020! You can check out the paper here.
  • 31-07-2020

    Paper accepted at ICANN 2020

    Our paper "Boosting Tricks for Word Mover’s Distance" has been accepted for presentation at the 29th International Conference on Artificial Neural Networks (ICANN 2020)! You can check out the paper here.
  • 11-02-2020

    Paper accepted at LREC 2020

    Our paper "Evaluation of Greek Word Embeddings" has been accepted for presentation at the 12th International Conference on Language Resources and Evaluation (LREC), held in Marseille, France! You can check out the paper here and play with the resources here.
  • 07-01-2020

    Paper accepted at AISTATS 2020

    Our paper entitled "Rep the Set: Neural Networks for Learning Set Representations" has been accepted for presentation at AISTATS, held in Palermo. Check out the paper here.
  • 05-10-2019

    Machine translation for medical terminology

    I am currently working on machine translation for medical terminology from English to French. We will soon publish the methodology and resources for the task.


© Konstantinos Skianis, powered by Bootstrap, last updated:


  • Natural Language Processing, Text Mining
  • Machine Learning, Deep Learning
  • Graph Mining, Network Science
  • Big Data, Data Science


  • 2019: LLD @ICLR, NAACL
  • 2017: AAAI, LLD @NIPS, EMNLP
  • 2016: AAAI, WSDM, IC2S2


Evaluation of Machine Translation Methods applied to Medical Terminologies

Konstantinos Skianis, Florent Desgrippes, Yann Briand
Workshop Paper LOUHI 2020


Medical terminology resources and standards play a vital role in clinical data exchange, significantly enabling service interoperability within national healthcare information networks. Health and medical science are constantly evolving, which creates a need to update terminology editions. In this paper, we present our evaluation of the latest machine translation techniques applied to medical terminologies. Experiments have been conducted using statistical and neural machine translation methods. The devised procedure is tested on a validated sample of ICD-11 and ICF terminologies from English to French, with promising results.

Boosting Tricks for Word Mover’s Distance

Konstantinos Skianis, Fragkiskos D. Malliaros, Nikolaos Tziortziotis, Michalis Vazirgiannis
Conference Paper ICANN 2020


Word embeddings have opened a new path in creating novel approaches for addressing traditional problems in the natural language processing (NLP) domain. However, using word embeddings to compare text documents remains a relatively unexplored topic, with Word Mover’s Distance (WMD) being the prominent tool used so far. In this paper, we present a variety of tools that can further improve the computation of distances between documents based on WMD. We demonstrate that alternative stopwords, cross document-topic comparison, deep contextualized word vectors and convex metric learning constitute powerful tools that can boost WMD.

Rep the Set: Neural Networks for Learning Set Representations

Konstantinos Skianis, Giannis Nikolentzos, Stratis Limnios, Michalis Vazirgiannis
Conference Paper AISTATS 2020, Palermo, Italy


In several domains, data objects can be decomposed into sets of simpler objects. It is then natural to represent each object as the set of its components or parts. Many conventional machine learning algorithms are unable to process this kind of representation, since sets may vary in cardinality and elements lack a meaningful ordering. In this paper, we present a new neural network architecture, called RepSet, that can handle examples that are represented as sets of vectors. The proposed model computes the correspondences between an input set and some hidden sets by solving a series of network flow problems. This representation is then fed to a standard neural network architecture to produce the output. The architecture allows end-to-end gradient-based learning. We demonstrate RepSet on classification tasks, including text categorization and graph classification, and we show that the proposed neural network achieves performance better than or comparable to state-of-the-art algorithms.

Evaluation of Greek Word Embeddings

Stamatis Outsios, Christos Karatsalos, Konstantinos Skianis, Michalis Vazirgiannis
Conference Paper LREC 2020, Marseille, France


Since word embeddings have been the most popular input for many NLP tasks, evaluating their quality is of critical importance. Most research efforts focus on English word embeddings. This paper addresses the problem of constructing and evaluating such models for the Greek language. We created a new word analogy corpus based on the original English Word2vec word analogy corpus and on some specific linguistic aspects of the Greek language. Moreover, we created a Greek version of the WordSim353 corpus for a basic evaluation of word similarities. We tested seven word vector models and our evaluation showed that we are able to create meaningful representations. Finally, we discovered that the morphological complexity of the Greek language and polysemy can influence the quality of the resulting word embeddings.

GraKeL: A Graph Kernel Library in Python

Giannis Siglidis, Giannis Nikolentzos, Stratis Limnios, Christos Giatsidis, Konstantinos Skianis, Michalis Vazirgiannis
Software JMLR 2019


The problem of accurately measuring the similarity between graphs is at the core of many applications in a variety of disciplines. Graph kernels have recently emerged as a promising approach to this problem. There are now many kernels, each focusing on different structural aspects of graphs. Here, we present GraKeL, a library that unifies several graph kernels into a common framework. The library is written in Python and is built on top of scikit-learn. It is simple to use and can be naturally combined with scikit-learn's modules to build a complete machine learning pipeline for tasks such as graph classification and clustering. The code is BSD licensed and is available here.

Scientometrics for Success and Influence in the Microsoft Academic Graph

George Panagopoulos, Christos Xypolopoulos, Konstantinos Skianis, Christos Giatsidis, Jie Tang, Michalis Vazirgiannis
Conference Paper Complex Networks 2019


Measuring and evaluating an author’s impact has been a long-standing challenge in the academic world, with profound effects on society. Apart from its practical usage for academic evaluation, it enhances transparency and reinforces scientific excellence. In this demo paper we present our efforts to address this problem, capitalizing on the field-based citations and the author-oriented citation network extracted from the Microsoft Academic Graph, to our knowledge the largest network of its kind. We separate impact into two dimensions, success and influence over the network, and provide two novel scientometrics to quantify some of their aspects: (i) the distribution of the h-index for specific scientific fields and a search engine to visualize an author’s position in it as well as the top percentile she belongs to; (ii) a recomputation of our previously introduced D-core influence metric on this huge network, presenting the authority/integration of the authors in the form of D-core frontiers. In addition, we present interesting insights on the densest scientific domains and the most influential authors. We believe the proposed analytics highlight under-examined aspects in the area of scientific evaluation and pave the way for more involved scientometrics.
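
As a rough illustration of the first scientometric mentioned above, the sketch below computes an h-index from a list of citation counts and an author's top percentile within a field. The function names, the sample numbers, and the percentile definition (the share of authors in the field with an h-index at least as high) are assumptions made for this example; the abstract does not specify how the paper's search engine computes these values.

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def top_percentile(author_h, field_h_values):
    """Percentage of authors in the field whose h-index is >= author_h.

    This definition is an assumption for illustration only.
    """
    at_least = sum(1 for h in field_h_values if h >= author_h)
    return 100.0 * at_least / len(field_h_values)


# Hypothetical citation counts for one author's papers.
author_h = h_index([10, 8, 5, 4, 3])

# Hypothetical h-index values of all authors in one field.
percentile = top_percentile(author_h, [1, 2, 4, 6, 9])
```

With these made-up inputs, the author has four papers with at least four citations each, and three of the five authors in the field have an h-index at least that high.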

Word Embeddings from Large-scale Greek Web Content

Stamatis Outsios, Konstantinos Skianis, Polykarpos Meladianos, Christos Xypolopoulos, Michalis Vazirgiannis
Resources/Demo Paper arXiv 2018


Word embeddings are undoubtedly very useful components in many NLP tasks. In this paper, we present word embeddings and other linguistic resources trained on the largest to date digital Greek language corpus. We also present a live web tool for testing the Greek word embeddings, by offering “analogy”, “similarity score” and “most similar words” functions. Through our explorer, one could interact with the Greek word vectors.

Orthogonal Matching Pursuit for Text Classification

Konstantinos Skianis, Nikolaos Tziortziotis, Michalis Vazirgiannis
Workshop Paper W-NUT, EMNLP 2018, Brussels, Belgium


In text classification, the problem of overfitting arises due to the high dimensionality, making regularization essential. Although classic regularizers provide sparsity, they fail to return highly accurate models. On the contrary, state-of-the-art group-lasso regularizers provide better results at the expense of low sparsity. In this paper, we apply a greedy variable selection algorithm, called Orthogonal Matching Pursuit, for the text classification task. We also extend standard Group OMP by introducing overlapping group OMP to handle overlapping groups of features. Empirical analysis verifies that both OMP and overlapping GOMP constitute powerful regularizers, able to produce effective and very sparse models.

Fusing Document, Collection and Label Graph-based Representations with Word Embeddings for Text Classification

Konstantinos Skianis, Fragkiskos Malliaros, Michalis Vazirgiannis
Workshop Paper TextGraphs, NAACL 2018, New Orleans, USA [Best Paper Award]


Contrary to the traditional Bag-of-Words approach, we consider the Graph-of-Words (GoW) model in which each document is represented by a graph that encodes relationships between the different terms. Based on this formulation, the importance of a term is determined by weighting the corresponding node in the document, collection and label graphs, using node centrality criteria. We also introduce novel graph-based weighting schemes by enriching graphs with word-embedding similarities, in order to reward or penalize semantic relationships. Our methods produce more discriminative feature weights for text categorization, outperforming existing frequency-based criteria.

Kernel Graph Convolutional Neural Networks

Giannis Nikolentzos, Polykarpos Meladianos, Antoine J.-P. Tixier, Konstantinos Skianis, Michalis Vazirgiannis
Conference Paper ICANN 2018, Rhodes, Greece


Graph kernels have been successfully applied to many graph classification problems. Typically, a kernel is first designed, and then an SVM classifier is trained based on the features defined implicitly by this kernel. This two-stage approach decouples data representation from learning, which is suboptimal. On the other hand, Convolutional Neural Networks (CNNs) have the capability to learn their own features directly from the raw data during training. Unfortunately, they cannot handle irregular data such as graphs. We address this challenge by using graph kernels to embed meaningful local neighborhoods of the graphs in a continuous vector space. A set of filters is then convolved with these patches, pooled, and the output is then passed to a feedforward network. With limited parameter tuning, our approach outperforms strong baselines on 7 out of 10 benchmark datasets.

Data Management for the World Wealth & Income Database

Christos Giatsidis, Antonis Skandalis, Konstantinos Skianis, Michalis Vazirgiannis
Facundo Alvaredo, Lucas Chancel, Thomas Piketty, Emmanuel Saez, Gabriel Zucman
Ben Grillet, Francois Prosper, Brice Terdjman, Anthony Veyssiere
Conference Poster ParisBD 2017, Paris, France


The World Wealth & Income Database is a project that publishes data related to inequality around the world. It is an open portal that distributes time series about economic concepts such as wealth, income, etc., where users can select time series of interest based on multi-attribute queries. Most of the components of the project are hosted in the cloud using Amazon Web Services. The data management functionality of the project consists of two major parts: a) a modern relational database that uses state-of-the-art indexing techniques along with JSON features and b) a web API through which the database can be accessed and where some data transformations take place. To reduce latency across the globe, the project is currently deployed at two sites (EU and US). Currently the database holds data for 319 geographical regions (countries, continents, states), with about 150 combinations of attributes for each region over 50 years on average. The project is a joint collaboration between the World Inequality Lab at Paris School of Economics, the DaSciM team at Laboratoire d’Informatique de l’X and WEDODATA.

SpreadViz: Analytics and Visualization of Spreading Processes in Social Networks

Konstantinos Skianis, Maria Evgenia G. Rossi, Fragkiskos D. Malliaros, Michalis Vazirgiannis
Demo Paper ICDM 2016, Barcelona, Spain


In this paper, we propose SpreadViz, a web tool for exploration and visualization of spreading properties in social networks. SpreadViz consists of three main modules, namely graph exploration and analytics, detection of influential nodes, and interactive visualization. More precisely, SpreadViz offers the following functionalities: (i) It computes and visualizes various centrality criteria towards understanding how the position of a node in the network affects its spreading properties; (ii) It offers a wide range of criteria for the detection of single and multiple influential nodes and comparison among them; (iii) It effectively visualizes the spread of influence in the network as well as the performance of each method. In our demonstration, we invite the audience to interact with SpreadViz, exploring, analyzing, and visualizing the spreading processes over various real-world social networks.

Regularizing Text Categorization with Clusters of Words

Konstantinos Skianis, Francois Rousseau, Michalis Vazirgiannis
Conference Paper EMNLP 2016, Austin, USA


Regularization is a critical step in any supervised learning problem, crucial not only for addressing overfitting but also for taking into account any prior knowledge we may have about the problem features and their relationships. In this paper we explore state-of-the-art structured regularizers for textual data and we propose novel ones based on topics from LSI, clusters from word2vec, and the graph-of-words document representation. We show that for text categorization our proposed regularizers are faster than the state-of-the-art ones while also improving classification accuracy.

GoWvis: A web application for Graph-of-Words-based text visualization and summarization

Antoine J.-P. Tixier, Konstantinos Skianis, Michalis Vazirgiannis
Demo Paper ACL 2016, Berlin, Germany


We introduce GoWvis, an interactive web application that represents any piece of text inputted by the user as a Graph-of-Words and leverages graph degeneracy and community detection to generate an extractive summary (keyphrases and paragraph) of the inputted text in an unsupervised fashion. The entire analysis can be fully customized via the tuning of many text preprocessing, graph building, and graph mining parameters. Our system is thus well suited to educational purposes, exploration and early research experiments. The new summarization strategy we propose also shows promise.

Graph-Based Term Weighting for Text Categorization

Fragkiskos Malliaros, Konstantinos Skianis
Workshop Paper Someris, ASONAM 2015, Paris, France


Text categorization is an important task with plenty of applications, ranging from sentiment analysis to automated news classification. In this paper, we introduce a novel graph-based approach for text categorization. Contrary to the traditional Bag-of-Words model for document representation, we consider a model in which each document is represented by a graph that encodes relationships between the different terms. The importance of a term to a document is indicated using graph-theoretic node centrality criteria. The proposed weighting scheme is able to meaningfully capture the relationships between the terms that co-occur in a document, creating feature vectors that can improve the categorization task. We perform experiments on well-known document collections, applying popular classification algorithms. Our preliminary results indicate that the proposed graph-based weighting mechanism is able to outperform existing frequency-based term weighting criteria, under appropriate parameter settings.
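
The idea of weighting terms by node centrality in a document graph can be sketched in a few lines. The example below is a minimal sketch, not the paper's implementation: it assumes an undirected co-occurrence graph built with a fixed sliding window and uses normalized degree as the centrality criterion. The function names, the window size, and the sample sentence are hypothetical choices made for illustration.

```python
from collections import defaultdict


def graph_of_words(tokens, window=3):
    """Build an undirected co-occurrence graph.

    Terms are nodes; an edge links two distinct terms that appear
    within a sliding window of `window` consecutive tokens.
    """
    neighbors = defaultdict(set)
    for i, term in enumerate(tokens):
        neighbors[term]  # ensure terms without edges still appear as nodes
        for other in tokens[i + 1 : i + window]:
            if other != term:
                neighbors[term].add(other)
                neighbors[other].add(term)
    return neighbors


def degree_centrality_weights(tokens, window=3):
    """Weight each term by its degree divided by (number of nodes - 1)."""
    graph = graph_of_words(tokens, window)
    n = len(graph)
    if n <= 1:
        return {term: 0.0 for term in graph}
    return {term: len(adj) / (n - 1) for term, adj in graph.items()}


# Hypothetical tokenized document.
tokens = "graph based term weighting for text categorization".split()
weights = degree_centrality_weights(tokens, window=3)
```

Unlike raw term frequency, a term's weight here depends on how many distinct terms it co-occurs with, so a word repeated in the same context gains nothing from the repetition. Other centrality criteria could be substituted for degree without changing the graph construction.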

Learning for Text and Graph Data - 2017, Learning for Text and Graph Data (2016-2017)

Licence, Master M2

The courses provide an introduction to advanced machine learning and combinatorial methods for large-scale text and graph data. The course syllabus included:

  • Advanced graph kernels and classification, clustering / community mining (Louvain, modularity, degeneracy)
  • Influence maximization models (SIR/SIS, LT, IC, …), degeneracy-based spreader selection
  • Graph of words advanced topics: tw-icw, graph kernels for document similarity, graph based regularization for text classification
  • Word embeddings, Unsupervised document classification with the Word Mover’s Distance, WMD vs cosine similarity
  • Deep learning for NLP, Supervised document classification (TF-IDF vs TW-IDF)
  • Keyword extraction for summarization: graph-based keyword extraction, summarization (offline, online), Filippova’s word graph for multi-sentence fusion

  • April 2019

    6th place, National Bank of Greece Race 2019

    National Bank of Greece

    Training Greek word embeddings from legal texts. You can find more info here.

  • April 2016

    2nd place, Fintech Crowdhackathon 2016

    National Bank of Greece

    Our team (RSK project) took 2nd place in the Fintech Crowdhackathon organized by the National Bank of Greece. We built a platform that detects fraudulent e-transactions using Deep Learning. You can find more info here.

  • January 2015

    2nd place, Dreem challenge 2015

    Inclass Kaggle

    Trying to analyze dreams. During deep sleep, crucial mechanisms occur: memory consolidation, cellular regeneration, growth hormone release, and biological clock reset. A lack of deep sleep impairs memory, focus and judgment during work. DREEM introduces a way to increase the duration and quality of deep sleep to ensure optimal performance.

  • May 2013

    6th place, Data Mining Cup 2013


    Our team from the Department of Informatics, consisting of undergraduate students G. Papoutsakis, G. Zografos, G. Theofilis and myself, and PhD students M. Karkali and S. Thomaidou, took 6th place in the Data Mining Cup 2013 competition.

    A total of 99 entries came from 77 universities all over the world.

  • April 2013

    Top 25%, Employee Access Challenge 2013


    The objective of this competition is to build a model, learned using historical data, that will determine an employee's access needs, such that manual access transactions (grants and revokes) are minimized as the employee's attributes change over time. The model will take an employee's role information and a resource code and will return whether or not access should be granted.

Laboratoire d'Informatique (LIX), École Polytechnique
Bâtiment Alan Turing, 1 Rue Honoré d'Estienne d'Orves
Campus de l'École Polytechnique
91120 Palaiseau, France
Office 1071
  • kskianis at
  • rob.cs.aueb at
  • kostas.skianis
  • #kskianis
  • My LinkedIn profile