Elias (Ilias) Iosif received his PhD degree from the School of ECE, Technical University of Crete, Greece, in Oct. 2013. He also holds a five-year Diploma (2005) and an MS degree (2007) from the same department. From 2007 to Sep. 2013 he was a research assistant at the School of ECE, Technical University of Crete, Greece. Since Jan. 2014, he has been a post-doctoral researcher affiliated with the School of ECE, National Technical University of Athens, Greece, and the "Athena" Research and Innovation Center in Information, Athens, Greece. His core research area is lexical semantics, focusing on Distributional Semantic Models (DSMs). As part of his PhD research he proposed a network-based DSM approach for computing the semantic similarity between words. His recent interests include extending this network approach towards multimodal/multilingual semantic networks, and creating a compositional framework for computing the similarity between larger textual fragments, e.g., phrases and sentences. Parts of his work have been applied to various tasks, including the affective analysis of text and the creation of social networks from web-harvested data. He has also applied DSMs and affective text analysis to spoken dialogue systems in the framework of two recent EC-funded projects: PortDial (www.portdial.eu) and SpeDial (www.spedial.eu).
Doctor of Philosophy, 2013
Thesis: "Network-Based Distributional Semantic Models"
Advisor: Prof. Alexandros Potamianos
Master of Science, 2007
Thesis: "Unsupervised Induction of Semantic Classes Using Semantic Similarity Metrics"
Advisor: Prof. Alexandros Potamianos
Five-year Diploma, 2005
Thesis: "Automatic Derivation of Ontologies from Text"
Advisor: Prof. Alexandros Potamianos
Semantic similarity matrices are available for four dictionaries organized by part-of-speech: nouns, verbs, adjectives, and adverbs.
Download the dictionaries that include nouns (4000 entries), verbs (1434 entries), adjectives (1427 entries), and adverbs (334 entries): dictionaries (28K).
For each part of speech, two similarity matrices were computed using two network-based similarity metrics: maximum similarity and sum of squared similarities. Each metric was computed using both the Dice coefficient and Google-based semantic relatedness for several neighborhood sizes (numbers of neighbors): 30, 50, 100, and 150. Download similarity matrices for: adverbs (4.1M), verbs (79M), adjectives (84M), nouns (4000 entries) (640M), nouns (5884 entries) (348M). For the 5884-noun matrix only a neighborhood size of 100 was used; the corresponding dictionary is also available.
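The two baseline measures and the neighborhood idea can be illustrated with a short sketch. It assumes the Dice coefficient is computed from hit counts as 2·f(a,b) / (f(a) + f(b)), and Google-based relatedness as exp(−2·NGD), where NGD is the normalized Google distance and `n_index` is an assumed total index size. The `max_similarity` function is a hypothetical reading of the maximum-similarity metric (best match between one word and the top-n neighbors of the other, symmetrized); it is not the exact formulation behind the released matrices, and the sum-of-squared-similarities metric is not sketched here.

```python
import math
from typing import List, Sequence


def dice(f_a: int, f_b: int, f_ab: int) -> float:
    """Dice coefficient from hit counts: 2*f(a,b) / (f(a) + f(b))."""
    if f_a + f_b == 0:
        return 0.0
    return 2.0 * f_ab / (f_a + f_b)


def google_relatedness(f_a: int, f_b: int, f_ab: int, n_index: int) -> float:
    """Assumed Google-based relatedness: exp(-2 * NGD)."""
    if f_a == 0 or f_b == 0 or f_ab == 0:
        return 0.0
    la, lb, lab, ln = (math.log(x) for x in (f_a, f_b, f_ab, n_index))
    ngd = (max(la, lb) - lab) / (ln - min(la, lb))
    return math.exp(-2.0 * ngd)


def neighbors(sim: Sequence[Sequence[float]], i: int, n: int) -> List[int]:
    """Indices of the n words most similar to word i (excluding i itself)."""
    others = [j for j in range(len(sim)) if j != i]
    return sorted(others, key=lambda j: sim[i][j], reverse=True)[:n]


def max_similarity(sim: Sequence[Sequence[float]], i: int, j: int, n: int) -> float:
    """Hypothetical network metric: best match of each word against the
    other word's top-n neighborhood, taking the larger of the two."""
    a = max((sim[i][x] for x in neighbors(sim, j, n) if x != i), default=0.0)
    b = max((sim[j][y] for y in neighbors(sim, i, n) if y != j), default=0.0)
    return max(a, b)
```

Here `sim` stands for any baseline similarity matrix (e.g., Dice- or Google-based); the network metric only re-ranks words through their neighborhoods, so the baseline measure is interchangeable.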
An alphabetically ordered list of 8752 nouns extracted from the SemCor3 corpus. This list comes as a single file consisting of 8752 lines. Each line has two space-separated fields: (i) the lexical form of the noun, and (ii) a unique index. Download vocabulary (76K).
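A minimal loader for the vocabulary file described above might look like this. It assumes UTF-8 text and splits each line on its last whitespace run, so the noun form is everything before the index. The file name `vocabulary.txt` in the usage note is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def parse_vocabulary(lines: Iterable[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Parse 'noun index' lines into noun->index and index->noun maps."""
    noun_to_idx: Dict[str, int] = {}
    idx_to_noun: Dict[int, str] = {}
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate a trailing empty line
        fields = line.rsplit(None, 1)  # split on the last whitespace only
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 fields")
        noun, idx = fields[0], int(fields[1])
        noun_to_idx[noun] = idx
        idx_to_noun[idx] = noun
    return noun_to_idx, idx_to_noun


def load_vocabulary(path: str) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Read the vocabulary file (assumed UTF-8) from disk."""
    with open(path, encoding="utf-8") as fh:
        return parse_vocabulary(fh)
```

Usage: `noun_to_idx, idx_to_noun = load_vocabulary("vocabulary.txt")`; the index-to-noun map is what later resolves the numeric file names of the similarities repository back to nouns.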
SemSim corpus: snippets of web documents for the network vocabulary. For each noun, up to 1000 snippets were downloaded. The corpus is organized into sub-corpora, one file per noun. Download corpus (819M).
Similarities repository: this repository includes the pairwise similarities of the network's nouns. The similarities were estimated over the above corpus for several values of the context window H. For a given H value, the similarity scores of each noun are stored in a separate file, i.e., 8752 files are available. These files are named after the corresponding noun indices. In particular, the similarity scores are represented as follows. Consider the similarity file of the noun with index i, e.g., "i.sims". The j-th row of "i.sims" holds the similarity between the nouns with indices i and j.
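Looking up a single pair then amounts to reading one file and one row. The sketch below assumes each row of an "i.sims" file holds a single numeric score, that noun indices start at 1 (so row j is list position j − 1), and that each H value has its own directory; the directory argument `sims_dir` is hypothetical.

```python
from pathlib import Path
from typing import List


def parse_sims(text: str) -> List[float]:
    """Parse the contents of an 'i.sims' file (assumed: one score per row)."""
    return [float(tok) for tok in text.split()]


def similarity(sims_dir: str, i: int, j: int) -> float:
    """Similarity between nouns i and j, read from '<sims_dir>/i.sims'.
    Assumes 1-based noun indices, so row j is list position j - 1."""
    text = (Path(sims_dir) / f"{i}.sims").read_text(encoding="utf-8")
    return parse_sims(text)[j - 1]
```

Because each file covers one noun against the whole vocabulary, caching the parsed list is worthwhile when many pairs share the same i.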
Baseline context-based similarities. Similarity scores are available for the following values of contextual window size H: H=1 (226M), H=2 (229M), H=3 (232M), H=5 (234M).
Baseline context-based similarities normalized using local normalization (N-normalization) for N=100. Similarity scores are available for the following values of contextual window size H: H=1 (256M), H=2 (255M), H=3 (254M), H=5 (253M).
Baseline context-based similarities normalized using global normalization (Z-normalization). The statistics of the similarities, i.e., mean and variance, were computed across the entire network. Similarity scores are available for the following values of contextual window size H: H=1 (288M), H=2 (291M), H=3 (292M), H=5 (292M).
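Global Z-normalization follows directly from the description: subtract the network-wide mean and divide by the network-wide standard deviation. The local variant is less fully specified here, so `n_normalize_local` is a hypothetical reading that uses the mean and standard deviation of a noun's N highest scores; whether self-similarities are excluded in the released files is also not stated, so the sketch simply normalizes whatever rows it is given.

```python
import statistics
from typing import Dict, List


def _zscore(values: List[float], mu: float, sigma: float) -> List[float]:
    """Standardize values; a zero spread maps everything to 0.0."""
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def z_normalize_global(sims: Dict[int, List[float]]) -> Dict[int, List[float]]:
    """Z-normalization with mean/std computed over all scores in the network."""
    all_scores = [s for row in sims.values() for s in row]
    mu = statistics.fmean(all_scores)
    sigma = statistics.pstdev(all_scores)
    return {i: _zscore(row, mu, sigma) for i, row in sims.items()}


def n_normalize_local(row: List[float], n: int = 100) -> List[float]:
    """Hypothetical local normalization: statistics of the n highest
    scores of this noun's row are used to standardize the whole row."""
    top = sorted(row, reverse=True)[:n]
    return _zscore(row, statistics.fmean(top), statistics.pstdev(top))
```

The practical difference is the reference population: global statistics make scores comparable across all nouns, while local statistics rescale each noun relative to its own nearest neighbors.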
Download the data (90K) including associative and semantic pairs, and the corresponding priming coefficients.
The first tool, CParse, is a Perl script that parses a corpus and creates feature vectors. The second tool, CosSim, takes the feature vectors as input and computes word semantic similarities. The tools are described in the LREC'12 paper entitled "SemSim: Resources for Normalized Semantic Similarity Computation Using Lexical Networks". Download the tools (16K).
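The two-stage pipeline (corpus to feature vectors, then vectors to similarities) can be sketched as follows. This is not a port of CParse or CosSim: it assumes tokenized sentences, raw co-occurrence counts within a window of ±H words as features, and cosine similarity between the resulting vectors. The released tools may weight or filter features differently.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set


def context_vectors(sentences: Iterable[List[str]], vocab: Set[str],
                    h: int) -> Dict[str, Counter]:
    """Count, for each target word, the words within +/- h positions."""
    vectors: Dict[str, Counter] = {w: Counter() for w in vocab}
    for tokens in sentences:
        for pos, word in enumerate(tokens):
            if word not in vocab:
                continue
            lo, hi = max(0, pos - h), min(len(tokens), pos + h + 1)
            for k in range(lo, hi):
                if k != pos:  # skip the target word itself
                    vectors[word][tokens[k]] += 1
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(c * v[f] for f, c in u.items() if f in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

For example, with H=1 the sentences "the cat sat" and "the dog sat" give "cat" and "dog" identical context vectors, so their cosine similarity is 1.0; larger H values pull in more distant, typically more topical, context.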
Proud to be a member of the Tweester team that won 1st place (among 19 groups) in the Twitter sentiment analysis task of the SemEval 2016 competition. You may also read a related article published in the local press (in Greek). March 2016