NCTA 2021 Abstracts


Full Papers
Paper Nr: 8
Title:

Medoid-based MLP: An Application to Wood Sawing Simulator Metamodeling

Authors:

Sylvain Chabanet, Philippe Thomas and Hind B. El-Haouzi

Abstract: Predicting the set of lumbers which would be obtained from sawing a log at a specific sawmill is a difficult problem, which complicates short- and mid-term decision making in this industry. While sawmill simulators able to simulate the sawing of a log from a 3D scan of its outer shape exist, they can be extremely computationally intensive. Several alternative approaches based on machine learning algorithms and different sets of features were explored in previous works. This paper proposes the use of one-hidden-layer perceptrons and a vector of features built from the dissimilarities between the scans and a set of selected wood logs, chosen as the class medoids. Several architectures are tested and compared to validate the pertinence of the proposed set of medoid-based features. The lowest mean squared error was obtained for MISO neural networks with a sigmoid output activation function, used to constrain the range of the output values.
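
The medoid-based feature idea can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: logs are reduced to a single hypothetical number, and `abs_diff` stands in for whatever scan dissimilarity the paper actually uses.

```python
from typing import Callable, Dict, List

Dissim = Callable[[float, float], float]


def class_medoid(items: List[float], dissim: Dissim) -> float:
    """Return the item with the smallest summed dissimilarity to its class."""
    return min(items, key=lambda a: sum(dissim(a, b) for b in items))


def medoid_features(x: float, medoids: List[float], dissim: Dissim) -> List[float]:
    """Describe x by its dissimilarity to each class medoid."""
    return [dissim(x, m) for m in medoids]


def abs_diff(a: float, b: float) -> float:
    # Hypothetical stand-in for a dissimilarity between two 3D log scans.
    return abs(a - b)


# Toy data: each log is summarised by one made-up value per class.
logs_by_class: Dict[str, List[float]] = {
    "small": [10.0, 11.0, 15.0],
    "large": [30.0, 32.0, 31.0],
}

medoids = [class_medoid(logs, abs_diff) for logs in logs_by_class.values()]
features = medoid_features(12.0, medoids, abs_diff)
print(medoids, features)
```

The resulting feature vector has one entry per class medoid and could then be fed to a small perceptron.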

Paper Nr: 9
Title:

Unsupervised Grammatical Pattern Discovery from Arabic Extra Large Corpora

Authors:

Adelle Abdallah, Hussein Awdeh, Youssef Zaki, Gilles Bernard and Mohammad Hajjar

Abstract: Many methods have been applied to automatic construction or expansion of lexical semantic resources. Most follow the distributional hypothesis applied to the lexical context of words, eliminating grammatical context (stopwords). This paper shows that the grammatical context can yield information about semantic properties of words, if the corpus is large enough. In order to do this, we present an unsupervised pattern-based model building semantic word categories from large corpora, devised for resource-poor languages. We divide the vocabulary into high-frequency and lower-frequency items, and explore the patterns formed by high-frequency items in the neighborhood of lower-frequency words. Word categories are then created by clustering. This is done on a very large Arabic corpus, and, for comparison, on a large English corpus; results are evaluated with direct and indirect evaluation methods. We compare the results with state-of-the-art lexical models for performance and for computation time.

Paper Nr: 11
Title:

Relaxed Dissimilarity-based Symbolic Histogram Variants for Granular Graph Embedding

Authors:

Luca Baldini, Alessio Martino and Antonello Rizzi

Abstract: Graph embedding is an established and popular approach when designing graph-based pattern recognition systems. Amongst the several strategies, in the last ten years, Granular Computing emerged as a promising framework for structural pattern recognition. In the late 2000s, symbolic histograms were proposed as the driving force for the graph embedding procedure, which counts the number of times each granule of information appears in the graph to be embedded. Similarly to a bag-of-words representation of a text corpus, symbolic histograms were originally conceived as integer-valued vectorial representations of the graphs. In this paper, we propose six ‘relaxed’ versions of symbolic histograms, where the proper dissimilarity values between the information granules and the constituent parts of the graph to be embedded are taken into account, information which is discarded in the original symbolic histogram formulation due to the hard-limited nature of the counting procedure. Experimental results on six open-access datasets of fully-labelled graphs show comparable performance in terms of classification accuracy with respect to the original symbolic histograms (average accuracy shift ranging from -7% to +2%), counterbalanced by a great improvement in terms of number of resulting information granules, hence number of features in the embedding space (up to 75% fewer features, on average).
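
The contrast between hard counting and a relaxed embedding can be sketched as follows. This is only an illustration: substructures are replaced by integers, the dissimilarity and threshold are made up, and the relaxation shown (keeping the smallest dissimilarity per granule) is an assumed example, not necessarily one of the six variants proposed in the paper.

```python
from typing import Callable, List

Dissim = Callable[[int, int], int]


def hard_histogram(parts: List[int], granules: List[int],
                   dissim: Dissim, threshold: int) -> List[int]:
    """Count, per granule, the graph parts that match it within a threshold."""
    return [sum(1 for p in parts if dissim(p, g) <= threshold) for g in granules]


def relaxed_histogram(parts: List[int], granules: List[int],
                      dissim: Dissim) -> List[int]:
    """Assumed relaxation: keep the smallest dissimilarity to each granule."""
    return [min(dissim(p, g) for p in parts) for g in granules]


def abs_diff(a: int, b: int) -> int:
    # Hypothetical stand-in for a dissimilarity between subgraphs.
    return abs(a - b)


graph_parts = [1, 2, 9]   # toy "constituent parts" of one graph
granules = [1, 5, 10]     # toy "information granules"

hard = hard_histogram(graph_parts, granules, abs_diff, threshold=1)
relaxed = relaxed_histogram(graph_parts, granules, abs_diff)
print(hard, relaxed)
```

The hard version discards how close each match was, while the relaxed version keeps that information in the embedding vector.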

Paper Nr: 18
Title:

Revising Conceptual Similarity by Neural Networks

Authors:

Arianna Pavone and Alessio Plebe

Abstract: Similarity is an excellent example of a domain-general source of information. Even when we do not have specific knowledge of a domain, we can use similarity as a default method to reason about it. Similarity also plays a significant role in psychological accounts of problem solving, memory, prediction, and categorisation. However, despite the strong presence of similarity judgments in our reasoning, a general conceptual model of similarity has yet to be agreed upon. In this paper, we propose an alternative, unifying solution to this challenge in concept research based on Eliasmith’s recent theory of biological cognition. Specifically, we introduce the Semantic Pointer Model of Similarity (SPMS), which describes concepts in terms of processes involving a recently postulated class of mental representations called semantic pointers. We discuss how such a model is in accordance with the main guidelines of most traditional models known in the literature, on the one hand, and gives a solution to most of the criticisms against these models, on the other. We also present some preliminary experimental evaluation in order to support our theory and verify whether similarities derived from human judgments can be compatible with the SPMS.

Paper Nr: 21
Title:

Accurate 3D Object Detection from Point Cloud Data using Bird’s Eye View Representations

Authors:

Nerea Aranjuelo, Guus Engels, David Montero, Marcos Nieto, Ignacio Arganda-Carreras, Luis Unzueta and Oihana Otaegui

Abstract: In this paper, we show that accurate 3D object detection is possible using deep neural networks and a Bird’s Eye View (BEV) representation of the LiDAR point clouds. Many recent approaches propose complex neural network architectures to directly process the point cloud data. The good results obtained by these methods have left research on BEV-based approaches behind. However, BEV-based detectors can take advantage of the advances in the 2D object detection field and need to handle much less data, which is important in real-time automotive applications. We propose a two-stage object detection deep neural network, which takes BEV representations as input, and validate it on the KITTI BEV benchmark, outperforming state-of-the-art methods. In addition, we show how additional information can be added to our model to improve the accuracy of the smallest and most challenging object classes. This information can come from the same point cloud or an additional sensor’s data, such as the camera.
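
As a rough illustration of what a BEV representation is, the sketch below projects 3D points onto a ground-plane grid and keeps the maximum height per cell. The function name, ranges, cell size and the single height channel are all assumptions; the abstract does not specify the encoding the authors use.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres


def bev_height_map(points: List[Point],
                   x_range: Tuple[float, float],
                   y_range: Tuple[float, float],
                   cell: float) -> List[List[float]]:
    """Project points onto the ground plane, keeping the max height per cell.

    Cells start at 0.0, so empty cells and negative heights both read as 0.0.
    """
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    grid = [[0.0] * ny for _ in range(nx)]
    for x, y, z in points:
        inside = x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]
        if not inside:
            continue  # drop points outside the region of interest
        i = int((x - x_range[0]) / cell)
        j = int((y - y_range[0]) / cell)
        grid[i][j] = max(grid[i][j], z)
    return grid


# Toy point cloud: two points share a cell, one lies outside the grid.
cloud = [(0.5, -1.5, 0.2), (0.7, -1.2, 1.4), (3.2, 1.9, 0.5), (10.0, 0.0, 2.0)]
grid = bev_height_map(cloud, x_range=(0.0, 4.0), y_range=(-2.0, 2.0), cell=1.0)
occupied = sum(1 for row in grid for h in row if h > 0.0)
print(occupied, grid[0][0], grid[3][3])
```

The output is a small 2D image, which is why detectors designed for 2D images can be reused on it.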

Paper Nr: 27
Title:

Zero-Shot Action Recognition with Knowledge Enhanced Generative Adversarial Networks

Authors:

Kaiqiang Huang, Luis Miralles-Pechuán and Susan Mckeever

Abstract: Zero-Shot Action Recognition (ZSAR) aims to recognise action classes in videos that have never been seen during model training. In some approaches, ZSAR has been achieved by generating visual features for unseen classes based on the semantic information of the unseen class labels using generative adversarial networks (GANs). The problem is thereby converted to standard supervised learning, since the unseen visual features become accessible. This approach alleviates the lack of labelled samples of unseen classes. In addition, objects appearing in the action instances could be used to create enriched semantics of action classes and, therefore, increase the accuracy of ZSAR. In this paper, we consider using, in addition to the label, objects related to that action label. For example, the objects ‘horse’ and ‘saddle’ are highly related to the action ‘Horse Riding’ and these objects can bring additional semantic meaning. In this work, we aim to improve the GAN-based framework by incorporating object-based semantic information related to the class label with three approaches: replacing the class labels with objects, appending objects to the class, and averaging objects with the class. Then, we evaluate the performance using a subset of the popular dataset UCF101. Our experimental results demonstrate that our approach is valid: when appropriate objects are included in the action classes, the baseline is improved by 4.93%.
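
The three ways of combining object semantics with the class label (replacing, appending, averaging) might look like the sketch below when applied to word-embedding vectors. The toy 2-D vectors, the function names, and the choice to aggregate several objects by their mean are assumptions made for illustration, not details taken from the paper.

```python
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def replace_with_objects(label: Vector, objects: List[Vector]) -> Vector:
    # The class label embedding is dropped in favour of the objects.
    return mean_vector(objects)


def append_objects(label: Vector, objects: List[Vector]) -> Vector:
    # The object summary is concatenated to the label embedding.
    return label + mean_vector(objects)


def average_with_objects(label: Vector, objects: List[Vector]) -> Vector:
    # Label and objects are averaged into one vector of the same size.
    return mean_vector([label] + objects)


# Made-up embeddings for the action 'Horse Riding' and two related objects.
horse_riding = [1.0, 0.0]
related_objects = [[0.0, 1.0], [1.0, 1.0]]  # e.g. 'horse', 'saddle'

replaced = replace_with_objects(horse_riding, related_objects)
appended = append_objects(horse_riding, related_objects)
averaged = average_with_objects(horse_riding, related_objects)
print(replaced, appended, averaged)
```

Note that appending changes the dimensionality of the semantic vector, whereas replacing and averaging keep it fixed.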

Short Papers
Paper Nr: 4
Title:

Reward Prediction for Representation Learning and Reward Shaping

Authors:

Hlynur Davíð Hlynsson and Laurenz Wiskott

Abstract: One of the fundamental challenges in reinforcement learning (RL) is data efficiency: modern algorithms require a very large number of training samples, especially compared to humans, for solving environments with high-dimensional observations. The severity of this problem is increased when the reward signal is sparse. In this work, we propose learning a state representation in a self-supervised manner for reward prediction. The reward predictor learns to estimate either a raw or a smoothed version of the true reward signal in an environment with a single terminating goal state. We augment the training of out-of-the-box RL agents in single-goal environments with visual inputs by shaping the reward using our reward predictor during policy learning. Using our representation for preprocessing high-dimensional observations, as well as using the predictor for reward shaping, is shown to facilitate faster learning of Actor Critic using Kronecker-factored Trust Region and Proximal Policy Optimization.

Paper Nr: 12
Title:

Multivariate Short Term Load Forecasting Strategy: Application to Anomalous Days of ISO New England Data

Authors:

Innocent S. Duma and Bhekisipho Twala

Abstract: In this paper, we consider short-term electricity load forecasting, that is, forecasting from 1 hour to 7 days or a month ahead, which is typically used for the day-to-day operations of the utility industry, such as scheduling the generation and transmission of electric energy. This is a three-step process: (1) data preprocessing, which includes feature extraction, (2) modeling, and (3) model evaluation. Electrical load time series are non-stationary and notoriously noisy because of the variety of factors that affect the electrical markets. As a data preprocessing step to remove the white noise on the multivariate predictor variables (which include historical load, weather, and holidays), we perform multivariate denoising using wavelets and principal component analysis (MWPCA). In the modeling step, we propose three multivariate Bayesian Optimization (BO) based models, Random Forest (RF), Feedforward Neural Network (FFNN) and Long Short-Term Memory (LSTM) neural network, for day-ahead hourly load forecasting of the anomalous-day system load of the ISO New England grid. For model evaluation, we used three metrics: the Mean Absolute Percent Error (MAPE), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). All the trained models achieved superior results on the chosen evaluation metrics, most notably a MAPE of less than 1% on the data under study. The FFNN model outperformed both the RF and LSTM models.
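
For reference, the three evaluation metrics named in the abstract can be computed as in the sketch below, using their standard definitions. The load values are made up for illustration and are not results from the paper.

```python
import math
from typing import List


def mae(actual: List[float], predicted: List[float]) -> float:
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: List[float], predicted: List[float]) -> float:
    """Mean Absolute Percent Error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: List[float], predicted: List[float]) -> float:
    """Root Mean Square Error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical hourly loads and forecasts, in arbitrary units.
actual_load = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]

print(mae(actual_load, forecast))
print(mape(actual_load, forecast))
print(rmse(actual_load, forecast))
```

MAPE is scale-free, which makes it convenient for comparing forecasts across grids of different sizes, while MAE and RMSE stay in the units of the load itself.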

Paper Nr: 13
Title:

Object Detection with TensorFlow on Hardware with Limited Resources for Low-power IoT Devices

Authors:

Jurij Kuzmic and Günter Rudolph

Abstract: This paper presents several models for individual object detection with TensorFlow in a 2D image with Convolutional Neural Networks (ConvNets). Here, we focus on an approach for hardware with limited resources in the field of the Internet of Things (IoT). Additionally, our selected models are trained and evaluated using image data from a Unity 3D simulator as well as real data from the model-making area. First, related work is discussed. As is well known, a large amount of annotated training data is required for supervised learning of ConvNets. These annotated training data are automatically generated with the Unity 3D environment. The procedure for generating annotated training data is also presented in this paper. Furthermore, the different object detection models are compared to find a better and faster system for object detection on hardware with limited resources for low-power IoT devices. A comparison of the run time of the trained models, based on the experiments described in this paper, is presented. A transfer learning approach to object detection is also carried out. Finally, future research and work in this area are discussed.

Paper Nr: 14
Title:

Pervasive Hand Gesture Recognition for Smartphones using Non-audible Sound and Deep Learning

Authors:

Ahmed Ibrahim, Ayman El-Refai, Sara Ahmed, Mariam Aboul-Ela, Hesham M. Eraqi and Mohamed Moustafa

Abstract: Due to the mass advancement in ubiquitous technologies nowadays, new pervasive methods have come into practice to provide innovative features and stimulate research on new human-computer interactions. This paper presents a hand gesture recognition method that utilizes the smartphone’s built-in speakers and microphones. The proposed system emits an ultrasonic sonar-based signal (inaudible sound) from the smartphone’s stereo speakers, which is then received by the smartphone’s microphone and processed via a Convolutional Neural Network (CNN) for Hand Gesture Recognition. Data augmentation techniques are proposed to improve the detection accuracy, and three dual-channel input fusion methods are compared. The first method merges the dual-channel audio as a single input spectrogram image. The second method adopts early fusion by concatenating the dual-channel spectrograms. The third method adopts late fusion by having two convolutional input branches, each processing one of the dual-channel spectrograms, whose outputs are then merged by the last layers. Our experimental results demonstrate a promising detection accuracy for the six gestures presented in our publicly available dataset, with an accuracy of 93.58% as a baseline.

Paper Nr: 16
Title:

Augmenting Machine Learning with Flexible Episodic Memory

Authors:

Hugo Chateau-Laurent and Frédéric Alexandre

Abstract: A major cognitive function is often overlooked in artificial intelligence research: episodic memory. In this paper, we relate episodic memory to the more general need for explicit memory in intelligent processing. We describe its main mechanisms and its involvement in a variety of functions, ranging from concept learning to planning. We set the basis for a computational cognitive neuroscience approach that could result in improved machine learning models. More precisely, we argue that episodic memory mechanisms are crucial for contextual decision making, generalization through consolidation and prospective memory.

Paper Nr: 17
Title:

A Multi-agent Approach for Graph Classification

Authors:

Luca Baldini and Antonello Rizzi

Abstract: In this paper, we propose and discuss a prototypical framework for graph classification. The proposed algorithm (Graph E-ABC) exploits a multi-agent design, where a swarm of agents (orchestrated via evolutionary optimization) is in charge of finding meaningful substructures from the training data. The resulting set of substructures composes the pivotal entities for a graph embedding procedure that allows moving the pattern recognition problem from the graph domain towards the Euclidean space. In order to improve the learning capabilities, the pivotal substructures undergo an independent optimization procedure. The performance of Graph E-ABC is assessed via a sensitivity analysis over its critical parameters and compared against current approaches for graph classification. Results on five open-access datasets of fully labelled graphs show interesting performance in terms of accuracy, counterbalanced by a relatively high number of pivotal substructures.

Paper Nr: 20
Title:

An Approach to One-shot Identification with Neural Networks

Authors:

Janis Mohr, Finn Breidenbach and Jörg Frochte

Abstract: In order to optimise products and comprehend product defects, the production process must be traceable. Machine learning techniques are a modern approach, which can be used to recognise a product in every production step. The goal is a tool with the capability to specifically assign changes in a process step to an individual product or batch. In general, a machine learning system based on a Convolutional Neural Network (CNN) forms a vision subsystem to recognise individual products and return their designation. In this paper, an approach to identify objects which have only been seen once is proposed. For applications in production, the proposed approach is comparable in accuracy with existing solutions based on siamese networks. Furthermore, it is a lightweight architecture with some advantages regarding computation cost in the online prediction use case of some industrial applications. It is shown that, together with the described workflow and data augmentation, the method is capable of solving an existing industrial application.

Paper Nr: 22
Title:

Spatio-temporal Attention Mechanism and Knowledge Distillation for Lip Reading

Authors:

Shahd Elashmawy, Marian Ramsis, Hesham M. Eraqi, Farah Eldeshnawy, Hadeel Mabrouk, Omar Abugabal and Nourhan Sakr

Abstract: Despite the advancement in the domain of audio and audio-visual speech recognition, visual speech recognition systems are still quite under-explored due to the visual ambiguity of some phonemes. In this work, we propose a new lip-reading model that combines three contributions. First, the model front-end adopts a spatio-temporal attention mechanism to help extract the informative data from the input visual frames. Second, the model back-end utilizes sequence-level and frame-level Knowledge Distillation (KD) techniques that allow leveraging audio data during the visual model training. Third, a data preprocessing pipeline is adopted that includes facial landmark detection-based lip alignment. On the LRW lip-reading benchmark dataset, a noticeable accuracy improvement is demonstrated; the spatio-temporal attention, Knowledge Distillation, and lip-alignment contributions achieved 88.43%, 88.64%, and 88.37% respectively.

Paper Nr: 23
Title:

Hierarchical Relation Networks: Exploiting Categorical Structure in Neural Relational Reasoning

Authors:

Ruomu Zou and Constantine Dovrolis

Abstract: Organizing objects in the world into conceptual hierarchies is a key part of human cognition and general intelligence. It allows us to efficiently reason about complex and novel situations relying on relationships between object categories and hierarchies. Learning relationships among sets of objects from data is known as relation learning. Recent developments in this area using neural networks have enabled answering complex questions posed on sets of objects. Previous approaches operate directly on objects – instead of categories of objects. In this position paper, we make the case for reasoning at the level of object categories, and we propose the Hierarchical Relation Network (HRN) framework. HRNs first infer a category for each object to drastically decrease the number of relationships that need to be learned. An HRN consists of a number of distinct modules, each of which can be initialized as a simple arithmetic operation, a supervised or unsupervised model, or as part of a fully differentiable network. This approach demonstrates that categories in relational reasoning can allow for major reductions in training time, increased data efficiency, and better interpretability of the network’s reasoning process.

Paper Nr: 25
Title:

Speech Emotion Recognition using MFCC and Hybrid Neural Networks

Authors:

Youakim Badr, Partha Mukherjee and Sindhu M. Thumati

Abstract: Speech emotion recognition is a challenging task and feature extraction plays an important role in effectively classifying speech into different emotions. In this paper, we apply traditional feature extraction methods like MFCC for feature extraction from audio files. Instead of using traditional machine learning approaches like SVM to classify audio files, we investigate different neural network architectures. Our baseline model, implemented as a convolutional neural network, results in 60% classification accuracy. We propose a hybrid neural network architecture based on Convolutional and Long Short-Term Memory (ConvLSTM) networks to capture spatial and sequential information of audio files. Our experimental results show that our ConvLSTM model achieved an accuracy of 59%. We improved our model with data augmentation techniques and re-trained it with the augmented dataset. The classification accuracy reaches 91% for multi-class classification on the RAVDESS dataset, outperforming the accuracy of state-of-the-art multi-class classification models that used similar data.

Paper Nr: 26
Title:

Recurrent Neural Networks Analysis for Embedded Systems

Authors:

Gonçalo F. Neves, Jean-Baptiste Chaudron and Arnaud Dion

Abstract: Artificial Neural Networks (ANNs) are biologically inspired algorithms especially efficient for pattern recognition and data classification. In particular, Recurrent Neural Networks (RNNs) are a specific type of ANN which model and process sequences of data that have a temporal relationship. They thus offer interesting behavior for embedded systems applications such as autopilot systems. However, RNNs (and ANNs in general) are computationally intensive algorithms, especially when training the network. This calls for careful integration and proper analysis on the embedded systems that host these functionalities. We present in this paper an analysis of two types of Recurrent Neural Networks, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), and explain their architectures and characteristics. We propose our dedicated implementation, which is tested and validated on embedded system devices with a dedicated dataset.

Paper Nr: 5
Title:

Is It Possible to Recognize Apple Employees by Their LinkedIn Profile Picture?

Authors:

Thanakij Wanavit and Leslie Klieb

Abstract: Samples of images from the portraits on the profiles of members of the social media site LinkedIn who live in the Bay Area of San Francisco were collected and analyzed by the EmoPy package for the presence of seven emotions. A Random Forest classifier used these probabilities to predict whether the members were employed by Apple or not. Accuracy reached around 62%, compared with a naive baseline accuracy of 50%. An error analysis shows that this result is significant and robust. A connection between the data and Apple’s organizational culture is pointed out.

Paper Nr: 7
Title:

Overview of Arabic Sentence Corpora

Authors:

Hussein Awdeh, Adelle Abdallah, Gilles Bernard and Mohammad Hajjar

Abstract: The Arabic corpus, specifically the gold standard corpus, is an important part of Arabic Natural Language Processing. Described as a very large collection of texts stored on a computer, a corpus is considered the most important source for semantic and syntactic research; it can cover a single language (a monolingual corpus) or several (a multilingual corpus). Easy access to available corpora is therefore highly needed in the Natural Language Processing (NLP) research community, especially for languages such as Arabic. Currently, there is no easy way to access a comprehensive and updated list of available Arabic corpora. Our study in this paper aims to present the results of a recent survey conducted to identify the list of the available Arabic corpora, classified into categories, and their resources.

Paper Nr: 15
Title:

Benzene Prediction: A Comparative Study of ANFIS, LSTM and MLR

Authors:

Andreas Humpe, Holger Günzel and Lars Brehm

Abstract: It is generally recognized that road traffic emissions are a major health risk and responsible for a substantial share of death and disease in Europe. Although artificial intelligence methods have been used extensively for air pollution forecasting, there is little research on benzene prediction and the use of long short-term memory networks. Benzene is considered one of the pollutants of greatest concern in urban areas and has been linked to leukemia. This paper investigates the predictive power of adaptive neuro-fuzzy inference systems, long short-term memory networks and multiple linear regression models for one-hour-ahead benzene prediction in the city of Augsburg, Germany. The results of the analysis indicate that adaptive neuro-fuzzy inference systems have the best in-sample performance for benzene prediction, whereas long short-term memory networks and multiple linear regressions show similar predictive power. However, long short-term memory models have the best out-of-sample performance for one-hour-ahead benzene prediction. This supports the use of long short-term memory networks for benzene prediction in real emission forecasting applications.