Leeds Beckett University - City Campus,
Woodhouse Lane,
LS1 3HE
Dr Anatoliy Gorbenko
Reader
Anatoliy Gorbenko is a Reader with the School of Built Environment, Engineering and Computing. He is also a Visiting Professor with the Department of Computer Systems, Networks and Cybersecurity at the National Aerospace University of Ukraine.
About
Anatoliy Gorbenko is a Reader with the School of Built Environment, Engineering and Computing. He is also a Visiting Professor with the Department of Computer Systems, Networks and Cybersecurity at the National Aerospace University of Ukraine.
Anatoliy received his M.Eng. in computer engineering in 2000 and his Ph.D. in computer science in 2005 from the National Aerospace University, Kharkiv, Ukraine. He completed his D.Sc. habilitation and obtained a professorship with the Department of Computer Systems and Networks in 2012. From 2014 to 2016, he was Dean of the Aircraft Radio-technical Faculty and led the Service-Oriented Systems Dependability research group.
In 2017, Prof. Gorbenko joined the School of Computing, Creative Technologies and Engineering at Leeds Beckett University, UK.
Research interests
Anatoliy Gorbenko has a solid mathematical and engineering background and extensive experience in probability theory and stochastic processes, experimental data analysis and optimization techniques. His expertise includes the dependability and performance of distributed systems, SOA and clouds, as well as software vulnerability and intrusion tolerance.
Main research interests are:
- Logic-based neuro-symbolic machine learning;
- System research ensuring dependability, fault- and intrusion tolerance in distributed systems;
- Benchmarking dependability and performance of computer networks, SOA and Cloud Computing;
- Investigating, measuring and tolerating the uncertainty of the Internet and SOA;
- Investigating the CAP theorem's impact and performance trade-offs for Big Data solutions;
- Assessment and ensuring wireless networks performance and sustainability;
- Green computing and communications, HW and SW power consumption measurement.
Publications (67)
Using Tsetlin Machine for Decoding, Visualization and Minimization of Local Immune Fingerprints in Peritoneal Dialysis Infections
The immune system’s primary functions include recognizing invading pathogens, controlling infections, and restoring tissue integrity. However, definitive evidence showing that an individual’s local immune system can differentiate between bacterial pathogens to mount specific responses remains limited. In this study, we applied a machine learning (ML) approach to characterize immune responses in 82 peritoneal dialysis patients presenting with acute peritonitis. Immune profiles were obtained from peritoneal effluents on the day of infection onset, analyzing a comprehensive array of cellular and soluble markers, including local immune cell populations, inflammatory and regulatory cytokines, chemokines, and tissue damage-associated factors. Utilizing the Tsetlin Machine, a recently introduced logic-based ML algorithm, we identified pathogen-specific immune fingerprints for different bacterial groups, each characterized by distinct biomarker profiles. Unlike traditional black-box models, the Tsetlin Machine expresses immune responses as transparent logical rules, enabling visual interpretation and supporting timely, informed antibiotic selection based on a patient’s immune profile well before culture results become available. We also present during- and post-training techniques for feature and clause minimization to produce more concise rules, improve interpretability, and reduce computational cost. This capacity for transparent decision-making illustrates the potential of the Tsetlin Machine to analyze complex biomedical datasets and improve patient outcomes by delivering clear and actionable insights.
Optimising Tsetlin Machine for Traffic Sign Recognition: A Study of Image Pre-processing, Booleanisation and Ensemble Methods
This paper investigates the performance of the Tsetlin Machine (TM) on the German Traffic Sign Recognition Benchmark (GTSRB), a real-world dataset collected under diverse weather and lighting conditions. We examined the efficiency of different image Booleanisation techniques and ensembling methods, evaluating their impact on TM performance. Leveraging the regular structure of traffic signs, the TM demonstrated strong pattern recognition capabilities in noisy environments. Using adaptive mean thresholding for Booleanisation, a TM with only 100 clauses per class achieved 91.89% test accuracy and over 34,000 predictions per second, without GPU acceleration. Further improvements were achieved through ensemble learning across three data modalities: thresholded images, HOG descriptors, and Haar features, boosting accuracy to 96.34%, all without data augmentation. While the primary focus of the paper was not to achieve maximum accuracy, the results highlight TM’s practical viability for efficient, interpretable, and scalable visual classification and pattern recognition.
Hospital management plays a pivotal role in ensuring the efficient delivery of medical services, especially in the face of challenges posed by pandemics such as COVID-19. This paper explores the application of machine learning techniques in addressing the challenge of hospitalization during pandemics. Leveraging a comprehensive dataset sourced from the Mexican government, various supervised learning algorithms including Random Forest, Gradient Boosting, Support Vector Machine, K-Nearest Neighbors, and Multilayer Perceptron are trained and evaluated to discern factors contributing to hospitalizations. Feature importance analysis and dimensionality reduction techniques are employed to enhance the models' predictive performance. The best model was the Gradient Boosting algorithm, with an accuracy of 85.63% and an AUC score of 0.8696. The interpretability plots showed that pneumonia had a positive impact on the hospitalization prediction of the model. Our analysis indicates that women aged over 45 with pneumonia and concurrent COVID-19 exhibit the highest likelihood of hospitalization. This study underscores the potential of interpretable machine learning in aiding hospital managers to optimize resource allocation, manage hospitalization cases, and make data-driven decisions during pandemics.
Distributed NoSQL databases are a new type of data storage which offers configurable levels of consistency so that data can scale across many geographically distributed nodes. In order to achieve high system availability and quick responses, the architects of modern large-scale web applications such as Facebook, Twitter, etc. often decide to weaken consistency requirements. In this paper, we put forward the idea of using redundant read requests to further reduce response time and improve availability of replicated distributed systems operating at a relaxed consistency level. The proposed approach was implemented on a testbed Cassandra NoSQL cluster. Our evaluation results show that redundant reads can be considered an effective means of reducing the probability of extreme delays that regularly occur in distributed systems. In some scenarios, the proposed mechanism can not only improve system availability and minimize the worst-case execution time, but also reduce the average response time despite the increase in system workload.
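The redundant-read idea can be sketched as follows. This is an illustrative simulation only, not the paper's Cassandra implementation: the three-replica setup, the simulated heavy-tailed latencies and the function names are assumptions made for the example.

```python
import concurrent.futures
import random
import time

def read_replica(replica_id):
    """Simulate a replica read whose latency has a heavy tail."""
    delay = random.expovariate(1 / 0.02)  # mean 20 ms, occasional stragglers
    time.sleep(delay)
    return replica_id, delay

def redundant_read(replicas=3):
    """Issue the same read to several replicas and take the first answer,
    masking stragglers at the cost of extra load on the cluster."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=replicas) as pool:
        futures = [pool.submit(read_replica, i) for i in range(replicas)]
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        return next(iter(done)).result()

replica_id, latency = redundant_read()
print(f"first response from replica {replica_id} after {latency * 1000:.1f} ms")
```

Because the client keeps only the fastest of several independent samples, the probability of observing an extreme delay falls roughly as the product of the individual tail probabilities.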
Tsetlin Machine (TM) is a recent automaton-based algorithm for reinforcement learning. It has demonstrated competitive accuracy on many popular benchmarks while providing natural interpretability. Due to its logical underpinning, it is amenable to hardware implementation with faster performance and higher energy efficiency than conventional Artificial Neural Networks. This paper introduces a multi-layer architecture of Tsetlin Machines with the aim of further boosting TM performance via the adoption of a hierarchical feature learning approach. This is seen as a way of creating hierarchical logic expressions from original Boolean literals, surpassing single-layer TMs in their ability to capture more complex patterns and high-level features. In this work we demonstrate that the multi-layer TM considerably outperforms the single-layer TM architecture on several benchmarks while maintaining the ability to interpret its logic inference. However, it has also been shown that uncontrolled growth in the number of layers leads to overfitting.
Tsetlin Machine (TM) is a recent automaton-based algorithm for reinforcement learning. It has demonstrated competitive accuracy on many popular benchmarks while providing natural interpretability. Due to its logical underpinning, it is amenable to hardware implementation with faster performance and higher energy efficiency than conventional Artificial Neural Networks (ANNs). This paper provides an overview of the Tsetlin Machine architecture and its hyper-parameters as compared to ANNs. Furthermore, it gives practical examples of TM application for pattern recognition using the MNIST dataset as a case study. In this work we also prove the reproducibility of the TM learning process to confirm its trustworthiness and convergence in the light of the stochastic nature of TA reinforcement.
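At the core of the interpretability mentioned above is the fact that a TM clause is simply a conjunction of possibly negated input bits. A minimal sketch of clause evaluation follows; the encoding of literals as (index, negated) pairs is our own illustrative choice, not the TM reference implementation.

```python
def eval_clause(literals, x):
    """Evaluate a Tsetlin Machine style clause: an AND over literals,
    where each literal is an input bit taken as-is or negated.
    `literals` is a list of (bit_index, negated) pairs."""
    return all((not x[i]) if negated else bool(x[i]) for i, negated in literals)

# Clause "x0 AND NOT x2" over the Boolean input vector [1, 0, 0]:
print(eval_clause([(0, False), (2, True)], [1, 0, 0]))  # → True
```

A trained TM votes over many such clauses per class, which is why its decisions can be read back as human-checkable logic rules.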
Exception Analysis in Service-Oriented Architecture
Exception handling is one of the powerful means of achieving high dependability and fault tolerance in service-oriented architecture (SOA). The paper introduces the results of an experimental analysis of SOA-specific exceptions and of the factors affecting the availability and fault tolerance of Web Services implemented using different development kits. We discovered several types of failure domains and present the results of failure injection and exception analysis in SOA.
Green computing and communications in critical application domains: Challenges and solutions
Information and communication technologies (IT) and IT-based instrumentation and control (I&C) systems, hardware and software components are analyzed in the context of the 'green' paradigm. The Green IT paradigm is formulated as a Cartesian product of a pair 'external (E) and internal (I) aspects of IT and computer-based I&C systems' and a pair 'power (resource) consumption minimization (Rmin) and safety maximization (Smax)'. In the paper we discuss the main research challenges and development solutions, education and training activities in the domain of green and safe computing and communication. Finally, we report results of EU-TEMPUS projects in the area of safe and green ITs and define models of academia and industry cooperation for excellence, innovations and knowledge exchange. © 2013 IEEE.
Estimating throughput unfairness in a mixed data rate Wi-Fi environment
The paper discusses the throughput unfairness inherent in the very nature of mixed data rate Wi-Fi networks employing the random media access control technique CSMA/CA. This unfairness exhibits itself through the fact that slow clients consume more airtime to transfer a given amount of data, leaving less airtime for other clients. This decreases the overall network throughput and significantly degrades the performance of high data rate clients. In the paper we propose mathematical models that account for airtime unfairness and estimate wireless network throughput depending on the number of network connections and their data rates. These models show that all wireless clients achieve an equal throughput independent of the data rates they use: in mixed data rate Wi-Fi networks, a client's throughput approximates to the data rate of the slowest client. © 2013 IEEE.
Software Engineering for Resilient Systems
Adaptive WiFi systems: Principles of design and application support
This article is based on the analysis of an adaptive approach within the concept of Green Communication, which encourages the reduction of energy consumption. Although the standards' specifications ignore the dynamic properties of subscribers, they still allow parameters to be modified using adaptation methods as a natural reaction to the environment. Beyond these modifications, both static and dynamic scenarios, along with their limitations, are briefly examined. To highlight the advantages and disadvantages of each method and to better comprehend the relation between the parameters and the results, this paper discusses methods of adapting and controlling the parameters of access points within the principles of adaptation. In addition, a software simulator for adaptive wireless access points is presented. © 2013 IEEE.
Review and comparative analysis of mini- and micro-UAV autopilots
In the paper we discuss the main functional characteristics and distinctive implementation features of modern unmanned aerial vehicle autopilots. We consider eight different autopilots, including MicroPilot's MP2x28, UAV Navigation's Vector, Moog Crossbow's GNC1000, etc., and compare their performance, communication interfaces, types of controlled vehicles and sets of supported functions.
Secure Hybrid Clouds: Analysis of Configurations Energy Efficiency
The paper examines the energy efficiency of running computational tasks in hybrid clouds considering data privacy and security aspects. As a case study we examine CPU-demanding high-tech methods of radiation therapy of cancer. We introduce mathematical models comparing the energy efficiency of running our case study application in the public cloud, in a private data center and on a personal computer. These models employ Markov chains and queueing theory together to estimate energy and performance attributes of different configurations of the computational environment. Finally, a concept of a Hybrid Cloud which effectively distributes computational tasks among public and secure private clouds depending on their security requirements is proposed in the paper.
Self-adaptive mobile wireless hotspot zones: Initial issues
This article presents the initial issues in the research and development of adaptive mobile access points, together with a software simulator built to study the processes occurring in a wireless network while taking different user behaviors into account. The basic concepts of the article are adaptive wireless networks that use mobile access points; the article focuses on the mathematical approach and description of the problem. This approach covers intersecting areas and non-intersecting frequencies in the system and is described by a number of formulas with ranging values. The results of this research are presented, followed by a number of suggestions for further research and improvement. © 2013 IEEE.
Evolution of von Neumann's paradigm: Dependable and green computing
The research and implementation issues in the areas of safety- and energy-critical SW-, HW- and FPGA-based systems are discussed in the context of John von Neumann's paradigm (VNP). The stages of the VNP evolution and related problems connected with the creation of dependable (and safe) systems out of undependable (and unsafe) components are analyzed. Aspects of the VNP development regarding resilient and green computing are described. A conception of green and safe computing is formulated. Features of the VNP application for green computing are analyzed. © 2013 IEEE.
Throughput estimation with regard to airtime consumption unfairness in mixed data rate Wi-Fi networks
The paper discusses the throughput unfairness inherent in the very nature of mixed data rate Wi-Fi networks employing the random media access control technique CSMA/CA. This unfairness exhibits itself through the fact that slow clients consume more airtime to transfer a given amount of data, leaving less airtime for other clients. This decreases the overall network throughput and significantly degrades the performance of high data rate clients. In the paper we propose mathematical models that account for airtime unfairness and estimate wireless network throughput depending on the number of network connections and their data rates. These models show that all wireless clients achieve an equal throughput independent of the data rates they use. We verify our theoretical findings by running a natural experiment and show that a client's throughput approximates to the data rate of the slowest client.
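The effect described above can be reproduced with a toy model. Assuming equal frame-transmission opportunities and equal frame sizes (a simplification of the paper's models), each client sends one frame per round, so per-client throughput is the harmonic combination of the rates; the function below and its numbers are illustrative only.

```python
def per_client_throughput(rates_mbps):
    """Toy CSMA/CA airtime model: each client transmits one equal-sized
    frame per round, so a round lasts sum(L / r_i) and every client's
    throughput is L / sum(L / r_i) = 1 / sum(1 / r_i), in Mbit/s."""
    return 1 / sum(1 / r for r in rates_mbps)

# One legacy 1 Mbit/s client pulls three 54 Mbit/s clients down
# to roughly the slow client's rate:
print(per_client_throughput([54, 54, 54, 1]))   # ≈ 0.95 Mbit/s each
print(per_client_throughput([54, 54, 54, 54]))  # 13.5 Mbit/s each
```

The model makes the claim concrete: as soon as one slow station joins, every station's throughput collapses toward the slowest data rate, because the slow station monopolises airtime.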
Time-Outing Internet Services
Uncertainty and response time instability can affect invoked Web services' usability, performance, trustworthiness, and dependability. To resolve uncertainty, researchers have applied a three-pronged approach. First, they remove uncertainty through advances in data collection, response time measurement, and benchmarking. Second, they employ a mathematical foundation for modeling uncertainty. Finally, they improve fault-tolerance techniques by making well-considered choices of time-outs and trade-offs between cost, availability, trustworthiness, and performance. © 2003-2012 IEEE.
Scenario-based Markovian modeling of web-system availability considering attacks on vulnerabilities
In the paper we simulate web-system availability taking into account security aspects and different maintenance scenarios. As a case study we have developed two Markov models. These models simulate the availability of a multi-tier web-system considering attacks on DNS vulnerabilities in addition to system failures due to hardware/software (HW/SW) faults. The proposed Markov models use attack rate and criticality as initial simulation parameters. In the paper we demonstrate how to estimate these parameters using open vulnerability databases (e.g. the National Vulnerability Database). We also define different vulnerability elimination (VE) scenarios and examine how they affect system availability.
The Impact of Consistency on System Latency in Fault Tolerant Internet Computing
The paper discusses our practical experience and theoretical results in investigating the impact of consistency on latency in distributed fault tolerant systems built over the Internet. Trade-offs between consistency, availability and latency are examined, as well as the role of the application timeout as the main determinant of the interplay between system availability and performance. The paper presents experimental results of measuring response time for replicated service-oriented systems that provide different consistency levels: ONE, ALL and QUORUM. These results clearly show that improvements in system consistency increase system latency. A set of novel analytical models is proposed that would enable quantified response time prediction depending on the level of consistency provided by a replicated system.
This experience report analyses performance of the Cassandra NoSQL database and studies the fundamental trade-off between data consistency and delays in distributed data storages. The primary focus is on investigating the interplay between the Cassandra performance (response time) and its consistency settings. The paper reports the results of the read and write performance benchmarking for a replicated Cassandra cluster, deployed in the Amazon EC2 Cloud. We present quantitative results showing how different consistency settings affect the Cassandra performance under different workloads. One of our main findings is that it is possible to minimize Cassandra delays and still guarantee the strong data consistency by optimal coordination of consistency settings for both read and write requests. Our experiments show that (i) strong consistency costs up to 25% of performance and (ii) the best setting for strong consistency depends on the ratio of read and write operations. Finally, we generalize our experience by proposing a benchmarking-based methodology for run-time optimization of consistency settings to achieve the maximum Cassandra performance and still guarantee the strong data consistency under mixed workloads.
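The coordination of read and write consistency settings mentioned above rests on the standard replica-overlap condition: with replication factor N, reads at level R and writes at level W are strongly consistent when R + W > N. A minimal sketch (the function name is ours, not a Cassandra API):

```python
def is_strongly_consistent(n, r, w):
    """Strong consistency holds when every read replica set must
    overlap every write replica set, i.e. R + W > N."""
    return r + w > n

# Replication factor 3: QUORUM reads + QUORUM writes overlap (2 + 2 > 3),
# while ONE + ONE leaves room for a stale read.
print(is_strongly_consistent(3, 2, 2))  # → True
print(is_strongly_consistent(3, 1, 1))  # → False
```

This is why the best strong-consistency setting depends on the read/write ratio: for N = 3, both (R=1, W=3) and (R=3, W=1) satisfy the condition, so a read-heavy workload favours pushing the cost onto writes and vice versa.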
This paper examines fundamental trade-offs in fault-tolerant distributed systems and replicated databases built over the Internet. We discuss interplays between consistency, availability, and latency which are in the very nature of globally distributed computer systems and also analyse their interconnection with durability and energy efficiency. In this paper we put forward an idea that consistency, availability, latency, durability and other properties need to be viewed as more continuous than binary in contrast to the well-known CAP/PACELC theorems. We compare different consistency models and highlight the role of the application timeout, replication factor and other settings that essentially determine the interplay between above properties. Our findings may be of interest to software engineers and system architects who develop Internet-scale distributed computer systems and cloud solutions.
Principles of Formal Methods Integration for Development of Fault-Tolerant Systems: Event-B and FME(C)A
Exploring Uncertainty of Delays as a Factor in End-to-End Cloud Response Time
This paper reports our experience in benchmarking a cloud-based web-service and investigates instability of its performance and the delays induced by the communication medium when measured from multiple client locations. We compare the performance of MS Azure, Go Grid and an in-house server running the same benchmark web service and analyze how the client and service implementation technologies affect its performance. The uncertainty discovered in the network delay affects the overall performance and dependability of cloud computing provisioning and requires specific resilience techniques. © 2012 IEEE.
A concept of distributed replicated data storages like Cassandra, HBase and MongoDB has been proposed to effectively manage Big Data sets whose volume, velocity, and variability are difficult to deal with using traditional Relational Database Management Systems. Trade-offs between consistency, availability, partition tolerance, and latency are intrinsic to such systems. Although relations between these properties have been previously identified by the well-known CAP theorem in qualitative terms, it is still necessary to quantify how different consistency and timeout settings affect system latency. The paper reports results of Cassandra's performance evaluation using the YCSB benchmark and experimentally demonstrates how read latency depends on the consistency settings and the current database workload. These results clearly show that stronger data consistency increases system latency, which is in line with the qualitative implication of the CAP theorem. Moreover, Cassandra latency and its variation considerably depend on the system workload. The distributed nature of such a system does not always guarantee that the client receives a response from the database within a finite time. If this happens, it causes so-called timing failures when the response is received too late or is not received at all. In the paper, we also consider the role of the application timeout, which is the fundamental part of all distributed fault tolerance mechanisms working over the Internet and is used as the main error detection mechanism here. The role of the application timeout as the main determinant in the interplay between system availability and responsiveness is also examined in the paper. It is quantitatively shown how different timeout settings could affect system availability and the average servicing and waiting time.
Although many modern distributed systems including Cassandra use static timeouts it was shown that the most promising approach is to set timeouts dynamically at run time to balance performance, availability and improve the efficiency of the fault-tolerance mechanisms.
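One common way to set timeouts dynamically, sketched below, derives the timeout from a moving window of observed response times (mean plus k standard deviations). This is an illustrative policy of our own choosing, not the specific algorithm from the paper.

```python
import statistics

class AdaptiveTimeout:
    """Derive the request timeout from recent response times
    (mean + k standard deviations) instead of a static constant."""

    def __init__(self, k=3.0, window=100, default=1.0):
        self.k, self.window, self.default = k, window, default
        self.samples = []

    def observe(self, response_time):
        """Record a response time, keeping only the last `window` samples."""
        self.samples.append(response_time)
        del self.samples[:-self.window]

    def timeout(self):
        """Current timeout; falls back to the default until enough data."""
        if len(self.samples) < 2:
            return self.default
        return statistics.mean(self.samples) + self.k * statistics.pstdev(self.samples)

t = AdaptiveTimeout()
for rt in (0.10, 0.12, 0.11, 0.13, 0.10):  # seconds
    t.observe(rt)
print(round(t.timeout(), 3))  # → 0.147
```

A static 1-second timeout would hide every failure for almost a full second here, while the adapted value of roughly 150 ms detects errors quickly yet still sits well above normal response times.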
This paper analyses security problems of modern computer systems caused by vulnerabilities in their operating systems. Our scrutiny of widely used enterprise operating systems focuses on their vulnerabilities by examining the statistical data available on how vulnerabilities in these systems are disclosed and eliminated, and by assessing their criticality. This is done by using statistics from both the National Vulnerability Database (NVD) and the Common Vulnerabilities and Exposures (CVE) system. The specific technical areas the paper covers are the quantitative assessment of forever-day vulnerabilities, the estimation of days-of-grey-risk, and the analysis of vulnerability severity and its distribution by attack vector and impact on security properties. In addition, the study aims to explore those vulnerabilities that have been found across a diverse range of operating systems. This leads us to analysing how different intrusion-tolerant architectures deploying operating system diversity impact availability, integrity and confidentiality.
Experience Report: Study of Vulnerabilities of Enterprise Operating Systems
This experience report analyses security problems of modern computer systems caused by vulnerabilities in their operating systems. An aggregated vulnerability database has been developed by joining vulnerability records from two publicly available vulnerability databases: the Common Vulnerabilities and Exposures (CVE) system and the National Vulnerability Database (NVD). The aggregated data allow us to investigate the stages of the vulnerability life cycle, vulnerability disclosure and the elimination statistics for different operating systems. The specific technical areas the paper covers are the quantitative assessment of vulnerabilities discovered and fixed in operating systems, the estimation of the time vendors spend on issuing patches, and the analysis of vulnerability criticality and the identification of vulnerabilities common to different operating systems.
Applying F(I)MEA-technique for SCADA-Based Industrial Control Systems Dependability Assessment and Ensuring
Dependability and security analysis of industrial control computer-based systems (ICS) is an open problem. An ICS is a complex system that, as a rule, consists of two levels, supervisory control and data acquisition (SCADA) and programmable logic controllers (PLC), and has vulnerabilities at both levels. This paper presents the results of SCADA-based ICS dependability and security analysis using a modification of the standardized FMEA (Failure Modes and Effects Analysis) technique. The modified technique takes possible intrusions into account and is called F(I)MEA (Failure (Intrusion) Modes and Effects Analysis). The F(I)MEA technique is applied to determine the weakest parts of an ICS and the required means of fault prevention, fault detection and fault tolerance. An example of applying the F(I)MEA technique to SCADA vulnerability analysis is provided. Solutions for SCADA-based ICS dependability improvement are proposed. © 2008 IEEE.
Using Inherent Service Redundancy and Diversity to Ensure Web Services Dependability
Achieving high dependability of Service-Oriented Architecture (SOA) is crucial for a number of emerging and existing critical domains, such as telecommunication, Grid, e-science, e-business, etc. One of the possible ways to improve this dependability is by employing service redundancy and diversity, represented by a number of component web services with identical or similar functionality at each level of the composite system hierarchy during service composition. Such redundancy can clearly improve web service reliability (trustworthiness) and availability. However, to apply this approach we need to solve a number of problems. The paper proposes several solutions for ensuring dependable service composition when using the inherent service redundancy and diversity. We discuss several composition models reflecting different dependability objectives (enhancement of service availability, responsiveness or trustworthiness), invocation strategies of redundant services (sequential or simultaneous) and procedures of response adjudication. © 2009 Springer Berlin Heidelberg.
Experimenting with exception propagation mechanisms in service-oriented architecture
Exception handling is one of the popular means used for improving dependability and supporting recovery in the Service-Oriented Architecture (SOA). This practical experience paper presents the results of error and fault injection into Web Services. We summarize our experiments with the SOA-specific exception handling features provided by two development kits: the Sun Microsystems JAX-RPC and the IBM WebSphere Software Developer Kit for Web Services. The main focus of the paper is on analyzing exception propagation and performance as the major factors affecting fault tolerance (in particular, error handling and fault diagnosis) in Web Services. Copyright 2008 ACM.
Dependable Composite Web Services with Components Upgraded Online
Achieving high dependability of Web Services (WSs) dynamically composed from component WSs is an open problem. One of the main difficulties here is due to the fact that the component WSs can and will be upgraded online, which will affect the dependability of the composite WS. The paper introduces the problem of component WS upgrade and proposes solutions for dependable upgrading in which natural redundancy, formed by the latest and the previous releases of a WS being kept operational, is used. The paper describes how 'confidence in correctness' can be systematically used as a measure of dependability of both the component and the composite WSs. We discuss architectures for a composite WS in which the upgrade of the component WS is managed by switching the composite WS from using the old release of the component WS to using its newer release only when the confidence is high enough, so that the composite service dependability will not deteriorate as a result of the switch. The effectiveness of the proposed solutions is assessed by simulation. We discuss the implications of the proposed architectures, including ways of 'publishing' the confidence in WSs, in the context of relevant standard technologies, such as WSDL, UDDI and SOAP. © 2005 Springer-Verlag.
On composing Dependable Web Services using undependable web components
This paper proposes a novel approach to constructing and modelling Dependable Web Services (DeW) that are built by composing web components that can be undependable. This is achieved by applying a structured approach to the Web Services (WSs) development, based on the Web Service Composition Actions (WSCAs) scheme and a corresponding event-driven simulation model of composite WS. The dependability and fault-tolerance of composite WS is achieved by employing forward error recovery based on multilevel system structuring enabling application-specific exception handling. Copyright © 2007 Inderscience Enterprises Ltd.
Testing-as-a-Service for Mobile Applications: State-of-the-Art Survey
The paper provides an introduction to the main challenges in mobile application testing. In the paper we investigate the state-of-the-art mobile testing technologies and overview related research works in the area. We discuss general questions of cloud testing and examine a set of existing cloud services and testing-as-a-service resources facilitating the testing of mobile applications and covering a wide range of specific mobile testing features.
How to Enhance UDDI with Dependability Capabilities
How dependability is to be assessed and ensured during Web Service operation and how unbiased and trusted mechanisms supporting this are to be developed are still open issues. This paper addresses the following questions: who should publish dependability parameters, in which way they should be distributed, and who (and how) should monitor these parameters in the global Service-Oriented Architecture. We discuss several techniques of on-line dependability monitoring and measurement, which extend the UDDI (Universal Description, Discovery and Integration) Business Registry with dependability metadata publishing and monitoring capabilities. The paper also proposes UDDI add-ons and light-weight user-side mechanisms for public operational and exceptional reporting. © 2008 IEEE.
Contention window adaptation to ensure airtime consumption fairness in multirate Wi-Fi networks
In the paper we address the problem of throughput unfairness inherent in the very nature of multirate Wi-Fi networks employing the CSMA/CA mechanism. This unfairness exhibits itself through the fact that slow clients consume more airtime to transfer a given amount of data, leaving less airtime for fast clients. The paper introduces analytical models for estimating a fair contention window (CW) size to be used by each station depending on the ratio between stations' data rates. Finally, we propose a lightweight distributed algorithm that enables stations to dynamically adapt their CW so that a near-optimal airtime distribution between them is ensured. It prevents network throughput from being dramatically degraded when a slow station is connected to the network. © 2014 IEEE.
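As a hedged illustration of the idea above (not the paper's actual analytical model), airtime fairness can be approximated by scaling each station's contention window in proportion to the ratio between the fastest station's data rate and its own, so slower stations contend less often. All rate values and the scaling rule here are hypothetical:

```python
# Illustrative sketch: scale each station's contention window (CW)
# proportionally to the data-rate ratio so that slow stations defer
# more often and stop monopolising airtime. This is a hypothetical
# model, not the exact formula from the paper.

BASE_CW = 16  # typical CWmin for 802.11 DCF

def fair_cw(rate_mbps, fastest_rate_mbps, base_cw=BASE_CW):
    """Return a CW scaled by the fastest-to-own data-rate ratio."""
    ratio = fastest_rate_mbps / rate_mbps
    return int(base_cw * ratio)

rates = [6, 24, 54]            # station data rates in Mbit/s
fastest = max(rates)
cws = {r: fair_cw(r, fastest) for r in rates}
print(cws)  # slower stations get larger contention windows
```

A distributed variant would have each station compute this locally from overheard beacon or rate information, which is the spirit of the lightweight algorithm the abstract describes.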
The paper discusses our practical experience and theoretical results of investigating the impact of consistency on latency in distributed fault tolerant systems built over the Internet and clouds. We introduce a time-probabilistic failure model of distributed systems that employ the service-oriented paradigm for defining cooperation with clients over the Internet and clouds. The trade-offs between consistency, availability and latency are examined, as well as the role of the application timeout as the main determinant in the interplay between system availability and responsiveness. The model introduced heavily relies on collecting and analysing a large amount of data representing the probabilistic behaviour of such systems. The paper presents experimental results of measuring the response time in a distributed service-oriented system whose replicas are deployed at different Amazon EC2 location domains. These results clearly show that improvements in system consistency increase system latency, which is in line with the qualitative implication of the well-known CAP theorem. The paper proposes a set of novel mathematical models that are based on statistical analysis of collected data and enable quantified response time prediction depending on the timeout setup and on the level of consistency provided by the replicated system.
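As a hedged sketch of the timeout trade-off discussed above (not the paper's actual mathematical models), the empirical probability that a request completes before the client-side timeout can be read directly off measured response-time samples; raising the timeout improves perceived availability at the cost of responsiveness. The sample values below are hypothetical:

```python
# Illustrative sketch: estimate from response-time samples the
# probability that a request completes before the client timeout.
# The sample data below is hypothetical, not measured EC2 data.

def availability(samples_ms, timeout_ms):
    """Empirical P(response time <= timeout)."""
    return sum(1 for t in samples_ms if t <= timeout_ms) / len(samples_ms)

samples = [120, 150, 180, 210, 260, 340, 500, 900, 1500, 3000]
for timeout in (200, 500, 1000):
    print(timeout, availability(samples, timeout))
```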
Measuring and Dealing with the Uncertainty of SOA Solutions
This book focuses on performance and dependability issues associated with service computing and these two complementary aspects, which include concerns of quality of service (QoS), real-time constraints, security, reliability and other ...
Search of Similar Programs Using Code Metrics and Big Data-Based Assessment of Software Reliability
This work adapts big data analysis methods to improve software reliability. We suggest predicting the reliability of new software using software with similar properties and known reliability indicators. The concept of similar programs is formulated on the basis of five principles, and search results for similar programs are presented. The proposed reliability metrics of similar programs are analysed, visualized and interpreted. We conclude that similar software exhibits similar reliability and that these metrics can be used to predict the reliability of new software. Such reliability prediction will allow developers to manage verification and refactoring resources and processes, improving software reliability while cutting development costs.
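As a hedged illustration of the similar-program search (not the paper's five-principle definition), similarity can be sketched as nearest-neighbour distance between code-metric vectors. The metric names, values and program names below are hypothetical, and a real implementation would normalize the metrics first:

```python
# Illustrative sketch: find "similar programs" by Euclidean distance
# between code-metric vectors (e.g. LOC, cyclomatic complexity,
# coupling). All names and values are hypothetical.

import math

def distance(m1, m2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(m1, m2)))

programs = {
    "prog_a": (1200, 14, 0.30),   # (LOC, complexity, coupling)
    "prog_b": (1250, 15, 0.28),
    "prog_c": (9000, 60, 0.90),
}
new_prog = (1300, 16, 0.31)
nearest = min(programs, key=lambda p: distance(programs[p], new_prog))
print(nearest)  # "prog_b" -- its known reliability can seed the prediction
```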
Analysis of Computer Network Reliability and Criticality: Technique and Features
The paper describes modern techniques for analysing computer network reliability. The means and features of Failure Modes and Effects (Criticality) Analysis (FME(C)A) for the reliability and criticality analysis of corporate computer networks (CCNs) are considered. Besides hardware and software reliability and external extreme factors, internal information factors such as collisions and congestion of switches, routers and servers influence network reliability and safety. A software tool is developed to estimate the CCN critical failure probability (by constructing a criticality matrix) from the results of the FME(C)A technique. An example of applying the FME(C)A technique to a structured cable system (SCS) is given. We also discuss measures that can be used for criticality analysis and possible means of criticality reduction. Finally, we describe a technique and basic principles for the dependable development and deployment of computer networks based on the results of FMECA analysis and procedures for the optimal choice of fault-tolerance means.
Intrusion Avoidance via System Diversity
Dependability of Service-Oriented Computing: Time-Probabilistic Failure Modelling
In the paper we discuss a failure and servicing model of software applications that employ the service-oriented paradigm for defining cooperation with clients. The model takes into account the time-probabilistic relationship between different servicing outcomes and failure modes. We put forward a set of measures for estimating the dependability of service provisioning from the client's viewpoint and present analytical models for assessing the mean servicing and waiting times depending on the client's timeout settings. © 2012 Springer-Verlag.
A Study of Orbital Carrier Rocket and Spacecraft Failures: 2000-2009
Distributed replicated NoSQL data storages such as Cassandra, HBase and MongoDB have been proposed to effectively manage Big Data sets whose volume, velocity and variability are difficult to handle with traditional relational database management systems. Trade-offs between consistency, availability, partition tolerance and latency are intrinsic to such systems. Although relations between these properties have been identified by the well-known CAP and PACELC theorems in qualitative terms, it is still necessary to quantify how different consistency settings, deployment patterns and other properties affect system performance. This experience report analyses the performance of a Cassandra NoSQL database cluster and studies the trade-off between data consistency guarantees and performance in distributed data storages. The primary focus is on investigating the quantitative interplay between Cassandra's response time, throughput and consistency settings, considering different single- and multi-region deployment scenarios. The study uses the YCSB benchmarking framework and reports the results of read and write performance tests of a three-replica Cassandra cluster deployed in Amazon AWS. In this paper, we also put forward a notation that can be used to formally describe the distributed deployment of a Cassandra cluster and its nodes relative to each other and to a client application. We present quantitative results showing how different consistency settings and deployment patterns affect Cassandra's performance under different workloads. In particular, our experiments show that strong consistency costs up to 22% of performance in the case of a centralized Cassandra cluster deployment and can cause a 600% increase in read/write request latency if Cassandra replicas and clients are globally distributed across different AWS Regions.
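The consistency settings examined above follow the standard replica-quorum rule used by Cassandra-style stores (a general property, not a result specific to this study): with N replicas, a read contacting R nodes and a write acknowledged by W nodes are guaranteed to overlap on at least one up-to-date replica whenever R + W > N. A minimal sketch:

```python
# Illustrative sketch of the replica-quorum consistency rule for
# Cassandra-style data stores: reads are strongly consistent when
# the read and write consistency levels satisfy R + W > N.

def is_strongly_consistent(n_replicas, write_cl, read_cl):
    """True if a read always sees the latest acknowledged write."""
    return read_cl + write_cl > n_replicas

N = 3  # a three-replica cluster, as benchmarked above
print(is_strongly_consistent(N, 1, 1))  # ONE/ONE       -> False (eventual)
print(is_strongly_consistent(N, 2, 2))  # QUORUM/QUORUM -> True
print(is_strongly_consistent(N, 3, 1))  # ALL/ONE       -> True
```

Stronger settings trade latency for this guarantee, which is exactly the interplay the benchmarks quantify.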
Benchmarking Dependability of a System Biology Application
In this paper we report our practical experience in benchmarking a System Biology Web Service, and investigate the instability of its performance and the delays induced by the communication medium. We discuss the results of a statistical data analysis and the causes affecting the Web Service's performance. The uncertainty discovered in Web Service operation reduces the overall dependability of Service-Oriented Architecture and requires specific resilience techniques. © 2009 IEEE.
The threat of uncertainty in service-oriented architecture
In this paper we present our practical experience in benchmarking a number of existing Web Services, investigating the instability of their performance and the delays induced by the communication medium. We provide the results of a statistical data analysis and discuss a technique for assessing Web Service performance that factors out network delays. We have found that the uncertainty discovered in Web Service operation affects the dependability of Service-Oriented Architecture and will require additional specific resilience techniques. Copyright 2008 ACM.
Using Diversity in Cloud-Based Deployment Environment to Avoid Intrusions
This paper puts forward a generic intrusion-avoidance architecture to be used for deploying web services on the cloud. The architecture, targeting IaaS cloud providers, avoids intrusions by employing software diversity at various system levels and dynamically reconfiguring the cloud deployment environment. The paper studies intrusions caused by vulnerabilities of system software and discusses an approach allowing system architects to decrease the risk of intrusions. This solution will also reduce the system's so-called days-of-risk, calculated as the period of increased security risk from the time a vulnerability is publicly disclosed to the time a patch becomes available to fix it. © 2011 Springer-Verlag.
Extended Dependability Analysis of Information and Control Systems by FME(C)A-technique: Models, Procedures, Application
This paper addresses the problems associated with the dependability analysis of complex information and control systems (I&CS). The FME(C)A technique is proposed as a unified approach to I&CS dependability assessment. The classic philosophy is extended by introducing new items into the assessed objects, relevant causes, assessed effects, assessed attributes and the means used. FME(C)A tables and models for assessing dependability (reliability, survivability and safety) attributes are constructed. Elements of an information technology for I&CS analysis are presented.
Real Distribution of Response Time Instability in Service-Oriented Architecture
This paper reports our practical experience of benchmarking a complex System Biology Web Service, and investigates the instability of its behaviour and the delays induced by the communication medium. We present the results of our statistical data analysis and distributions which fit and predict the response time instability typical of Service-Oriented Architectures (SOAs) built over the Internet. Our experiment has shown that the request processing time of the target e-science Web Service (WS) has a higher instability than the network round trip time. It has been found that by using a particular theoretical distribution, within short time intervals the request processing time can be represented better than the network round trip time. Moreover, certain characteristics of the probability distribution series of the round trip time make it particularly difficult to fit them theoretically. The experimental work reported in the paper supports our claim that dealing with the uncertainty inherent in the very nature of SOA and WSs is one of the main challenges in building dependable service-oriented systems. In particular, this uncertainty exhibits itself through very unstable web service response times and Internet data transfer delays that are hard to predict. Our findings indicate that the more experimental data is considered the less precise distributional approximations become. The paper concludes with a discussion of the lessons learnt about the analysis techniques to be used in such experiments, the validity of the data, the main causes of uncertainty and possible remedial actions. © 2010 IEEE.
Reliability assessment and prediction of the number of faults/defects is an important part of the software engineering process. Many software reliability models assume that all detected faults are removed with certainty and no new faults are introduced. However, the introduction of secondary faults during software updates has become quite common in software development practice, which can be explained by the enormous complexity of modern computer applications. In the paper we consider different scenarios of introducing secondary faults and how to predict the number of such faults. Finally, we discuss how different SRGMs, such as the Jelinski-Moranda, Exponential, Schick-Wolverton, Musa and Lipow models, can be modified to account for secondary faults in order to improve the accuracy of software reliability prediction. We use an industrial case study to demonstrate the applicability of the proposed approach. Our results show that considering secondary faults helped to considerably improve the accuracy of software failure rate prediction.
Modern Machine Learning (ML) models have a significant number of hyper-parameters that need adjusting to leverage performance and energy efficiency for a given model configuration during training. This becomes a considerable design challenge as increasing complexity requires larger models. This paper explores the Tsetlin Machine (TM), a new logic-based ML approach with only four hyper-parameters regardless of the problem space. Two of these hyper-parameters influence the TM architecture while the remaining two impact the learning efficacy. This work focuses on the systematic search for optimal TM hyper-parameters and aims to understand how hyper-parameter values affect performance and prediction accuracy, using the MNIST dataset as a case study.
Motivation: The analysis of complex biomedical datasets is becoming central to understanding disease mechanisms, aiding risk stratification and guiding patient management. However, the utility of computational methods is often constrained by their lack of interpretability, which is particularly relevant in clinically critical areas where rapid initiation of targeted therapies is key.
Results: To define diagnostically relevant immune signatures in peritoneal dialysis patients presenting with acute peritonitis, we analysed a comprehensive array of cellular and soluble parameters in cloudy peritoneal effluents. Utilizing Tsetlin Machines, a logic-based machine learning approach, we identified pathogen-specific immune fingerprints for different bacterial groups, each characterized by unique biomarker combinations. Unlike traditional 'black box' machine learning models, Tsetlin Machines identified clear, logical rules in the dataset that pointed towards distinctly nuanced immune responses to different types of bacterial infection. Importantly, these immune signatures could be easily visualized to facilitate their interpretation, thereby allowing for rapid, accurate and transparent decision-making. This unique diagnostic capacity of Tsetlin Machines could help deliver early patient risk stratification and support informed treatment choices in advance of conventional microbiological culture results, thus guiding antibiotic stewardship and contributing to improved patient outcomes.
Availability and implementation: All underlying tools and the anonymized data underpinning this publication are available at https://github.com/anatoliy-gorbenko/biomarkers-visualization.
The inability to trace an AI’s reasoning process and understand why it makes each decision is known as the black box problem. This remains one of the major barriers to the trusted and widespread use of machine learning in many application domains. The paper explores pattern recognition performance and learning dynamics of the Tsetlin Machine – a new explainable logic-based machine-learning approach. Tsetlin Machine uses a collection of finite-state automata with a unique logic-based learning mechanism and provides a promising alternative to Artificial Neural Networks with several advantages, such as interpretability, low complexity, suitability for hardware implementation and high performance. This work investigates Tsetlin Machine’s mechanism for constructing conjunctive clauses from data and their interpretation for pattern recognition on several datasets. We demonstrate that during training the logical clauses learn persistent sub-patterns within the class. Each clause creates a class template by clustering a certain number of similar class samples, combining them through literal-wise logical conjunction (i.e., AND-ing). The number of class samples that each clause combines depends on Tsetlin Machine’s hyperparameters. The more class samples that are combined, the more general the clauses become. The paper aims at uncovering how Tsetlin Machine’s hyperparameters influence the balance between clause generalization and specialization and how this affects the accuracy of pattern recognition. It also studies the evolution of the machine’s internal state, its convergence and training completion.
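The conjunctive clauses described above are logical ANDs over "literals", i.e. input bits or their negations. A minimal sketch of evaluating such a clause on a binary sample (the clause contents here are hypothetical, chosen only to show the mechanism):

```python
# Illustrative sketch: a Tsetlin Machine clause is a logical AND over
# included literals -- input bits or their negations. A clause is
# modelled as a list of (feature_index, negated) pairs; it "fires"
# only if every included literal is satisfied.

def clause_fires(sample, clause):
    """Evaluate the conjunction of literals on one binary sample."""
    for idx, negated in clause:
        literal = (1 - sample[idx]) if negated else sample[idx]
        if literal == 0:          # one unsatisfied literal kills the AND
            return False
    return True

clause = [(0, False), (2, True)]  # encodes: x0 AND NOT x2
print(clause_fires([1, 0, 0], clause))  # True
print(clause_fires([1, 0, 1], clause))  # False
```

Interpretability follows directly: a fired clause can be read back as the explicit rule "x0 AND NOT x2", which is the kind of class template the paper analyses.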
Green economics: A roadmap to sustainable ICT development
© 2018 IEEE. The paper discusses a systematic approach to sustainable development. It puts forward the idea of analysing the energy efficiency and sustainability of a particular product, service or even process over its whole life-cycle. A minor carbon footprint or low energy consumption of a product during its operation does not necessarily mean that the product's manufacturing, decommissioning and disposal are also sustainable. In this paper, we discuss a set of sustainability principles and propose a graphical notation describing key factors of product/process sustainability. We also consider information and communication technologies (ICT) as essential tools of sustainable development in various application domains. On the other hand, ICT themselves should be considered as an object of energy efficiency improvement. The paper discusses the impact of ICT on the environment and identifies the fundamental green ICT trade-off between dependability, performance and energy consumption. Finally, we consider problems and propose approaches to building green clouds and datacenters.
IEEE 802.11 wireless local area networks (WLANs) are shared networks, which use the contention-based distributed coordination function (DCF) to share access to the wireless medium among numerous wireless stations. The performance of the DCF mechanism mostly depends on the network load, the number of wireless nodes and their data rates. Throughput unfairness, also known as the performance anomaly, is inherent in the very nature of mixed-data-rate Wi-Fi networks using the distributed coordination function. This unfairness exhibits itself through the fact that slow clients consume more airtime to transfer a given amount of data, leaving less airtime for fast clients. In this paper, we comprehensively examine the performance anomaly in multi-rate wireless networks using three approaches: experimental measurement, analytical modelling and simulation in Network Simulator v.3 (NS3). The results of our practical experiments benchmarking the throughput of a multi-rate 802.11ac wireless network clearly show that even recent wireless standards still suffer from airtime consumption unfairness. It was shown that even a single low-data-rate station can decrease the throughput of high-data-rate stations by 3-6 times. The simulation and analytical modelling confirm this finding with considerably high accuracy. Most theoretical models evaluating the performance anomaly in Wi-Fi networks suggest that all stations get the same throughput independently of the data rate used. However, experimental and simulation results have demonstrated that despite a significant performance degradation, high-speed stations still outperform stations with lower data rates once the difference between data rates becomes more significant. This is due to the better efficiency of the TCP protocol working over a fast wireless connection.
It is also noteworthy that the throughput achieved by a station when it monopolistically uses the wireless medium is considerably less than 50% of its data rate, due to significant overheads even in the most recent Wi-Fi technologies. Mitigating the performance anomaly in mixed-data-rate WLANs requires a holistic approach that combines frame aggregation/fragmentation and adaptation of data rates, contention window and other link-layer parameters.
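The classic airtime view of the performance anomaly can be sketched as follows (a simplified illustration, not the paper's NS3 model, ignoring contention overheads and TCP effects): DCF gives every station an equal frame rate, so in one "round" each station sends one frame of L bits and the round lasts the sum of the per-station transmission times, dragging everyone down to roughly the slowest station's pace:

```python
# Illustrative airtime model of the 802.11 performance anomaly
# (simplified; ignores backoff, headers and TCP effects). With equal
# per-station frame rates, each station transmits one L-bit frame per
# round, and the round lasts sum(L / r_i) over all stations.

def per_station_throughput(rates_mbps, frame_bits=12000):
    """Per-station throughput in Mbit/s (identical for all stations)."""
    round_time = sum(frame_bits / (r * 1e6) for r in rates_mbps)
    return frame_bits / round_time / 1e6

print(per_station_throughput([54, 54, 54]))  # ~18 Mbit/s each
print(per_station_throughput([54, 54, 1]))   # one slow station drags all down
```

This reproduces the qualitative effect measured above: adding a single 1 Mbit/s station collapses every station's throughput to under 1 Mbit/s in this model.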
Experimental Evaluation of Performance Anomaly in Mixed Data Rate IEEE802.11ac Wireless Networks
IEEE 802.11 wireless local area networks (WLANs) are shared networks, which use the contention-based distributed coordination function (DCF) to share access to the wireless medium. The performance of the DCF mechanism depends on the network load, the number of wireless nodes and their data rates. Throughput unfairness, also known as the performance anomaly, is inherent in the very nature of mixed-data-rate Wi-Fi networks. This unfairness exhibits itself through the fact that slow clients consume more airtime to transfer a given amount of data, leaving less airtime for fast clients. In this paper, we evaluate the performance anomaly of mixed-rate wireless networks and present the results of practical experiments benchmarking the throughput of a mixed-rate 802.11ac wireless network. These results clearly show that even the most recent wireless standard still suffers from airtime consumption unfairness. At the end of the paper we analyse related works offering possible solutions and discuss our approach to avoiding performance degradation in mixed-data-rate Wi-Fi environments.
The application of the wavelet transform for image camera source identification has been widely reported in the literature, and the reported techniques use different wavelets. Given the diversity of wavelets and their properties, it is beneficial for the research community to identify the best-performing wavelets for this application. This paper presents results assessing the performance of the conventional wavelet-based image camera source identification technique against forty-one wavelets from the Daubechies, Biorthogonal, Symlets and Coiflets wavelet families. The VISION image dataset, comprising 34,427 images captured by thirty-five camera models from eleven brands, was used to generate the experimental results. One hundred plain images from each camera brand's dataset were randomly selected, with 70% of each dataset's images used to compute the camera brand's signature and 30% used to assess the performance of the method. Normalized cross-correlation between the camera brand signature and the calculated image noise was used to find the camera match. To compare the method's performance across different wavelets, a new assessment criterion was introduced and used to quantify the method's performance across images of different camera brands. Results show that the conventional wavelet-based image camera source identification achieves its highest performance when using the sym2 wavelet, closely followed by coif1.
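The matching step above relies on normalized cross-correlation (NCC) between a camera signature and an image's noise residual. A minimal pure-Python sketch over flattened pixel vectors (the vectors below are hypothetical toy values, not wavelet-extracted residuals):

```python
# Illustrative sketch: normalized cross-correlation (NCC) between a
# camera "signature" and an image noise residual; the camera whose
# signature yields the highest NCC is declared the match. The vectors
# here are hypothetical toy data.

import math

def ncc(a, b):
    """Pearson-style normalized cross-correlation of two vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) *
                    sum((y - mb) ** 2 for y in b))
    return num / den

signature   = [0.2, -0.1, 0.4, 0.0, -0.3]
noise_same  = [0.19, -0.12, 0.41, 0.02, -0.28]  # residual, same camera
noise_other = [-0.3, 0.2, -0.1, 0.4, 0.1]       # residual, other camera
print(ncc(signature, noise_same) > ncc(signature, noise_other))  # True
```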
However, achieving these qualities requires resolving a number of trade-offs between various properties during system design and operation. This paper reviews trade-offs in distributed replicated databases and provides a survey of recent research papers studying distributed data storage. The paper first discusses the compromise between consistency and latency that appears in distributed replicated data storages and follows directly from the CAP and PACELC theorems. Consistency refers to the guarantee that all clients in a distributed system observe the same data at the same time. To ensure strong consistency, distributed systems typically employ coordination mechanisms and synchronization protocols that involve communication and agreement among distributed replicas. These mechanisms introduce additional overhead and latency and can dramatically increase the time taken to complete operations when replicas are globally distributed across the Internet. In addition, we study trade-offs between other system properties, including availability, durability, cost, energy consumption, and read and write latency. In this paper we also provide a comprehensive review and classification of recent research works on distributed replicated databases. The reviewed papers showcase several major areas of research, ranging from performance evaluation and comparison of various NoSQL databases to suggesting new data replication strategies and putting forward new consistency models. In particular, we observed a shift towards exploring hybrid consistency models combining causal consistency and eventual consistency with causal ordering, due to their ability to strike a balance between operation-ordering guarantees and high performance. Researchers have also proposed various consistency control algorithms and consensus quorum protocols to coordinate distributed replicas.
Insights from this review can empower practitioners to make informed decisions in designing and managing distributed data storage systems, as well as help identify existing gaps in the body of knowledge and suggest further research directions.
Source Camera Identification Techniques: A Survey
Successful investigation and prosecution of major crimes like child pornography, insurance claims, movie piracy, traffic monitoring, and scientific fraud, among others, largely depend on the availability of water-tight evidence to prove the case beyond any reasonable doubt. When the evidence required in investigating and prosecuting such crimes involves digital images/videos, there is a need to prove without an iota of doubt the source camera/device of the image in question. Much research has been reported to address this need over the past decade. The proposed methods can be divided into brand- or model-level identification or known imaging device matching techniques. This paper investigates the effectiveness of the existing image/video source camera identification techniques, which use both intrinsic hardware artefact-based techniques like sensor pattern noise and lens optical distortion, and software artefact-based techniques like colour filter array and auto white balancing, to determine their strengths and weaknesses. Publicly available benchmark image/video datasets and assessment criteria to quantify the performance of different methods are presented, and the performance of the existing methods is compared. Finally, directions for further research on image source identification are given.
The successful investigation and prosecution of significant crimes, including child pornography, insurance fraud, movie piracy, traffic monitoring, and scientific fraud, hinge largely on the availability of solid evidence to establish the case beyond any reasonable doubt. When dealing with digital images/videos as evidence in such investigations, there is a critical need to conclusively prove the source camera/device of the questioned image. Extensive research has been conducted in the past decade to address this requirement, resulting in various methods categorized into brand, model, or individual image source camera identification techniques. This paper presents a survey of all those existing methods found in the literature. It thoroughly examines the efficacy of these existing techniques for identifying the source camera of images, utilizing both intrinsic hardware artifacts such as sensor pattern noise and lens optical distortion, and software artifacts like color filter array and auto white balancing. The investigation aims to discern the strengths and weaknesses of these techniques. The paper provides publicly available benchmark image datasets and assessment criteria used to measure the performance of those different methods, facilitating a comprehensive comparison of existing approaches. In conclusion, the paper outlines directions for future research in the field of source camera identification.
Obesity is a major global concern, with more than 2.1 billion people overweight or obese worldwide, which amounts to almost 30% of the global population. If the current trend continues, the overweight and obese population is likely to increase to 41% by 2030. Individuals developing signs of weight gain or obesity are also at risk of developing serious illnesses such as type 2 diabetes, respiratory problems, heart disease, stroke, and even death. It is essential to detect childhood obesity as early as possible, since children who are either overweight or obese at a younger age tend to stay obese in their adult lives. This research utilises the vast amount of data available via the UK's Millennium Cohort Study to construct a machine-learning-driven framework to predict young people at risk of becoming overweight or obese. The focus of this paper is to develop a framework to predict childhood obesity using earlier childhood data and other relevant features. The use of a novel data balancing technique and the inclusion of additional relevant features resulted in sensitivity, specificity, and F1-score of 77.32%, 76.81%, and 77.02% respectively. The proposed technique utilises easily obtainable features, making it suitable for use in clinical and non-clinical environments.
Application of Unsupervised Learning in Weight-Loss Categorisation for Weight Management Programs
There has been an increasing need for weight management systems that prevent adverse health conditions which can later lead to various cardiovascular diseases. Several research efforts have attempted to understand and better manage body-weight gain and obesity. This study focuses on a data-driven approach to identifying patterns in profiles with body-weight change in a dietary intervention program using machine learning algorithms. The proposed line of investigation analyses patients' profiles at entry to the dietary intervention program and, for some, on a weekly basis. These attributes serve as inputs to machine learning algorithms. From the unsupervised learning perspective, the paper addresses the first stage in applying machine learning algorithms to weight management data. The specific aim is to identify the thresholds for weight-loss categories, which are required for supervised learning.
Machine Learning Approaches for the Analysis and Prediction of Risk of Excess Weight in Young People
Obesity is a major global concern with more than 2.1 billion people overweight or obese worldwide which amounts to almost 30% of the global population. If the current trend continues, the overweight and obese population is likely to increase to 41% by 2030. Individuals developing signs of weight gain or obesity are also at the risk of developing serious illnesses such as type 2 diabetes, respiratory problems, heart disease, stroke, and even death. Some intervention measures such as physical activity and healthy eating can be a fundamental component to maintain a healthy lifestyle and help in tackling obesity. Therefore, it is essential to detect childhood obesity as early as possible since children who are either overweight or obese around their adiposity rebound age tend to stay obese in their adult lives. This research utilises the vast amount of data available via UK’s millennium cohort study to construct machine learning driven frameworks to predict young people at the risk of becoming overweight or obese. Although there are several research examples globally in predicting childhood obesity, the use of Millennium Cohort Study (MCS) data remains underutilised. Furthermore, attempts have only been made to predict obesity with a small timelapse between the observed data and target prediction age. This research focuses on predicting obesity at two major milestones in children’s growth using data collected from birth, and subsequent surveys conducted at ages 3, 5, 7 and 11. The first milestone is the adiposity rebound age which occurs around the age of 6 or 7. This milestone is important since children tend to stay on the obesity centile in their later growth years whichever obesity centile they are on during their adiposity rebound age period. The second important milestone is when children enter their teen years and hormones start to change. 
Entering the teenage years at a healthier weight spares children many of the problems caused by overweight and obesity. Data from the six MCS survey waves were combined into one dataset containing longitudinal and cross-sectional features relating to children's growth. There is an inherent imbalance in the dataset between individuals with normal BMI and those at risk. Data balancing was carried out using random under-sampling: the majority class was under-sampled to create subsets of data matching the size of the at-risk class, and the results obtained from each subset were averaged to arrive at a final classification accuracy. This approach makes use of the complete dataset and does not involve generating any synthetic data to over-sample the minority class. Several frameworks are proposed, including classification from regression to predict BMI and then determine obesity flags using age and sex, and to identify key factors that influence obesity. These frameworks make it possible to maximise all classification metrics and to predict adiposity rebound age obesity with an accuracy of 83% using the age 5 BMI values. One of the frameworks was used to predict adiposity rebound age obesity using data from the age 3 survey. The use of data balancing and additional relevant features improved prediction accuracy from 64.55% to over 73% and the F1 score from 45.93% to over 71%. Combining the other relevant features with the BMI data makes it possible to predict the age 14 obesity status as early as age 3 with an accuracy of over 70%. There are hundreds of additional features, but only the easily obtainable ones were considered, so that even parents or caregivers can use them to predict a child's obesity status. The focus has been not only on maximising average accuracies but also on enhancing specificity and precision to minimise false positives.
The suitability of each framework for clinical assessment and population monitoring is clearly identified.
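The data-balancing procedure described above (under-sampling the majority class into at-risk-sized subsets and averaging the per-subset results) can be sketched roughly as follows. This is a simplified illustration: `train_eval` is a hypothetical callback standing in for whatever classifier training and scoring the study actually used, and the toy data are invented.

```python
import random
from statistics import mean

def undersample_ensemble(majority, minority, train_eval, seed=0):
    """Random under-sampling without synthetic data: split the shuffled
    majority class into minority-sized subsets, score each balanced subset
    with `train_eval`, and average the resulting scores."""
    rng = random.Random(seed)
    idx = list(range(len(majority)))
    rng.shuffle(idx)
    m = len(minority)
    scores = []
    for start in range(0, len(idx) - m + 1, m):
        subset = [majority[i] for i in idx[start:start + m]]
        scores.append(train_eval(subset, minority))
    return mean(scores)

# toy usage: 10 "normal BMI" records vs 3 "at risk" records -> 3 balanced subsets
normal = [f"normal_{i}" for i in range(10)]
at_risk = ["risk_0", "risk_1", "risk_2"]
avg = undersample_ensemble(normal, at_risk, lambda maj, mino: len(maj) / len(mino))
print(avg)  # each balanced subset scores 1.0, so the average is 1.0
```

Because every record in the majority class appears in exactly one subset, the full dataset is used without generating synthetic minority samples.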
Information technologies have penetrated virtually every sphere of human activity. Recent developments in database management systems have coincided with advances in parallel computing, and a new class of data storage has emerged: globally distributed non-relational database management systems, now widely used by Twitter, Facebook, Google and other modern distributed information systems to store and process huge volumes of data. Databases have evolved from mainframe architectures to globally distributed non-relational repositories designed to store huge amounts of information and serve millions of users. The article identifies the drivers and prerequisites of this development and examines the transformation of database property models and of the theorems that formalise the relationships between them. In particular, it considers the rationale for the transition from the ACID property model to the BASE model, which relaxes data-consistency requirements in order to ensure the high performance of distributed databases with many replicas. It also provides a concise justification of the CAP and PACELC theorems, which establish mutually exclusive relationships between availability, consistency and speed in replicated information systems, and analyses their limitations. The compatibility issues of the consistency models used by different non-relational data stores are noted, and the possible consistency settings of the NoSQL databases Cassandra, MongoDB and Azure CosmosDB are discussed in detail as an example. The results of the evolution of distributed database architectures are summarised using Goal Structuring Notation (GSN). Directions for further research and for the continued development of globally distributed information systems and data repositories are also outlined.
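As a minimal illustration of the consistency trade-off discussed above, the quorum rule used by Cassandra-style stores with tunable consistency can be checked in a few lines: in an N-replica system, a read of R replicas is guaranteed to overlap the last successful write of W replicas exactly when R + W > N.

```python
def quorum_consistent(n, w, r):
    """In an N-replica store with tunable consistency (Cassandra-style),
    a read quorum of R replicas is guaranteed to intersect a write quorum
    of W replicas exactly when R + W > N."""
    return r + w > n

# N = 3 replicas, a common replication factor
print(quorum_consistent(3, w=2, r=2))  # QUORUM writes + QUORUM reads -> True (strong consistency)
print(quorum_consistent(3, w=1, r=1))  # ONE/ONE favours speed -> False (stale reads possible)
```

Lowering W or R below the quorum threshold is exactly the BASE-style relaxation the article describes: better latency and availability at the cost of possibly reading stale replicas.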
Innovative Education and Science in Information Technologies: Experience of N. Ye. Zhukovsky National Aerospace University (KhAI)
Information technologies are among the most promising and fastest-growing sectors worldwide and in Ukrainian industry. Herein, the authors share the experience of the National Aerospace University (KhAI) in educating and training personnel for the IT industry and the results of its successful cooperation with IT companies. Innovative education programmes, as well as scientific and practical research in the field of information technologies being implemented at KhAI, are discussed.
Source Camera Identification using Sensor Pattern Noise
Source Camera Identification (SCI) is essential in digital image forensics, enabling reliable attribution of images to their originating devices for legal, investigative, and security applications. Yet, existing SCI methods often struggle under diverse imaging conditions due to scene-dependent noise and texture interference. This thesis advances SCI through four key contributions. First, a systematic evaluation of forty-two wavelets using the VISION dataset (34,427 images, 35 camera models, 11 brands) identified cdf9/7 as the most effective for Sensor Pattern Noise (SPN) extraction, followed by sym2 and coif1. Second, an Improved Camera Source Identification using Wavelet Noise Residuals and Texture Filtering (ICSI-WNRTF) method was developed to suppress high-texture regions, achieving 99% accuracy for model identification and 98% for device attribution. Third, a Curvelet-Based Camera Source Identification Leveraging Image Smooth Regions (CBCSI-SR) framework was introduced. By exploiting multi-scale directional features and isolating smooth regions, it achieved 99.6% model-level and 98.9% device-level accuracy while reducing false decisions. Finally, a Deep Learning-Based Texture Exclusion for Source Camera Identification (DLTESCI) approach combined texture suppression with a fine-tuned ResNet50, reaching 99.7% accuracy and outperforming contemporary methods across Accuracy, Precision, Recall, FPR, and FNR. Together, these contributions establish a progression from wavelet-based to curvelet-based and deep learning-driven SCI techniques, delivering robust, scalable, and highly precise solutions for forensic applications.
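A toy sketch of the SPN matching idea is given below. It is a simplification under stated assumptions: a 3x3 local-mean filter stands in for the cdf9/7 wavelet denoiser, the fingerprints and scene are synthetic, and the thesis's actual pipeline (texture filtering, curvelet features, fine-tuned ResNet50) is far more elaborate.

```python
import numpy as np

def noise_residual(img):
    # crude denoiser: subtract a 3x3 local mean (stand-in for the wavelet
    # denoising used in real SPN extraction)
    pad = np.pad(img, 1, mode="edge")
    h, w = img.shape
    smooth = sum(pad[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    return img - smooth

def ncc(a, b):
    # normalised cross-correlation between a residual and a fingerprint
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
spn_a = rng.standard_normal((64, 64))   # synthetic fingerprint, camera A
spn_b = rng.standard_normal((64, 64))   # synthetic fingerprint, camera B

# smooth scene "taken by camera A": low-frequency content plus A's pattern noise
x, y = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 64))
image = 100 * np.sin(2 * np.pi * x) * np.cos(2 * np.pi * y) + 2.0 * spn_a

res = noise_residual(image)
print(ncc(res, spn_a) > ncc(res, spn_b))  # the true camera correlates highest -> True
```

Because the scene content is smooth, the residual is dominated by the high-frequency sensor pattern, which is why the thesis's emphasis on smooth regions and texture exclusion improves attribution accuracy.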
Current teaching
- Computer Communications (level 4)
- Ethical Hacking and Penetration Testing (level 4)
- Cloud Computing Development (Level 7)
Teaching Activities (5)
Computer Communications
22 January 2017
Digital Security Landscapes
23 September 2018
Cloud Computing Development
24 September 2017
Digital Security Landscapes
September 2020
Grants (1)
KNOT: Resource-aware Knowledge Transfer Methods for Machine Learning Hardware in At-the-Edge Applications
Featured Research Projects
VULNERABILITY: Rigorous approach to software vulnerability life cycle management
The purpose of the project is to develop a framework aimed at rigorous vulnerability management (including vulnerability scanning, alerting, isolation, removal, intrusion avoidance, etc.)
TRADE-OFF: A trade-off framework for globally distributed NoSQL data storages
The purpose of the project is to develop a framework allowing developers of large-scale distributed systems to trade off between energy consumption, latency, availability, durability and consistency of distributed data storages such as Cassandra NoSQL.
AVOID: A cloud platform-as-a-service for secure services deployment avoiding intrusions
The purpose of the project is to develop a framework for secure internet browsing and secure deployment of cross-platform (e.g. Java, Python, Ruby) web applications.
COSMO: Risks analysis of the launch vehicle crashes and spacecraft failures
The purpose of the work is to analyse the risks of launch vehicle crashes and spacecraft failures that occurred during the first two decades of the 21st century.