With the advances in hardware technologies, embedded and IoT devices are now able to offer enough memory and computational power to accommodate light-weight machine-learning (ML) classifiers. However, the amount of energy consumed even by light-weight ML applications remains a practical barrier, as most such devices operate under tight power budgets.
In this paper, our objective is to maximize the energy efficiency of devices equipped with light-weight ML classifiers without compromising classification accuracy. This is particularly beneficial for devices that collect streaming sensor data and analyze it to offer real-time classification services.
We propose and implement a hardware-friendly pre-processing mechanism that takes into account the accuracy of the ML classifier along with a proposed similarity metric between incoming data frames. The pre-processing mechanism combines event-driven and time-driven strategies for scheduling the ML classifier in a way that reduces the frequency of execution of the energy-hungry ML module. The event-driven strategy triggers the ML module only upon significant changes (dissimilarity) in the streaming data frames.
Using the proposed pre-processing mechanism, we achieved up to 80% reduction in the number of function calls for the ML classifier while only losing less than 2% of classification accuracy.
At its core, the Internet of Things is about sensors embedded into devices of all shapes and forms, which provide continuous streams of data via a plethora of connectivity mechanisms (e.g. Ethernet, Bluetooth, RFID, etc.) to one or more central locations. The purposes for transmitting sensor data are numerous, but the assumption in all cases is that the data can then be analyzed and become actionable in some way that is beneficial to the user.
This means that all IoT-related services, no matter how disparate they may be, always demonstrate these five basic traits: Collection, Transmission, Storage, Analysis, and Decision Making.
The birth of IoT devices is directly related to Moore’s Law, the observation that the number of components per integrated circuit doubled every year, along with the projection that this rate of growth would continue for at least another decade. Although it is less relevant today because of the physical limit on the number of transistors that can fit on a chip, it made quite an impression on the Computer Science community at the time of its publication and continues to do so. Moore’s Law made the first three steps in this chain (Collection, Transmission, and Storage) ubiquitous and affordable. The hardware, software, and connectivity required to perform these steps have become very small, very cheap, very efficient, and broadly available. When we hit the point of critical mass a couple of years back, when all of those qualifiers became applicable, the Internet of Things was born.
However, for any IoT application to be worth purchasing (or manufacturing), and by extension to have the potential to be released into a production environment, it must demonstrate value in the last step of that chain, ‘Decision Making’. Of course, this step can mean any number of things, ranging from a profound physical machine action (e.g. dispatching an ambulance to the site of a car accident) to simply providing much-needed information to a relevant consumer (e.g. sending a text message to alert a driver that their car needs an oil change). But no matter what the ultimate step of ‘Decision Making’ actually is, its worth is entirely dependent on the penultimate step, ‘Analysis’.
At the ‘Analysis’ stage, the true worth of any Internet of Things service is determined, and this is where Artificial Intelligence (or, more specifically, the subset of AI called ‘Machine Learning’) will play an important role. Machine learning (ML) is a form of programming that makes actions valuable by giving software the ability to identify patterns in the given data, so that the ML system can learn from the data and adjust the ways in which it subsequently analyzes that data.
When machine learning is applied to the ‘Analysis’ step, it can dramatically change what is (or is not) performed at the subsequent ‘Decision Making’ step, which in turn dictates whether the action has high, low, or no value to the end user.
In order for ML systems to work efficiently in real-time embedded and IoT applications, they must be developed under the design constraints of these systems in addition to prediction accuracy. These constraints generally consist of energy consumption, memory bottlenecks, and real-time responsiveness.
To facilitate this goal, in this paper we propose a hardware-friendly pre-processing mechanism for low-power embedded and IoT systems. We leverage this approach to maximize the energy efficiency of the AI module integrated into the device, so that minor variations in the data samples will not trigger the energy-demanding AI module for updating, learning, and classification purposes.
The rest of this paper is organized as follows. Section presents architectural details of the proposed pre-processing technique.
To maximize the energy efficiency of machine learning applications deployed on embedded and IoT platforms, we have to deal with the intensive computations that ML algorithms need for performing the analysis task. Ideally, the data captured by the sensors of embedded and IoT devices is considered to contain highly useful and valuable information. However, since sensor data are usually correlated both temporally and spatially, not all the gathered data samples carry useful information to be extracted by further or more complex data processing and analysis. Such data redundancy decreases the performance of embedded devices as it imposes unnecessary computations, excessive data accesses and data transmissions, as well as inefficient energy consumption for performing these tasks. Data redundancy has different sources; however, oversampling (sampling data at a higher frequency than needed) seems to be the most common culprit. Since the redundancy does not necessarily reflect an exact equivalence between data samples, it is not easy to detect and remove redundant samples by simple comparison operations (bit/byte-level comparisons would result in unacceptable false positive/negative outcomes). To investigate this, we conducted an experiment to study the effectiveness of simple bit/byte-level comparison methods.
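The shortcoming of exact bit/byte-level comparison can be sketched as follows: two frames of the same scene differing only by sensor noise are never flagged as duplicates, whereas a noise-tolerant dissimilarity metric detects the redundancy. The function names and the threshold value below are illustrative, not from the paper.

```python
# Sketch: exact comparison vs. a noise-tolerant dissimilarity metric.
# Names and the threshold are illustrative assumptions.

def frames_identical(a, b):
    """Byte-level equality: flags only exact duplicates."""
    return a == b

def mean_abs_diff(a, b):
    """A simple dissimilarity metric tolerant to sensor noise."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

frame1 = [10, 10, 12, 200, 201]   # original sensor frame
frame2 = [11, 10, 13, 199, 201]   # same scene, perturbed by 1-LSB noise

print(frames_identical(frame1, frame2))     # False: exact match misses redundancy
print(mean_abs_diff(frame1, frame2))        # 0.6
print(mean_abs_diff(frame1, frame2) < 2.0)  # True: frames treated as redundant
```

With an exact comparison the second frame would be processed again in full, while the metric-based check correctly classifies it as redundant under the (illustrative) threshold.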
First, we trained a standalone machine-learning classifier for texture classification. For the experiments, we used a k-nearest-neighbor classifier on the 13 texture classes of the Brodatz dataset.
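For reference, the k-nearest-neighbor decision rule can be sketched in a few lines: a query is assigned the majority label among its k closest training samples. The toy feature vectors and class names below are hypothetical stand-ins, not the Brodatz features used in the experiment.

```python
from collections import Counter

def knn_predict(train_feats, train_labels, query, k=3):
    """Minimal k-nearest-neighbor classifier using squared Euclidean distance."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(feat, query)), label)
        for feat, label in zip(train_feats, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D feature vectors standing in for texture descriptors
train_feats = [(0.1, 0.2), (0.15, 0.25), (0.9, 0.8), (0.85, 0.9)]
train_labels = ["smooth", "smooth", "rough", "rough"]
print(knn_predict(train_feats, train_labels, (0.12, 0.22)))  # smooth
```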
Since we are dealing with grayscale images, we used Local Binary Patterns (LBP) to perform texture matching inside the trigger mechanism for the machine learning module. The original LBP operator replaces the value of each pixel of an image with a decimal number, called an LBP code, that encodes the neighborhood of eight pixels surrounding it. Each central pixel is compared with its eight neighbors: neighbors with a value lower than that of the central pixel are set to the bit $0$, while neighbors with a value greater than or equal to that of the central pixel are set to the bit $1$. Iterating through the neighbors in a clockwise manner, starting from the top-left neighbor, and concatenating these bits yields a binary number. The central pixel’s value is then replaced by the decimal value corresponding to this binary number. Due to its discriminative power and computational simplicity, the LBP texture operator produced very convincing results when integrated into this experiment.
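The basic 8-neighbor LBP operator described above can be sketched as follows; the clockwise bit ordering starting from the top-left neighbor matches the description, while the function name and the toy image are illustrative.

```python
def lbp_code(img, r, c):
    """Compute the basic 8-neighbor LBP code for pixel (r, c).
    Neighbors >= the center contribute bit 1, neighbors < the center bit 0;
    bits are concatenated clockwise starting from the top-left neighbor."""
    center = img[r][c]
    # Neighbor offsets in clockwise order, starting at the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for dr, dc in offsets:
        code = (code << 1) | (1 if img[r + dr][c + dc] >= center else 0)
    return code

# Toy 3x3 grayscale patch; only the center pixel has a full neighborhood
img = [[6, 5, 2],
       [7, 6, 1],
       [9, 8, 7]]
print(lbp_code(img, 1, 1))  # 143, i.e. binary 10001111
```

In practice the operator is applied at every interior pixel, and the histogram of the resulting codes serves as the texture descriptor fed to the classifier.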
After applying the proposed HFTM to this experiment, we observed significant savings in time and energy with only a negligible accuracy loss.
In fact, with both absolute and relative change thresholds, the accuracy of the machine learning algorithm when paired with the event trigger closely matches the accuracy of the basic classifier. In some cases (absolute change threshold less than 0.004 and relative change threshold less than 1.5), the classifier even achieves better results than the basic classifier, although its accuracy degrades quickly at higher threshold values. This issue is addressed by the time trigger mechanism, which makes sure that the model used by the classifier is always kept up to date, thus yielding results closely matching the basic classifier even at higher threshold values. By implementing these changes, we also see a significant reduction in the overall usage of the machine learning module, which is imperative to make the device more efficient in its use of the CPU, memory, and energy.
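The interplay between the event trigger and the time trigger described above can be sketched as a small state machine: the ML module runs when the frame dissimilarity exceeds an absolute threshold, or when a maximum number of consecutive frames have been skipped. The class name, the `max_skips` backstop, and the threshold values are illustrative assumptions, not the paper's exact parameters.

```python
class HybridTrigger:
    """Sketch of the combined event/time trigger. The ML module fires when
    the mean absolute frame change exceeds `abs_thresh` (event trigger),
    or after `max_skips` consecutive skipped frames (time trigger).
    Parameter names and values are illustrative."""

    def __init__(self, abs_thresh=0.004, max_skips=10):
        self.abs_thresh = abs_thresh
        self.max_skips = max_skips
        self.last_frame = None
        self.skips = 0

    def should_run(self, frame):
        if self.last_frame is None:
            fire = True  # first frame: nothing to compare against
        else:
            diff = sum(abs(a - b) for a, b in zip(frame, self.last_frame)) / len(frame)
            fire = diff > self.abs_thresh or self.skips >= self.max_skips
        if fire:
            self.last_frame = frame  # reference frame for future comparisons
            self.skips = 0
        else:
            self.skips += 1
        return fire

# Eight near-identical frames followed by one genuinely new frame
trigger = HybridTrigger(abs_thresh=0.5, max_skips=3)
frames = [[1.0, 1.0]] * 8 + [[5.0, 5.0]]
runs = sum(trigger.should_run(f) for f in frames)
print(runs)  # 3: first frame, one time-triggered refresh, one event trigger
```

The time trigger keeps the classifier's model periodically refreshed even when frames stay below the dissimilarity threshold, which is what prevents the accuracy degradation seen with the event trigger alone.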