This article details the key considerations for building an ARM-based AIoT (Artificial Intelligence of Things) device for facial recognition. For specific AIoT use cases and applications, read our accompanying article.
Let’s start by defining an ARM-based AIoT device. AIoT devices are connected to the internet and run AI algorithms to carry out tasks. Inside the device is an ARM-based SoC (system-on-a-chip) or SoM (system-on-a-module) with a CPU for general-purpose processing and a GPU for acceleration. Many AIoT vendors also incorporate an APU (AI Processing Unit) or NPU (Neural Processing Unit) into the SoC or SoM so that AI applications can run more efficiently at the edge instead of on cloud servers.
ARM-based devices are energy efficient, using only a fraction of the power of x64 computers. ARM processors are favored for use in thin edge-based AIoT devices such as kiosks, industrial tablets, digital signs, wearables, and mobile devices.
Building AIoT devices varies depending on the use case and environment. Factors such as performance requirements, cost constraints, and power consumption all have to be taken into consideration.
To determine performance requirements you need to consider how complex the tasks will be. For example, do you need the device to recognize only one face at a time, or do you need it to recognize many faces concurrently?
Let’s take smart locks and access control as a use case. If the device only needs to recognize one face at a time, as people approach individually, a smaller AI model with less computing performance will work perfectly fine. The FaceMe® H (high precision) model is sufficient for this use case. However, if you want to identify block-listed people by performing facial recognition through cameras placed at the entrance of a busy shopping mall, you will need more computing power.
You also need to determine if the use case requires identifying people or simply collecting demographic information. For example, with smart signage that only needs to identify demographic information (such as age and gender) you can use a device with lower computing performance. If you need the smart sign to also perform facial recognition and identification you will need higher performance.
The more computing performance your AIoT device requires, the higher the cost will be. Each chip vendor (such as NVIDIA, Intel, Qualcomm, MediaTek, or NXP) offers its own pricing model, with tiered product lines and costs based on the interfaces supported (such as Wi-Fi, HDMI, or USB). For example, the NVIDIA Jetson platform is commonly used because it delivers strong performance. However, that performance comes at a higher price.
The best way to think about form factor is size. If you require the AIoT device to match certain dimensions, such as a smart lock that fits on a door, size will likely be a significant constraint. It matters far less for a device driving a large display, such as digital signage.
For other considerations in building your AIoT device, check out 7 Success Factors for Choosing the Best Facial Recognition Solution.
Computing performance and power go hand in hand: the higher the device’s performance, the more power it requires. The good news is that when vision technologies and AI algorithms are accelerated through dedicated chips, they become more power efficient.
The MediaTek Genio 350 chipset, with the addition of an APU (AI Processing Unit), is a good example. The APU and the NeuroPilot platform work together, enabling AI algorithms to run faster while saving power. Compared to a general-purpose CPU, an APU handles the large, parallel data computations common in image processing for AI far more efficiently (a minimal code sketch of this offload path follows the figure below).
MediaTek’s Pumpkin Developer Board.
Source: MediaTek
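In practice, the APU is typically reached from application code through a standard inference framework rather than directly. As a rough illustration (not MediaTek’s official sample), here is a minimal C++ sketch of offloading a TensorFlow Lite model through Android’s NNAPI delegate, which is one common path NeuroPilot accelerates; the model file name is a placeholder.

```cpp
// Sketch: offloading a TensorFlow Lite model to the APU on a MediaTek
// Genio board via Android's NNAPI delegate (one path NeuroPilot accelerates).
// "face_model.tflite" is a placeholder model file.
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"
#include "tensorflow/lite/delegates/nnapi/nnapi_delegate.h"

#include <memory>

int main() {
    // Load the model and build a standard TensorFlow Lite interpreter.
    auto model = tflite::FlatBufferModel::BuildFromFile("face_model.tflite");
    tflite::ops::builtin::BuiltinOpResolver resolver;
    std::unique_ptr<tflite::Interpreter> interpreter;
    tflite::InterpreterBuilder(*model, resolver)(&interpreter);

    // Route supported operations through NNAPI; ops the accelerator cannot
    // handle fall back to the CPU automatically.
    tflite::StatefulNnApiDelegate nnapi_delegate;
    interpreter->ModifyGraphWithDelegate(&nnapi_delegate);

    interpreter->AllocateTensors();
    // Fill interpreter->typed_input_tensor<float>(0) with a preprocessed
    // face image here, then run inference.
    interpreter->Invoke();
    return 0;
}
```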
Here are the four most common AIoT device configurations for facial recognition from highest to lowest in terms of performance and cost.
NVIDIA introduced the first Jetson, the Jetson TK1, in 2014 and has since revamped the product line through several generations. The latest iteration is the Jetson Orin series, released at the end of 2022, which includes models such as the Jetson Orin NX, Jetson Orin Nano, and Jetson AGX Orin. It provides a good balance between AI inference performance, power consumption, and cost. Jetson platforms can handle more than one video input for facial recognition at a time, alongside other applications. If you require a device to perform multiple tasks at once, Jetson is a great option.
NVIDIA provides tools and SDKs, including CUDA, TensorRT, and DeepStream, which are among the most mature options available on the market. These tools allow AI algorithms to run smoothly, making it easy to deploy facial recognition on Jetson. NVIDIA packages everything in JetPack, a full development bundle that includes an Ubuntu-based OS and all required toolchains and libraries.
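To make this concrete, here is a simplified C++ sketch (not NVIDIA’s official sample) of loading a pre-built TensorRT engine and running one inference on a Jetson; the engine file name, single input/output layout, and buffer sizes are assumptions for illustration.

```cpp
// Sketch: loading a serialized TensorRT engine and running one inference on
// Jetson. "face_detector.engine" would be built offline (e.g., with trtexec);
// the single input/output layout and buffer sizes are placeholders.
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cerr << msg << std::endl;
    }
};

int main() {
    Logger logger;

    // Read the serialized engine from disk.
    std::ifstream file("face_detector.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    auto* runtime = nvinfer1::createInferRuntime(logger);
    auto* engine  = runtime->deserializeCudaEngine(blob.data(), blob.size());
    auto* context = engine->createExecutionContext();

    // One input and one output buffer on the GPU (sizes are illustrative:
    // a 1280x720 RGB float frame in, a small result vector out).
    void* buffers[2];
    cudaMalloc(&buffers[0], 3 * 720 * 1280 * sizeof(float));
    cudaMalloc(&buffers[1], 1000 * sizeof(float));

    // In a real pipeline the input buffer is filled from a decoded camera
    // frame (for example via DeepStream) before each call.
    context->executeV2(buffers);

    cudaFree(buffers[0]);
    cudaFree(buffers[1]);
    return 0;
}
```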
Based on our testing, the Jetson Orin Dev Kit runs the FaceMe® VH model at 120 fps and the UH model at 96 fps, with each input image containing one face at 1080p resolution.
Jetson devices are also quite small, with form factors comparable to a mini-PC, measuring less than 5” x 5” and drawing as little as 10 watts. However, they are among the costlier options.
NVIDIA’s Jetson Family.
Source: NVIDIA
Qualcomm provides a wide range of SoCs, from those that go into mobile phones and tablets, like the Snapdragon line, to AIoT edge products like the QCS410/610. The Qualcomm Neural Processing SDK (SNPE), which enables algorithms to run on the GPU or DSP within the SoC, is one of the best platforms for enhancing runtime speed for AI algorithms and facial recognition.
In our testing of FaceMe® on an Advantech industrial tablet with a Snapdragon 660 running Android, enabling AI to run with SNPE on the GPU reduced end-to-end facial recognition processing time by 40% and CPU usage by 64%. The Snapdragon 660 ran the VH model at 16 fps and the H model at 26 fps, both at 720p.
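As a rough sketch of how that GPU offload is requested in code, the SNPE C++ API lets you pick a runtime when building the network; the model file name below is a placeholder and the exact builder options vary by SDK version.

```cpp
// Sketch: asking SNPE to place a converted (.dlc) model on the GPU, falling
// back to the CPU if that runtime is unavailable. "face_model.dlc" is a
// placeholder; exact builder options vary between SDK versions.
#include "SNPE/SNPE.hpp"
#include "SNPE/SNPEBuilder.hpp"
#include "SNPE/SNPEFactory.hpp"
#include "DlContainer/IDlContainer.hpp"
#include "DlSystem/DlEnums.hpp"
#include "DlSystem/String.hpp"

#include <iostream>
#include <memory>

int main() {
    // Prefer the GPU runtime; fall back to CPU when it is not present.
    zdl::DlSystem::Runtime_t runtime = zdl::DlSystem::Runtime_t::GPU;
    if (!zdl::SNPE::SNPEFactory::isRuntimeAvailable(runtime)) {
        std::cerr << "GPU runtime unavailable, using CPU instead\n";
        runtime = zdl::DlSystem::Runtime_t::CPU;
    }

    // Open the model container produced by the SDK's conversion tools.
    std::unique_ptr<zdl::DlContainer::IDlContainer> container =
        zdl::DlContainer::IDlContainer::open(
            zdl::DlSystem::String("face_model.dlc"));

    // Build the network on the chosen runtime.
    std::unique_ptr<zdl::SNPE::SNPE> snpe =
        zdl::SNPE::SNPEBuilder(container.get())
            .setRuntimeProcessor(runtime)
            .build();

    // snpe->execute(...) would then be called per camera frame with input
    // tensors filled from the preprocessed image.
    return 0;
}
```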
The MediaTek Genio 350 provides very reasonable performance at a competitive price for AIoT devices. It works best for devices with a touchscreen, such as POS terminals, smart locks, industrial tablets, or display panels embedded into smart home appliances or fitness equipment. MediaTek is known for providing great turnkey solutions for unbranded mobile phones. It provides SDKs, sample code, and tools that make it easy for anyone to build their own applications. The Genio 350 and 700 both support Android and Ubuntu.
When we tested it with FaceMe®, the MediaTek Genio 350 ran the VH model at 16 fps, while the Genio 700 ran the VH model at 163 fps, both at 720p image resolution.
NXP Semiconductors provides a wide variety of mixed-signal and standard products spanning security, identification, automotive, networking, radio, analog, and power management. The NXP i.MX8M Plus, available since early 2021, is the first NXP platform to integrate an NPU into the SoC. The NPU effectively speeds up AI algorithms and reduces CPU workload when running AI applications.
To reach the NPU on NXP, the TensorFlow Lite framework and its inference engine are used. TensorFlow Lite is an open-source project developed primarily by Google, which uses it in many of its own AI applications; it is a feature-rich, stable, and mature platform. This also makes the i.MX8M Plus a good fit for porting additional AI applications onto the platform with little effort.
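The interpreter setup on the i.MX8M Plus follows the same TensorFlow Lite flow sketched earlier for MediaTek; what changes is the delegate. On NXP’s Linux BSPs the NPU is typically reached through TensorFlow Lite’s external delegate mechanism, as in this hedged sketch (the delegate library path is a placeholder that depends on the BSP).

```cpp
// Sketch: handing a TensorFlow Lite graph to the i.MX8M Plus NPU through the
// external delegate mechanism. Interpreter creation is the same as in the
// earlier MediaTek sketch; only the delegate differs. The delegate library
// path is a placeholder that depends on the NXP BSP.
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/delegates/external/external_delegate.h"

void ApplyNpuDelegate(tflite::Interpreter* interpreter) {
    auto options = TfLiteExternalDelegateOptionsDefault(
        "/usr/lib/libvx_delegate.so");  // placeholder path
    TfLiteDelegate* delegate = TfLiteExternalDelegateCreate(&options);
    interpreter->ModifyGraphWithDelegate(delegate);
    // TfLiteExternalDelegateDelete(delegate) should be called once the
    // interpreter has been destroyed.
}
```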
In our testing with FaceMe®, the i.MX8M Plus ran the VH model at 14 fps at 720p image resolution.
Here is a short summary of the performance we measured on each platform:

| Platform | FaceMe® model | Throughput | Image resolution |
|---|---|---|---|
| NVIDIA Jetson Orin Dev Kit | VH / UH | 120 fps / 96 fps | 1080p |
| Qualcomm Snapdragon 660 | VH / H | 16 fps / 26 fps | 720p |
| MediaTek Genio 350 | VH | 16 fps | 720p |
| MediaTek Genio 700 | VH | 163 fps | 720p |
| NXP i.MX8M Plus | VH | 14 fps | 720p |

* For the above testing, each image contains one face.
The two most popular operating systems used in AIoT devices for facial recognition are Linux (e.g., Ubuntu and Debian) and Android. When selecting which OS is right for you, it is important to consider the needs of your specific use case. FaceMe® is one of the most versatile facial recognition engines, supporting more than ten operating systems, including Ubuntu, Red Hat, CentOS, and other Linux distributions, as well as Android, iOS, and Windows. However, there are a few key differences to note between Linux/Ubuntu and Android.
Linux is very customizable and flexible. You can completely personalize components of the OS for your specific use case and make it as fast and efficient as possible by removing services you don’t need. Ubuntu is one of the most popular Linux distributions and is widely used, from edge devices to cloud servers. Ubuntu releases long-term support (LTS) versions every two years to keep the technology up to date, with support provided for up to five years. Since it is open source, there is a rich set of code libraries at your disposal that can greatly speed up development of your project.
Android, developed by Google, is based on a modified version of the Linux kernel. It is most commonly used on smartphones and tablets. Android is a developer-friendly solution and is often the preferred OS for AIoT devices. The Android toolchains, IDE (Android Studio), and SDKs are mature, professional tools that can be used at no cost and carry no royalties. Android applications are often developed in Java, which is easier to code and manage than C++. However, Android also supports integrating components written in C++ into a Java application through the Java Native Interface (JNI), enabling the benefits of both languages.
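As a small illustration of that mixed Java/C++ pattern, here is a hypothetical JNI bridge in which an Android app written in Java calls a native C++ routine; the package, class, and function names are made up for the example, and the detector itself is stubbed out.

```cpp
// Native (C++) side of a hypothetical JNI bridge. The matching Java class
// might declare:
//
//   package com.example.faceapp;                       // made-up package
//   public class FaceEngine {
//       static { System.loadLibrary("faceengine"); }
//       public static native int countFaces(byte[] frame, int width, int height);
//   }
//
// DetectFaces below stands in for a call into the native recognition library.
#include <jni.h>
#include <vector>

static int DetectFaces(const unsigned char* pixels, int width, int height) {
    (void)pixels; (void)width; (void)height;
    return 0;  // stub: the real implementation would run the C++ detector
}

extern "C" JNIEXPORT jint JNICALL
Java_com_example_faceapp_FaceEngine_countFaces(JNIEnv* env, jclass /*clazz*/,
                                               jbyteArray frame, jint width,
                                               jint height) {
    // Copy the camera frame out of the JVM-managed byte array.
    jsize len = env->GetArrayLength(frame);
    std::vector<unsigned char> pixels(len);
    env->GetByteArrayRegion(frame, 0, len,
                            reinterpret_cast<jbyte*>(pixels.data()));

    // Run the C++ detector and return its result to the Java caller.
    return DetectFaces(pixels.data(), width, height);
}
```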
There are several popular AIoT configurations available to engineers and developers building systems for facial recognition. When evaluating options, the most important thing is to understand the use case and weigh specific performance, form factor, extensibility, and budget concerns. Once you think you have the right build and design for your use case, we recommend conducting a proof of concept (POC) to identify any necessary fixes and improvements before officially deploying the application.