Understanding bias

Alicio. Publication date: 4 October 2021

Facial recognition is increasingly discussed, and its applications are being deployed across a wide and diverse range of industries.

The pandemic has brought about a proliferation of contactless solutions aimed at preventing the spread of the virus, and biometrics are often at the top of the list.

Despite an increasing number of applications using biometrics and face recognition in particular, it is common to hear unpleasant stories of biased algorithms and ensuing discrimination, particularly when the technology is deployed by law enforcement.

While the impact of these events should not be diminished, it is important to consider that face recognition is merely a tool, and that any bias in it is inherently a result of how its algorithms are trained.

What is bias in face recognition?

The vast majority of face recognition systems today work by scanning face images, translating them into numerical representations, and then comparing those representations to determine similarity.
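The comparison step described above can be illustrated with a minimal sketch. It assumes each face has already been translated into a fixed-length list of numbers (often called an embedding) and uses cosine similarity with a cut-off to decide whether two faces match. The function names and the 0.6 threshold are hypothetical choices for illustration, not values taken from any particular system.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def is_match(embedding_a, embedding_b, threshold=0.6):
    """Hypothetical decision rule: match when similarity reaches the threshold.

    The threshold is an assumed value; real systems tune it on validation data.
    """
    return cosine_similarity(embedding_a, embedding_b) >= threshold
```

In practice the threshold trades false matches against false non-matches, and a single global threshold may behave differently across demographic groups, which is one place where bias can surface.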

To execute this process, however, the system first needs to use artificial neural networks to scan a substantial number of face images.

Through initially established rules, the algorithms then use deep learning to improve their predictions and get more accurate results.

The rules from which the algorithms start their learning process represent one of the two main issues related to bias in face recognition. After all, biased premises lead to biased results.

The second issue relates to the sample selected to aid biometric algorithms in their learning process.

In other words, many companies train their algorithms on standard datasets rather than creating their own, both to simplify the process and to comply with privacy regulations. There is often no guarantee, however, that the samples in those datasets are free from bias.

How can bias be mitigated?

Recently, our colleagues at Alice Biometrics, Daniel Pérez Cabo, Esteban Vázquez Fernández and Artur Costa-Pazo, accompanied by David Jiménez-Cabello and José Luis Alba Castro, analyzed this issue in a study focused on demographic biases.

The main purpose of the research is to enable accountability and fair comparison of several face Presentation Attack Detection (PAD) approaches.

Throughout the analysis, Costa‐Pazo and co-authors highlighted the flaws in most main image databases, then presented an attempt to evaluate the ethnic bias in a self‐designed dataset and protocol.

“Fairness is a critical aspect for any deployable solution aiming at creating models that are agnostic to different social and demographic characteristics,” the paper reads.

To pursue this goal, the researchers took the GRAD-GPAD framework and added the categorisation of three new datasets, increasing the number of identities by more than 300% and the number of samples by more than 181%.

In addition, the team introduced new categorisation and labelling for sex, age and skin tone, as well as novel demographic protocols, visualisation tools and metrics to detect and measure the existence of biases.
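The article does not spell out the paper's metrics, so the following is only a rough sketch of how demographic labels can be used to measure a bias. It assumes a PAD system whose output is "attack" or "bona fide" and computes, per demographic group, the share of genuine (bona fide) presentations wrongly rejected as attacks, a rate known as BPCER in ISO/IEC 30107-3. The record layout and the `max_gap` summary are hypothetical choices made for this example.

```python
from collections import defaultdict


def bpcer_by_group(samples):
    """Compute the bona fide rejection rate for each demographic group.

    samples: iterable of (group, is_bona_fide, predicted_attack) tuples.
    Attack presentations are skipped because BPCER only concerns bona fide ones.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, is_bona_fide, predicted_attack in samples:
        if not is_bona_fide:
            continue
        totals[group] += 1
        if predicted_attack:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def max_gap(rates):
    """Hypothetical summary: largest difference in error rate between groups."""
    if not rates:
        return 0.0
    return max(rates.values()) - min(rates.values())
```

A gap close to zero suggests the groups are treated similarly on this one metric; it says nothing about attack detection rates, which would need a parallel computation over the attack samples.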

The researchers dubbed the improved database GRAD-GPAD v2.

Different objectives, different scenarios

According to the paper, despite the best efforts of the community, the research results indicate that individual datasets have a strong built-in bias.

“The rationale behind this bias originates on the experimental setting as follows: each dataset has different objectives and evaluates the performance of the models in different scenarios.”

For instance, some of these systems may be built to be run only on mobile devices, either in outdoor or lab environments, with synthetic or natural illumination, using a simulated onboarding setting, or for deployment in other, specific scenarios.

“Even if we try to incorporate every possible scenario, we find that the biases are still present in some form,” the paper explains.

By incorporating additional datasets and a novel labelling approach, Costa‐Pazo’s team achieved some interesting results.

Beyond the lack of representativeness

Aggregating the datasets as is done in GRAD-GPAD v2 not only helped mitigate the bias coming from the data but also helped the team understand how bias is distributed within the training dataset.

“[This] allowed us to incorporate compensations in the learning process and to improve future dataset captures,” the researchers wrote.
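The article does not say which compensations the researchers applied. As one common possibility, and purely as an assumption, training samples can be weighted inversely to the frequency of their demographic group so that each group contributes equally to the loss. The function name below is hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(group_labels):
    """Return one weight per sample so every group has the same total weight.

    Each weight is n_samples / (n_groups * group_count); the weights sum to
    n_samples, so the overall scale of the loss is unchanged.
    """
    counts = Counter(group_labels)
    n_samples = len(group_labels)
    n_groups = len(counts)
    return [n_samples / (n_groups * counts[g]) for g in group_labels]
```

With three samples from one group and one from another, the lone sample receives a weight of 2.0 and each of the other three a weight of about 0.67, so both groups carry a total weight of 2.0.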

While the study is only one example of how biases in face recognition can be mitigated, it is a relevant one.

In fact, GRAD-GPAD v2 addresses the lack of representativeness in state-of-the-art works and moves a step towards fair evaluation across methods.

“[It does so] not only from the perspectives of the different instruments used to perform the attacks but also considering realistic settings in production,” the paper concludes.

Of course, these findings are applied in Alice Biometrics' own development work, with the aim of providing secure and unbiased biometric data.

If you would like to learn more about our technology, contact us!

This publication has been financed by the Agencia Estatal de Investigación DIN2019-010735 / AEI / 10.13039/501100011033

