Database Antipatterns (@WTMZ23)
In a sector such as identity verification, where detection systems are built on Artificial Intelligence, very high volumes of data are involved. It is therefore crucial that data access is efficient and fast, and that complex processing and loading operations can be carried out with relative simplicity.
It is common to work with millions of users in real time, combining information stored as relational data or as external files. At Alice Biometrics we know this kind of operation very well and have integrated it into our day-to-day work, both to provide a quality service to our customers and to feed this data into the continuous improvement of our fraud detection systems. That is why we know the importance of treating data properly from its origin: in the construction of the systems that will store it.
In relational databases, it is common to fall into anti-patterns when building the tables and schemas that make up the database. The easiest way to prevent our databases from becoming systems that are hard to operate and complex to manipulate is to know the ways in which these anti-patterns can manifest themselves, so that we can find appropriate solutions when designing schemas.
What is an antipattern?
An anti-pattern is a common response to a recurring problem that is usually inefficient and often counterproductive, even though it appears to be the simplest and most obvious way to solve the problem.
Made to fit
One of the most common anti-patterns is delegating data typing entirely to the application layers. It is tempting to skip a precise definition of data types at the database layer because, in the OOP paradigm, that definition already exists in our application code. As a result, corner cases, bugs not caught in time and errors in the application logic can have an immediate and very negative impact on data quality. This anti-pattern is a major problem because of how quickly data can be corrupted and how costly it can be to fix in large, fast-growing live systems.
The easiest way to solve this is to establish robust database schemas from the outset, using the following rules:
1. Use robust data typing
Database engines optimise themselves based on the types of data stored in their tables: the more specific you are, the faster the database will be and the less storage it will take up. In today's technological environment, where storage with cloud providers is the norm, this helps reduce costs and make better use of existing resources.
2. Precisely define which fields will not admit null values, using the NOT NULL constraint
In line with the previous point, this measure helps optimise database performance, better define the form that data can take and better define the relationships between entities. It also serves as a final safeguard, ensuring that data the business requires is actually present in the database tables.
3. Take full advantage of the capabilities of your database management system
Database management systems incorporate functions and tools specifically designed for working with relational data, so in the vast majority of cases they will be the best-prepared, most efficient and fastest option for tasks such as guaranteeing transactional writes.
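The three rules above can be sketched together in a small example. This is a minimal illustration using Python's built-in sqlite3 module; the table and column names (users, email, country) are illustrative assumptions, not a real production schema.

```python
import sqlite3

# Illustrative schema: specific column types plus NOT NULL constraints.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id         INTEGER      PRIMARY KEY,
        email      VARCHAR(254) NOT NULL,  -- required by the business
        country    CHAR(2)      NOT NULL,  -- ISO 3166-1 alpha-2 code
        created_at TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
""")
conn.execute("INSERT INTO users (email, country) VALUES (?, ?)",
             ("user@example.com", "ES"))

# The NOT NULL constraint acts as the final safeguard: a bug in the
# application layer cannot silently persist an incomplete row.
error = ""
try:
    conn.execute("INSERT INTO users (email, country) VALUES (?, ?)",
                 (None, "ES"))
except sqlite3.IntegrityError as exc:
    error = str(exc)
print(error)  # → NOT NULL constraint failed: users.email
```

Even if the application code forgets to validate the email field, the engine itself rejects the write, which is exactly the "last barrier" role described above.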
Another common anti-pattern when creating tables and columns occurs when object modelling is transferred in parallel to database entity modelling and table construction. In identity verification, where many different identity documents are handled, it is common for these documents not to share the same number of fields or the same types of data. One of the most frequent "solutions" is an attribute-value layout (often called Entity-Attribute-Value, or EAV): a key column stores the unique identifier of the document, an attribute column stores the name of the document field, and a value column with non-specific typing (a very large generic type such as TEXT) stores, in plain text, the value of the field referenced in attribute.
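A sketch of that attribute-value layout, again with Python's sqlite3 module, makes the problem concrete; the table and field names are illustrative assumptions. Every value, whatever its real type, ends up as unconstrained plain text.

```python
import sqlite3

# The attribute-value anti-pattern: one generic table for all fields.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE document_fields (
        document_id TEXT,  -- unique identifier of the document
        attribute   TEXT,  -- name of the document field
        value       TEXT   -- plain-text value, no real typing
    )
""")
conn.executemany(
    "INSERT INTO document_fields VALUES (?, ?, ?)",
    [
        ("doc-1", "document_number", "X1234567"),
        ("doc-1", "expiry_date",     "2031-05-04"),  # a date stored as text
        ("doc-2", "licence_class",   "B"),           # a field doc-1 lacks
    ],
)

# Reassembling one logical document means pivoting rows back together.
fields = dict(conn.execute(
    "SELECT attribute, value FROM document_fields WHERE document_id = ?",
    ("doc-1",),
))
print(fields)
```

Note that nothing stops a typo in attribute or a malformed date in value, and no foreign key can point at "the document" as a whole.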
While this may seem like a simple solution to a complex problem, because it allows any type of data to be stored for any document field and lets fields be optional, present only in the documents where they apply, this practice has several problems. First, the entity loses all referential integrity: the property of databases that allows relationships to be established between entities and ensures that records in related tables are valid and are not deleted or modified by mistake. With this layout, it is not possible to establish safe, protected relationships between stored documents and other stored data, such as users or even documents of other types. At the same time, as noted above, unspecific typing can lead to performance problems, and in this particular case it also makes editing the stored data harder.
There are several solutions to this problem. The first is to combine our SQL database with a NoSQL system that provides the flexibility needed in these cases. If you prefer to keep this data in a relational database, to take advantage of the functionality it offers, you can opt for a semi-structured storage strategy: the document fields you always want to store use columns with specific typing, sized to what they will contain, while optional fields, or those present only in some documents, are stored in a column of a semi-structured data type such as XML or JSON, available in all modern database management systems. This approach brings the flexibility of a NoSQL database together with the advantages of keeping our data interrelated.
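The hybrid strategy can be sketched as follows, assuming illustrative table and column names. SQLite's json_extract stands in here for the JSON support of the engine; in PostgreSQL, for example, a JSONB column would play the same role.

```python
import sqlite3

# Hybrid layout: mandatory, common fields get strictly typed columns;
# optional per-document fields live in a JSON payload column.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE documents (
        id              INTEGER     PRIMARY KEY,
        document_number VARCHAR(20) NOT NULL,
        country         CHAR(2)     NOT NULL,
        extra_fields    TEXT        NOT NULL DEFAULT '{}'  -- JSON payload
    )
""")
conn.execute(
    "INSERT INTO documents (document_number, country, extra_fields) "
    "VALUES (?, ?, ?)",
    ("X1234567", "ES", '{"licence_class": "B"}'),
)

# Optional fields remain queryable through the engine's JSON functions.
licence = conn.execute(
    "SELECT json_extract(extra_fields, '$.licence_class') FROM documents"
).fetchone()[0]
print(licence)  # → B
```

The common fields keep strict typing and NOT NULL guarantees, while the JSON column absorbs per-document variability without an attribute-value table.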
Another possible solution, if you want to keep the data in a relational database, is to adopt a design pattern called Class Table Inheritance when constructing the tables, applying the concepts of object inheritance to the modelling of entities. With this pattern you end up with a master table storing the data common to all documents, plus as many related tables as there are document types. This makes it easy to control the existing document subtypes while keeping stricter typing on the columns that store document fields.
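A minimal sketch of Class Table Inheritance, with illustrative table names: a parent table holds the fields common to every document, and one child table per document type holds its specific fields, linked by foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this opt-in
conn.executescript("""
    CREATE TABLE documents (
        id              INTEGER     PRIMARY KEY,
        document_number VARCHAR(20) NOT NULL,
        issued_at       DATE        NOT NULL
    );
    CREATE TABLE passports (
        document_id INTEGER PRIMARY KEY REFERENCES documents(id),
        mrz         VARCHAR(88) NOT NULL  -- machine-readable zone
    );
    CREATE TABLE driving_licences (
        document_id   INTEGER PRIMARY KEY REFERENCES documents(id),
        licence_class CHAR(2) NOT NULL
    );
""")
conn.execute("INSERT INTO documents VALUES (1, 'X1234567', '2021-05-04')")
conn.execute("INSERT INTO passports VALUES (1, 'P<ESPDOE<<JANE')")

# Referential integrity is back: a child row cannot point at a document
# that does not exist.
error = ""
try:
    conn.execute("INSERT INTO driving_licences VALUES (99, 'B')")
except sqlite3.IntegrityError as exc:
    error = str(exc)
print(error)  # → FOREIGN KEY constraint failed
```

Each subtype table keeps its own strict column types, and queries over the common fields only need the master table.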
In short, there are a number of table- and column-creation anti-patterns that can prevent identity verification systems from taking full advantage of the database management systems they use. Knowing them and identifying them before they occur is the best possible strategy for keeping stored data accessible, complete and usable.