Introducing v1.0.0
We are excited to announce the latest update to our web-based software platform, Tator v1.0.0, which marks a significant milestone for the product. This release introduces changes at both the architectural and API levels, providing a rock-solid foundation for future iterations of the platform. In addition to laying the groundwork for future features, Tator v1.0.0 brings a plethora of bug fixes, UI consistency improvements, and quality-of-life enhancements that we believe our users will appreciate.
One of the most noteworthy changes in this update is the removal of the Elasticsearch subsystem, a change that rippled throughout the API. Although Elasticsearch and PostgreSQL can complement each other, keeping the two integrated creates challenges around data consistency and maintenance. By using structured metadata in PostgreSQL, we can meet our search and analytics requirements without Elasticsearch, reducing maintenance costs and improving scalability.
Removing the Elasticsearch subsystem from Tator v1.0.0 also extricated us from a tricky licensing situation. Elasticsearch is licensed under the Elastic License, which imposes certain limitations on the use and distribution of Elasticsearch-based software. Without Elasticsearch in Tator, we no longer have to navigate these licensing restrictions and can offer our users a more flexible, open-source solution.
A dynamic PSQL-based index manager
In Tator v1.0.0, we introduce a dynamic PSQL-based index manager to improve search performance for our users. We use JSONB fields to store semi-structured attribute information on Media, Localizations, Files, and States, which lets each project define its own attribute schema and types. However, this approach can lead to slow search and fetch performance: by default, PostgreSQL does not index the values inside a JSONB column, so queries that filter on attribute values often fall back to sequential scans.
To address this, we created a custom PostgreSQL index manager integrated with our Django application. Because our datatype definitions are scoped to each project, we can create indices scoped to both the project and the type, which speeds up searches on attribute values. Indices are updated automatically whenever attribute types are added to or removed from a LocalizationType or MediaType, keeping them consistent with the schema and performing well. While an index is being built, access to records may be slower, but it always remains consistent.
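To make the idea of a project-scoped, type-scoped index concrete, the sketch below generates the kind of DDL such an index manager might issue for a single JSONB attribute: a partial expression index over one key, restricted to one project and one type. It is not Tator's implementation. The table names (main_localization, main_media), the attributes, project, and type columns, and the dtype-to-cast mapping are all assumptions made for illustration.

```python
import re

# Assumed mapping from attribute dtypes to PostgreSQL cast types; Tator's
# actual dtype names and casts may differ.
PG_CASTS = {
    "string": "text",
    "enum": "text",
    "int": "bigint",
    "float": "double precision",
    "bool": "boolean",
}


def index_name(table: str, project_id: int, type_id: int, attr_name: str) -> str:
    """Build a deterministic index name scoped to one project, type, and attribute."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unsafe table name: {table!r}")
    # Replace characters that are not valid in an unquoted identifier.
    suffix = re.sub(r"[^A-Za-z0-9_]", "_", attr_name)
    return f"{table}_p{int(project_id)}_t{int(type_id)}_{suffix}"


def create_index_sql(table: str, project_id: int, type_id: int,
                     attr_name: str, dtype: str) -> str:
    """DDL to run when an attribute is added to a type (hypothetical schema)."""
    name = index_name(table, project_id, type_id, attr_name)
    # Double any single quotes so the key is a valid SQL string literal.
    key = attr_name.replace("'", "''")
    cast = PG_CASTS[dtype]
    # Partial expression index: only rows of this project and type are indexed.
    return (
        f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {name} "
        f"ON {table} (((attributes->>'{key}')::{cast})) "
        f"WHERE project = {int(project_id)} AND type = {int(type_id)}"
    )


def drop_index_sql(table: str, project_id: int, type_id: int, attr_name: str) -> str:
    """DDL to run when an attribute is removed from a type."""
    name = index_name(table, project_id, type_id, attr_name)
    return f"DROP INDEX CONCURRENTLY IF EXISTS {name}"
```

For example, create_index_sql("main_localization", 1, 2, "name", "string") yields an index named main_localization_p1_t2_name on the expression ((attributes->>'name')::text), limited to rows where project = 1 and type = 2. Building with CONCURRENTLY lets writes continue during the build (it cannot run inside a transaction block), which fits the "slower but consistent" behavior described above. A plain B-tree expression index like this serves equality and range comparisons; case-insensitive substring matching would typically need a trigram (pg_trgm) GIN index instead. PostgreSQL also truncates identifiers longer than 63 bytes, so a production naming scheme would need to guard against collisions.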
New object search capability
The UI was updated to use the new object search capability in the project, analytics, and annotator views.
Removing Elasticsearch meant replacing the legacy Lucene-based search functionality it provided. To do this, we introduced an object search capability. With the new encoded_search and encoded_related_search parameters, users can now build complex queries from JSON objects defined by AttributeOperationSpec.
The encoded_search parameter is a base64-encoded string representing a JSON object defined by AttributeOperationSpec in the Schema, and it lets users query Media based on their attribute values. Similarly, the encoded_related_search parameter can be used to search for Media that have related metadata matching the search terms. Together, these parameters give Tator users a highly flexible and customizable search capability. Examples of the object are given below:
Basic Example
# Matches Freddie Mercury or Fred Thompson
{'attribute': 'name', 'operation': 'icontains', 'value': 'Fred'}
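Before an object like the one above can be sent as encoded_search, it has to be serialized to JSON and base64-encoded. The Python sketch below shows one way to do that and to place the result in a query string. It assumes standard (non-URL-safe) base64 and a hypothetical /rest/Medias/{project} endpoint; the exact path and the encoding the server expects should be checked against the Tator REST documentation or the tator-py client.

```python
import base64
import json
from typing import Optional
from urllib.parse import urlencode


def encode_search(spec: dict) -> str:
    """Serialize an AttributeOperationSpec-style dict to JSON, then base64-encode it."""
    raw = json.dumps(spec, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_search(encoded: str) -> dict:
    """Inverse of encode_search; useful for inspecting an existing query string."""
    return json.loads(base64.b64decode(encoded))


def media_search_url(host: str, project_id: int, spec: dict,
                     related_spec: Optional[dict] = None) -> str:
    """Build a search URL against a hypothetical /rest/Medias/{project} endpoint."""
    params = {"encoded_search": encode_search(spec)}
    if related_spec is not None:
        params["encoded_related_search"] = encode_search(related_spec)
    # urlencode percent-escapes '+', '/', and '=', which base64 output may contain.
    return f"{host}/rest/Medias/{project_id}?{urlencode(params)}"


basic = {'attribute': 'name', 'operation': 'icontains', 'value': 'Fred'}
```

Calling media_search_url("https://tator.example.com", 1, basic) produces a URL whose encoded_search value decodes back to the basic object; passing a second object as related_spec adds encoded_related_search alongside it.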
Utilizing combinatorial logic
# Matches "Freddie Mercury", "Fred Thompson", or "Robert Redford". SQL looks like "name ILIKE '%Fred%' OR name ILIKE '%Robert%'"
{'method': 'OR', 'operations': [{'attribute': 'name', 'operation': 'icontains', 'value': 'Fred'}, {'attribute': 'name', 'operation': 'icontains', 'value': 'Robert'}]}
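To illustrate how the method/operations structure combines results, here is a small in-memory evaluator written as a sketch. Tator itself runs these queries in PostgreSQL, not in Python. The semantics of icontains (case-insensitive substring) and gte are assumed from the SQL in the example comments, and the eq operation is a hypothetical addition; the real AttributeOperationSpec may define more operations or treat types differently.

```python
from typing import Any, Callable, Dict

# Assumed semantics for the operations used in the examples ('icontains',
# 'gte') plus a hypothetical 'eq'.
OPERATIONS: Dict[str, Callable[[Any, Any], bool]] = {
    "eq": lambda actual, expected: actual == expected,
    "icontains": lambda actual, expected: str(expected).lower() in str(actual).lower(),
    "gte": lambda actual, expected: actual >= expected,
}


def matches(spec: Dict[str, Any], record: Dict[str, Any]) -> bool:
    """Recursively evaluate an AttributeOperationSpec-style dict against one record."""
    if "method" in spec:
        method = spec["method"].upper()
        results = (matches(child, record) for child in spec["operations"])
        if method == "AND":
            return all(results)
        if method == "OR":
            return any(results)
        raise ValueError(f"unsupported method: {spec['method']!r}")
    # Leaf node: a record without the attribute never matches.
    if spec["attribute"] not in record:
        return False
    return OPERATIONS[spec["operation"]](record[spec["attribute"]], spec["value"])


records = [
    {"name": "Freddie Mercury"},
    {"name": "Fred Thompson"},
    {"name": "Robert Redford"},
    {"name": "Brian May"},
]
or_spec = {'method': 'OR', 'operations': [
    {'attribute': 'name', 'operation': 'icontains', 'value': 'Fred'},
    {'attribute': 'name', 'operation': 'icontains', 'value': 'Robert'},
]}
print([r["name"] for r in records if matches(or_spec, r)])
# ['Freddie Mercury', 'Fred Thompson', 'Robert Redford']
```

Because each group is evaluated recursively, the same function handles arbitrarily nested AND/OR groups, which is exactly what the nested example relies on.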
Utilizing nested combinatorial logic
# Matches records whose name contains "Fred" or "Robert" and whose age is at least 30. SQL looks like "(name ILIKE '%Fred%' OR name ILIKE '%Robert%') AND age >= 30"
{'method': 'AND', 'operations': [{'method': 'OR', 'operations': [{'attribute': 'name', 'operation': 'icontains', 'value': 'Fred'}, {'attribute': 'name', 'operation': 'icontains', 'value': 'Robert'}]}, {'attribute': 'age', 'operation': 'gte', 'value': 30}]}
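The SQL in the comments above can be derived mechanically from the object. The following sketch translates a (possibly nested) object into a parameterized WHERE fragment using psycopg-style %s placeholders. It is an illustration of the mapping, not Tator's query builder: the JSONB column name attributes, the cast of gte comparisons to double precision, and the restriction to the two operations used in these examples are assumptions.

```python
from typing import Any, Dict, List, Tuple


def _escape_like(value: str) -> str:
    """Escape LIKE/ILIKE wildcards (backslash is PostgreSQL's default escape)."""
    return value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def _leaf_to_sql(spec: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Translate one attribute comparison; the 'attributes' column is assumed."""
    key, op, value = spec["attribute"], spec["operation"], spec["value"]
    if op == "icontains":
        return "(attributes->>%s) ILIKE %s", [key, f"%{_escape_like(str(value))}%"]
    if op == "gte":
        return "(attributes->>%s)::double precision >= %s", [key, value]
    raise ValueError(f"unsupported operation: {op!r}")


def to_sql(spec: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Recursively build a parameterized WHERE fragment and its parameter list."""
    if "method" not in spec:
        return _leaf_to_sql(spec)
    method = spec["method"].upper()
    if method not in ("AND", "OR"):
        raise ValueError(f"unsupported method: {spec['method']!r}")
    clauses, params = [], []
    for child in spec["operations"]:
        clause, child_params = to_sql(child)
        clauses.append(clause)
        params.extend(child_params)
    # Parenthesize every group so nesting survives regardless of operator precedence.
    return "(" + f" {method} ".join(clauses) + ")", params


nested = {'method': 'AND', 'operations': [
    {'method': 'OR', 'operations': [
        {'attribute': 'name', 'operation': 'icontains', 'value': 'Fred'},
        {'attribute': 'name', 'operation': 'icontains', 'value': 'Robert'},
    ]},
    {'attribute': 'age', 'operation': 'gte', 'value': 30},
]}
sql, params = to_sql(nested)
```

For the nested example, sql is "(((attributes->>%s) ILIKE %s OR (attributes->>%s) ILIKE %s) AND (attributes->>%s)::double precision >= %s)" with params ["name", "%Fred%", "name", "%Robert%", "age", 30]. Keeping values out of the SQL text and passing them as parameters avoids injection, and escaping % and _ keeps user input from acting as ILIKE wildcards.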
Looking forward
In conclusion, by relying solely on PostgreSQL, Tator now has a more nimble database footprint, clearing the way for new and exciting features in future releases. The roadmap includes row-level permissions and versioning, both of which would have been challenging to achieve in a hybrid database scheme. In particular, versioned metadata with per-user isolation can now be implemented efficiently without compromising access performance. Stay tuned for more updates as we continue to innovate and enhance Tator's capabilities.