Elemental Versioning Scheme
In this article we discuss the architecture behind versioned metadata in Tator. In Tator, metadata includes the localizations and states associated with video and imagery. This article is intended for readers with an understanding of data science and databases, and for those who wish to learn how to set up and enable user workflows in Tator.
Versioning is a complex topic that has seen multiple approaches across both computer science and data science. Tools such as git, SVN, and IBM's ClearCase each have their own concepts and a history of successfully managing source code and other data. Tator builds on these concepts to provide a streamlined versioning system specific to video and image metadata.
Key Concepts
- Version : Versions are used to describe different interpretations of the same object. A Version in Tator is the account of an object from a particular point of view. An object belongs to a Version, not the other way around. As an analogy, imagine a movie developed from a book. The movie and the book each represent a Version. A character may exist on one, both, or neither of the versions. Versions can derive off other versions or off the Baseline. Other tools use the term branch for this concept.
- Baseline : The Baseline is the default version within a project. It does not derive off of any other versions. Other tools such as SVN use trunk for this concept; git used a term from the recording industry, master copy, and later transitioned to main.
- Variant : A specific reference to an object that exists on a version is referred to as a variant.
- Elemental Id : Variants of the same object that exist across multiple versions share the same elemental id. This is used to associate them across versions or even Tator deployments.
- Mark : Denotes the vertical dimension of versioning within Tator. On the same version, edits of the same object make new marks of that object. Some systems may use terms such as revision number for this concept.
- View : A term used to define the selection rules and ultimately the result set for an operation. The view settings are effectively the arguments to get_localization_list, get_state_list, or the respective GET requests in the REST API. If a Version derives off of another, a merge view can be used to view the set of objects that are present in both versions as one result set. As an example, in a merge view, if the defined movie derives off the book, when inspecting the world of the movie, if a character is not redefined explicitly, its definition from the book is considered valid.
- Delete : Deleting a variant ends its life on a given version. Short of a restoration action, when looking at the given version or derived versions, the object is no longer present unless a special parameter is used to show deleted objects.
- Prune : Pruning an object completely eliminates all existence of a variant from its version moving forward. Restoration is not possible short of a database restoration operation. Pruning metadata should be done judiciously.
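The vocabulary above can be made concrete with a small in-memory model. This is only an illustrative sketch, not Tator's actual schema: the Variant class, its helper functions, and every field name other than elemental_id and variant_deleted (which appear in this article) are assumptions made for the example.

```python
from dataclasses import dataclass, field, replace
from uuid import uuid4


@dataclass(frozen=True)
class Variant:
    """One object as seen from one version (hypothetical model, not Tator's schema)."""
    elemental_id: str              # shared by all variants of the same object
    version: str                   # the version this variant belongs to
    mark: int = 0                  # revision number within that version
    variant_deleted: bool = False  # True once the variant has been deleted
    attributes: dict = field(default_factory=dict)


def edit(variant: Variant, **changes) -> Variant:
    """Editing on the same version yields a new mark of the same object."""
    return replace(variant, mark=variant.mark + 1,
                   attributes={**variant.attributes, **changes})


def branch(variant: Variant, version: str, **changes) -> Variant:
    """A variant on another version keeps the elemental id; its mark restarts at 0."""
    return replace(variant, version=version, mark=0,
                   attributes={**variant.attributes, **changes})


def delete(variant: Variant) -> Variant:
    """Deleting flags the variant instead of erasing it, so it can still be shown."""
    return replace(variant, variant_deleted=True)


book_watson = Variant(str(uuid4()), "book", attributes={"name": "Dr. Watson"})
book_watson_v1 = edit(book_watson, profession="physician")
movie_watson = branch(book_watson_v1, "movie", actor="unknown")
```

In this sketch the book and movie variants remain associated only through the shared elemental id, which is the property the rest of the article relies on.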
Version inheritance
Version inheritance is the architectural tool that enables advanced user workflows. Put another way, inheritance is when a Version derives off of another. A GET request to the REST service, either directly or through tator-py, creates a view of the data. Arguments to the request define how to filter and display versioned data.
In a merge view, the default in the UI and via REST, if Version B derives off of Version A, result sets can be thought of as similar to a SQL COALESCE operation. The pseudo-code COALESCE(objB, objA) is applied on a per-object basis such that at most one variant is displayed or returned per unique elemental id.
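The COALESCE analogy can be sketched in a few lines of Python. This is an illustration of the idea only, not Tator's server-side implementation: each version is modeled as a plain dictionary keyed by elemental id, the function names are hypothetical, and deletion is ignored for now.

```python
def coalesce(*candidates):
    """Return the first candidate that is not None, like SQL COALESCE."""
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return None


def merge_view(version_b: dict, version_a: dict) -> dict:
    """Merge B over A, keyed by elemental id: at most one variant per id."""
    all_ids = version_a.keys() | version_b.keys()
    return {eid: coalesce(version_b.get(eid), version_a.get(eid)) for eid in all_ids}


# Version B derives off Version A and redefines only some objects.
version_a = {"holmes": "Holmes (A)", "watson": "Watson (A)"}
version_b = {"watson": "Watson (B)", "moriarty": "Moriarty (B)"}
merged = merge_view(version_b, version_a)
```

Here "watson" resolves to the Version B variant, while "holmes" falls through to Version A because Version B does not redefine it.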
If the merge parameter is set to 0 in the REST API, multiple variants of the same elemental id can be returned.
If the show_deleted parameter is also set to 1, deleted objects are returned as part of the result set. The variant_deleted field of an object can be inspected to determine whether the object should be used for analysis.
If merge is 0 and show_deleted is 1, the result is an unfiltered view of all variants across the selected Versions. This view can be useful for Version introspection of an object.
Given the example of Versions B and A above, the following table summarizes the possible views:
Selected Versions | Merge | Show Deleted | Description |
---|---|---|---|
A and B | Yes | No | Default view in UI. Shows the merged result of Version B on top of Version A. If an object doesn't exist on Version B, the Version A object is shown. If the object is deleted on Version B, no object is shown or returned. |
A and B | No | Yes | This returns all variants present in Version A and B , irrespective of inheritance ordering or deletion status. This view can be useful for Version introspection. |
A or B | N/A | Yes | This only returns objects present on Version A or B based on which is selected. It includes objects that were deleted. In a return set, those objects will have variant_deleted set to True . |
A or B | N/A | No | This only returns objects present on the selected Version. It does not include objects that were deleted; in a return set, all objects will have variant_deleted set to False . |
Table 1: Tabular summary of view selection options
Figure 1: In a merge operation, a flow chart is executed row by row to determine the correct object when the view merges Version B over Version A.
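The rows of Table 1 can be reproduced with a small selection function. This is a sketch under stated assumptions rather than Tator's actual query logic: the Row type and the view function are hypothetical, the versions list is assumed to be ordered base first, and the combination of merge with show_deleted enabled (which Table 1 does not cover) simply keeps the winning variant even when it is deleted.

```python
from typing import NamedTuple


class Row(NamedTuple):
    elemental_id: str
    version: str
    variant_deleted: bool


def view(rows, versions, merge=True, show_deleted=False):
    """Select rows the way Table 1 describes.

    `versions` is ordered base first, e.g. ["A", "B"] when B derives off A.
    """
    selected = [r for r in rows if r.version in versions]
    if merge:
        # Keep one variant per elemental id: the one from the most derived version.
        winners = {}
        for r in sorted(selected, key=lambda r: versions.index(r.version)):
            winners[r.elemental_id] = r
        selected = list(winners.values())
    if not show_deleted:
        # A deleted winner hides the object entirely, as in row 1 of Table 1.
        selected = [r for r in selected if not r.variant_deleted]
    return selected


rows = [
    Row("holmes", "A", False),
    Row("watson", "A", False),
    Row("watson", "B", True),      # deleted on Version B
    Row("lestrade", "A", False),
    Row("lestrade", "B", False),   # redefined on Version B
]

default_view = view(rows, ["A", "B"])                                # row 1
everything = view(rows, ["A", "B"], merge=False, show_deleted=True)  # row 2
only_b = view(rows, ["B"], merge=False, show_deleted=True)           # row 3
only_b_live = view(rows, ["B"], merge=False, show_deleted=False)     # row 4
```

Note that the deleted "watson" variant on Version B suppresses the Version A definition in the default view, while the unmerged view returns all five rows.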
Understanding mark codes
Mark codes are the revision number of a given object on a given version. In document preparation, it is common to use revision codes to track the history of a released document. One might imagine "Instructions for a Flawless Espresso, Edition 2" as a follow-up to "Instructions for a Flawless Espresso".
Creating a variant of an object on a different version resets the mark code to 0 for that version. Marks are automatically created when objects are PATCHed via the REST API. Comparing the values of an object between two subsequent marks yields the changeset.
build_instructions/
├── english
│ ├── mark0.txt
│ ├── mark1.txt
│ └── mark2.txt
└── spanish
    └── mark0.txt
3 directories, 4 files
Code Block 2: A directory tree can be helpful to visualize the difference between versions and marks in practice. The example contains instructions in both English and Spanish, representing two versions. The English version has three marks (mark0 through mark2), while the Spanish version has only its initial mark.
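The idea that two subsequent marks define a changeset can be illustrated with a plain dictionary comparison. This is a minimal sketch: the changeset function and the attribute names in the sample marks are hypothetical and do not come from the Tator API.

```python
def changeset(previous: dict, current: dict) -> dict:
    """Map each changed attribute to its (old, new) values; None marks absence."""
    keys = previous.keys() | current.keys()
    return {
        key: (previous.get(key), current.get(key))
        for key in sorted(keys)
        if previous.get(key) != current.get(key)
    }


# Two subsequent marks of the same object on the same version.
mark0 = {"label": "espresso", "x": 0.10, "y": 0.20}
mark1 = {"label": "espresso", "x": 0.15, "y": 0.20, "confidence": 0.9}
diff = changeset(mark0, mark1)
```

Only the attributes that differ between the two marks appear in the result; unchanged values such as the label are omitted.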
Literary Example
Building on the book and movie analogy from the Key Concepts section, we can further explore some examples of this versioning approach. An initial novel about Sherlock Holmes, now in the public domain, could be defined as a Version. Each adaptation of the character set would be another Version derived off the original version by Arthur Conan Doyle. In this example, the versioning system would be used to manage the possibilities of characters across the many adaptations and works involving Sherlock Holmes. One could imagine each object containing attributes about the character, including backstory, name, and, for a screen portrayal, the actor.
Figure 2: The works of Sir Arthur Conan Doyle can be analyzed in the Tator Versioning Scheme to serve as a visual example of available architectural constructs.
If a given character is not mentioned in a given Version, it is logical to assume the definitions provided by the source material. This is equivalent to the default behavior in Tator, with the merge parameter set to 1.
Operations get more interesting given the number of works involving Sherlock Holmes. At least one version of the Sherlock Holmes story doesn't include a Dr. Watson. In this Version, Dr. Watson is deleted. In our system, variant_deleted would be True. He wouldn't appear in any fetches about characters in this Version unless show_deleted was enabled. If the Dr. Watson variant were instead pruned from this Version, subsequent merge accesses would fetch information about Dr. Watson from the Version that this one derives off of.
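The difference between deleting and pruning in a merge view can be shown with a toy lookup. This is a sketch of the behavior described above, not Tator code: versions are plain dictionaries keyed by elemental id, and merged_lookup is a hypothetical helper.

```python
def merged_lookup(elemental_id, derived, base):
    """Merge view for one object: the derived version wins when it has a variant."""
    variant = derived.get(elemental_id, base.get(elemental_id))
    if variant is None or variant["variant_deleted"]:
        return None
    return variant


base = {"watson": {"name": "Dr. John Watson", "variant_deleted": False}}
adaptation = {"watson": {"name": "Dr. Watson", "variant_deleted": False}}

# Delete: the variant stays on the adaptation, flagged, and hides the base definition.
adaptation["watson"]["variant_deleted"] = True
after_delete = merged_lookup("watson", adaptation, base)

# Prune: the variant is removed entirely, so the merge falls through to the base.
del adaptation["watson"]
after_prune = merged_lookup("watson", adaptation, base)
```

After the delete, the merged lookup finds no Dr. Watson; after the prune, it returns the definition from the base Version.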
In other literary works, the Dr. Watson character, whose human-readable name serves as an effective elemental_id, would be almost unrecognizable without it. Querying all Versions in our system for Dr. Watson would return a listing of all the variants that exist across Sherlock Holmes stories, ranging from a robot in Sherlock Holmes in the 22nd Century to a more traditional portrayal by Martin Freeman in BBC's Sherlock to Lucy Liu's version in Elementary.
Action Description
The following sections describe common actions and the best practices around accomplishing them.
Creating a new version
A version can be created via the admin UI or tator-py. When creating a version, one can select its bases; these tell the version definition which versions, if any, this version derives off of.
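As a hedged sketch of the tator-py route, the helper below builds a version definition with its bases and submits it. It assumes that the API object exposes create_version(project, version_spec=...) and that the response carries the new id in an id attribute; the helper name, the description text, and the ids in the usage comment are placeholders, so check the calls against the tator-py reference for your deployment.

```python
def create_derived_version(api, project: int, name: str, bases: list) -> int:
    """Create a version in `project` that derives off the version ids in `bases`."""
    spec = {
        "name": name,
        "description": f"Derived off versions {bases}",
        "bases": bases,  # an empty list means the version derives off nothing
    }
    # Assumed tator-py call; `api` would come from tator.get_api(host, token).
    response = api.create_version(project, version_spec=spec)
    return response.id


# Usage (requires a reachable Tator deployment, so it is left commented out):
# import tator
# api = tator.get_api(host="https://cloud.tator.io", token="YOUR_API_TOKEN")
# new_version_id = create_derived_version(api, 4, "Reviewer edits", bases=[3])
```

Passing the API object in as an argument keeps the helper free of connection details and makes it easy to exercise with a stand-in object.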
Code Examples
If the definition of the version is not known, it is important to acquire its bases prior to querying against the metadata. Knowledge of the version, specifically its base branches, is critical to generating the correct access query using the logic presented in Table 1. Examples in Python follow:
import tator

# Placeholder host and token; substitute the values for your deployment.
api = tator.get_api(host="https://cloud.tator.io", token="YOUR_API_TOKEN")

# Task: Given version 3, which derives off of an unknown version,
# fetch all the localizations of type 7 from the media id of 20 from project 4
version_spec = api.get_version(3)
required_versions = [3, *version_spec.bases]
# This matches row 1 of Table 1 above. Note that version takes a flat list of ids.
localizations = api.get_localization_list(4, media_id=[20], type=7, version=required_versions, merge=1)
# A different operation yields what is different about version 3 from its base.
# This matches row 3 in Table 1.
difference = api.get_localization_list(4, media_id=[20], type=7, version=[3], show_deleted=1)
# To perform server-side filtering of deleted objects, use row 4 of Table 1 instead.
live_only = api.get_localization_list(4, media_id=[20], type=7, version=[3], show_deleted=0)
# A further excursion shows the utility of row 2: fetching all the variants of a given object.
# Take the elemental_id of an object of interest (assumes difference is not empty).
example_elemental_id = difference[0].elemental_id
# All variants of the object across versions, including deletes, are acquired.
variants = api.get_localization_list(4, media_id=[20], type=7, elemental_id=example_elemental_id, merge=0, show_deleted=1)
# Show all variants of the object across all versions, including deletes, and all marks (everything view)
all_marks = api.get_localization_list(4, media_id=[20], type=7, elemental_id=example_elemental_id, merge=0, show_deleted=1, show_all_marks=1)
Code Block 3: Example of get_version usage prior to get_localization_list in tator-py.
Using versions like layers of a map
Another way to use versions is like the layers of a map. In this case, the versions don't relate to one another in any way. This method can be used to turn things on or off in an analysis: selecting or de-selecting versions changes the result set by including or excluding their objects. Because the versions are unrelated, the merge parameter is a no-op.
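The layer idea can be sketched as a toggle map that decides which versions are passed to a fetch. The fetch_layers helper and the version ids in the usage comment are hypothetical; the only tator-py call used is get_localization_list with the media_id and version arguments shown earlier in this article.

```python
def fetch_layers(api, project: int, media_id: int, enabled_layers: dict) -> list:
    """Fetch localizations from every enabled layer.

    `enabled_layers` maps a version id to True (layer on) or False (layer off).
    """
    versions = [version_id for version_id, enabled in enabled_layers.items() if enabled]
    if not versions:
        # With every layer off there is nothing to request.
        return []
    return api.get_localization_list(project, media_id=[media_id], version=versions)


# Usage (requires a reachable Tator deployment, so it is left commented out):
# import tator
# api = tator.get_api(host="https://cloud.tator.io", token="YOUR_API_TOKEN")
# boxes = fetch_layers(api, 4, 20, {11: True, 12: False, 13: True})
```

Returning early when no layer is enabled avoids issuing a request without a version filter, which would not reflect the user's selection.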
Association across deployments with elemental_id
The elemental_id field of metadata is modifiable by the user. With this utility comes the responsibility not to inadvertently disassociate variants from one another. If one is managing an offline Tator deployment that periodically syncs with a cloud deployment of Tator, one could set the elemental_id upon import. Because the elemental_id would be consistent across deployments, analysis could be performed across multiple Tator deployments.
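A sync step that preserves the association might look like the sketch below. It only prepares creation specs as plain dictionaries and does not call Tator. The function name, the record layout, and the list of copied fields are assumptions for illustration; the destination ids are passed in because media, type, and version ids generally differ between deployments.

```python
import uuid

# Fields copied verbatim from the exported record (assumed layout for box localizations).
COPIED_FIELDS = ("frame", "x", "y", "width", "height", "attributes")


def prepare_for_import(exported: list, media_id: int, type_id: int, version_id: int) -> list:
    """Build creation specs for the destination deployment, keeping each elemental_id."""
    specs = []
    for record in exported:
        spec = {key: record[key] for key in COPIED_FIELDS if key in record}
        # Database ids are local to a deployment, so the destination ids are substituted.
        spec.update(media_id=media_id, type=type_id, version=version_id)
        # Reuse the source elemental_id; only mint a new one if the record lacks it.
        spec["elemental_id"] = record.get("elemental_id") or str(uuid.uuid4())
        specs.append(spec)
    return specs


exported = [
    {"id": 99, "elemental_id": "abc", "frame": 5, "x": 0.1, "y": 0.2, "width": 0.3, "height": 0.4},
    {"id": 100, "frame": 6},
]
specs = prepare_for_import(exported, media_id=20, type_id=7, version_id=3)
```

The source database id is deliberately dropped, so the elemental_id is the only key that ties the imported variant back to its counterpart on the other deployment.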