Data

Introduction

All data persisted through topics is stored in the Quix Catalogue as streams.

The Quix Catalogue

The catalogue is a unified data store consisting of time-series and document database technologies merged to work in perfect harmony. Each workspace contains its own instance of each database technology, tightly integrated with our own technologies to provide a simple yet powerful solution.

We have completely abstracted away the complexity of building, scaling and working with different database technologies, so you no longer have to think about buckets, blob stores and data locations. Instead, you simply decide whether you want the data stored and define the location; we do the rest.

Streams

Streams are the central concept of the catalogue. They unify time-series data and metadata into a single object that groups all relevant information into one discrete session.

Streams make it very easy to manage, discover and work with your data; they are key to architecting excellent data governance in your organisation.

Features

The catalogue is incredibly powerful, fast and flexible. Its features help organisations build rigour into their data practices.

Indexing

We have implemented a fast and efficient indexing architecture that spans both the time-series and document database technologies. Our indexing delivers rapid data retrieval so that you can:

  • Quickly navigate the catalogue to find relevant datasets.
  • Perform big data and machine learning tasks.
  • Build responsive applications using our Query API.

Again, we have taken care of the tech so you can focus on your application.

Rigorous Data Management

The catalogue and streams provide a powerful way to manage your data.

Each row in the catalogue is one stream, identified by its name and ordered by the date created column (newest on top) by default.

Each column in the catalogue is a key item of metadata which can be used to order and search streams.

Each stream is grouped and ordered in the catalogue using its metadata. Use our SDKs to define data grouping and a data hierarchy that suits your needs.

Data Grouping

Data is grouped in the catalogue by stream, location, topic and metadata.

Grouping by stream: A stream is used to group time-series data (parameters) together with the events and metadata which give those parameters and streams context. Each stream is one row in the catalogue with the most recent stream on top by default.

Grouping by location: The catalogue includes a navigation pane in which you can quickly find streams by navigating through the data hierarchy to their location.

Grouping by topic: Streams are automatically grouped by topic as another means to quickly find what you are looking for. Searching for streams by topic will return only those streams that have some data parameters contained in that topic.

Grouping by metadata: Streams are also grouped in the catalogue by some of their metadata, including: stream name, stream start and end, stream status, stream topic and stream creation date/time. Each item of metadata is one column in the catalogue which can be ordered in the UX.
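Stream name and metadata are set from your producer code via the SDK. The sketch below is a minimal example assuming the Quix Streams Python client (quixstreaming); the token, topic and metadata values are hypothetical placeholders and exact names may differ between SDK versions.

    from quixstreaming import QuixStreamingClient

    client = QuixStreamingClient("your-sdk-token")              # hypothetical token placeholder
    output_topic = client.open_output_topic("car-telemetry")    # hypothetical topic name

    stream = output_topic.create_stream()
    stream.properties.name = "Silverstone practice session"     # shown in the stream name column
    stream.properties.metadata["Driver"] = "Driver 44"          # custom metadata travels with the stream
    stream.properties.metadata["Chassis"] = "C-07"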

Data Hierarchy

The data hierarchy allows you to define the location of your streams within the catalogue. For some applications the location may be tied to the physical location of the data source; for others it could be product-related. For example:

  • Racing teams may want to organise their data into a hierarchy based on the location of their races such as Race Series > Season > Country > Circuit.
  • App developers may want to organise their data based on the type of device used, such as: App Name > Version > Platform; or the physical location of their users, such as: App Name > Region > Country > Town.
  • A MedTech company might want to organise IoT data from wearables by patient, pathology or anatomy.

The data hierarchy is an extremely flexible feature which can be tailored to your needs using Location Properties in our SDK.
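As a hedged sketch of defining that hierarchy (again assuming the quixstreaming Python client; the token, topic name and location path are hypothetical examples):

    from quixstreaming import QuixStreamingClient

    client = QuixStreamingClient("your-sdk-token")              # hypothetical token placeholder
    output_topic = client.open_output_topic("car-telemetry")    # hypothetical topic name

    stream = output_topic.create_stream()
    # The location places the stream in the catalogue's navigation tree,
    # e.g. Race Series > Season > Country > Circuit.
    stream.properties.location = "/RaceSeries/2021/UK/Silverstone"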

Data Governance

Streams are key to good data governance. Use them to organise your data in the catalogue by:

  • Creating a data hierarchy to group incoming data by session, location or other feature.
  • Logging separate or continuous sessions depending upon the use case. 

Flexibility

A stream is very flexible:

  • It can be never-ending, such as a stream of data from a power station, or
  • It can begin and end with the start and finish of a session, such as a football match, or
  • It can be a continuous stream of batches, such as daily stock market prices concatenated at the daily market open/close.

The catalogue is also very flexible. You define how to set up the management of your data according to the needs of your organisation, project or product by customising the data hierarchy and metadata using our SDK.

Data Discovery

It is very easy for any user to find data in the catalogue using our UX, either by navigating by location or topic, or by searching and filtering by column.

All metadata for a stream is appended to it in the streams table and can be quickly accessed by clicking on the ‘open’ arrows.

Finally, any data in the catalogue can be quickly visualised in Quix or external applications such as PowerBI or Grafana to further improve discovery.

Working with data

SDKs

Most of the work involved in setting up your catalogue is done programmatically using our SDK. See our samples for a range of example use cases.
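For example, a minimal producer that creates a stream and writes one time-series value might look like the sketch below (assuming the quixstreaming Python client and hypothetical token, topic, stream and parameter names; method names may vary between SDK versions):

    from datetime import datetime
    from quixstreaming import QuixStreamingClient

    client = QuixStreamingClient("your-sdk-token")              # hypothetical token placeholder
    output_topic = client.open_output_topic("sensor-data")      # hypothetical persisted topic

    stream = output_topic.create_stream()
    stream.properties.name = "Machine 12 - shift A"             # hypothetical stream name

    # Write one time-series value; if the topic is persisted, this lands in the catalogue.
    stream.parameters.buffer \
        .add_timestamp(datetime.utcnow()) \
        .add_value("temperature", 71.3) \
        .write()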

Stream Status

Each stream is automatically tagged as Open when first received and will remain so until a stream end is received.

Open: The stream is live and data is being persisted.

Closed/Aborted/Terminated: The stream is now historic and no additional data is expected. It is up to the sender to determine which stream end type should be used; Quix currently does not make a distinction between them. An example use case for 'Terminated' could be the sender shutting down before sending all stream data.

Interrupted: This status is only available when a stream is being persisted. When an Open stream has been inactive for over 10 minutes, it is set to Interrupted. While this state is active, no new data is being persisted for the stream. Once new data is read for the stream, it moves back to the Open state. This state is also used when there is an interruption in the persisting service.
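The sender sets the final status when it closes the stream. A hedged sketch, continuing the producer example in the SDKs section above:

    from quixstreaming import StreamEndType

    # Mark the session as finished. Aborted or Terminated can be sent instead
    # to signal an abnormal end; Quix currently treats all three the same.
    stream.close(StreamEndType.Closed)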

A billable resource

Data storage is a billable resource. We charge per gigabyte stored (persisted) in the catalogue.

There are many use cases where you may want to reduce the amount of data persisted, such as when pre-processing or downsampling data. In such circumstances we strongly suggest creating separate topics for the raw and processed data so you can manage their persistence independently.
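As an illustration of that pattern, the sketch below writes full-rate data to one topic and a downsampled copy to another, so persistence can be enabled only on the smaller processed topic (assuming the quixstreaming Python client; the token, topic names and the read_sensor() data source are hypothetical placeholders):

    from datetime import datetime
    from quixstreaming import QuixStreamingClient

    client = QuixStreamingClient("your-sdk-token")                       # hypothetical token placeholder
    raw_topic = client.open_output_topic("engine-raw")                   # persistence left off
    processed_topic = client.open_output_topic("engine-downsampled")     # persistence switched on

    raw_stream = raw_topic.create_stream()
    processed_stream = processed_topic.create_stream()

    for i, rpm in enumerate(read_sensor()):    # read_sensor() is a placeholder for your data source
        ts = datetime.utcnow()
        raw_stream.parameters.buffer.add_timestamp(ts).add_value("rpm", rpm).write()
        if i % 10 == 0:                        # keep 1 sample in 10 for the persisted topic
            processed_stream.parameters.buffer.add_timestamp(ts).add_value("rpm", rpm).write()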

How-to

Navigate by location

Click on Locations in the Data sub-menu; use the left-hand context pane to click down through the data tree to find the streams you are looking for.

Filter by topic

Click on Topics in the Data sub-menu; use the left-hand context pane to select the topics you want to filter by.

Order data in the catalogue

Click any of the column headings to order streams by that column.

Search for a stream

Use the search field at the top right-hand side of the Data page. Search currently filters streams by name only.

View metadata

Click the double arrows in the second column of the data table to view the metadata for each stream.

Select streams

Click one or more check boxes in the first column of the data table.

Delete streams

Once you have selected a stream, click the delete button. Note that this action is permanent.

Visualise data

Once you have selected a stream, click the visualise button. You will be taken to the Visualise page where you will have to select parameters from that stream to see data.