Intro to InfluxDB
InfluxDB isn't just another database in the vast sea of data management systems. It stands out as a specialized time series database, designed for speed and efficiency in handling time-stamped data. Time series data is everywhere, and two of its largest sources are the physical and virtual worlds. Anytime our environment is being monitored, we're capturing time series data from the physical world. This includes all IoT and sensor data (like temperature, pressure, concentration, flow rate, light, etc.). In the virtual world, we see time series data in DevOps monitoring, server performance monitoring, network monitoring, and much more.
InfluxDB is engineered precisely for storing real-time time series data. In this blog post, we'll explore InfluxDB - from its core features to why it's becoming a go-to choice for handling time series data - and look at a variety of projects that use it, so you can get a better sense of what you can do with InfluxDB. Whether you're a seasoned pro or just starting out, there's something here for everyone.
Let's dive in and unlock the potential of InfluxDB together!
Introduction to InfluxDB 3.0
InfluxDB is more than just an open source time series database. There are also client libraries that enable developers to easily integrate InfluxDB into their applications, and Telegraf, an open source agent for collecting application metrics and events. InfluxDB 3.0 also offers interoperability with a lot of data analytics and visualization tools, so you can execute your analytics workloads with the tools you're already familiar with.
The new InfluxDB engine is built with Rust, Apache Arrow (and Arrow Flight), DataFusion, and Parquet. This technology stack makes InfluxDB 3.0 an ideal choice for storing time series data because it offers efficient resource management, high performance, and broad interoperability. Let's take a second to understand how each piece contributes to InfluxDB:
- Rust is a highly performant programming language that offers fine-grained memory management. This gives operators more control over memory usage (in certain versions of InfluxDB 3.0).
- Apache Arrow is a framework for defining in-memory columnar data. The columnar representation enables very fast compression, which helps InfluxDB users write over 4 million values per second.
- Parquet is a column-oriented, durable file format. Parquet files are 16 times cheaper to store than CSV files.
- Arrow Flight is a “new general-purpose client-server framework to simplify high performance transport of large datasets over network interfaces.” Arrow and Arrow Flight enable interoperability with other tools that leverage those technologies, including Pandas, Kafka, Snowflake, Spark, ClickHouse, and more.
- DataFusion is an “extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.” DataFusion lets InfluxDB users query with SQL (and eventually maybe Python), so you don’t have to worry about learning a new query language.
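As a concrete sketch of that last point, here is roughly what querying with SQL looks like through the influxdb3-python client. The host, token, database name, and the airSensors schema below are placeholders and assumptions for illustration, not values from a real deployment:

```python
# A hypothetical SQL query that DataFusion executes server-side.
QUERY = """
SELECT sensor_id, avg(temperature) AS avg_temp
FROM airSensors
WHERE time >= now() - interval '1 hour'
GROUP BY sensor_id
"""


def run_query(host: str, token: str, database: str):
    """Execute QUERY against InfluxDB 3.0 and return a PyArrow table."""
    # Requires `pip install influxdb3-python`.
    from influxdb_client_3 import InfluxDBClient3

    client = InfluxDBClient3(host=host, token=token, database=database)
    # Results come back over Arrow Flight as a PyArrow table.
    return client.query(query=QUERY, language="sql")


# Usage (placeholder credentials):
# table = run_query("us-east-1-1.aws.cloud2.influxdata.com", "MY_TOKEN", "airSensors")
# print(table.to_pandas())
```

Because the result is a PyArrow table, handing it off to Pandas or other Arrow-aware tools is a one-liner.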
Get started using InfluxDB 3.0
The easiest way to get started using InfluxDB 3.0 is by signing up for a free InfluxDB Cloud Serverless trial. Then I recommend loading some sample data with any of these approaches:
- Install Telegraf. Then navigate to the InfluxDB UI and follow this short video to configure it through the InfluxDB UI. I recommend configuring a CPU input plugin so you get some system stats.
- Write the Air Sensors Sample dataset through the UI.
- Navigate to the Buckets Page.
- Create a new bucket named “airSensors”.
- Click the +Add Data button and select the Line Protocol option.
- Navigate to this page, paste the line protocol data into the Enter Manually box, and write it to your bucket.
- Write data with the Python Client Library. Follow this tutorial, or simply install the client with pip install influxdb3-python and use it to write a few points.
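A minimal write sketch for the client-library approach above, assuming `pip install influxdb3-python` and a bucket named “airSensors”. The host, token, and sensor values are placeholders, and the line protocol helper is a simplified illustration (no escaping, float fields only):

```python
def to_line_protocol(measurement, tags, fields, ts_ns):
    """Render one line of InfluxDB line protocol (simplified: no escaping,
    float fields only)."""
    tag_str = ",".join(f"{k}={v}" for k, v in tags.items())
    field_str = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement},{tag_str} {field_str} {ts_ns}"


def write_point(host, token, database, line):
    """Write a line protocol record to InfluxDB 3.0."""
    # Requires `pip install influxdb3-python`.
    from influxdb_client_3 import InfluxDBClient3

    client = InfluxDBClient3(host=host, token=token, database=database)
    client.write(record=line)


# Hypothetical air sensor reading: measurement, tags, fields, nanosecond timestamp.
line = to_line_protocol(
    "airSensors",
    {"sensor_id": "TLM0100"},
    {"temperature": 71.8, "humidity": 35.1},
    1700000000000000000,
)

# Usage (placeholder credentials):
# write_point("us-east-1-1.aws.cloud2.influxdata.com", "MY_TOKEN", "airSensors", line)
```

The helper makes the shape of line protocol explicit: measurement, comma-separated tags, a space, comma-separated fields, a space, and a nanosecond timestamp.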
Understanding InfluxDB through example projects
Perhaps the easiest way to understand something is through examples, so here is a collection of example projects that showcase some of the capabilities of InfluxDB 3.0 (all of these projects are containerized, so they're a really easy way to get started):
- OpenTelemetry Demo: This repository contains the OpenTelemetry Astronomy Shop, a microservice-based distributed system intended to illustrate the implementation of OpenTelemetry in a near real-world environment.
- Quix Machine Anomaly Detection: This project provides an example of how to use Quix and InfluxDB 3.0 to build a machine anomaly detection data pipeline.
- Mage Anomaly Detection: This example tutorial shows you how to build an anomaly detection pipeline for machine data with Mage.ai, the open source alternative to Airflow, and InfluxDB. Generate machine data and anomalies in real time and send alerts to Slack.
- Downsampler: A project that makes it easy to downsample data or create materialized views with InfluxDB 3.0. It runs downsampling tasks at resolutions as fine as 1 minute, but can handle hour and day intervals as well. It's fully containerized, so you can easily integrate it with something like Fargate.
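As a rough sketch of the kind of query such a task runs, a 1-minute downsample in InfluxDB 3.0's SQL might look like the following. The airSensors table and column names are assumptions for illustration, not the Downsampler project's actual query:

```python
# Hypothetical 1-minute downsampling query; date_bin is a DataFusion
# function that buckets timestamps into fixed intervals.
DOWNSAMPLE_SQL = """
SELECT
  date_bin(interval '1 minute', time) AS _time,
  sensor_id,
  avg(temperature) AS avg_temp
FROM airSensors
GROUP BY _time, sensor_id
ORDER BY _time
"""
```

Running a query like this on a schedule and writing the results to a second, lower-resolution bucket is what gives you a simple materialized view.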
- MQTT simulator: This project allows you to spin up different MQTT simulators, producing fake IoT data for different scenarios.
Final thoughts
I hope this post helps familiarize you with InfluxDB. Get started with InfluxDB Cloud 3.0 here by signing up for a free InfluxDB Cloud Serverless trial. I also recommend the following resources:
- Forums
- Slack
- InfluxCommunity GitHub Org: This org contains a ton of examples of how to use InfluxDB with other tools and for different use cases.
- Docs
- Blog