SlideShare a Scribd company logo
1 of 32
Paul Dix
@pauldix
paul@influxdata.com
InfluxDB IOx Project
Update
© 2020 InfluxData. All rights reserved.2
Foundational Work
• Backward compatibility & integration (Flux & InfluxQL)
• In-memory columnar store (Segment Store)
• Data lifecycle (WAL, Parquet)
• Apache Arrow Flight (RPC)
© 2020 InfluxData. All rights reserved.3
Integration
InfluxDB 2.x Flux InfluxQL
InfluxDB IOx
GRPC
© 2020 InfluxData. All rights reserved.4
How data is organized
mydata
2020-12-07 2020-12-08 2020-12-09 Partitions
1 2 3211 2 Chunks
Database
© 2020 InfluxData. All rights reserved.5
Data lifecycle
WAL Buffer Write Buffer
WAL Segment
(object store)
Chunks
Segment Store
(in memory)
Parquet Files
(object store)
Ingest
Writable
Immutable
© 2020 InfluxData. All rights reserved.6
When can you use
it?
Thank You
© 2020 InfluxData. All rights reserved.8
A Rusty Introduction to Apache
Arrow and how it Applies to a
Time Series Database
December 9, 2020
Andrew Lamb
InfluxData
© 2020 InfluxData. All rights reserved.9
IOx Team at InfluxData
Query Optimizer / Architect @ Vertica
(Columnar Database),
Chief Architect @ DataRobot (Machine
Learning Platform )
Chief Architect @ Nutonian (Machine
Learning Apps
XLST JIT Compiler Team at
DataPower
© 2020 InfluxData. All rights reserved.10
Goals + Outline
Goal: ⇒ Arrow is a good basis for a new (time series) Databases ❤️
● Opinions and Perspectives of Databases
● Background on Arrow
● Arrow Examples, in Rust
© 2020 InfluxData. All rights reserved.11
Databases -- Trend Towards Specialization
Relational
Key-Value
Timeseries
Graph
Array / Scientific
Document
Stream
Michael Stonebraker and Ugur Cetintemel. 2005. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the 21st
International Conference on Data Engineering (ICDE '05). IEEE Computer Society, USA, 2–11. DOI:https://doi.org/10.1109/ICDE.2005.1
Data Model Deployment
Embedded /
Edge
Cloud
Single-Node
Hybrid
Ecosystem
Hadoop
Java
Json /
Javascript
AWS
GCP
Azure
Apple Cloud
Use Case
Transactions
Analytics
Streaming
...
© 2020 InfluxData. All rights reserved.12
… and our new database is …
🎉
InfluxDB IOx - The Future Core of InfluxDB Built
with Rust and Arrow
© 2020 InfluxData. All rights reserved.13
Analytic Systems (vs Transactional)
● Transactional (OLTP, Key-value stores, etc)
○ Workload is “lookup a record by id”, “update a record”, “keep data
durable and consistent”
○ Examples: Oracle, Postgres, Cassandra, DynamoDB, MongoDB, etc etc
● Analytic (OLAP, “Big Data”, etc)
○ Workload: aggregate many rows to get historical view, bulk loads, rarely
updated
○ Examples: ClickHouse, MapReduce, Spark, Vertica, Pig, Hive, InfluxDB,
etc etc
© 2020 InfluxData. All rights reserved.14
So, you want to build a new database… ?
Databases need many features just to look like a database:
● Get Data In and Out
● Store Data and Catalog / Metadata
● Query Store: + Query Language
● Connect: Client API
…
Before you can invest in what makes your database special
© 2020 InfluxData. All rights reserved.15
Implementation timeline for a new Database system
Client
API
In memory
storage
In-Memory
filter + aggregation
Durability /
persistence
Metadata Catalog +
Management
Query
Language
Parser
Optimized /
Compressed
storage
Execution on
Compressed
Data
Joins!
Additional Client
Languages
Outer
Joins
Subquery
support
More advanced
analytics
Cost
based
optimizer
Out of core
algorithms
Storage
Rearrangement
Heuristic
Query
Planner
Arithmetic
expressions
Date / time
Expressions
Concurrency
Control
Data Model /
Type System
Distributed query
execution
Resource
Management
“Lets Build
a
Database”
🤔
“Ok now
this is pretty
good”
😐
“Look mom!
I have a
database!”
😃
Online
recovery
© 2020 InfluxData. All rights reserved.16
Arrow Project Goals
“Build a better open source
foundation for data science”
How is this related to
databases?
https://arrow.apache.org/
© 2020 InfluxData. All rights reserved.17
Arrow == toolkit for a modern analytic databases
match tool_needed {
File Format (persistence) => Parquet
Columnar memory representation => Arrow Arrays
Operations (e.g. add, multiply) => Compute Kernels
Network transfer => Arrow Flight IPC
_ => ... to be continued ...
}
© 2020 InfluxData. All rights reserved.18
InfluxDB line protocol
weather,location=us-east temperature=82,humidity=67 1465839830100400200
weather,location=us-midwest temperature=82,humidity=65 1465839830100400200
weather,location=us-west temperature=70,humidity=54 1465839830100400200
weather,location=us-east temperature=83,humidity=69 1465839830200400200
weather,location=us-midwest temperature=87,humidity=78 1465839830200400200
weather,location=us-west temperature=72,humidity=56 1465839830200400200
weather,location=us-east temperature=84,humidity=67 1465839830300400200
weather,location=us-midwest temperature=90,humidity=82 1465839830400400200
weather,location=us-west temperature=71,humidity=57 1465839830400400200
Line Protocol Tutorial (link)
Measurements
Tags Fields
Timestamp
© 2020 InfluxData. All rights reserved.19
IOx Data Model
weather,location=us-east temperature=82,humidity=67 1465839830100400200
weather,location=us-midwest temperature=82,humidity=65 1465839830100400200
weather,location=us-west temperature=70,humidity=54 1465839830100400200
weather,location=us-east temperature=83,humidity=69 1465839830200400200
weather,location=us-midwest temperature=87,humidity=78 1465839830200400200
weather,location=us-west temperature=72,humidity=56 1465839830200400200
weather,location=us-east temperature=84,humidity=67 1465839830300400200
weather,location=us-midwest temperature=90,humidity=82 1465839830400400200
weather,location=us-west temperature=71,humidity=57 1465839830400400200
location
"us-east"
"us-midwest"
"us-west"
"us-east"
"us-midwest"
"us-west"
"us-east"
"us-midwest"
"us-west"
temperature
82
82
70
83
87
72
84
90
71
humidity
67
65
54
69
78
56
67
82
57
timestamp
2016-06-13T17:43:50.1004002Z
2016-06-13T17:43:50.1004002Z
2016-06-13T17:43:50.1004002Z
2016-06-13T17:43:50.2004002Z
2016-06-13T17:43:50.2004002Z
2016-06-13T17:43:50.2004002Z
2016-06-13T17:43:50.3004002Z
2016-06-13T17:43:50.3004002Z
2016-06-13T17:43:50.3004002Z
Table: weather
© 2020 InfluxData. All rights reserved.20
Code Examples
Thesis: “When writing an analytic database, you will end up
implementing the Arrow feature set”
(Ecosystem integration is another major benefit of Arrow, subject of a future talk)
+
* Take performance comparisons with a large grain of salt
Compare Plain Rust and Rust using the Arrow library
© 2020 InfluxData. All rights reserved.21
Motivating Example
“Find the rows that are not in `us-west`”
© 2020 InfluxData. All rights reserved.22
Create the Array
let string_vec: Vec<String> =
(0..NUM_TAGS)
.map(|i| {
match i % 3 {
0 => "us-east",
1 => "us-midwest",
2 => "us-west",
}.into()
})
.collect();
let mut builder =
StringBuilder::new(NUM_TAGS);
(0..NUM_TAGS).enumerate()
.for_each(|(i, _)| {
let location = match i % 3 {
0 => "us-east",
1 => "us-midwest",
2 => "us-west",
};
builder.append_value(location)
.unwrap()
});
let array = builder.finish();
> created array with 10000000 elements
~600ms
> created array with 10000000 elements
~400ms
+
© 2020 InfluxData. All rights reserved.23
Memory Footprint
let size =
size_of::<Vec<String>>() +
string_vec
.iter()
.fold(0, |sz, s| {
sz + size_of::<String>() + s.len()
});
println!("total size: {} bytes", size);
println!("total size: {} bytes",
array.get_array_memory_size());
> total size: 320000023 bytes
~320 MB *
> total size: 149206128 bytes
~150 MB
+
© 2020 InfluxData. All rights reserved.24
Find Rows != “us-west”
let not_west_bitset: Vec<bool> =
string_vec
.iter()
.map(|s| s != "us-west")
.collect();
let num_not_west = not_west_bitset
.iter()
.filter(|&&v| v)
.count();
let not_west_bitset =
neq_utf8_scalar(
&array,
"us-west"
).unwrap();
let num_not_west = not_west_bitset
.iter()
.filter(|v| matches!(v, Some(true)))
.count();
> Found 6666667 not in west
~50ms
> Found 6666667 not in west
~120ms
+
© 2020 InfluxData. All rights reserved.25
Find Rows != “us-west” (with null handling)
let string_vec: Vec<Option<String>> = ...;
let not_west_bitset: Vec<bool> =
string_vec
.iter()
.map(|s| {
s.as_ref()
.map(|s| s != "us-west")
.unwrap_or(false)
})
.collect();
let num_not_west = not_west_bitset
.iter()
.filter(|&&v| v)
.count();
+
Same as previous
> Found 6666667 not in west
~50ms
© 2020 InfluxData. All rights reserved.26
Materialize rows for future processing
let not_west: Vec<String> = not_west_bitset
.iter()
.enumerate()
.filter_map(|(i, &v)| {
if v {
Some(string_vec[i].clone())
} else {
None
}
})
.collect();
let not_west = filter(
&array,
&not_west_bitset
).unwrap();
> Made array of 6666667 Strings not in west
~450 ms
> Made array of 6666667 Strings not in west
~50 ms
+
© 2020 InfluxData. All rights reserved.27
More efficient encoding (dictionary)
let vb = StringBuilder::new();
let kb = Int8Builder::new();
let mut builder =
StringDictionaryBuilder::new(vb,kb);
(0..NUM_TAGS)
.enumerate()
.for_each(|(i, _)| {
let location = match i % 3 {
0 => "us-east",
1 => "us-midwest",
2 => "us-west",
};
builder.append(location).unwrap();
});
let array = builder.finish();
> total size: 10000688 bytes
10MB
250 ms
+
dictionary
"us-east"
"us-midwest"
"us-west"
Location
0
1
2
0
1
2
0
1
2
[0]
[1]
[2]
[u8]
© 2020 InfluxData. All rights reserved.28
SIMD Anyone?
let output = gt(
&left,
&right
).unwrap();
+10
20
17
5
23
5
9
12
4
5
76
2
3
5
2
33
2
1
6
7
8
2
7
2
5
6
7
8
left right output
1
0
1
1
1
0
1
1
0
1
1
0
0
0
>
>
>
>
© 2020 InfluxData. All rights reserved.29
SIMD Implementation
#[cfg(all(any(target_arch = "x86", target_arch = "x86_64"),
feature = "simd"))]
fn simd_compare_op<T, F>(left: &PrimitiveArray<T>,
right: &PrimitiveArray<T>, op: F) -> Result<BooleanArray>
where
T: ArrowNumericType,
F: Fn(T::Simd, T::Simd) -> T::SimdMask,
{
// use / error checking elided
let null_bit_buffer = combine_option_bitmap(
left.data_ref(), right.data_ref(), len
)?;
let lanes = T::lanes();
let mut result = MutableBuffer::new(
left.len() * mem::size_of::<bool>()
);
let rem = len % lanes;
for i in (0..len - rem).step_by(lanes) {
let simd_left = T::load(left.value_slice(i, lanes));
let simd_right = T::load(right.value_slice(i, lanes));
let simd_result = op(simd_left, simd_right);
T::bitmask(&simd_result, |b| {
result.write(b).unwrap();
});
}
Source: arrow/src/compute/kernels/comparison.rs
if rem > 0 {
let simd_left = T::load(left.value_slice(len - rem, lanes));
let simd_right = T::load(right.value_slice(len - rem, lanes));
let simd_result = op(simd_left, simd_right);
let rem_buffer_size = (rem as f32 / 8f32).ceil() as usize;
T::bitmask(&simd_result, |b| {
result.write(&b[0..rem_buffer_size]).unwrap();
});
}
let data = ArrayData::new(
DataType::Boolean,
left.len(),
None,
null_bit_buffer,
0,
vec![result.freeze()],
vec![],
);
Ok(PrimitiveArray::<BooleanType>::from(Arc::new(data)))
}
© 2020 InfluxData. All rights reserved.30
Other things needed in a database
Vec<Option<String>> to support nulls
Handle other data types with same code
Vectorized implementations of filter, aggregate, etc
Persist it to storage
Send data over the network
Ecosystem compatibility
...
© 2020 InfluxData. All rights reserved.31
Rust / Arrow Community: Good and Getting better
Major Roadmap Items (see also Apache Arrow (Rust) 2.0.0)
1. Support Stable Rust
2. Improved DictionaryArray support and performance
3. Improved compute kernel performance
4. SQL: Joins
5. Parallel CPU-bound operations; Additional platform support (e.g.
ARMv8)InfluxData specifically is investing in:
1. Flight IPC
2. Improved Dictionary and Date/Time support
3. Data Fusion (some other tech talk)
© 2020 InfluxData. All rights reserved.32
Thank You
Find us online
Github: https://github.com/influxdata/influxdb_iox
Slack: https://influxdata.com/slack
It is early days; there are many cool things left to implement
And we are hiring (Senior IOx Engineer Job Posting)

More Related Content

What's hot

Observability of InfluxDB IOx: Tracing, Metrics and System Tables
Observability of InfluxDB IOx: Tracing, Metrics and System TablesObservability of InfluxDB IOx: Tracing, Metrics and System Tables
Observability of InfluxDB IOx: Tracing, Metrics and System TablesInfluxData
 
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...InfluxData
 
InfluxDB 101 - Concepts and Architecture | Michael DeSa | InfluxData
InfluxDB 101 - Concepts and Architecture | Michael DeSa | InfluxDataInfluxDB 101 - Concepts and Architecture | Michael DeSa | InfluxData
InfluxDB 101 - Concepts and Architecture | Michael DeSa | InfluxDataInfluxData
 
IoT Event Processing and Analytics with InfluxDB in Google Cloud | Christoph ...
IoT Event Processing and Analytics with InfluxDB in Google Cloud | Christoph ...IoT Event Processing and Analytics with InfluxDB in Google Cloud | Christoph ...
IoT Event Processing and Analytics with InfluxDB in Google Cloud | Christoph ...InfluxData
 
InfluxData Architecture for IoT | Noah Crowley | InfluxData
InfluxData Architecture for IoT | Noah Crowley | InfluxDataInfluxData Architecture for IoT | Noah Crowley | InfluxData
InfluxData Architecture for IoT | Noah Crowley | InfluxDataInfluxData
 
Setting up InfluxData for IoT
Setting up InfluxData for IoTSetting up InfluxData for IoT
Setting up InfluxData for IoTInfluxData
 
How to Store and Visualize CAN Bus Telematic Data with InfluxDB Cloud and Gra...
How to Store and Visualize CAN Bus Telematic Data with InfluxDB Cloud and Gra...How to Store and Visualize CAN Bus Telematic Data with InfluxDB Cloud and Gra...
How to Store and Visualize CAN Bus Telematic Data with InfluxDB Cloud and Gra...InfluxData
 
InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...
InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...
InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...InfluxData
 
Introduction to Flux and Functional Data Scripting
Introduction to Flux and Functional Data ScriptingIntroduction to Flux and Functional Data Scripting
Introduction to Flux and Functional Data ScriptingInfluxData
 
Kapacitor Manager
Kapacitor ManagerKapacitor Manager
Kapacitor ManagerInfluxData
 
Lessons Learned Running InfluxDB Cloud and Other Cloud Services at Scale by T...
Lessons Learned Running InfluxDB Cloud and Other Cloud Services at Scale by T...Lessons Learned Running InfluxDB Cloud and Other Cloud Services at Scale by T...
Lessons Learned Running InfluxDB Cloud and Other Cloud Services at Scale by T...InfluxData
 
Introduction to Flux and Functional Data Scripting
Introduction to Flux and Functional Data ScriptingIntroduction to Flux and Functional Data Scripting
Introduction to Flux and Functional Data ScriptingInfluxData
 
Virtual training Intro to Kapacitor
Virtual training  Intro to Kapacitor Virtual training  Intro to Kapacitor
Virtual training Intro to Kapacitor InfluxData
 
How to Build a Telegraf Plugin by Noah Crowley
How to Build a Telegraf Plugin by Noah CrowleyHow to Build a Telegraf Plugin by Noah Crowley
How to Build a Telegraf Plugin by Noah CrowleyInfluxData
 
Alan Pope, Sebastian Spaink [InfluxData] | Data Collection 101 | InfluxDays N...
Alan Pope, Sebastian Spaink [InfluxData] | Data Collection 101 | InfluxDays N...Alan Pope, Sebastian Spaink [InfluxData] | Data Collection 101 | InfluxDays N...
Alan Pope, Sebastian Spaink [InfluxData] | Data Collection 101 | InfluxDays N...InfluxData
 
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...InfluxData
 
Ryan Betts [InfluxData] | InfluxDB Platform Performance | InfluxDays Virtual ...
Ryan Betts [InfluxData] | InfluxDB Platform Performance | InfluxDays Virtual ...Ryan Betts [InfluxData] | InfluxDB Platform Performance | InfluxDays Virtual ...
Ryan Betts [InfluxData] | InfluxDB Platform Performance | InfluxDays Virtual ...InfluxData
 
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...InfluxData
 
Kapacitor Stream Processing
Kapacitor Stream ProcessingKapacitor Stream Processing
Kapacitor Stream ProcessingInfluxData
 

What's hot (20)

Observability of InfluxDB IOx: Tracing, Metrics and System Tables
Observability of InfluxDB IOx: Tracing, Metrics and System TablesObservability of InfluxDB IOx: Tracing, Metrics and System Tables
Observability of InfluxDB IOx: Tracing, Metrics and System Tables
 
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
 
InfluxDB 101 - Concepts and Architecture | Michael DeSa | InfluxData
InfluxDB 101 - Concepts and Architecture | Michael DeSa | InfluxDataInfluxDB 101 - Concepts and Architecture | Michael DeSa | InfluxData
InfluxDB 101 - Concepts and Architecture | Michael DeSa | InfluxData
 
IoT Event Processing and Analytics with InfluxDB in Google Cloud | Christoph ...
IoT Event Processing and Analytics with InfluxDB in Google Cloud | Christoph ...IoT Event Processing and Analytics with InfluxDB in Google Cloud | Christoph ...
IoT Event Processing and Analytics with InfluxDB in Google Cloud | Christoph ...
 
InfluxData Architecture for IoT | Noah Crowley | InfluxData
InfluxData Architecture for IoT | Noah Crowley | InfluxDataInfluxData Architecture for IoT | Noah Crowley | InfluxData
InfluxData Architecture for IoT | Noah Crowley | InfluxData
 
Setting up InfluxData for IoT
Setting up InfluxData for IoTSetting up InfluxData for IoT
Setting up InfluxData for IoT
 
Keep Calm and Distributed Tracing
Keep Calm and Distributed TracingKeep Calm and Distributed Tracing
Keep Calm and Distributed Tracing
 
How to Store and Visualize CAN Bus Telematic Data with InfluxDB Cloud and Gra...
How to Store and Visualize CAN Bus Telematic Data with InfluxDB Cloud and Gra...How to Store and Visualize CAN Bus Telematic Data with InfluxDB Cloud and Gra...
How to Store and Visualize CAN Bus Telematic Data with InfluxDB Cloud and Gra...
 
InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...
InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...
InfluxDB 101 – Concepts and Architecture by Michael DeSa, Software Engineer |...
 
Introduction to Flux and Functional Data Scripting
Introduction to Flux and Functional Data ScriptingIntroduction to Flux and Functional Data Scripting
Introduction to Flux and Functional Data Scripting
 
Kapacitor Manager
Kapacitor ManagerKapacitor Manager
Kapacitor Manager
 
Lessons Learned Running InfluxDB Cloud and Other Cloud Services at Scale by T...
Lessons Learned Running InfluxDB Cloud and Other Cloud Services at Scale by T...Lessons Learned Running InfluxDB Cloud and Other Cloud Services at Scale by T...
Lessons Learned Running InfluxDB Cloud and Other Cloud Services at Scale by T...
 
Introduction to Flux and Functional Data Scripting
Introduction to Flux and Functional Data ScriptingIntroduction to Flux and Functional Data Scripting
Introduction to Flux and Functional Data Scripting
 
Virtual training Intro to Kapacitor
Virtual training  Intro to Kapacitor Virtual training  Intro to Kapacitor
Virtual training Intro to Kapacitor
 
How to Build a Telegraf Plugin by Noah Crowley
How to Build a Telegraf Plugin by Noah CrowleyHow to Build a Telegraf Plugin by Noah Crowley
How to Build a Telegraf Plugin by Noah Crowley
 
Alan Pope, Sebastian Spaink [InfluxData] | Data Collection 101 | InfluxDays N...
Alan Pope, Sebastian Spaink [InfluxData] | Data Collection 101 | InfluxDays N...Alan Pope, Sebastian Spaink [InfluxData] | Data Collection 101 | InfluxDays N...
Alan Pope, Sebastian Spaink [InfluxData] | Data Collection 101 | InfluxDays N...
 
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
 
Ryan Betts [InfluxData] | InfluxDB Platform Performance | InfluxDays Virtual ...
Ryan Betts [InfluxData] | InfluxDB Platform Performance | InfluxDays Virtual ...Ryan Betts [InfluxData] | InfluxDB Platform Performance | InfluxDays Virtual ...
Ryan Betts [InfluxData] | InfluxDB Platform Performance | InfluxDays Virtual ...
 
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
 
Kapacitor Stream Processing
Kapacitor Stream ProcessingKapacitor Stream Processing
Kapacitor Stream Processing
 

Similar to InfluxDB IOx Tech Talks: A Rusty Introduction to Apache Arrow and How it Applies to a Time Series Database

Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...
Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...
Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...InfluxData
 
Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...
Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...
Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...InfluxData
 
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...DataStax Academy
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...Andrew Lamb
 
Taming the Tiger: Tips and Tricks for Using Telegraf
Taming the Tiger: Tips and Tricks for Using TelegrafTaming the Tiger: Tips and Tricks for Using Telegraf
Taming the Tiger: Tips and Tricks for Using TelegrafInfluxData
 
Martin Moucka [Red Hat] | How Red Hat Uses gNMI, Telegraf and InfluxDB to Gai...
Martin Moucka [Red Hat] | How Red Hat Uses gNMI, Telegraf and InfluxDB to Gai...Martin Moucka [Red Hat] | How Red Hat Uses gNMI, Telegraf and InfluxDB to Gai...
Martin Moucka [Red Hat] | How Red Hat Uses gNMI, Telegraf and InfluxDB to Gai...InfluxData
 
MySQL Goes to 8! FOSDEM 2020 Database Track, January 2nd, 2020
MySQL Goes to 8!  FOSDEM 2020 Database Track, January 2nd, 2020MySQL Goes to 8!  FOSDEM 2020 Database Track, January 2nd, 2020
MySQL Goes to 8! FOSDEM 2020 Database Track, January 2nd, 2020Geir Høydalsvik
 
Grokking Engineering - Data Analytics Infrastructure at Viki - Huy Nguyen
Grokking Engineering - Data Analytics Infrastructure at Viki - Huy NguyenGrokking Engineering - Data Analytics Infrastructure at Viki - Huy Nguyen
Grokking Engineering - Data Analytics Infrastructure at Viki - Huy NguyenHuy Nguyen
 
Unlocking Your Hadoop Data with Apache Spark and CDH5
Unlocking Your Hadoop Data with Apache Spark and CDH5Unlocking Your Hadoop Data with Apache Spark and CDH5
Unlocking Your Hadoop Data with Apache Spark and CDH5SAP Concur
 
HBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBaseCon
 
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouseApplication Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouseVictoriaMetrics
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Altinity Ltd
 
Next Generation Indexes For Big Data Engineering (ODSC East 2018)
Next Generation Indexes For Big Data Engineering (ODSC East 2018)Next Generation Indexes For Big Data Engineering (ODSC East 2018)
Next Generation Indexes For Big Data Engineering (ODSC East 2018)Daniel Lemire
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyDataStax Academy
 
Apache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault ToleranceApache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault ToleranceSachin Aggarwal
 
XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...
XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...
XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...The Linux Foundation
 
Spark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and FurureSpark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and FurureDataStax Academy
 
Don't reengineer, reimagine: Hive buzzing with Druid's magic potion
Don't reengineer, reimagine: Hive buzzing with Druid's magic potion Don't reengineer, reimagine: Hive buzzing with Druid's magic potion
Don't reengineer, reimagine: Hive buzzing with Druid's magic potion Future of Data Meetup
 

Similar to InfluxDB IOx Tech Talks: A Rusty Introduction to Apache Arrow and How it Applies to a Time Series Database (20)

Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...
Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...
Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...
 
Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...
Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...
Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...
 
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
 
Big Data Seervices in Danaos Use Case
Big Data Seervices in Danaos Use CaseBig Data Seervices in Danaos Use Case
Big Data Seervices in Danaos Use Case
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
 
Taming the Tiger: Tips and Tricks for Using Telegraf
Taming the Tiger: Tips and Tricks for Using TelegrafTaming the Tiger: Tips and Tricks for Using Telegraf
Taming the Tiger: Tips and Tricks for Using Telegraf
 
Spark etl
Spark etlSpark etl
Spark etl
 
Martin Moucka [Red Hat] | How Red Hat Uses gNMI, Telegraf and InfluxDB to Gai...
Martin Moucka [Red Hat] | How Red Hat Uses gNMI, Telegraf and InfluxDB to Gai...Martin Moucka [Red Hat] | How Red Hat Uses gNMI, Telegraf and InfluxDB to Gai...
Martin Moucka [Red Hat] | How Red Hat Uses gNMI, Telegraf and InfluxDB to Gai...
 
MySQL Goes to 8! FOSDEM 2020 Database Track, January 2nd, 2020
MySQL Goes to 8!  FOSDEM 2020 Database Track, January 2nd, 2020MySQL Goes to 8!  FOSDEM 2020 Database Track, January 2nd, 2020
MySQL Goes to 8! FOSDEM 2020 Database Track, January 2nd, 2020
 
Grokking Engineering - Data Analytics Infrastructure at Viki - Huy Nguyen
Grokking Engineering - Data Analytics Infrastructure at Viki - Huy NguyenGrokking Engineering - Data Analytics Infrastructure at Viki - Huy Nguyen
Grokking Engineering - Data Analytics Infrastructure at Viki - Huy Nguyen
 
Unlocking Your Hadoop Data with Apache Spark and CDH5
Unlocking Your Hadoop Data with Apache Spark and CDH5Unlocking Your Hadoop Data with Apache Spark and CDH5
Unlocking Your Hadoop Data with Apache Spark and CDH5
 
HBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDK
 
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouseApplication Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
 
Next Generation Indexes For Big Data Engineering (ODSC East 2018)
Next Generation Indexes For Big Data Engineering (ODSC East 2018)Next Generation Indexes For Big Data Engineering (ODSC East 2018)
Next Generation Indexes For Big Data Engineering (ODSC East 2018)
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al Tobey
 
Apache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault ToleranceApache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault Tolerance
 
XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...
XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...
XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...
 
Spark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and FurureSpark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and Furure
 
Don't reengineer, reimagine: Hive buzzing with Druid's magic potion
Don't reengineer, reimagine: Hive buzzing with Druid's magic potion Don't reengineer, reimagine: Hive buzzing with Druid's magic potion
Don't reengineer, reimagine: Hive buzzing with Druid's magic potion
 

More from InfluxData

Announcing InfluxDB Clustered
Announcing InfluxDB ClusteredAnnouncing InfluxDB Clustered
Announcing InfluxDB ClusteredInfluxData
 
Best Practices for Leveraging the Apache Arrow Ecosystem
Best Practices for Leveraging the Apache Arrow EcosystemBest Practices for Leveraging the Apache Arrow Ecosystem
Best Practices for Leveraging the Apache Arrow EcosystemInfluxData
 
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...InfluxData
 
Power Your Predictive Analytics with InfluxDB
Power Your Predictive Analytics with InfluxDBPower Your Predictive Analytics with InfluxDB
Power Your Predictive Analytics with InfluxDBInfluxData
 
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base InfluxData
 
Build an Edge-to-Cloud Solution with the MING Stack
Build an Edge-to-Cloud Solution with the MING StackBuild an Edge-to-Cloud Solution with the MING Stack
Build an Edge-to-Cloud Solution with the MING StackInfluxData
 
Meet the Founders: An Open Discussion About Rewriting Using Rust
Meet the Founders: An Open Discussion About Rewriting Using RustMeet the Founders: An Open Discussion About Rewriting Using Rust
Meet the Founders: An Open Discussion About Rewriting Using RustInfluxData
 
Introducing InfluxDB Cloud Dedicated
Introducing InfluxDB Cloud DedicatedIntroducing InfluxDB Cloud Dedicated
Introducing InfluxDB Cloud DedicatedInfluxData
 
Gain Better Observability with OpenTelemetry and InfluxDB
Gain Better Observability with OpenTelemetry and InfluxDB Gain Better Observability with OpenTelemetry and InfluxDB
Gain Better Observability with OpenTelemetry and InfluxDB InfluxData
 
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...InfluxData
 
How Delft University's Engineering Students Make Their EV Formula-Style Race ...
How Delft University's Engineering Students Make Their EV Formula-Style Race ...How Delft University's Engineering Students Make Their EV Formula-Style Race ...
How Delft University's Engineering Students Make Their EV Formula-Style Race ...InfluxData
 
Introducing InfluxDB’s New Time Series Database Storage Engine
Introducing InfluxDB’s New Time Series Database Storage EngineIntroducing InfluxDB’s New Time Series Database Storage Engine
Introducing InfluxDB’s New Time Series Database Storage EngineInfluxData
 
Start Automating InfluxDB Deployments at the Edge with balena
Start Automating InfluxDB Deployments at the Edge with balena Start Automating InfluxDB Deployments at the Edge with balena
Start Automating InfluxDB Deployments at the Edge with balena InfluxData
 
Understanding InfluxDB’s New Storage Engine
Understanding InfluxDB’s New Storage EngineUnderstanding InfluxDB’s New Storage Engine
Understanding InfluxDB’s New Storage EngineInfluxData
 
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDB
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDBStreamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDB
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDBInfluxData
 
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...InfluxData
 
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022InfluxData
 
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022InfluxData
 
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...InfluxData
 
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022InfluxData
 

More from InfluxData (20)

Announcing InfluxDB Clustered
Announcing InfluxDB ClusteredAnnouncing InfluxDB Clustered
Announcing InfluxDB Clustered
 
Best Practices for Leveraging the Apache Arrow Ecosystem
Best Practices for Leveraging the Apache Arrow EcosystemBest Practices for Leveraging the Apache Arrow Ecosystem
Best Practices for Leveraging the Apache Arrow Ecosystem
 
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...
 
Power Your Predictive Analytics with InfluxDB
Power Your Predictive Analytics with InfluxDBPower Your Predictive Analytics with InfluxDB
Power Your Predictive Analytics with InfluxDB
 
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base
 
Build an Edge-to-Cloud Solution with the MING Stack
Build an Edge-to-Cloud Solution with the MING StackBuild an Edge-to-Cloud Solution with the MING Stack
Build an Edge-to-Cloud Solution with the MING Stack
 
Meet the Founders: An Open Discussion About Rewriting Using Rust
Meet the Founders: An Open Discussion About Rewriting Using RustMeet the Founders: An Open Discussion About Rewriting Using Rust
Meet the Founders: An Open Discussion About Rewriting Using Rust
 
Introducing InfluxDB Cloud Dedicated
Introducing InfluxDB Cloud DedicatedIntroducing InfluxDB Cloud Dedicated
Introducing InfluxDB Cloud Dedicated
 
Gain Better Observability with OpenTelemetry and InfluxDB
Gain Better Observability with OpenTelemetry and InfluxDB Gain Better Observability with OpenTelemetry and InfluxDB
Gain Better Observability with OpenTelemetry and InfluxDB
 
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...
 
How Delft University's Engineering Students Make Their EV Formula-Style Race ...
How Delft University's Engineering Students Make Their EV Formula-Style Race ...How Delft University's Engineering Students Make Their EV Formula-Style Race ...
How Delft University's Engineering Students Make Their EV Formula-Style Race ...
 
Introducing InfluxDB’s New Time Series Database Storage Engine
Introducing InfluxDB’s New Time Series Database Storage EngineIntroducing InfluxDB’s New Time Series Database Storage Engine
Introducing InfluxDB’s New Time Series Database Storage Engine
 
Start Automating InfluxDB Deployments at the Edge with balena
Start Automating InfluxDB Deployments at the Edge with balena Start Automating InfluxDB Deployments at the Edge with balena
Start Automating InfluxDB Deployments at the Edge with balena
 
Understanding InfluxDB’s New Storage Engine
Understanding InfluxDB’s New Storage EngineUnderstanding InfluxDB’s New Storage Engine
Understanding InfluxDB’s New Storage Engine
 
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDB
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDBStreamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDB
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDB
 
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...
 
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022
 
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022
 
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...
 
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022
 

Recently uploaded

Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 

Recently uploaded (20)

Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 

InfluxDB IOx Tech Talks: A Rusty Introduction to Apache Arrow and How it Applies to a Time Series Database

  • 2. © 2020 InfluxData. All rights reserved.2 Foundational Work • Backward compatibility & integration (Flux & InfluxQL) • In-memory columnar store (Segment Store) • Data lifecycle (WAL, Parquet) • Apache Arrow Flight (RPC)
  • 3. © 2020 InfluxData. All rights reserved.3 Integration InfluxDB 2.x Flux InfluxQL InfluxDB IOx GRPC
  • 4. © 2020 InfluxData. All rights reserved.4 How data is organized mydata 2020-12-07 2020-12-08 2020-12-09 Partitions 1 2 3211 2 Chunks Database
  • 5. © 2020 InfluxData. All rights reserved.5 Data lifecycle WAL Buffer Write Buffer WAL Segment (object store) Chunks Segment Store (in memory) Parquet Files (object store) Ingest Writable Immutable
  • 6. © 2020 InfluxData. All rights reserved.6 When can you use it?
  • 8. © 2020 InfluxData. All rights reserved.8 A Rusty Introduction to Apache Arrow and how it Applies to a Time Series Database December 9, 2020 Andrew Lamb InfluxData
  • 9. © 2020 InfluxData. All rights reserved.9 IOx Team at InfluxData Query Optimizer / Architect @ Vertica (Columnar Database), Chief Architect @ DataRobot (Machine Learning Platform ) Chief Architect @ Nutonian (Machine Learning Apps XLST JIT Compiler Team at DataPower
  • 10. © 2020 InfluxData. All rights reserved.10 Goals + Outline Goal: ⇒ Arrow is a good basis for a new (time series) Databases ❤️ ● Opinions and Perspectives of Databases ● Background on Arrow ● Arrow Examples, in Rust
  • 11. © 2020 InfluxData. All rights reserved.11 Databases -- Trend Towards Specialization Relational Key-Value Timeseries Graph Array / Scientific Document Stream Michael Stonebraker and Ugur Cetintemel. 2005. "One Size Fits All": An Idea Whose Time Has Come and Gone. In Proceedings of the 21st International Conference on Data Engineering (ICDE '05). IEEE Computer Society, USA, 2–11. DOI:https://doi.org/10.1109/ICDE.2005.1 Data Model Deployment Embedded / Edge Cloud Single-Node Hybrid Ecosystem Hadoop Java Json / Javascript AWS GCP Azure Apple Cloud Use Case Transactions Analytics Streaming ...
  • 12. © 2020 InfluxData. All rights reserved.12 … and our new database is … 🎉 InfluxDB IOx - The Future Core of InfluxDB Built with Rust and Arrow
  • 13. © 2020 InfluxData. All rights reserved.13 Analytic Systems (vs Transactional) ● Transactional (OLTP, Key-value stores, etc) ○ Workload is “lookup a record by id”, “update a record”, “keep data durable and consistent” ○ Examples: Oracle, Postgres, Cassandra, DynamoDB, MongoDB, etc etc ● Analytic (OLAP, “Big Data”, etc) ○ Workload: aggregate many rows to get historical view, bulk loads, rarely updated ○ Examples: ClickHouse, MapReduce, Spark, Vertica, Pig, Hive, InfluxDB, etc etc
  • 14. © 2020 InfluxData. All rights reserved.14 So, you want to build a new database… ? Databases need many features just to look like a database: ● Get Data In and Out ● Store Data and Catalog / Metadata ● Query Store: + Query Language ● Connect: Client API … Before you can invest in what makes your database special
  • 15. © 2020 InfluxData. All rights reserved.15 Implementation timeline for a new Database system Client API In memory storage In-Memory filter + aggregation Durability / persistence Metadata Catalog + Management Query Language Parser Optimized / Compressed storage Execution on Compressed Data Joins! Additional Client Languages Outer Joins Subquery support More advanced analytics Cost based optimizer Out of core algorithms Storage Rearrangement Heuristic Query Planner Arithmetic expressions Date / time Expressions Concurrency Control Data Model / Type System Distributed query execution Resource Management “Lets Build a Database” 🤔 “Ok now this is pretty good” 😐 “Look mom! I have a database!” 😃 Online recovery
  • 16. © 2020 InfluxData. All rights reserved.16 Arrow Project Goals “Build a better open source foundation for data science” How is this related to databases? https://arrow.apache.org/
  • 17. © 2020 InfluxData. All rights reserved.17 Arrow == toolkit for a modern analytic databases match tool_needed { File Format (persistence) => Parquet Columnar memory representation => Arrow Arrays Operations (e.g. add, multiply) => Compute Kernels Network transfer => Arrow Flight IPC _ => ... to be continued ... }
  • 18. © 2020 InfluxData. All rights reserved.18 InfluxDB line protocol weather,location=us-east temperature=82,humidity=67 1465839830100400200 weather,location=us-midwest temperature=82,humidity=65 1465839830100400200 weather,location=us-west temperature=70,humidity=54 1465839830100400200 weather,location=us-east temperature=83,humidity=69 1465839830200400200 weather,location=us-midwest temperature=87,humidity=78 1465839830200400200 weather,location=us-west temperature=72,humidity=56 1465839830200400200 weather,location=us-east temperature=84,humidity=67 1465839830300400200 weather,location=us-midwest temperature=90,humidity=82 1465839830400400200 weather,location=us-west temperature=71,humidity=57 1465839830400400200 Line Protocol Tutorial (link) Measurements Tags Fields Timestamp
  • 19. © 2020 InfluxData. All rights reserved.19 IOx Data Model weather,location=us-east temperature=82,humidity=67 1465839830100400200 weather,location=us-midwest temperature=82,humidity=65 1465839830100400200 weather,location=us-west temperature=70,humidity=54 1465839830100400200 weather,location=us-east temperature=83,humidity=69 1465839830200400200 weather,location=us-midwest temperature=87,humidity=78 1465839830200400200 weather,location=us-west temperature=72,humidity=56 1465839830200400200 weather,location=us-east temperature=84,humidity=67 1465839830300400200 weather,location=us-midwest temperature=90,humidity=82 1465839830400400200 weather,location=us-west temperature=71,humidity=57 1465839830400400200 location "us-east" "us-midwest" "us-west" "us-east" "us-midwest" "us-west" "us-east" "us-midwest" "us-west" temperature 82 82 70 83 87 72 84 90 71 humidity 67 65 54 69 78 56 67 82 57 timestamp 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.2004002Z 2016-06-13T17:43:50.2004002Z 2016-06-13T17:43:50.2004002Z 2016-06-13T17:43:50.3004002Z 2016-06-13T17:43:50.3004002Z 2016-06-13T17:43:50.3004002Z Table: weather
  • 20. © 2020 InfluxData. All rights reserved.20 Code Examples Thesis: “When writing an analytic database, you will end up implementing the Arrow feature set” (Ecosystem integration is another major benefit of Arrow, subject of a future talk) + * Take performance comparisons with a large grain of salt Compare Plain Rust and Rust using the Arrow library
  • 21. © 2020 InfluxData. All rights reserved.21 Motivating Example “Find the rows that are not in `us-west`”
  • 22. © 2020 InfluxData. All rights reserved.22 Create the Array let string_vec: Vec<String> = (0..NUM_TAGS) .map(|i| { match i % 3 { 0 => "us-east", 1 => "us-midwest", 2 => "us-west", }.into() }) .collect(); let mut builder = StringBuilder::new(NUM_TAGS); (0..NUM_TAGS).enumerate() .for_each(|(i, _)| { let location = match i % 3 { 0 => "us-east", 1 => "us-midwest", 2 => "us-west", }; builder.append_value(location) .unwrap() }); let array = builder.finish(); > created array with 10000000 elements ~600ms > created array with 10000000 elements ~400ms +
  • 23. © 2020 InfluxData. All rights reserved.23 Memory Footprint let size = size_of::<Vec<String>>() + string_vec .iter() .fold(0, |sz, s| { sz + size_of::<String>() + s.len() }); println!("total size: {} bytes", size); println!("total size: {} bytes", array.get_array_memory_size()); > total size: 320000023 bytes ~320 MB * > total size: 149206128 bytes ~150 MB +
  • 24. © 2020 InfluxData. All rights reserved.24 Find Rows != “us-west” let not_west_bitset: Vec<bool> = string_vec .iter() .map(|s| s != "us-west") .collect(); let num_not_west = not_west_bitset .iter() .filter(|&&v| v) .count(); let not_west_bitset = neq_utf8_scalar( &array, "us-west" ).unwrap(); let num_not_west = not_west_bitset .iter() .filter(|v| matches!(v, Some(true))) .count(); > Found 6666667 not in west ~50ms > Found 6666667 not in west ~120ms +
  • 25. © 2020 InfluxData. All rights reserved.25 Find Rows != “us-west” (with null handling) let string_vec: Vec<Option<String>> = ...; let not_west_bitset: Vec<bool> = string_vec .iter() .map(|s| { s.as_ref() .map(|s| s != "us-west") .unwrap_or(false) }) .collect(); let num_not_west = not_west_bitset .iter() .filter(|&&v| v) .count(); + Same as previous > Found 6666667 not in west ~50ms
  • 26. © 2020 InfluxData. All rights reserved.26 Materialize rows for future processing let not_west: Vec<String> = not_west_bitset .iter() .enumerate() .filter_map(|(i, &v)| { if v { Some(string_vec[i].clone()) } else { None } }) .collect(); let not_west = filter( &array, &not_west_bitset ).unwrap(); > Made array of 6666667 Strings not in west ~450 ms > Made array of 6666667 Strings not in west ~50 ms +
  • 27. © 2020 InfluxData. All rights reserved.27 More efficient encoding (dictionary) let vb = StringBuilder::new(); let kb = Int8Builder::new(); let mut builder = StringDictionaryBuilder::new(vb,kb); (0..NUM_TAGS) .enumerate() .for_each(|(i, _)| { let location = match i % 3 { 0 => "us-east", 1 => "us-midwest", 2 => "us-west", }; builder.append(location).unwrap(); }); let array = builder.finish(); > total size: 10000688 bytes 10MB 250 ms + dictionary "us-east" "us-midwest" "us-west" Location 0 1 2 0 1 2 0 1 2 [0] [1] [2] [u8]
  • 28. © 2020 InfluxData. All rights reserved.28 SIMD Anyone? let output = gt( &left, &right ).unwrap(); +10 20 17 5 23 5 9 12 4 5 76 2 3 5 2 33 2 1 6 7 8 2 7 2 5 6 7 8 left right output 1 0 1 1 1 0 1 1 0 1 1 0 0 0 > > > >
  • 29. © 2020 InfluxData. All rights reserved.29 SIMD Implementation #[cfg(all(any(target_arch = "x86", target_arch = "x86_64"), feature = "simd"))] fn simd_compare_op<T, F>(left: &PrimitiveArray<T>, right: &PrimitiveArray<T>, op: F) -> Result<BooleanArray> where T: ArrowNumericType, F: Fn(T::Simd, T::Simd) -> T::SimdMask, { // use / error checking elided let null_bit_buffer = combine_option_bitmap( left.data_ref(), right.data_ref(), len )?; let lanes = T::lanes(); let mut result = MutableBuffer::new( left.len() * mem::size_of::<bool>() ); let rem = len % lanes; for i in (0..len - rem).step_by(lanes) { let simd_left = T::load(left.value_slice(i, lanes)); let simd_right = T::load(right.value_slice(i, lanes)); let simd_result = op(simd_left, simd_right); T::bitmask(&simd_result, |b| { result.write(b).unwrap(); }); } Source: arrow/src/compute/kernels/comparison.rs if rem > 0 { let simd_left = T::load(left.value_slice(len - rem, lanes)); let simd_right = T::load(right.value_slice(len - rem, lanes)); let simd_result = op(simd_left, simd_right); let rem_buffer_size = (rem as f32 / 8f32).ceil() as usize; T::bitmask(&simd_result, |b| { result.write(&b[0..rem_buffer_size]).unwrap(); }); } let data = ArrayData::new( DataType::Boolean, left.len(), None, null_bit_buffer, 0, vec![result.freeze()], vec![], ); Ok(PrimitiveArray::<BooleanType>::from(Arc::new(data))) }
  • 30. © 2020 InfluxData. All rights reserved.30 Other things needed in a database Vec<Option<String>> to support nulls Handle other data types with same code Vectorized implementations of filter, aggregate, etc Persist it to storage Send data over the network Ecosystem compatibility ...
  • 31. © 2020 InfluxData. All rights reserved.31 Rust / Arrow Community: Good and Getting better Major Roadmap Items (see also Apache Arrow (Rust) 2.0.0) 1. Support Stable Rust 2. Improved DictionaryArray support and performance 3. Improved compute kernel performance 4. SQL: Joins 5. Parallel CPU-bound operations; Additional platform support (e.g. ARMv8)InfluxData specifically is investing in: 1. Flight IPC 2. Improved Dictionary and Date/Time support 3. Data Fusion (some other tech talk)
  • 32. © 2020 InfluxData. All rights reserved.32 Thank You Find us online Github: https://github.com/influxdata/influxdb_iox Slack: https://influxdata.com/slack It is early days; there are many cool things left to implement And we are hiring (Senior IOx Engineer Job Posting)