TimescaleDB vs. InfluxDB: built differently for time-series data

By mfreed - 21 hours ago

Wonderful analysis, I was waiting for something like this to come out!

Recently, I've gone through this very same choice and ended up with vanilla PostgreSQL (Timescale was not mature enough).

[Shameless self plug] You can read some of the details here: https://medium.com/@neslinesli93/how-to-efficiently-store-an...

neslinesli93 - 20 hours ago

As someone who is struggling with issues with InfluxDB in production environments, this just moved my `Investigate replacing Influx with Timescale` issue higher up my priority list. Many of the problems with InfluxDB pointed out in the article are indeed real-world pain points for us.

NickBusey - 19 hours ago

These kinds of articles, written by one of the compared parties, only become interesting once the other party responds. So I'm waiting for Paul Dix to show up in this thread as usual.

NewsAware - 20 hours ago

The TimescaleDB benchmark code is a fork of code I wrote, as an independent consultant, for InfluxData in 2016 and 2017. The purpose of my project was to rigorously compare InfluxDB and InfluxDB Enterprise to Cassandra, Elasticsearch, MongoDB, and OpenTSDB. It's called influxdb-comparisons and is an actively maintained project on GitHub at [0]. I am no longer affiliated with InfluxData, and these are my own opinions.

I designed and built the influxdb-comparisons benchmark suite to be easy to understand for customers. From a technical perspective, it is simulation-based, verifiable, fast, fair, and extensible. In particular, I created the "use-case approach" so that, no matter how technical our benchmark reports got, customers could say to themselves: "I understand this!". For example, in the devops use-case, we generate data and queries from a realistic simulation of telemetry collected from a server fleet. Doing it this way creates benchmarking stories that appeal to a wide variety of both technical and nontechnical customers.

This user-first design of a benchmarking suite was new, and was a large factor in the success of the project.

Another aspect of the project is that we tried to do right by the competition. That means that we spoke with experts (sometimes, the creators of the databases themselves) on how to best achieve our goals. In particular, I worked hard to make the Cassandra, Elasticsearch, MongoDB, and OpenTSDB benchmarks show their respective databases in the best light possible. Concretely, each database was configured in a way that is 1) featureful, like InfluxDB, 2) fast at writes, 3) fast at reads, and 4) efficient with disk space.

As an example of my diligence in implementing this benchmark suite for InfluxData, I included a mechanism by which the benchmark query results can be verified for correctness across competing databases, to within floating point tolerances. This is important because, when building adapters for drastically different databases, it is easy to introduce bugs that could give a false advantage to one side or the other (e.g. by accidentally throwing data away, or by executing queries that don't range over the whole dataset).
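To illustrate the idea of verifying results to within floating point tolerances, here is a minimal sketch. It is not the influxdb-comparisons implementation: the function name `results_match`, the `(timestamp, value)` row shape, and the tolerance values are all assumptions made for this example.

```python
import math

def results_match(rows_a, rows_b, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two query results row by row.

    Each result is assumed to be a list of (timestamp, value) pairs,
    already sorted by timestamp. Tolerances are illustrative defaults.
    """
    # A differing row count catches adapters that silently drop data
    # or queries that do not range over the whole dataset.
    if len(rows_a) != len(rows_b):
        return False
    for (ts_a, val_a), (ts_b, val_b) in zip(rows_a, rows_b):
        # Timestamps must agree exactly.
        if ts_a != ts_b:
            return False
        # Values only need to agree within floating point tolerance.
        if not math.isclose(val_a, val_b, rel_tol=rel_tol, abs_tol=abs_tol):
            return False
    return True

# Hypothetical results from two databases for the same query.
db_one_rows = [(0, 0.1 + 0.2), (60, 41.5)]
db_two_rows = [(0, 0.3), (60, 41.5)]
dropped_rows = [(0, 0.3)]  # simulates an adapter that lost a row

print(results_match(db_one_rows, db_two_rows))   # True
print(results_match(db_one_rows, dropped_rows))  # False
```

The first pair differs only by floating point rounding and passes; the second is missing a row and fails, which is the kind of bug that could otherwise hand one side a false advantage.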

I don't see that TimescaleDB is using the verification functionality I created. I encourage TimescaleDB to run query verification, and write up their benchmarking methods in detail, like I did here: [1].

I think it's great that TimescaleDB is taking these ideas and extending them. At InfluxData, we made the code open-source so that others could build and learn from our work. In that tradition, I hope that the ongoing discussion about how to do excellent benchmarking of time-series databases keeps evolving.

[0] https://github.com/influxdata/influxdb-comparisons (Note that others maintain this project now.)

[1] https://rwinslow.com/rwinslow-benchmark-tech-paper-influxdb-...

rw - 18 hours ago

> the focus of time-series databases has been narrowly on metrics and monitoring

I am curious whether people are using TSDBs for business predictions, machine learning, exploratory visualisation, data science, and AI. I got curious after seeing a Udacity course on TSDB predictions.

https://www.udacity.com/course/time-series-forecasting--ud98...

dominotw - 18 hours ago

Nice article, but is it deliberate that you don't mention getting data into the database (line protocol or similar), or analysing the results and displaying them?

Input = use Postgres.

Output = Grafana has a Postgres data source, which I assume works, and there is mention in TimescaleDB's issues of a Grafana query helper.

There is also no mention of analysis, consolidation (continuous queries), or retention policies.

I am, however, intrigued, as it does seem to hit my sweet spot: hundreds of servers, each with 5 to 10 different series of 10 metrics each, reporting every 10 seconds.
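As a rough back-of-the-envelope sketch of what that workload means in ingest terms: the comment gives no exact server count, so the 100 and 900 used below are assumed bounds, and the helper name `points_per_second` is made up for this example.

```python
def points_per_second(servers, series_per_server, metrics_per_series, interval_s):
    """Metric values written per second for a fleet reporting at a fixed interval."""
    return servers * series_per_server * metrics_per_series / interval_s

SECONDS_PER_DAY = 86_400

# Assumed lower bound: 100 servers, 5 series each, 10 metrics per series, every 10 s.
low = points_per_second(100, 5, 10, 10)
# Assumed upper bound: 900 servers, 10 series each, 10 metrics per series, every 10 s.
high = points_per_second(900, 10, 10, 10)

print(low, high)                                      # 500.0 9000.0
print(low * SECONDS_PER_DAY, high * SECONDS_PER_DAY)  # 43200000.0 777600000.0
```

Under these assumptions the workload is somewhere between 500 and 9,000 metric values per second, or roughly 43 million to 778 million values per day, which is why database size is worth thinking about.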

Database size might be an issue, as would the complexity of deployment (a big win for Prometheus rather than InfluxDB, though).

Final thoughts: I can't help feeling this looks like a few input scripts running on Postgres rather than a system solution for metrics and annotations.

mintyc - 8 hours ago

Does anyone have thoughts on why Postgres shouldn't provide:

- Automatic sharding of tables per-"shard key".

- Automatic sharding of those keyed shards by the range of some primary index.

Doesn't this get you 90%+ of the way there? (There's no "adaptive" time bucketing, I guess.)
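The two-level scheme in the list above can be sketched as a routing function. This is only an illustration of the idea, not how Postgres or TimescaleDB does it: `shard_for`, the hash choice (CRC32), the four key shards, and the fixed one-day bucket are all assumptions for the example.

```python
import zlib
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def shard_for(shard_key, ts, num_key_shards=4, bucket=timedelta(days=1)):
    """Route a row to (key shard, start of its time bucket).

    Level 1: hash the shard key into one of num_key_shards partitions.
    Level 2: split each key shard into fixed-width ranges of the time index.
    """
    key_shard = zlib.crc32(shard_key.encode("utf-8")) % num_key_shards
    # Integer number of whole buckets since the epoch (timedelta // timedelta -> int).
    bucket_index = (ts - EPOCH) // bucket
    bucket_start = EPOCH + bucket_index * bucket
    return key_shard, bucket_start

morning = datetime(2018, 8, 1, 9, 30, tzinfo=timezone.utc)
evening = datetime(2018, 8, 1, 21, 0, tzinfo=timezone.utc)
next_day = datetime(2018, 8, 2, 0, 5, tzinfo=timezone.utc)

# Same key and same day land in the same shard; the next day gets a new bucket.
print(shard_for("host_42", morning) == shard_for("host_42", evening))   # True
print(shard_for("host_42", morning) == shard_for("host_42", next_day))  # False
```

The bucket width here is fixed, so there is no "adaptive" time bucketing of the kind mentioned in the parenthetical.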

For the record, I am a veteran of a naive Postgres time series scheme that was brought to its knees by seek times.

evdev - 18 hours ago

At the end of the day, Influx is going to be kerchunking along like the square wheel that garbage collection is, with occasional wild memory and latency gyrations. Couple that with a log-structured tree for more "fun".

C doesn't guarantee you won't do things like that, but Timescale is built in such a way as to minimize most of this kind of extreme waste in patrols with the full Postgres user experience.

kev009 - 19 hours ago

Why not compare it with the market leader kdb+? https://kx.com/media/2018/06/KdbTransitive-Comparisons.pdf

hellomichibye - 17 hours ago

Would love a comparison to Market Store https://github.com/alpacahq/marketstore

freediver - 14 hours ago