Using memory-mapped database tables for data acquisition

pvbrowser (Thread Starter)

For data acquisition we normally use shared memory, where a data acquisition daemon stores the acquired data.

In principle you could do that with an SQL database instead of shared memory. For example, MySQL supports tables that are stored in memory rather than on disk.

http://dev.mysql.com/doc/refman/5.0/en/memory-storage-engine.html

Can anybody tell me more about the performance that can be achieved with such a database, i.e. how many queries per second can be done?

This is our principle for data acquisition:
http://pvbrowser.de/pvbrowser/pic/prinzip.png

See:
http://pvbrowser.org

There the shared memory could easily be substituted by an SQL database table if the performance is sufficient. I think such a table should be stored in memory and not on disk.
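For illustration, creating such a table could look like this (a minimal sketch assuming the mysql-connector-python package; the database, table, and column names are made up):

```python
# Minimal sketch: create a MEMORY-engine table for cyclically
# updated process values. Host, credentials and all names are
# illustrative; requires the mysql-connector-python package.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="daq",
                               password="secret", database="process")
cur = conn.cursor()

# ENGINE=MEMORY keeps the table in RAM; its contents are lost on a
# server restart, which is acceptable here because the acquisition
# daemon rewrites every value cyclically anyway.
cur.execute("""
    CREATE TABLE IF NOT EXISTS process_image (
        tag   VARCHAR(64) NOT NULL PRIMARY KEY,
        value DOUBLE      NOT NULL
    ) ENGINE=MEMORY
""")
conn.commit()
conn.close()
```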
 
The usual rule of thumb that I've seen for in-memory databases is ten times faster than conventional databases. That is assuming you have enough RAM of course. The problem with any such number is that a lot depends on what features you need, how you are using them, how complex your queries are, etc. Also, if your dataset is small enough a conventional database will often have its entire dataset in buffer anyway, so it's "in-memory" even if it isn't designed as such (although admittedly there's no guarantee it will stay in-memory).

A big question is also whether or not you need SQL. A lot of the new activity in databases these days centres around non-SQL databases. If you don't need the features that SQL provides (and most people don't), then you can have large performance improvements by using other methods (usually some sort of key-value store with a map/reduce type query mechanism).

Another big question is whether the data needs to be persistent, and if so what sort of guarantees you need. Some of the in-memory databases simply lose everything if the power goes off, while others back changes to disk on a regular schedule.

Some of the databases that come to mind are:

1) MonetDB
2) The real-time version of MySQL
3) Redis
4) VoltDB

Your best bet is to post some questions to the forums that belong to those databases. If you need actual performance numbers, then the only certain way of getting accurate ones is to do your own benchmarks using your own datasets and queries. One of the people involved with the above databases could probably give you a rough estimate however, provided you can tell them how many records, the size of the records, and the sort of queries you want to make.
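As a starting point, such a benchmark can be as simple as timing a loop of single-row SELECTs (a sketch assuming mysql-connector-python; the process_image table and the tag names are placeholders to be replaced with your own schema and queries):

```python
# Rough benchmark sketch: how many single-row SELECTs per second
# does the server sustain from one client? All names are placeholders.
import time
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="daq",
                               password="secret", database="process")
cur = conn.cursor()

N = 10000
start = time.perf_counter()
for i in range(N):
    cur.execute("SELECT value FROM process_image WHERE tag = %s",
                ("sensor_%d" % (i % 100),))
    cur.fetchone()
elapsed = time.perf_counter() - start
print("%.0f queries/second" % (N / elapsed))
conn.close()
```

Running several such clients in parallel would give a more realistic number than a single loop.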
 
> then the only certain way of getting accurate ones is to do your own benchmarks using your own datasets and queries. <

Thank you,

I will do that and compare the results from MySQL with our shared memory.

An advantage of an SQL database for data acquisition (especially MySQL) is that you can also use the acquired data from any web application.

Our solution with the shared memory needs our class rlSharedMemory
http://pvbrowser.org/pvbrowser/sf/manual/rllib/html/classrlSharedMemory.html

which you can't use so easily from PHP (unless you create a language binding with SWIG).

I expect that shared memory should be much faster than an SQL query, but SQL might be an option for moderate update cycles.
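For the comparison, the shared-memory side can be timed the same way; a minimal sketch (the file name and the 8-byte double layout are assumptions for illustration, not rllib's actual format):

```python
# Time N reads of a double from a memory-mapped file, as a rough
# counterpart to the SQL benchmark. File name and layout are made up.
import mmap
import struct
import time

with open("/tmp/process_image.shm", "r+b") as f:
    shm = mmap.mmap(f.fileno(), 0)   # map the whole file

N = 1000000
start = time.perf_counter()
for _ in range(N):
    value, = struct.unpack_from("<d", shm, 0)   # double at offset 0
elapsed = time.perf_counter() - start
print("%.0f reads/second" % (N / elapsed))
shm.close()
```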
 
curt wuollet

And try not to use more software than you need if speed is your goal. Back when resources were really expensive (not very long ago), it was common practice to roll your own with various ISAM libraries, moving to an engine only if the size or traffic got out of hand. Most of these were much faster than SQL engines, and I know there are many C-ISAM (Informix) and other ISAM databases out there because they need the performance, especially for smaller datasets. Small now would have been very large then. Berkeley DB is a free, high-performance engine that could save a lot of work. Use a small, efficient dbm, keep your most accessed data in one or a few tables, and let the VMM keep it in core. It's all memory mapped already.
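For example, Python's standard dbm module wraps exactly this kind of small key-value store, and Berkeley DB's C API follows the same open/put/get pattern (the file name here is illustrative):

```python
# Tiny sketch of the dbm approach: a persistent key-value store the
# OS keeps cached in memory for hot data. The file name is made up.
import dbm

db = dbm.open("/tmp/process_values", "c")   # "c" = create if missing
db[b"sensor_1"] = b"42.5"                   # keys and values are bytes
print(db[b"sensor_1"])
db.close()
```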

Regards,
cww
 
If all you are trying to do is to provide a web interface to the shared memory then I don't think a database is necessary. You might want the database for logging data, but there are other ways of providing a memory interface.

If your web application can do HTTP requests, you can have a proxy server read the shared memory (I believe rllib uses memory-mapped files, am I correct?). If so, it would work like this:

1) The web app receives a request, and starts to process it.

2) It reaches the point where it needs to read or write some data, and figures out what data it needs to work with.

3) It forms an HTTP GET or POST and sends that to the proxy server.

4) The proxy server reads or writes the memory mapped file for the appropriate data. This should be very fast.

5) The proxy server constructs the response and sends it back to the web application.

6) The web application finishes constructing its own response and sends it.

7) The web application waits for the next request.

A number of my own programs put various types of web interfaces on data tables by one means or another, and this doesn't look too different from that. A straightforward rllib web proxy that just read from and wrote to a shared memory file, without any elaborate transformations, could be written in Python in a few hours. The only dependency would be the standard Python library, and Python is part of the base installation of most Linux distros.
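A minimal sketch of such a proxy, using only the standard library (the file name, the URL scheme, and the flat array-of-doubles layout are all assumptions for illustration, not part of rllib):

```python
# Sketch of the proxy described above: GET /read?offset=N returns the
# 8-byte double stored at byte offset N of a memory-mapped file.
import mmap
import struct
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

SHM_FILE = "/tmp/process_image.shm"   # illustrative file name

class ShmProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        offset = int(query.get("offset", ["0"])[0])
        with open(SHM_FILE, "r+b") as f:
            shm = mmap.mmap(f.fileno(), 0)
            value, = struct.unpack_from("<d", shm, offset)
            shm.close()
        body = ("%r" % value).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), ShmProxy).serve_forever()
```

A web application would then fetch a value with a plain HTTP request such as http://localhost:8080/read?offset=8.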

If you aren't just trying to provide a web interface to rllib, and the real application is logging data with occasional web reports, then I'm not sure where the in-memory database helps. If you are logging data, then in typical high-volume logging applications the problem is write speed (and having enough disk capacity). Being in-memory doesn't help you there if you want the data to be persistent.

The main application for the newer non-SQL databases, by the way, is very high-volume web applications. SQL databases just don't scale to those levels. I'm not aware of any automation applications that require those levels of performance, however.
 
Hi M Griffin,

I'm brainstorming a bit about how to do data acquisition right, so that it is useful for as many people as possible.

We run http://pvbrowser.org and everything works for us.

Our project provides rllib

http://pvbrowser.org/pvbrowser/sf/manual/rllib/html/classes.html
which is a platform-independent library written in C++.

Within that library we have "rlSharedMemory", for example. When we do data acquisition, we have a separate daemon that reads the data cyclically (e.g. over Modbus) and stores it in the shared memory. Outputs are sent to the daemon over a mailbox ("rlMailbox"). So everything is fine for us.

If we want to use rllib from PHP or any other scripting language, we use http://swig.org to generate a language binding. Thus the shared memory can be used from a web application directly. But there is a simpler way to use the shared memory from a web application: in PHP you can call "void passthru ( string $command [, int &$return_var ] )" and run a C/C++ app from within the web application, almost like a CGI script.
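The same pattern works from other scripting languages too; for instance, a Python sketch that runs a small compiled helper and captures its output (the shmdump helper binary is hypothetical):

```python
# Run a hypothetical C/C++ helper that prints one value from the
# shared memory and capture its output -- the same idea as passthru().
import subprocess

result = subprocess.run(["./shmdump", "sensor_1"],
                        capture_output=True, text=True, check=True)
print(result.stdout.strip())
```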

Now another option would be to use a table in an SQL database instead of our shared memory. For performance reasons this table should be in memory. It does not need to be persistent, because its content will be continuously updated by the data acquisition daemon. Logging historical data would of course be done in tables which reside on disk.
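The daemon's update cycle against such a table could look like this (a sketch; the fieldbus read function, table name, and credentials are all made up for illustration):

```python
# Sketch of the acquisition daemon's cycle: overwrite the current
# value of every tag each scan. read_from_fieldbus() stands in for
# the real driver (e.g. Modbus) and is hypothetical.
import time
import mysql.connector

def read_from_fieldbus(tag):
    return 42.0   # placeholder for the real cyclic read

conn = mysql.connector.connect(host="localhost", user="daq",
                               password="secret", database="process")
cur = conn.cursor()

tags = ["sensor_1", "sensor_2", "valve_3"]
while True:
    for tag in tags:
        # REPLACE INTO is MySQL-specific: insert or overwrite the row
        cur.execute("REPLACE INTO process_image (tag, value) "
                    "VALUES (%s, %s)", (tag, read_from_fieldbus(tag)))
    conn.commit()
    time.sleep(1.0)   # cycle time of the acquisition loop
```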

For access to SQL databases we rely on

http://doc.qt.nokia.com/4.6/qsqldatabase.html

We provide a template that uses this class so that our users can use any database supported by Qt. See:

http://pvbrowser.de/pvbrowser/tar/pvbaddon.tar.gz
directory: pvbaddon/templates/qtDatabase

So within our project we have all the pieces to solve the puzzle.
But the problem is that there are so many protocols in use for data acquisition, and most of them are proprietary. My dream would be to have a really open interface for that. In theory this should be "OPC UA" in the future. But is it really open? A really open system could use a standard SQL database table, in my opinion. Then everybody capable of reading and writing SQL database tables would have access to the process data.

I think you and curt wuollet are also thinking about open source solutions in the automation industry. My idea is to develop an open interface to any hardware available in automation.

We now do it like this (shared memory / mailbox):
http://pvbrowser.de/pvbrowser/pic/prinzip.png

and support:
Modbus (Serial Line and TCP)
Siemens TCP
Siemens PPI
Ethernet_IP
EIBnet/KNX
OPC XML-DA
PROFIBUS
CAN
DCON protocol

If we cooperated, we could support more, I think. I do not have all the hardware that is out there, but some of you might have it.
 
It sounds like what you are really looking for is something like what this guy is working on:

http://www.freedesktop.org/wiki/Specifications/DBPC
http://sourceforge.net/projects/dbpc/

It's called DBPC - DBus for Process Control. You can think of it as being something like OPC, but based on open principles (and without the complexity and overhead of COM/DCOM).

It's a new project and is not in a usable state today. I think for what you want, however, this is a much more promising avenue. He contacted me a few weeks ago, so I know he is interested in more participants.
 