# Non-Volatile Store

#### Curt Wuollet

Hi all

This is kinda out of the clear blue, but I thought I'd bounce it off ya and see what you think.

I was playing with some code for the old demo and got to thinking about how to preserve state across a boot or, heaven forbid, a crash. This provoked much discussion the first time around with schemes flying by and half a dozen threads. Since I was doing an external memory map mmap()ed into user space, it occurred to me that a very *nix way was staring me right in the face.

The data to be preserved would be written to or kept on a disk file mmap()ed into user space. With the right parameters this will flow to disk quickly through the existing kernel facilities.
I believe we can even block until the data is safely written, at least to the hdd cache. On restart the file is mmap()ed again and the data is just there. Simple and automatic. No fuss, no
bother.
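A minimal sketch of this idea in C, assuming a POSIX system. The `struct nv_state` layout and the `nv_open()`/`nv_close()` names are hypothetical and only for illustration; they are not part of the LinuxPLC code. The file is mapped `MAP_SHARED`, so stores into the structure reach the file through the kernel's page cache, and mapping the same file again after a restart shows the data that was written back.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical layout of the state to preserve (not the real SMM format). */
struct nv_state {
    unsigned int boot_count;
    int points[64];
};

/* Map `path` read/write and shared, so stores go to the file via the page
   cache. A newly created file is extended with zero bytes by ftruncate().
   Returns NULL on failure. */
struct nv_state *nv_open(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)sizeof(struct nv_state)) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, sizeof(struct nv_state), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : (struct nv_state *)p;
}

/* Drop the mapping; the data remains in the file. */
void nv_close(struct nv_state *s)
{
    assert(s != NULL);
    munmap(s, sizeof *s);
}
```

A program would call `nv_open()` once at startup, use the returned structure like ordinary memory, and find the previous values there on the next start. When exactly the kernel writes dirty pages back is not controlled by this sketch.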

OK Now go ahead and shoot holes in it.........

regards

cww

_______________________________________________
LinuxPLC mailing list
[email protected]
http://linuxplc.org/mailman/listinfo/linuxplc

#### Campbell, David (Ex AS17)

Not bad, seen the equivalent on a live DCS system (called "auto checkpoint").

Having the mmap area mapped to a disk file is no guarantee that the contents of memory will match the disk. I need to check the relevant kernel routines, but the page may only be flushed to disk if it has not been modified after "n" seconds.

You *may* need to spawn a thread in the process to forcibly flush the mmap contents to disk on a periodic basis (say every 5 seconds). If this could be done in the mmap flags then this would be better.

Of course your method would work for a controlled shutdown, but for the instance of "is this power cord for the monitor?" it may not be enough. If you think the accidental power cord trick is unlikely to happen: I have been in an oil refinery control room when someone (not myself) accidentally plugged a machine set for 110V into a 240V outlet. *BANG*. Needless to say, there were a few people cursing the relevant US standard, which requires a physical switch to convert between voltages rather than the auto-sensing arrangement the rest of the world has adopted.

Finally, make sure there is some start-up option which allows the mmap to be cleared. Someone is going to write a program that will require a memory purge to unwedge. Of course a quick "rm" should do the trick, but do you trust everyone?

David Campbell

#### Jiri Baum

Curt Wuollet wrote:
> The data to be preserved would be written to or kept on a disk file
> mmap()ed into user space.
...
> On restart the file is mmap()ed again and the data is just there. Simple
> and automatic. No fuss, no bother.

Currently, the SMM is initialized to all zeros (Mario, is this still right?).

With a little coding effort, it should be possible to use mmap() instead of shmem. I believe Mario was using the SMM temporarily during loading of the confmap, but it shouldn't be a problem to allocate another memory area (or
to dump the confmap to file too).

That said, I'm not sure if that's the right semantics.

In any case, we simply cannot guarantee non-volatile data without specialized hardware. We should be able to guarantee a *consistent* data
set no older than perhaps several seconds.

For that, we'd need a crash-proof filesystem (under development, as I heard) and a module that periodically dumps the SMM (or part thereof) to
disk with suitable checksums.

Then we need config specifications for volatile/non-volatile (and, while we're at it, initial value). "smm-mgr -g" would initialize the map to the latest consistent data set (for NV points) or the config init value (for volatile initialized points).
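The start-up rule described above can be sketched in C roughly as follows. Everything here is an assumption made for illustration: `struct point_cfg`, `init_map()` and the plain `int` points are hypothetical and do not reflect the real confmap or SMM formats.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-point configuration (not the real LinuxPLC confmap). */
struct point_cfg {
    int nonvolatile; /* 1: restore this point from the saved data set */
    int init_value;  /* value for volatile points (0 if none was given) */
};

/* Fill `map` at startup. `saved` is the latest consistent data set, or NULL
   if none could be recovered; in that case every point gets its init value. */
void init_map(int *map, const struct point_cfg *cfg, const int *saved, size_t n)
{
    assert(map != NULL && cfg != NULL);
    for (size_t i = 0; i < n; i++) {
        if (cfg[i].nonvolatile && saved != NULL)
            map[i] = saved[i];          /* NV point: last known value */
        else
            map[i] = cfg[i].init_value; /* volatile point: configured value */
    }
}
```

Falling back to the init value when no saved data set is available is a design choice of this sketch, not something the thread settles.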

Maintaining NV points across config changes is left as an exercise for, umm, me I guess.

Jiri
--
Jiri Baum <[email protected]>
What we do Every Night! Take Over the World! Step 1 - bid for SMOFcon

#### Mario J. R. de Sousa

Jiri Baum wrote:

> Curt Wuollet wrote:
> > The data to be preserved would be written to or kept on a disk file
> > mmap()ed into user space.
> ...
> > On restart the file is mmap()ed again and the data is just there. Simple
> > and automatic. No fuss, no bother.
>
> Currently, the SMM is initialized to all zeros (Mario, is this still
> right?).
>

Yes.

>
> With a little coding effort, it should be possible to use mmap() instead of
> shmem.

Yep, not very difficult.

> (...)
> In any case, we simply cannot guarantee non-volatile data without
> specialized hardware. We should be able to guarantee a *consistent* data
> set no older than perhaps several seconds.
>
> For that, we'd need a crash-proof filesystem (under development, as I
> heard) and a module that periodically dumps the SMM (or part thereof) to
> disk with suitable checksums.
>

Jiri has a good point here. If the system crashes, the filesystem may become unrecoverable, so it's no use saving the info there.

[a little beside the point]
Actually, even without NV points, if you want the LPLC to withstand power failures, Linux will need to be configured to use a read-only file system or to decompress the filesystem image at startup.

> Then we need config specifications for volatile/non-volatile (and, while
> we're at it, initial value). "smm-mgr -g" would initialize the map to the
> latest consistent data set (for NV points) or the config init value (for
> volatile initialized points).
>
> Maintaining NV points across config changes is left as an exercise for,
> umm, me I guess.
>

Yes, that will be interesting... ;-)
Let's not worry about this too much at the moment, especially since no crash proof filesystem exists yet, and we have yet to think about what semantics to support for online config changes. Eventually we will have to look into NV
points, but as we use Linux's virtual memory, I don't think it will be too difficult.

Mario

--
----------------------------------------------------------------------------
Mario J. R. de Sousa [email protected]
----------------------------------------------------------------------------

#### Curt Wuollet

Jiri Baum wrote:

> Currently, the SMM is initialized to all zeros (Mario, is this still
> right?).
>
> With a little coding effort, it should be possible to use mmap() instead of
> shmem. I believe Mario was using the SMM temporarily during loading of the
> confmap, but it shouldn't be a problem to allocate another memory area (or
> to dump the confmap to file too).
>
> That said, I'm not sure if that's the right semantics.
>
> In any case, we simply cannot guarantee non-volatile data without
> specialized hardware. We should be able to guarantee a *consistent* data
> set no older than perhaps several seconds.
>
> For that, we'd need a crash-proof filesystem (under development, as I
> heard) and a module that periodically dumps the SMM (or part thereof) to
> disk with suitable checksums.

See Mandrake-Linux "Corporate Server 1.0": Reiser journaling fs in release. More bluesky: perhaps a two-phase commit with fallback to last known state? Last thing in a scan cycle is a commit.

There's always some uncertainty even with special hardware; we should do what we can do for free anyway. I'll peer into the kernel docs.

> Then we need config specifications for volatile/non-volatile (and, while
> we're at it, initial value). "smm-mgr -g" would initialize the map to the
> latest consistent data set (for NV points) or the config init value (for
> volatile initialized points).
>
> Maintaining NV points across config changes is left as an exercise for,
> umm, me I guess.

Is it really the problem to resume exactly or will last known state do? I don't think anyone can guarantee the former, especially with intelligent IO and peripherals. In a Micro perhaps, but with remote IO, I doubt it. Perhaps
some of our PLC gurus can relate the commercial reality.

cww

#### Curt Wuollet

Hi all

A quick addendum: The msync() system call will sync the file with the map and wait until it's done. This would correspond to a commit, sorta.
Otherwise there is no guarantee that they are synced until an unmap().
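A rough C sketch of using `msync()` as the commit step at the end of a scan, assuming a POSIX system. The names `map_state()` and `scan_and_commit()`, the 4096-byte `MAP_LEN`, and the counter in `state[0]` are all hypothetical stand-ins. `MS_SYNC` makes the call block until the write-back has completed as far as the kernel is concerned; whether the data has left the drive's own cache is a separate question this sketch does not address.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define MAP_LEN 4096 /* assumed size of the preserved region */

/* Map MAP_LEN bytes of `path` shared and writable; NULL on failure. */
uint32_t *map_state(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, MAP_LEN) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? NULL : (uint32_t *)p;
}

/* One scan cycle: change state in memory, then commit as the last step.
   Returns 0 once msync() reports the data written back, -1 on error. */
int scan_and_commit(uint32_t *state)
{
    assert(state != NULL);
    state[0] += 1; /* stand-in for the real scan logic */
    return msync(state, MAP_LEN, MS_SYNC);
}
```

Calling `scan_and_commit()` once per scan puts a synchronous disk write in the scan loop, so the achievable scan rate would depend on the storage device.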

regards

cww

#### Philip Costigan

On Wed, 06 Sep 2000, Curt Wuollet wrote:
> Jiri Baum wrote:
>
> > Curt Wuollet wrote:
> > > The data to be preserved would be written to or kept on a disk file
> > > mmap()ed into user space.
> > ...
> > > On restart the file is mmap()ed again and the data is just there. Simple
> > > and automatic. No fuss, no bother.
> >
> > Currently, the SMM is initialized to all zeros (Mario, is this still
> > right?).
> >
> > With a little coding effort, it should be possible to use mmap() instead of
> > shmem. I believe Mario was using the SMM temporarily during loading of the
> > confmap, but it shouldn't be a problem to allocate another memory area (or
> > to dump the confmap to file too).
>
> Is it really the problem to resume exactly or will last known state do? I don't
> think anyone can guarantee the former, especially with intelligent IO and
> peripherals. In a Micro perhaps, but with remote IO, I doubt it. Perhaps
> some of our PLC gurus can relate the commercial reality.
>

Usually it is only internal data / IO that needs to be remembered through a power failure (crash). Remote IO tends to look after itself.

If only there was some battery backed ram available somewhere.

Phil

#### Johan Bengtsson

OK, but what if it is important that the saved state is consistent, i.e. that the whole block is from the same point in time? Can we guarantee that here?

/Johan Bengtsson

----------------------------------------
Box 252, S-281 23 Hässleholm SWEDEN
Tel: +46 451 49 460, Fax: +46 451 89 833
E-mail: [email protected]
Internet: http://www.pol.se/
----------------------------------------

#### Curt Wuollet

Nothing is guaranteed :^) Even atomic operations depend on power. But we can probably provide a useful degree of recovery this way. If we can keep the data to a single page and control when the page is flushed, it's probably as close as we can get without static, battery-backed RAM or other special hardware. And to a flash disk it would be as good as anything. Like I said before, the commercial guys can't do this either, except possibly for the base rack.

Care would have to be taken that the right state data gets saved. The only thing that would require speed would be the map. Config changes and things of that class would be updated as they occur, simply waiting until they hit the disk before proceeding. I do believe that a page write is atomic; it would be insanity if it's not. Since the average hdd cache is > 4k it should be possible to do this quickly. The good thing is that the existing kernel code does most of the critical stuff.

On a related note, I've got a small hack I wrote for an APC backups pro that does a controlled shutdown for my vision linux boxes. I know other people do this but mine is tiny and does what I want. I simply avoid the power off problem rather than try to keep going. The box shuts down and then the UPS shuts down so everything is ready
again on powerup. Everything I tried would run until the battery gave out. If power returned the box was still shut down unless the battery died. Sounds simple, but I couldn't find anything that
does it that way. I'll clean it up a little and post it. This way a small cheap UPS can make life much easier and these problems much simpler.

cww

#### Jiri Baum

Mario de Sousa wrote:
> Jiri Baum wrote:
...
> > In any case, we simply cannot guarantee non-volatile data without
> > specialized hardware. We should be able to guarantee a *consistent*
> > data set no older than perhaps several seconds.

> > For that, we'd need a crash-proof filesystem (under development, as I
> > heard) and a module that periodically dumps the SMM (or part thereof)
> > to disk with suitable checksums.

> Jiri has a good point here. If the system crashes, the filesystem may
> become unrecoverable, so it's no use saving the info there.

As Curt said, ReiserFS. (Which I didn't want to name because I know little about it except the name.)

Curt, what is the status of ReiserFS? Is it officially Released? I had a vague impression it wasn't...

Jiri
--
Jiri Baum <[email protected]>
What we do Every Night! Take Over the World! Step 1 - bid for SMOFcon

#### Johan Bengtsson

Well, I don't agree.

If you do write it yourself, you can write down a new copy without discarding the last one, and the one before. If you constantly have several copies, each one time-stamped, it is easy to roll back to a completely written copy (verified with some checksum method; a 64-bit CRC or more would make an undetected error very improbable). Older copies are of course to be discarded, but say you have 4 complete copies: if the write operation on the last one isn't completed (or readable), you will know that (checksum) and use the one before.
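The rotating-copies scheme can be sketched in C as below. This is only an illustration under stated assumptions: the `struct slot` layout, the names `save_copy()` and `newest_valid()`, and the 64-byte payload are hypothetical; a sequence number stands in for the time stamp; and a bitwise CRC-32 is used for brevity where the text suggests a 64-bit CRC. The slot array is ordinary memory here; in practice it could live in a file or a mapped region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NSLOTS 4       /* number of copies kept, as in the example above */
#define PAYLOAD_LEN 64 /* hypothetical size of the saved state */

struct slot {
    uint64_t seq;                  /* stands in for a time stamp; 0 = unused */
    uint8_t  payload[PAYLOAD_LEN];
    uint32_t crc;                  /* CRC-32 over seq and payload */
};

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_calc(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

static uint32_t slot_crc(const struct slot *s)
{
    return crc32_calc(s, offsetof(struct slot, crc));
}

/* Index of the newest slot whose checksum verifies, or -1 if there is none. */
int newest_valid(const struct slot slots[NSLOTS])
{
    int best = -1;
    for (int i = 0; i < NSLOTS; i++) {
        if (slots[i].seq == 0 || slots[i].crc != slot_crc(&slots[i]))
            continue; /* never written, or incomplete/corrupt copy */
        if (best < 0 || slots[i].seq > slots[best].seq)
            best = i;
    }
    return best;
}

/* Write a new copy into the slot after the newest valid one, leaving the
   other copies untouched. Returns the index of the slot that was written. */
int save_copy(struct slot slots[NSLOTS], const uint8_t payload[PAYLOAD_LEN])
{
    assert(slots != NULL && payload != NULL);
    int cur = newest_valid(slots);
    int target = (cur < 0) ? 0 : (cur + 1) % NSLOTS;
    struct slot *s = &slots[target];
    s->seq = (cur < 0) ? 1 : slots[cur].seq + 1;
    memcpy(s->payload, payload, PAYLOAD_LEN);
    s->crc = slot_crc(s); /* the checksum is stored last */
    return target;
}
```

On restart, `newest_valid()` picks the copy to restore; a copy whose write was interrupted fails its checksum and the previous one is used instead.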

A UPS is of course a good way of avoiding the problem. It won't solve a problem where you have some hardware failure or similar, but it definitely reduces the risk very much.

If you have the money to put in a UPS, you could perhaps find the money to put in a battery-backed RAM too... it would after all not have to be that expensive.

/Johan Bengtsson

----------------------------------------
Box 252, S-281 23 Hässleholm SWEDEN
Tel: +46 451 49 460, Fax: +46 451 89 833
E-mail: [email protected]
Internet: http://www.pol.se/
----------------------------------------

#### Curt Wuollet

Hi Jiri

Mandrake is apparently calling it release code and is willing to support it; I saw an announcement that their Corporate Server V1.0 offered ReiserFS. I know there has been some friction as Reiser is a for-money operation. But I can't imagine anyone bundling it and pushing it unless they have seen it work. Anyway, they offer a download, which may be possible in a few days when the /. effect wears off. Ext3 can't be far behind. And with a $100.00 UPS and the quick hack I wrote for APC backups pro, the only concern should be a coding bug and we know we won't have any of those :^p

regards

cww

#### Jiri Baum

Curt Wuollet:
> Mandrake is apparently calling it release code ...

OK, no worries. Like I wrote, I know little about it, but I recall someone grumbling about the quality (or lack thereof) of the fsck.

> And with a $100.00 UPS and the quick hack I wrote for APC backups pro,

That's true, too.

(Though in that case the machine can shut down in a controlled manner, and multiple checksummed dumps become an overkill. At that stage shutdown can proceed under program control in an orderly fashion.)

Jiri
--
Jiri Baum <[email protected]>
What we do Every Night! Take Over the World! Step 1 - bid for SMOFcon

#### Curt Wuollet

Hi Johan

I think someone offers this already and as long as it maps above the "normal" ram we wouldn't need drivers or anything. Instead of making a flash disk and mapping a file, we could simply mmap() the flashram itself as I am doing in the demo with ram excluded from the memory manager. No latency in the buffer cache, it would be a simple page copy on msync(). We could use the
fact that there is no sync until an unmap as a feature, the maps would be effectively decoupled except when we msync() and should survive a power loss or crash just fine. I'm gonna check around to see what's available.

regards

cww
