heartbeat 1.0.1 released
[Posted February 24, 2003 by cook]
From: Alan Robertson <alanr@unix.sh>
To: ha-linux List <linux-ha@muc.de>, Linux-HA Development List <linux-ha-dev@lists.community.tummy.com>, Linux Weekly News <lwn@lwn.net>
Subject: Announcing version 1.0.1 of heartbeat:
Date: Wed, 19 Feb 2003 13:10:51 -0700
Announcing version 1.0.1 of heartbeat, the first major stable release of the
Linux-HA project since March 2001, and the culmination of a long series of
successful beta releases.
The major features in this stable release include support for many new
STONITH devices, an application monitoring subsystem, unicast
communications, standby capability, an IP connection monitoring (IPfail)
feature, improved realtime performance, the CCM membership subsystem, the
ability to specify fractional seconds in times, *BSD and Solaris compatibility,
documentation improvements, extensions to the heartbeat client API, etc.
Major bug fixes include:
Better handling and tracking of STONITH operations, new test cases,
compatibility with new compilers and libraries, split-brain recovery, better
tracking of child processes, more robustness with respect to timing issues,
significant portability improvements, etc.
The complete ChangeLog is attached.
Enjoy!
--
Alan Robertson <alanr@unix.sh>
"Openness is the foundation and preservative of friendship.... Let me claim
from you at all times your undisguised opinions." - William Wilberforce
* Mon Feb 17 2003 Alan Robertson <alanr@unix.sh> (see doc/AUTHORS file)
+ Version 1.0.1:
+ Fixed some compile errors on different platforms, and library versions
+ Disable ccm from running on 'ping' nodes
+ Put in Steve Snodgrass' fix to send_arp to make it work on non-primary
interfaces.
* Thu Feb 13 2003 Alan Robertson <alanr@unix.sh> (see doc/AUTHORS file)
+ Version 1.0.1 beta series
0.4.9g:
+ Changed default deadtime, warntime, and heartbeat interval
+ Auto* tool updates
+ VIP loopback fixes for IP address takeover
+ Various Solaris and FreeBSD fixes
+ added SNMP agent
+ Several CCM bug fixes
+ two new heartbeat API calls
+ various documentation fixes, including documentation for ipfail
+ Numerous minor cleanups.
+ Fixed a few bugs in the IPC code.
+ Fixed the (IPC) bug which caused apphbd to hang the whole machine.
+ Added a new IPC call (waitout)
+ Wrote a simple IPC test program.
+ Clarified several log messages.
+ Cleaned up the ucast communications plugin
+ Cleaned up for new C compilers
+ Fixed permissions bug in IPC which caused apphbd to not be usable by all
+ Added a new rtprio option to the heartbeat config file
+ updated apphbtest program
+ Changed ipfail to log things at same level heartbeat does
* Sat Nov 30 2002 Alan Robertson <alanr@unix.sh> (see doc/AUTHORS file)
+ Version 0.5 beta series (now renamed to 1.0.1 beta series).
0.4.9f:
+ Added pre-start, post-start, pre-stop and post-stop constructs in init script
+ various IPC fixes
+ Fix to STONITH behavior: STONITH unresponsive node right after we reboot
+ Fixed extreme latency in IPC code
+ various configure.in cleanups
+ Fixed memory leak in IPC socket code
+ Added streamlined mainloop/IPC integration code
+ Moved more heartbeat internal communication to IPC library
+ Added further support for ipfail
+ Added supplementary groups to the respawn-ed clients
+ Added standby to init script actions
+ Lots of minor CCM fixes
+ Split (most) resource management code into a separate file.
+ Fixes to accommodate different versions of libraries
+ Heartbeat API client headers fixup
+ Added new API calls
+ Simplified (and fixed) handling of local status. This would sometimes cause
obscure failures on startup.
+ Added new IPsrcaddr resource script
KNOWN BUGS:
+ apphbd goes into an infinite loop on some platforms
* Wed Oct 9 2002 Alan Robertson <alanr@unix.sh> (see doc/AUTHORS file)
0.4.9e:
+ Changed client code to keep write file descriptor open at all times
(realtime improvement)
+ Added a "poll replacement" function based on sigtimedwait(2), which
should be faster for those cases that can use it.
+ Added a hb_warntime() call to the application heartbeat API.
+ Changed all times in the configuration file to be in milliseconds
if specified with "ms" at the end. (seconds is still the default).
+ Fixes to serious security issue due to Nathan Wallwork <nwallwo@pnm.com>
+ Changed read/write child processes to run as nobody.
+ Fixed a bug where ping packets are printed incorrectly when debugging.
+ Changed heartbeat code to preallocate some heap space.
+ CCM daemon API restructuring
+ Added ipc_channel_pair() function to the IPC library.
+ Changed everything to use longclock_t instead of clock_t
+ Fixed a bug concerning the ifwalk() call on ping nodes in the API
+ Made apphbd run at high priority and locked into memory
+ Made a library for setting priority up.
+ Made ucast comm module at least be configurable and loadable.
+ Fixed a startup/shutdown timing problem.
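As an illustration of the 0.4.9e entry above on configuration times (a value is in milliseconds when it ends in "ms", otherwise in seconds), here is a minimal Python sketch. The name parse_interval_ms is hypothetical and this is not heartbeat's own parser; accepting fractional seconds is an assumption based on the announcement's mention of fractional seconds in times.

```python
def parse_interval_ms(text):
    """Return a configuration time as whole milliseconds.

    Hypothetical helper, not heartbeat's parser: a trailing "ms" means
    milliseconds; anything else is read as (possibly fractional) seconds.
    """
    value = text.strip().lower()
    if value.endswith("ms"):
        # Explicit millisecond value, e.g. "500ms".
        return int(round(float(value[:-2])))
    # Default unit is seconds, e.g. "2" or "1.5".
    return int(round(float(value) * 1000))
```

With this sketch, "500ms" and "0.5" would both describe half a second.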
0.4.9d:
+ removed an "open" call for /proc/loadavg (improve realtime behavior)
+ changed API code to not do 1-char reads from clients
+ Ignored certain error conditions from API clients
+ fixed an obscure error message about trying to retransmit a packet
which we haven't sent yet. This happens after restarts.
+ made the PILS libraries available in a separate package
+ moved the stonith headers to stonith/... when installed
+ improved debugging for NV failure cases...
+ updated AUTHORS file and simplified the changelog authorship
(look in AUTHORS for the real story)
+ Added Ram Pai's CCM membership code
+ Added the application heartbeat code
+ Added Kevin Dwyer's ipfail client code to the distribution
+ Many fixes for various tool versions and OS combinations.
+ Fixed a few bugs related to clients disconnecting.
+ Fixed some bugs in the CTS test code.
+ Added BasicSanityCheck script to tell if built objects look good.
+ Added PATH-like capabilities to PILS
+ Changed STONITH to use the new plugin system.
+ *Significantly* improved STONITH usage message (from Lorn Kay)
+ Fixed some bugs related to restarting.
+ Made exit codes more LSB-compliant.
+ Fixed various things so that ping nodes don't break takeovers.
0.4.9c and before:
+ Cluster partitioning now handled correctly (really!)
+ Complete rearchitecture of plugin system
+ Complete restructure of build system to use automake and port things
to AIX, FreeBSD and Solaris.
+ Added Lclaudio's "standby" capability to put a node into standby
mode on demand.
+ Added code to send out gratuitous ARP requests as well as gratuitous
ARP replies during IP address takeover.
+ Suppress stonith operations for nodes which went down gracefully.
+ Significantly improved real-time performance
+ Added new unicast heartbeat type.
+ Added code to make serial ports flush stale data on new connections.
+ The Famous CLK_TCK compile time fixes (really!)
+ Added a document which describes the heartbeat API
+ Changed the code which makes FIFOs to not try and make the FIFOs for
named clients, and several other minor API client changes.
+ Fixed a fairly rare client API bug where it would shut down the
client for no apparent reason.
+ Added stonith plugins for: apcmaster, apcmastersnmp switches, and ssh
module (for test environments only)
+ Integrated support for the Baytech RPC-3 switch into baytech module
+ Fixes to APC UPS plugin
+ Got rid of "control_process: NULL message" message
+ Got rid of the "controlfifo2msg: cannot create message" message
+ Added -h option to give usage message for stonith command...
+ Wait for successful STONITH completion, and retry if it's configured.
+ Sped up takeover code.
+ Several potential timing problems eliminated.
+ Cleaned up the shutdown (exit) code considerably.
+ Detect the death of our core child processes.
+ Changed where usage messages go depending on exit status from usage().
+ Made some more functions static.
+ Real-time performance improvement changes
+ Updated the faqntips document
+ Added a feature to heartbeat.h so that log messages get checked as
printf-style messages on GNU C compilers
+ Changed several log messages to have the right parameters (discovered
as a result of the change above)
+ Numerous FreeBSD, Solaris and OpenBSD fixes.
+ Added backwards compatibility kludge for udp (versus bcast)
+ Queued messages to API clients instead of throwing them away.
+ Added code to send out messages when clients join, leave.
+ Added support for spawning and monitoring child clients.
+ Cleaned up error messages.
+ Added support for DB2, ServeRAID and WAS, LVM, and Apache (IBMhttp too),
also ICP Vortex controller.
+ Added locking when creating new IP aliases.
+ Added a "unicast" media option.
+ Added a new SimulStart and standby test case.
+ Diddled init levels around...
+ Added an application-level heartbeat API.
+ Added several new "plumbing" subsystems (IPC, longclock_t, proctrack, etc.)
+ Added a new "contrib" directory.
+ Fixed serious (but trivial) bug in the process tracking code which caused
it to exit heartbeat - this occurred repeatably for STONITH operations.
+ Write a 'v' to the watchdog device to tell it not to reboot us when
we close the device.
+ Various ldirectord fixes due to Horms
+ Minor patch from Lorn Kay to deal with loopback interfaces which might
have been put in by LVS direct routing
+ Updated AUTHORS file and moved list of authors over
* Fri Mar 16 2001 Alan Robertson <alanr@unix.sh>
+ Version 0.4.9
+ Split into 3 rpms - heartbeat, heartbeat-stonith, and heartbeat-ldirectord
+ Made media modules and authentication modules and stonith modules
dynamically loadable.
+ Added Multicast media support
+ Added ping node/membership/link type for tiebreaking. This will
be useful when implementing quorum on 2-node systems.
(not yet compatible with nice_failback(?))
+ Removed ppp support
+ Heartbeat client API support
+ Added STONITH API library
+ support for the Baytech RPC-3A power switch
+ support for the APCsmart UPS
+ support for the VACM cluster management tool
+ support for WTI RPS10
+ support for Night/Ware RPC100S
+ support for "Meatware" (human intervention) module
+ support for "null" (testing only) module
+ Fixed startup timing bugs
+ Fixed shutdown sequence bugs: takeover occurred before
resources were released by other system
+ Fixed various logging bugs
+ Closed holes in protection against replay attacks
+ Added checks that complain if all resources aren't idle on startup.
+ IP address takeover fixes
+ Endian fixes
+ Removed the 8-alias limitation
+ Takeovers now occur faster (ARPs occur asynchronously)
+ Port number changes
+ Use our IANA port number (694) by default
+ Recognize our IANA port number ("ha-cluster") if it's in /etc/services
+ Moved several files, etc. from /var/run to /var/lib/heartbeat
+ Incorporated new ldirectord version
+ Added late heartbeat warning for late-arriving heartbeats
+ Added detection of and partial recovery from cluster partitions
+ Accept multiple arguments for resource scripts
+ Added Raid1 and Filesystem resource scripts
+ Added man pages
+ Added debian package support
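The port-number entries in the 0.4.9 list above (use IANA port 694 by default, and recognize "ha-cluster" if it is in /etc/services) can be sketched in Python as follows. The function name heartbeat_port and the choice of "udp" as the protocol are assumptions for illustration; this is not heartbeat's code.

```python
import socket

IANA_HA_CLUSTER_PORT = 694  # port number stated in the ChangeLog


def heartbeat_port(service="ha-cluster", proto="udp"):
    """Look the service up in the services database, else fall back to 694."""
    try:
        return socket.getservbyname(service, proto)
    except OSError:
        # Service name not listed: use the IANA-assigned default.
        return IANA_HA_CLUSTER_PORT
```

The lookup goes through the system's services database, so a local /etc/services entry takes precedence over the built-in default.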
* Fri Jun 30 2000 Alan Robertson <alanr@unix.sh>
+ Version 0.4.8
+ Incorporated ldirectord version 1.9 (fixes memory leak)
+ Made the order of resource takeover more rational: Takeover is now
left-to-right, and giveup is right-to-left
+ Changed the default port number to our official IANA port number (694)
+ Regularized more messages, eliminated some redundant ones.
+ Print the version of heartbeat when starting.
+ Print exhaustive version info when starting with debug on.
+ Hosts now have 3 statuses {down, up, active}; active means that the host
knows that all its links are operational, and it's safe to send cluster
messages
+ Significant revisions to nice_failback (mainly due to lclaudio)
+ More SuSE-compatibility. Thanks to Friedrich Lobenstock <fl@fl.priv.at>
+ Tidied up logging so it can be to files, to syslog or both (Horms)
+ Tidied up build process (Horms)
+ Updated ldirectord to produce and install a man page and be
compatible with the fwmark options to The Linux Virtual Server (Horms)
+ Added log rotation for ldirectord and heartbeat using logrotate
if it is installed
+ Added Audible Alarm resource by Kirk Lawson <lklawson@heapy.com>
and myself (Horms)
+ Added init script for ldirectord so it can be run independently
of heartbeat (Horms)
+ Added sample config file for ldirectord (Horms)
+ An empty /etc/ha.d/conf/ is now part of the rpm distribution
as this is where ldirectord's configuration belongs (Horms)
+ Minor startup script tweaks. Hopefully, we should be able to make core
files should we crash in the future. Thanks to Holger Kiehl for diagnosing
the problem!
+ Fixed a bug which kept the "logfile" option from ever working.
+ Added a TestCluster test utility. Pretty primitive so far...
+ Fixed the serial locking code so that it unlocks when it shuts down.
+ Lock heartbeat into memory, and raise our priority
+ Minor, but important fix from lclaudio to init uninited variable.
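The 0.4.8 ordering rule above (takeover is left-to-right, giveup is right-to-left) can be pictured with a small Python sketch. The resource names in the usage note are hypothetical examples and the functions are illustrative only, not heartbeat's resource manager.

```python
def takeover_order(resources):
    """Resources are acquired in the order they are listed (left to right)."""
    return list(resources)


def giveup_order(resources):
    """Resources are released in the reverse order (right to left)."""
    return list(reversed(resources))
```

For a hypothetical list IPaddr, Filesystem, httpd, takeover would bring up IPaddr first and httpd last, while giveup would stop httpd first and release IPaddr last, so each resource is released before the ones it was started after.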
* Sat Dec 25 1999 Alan Robertson <alanr@unix.sh>
+ Version 0.4.7
+ Added the nice_failback feature. If the cluster is running when
the primary starts it acts as a secondary. (Luis Claudio Goncalves)
+ Put in lots of code to make lost packet retransmission happen
+ Stopped trying to use the /proc/ha interface
+ Finished the error recovery in the heartbeat protocol (and got it to work)
+ Added test code for the heartbeat protocol
+ Raised the maximum length of a node name
+ Added Jacob Rief's ldirectord resource type
+ Added Stefan Salzer's <salt@cin.de> fix for a 'grep' in IPaddr which
wasn't specific enough and would sometimes get IPaddr confused on
IP addresses that prefix-matched.
+ Added Lars Marowsky-Bree's suggestion to make the code almost completely
robust with respect to jumping the clock backwards and forwards
+ Added code from Michael Moerz <mike@cubit.at> to keep findif from
core dumping if /proc/route can't be read.
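The IPaddr prefix-matching problem fixed in 0.4.7 above can be illustrated in Python. The actual fix was to a 'grep' in a shell script, which is not reproduced here; the function names and the sample interface listing below are hypothetical.

```python
def naive_has_address(lines, addr):
    """Substring test: wrongly matches addresses that merely share a prefix."""
    return any(addr in line for line in lines)


def exact_has_address(lines, addr):
    """Whole-token test: matches only when the address appears exactly."""
    return any(addr in line.split() for line in lines)


# Hypothetical interface listing that contains 10.0.0.10 but not 10.0.0.1.
listing = ["eth0:0 inet 10.0.0.10 mask 255.255.255.0"]
```

Here naive_has_address(listing, "10.0.0.1") reports a match because "10.0.0.1" is a prefix of "10.0.0.10", while exact_has_address does not.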
* Mon Nov 22 1999 Alan Robertson <alanr@unix.sh>
+ Version 0.4.6
+ Fixed timing problem in "heartbeat restart" so it's reliable now
+ Made start/stop status compatible with SuSE expectations
+ Made resource status detection compatible with SuSE start/stop expectations
+ Fixed a bug relating to serial and ppp-udp authentication (it never worked)
+ added a little more substance to the error recovery for the HB protocol.
+ Fixed a bug for logging from shell scripts
+ Added a little logging for initial resource acquisition
+ Added #!/bin/sh to the front of shell scripts
+ Fixed Makefile, so that the build root wasn't compiled into pathnames
+ Turned on CTSRTS, enabling flow control for serial ports.
+ Fixed a bug which kept it from working in non-English environments
* Wed Oct 13 1999 Alan Robertson <alanr@unix.sh>
+ Version 0.4.5
+ Mitja Sarp added a new feature to authenticate heartbeat packets
using a variety of strong authentication techniques
+ Changed resource acquisition and relinquishment to occur in heartbeat,
instead of in the start/stop script. This means you don't *really*
have to use the start/stop script if you don't want to.
+ Added -k option to gracefully shut down current heartbeat instance
+ Added -r option to cause currently running heartbeat to reread config files
+ Added -s option to report on operational status of "heartbeat"
+ Sped up resource acquisition on master restart.
+ Added validation of ipresources file at startup time.
+ Added code to allow the IPaddr takeover script to be given the
interface to take over, instead of inferring it. This was requested
by Lars Marowsky-Bree
+ Incorporated patch from Guenther Thomsen to implement locking for
serial ports used for heartbeats
+ Incorporated patch from Guenther Thomsen to clean up logging.
(you can now use syslog and/or file logs)
+ Improved FreeBSD compatibility.
+ Fixed a bug where the FIFO doesn't get created correctly.
+ Fixed a couple of uninitialized variables in heartbeat and /proc/ha code
+ Fixed longstanding crash bug related to getting a SIGALRM while in malloc
or free.
+ Implemented new memory management scheme, including memory stats
* Thu Sep 16 1999 Alan Robertson <alanr@unix.sh>
+ Version 0.4.4
+ Fixed a stupid error in handling CIDR addresses in IPaddr.
+ Updated the documentation with the latest from Rudy.
* Wed Sep 15 1999 Alan Robertson <alanr@unix.sh>
+ Version 0.4.3
+ Changed startup scripts to create /dev/watchdog if needed
+ Turned off loading of /proc/ha module by default.
+ Incorporated bug fix from Thomas Hepper <th@ant.han.de> to IPaddr for
PPP configurations
+ Put in a fix from Gregor Howey <ghowey@bremer-nachrichten.de>
where Gregor found that I had stripped off the ::resourceid part
of the string in ResourceManager resulting in some bad calls later on.
+ Made it compliant with the FHS (filesystem hierarchy standard)
+ Fixed IP address takeover so we can take over on non-eth0 interface
+ Fixed IP takeover code so we can specify netmasks and broadcast addrs,
or default them at the user's option.
+ Added code to report on message buffer usage on SIGUSR[12]
+ Made SIGUSR1 increment debug level, and SIGUSR2 decrement it.
+ Incorporated Rudy's latest "Getting Started" document
+ Made it largely Debian-compliant. Thanks to Guenther Thomsen, Thomas
Hepper, Iñaki Fernández Villanueva and others.
+ Made changes to work better with Red Hat 6.1, and SMP code.
+ Sometimes it seems that the Master Control Process dies :-(
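The 0.4.3 signal behaviour above (SIGUSR1 increments the debug level, SIGUSR2 decrements it) might look like this in Python. This is a sketch rather than heartbeat's C implementation; the class name DebugLevel is hypothetical, and clamping the level at zero is an assumption the ChangeLog does not state.

```python
import signal


class DebugLevel:
    """Debug level adjusted by SIGUSR1 (up) and SIGUSR2 (down)."""

    def __init__(self):
        self.level = 0

    def handle(self, signum, frame):
        if signum == signal.SIGUSR1:
            self.level += 1
        elif signum == signal.SIGUSR2:
            # Assumption: the level never drops below zero.
            self.level = max(0, self.level - 1)

    def install(self):
        """Register the handler (Unix only; call from the main thread)."""
        signal.signal(signal.SIGUSR1, self.handle)
        signal.signal(signal.SIGUSR2, self.handle)
```

After install(), sending the process SIGUSR1 (for example with kill -USR1) would raise the level by one without a restart.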
* Sat Aug 14 1999 Alan Robertson <alanr@unix.sh>
+ Version 0.4.2
+ Implemented simple resource groups
+ Implemented application notification for groups starting/stopping
+ Eliminated restriction on floating IPs only being associated with eth0
+ Added a uniform resource model, with IP resources being only one kind.
(Thanks to Lars Marowsky-Bree for a good suggestion)
+ Largely rewrote the IP address takeover code, making it clearer, fit
into the uniform resource model, and removing some restrictions.
+ Preliminary "Getting Started" document by Rudy Pawul
+ Improved the /proc/ha code
+ Fixed memory leak associated with serial ports, and problem with return
of control to the "master" node.
(Thanks to Holger Kiehl for reporting them, and testing fixes!)
* Tue Jul 6 1999 Alan Robertson <alanr@unix.sh>
+ Version 0.4.1
+ Fixed major memory leak in 0.4.0 (oops!)
+ Added code to eliminate duplicate packets and log lost ones
+ Tightened up PPP/UDP startup/shutdown code
+ Made PPP/UDP peacefully coexist with "normal" udp
+ Made logs more uniform and neater
+ Fixed several other minor bugs
+ Added very preliminary kernel code for monitoring and controlling
heartbeat via /proc/ha. Very cool, but not really done yet.
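One common way to eliminate duplicate packets and notice lost ones, as the 0.4.1 entry above describes, is to track per-sender sequence numbers. The ChangeLog does not say how heartbeat does it, so the sequence-number scheme and the class name SequenceTracker below are assumptions for illustration.

```python
class SequenceTracker:
    """Track sequence numbers from one sender (assumed to start at 1)."""

    def __init__(self):
        self.seen = set()      # sequence numbers already accepted
        self.highest = 0       # highest sequence number seen so far
        self.missing = set()   # gaps: numbers skipped and not yet received

    def accept(self, seq):
        """Return True for a new packet, False for a duplicate."""
        if seq in self.seen:
            return False
        self.seen.add(seq)
        if seq > self.highest:
            # Everything between the old highest and this packet is missing.
            self.missing.update(range(self.highest + 1, seq))
            self.highest = seq
        else:
            # A late arrival fills a previously recorded gap.
            self.missing.discard(seq)
        return True
```

A real implementation would bound the size of these sets; the sketch keeps everything for clarity.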
* Wed Jun 30 1999 Alan Robertson <alanr@unix.sh>
+ Version 0.4.0
+ Changed packet format from single line positional parameter style
to a collection of {name,value} pairs. A vital change for the future.
+ Fixed some bugs with regard to forwarding data around rings
+ We now modify /etc/ppp/ip-up.local, so PPP-udp works out of the box
(at least for Red Hat)
+ Includes the first version of Volker Wiegand's Hardware Installation Guide
(it's pretty good for a first version!)
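The 0.4.0 packet format change above, from positional parameters to {name,value} pairs, can be sketched in Python. The one-pair-per-line "name=value" layout and the field names in the usage note are assumptions for illustration; the ChangeLog does not describe the wire format.

```python
def encode(fields):
    """Serialize a dict of string fields as one name=value pair per line."""
    return "".join(f"{name}={value}\n" for name, value in fields.items())


def decode(text):
    """Parse name=value lines back into a dict, skipping malformed lines."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            fields[name] = value
    return fields
```

For example, decode(encode({"type": "status", "src": "node1"})) returns the original dictionary. With named fields, a receiver can look up only the names it knows, which is one plausible reason such a format is easier to extend than positional parameters.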
* Wed Jun 09 1999 Alan Robertson <alanr@unix.sh>
+ Version 0.3.2
+ Added UDP/PPP bidirectional serial ring heartbeat
(PPP ensures data integrity on the serial links)
+ fixed a stupid bug which caused shutdown to give unpredictable
results
+ added timestamps to /var/log/ha-log messages
+ fixed a couple of other minor oversights.
* Sun May 10 1999 Alan Robertson <alanr@unix.sh>
+ Version 0.3.1
+ Make ChangeLog file from RPM specfile
+ Made ipresources only install in the DOC directory as a sample
* Sun May 09 1999 Alan Robertson <alanr@unix.sh>
+ Version 0.3.0
+ Added UDP broadcast heartbeat (courtesy of Tom Vogt)
+ Significantly restructured code making it easier to add heartbeat media
+ added new directives to config file:
+ udp interface-name
+ udpport port-number
+ baud serial-baud-rate
+ made manual daemon shutdown easier (only need to kill one)
+ moved the sample ha.cf file to the Doc directory
* Sat Mar 27 1999 Alan Robertson <alanr@unix.sh>
+ Version 0.2.0
+ Make an RPM out of it
+ Integrated IP address takeover gotten from Horms
+ Added support to tickle a watchdog timer whenever our heart beats
+ Integrated enough basic code to allow a 2-node demo to occur
+ Integrated patches from Andrew Hildebrand <andrew@pdi.com> to allow it
to run under IRIX.
- Known Bugs
- Only supports 2-node clusters
- Only supports a single IP interface per node in the cluster
- Doesn't yet include Tom Vogt's ethernet heartbeat code
- No documentation
- Not very useful yet :-)
###########################################################
High-Availability Linux Project
Download heartbeat