b8 development goes on
It's been a while since the last release of my Bayesian PHP spam filter b8, but a lot has been done in the meantime.
The next b8 release (0.5) will be a major one with major changes. Oliver Lillie sent me a basic PHP 5 port of b8, and I used his code as the base for b8 0.5. Almost all parts have been completely rewritten; only the math remains untouched. The most significant changes are:
- No PHP 4 compatibility anymore. Much cleaner code base with fewer hacks.
- Completely reworked storage model. SQL performance has increased dramatically; Berkeley DB performance remains as fast as it has always been.
- Better lexer that also handles non-Latin-1 texts nicely, so that e.g. Cyrillic or Chinese texts can be classified more effectively. This lexer has also been back-ported to the 0.4.x branch of b8. It will work better than b8 0.4.4's lexer, but be aware that PHP 4 won't handle Unicode correctly.
- No config files anymore. Multiple instances of b8 can now be created in the same script, each with its own configuration and database, without any problems.
- No spooky administration interface anymore that needs an SQL database even if Berkeley DB is used (did anybody actually use this?! I never did ;-).
- No "install" scripts and routines, and documentation that is less geared towards end users. Anybody integrating b8 into their homepage won't be an end user, will they?
Everybody is invited to check out the current Subversion trunk of b8 and test it. It should work fine. At the moment, there's no SQLite backend available, but I'll inform Laurent Goussard of the upcoming release; perhaps he will port his 0.4.4 backend to b8 0.5.
I'll announce the new release here. If anybody finds bugs or has an idea for how to make b8 better, feel free to send me an e-mail.