b8-0.4_pre5
I'll write this in English so that all the (millions and billions of ;-) b8 users out there can also read it.
b8 0.4 is finished, except for Laurent Goussard's storage class for SQLite, so I made a pre-release. Everything else is done; only that class and its documentation will be added in the final version, and I don't want to keep you waiting any longer :-)
A lot has happened since the last release. The performance has increased a lot, and so has the usability (at least, I hope so)! Here is the ChangeLog from the b8-0.4_pre5 release:
- Let's go the whole hog. b8's class is now "b8" and no longer "bayes", and all internal variables are now named accordingly.
- Reworked the whole (surprisingly crappy) implementation of b8. No more global() calls, everything happens inside the classes now. Made that whole stuff really object oriented (as good as possible with PHP's poor OOP model ;-).
- No more PHP code in the configuration files.
- Created an extra lexer class. This is now also configurable.
- Storage classes can now create their own databases.
- MySQL calls are no longer random shots: either a MySQL link resource is passed to b8 on startup and used for the queries, or the class sets up its own link.
- The interface now uses a separate storage backend capable of SQL. This way, we can actually query the database for e.g. an ordered list of tokens. After doing what we wanted with this work database, the b8 database can be synced with it.
- Added a lot of verbose error handling.
- Fixed a dumb error: all tokens from a text were used for the spamminess calculation, because two for() loops both used $i as their counter. D'oh!!! Now, the filter's performance is way better.
- Caught on a little more to how that whole math stuff works ;-) Now, the calculation of the single probabilities proposed by Mr. Robinson does a little more of what it was intended to do, because …
- Turned some calculation constants into parameters: the number of tokens to use, the default rating for unknown tokens and Gary Robinson's s constant.
- Introduced an optional minimum deviation that a token's rating must have to be considered in the spamminess calculation.
- The default extreme ratings for tokens only in ham or spam are now optional. One can also choose to calculate all ratings by Mr. Robinson's method.
- Noticed that text primary keys are not case sensitive by default in MySQL, which has a noticeable impact on the filter's performance. Informed the MySQL users about that.
- The whole code sucks much less ;-) b8 should be way more user friendly now.
- Re-wrote the whole documentation.
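For the curious, here is a rough sketch of how the calculation parameters from the ChangeLog fit together. It is written in Python for brevity and is not b8's actual PHP code: the function names, the default values (s = 0.3, default rating 0.5, 15 tokens, minimum deviation 0.2) and the way the single ratings are combined are assumptions for illustration, based on Gary Robinson's published formulas.

```python
from math import prod


def robinson_rating(spam_count, ham_count, total_spam, total_ham, s=0.3, x=0.5):
    """Smoothed token rating f(w) = (s*x + n*p) / (s + n).

    s is Robinson's strength constant, x the default rating for unknown
    tokens. total_spam and total_ham are assumed to be positive.
    """
    n = spam_count + ham_count
    if n == 0:
        return x  # unknown token: fall back to the default rating
    spam_freq = spam_count / total_spam
    ham_freq = ham_count / total_ham
    p = spam_freq / (spam_freq + ham_freq)  # raw probability of the token
    return (s * x + n * p) / (s + n)


def select_ratings(ratings, use_tokens=15, min_dev=0.2):
    """Keep the ratings furthest from 0.5, ignoring those within min_dev."""
    relevant = [r for r in ratings if abs(r - 0.5) >= min_dev]
    relevant.sort(key=lambda r: abs(r - 0.5), reverse=True)
    return relevant[:use_tokens]


def spamminess(ratings):
    """Combine single ratings into one value between 0 (ham) and 1 (spam)."""
    if not ratings:
        return 0.5  # nothing to judge by
    n = len(ratings)
    p = 1 - prod(1 - r for r in ratings) ** (1 / n)
    q = 1 - prod(ratings) ** (1 / n)
    return (1 + (p - q) / (p + q)) / 2


if __name__ == "__main__":
    # Hypothetical counts: (seen in spam, seen in ham), 10 texts of each kind.
    counts = [(5, 0), (0, 4), (3, 3), (0, 0)]
    ratings = [robinson_rating(sp, hm, 10, 10) for sp, hm in counts]
    print(spamminess(select_ratings(ratings)))
```

Note that only the selected ratings go into the final value, which is exactly what the $i bug above prevented.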
That's all, folks! Have a lot of fun using the new b8 version if you don't use SQLite, and just be patient for a few more days if you do ;-)