I just uploaded the new bugfix release 0.2.4 of serienbrief, the LaTeX form letter generator.
There have been problems with complex documents, probably caused by colliding aux files. This has been fixed by using different temp files for each form letter.
Additionally, the command line option --repeat has been introduced so that LaTeX can be called multiple times, e.g. to get cross-references right.
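To illustrate the two changes, here is a minimal Python sketch. It is not serienbrief's actual code: the function name `render_letter`, the file name `letter.tex` and the injectable `run` parameter are assumptions made for the example. Each letter gets its own temporary directory, so aux files cannot collide, and LaTeX is run `repeat` times in that same directory so that cross-references can settle.

```python
import subprocess
import tempfile
from pathlib import Path


def render_letter(tex_source: str, repeat: int = 1, run=subprocess.run) -> bytes:
    """Compile one form letter in a private temp directory (hypothetical helper)."""
    # A fresh directory per letter keeps .aux/.log files of different letters apart.
    with tempfile.TemporaryDirectory(prefix="letter-") as workdir:
        tex_path = Path(workdir) / "letter.tex"
        tex_path.write_text(tex_source, encoding="utf-8")
        command = [
            "pdflatex",
            "-interaction=nonstopmode",
            "-output-directory", workdir,
            str(tex_path),
        ]
        # Later runs read the .aux file written by earlier runs,
        # which is what resolves cross-references.
        for _ in range(max(1, repeat)):
            run(command, check=True, cwd=workdir)
        return (Path(workdir) / "letter.pdf").read_bytes()
```

Calling `render_letter(source, repeat=2)` would correspond roughly to what --repeat is described as doing: the same document is compiled twice before the result is collected.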
I just uploaded the new major release of my statistical ("Bayesian") PHP spam filter b8. A lot of work has been done and there are a lot of changes. If you experience any problem with the database update or any bug in general, please contact me!
From the ChangeLog, with comments:
Changes
Finally completed the abstraction of the storage backends. A storage backend is now free to store b8's wordlist in whatever way suits it. This made it possible to change the MySQL database layout to store the data in multiple columns, rather than emulating the Berkeley DB behaviour.
Kicked out the never-used lastseen parameter. The wordlist now takes up less space, and classifying a text no longer causes any write actions. Data will now only be written to the database when learning or deleting a text.
Renamed the internal variables to b8*... and combined bayes*texts.ham and bayes*texts.spam into b8*texts. This is simply more consistent, and the timing was right, as an update of the database structure was necessary anyhow (update scripts are included in the new release).
Removed all validate() functions in favor of throwing exceptions when something's wrong. In this way, b8 finally behaves like I wanted it to from the start: when something's wrong, simply no instance of b8 will be created. This was not possible back in the PHP 4 days.
Made the lexer more flexible. Added separate functions for each splitting step; all of them except the raw split can be turned on and off via a new config array.
The lexer now supports getting BBCode.
Added an additional check to the lexer to be sure no token will collide with an internal variable.
Added multibyte support to the degenerator so it is now able to handle non-latin-1 texts in the same way as it handles latin-1 texts. The difference between using and not using multibyte operations will only show up when non-latin-1 text is processed by b8. For example, if we have an unknown token HeLlO!, the degenerator will provide the degenerated versions hello!, HELLO!, Hello!, hello, HELLO, Hello and HeLlO, no matter if multibyte operations are used or not. When we have a non-latin-1 word, we may get a different result. For example, if we have the unknown token ПрИвЕт!, the degenerator will only provide one degenerated version of it when not using multibyte operations: ПрИвЕт. Using multibyte operations, we get the same variants as with the latin-1 word: привет!, ПРИВЕТ!, Привет!, привет, ПРИВЕТ, Привет and ПрИвЕт.
b8's constructor now takes four config arrays; the third is the lexer config and the fourth is the degenerator config.
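The degenerator behaviour described above can be reproduced with a small Python sketch. This is not b8's PHP code: the function names and the set of stripped trailing characters (`!?.`) are assumptions for illustration. Python's `bytes` methods only change ASCII letters, so they stand in for PHP's non-multibyte string functions here, while `str` methods play the role of the multibyte ones.

```python
def _case_variants(word: str, multibyte: bool) -> list[str]:
    """Return lower, upper and 'first upper, rest lower' forms of word."""
    if multibyte:
        return [word.lower(), word.upper(), word[:1].upper() + word[1:].lower()]
    # Byte-wise case mapping only touches ASCII letters, so non-ASCII
    # characters such as Cyrillic ones are left unchanged.
    raw = word.encode("utf-8")
    forms = [raw.lower(), raw.upper(), raw[:1].upper() + raw[1:].lower()]
    return [form.decode("utf-8") for form in forms]


def degenerate(token: str, multibyte: bool = True) -> list[str]:
    """Build degenerated versions of an unknown token (illustrative only)."""
    variants: list[str] = []

    def add(candidates: list[str]) -> None:
        for candidate in candidates:
            if candidate != token and candidate not in variants:
                variants.append(candidate)

    add(_case_variants(token, multibyte))
    # Assumption: only trailing '!', '?' and '.' are stripped.
    stripped = token.rstrip("!?.")
    if stripped and stripped != token:
        add(_case_variants(stripped, multibyte))
        add([stripped])
    return variants


print(degenerate("HeLlO!"))
print(degenerate("ПрИвЕт!", multibyte=False))
print(degenerate("ПрИвЕт!", multibyte=True))
```

With multibyte operations switched off, the Cyrillic token only yields its punctuation-free form, because every byte-wise case change returns the token itself.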
Bugfixes
Removed the ucfirst function from the degenerator and replaced it with a custom one. It did not do what I always thought it would do (first letter upper case, rest lower case); it only converts the first letter to upper case.
Fixed the MySQL backend so it's now able to handle a get() request for an empty array or an array containing just one token.
Fixed the MySQL backend's handling of queries that return no result.
Fixed the lexer to never output an empty array of tokens, but a placeholder token if no token has been found.
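The lexer guarantees mentioned in this ChangeLog (no token collides with an internal variable, and the result is never empty) could look roughly like the following Python sketch. It is not b8's lexer: the placeholder name `b8*no_tokens`, the minimum token length and the plain whitespace split are assumptions for illustration; only the `b8*` prefix of the internal variables comes from the ChangeLog.

```python
import re

INTERNAL_PREFIX = "b8*"             # prefix of internal variables, per the ChangeLog
PLACEHOLDER_TOKEN = "b8*no_tokens"  # hypothetical placeholder name


def tokenize(text: str, min_length: int = 3) -> list[str]:
    """Split text into tokens; never return an empty list (illustrative only)."""
    tokens = []
    for candidate in re.split(r"\s+", text.strip()):
        if len(candidate) < min_length:
            continue  # too short to be a useful token (also drops empty strings)
        if candidate.startswith(INTERNAL_PREFIX):
            continue  # would collide with an internal variable
        tokens.append(candidate)
    # Fall back to a single placeholder so callers always get at least one token.
    return tokens or [PLACEHOLDER_TOKEN]
```

Returning a placeholder instead of an empty list means the code that looks the tokens up never has to special-case "no tokens at all".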
I wanted to set up a small and easy mailing list for the music group of the local choir society, the Gesangverein 1860 Konradsreuth. First, I messed around with Mailman, but I wanted something much simpler.
Well, I didn't find any program that matched my needs. So I wrote my own: the very simple mailing list. After some weeks of initial testing and bugfixing, it does its job now. So it's time to do a release and call it 0.1 :-)
Perhaps somebody out there is also looking for such a small mailing list solution that is controlled only by email, without a web interface or the features of a full-blown mailing list. If so: have a lot of fun :-)
There has been no release of b8, my statistical spam filter implemented in PHP, for quite a while now. But b8’s not dead! It does its work day after day, here and probably on many other homepages. Just to say it: development goes on!
At the moment, I’m working on improvements to the database handling. The goal is to abstract the database layer so that SQL backends can actually use multiple columns to store the data and don’t have to emulate the Berkeley DB behaviour with only key-value storage. Additionally, the infamous lastseen parameter will finally be kicked out, as it has never been used for anything and just eats computing time and database space.
As this is a one-man project, it will probably take me some more time until the new b8 0.6 release is done. But I’m working on it :-)