dnAnalytics 0.3.1 Beta

December 6, 2008

We’ve released a beta of dnAnalytics 0.3.1.  The final release is scheduled for late February 2009.

Release notes:
* Adds initial F# interface
* Adds sparse solvers
* Adds Matlab matrix readers/writers
* Adds visual debuggers for matrices and vectors
* Adds probability distributions
* Adds random number generation classes
* Adds a descriptive statistics class


For our descriptive statistics class, we need to compute the standard deviation of a data series.  We tested a half dozen or so algorithms for speed and accuracy and we settled on the “two pass method” [1].

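// Both methods are extension methods, so they must live in a static class.
public static double Mean(this IEnumerable<double> data) {
   // Running mean: updated per element instead of building one large sum.
   double mean = 0;
   int m = 0;
   foreach (var d in data){
      mean += (d - mean)/++m;
   }
   return mean;
}
public static double StdDeviation(this IEnumerable<double> data) {
   // First pass: compute the mean.
   double mean = data.Mean();
   // Second pass: sum the squared deviations from the mean.
   double std = 0;
   int m = 0;
   foreach (double d in data){
      double tmp = d - mean;
      std += (tmp*tmp);
      m++;
   }

   // Sample standard deviation (n - 1 in the denominator).
   return Math.Sqrt(std/(m - 1));
}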
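In formula terms, the two methods above compute the following. This is only a restatement of the code for reference; the symbols (x_k for the k-th value, n for the number of values, s for the result) are notation introduced here.

```latex
\begin{align}
\bar{x}_k &= \bar{x}_{k-1} + \frac{x_k - \bar{x}_{k-1}}{k}, \qquad \bar{x}_0 = 0 \\
s &= \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left(x_i - \bar{x}_n\right)^2}
\end{align}
```

The first line is the running mean from the first pass; the second is the sum of squared deviations from the second pass, divided by n - 1.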

NIST provides a set of univariate test data sets of varying difficulty with exact values of their mean and standard deviation.  Running our code against these data sets, we get the following log relative errors (LRE) [2].

Data Set     LRE
Lottery      15
Lew          15
Mavro        13.1
Michelso     13.8
NumAcc1      15
NumAcc2      14.2
NumAcc3      9.5
NumAcc4      8.3

Those values put us right up there with most statistical software [3].  Notice that for the high difficulty data set NumAcc4, we (and most other software) are only correct to 8 significant digits.
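For reference, the LRE is usually defined as below, where x is the computed value and c is the certified value (the symbol names are chosen here for illustration).

```latex
\mathrm{LRE} = -\log_{10}\left(\frac{|x - c|}{|c|}\right)
```

For example, a computed value of 0.10000001 against a certified value of 0.1 gives -log10(1e-8 / 0.1) = 7, or roughly seven correct significant digits.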

To improve the accuracy we can use the decimal type for internal calculations.

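public double StdDeviation(IEnumerable<double> data) {
   // double-to-decimal conversions must be explicit in C#.
   decimal mean = (decimal)data.Mean();
   decimal std = 0;
   int m = 0;
   // foreach applies the explicit double-to-decimal conversion to each element.
   foreach (decimal d in data){
      decimal tmp = d - mean;
      std += (tmp*tmp);
      m++;
   }

   return Math.Sqrt((double)std/(m - 1));
}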

All data sets now return the maximum LRE of 15 - great!  There are two problems though.  First, the decimal version is ten times slower than the double version.  But that probably isn’t too big of an issue since it only takes 20ms to calculate the NumAcc4 standard deviation using decimals.  Second, there is a greater chance of an overflow since the decimal type has a smaller range than the double type.
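To make the overflow risk concrete, here is a minimal sketch. The class and method names are made up for illustration and are not part of dnAnalytics. It checks whether squaring a value, as the standard deviation loop does with each deviation, still fits in a decimal.

```csharp
using System;

// Hypothetical helper, not part of dnAnalytics.
public static class DecimalRangeDemo
{
    // Returns true if converting the value to decimal and squaring it
    // overflows the decimal type (maximum is roughly 7.9e28).
    public static bool SquareOverflowsDecimal(double value)
    {
        try
        {
            decimal d = (decimal)value;  // throws if value is outside decimal's range
            decimal squared = d * d;     // decimal arithmetic throws on overflow
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }
}
```

With this helper, SquareOverflowsDecimal(0.1) is false, while SquareOverflowsDecimal(1e20) is true: the square, 1e40, is far beyond what a decimal can hold, even though a double handles it without trouble.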

We’ve added the decimal version to the dnAnalytics’ DescriptiveStatistics class as an option.

[1] See Wikipedia for a discussion on why calculating standard deviation is difficult and the various algorithms used to compute it.
[2] You can interpret the LRE as the number of correct significant digits.
[3] For an accuracy comparison of statistical software see: Kellie B. Keeling, Robert J. Pavur, A comparative study of the reliability of nine statistical software packages, Computational Statistics & Data Analysis, Volume 51, Issue 8, 1 May 2007, Pages 3811-3831.
(http://www.sciencedirect.com/science/article/B6V8V-4JHMGWJ-1/2/77a29a95c2071997f13fcca7267711d1)

I’ve finally gotten around to writing the LaTeX build component for the Sandcastle Help File Builder.  A quick proof of concept only took about 20 lines of code.  It scans the XML comments for <latex> tags and then generates a GIF image based on the latex code in the tag (using MimeTeX).  It then replaces the <latex> tag with an <img> tag that points to the generated image.  So far, it seems to work. The only glitch is that it processes each <latex> tag three times.  I’m not sure why yet.  Anyway, the project is up on Assembla:
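The tag replacement step can be sketched roughly as follows. This is not the component’s actual code: the class name, method name, and image naming scheme are all assumptions, the Sandcastle build component plumbing is left out, and the MimeTeX rendering is skipped entirely. It only shows the <latex> to <img> rewrite using System.Xml.Linq.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical sketch; the real component plugs into Sandcastle and calls MimeTeX.
public static class LatexTagSketch
{
    // Replaces every <latex> element under 'comments' with an <img> element
    // and returns the number of tags rewritten. Rendering the GIF is omitted;
    // only the file name the image would get is generated.
    public static int ReplaceLatexTags(XElement comments, string imageDir)
    {
        int count = 0;
        // ToList() snapshots the matches so the tree can be modified while looping.
        foreach (XElement latex in comments.Descendants("latex").ToList())
        {
            count++;
            string src = imageDir + "/eq" + count + ".gif"; // assumed naming scheme
            latex.ReplaceWith(
                new XElement("img",
                    new XAttribute("src", src),
                    new XAttribute("alt", latex.Value)));
        }
        return count;
    }
}
```

Given a parsed comment such as <summary>Area is <latex>\pi r^2</latex>.</summary>, the call returns 1 and leaves an <img src="img/eq1.gif"> element in place of the tag when imageDir is "img".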
http://www.assembla.com/spaces/latex_sandcastle

There are two builds of the plug-in, one for x86 systems and one for EM64T (x64) systems.  For now, there are only MS Windows builds.  If there is demand for Linux versions, I’ll build them.

The build component is pretty simple to use with the Sandcastle Help File Builder (SHFB):

  1. Add the three files from the zip file’s binary (x86 or x64) directory into the SHFB BuildComponent directory.
  2. Add the “LaTeX Build Component” to ComponentConfiguration in your SHFB config file.
  3. Add <latex> tags to your source code comments. The <latex> tags have to be inside of a regular XML comment tag such as summary, remarks, etc.
  4. Build your help file.
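As an illustration of step 3, a comment with a <latex> tag might look like the following. The class and method are invented for the example; only the placement of the tag inside a regular XML comment element matters.

```csharp
// Hypothetical class used only to show where the <latex> tag goes.
public static class CircleMath
{
    /// <summary>
    /// Computes the area of a circle.
    /// <latex>A = \pi r^2</latex>
    /// </summary>
    /// <param name="radius">The radius of the circle.</param>
    public static double Area(double radius)
    {
        return System.Math.PI * radius * radius;
    }
}
```

Since the tag lives inside an XML comment, characters such as < and & in the LaTeX source have to be XML-escaped.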

CodePlex only offers a handful of open source licenses by default, but you can use others if you e-mail them and ask for a custom license.  I tried to put the LaTeX build component up on CodePlex since Sandcastle is hosted there along with other extensions to Sandcastle.  The build component needs to be licensed under the GPL v3 since it is built with MimeTeX, which is under the GPL v3.  I contacted CodePlex about using a custom license and they said I couldn’t use GPL v3 - it isn’t supported.  Err, what?  How can an open source hosting site not support GPL v3 (while supporting v2)?  I’m not a fan of the FSF/GPL*, but you have to support one of the fastest growing licenses.

*I have no problem with strong copyleft licenses; I just prefer permissive licenses and dislike the FSF politics.

Charting Controls

November 12, 2008

My favorite .NET charting control is Dundas.  It produces the best looking charts, has a great set of features, and a pretty decent API.  But I started needing features from their enterprise edition a couple of years ago and it got too expensive for the types of projects I work on - Dundas requires runtime fees for their enterprise edition.  So I switched over to ChartFX.  Not as good in my opinion as Dundas, nor is it updated as often, but good enough, especially with their extension pack - and no runtime fees.

All was good.  Then I started working on a couple of ASP.NET projects. Uh-oh, ChartFX has server fees.  This led me to ChartDirector.  ChartDirector is a charting component that works with every major programming language and platform, and is relatively inexpensive with no runtime or server fees. The problem is that it is a pain to use.  It’s very configurable, but you need to configure everything: the size of the chart area, the axes, etc., and that gets tricky when working with different types of data.  But once everything is set, it works well and produces decent looking charts - though not as good as Dundas or ChartFX.

I also have a DevExpress subscription and they added a chart component a while ago.  At first it was pretty much useless, but they have been improving it over the last couple of releases.  A new ASP.NET project came up and I was already going to use DevExpress’ ASP.NET controls, so I thought I’d give their XtraCharts control a spin.  They made some drastic improvements to their chart control.  It worked pretty well for this project.  I still have a couple of issues with the component though.  It doesn’t support hi-lo area/range charts or annotations.  Depending on the project, that might not be an issue.  A deal killer though is the anti-aliasing they use for their titles.  It makes fonts 10 points or under almost unreadable and really distracts from the final chart.  I wouldn’t consider this control if I had to use small fonts.

The other day I found out that Microsoft has released their own charting control.  I gave it a spin and it seems to be a version of the Dundas control with most of the enterprise features I need and no runtime or server fees - yippee!  It lacks the designer found in the full Dundas library, but I’ve never been a fan of visual designers.  I’ll now be using the MS/Dundas chart control for all new .NET Windows projects and ChartDirector for any Mono Linux projects.