Jump to content


Sigma Point Kalman Filters


  • Please log in to reply
13 replies to this topic

#1 Haldir

Haldir

    Advanced Member

  • Members
  • PipPipPip
  • 38 posts
  • Country: flag of Germany Germany


Posted 01 July 2010 - 12:52 PM

This thread is supposed to continue the previous discussions on Sigma Point Kalman Filters in http://forums.openpi...kf-coming-in-c/ or http://forums.openpi...c/588-which-kf/

The several Sigma Point Kalman Filters (SPKF) are the most promising development in Kalman Filters today. The goal of this thread is to give an educated suggestion towards the right Sigma Point Kalman Filter for sensor fusion in an embedded system.
The following list of abbrevations should help in the discussion.

Flavors of SPKF:

  • Unscented Kalman Filter (UKF)
  • Central Difference Kalman Filter (CDKF)
  • Scaled Unscented Transform Kalman Filter (SUKF)
  • Spherical Simplex Unscented Kalman Filter (SSUKF)
  • Schmidt Orthogonal Sigma Point Kalman Filter (SOSPF)
  • Marginal Geometric Kalman Filter (MGSPF)

The above flavors mainly differ between the necessary Sigma Points for state estimation. They range between less then the state vector length (n) for special applications to a multiple of the state vector length (n). The most common SR-UKF needs atleast 2n+1 Sigma Points (SP).
The complexity of SPKF is O(n³), so less SP are preferred.

For some of the above SPKF there are special versions for additive noise or for improved numerical stability (Square-Root).

Important subflavors:
  • Square-Root Unscentend Kalman Filter (SRUKF)
  • Square-Root Central Difference Kalman Filter (SRCDKF)
  • Square-Root Spherical Simplex Unscented Kalman Filter (SRSSUKF)

A good overview of the several papers about this topic is located at: http://lewpayne.blogspot.com/

One of the most important papers about SPKF is: http://speech.bme.og...ps/merwe04a.pdf which includes the UKF and the SRUKF for sensor fusion.

#2 Haldir

Haldir

    Advanced Member

  • Members
  • PipPipPip
  • 38 posts
  • Country: flag of Germany Germany


Posted 01 July 2010 - 01:00 PM

One of the main reasons for SPKF is that they have a faster convergence of their estimation compared to Extended Kalman Filters (EKF).

The main reason agains SPKF is that the most common flavor SR-UKF is not computational feasible for high update rates and complete multi-sensor fusion systems on embedded platforms. It's several times slower than a comparable EKF.
Even a modern Desktop CPU has problems with a SR-UKF for a 500Hz+ update rate, most common embedded systems have less than a tenth of the computing power available.

The ideal SPKF would run on an embedded system without a fpu and only a few (state vector length or less) SP.

Preliminary information:
Only the (SR)SSUKF, the SOSPF and the GMSPF have less than 2n+1 SP.
Several  other have additive noise cases with less than 2n+1 SP.

#3 Lew Payne

Lew Payne

    Developer

  • Members
  • PipPipPip
  • 76 posts
  • Country: flag of United States United States

Posted 01 July 2010 - 04:33 PM

Thanks for starting this thread.  I'm glad to see others interested in researching this topic.

I'd say there are two important advantages to the various flavors of UKF's...

1) Faster convergence upon startup (as you've already mentioned).
2) Less risk of divergence resulting from poorly estimated coefficients.

Also worth mentioning is the fact that, for our application, the 2n + 1 typically becomes at least 4n for reasons given in the above-cited threads.

#4 Haldir

Haldir

    Advanced Member

  • Members
  • PipPipPip
  • 38 posts
  • Country: flag of Germany Germany


Posted 05 July 2010 - 09:20 AM

Short update from my side:

I implemented an ukf and sr-ukf (additive noise case) in matlab on the weekend (quite a task at ~40°C in my flat). I tested them against some sinus curve process model (used by Dr. van der Merwe in some of his papers) and apparently it works fine. As excepted the rmse of the sr-ukf is a bit better than the normal ukf, but the speed is roughly 30% slower.

I also tried to implement the spherical simplex algo, but failed, my updated cholesky matrices are apparently bad. I don't know why yet, the main problem is, that there are several different papers about the spherical simplex algo, they all have different spherical simplex matrices. I don't know which is the correct one currently. Maybe the spherical simplex SP selection algo does not work with the additive noise special case (I kinda doubt that though)

The different matrices are: No zero'd first column and row, zero'd column only, zero'd column and row. As far as I understand all that hypercube stuff they are all valid.

I guess I'll continue with the Schmidt Orthogonal SP algo.

As for performance (Intel Core2 Duo E8500, matlab, 1000 iterations):

20 state additive ukf: 1.2s
20 state additive sr-ukf: 1.6s

A few assumptions:
An embedded system is at most 1/10 the speed of the Core2 Duo E8500
An optimized native algorithm is roughly 2 times faster than matlab (to be verified later, hopefully there is more gain here).
Sensor calculations add 25%-50% to the load (to be verified later)

If these assumptions are true, the current sr-ukf is about 20% as fast as we need it for a decent update rate of 500Hz for a 20 state vector.

#5 Haldir

Haldir

    Advanced Member

  • Members
  • PipPipPip
  • 38 posts
  • Country: flag of Germany Germany


Posted 06 July 2010 - 05:24 PM

I'll attach my matlab files anyway, even though I'm not sure if my schmidt-orthogonal ukf is correct, atleast I can't find any error in it.

The ekf and original ukf code is © Yi Cao and the hfun/ffun equations are from one of Dr. van der Merwe's papers.

As for speed, let's assume the soukf is correct, then it's two times faster than the additive noise ukf

Attached Files

  • Attached File  ukf.zip   6.1K   346 downloads


#6 Lew Payne

Lew Payne

    Developer

  • Members
  • PipPipPip
  • 76 posts
  • Country: flag of United States United States

Posted 06 July 2010 - 10:11 PM

View PostHaldir, on 06 July 2010 - 05:24 PM, said:

I'll attach my matlab files anyway, even though I'm not sure if my schmidt-orthogonal ukf is correct, atleast I can't find any error in it.

The ekf and original ukf code is © Yi Cao and the hfun/ffun equations are from one of Dr. van der Merwe's papers.

As for speed, let's assume the soukf is correct, then it's two times faster than the additive noise ukf

Haldir - Thank you for putting in the time to try some of this out.  I am trying to get my garage finished so that I can then set up my electronics workbench and equipment.  It's been a frustrating experience working with contractors (I'm not a carpenter).  I look forward to joining you in your experiments, soon !

#7 Haldir

Haldir

    Advanced Member

  • Members
  • PipPipPip
  • 38 posts
  • Country: flag of Germany Germany


Posted 10 July 2010 - 12:10 PM

I went ahead and ported the matlab soukf to c++ using the Eigen Library (3.0 b1), the code is not fully optimized, nor do I know if the soukf is correct at all, but the motivation is benchmarking and the code should be alright for that. See attached zip for the code.
Performance values (Intel E8500, VC++2010), 20 states, 1000 iterations and rather simple ffun and hfun functions (same as in matlab):
Non-optimized X86: 0.35s
Optimized incl. sse2 X86: 0.2s
Optimized incl. sse2 X64: 0.15s

Current conclusion:
The native (non-optimized) version is roughly two times faster than the matlab version. With the previously assumed embedded platform, 10 times slower than my system, + extra sensor computationally load, I'd say the benchmark would take 5s there. This means, we're looking at a 200Hz update rate.
Maybe, without all the dynamic allocations we get another 10-20%. Still only ~50% of the required 500Hz.

Edit: I added a fully templatized version of the algorithm, the speed is up some 10% without the dynamic allocations, Debug mode is close to impossible to run, since it's extremely slow. As expected the compile time went up sky-high. The templatized version is fyi, it runs into serious stack trouble with large states.

Attached Files



#8 Haldir

Haldir

    Advanced Member

  • Members
  • PipPipPip
  • 38 posts
  • Country: flag of Germany Germany


Posted 12 July 2010 - 06:40 AM

I tried porting my code to my TI Delfino board, but apparently CodeComposer Studio does not support std::complex, even after fixing that, I ran into ~50 errors regarding the eigen lib. Since I just have that delfino board and several slow atmega boards, I can't benchmark my code on an embedded platform for now.

So if would like to benchmark the code on any embedded platform (including slow/old Pentiums, Intel Atom etc.) I'm glad to see the results.
I'm especially interested in Arm Cortex (including Cortex A8 (Beagleboard) with GCC 4.5 (apparently has some optimizations 4.4 didn't have)).

#9 Lew Payne

Lew Payne

    Developer

  • Members
  • PipPipPip
  • 76 posts
  • Country: flag of United States United States

Posted 13 July 2010 - 09:10 PM

View PostHaldir, on 12 July 2010 - 06:40 AM, said:

I tried porting my code to my TI Delfino board, but apparently CodeComposer Studio does not support std::complex, even after fixing that, I ran into ~50 errors regarding the eigen lib. Since I just have that delfino board and several slow atmega boards, I can't benchmark my code on an embedded platform for now.

What version of CCS were you using?  I'm eventually going to work on this, and seeing if there's a work-around (including the rewriting of some libraries).  Thanks for taking the lead!

#10 Haldir

Haldir

    Advanced Member

  • Members
  • PipPipPip
  • 38 posts
  • Country: flag of Germany Germany


Posted 14 July 2010 - 07:26 AM

View PostLew Payne, on 13 July 2010 - 09:10 PM, said:

What version of CCS were you using?  I'm eventually going to work on this, and seeing if there's a work-around (including the rewriting of some libraries).  Thanks for taking the lead!

The most recent one, CCS 4.2 (I think), I took the std::complex class from my MSVC installation (since both use the Dinkumware stdlib/stl implementation), but then it failed with several template related errors in Eigen, errors I haven't seen before (my speciality is certainly not (meta)template programming), then I just assumed it's a compiler compatibility problem.

#11 Haldir

Haldir

    Advanced Member

  • Members
  • PipPipPip
  • 38 posts
  • Country: flag of Germany Germany


Posted 17 July 2010 - 01:59 PM

I wrote a quick Matrix class to replace the Eigen lib, same api, only difference is that I used the normal cholesky decomp instead of the sqroot free one. Pretty much the same speed, it uses the templatized version of the algorithm.
CodeComposerStudio can now compile it, but crashes while uploading it to the eval board.

The code is attached.

Attached Files



#12 Haldir

Haldir

    Advanced Member

  • Members
  • PipPipPip
  • 38 posts
  • Country: flag of Germany Germany


Posted 18 July 2010 - 03:46 PM

I did a quite thorough review of Chunshi Fan's "Highly Efficient Sigma Point Filter for Spacecraft Attitude and Rate Estimation" Article to see if it is possible to port his method to a 20 state vector. Unfortunately his method only works if the sensor bias is not changed in the process or measurement models. Most ins/gps systems do not change the bias, but for a temperature corrected sensor model the bias is changing over time with temperature changes.

Anyway, I think it is possible to port his method to a 9dof ins/gps system (including the restriction above), 10 sigma points would be enough for his method. As for complexity, deriving the algorithm for the 9dof system is quite complicated.

Unfortunately Chunshi Fan didn't reply to my e-mail with a few questions considering his paper, so I can't say if he has already ported his method to 9dof.

As for now, I will not pursue his method any further.

#13 Jonathan Brandmeyer

Jonathan Brandmeyer

    Developer

  • Members
  • PipPipPip
  • 46 posts
  • Country: flag of United States United States

Posted 18 July 2010 - 05:51 PM

View PostHaldir, on 12 July 2010 - 06:40 AM, said:

I tried porting my code to my TI Delfino board, but apparently CodeComposer Studio does not support std::complex, even after fixing that, I ran into ~50 errors regarding the eigen lib. Since I just have that delfino board and several slow atmega boards, I can't benchmark my code on an embedded platform for now.

So if would like to benchmark the code on any embedded platform (including slow/old Pentiums, Intel Atom etc.) I'm glad to see the results.
I'm especially interested in Arm Cortex (including Cortex A8 (Beagleboard) with GCC 4.5 (apparently has some optimizations 4.4 didn't have)).

I noted lower down that you ended up hand-rolling your own Matrix class for basic operations, but if you decide to go back to Eigen, then you should note the following issues that I learned the hard way:

1) #include <Eigen/Core> brings in iostream by default (yuck!), so the resulting firmware will attempt to setup cin, cout, cerr, and clog.  My nearly-bare-silicon test environment triggers a bunch of linker errors when that happens, but CCS might be trying to make it work anyway.  It is safe to just comment out that line in the Eigen headers.

2) You don't have to use .inverse() to perform matrix inversion, and probably shouldn't use it at all.  Instead, what I have done is explicitly choose the decomposition that I wanted in code using .qr(), .ldlt(), .llt(), .svd(), or .lu(), depending on my requirements.  Note that even though the matrix being inverted is symmetric positive definite, it can still be so ill-conditioned that Eigen's Cholesky decomposition function will bail out on you.  Be sure to check the return values of your decomposition functions, because they will at least report when they gave up.

In single-precision, you're going to have even greater difficulty with matrix conditioning if you use native units like meters, radians, meters/second, Guass, etc.  One thing you can do is define the units within in the filter in terms of steady-state sigma, instead of native physical units.  Then convert to physical units at the system boundaries when you read sensors and produce output to an outer control loop.

#14 Haldir

Haldir

    Advanced Member

  • Members
  • PipPipPip
  • 38 posts
  • Country: flag of Germany Germany


Posted 18 July 2010 - 08:49 PM

View PostJonathan Brandmeyer, on 18 July 2010 - 05:51 PM, said:

2) You don't have to use .inverse() to perform matrix inversion, and probably shouldn't use it at all.  Instead, what I have done is explicitly choose the decomposition that I wanted in code using .qr(), .ldlt(), .llt(), .svd(), or .lu(), depending on my requirements.  Note that even though the matrix being inverted is symmetric positive definite, it can still be so ill-conditioned that Eigen's Cholesky decomposition function will bail out on you.  Be sure to check the return values of your decomposition functions, because they will at least report when they gave up.

In single-precision, you're going to have even greater difficulty with matrix conditioning if you use native units like meters, radians, meters/second, Guass, etc.  One thing you can do is define the units within in the filter in terms of steady-state sigma, instead of native physical units.  Then convert to physical units at the system boundaries when you read sensors and produce output to an outer control loop.

Thanks, about the #iostream suggestion, I switched that fast to my own matrix class, that I didn't try to fix eigen lib ;)

Anyway, both eigen lib and my matrix code is using stable code for inverse(), eigen is using full-pivot-lu decomp inverse() and mine is using the llt cholesky decomp. Inverse() is just a wrapper for them.

Before I can go to float, or fixed point math I'll need my own lldt (sqroot free cholesky decom), it's quite a bit more stable than the llt, especially considering float.
or fixed-point later on.