Sunday, December 28, 2008

First bugs found! :)

The good news: I found new bugs in the playback module.
The bad news: I found new bugs in the playback module! :D

When the AVController is paused, I haven't yet implemented a way to signal the main thread that it's paused. See, we have to check not one thread, but four. So a single flag won't do: we need four.

The problem is, we need to detect different conditions:

If the thread is invalid,
if the thread is valid but paused,
if the thread is running,
if the thread has stopped.

And we need this for each thread. On top of that, we need to check that the threads haven't aborted abruptly.
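To make the four conditions concrete, here's a rough sketch of one status slot per thread. The names (ThreadState, WorkerStatus, all_paused) are made up for illustration and are not the real AVController code, and I'm assuming C++11 atomics are available.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical names, for illustration only (not the actual Saya classes).
enum class ThreadState { Invalid, Paused, Running, Stopped };

// One status slot per worker thread. Each worker stores its own state;
// the main thread only reads.
struct WorkerStatus {
    std::atomic<ThreadState> state{ThreadState::Invalid};
};

// True only when every worker in the array reports Paused.
template <std::size_t N>
bool all_paused(const WorkerStatus (&workers)[N]) {
    for (const WorkerStatus& w : workers) {
        if (w.state.load(std::memory_order_acquire) != ThreadState::Paused) {
            return false;
        }
    }
    return true;
}
```

The main thread would poll (or be woken up) and call all_paused() on the four slots. A thread that died abruptly would never update its slot, so detecting that case needs something extra, like a heartbeat counter.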

At least we're closer to getting this to work.

Update (28/Dec/2008, 11:41PM):

The play/pause bug has been fixed now :). No more deadlocks! Now we're going to test the Demo (the blue colored figurines in the screenshots) using AVController. But that'll be for another day, I need to go to sleep, it's very late already.


Saturday, December 27, 2008

Video Playback controls advancing...

Good news (as usual ;-), I've managed to tie the AVController class (actually, the AVPlayer class, which is a subclass of AVController) to the playback buttons.

Whenever I press "play/stop", the AVPlayer instance receives a play or a stop event (a Saya event, not a wxWidgets event - this is what makes it important, because it makes our core classes still toolkit independent). Same goes when I press "fast forward", "fast rewind" buttons, etc.

I'd post a screenshot, but a screenshot of a messagebox is pretty boring.

In any case, now I have the tools to start testing the playback framework I've been working so hard on during these 4 months.

I'm so excited! I feel just like Dr. Frankenstein waiting for the thunder to bring my creature to life, mwahahahahah!

Friday, December 26, 2008

The C++ Source: A wonderful C++ website

The C++ Source is an online journal about C++ programming. It covers many advanced topics, including advanced (and often below-the-radar) uses of C++ templates.

I also read two very interesting articles:

Top C++ Aha! moments
Top C++ non-book publications

Thursday, December 25, 2008

First donation received, and wxSlider skipping problems.

First of all, thanks to Zarxrax from Doom9 for his generous donation to the project.

That said, I'd like to complain about wxSlider. The wxWidgets code for the slider widget has this annoying behavior: If you click anywhere in the slider, the slider reacts as if a pageup/pagedown had been pressed. I absolutely hate that. Sometimes I want to repeat or advance a DVD scene in VLC player, only to have the player skip around 20 minutes or so because I didn't grab the slider control correctly. ARGH!

This can be worked around by setting the page size to 1/50 of the total slider length. It's still not perfect, but that's the only workaround. But I'm definitely considering writing my own slider control. Now, where was that native widgets implementation I saw the other day on the web?
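For the record, the arithmetic of the workaround is trivial. Here's a small sketch; the helper name page_size_for_range is made up, and the clamp to 1 is my own assumption so that tiny sliders still move.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: page size as 1/50 of the slider range,
// never less than one step (the clamp is an assumption).
int page_size_for_range(int min_value, int max_value) {
    return std::max(1, (max_value - min_value) / 50);
}
```

The result would then be handed to the slider with wxSlider::SetPageSize().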

Tuesday, December 23, 2008

A lot of advancements... behind the scenes.

Merry Christmas everyone! I have good news, and not-so-good news.

The good news is that a lot of work has been done on Saya. Especially the last 2 days, because I got sick and had to stay at home, so that gave me a lot of free time :P

Here are some advancements that have been made:

  1. Since I was asked to code Saya in Qt4 (as opposed to wxWidgets), I've been working really hard on getting rid of wxWidgets dependencies. So that means I had to make a lot of wrappers, substitute classes, etc. These include but are not limited to:

    1. A completely new, template-based, event handling system. Saya classes can now emit and receive their own events! Unfortunately there was no way to get around glib's (gtk) main_loop - but since even Qt made a wrapper around it, it's better to use what works. So I had to make a new wxWidgets event type to signal wxWidgets that a new Saya event has been queued. So far the only time I use an event is to signal the Main Frame that the currently-active project has changed.

    2. A completely new syApp class: Saya is becoming more like a cross-platform toolkit (with more and more files in the core/ path) and less like an application, but that doesn't worry me. The point is that I managed to invoke the wxWidgets UI with a function from the plain old main() function. This also means that we could turn Saya into a non-graphical tool.

    3. With the new syApp class (and its subclass wxSayaApp using wxWidgets), I no longer need the sayaEventHandler class, and I no longer need the main window to have its own functions for opening dialog boxes. All dialogs from now on are invoked by syApp. I even added a syFileSelector function. This also helped me remove some wxWidgets-specific includes.

    4. The new syString class (a class encapsulating a const char*) does everything we need from wxString, including efficient concatenation, conversion from various numeric types, printf-like formatting, etc. And because I added some conditional includes, you can convert from/to wxString without having to call a function (the only exception is when a wxWidgets function needs a const wxChar*; then you need to do the conversion explicitly: wxString(mystring)).

    5. Along with the new syString class, serialization and de-serialization will become a piece of cake when I get to implement it.

    6. The source code tree has been reorganized. Now we have the following directories:

      ui/ (all the code using wxWidgets, and our main application)

      saya/ (The directory for our video editing framework)

      saya/core (The directory for our generic cross-platform application framework)

      saya/timeline (The directory for our timeline-specific classes).

  2. Some bugs in the ioCommon namespace (I really have to put my namespaces in order) have been fixed, like renaming / deleting a file.

  3. Most classes have been moved to the private implementation parts, and this has helped me get rid of TONS of unnecessary includes. With these changes, we can recompile Saya (without -O2) from scratch in only 22 seconds.

  4. Memory handling has been given special care: For temporary objects, now we use std::auto_ptr whenever possible.

  5. I've created the following classes: AVPlayer, InputMonitor, PreviewMonitor (well, it's not coded yet, but it'll be a minor adaptation of InputMonitor). AVPlayer also inherits syEvtHandler, meaning it can catch events you send. I'm *very* close to actually being able to play video thanks to this.

So that's the good news. The not-so-good news is that I haven't been able to get video playback to actually work... yet. It's a very steep curve, but I'm sure all my work hasn't been in vain. I promise that next year I'll actually have a workable media player for Saya.

Merry Christmas, and happy new year!

Friday, December 19, 2008

Building Saya with CMake... Coming soon!

Hello everyone. For some time I've been considering building Saya without requiring people to install Code::Blocks and wxWidgets (which is a hassle for beta-testers - they just want to run a script and have it compile Saya so they can test it right away).

Here's the forum post which made me consider everything:

After glancing at various automation tools, two of them caught my attention: SCons and CMake.

SCons is a supposedly cross-platform tool which I need to use at work. It uses python scripts. UGH, python! So, why did it catch my attention? Because of its complexity and python requirements. In other words, it's a tool I would NOT like to use :)

The alternative was CMake. We know CMake has support for Code::Blocks projects and workspaces (that's a plus!). But I was delighted when I realized that CMake is the tool used to build KDE 4.

The best part of using CMake is that the only thing you need is:
a) Your compiler
b) CMake

Autotools and SCons, on the other hand, require you to install more scripting languages (e.g. perl, python) to make your file. This is REALLY what I like about CMake. It's lean.

Next year I'll start to work on being able to compile Saya with CMake.

Monday, December 8, 2008

Spammers starting to annoy me

There's an idiot who wants to promote his non-free video editing software on my blogs. This means I'll have to moderate all comments from now on. Which is a good thing, because now I'll find out who posts comments.

Oh, just for your amusement, here's a copy of the e-mail I got from blogger:

淑丽 has left a new comment on your post "Saya developers meeting #3 (2008-11-28): Summary o...":

(VideoEditorSoft dot com - link de-httpified to prevent more spamming) provides free easy and popular Video Editor,
Video Joiner,
Video Splitter,
Video Cutter,
Video Clipper,
Video Merger,
DVD Video Editor,
DVD Video Author.

If you are looking for video Editor for Apple Mac OS X program, please refer to Mac Video Editor,
Mac DVD Video Editor.

Posted by 淑丽 to Saya Video Editor official Blog at December 7, 2008 9:49 PM

Just so you know, his profile id is available at If he tries to post again, I'll notify blogger so they can kick him out. :)

By the way, I really *DON'T* recommend downloading from his site, who knows if that thing's got a virus or something.

Sunday, December 7, 2008

Got rid of some of the STL classes, reorganized stuff...

I was talking with the author of the uSTL about how to avoid STL code bloat, and I ended up asking him a question about strings. He told me that to concatenate strings I should use the ostringstream class.
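In case you've never used it, this is roughly what concatenating with ostringstream looks like. The function name and the output format are made up for the example; they're not from the timeline classes.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical example: build one string out of several pieces.
// The stream handles the numeric conversions for us.
std::string describe_clip(const std::string& name, unsigned start, unsigned length) {
    std::ostringstream out;
    out << "clip:" << name << " start=" << start << " length=" << length;
    return out.str();
}
```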

So I began to replace the references to std::string in the timeline classes, and designed a new form to serialize them.

I made a C++ wrapper for ostringstream so that I could later switch classes in case I needed to. Guess what, I did switch later. Because I also needed to design a std::string lightweight replacement (I've called it syString), I ended up adding methods to concatenate strings in an efficient way. I use the "size doubling" technique, up to a certain point, then the string only increases by amounts of 64K. I haven't tested this strategy well, but I may change the strategy in the future by adding a static parameter.
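The growth strategy can be sketched like this. It's only an illustration: the function name, the initial capacity of 16 and the point where doubling stops (I've assumed 64K here) are my own placeholders, not the values in syString.

```cpp
#include <cassert>
#include <cstddef>

// The 64K step comes from the description above; the doubling limit and
// the initial capacity are assumptions made for this sketch.
const std::size_t kLinearStep = 64 * 1024;
const std::size_t kDoublingLimit = 64 * 1024;
const std::size_t kInitialCapacity = 16;

// Returns the capacity to allocate so that at least `needed` bytes fit.
std::size_t next_capacity(std::size_t current, std::size_t needed) {
    std::size_t cap = current ? current : kInitialCapacity;
    while (cap < needed) {
        // Double while the buffer is small, then grow in fixed 64K steps.
        cap = (cap < kDoublingLimit) ? cap * 2 : cap + kLinearStep;
    }
    return cap;
}
```

Doubling keeps the number of reallocations low for small strings, while the fixed step avoids wasting up to half of a very large buffer.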

I also reorganized the header files and cleaned up a lot of bloat. Some classes included the STL (I still need std::map and std::vector) for fixed types, so I moved the std inclusion to the CPP.

I also fixed the precompiled headers inclusion, and removed a lot of unneeded wxWidgets includes from the GUI files.

As a result of this optimization, I've cut compilation time by 33%, yay! :)

Unfortunately, I stopped working on the playback engine in the meantime, but I'll start again. Playback functionality needs to be finished and tested ASAP.

Saturday, November 22, 2008

Playback widgets status

The playback widgets have been uploaded to SVN, but they haven't been integrated into the project, as they need more work. But soon we'll be able to load and play a "dummy" video file. When playback is done, we'll move on to deal with the codec issues.

Sunday, November 16, 2008

Slow but sure advances in the playback...

I've moved some of the private properties of classes VidProject and ProjectManager to the .cpp. This will allow me to skip recompiling some modules whenever I add a private member to the classes.
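If you haven't seen this trick before (the "private implementation" idiom), here's a minimal single-file sketch. Project and ProjectData are made-up names for the example, not the real VidProject code; the comments show which part would go in the header and which in the .cpp.

```cpp
#include <cassert>
#include <string>

// ---- what would live in the header (a hypothetical project.h) ----
class ProjectData;  // forward declaration only; the layout stays hidden

class Project {
public:
    Project();
    ~Project();
    void SetTitle(const std::string& title);
    std::string GetTitle() const;
private:
    Project(const Project&);             // non-copyable in this sketch
    Project& operator=(const Project&);
    ProjectData* m_Data;                 // the only private member users see
};

// ---- what would live in the .cpp ----
class ProjectData {
public:
    std::string title;  // adding members here doesn't touch the header
};

Project::Project() : m_Data(new ProjectData) {}
Project::~Project() { delete m_Data; }
void Project::SetTitle(const std::string& title) { m_Data->title = title; }
std::string Project::GetTitle() const { return m_Data->title; }
```

Since the header never changes when ProjectData grows, the modules that include it don't need to be recompiled.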

Also, I've managed to create my first instance of AVController (member of the hidden ProjectManagerData class). I hope to have the demo running on threads in the next few weeks.

Also, Rigo is working on the playback controls using wxFormBuilder. Here's a sneak peek:

When it's done, I'll add this control to the wxVideoPanel, and finally our editor will start to look like... a Video Editor :)

Friday, November 14, 2008

Playback widgets advancing...

Rigo, Jeff and I have been discussing the design and implementation of the widgets for playback control. Rigo's really pushing himself to work on it soon.

I'll let you know when we have something working.

Finished the first "official" week in my new job!

Finally I've been given work to do other than reading manuals and documentation.

I'm currently working for a web filtering company. They develop an all-in-one solution for filtering SPAM, viruses, porn, phishing sites etc.

The good news:

* The environment in here is very friendly. We're not forced to wear specific clothes, the time to eat is at the worker's discretion. It's very relaxing.

* The methodologies followed are pretty neat: We have a wiki, we use Mantis issue tracker, have a subversion repo, there are coding guidelines, and some of the code has very interesting patterns. It's like my dream job! :D

* Now that I've studied some of the code of the company's products, it's like every hacker's dream come true: TOTAL control over internet traffic, sending RST packets to block P2P traffic (I won't discuss the morality of sending RST packets; it's up to the customers whether to apply them or not, and the appliances can be used at home, on company intranets, etc.), man-in-the-middle eavesdropping for https transactions...

Neo: Whoa.

The bad news:

* The schedule. You start working at 7AM :( This leaves me exhausted for the rest of the day, which unfortunately leaves me too tired to work on Saya. I really hope I can advance during the weekends.

But I love it! And the pay's definitely worth it.

Tuesday, November 4, 2008

It's DLL time...

It's time to split the project into its components.

Saya is originally meant to be a wxWidgets application running on top of two shared libraries: "libsaya", which handles all the editing, and "libsayacore" which does the very low-level multimedia stuff. libsaya depends on libsayacore, of course.

For this I've had to do some wizardry and rename the vidproject/ directory into saya/, and move the iomgr/ directory into saya/core/ .

Now, I really don't know how I'll do this shared libraries stuff, but I still can check the codeblocks source code :)

Sunday, November 2, 2008

First rule about the fight club...

you do not talk about the fight club.
Second rule.. you DO NOT talk about the fight club.

So, I joined the fight club. Actually it's more like a secret mailing list whose owners happen to be working on multimedia projects. Those might or might not include video editing. One of the current threads in the mailing list includes comments and experiences about some open source multimedia toolkits, like MLT and gstreamer.

Unfortunately I cannot give any more details of who the members are and what projects they're working on. But it's getting very interesting. Hopefully I will gain enough knowledge to perfect my own toolkit.

I'll keep you posted.

Wednesday, October 22, 2008

Back in the saddle again... kinda.

After nearly 3 weeks of not writing any code, I finished making some changes to the VideoOutputDevice class. The class methods LoadVideoData and FlushVideoData are lock-free with respect to each other. This means that while I'm writing data into a bitmap, I can write the previous bitmap to the screen.

In any case, tomorrow's my last day at my current job, and I really don't know how late I'll be returning home. And this Friday I'll go visit my new job. I need to return home early because then we'll have the October meeting for the Saya devs.

Let's hope my new job gives me enough time to work on the project. Wish me luck!

Sunday, October 12, 2008

Finished installing and customizing Mepis. Hurray!

I finished making the necessary modifications and software installations to my distro. You know what? I actually had to read the Saya-VE developers' guide for GNU/Linux :P. In any case, I already managed to compile and run Saya.

Hurray! Now I can continue developing... on GNU/Linux. I won't be able to debug the Windows version until I install VirtualBox, tho.

Friday, October 10, 2008

From PCLinuxOS to Mepis: Painful, but worth it!

If you've installed PCLinuxOS with an advanced partitioning, prepare for a painful surprise. SimplyMEPIS (up to 8.0 beta 2) doesn't let you install /usr, /var and /tmp into the partitions of your preference.

Which means that once you've logged in, you need to enter the console (through ctrl-f1), mount the other partitions, delete their contents, cp -a /usr /your-usr-partition-mount-point, rename the other partition, edit fstab, etc. etc. etc.

It's not something that you can do while in the LiveCD (I've tried to an extent and I failed miserably).

Also, if you've used a non-standard file system such as XFS, just forget about it. You need to mkfs.ext3 the other partition (in this case, /tmp) and then proceed. Speaking about /tmp, an additional step is required: You need to delete /tmp, which is a symlink to /var/tmp.


But after I managed to do all that manual installation work so that I retain the structure of my partitions (which I carefully tuned), I'm actually enjoying it.

For starters, you get nVidia support right out of the box. I booted the LiveCD in 1280x1024 with no problems whatsoever.

The KDE provided with Mepis 8 is awesome. It's not KDE4, but 3.5.9, and that gives you stability. But this KDE is better configured than the one with PCLinuxOS 2007. I can pump the volume up and down with my multimedia keys. In PCLOS, I had to set up the mixer manually (not without the help of some friends on slashdot and irc) and add some lines in some obscure alsa config file.

The kernel: PCLOS2007 ships with 2.6.19, and the latest upgrade is 2.6.22. Mepis 8 ships with 2.6.26, which also means support for tons of more wireless and external devices.

Package management: The Synaptic package manager is also provided as default (yay!), but Mepis provides an excellent addition: The actual applet tells you how many upgrades are available. And finally, there's no "pay-for-upgrade" policy. The latest packages are already available for no additional price (like Firefox 3.0.3, which I just downloaded an hour ago), and the repositories are based on Debian, one of the most community-supported repos. Try to beat that, Tex!

In short:

If you're stuck in PCLinuxOS and want to try a debian-based distro, the upgrade is painful (unless you happen to have installed in only one partition), and you need to delete your packages - for compatibility - anyway. But it's really worth the upgrade, provided you manage to install prettier themes. Personally, I think the default ones suck - but hey, it's a beta version. And no crashes! :)

If you're undecided between PCLinuxOS and Mepis, go for Mepis. Nothing beats getting connected to the default Debian repos.

Thursday, October 9, 2008

Mandriva, distro headaches... and Mepis 8.0 beta 2

Sigh. I only finished downloading the Mandriva 2008 ISO so I could burn it tonight and install it. I chose Mandriva because its configuration is the closest to PCLinuxOS.

First bad news: I realized that I didn't download the most recent 2008.1, but the outdated 2008.0! ACK! 4 hours wasted.

Second bad news: Mandriva 2009 is out, and yet, it has lots of bugs. I'm not going to upgrade to a distro which freezes the GUI every now and then. What is this, Windows? Also, I couldn't find the 2008.1 isos because the mandriva guys made them unavailable from the web. I had to use Google to find it. Now it's nearly 34% complete.

Third bad news: 2008.1 also has some bugs, and it's definitely annoying that the guys don't have the time to release a 2008.2 with all the bugs fixed - and no new features. No, instead they chose to go for KDE 4, an unstable branch of the Linux kernel, and lots of bells and whistles. I don't want to cut my hand with a bleeding-edge distro. I want something that JUST WORKS!

Therefore, I decided I'll use MEPIS.

Why Mepis?

1) It uses KDE! :D And the 3.5 branch! :D :D

2) At first it used the Ubuntu repository, but now it switched to the latest Debian - can't be any more stable than that!

3) It's a debian distro! Lots of packages are distributed as .debs.

4) According to this Linux discussion, Mepis is even more stable than Mandriva. And this other discussion gives me the impression that Mepis has surpassed Ubuntu.

In fact, I would be going with Ubuntu (most popular distro these days), if not for their negligence towards KDE.

Ah, here's a Mepis 8 review at deviceguru. Just what I was looking for. Hey, what do you know? The review's login screenshot says "Rick". Must be a sign ;-)

For the impatient ones: MEPIS 8 beta 2 (32-bit) torrent - courtesy of

I'm tired of waiting, I'll leave my computer on to finish the download.

Wednesday, October 8, 2008

Switching to Mandriva 2008.1

Tonight I'll make a bold step: I'll switch away from PCLinuxOS 2007, and move to Mandriva 2008.1 "one". The PCLinuxOS repositories are simply too small and too slow. Plus, Mandriva has the newest kernel.

This will help me with a lot of things - for example, I can try out the latest version of Kdenlive (for research purposes and maybe for making 1 or 2 short AMVS ;-) ). I'll also be able to use the latest GIMP. I tried compiling it on PCLinuxOS, but I just can't figure out how to tell it to compile right.

Anyway, I'll need to back up the important Saya stuff. *GASP* I need to back up my media, too! :( Ugh! Good thing I have a fresh 8GB Flash drive around.

If things don't go well, this may stall Saya development for a couple of weeks. Wish me luck! :)

I decided I'll use PortAudio

After a little research (actually a web search and a chat), I decided to use PortAudio for the audio output. That should complete the components required for Audio Output.

Speaking of audio, I need to deal with using a different number of channels for the input and the output. With the help of Stefano D'Angelo, of, I managed a way to specify to the audio buffers some parameters for conversion. But I doubt I'll have that ready soon - first I need to complete the playback functionality. Later I'll worry about channel conversion.

Tuesday, October 7, 2008

Google Tech Talks on Youtube!

I hadn't known this, but it turns out that Google has been posting their tech talks for more than a year.

Two particular talks got me interested: The first is a presentation made nearly a year ago about the new features in the upcoming C++0x standard. (Warning: 1 hour video!)

The next talk concerns us more - it's a fast lock-free implementation of a hash table. Hash tables are used in all kinds of programs, but if many threads try to access a hash table, they need to have exclusive access. With a lock-free hash table, you don't need exclusive access to add or remove an item from the table. In fact, you don't even need exclusive access to read the table! (warning: another one-hour video!). Here's the lock-free hash table presentation in PDF format.

The Java source code is available on Sourceforge:


Thursday, October 2, 2008

Fixed first thread bugs... and improving :)

I'm pleased to announce that the threads module NOW WORKS! :D

The first test made by Rigo was pretty simple, and I've added a few things here and there. But I've managed to get the module to be more stable than the wxWidgets implementation!

For example, I've added a function that allows you to know if a thread is detached or not. If the thread was deleting itself, under wxWidgets, you could crash the system. The new implementation simply adds the thread to a detached threads list. So we now keep a joinable list and a detached list.

Also, when deleting a detached thread, we remove it from the list (with carefully-placed mutexes) so that you can't really double-delete a thread, even if you try.
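The double-delete protection boils down to "only the caller who actually removed the entry from the list gets to delete". Here's a rough sketch of that idea; DetachedRegistry is a made-up name, threads are represented by plain integer ids, and I'm using std::mutex instead of our own mutex class.

```cpp
#include <cassert>
#include <mutex>
#include <set>

// Hypothetical registry of detached threads (ids stand in for thread objects).
class DetachedRegistry {
public:
    void Add(int id) {
        std::lock_guard<std::mutex> lock(m_Mutex);
        m_Ids.insert(id);
    }
    // Returns true only for the one caller that really removed the entry.
    // Any later (or concurrent) caller gets false and must not delete again.
    bool Remove(int id) {
        std::lock_guard<std::mutex> lock(m_Mutex);
        return m_Ids.erase(id) == 1;
    }
private:
    std::mutex m_Mutex;
    std::set<int> m_Ids;
};
```

The deleting code would do something like `if (registry.Remove(id)) { /* safe to destroy */ }`.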

I've also fixed a couple of bugs in the pause/resume methods (a stupid bug, I had used a sentry object but the object was never created, I had just invoked the constructor with no object to construct, doh :P ).
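For the curious, the sentry mistake looks more or less like this. It's a reconstruction with made-up names (Sentry, Guarded), not the actual code; the counter just makes it visible whether the sentry is still alive.

```cpp
#include <cassert>

// Hypothetical sentry: bumps a counter while it is alive.
class Sentry {
public:
    explicit Sentry(int& depth) : m_Depth(depth) { ++m_Depth; }
    ~Sentry() { --m_Depth; }
private:
    Sentry(const Sentry&);             // non-copyable
    Sentry& operator=(const Sentry&);
    int& m_Depth;
};

struct Guarded {
    int depth;
};

// BUG: this creates an unnamed temporary, which is destroyed at the end
// of the very same statement. Nothing is protected afterwards.
int depth_with_temporary(Guarded& g) {
    Sentry(g.depth);
    return g.depth;
}

// FIX: give the sentry a name, so it lives until the end of the scope.
int depth_with_named(Guarded& g) {
    Sentry guard(g.depth);
    return g.depth;
}
```

Both versions compile without complaints, which is exactly why this kind of bug is so easy to miss.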

So this means I'll now be able to keep working on the playback base module.

In the future I have planned to make the modules separate shared libraries.

Wednesday, September 24, 2008

Memory barriers and SSE2

After reading that lock-free code would be unstable on multiprocessor machines (dual cores, etc.), I learned that there is a way to prevent such things from happening: Memory barriers.

A memory barrier makes sure that all the writes issued before it become visible to the other processors before anything issued after it, instead of being reordered or lingering in a CPU's store buffer.

Unfortunately, according to a recently reported GCC bug, GCC's instruction for memory barriers, __sync_synchronize(), is flawed in x86-64 processors (Pentium Core Duo, AMD 64 x2, etc) because it doesn't implement the instruction "mfence".

mfence is an SSE2 instruction (SSE2 is available for Pentium-4 and later processors) that implements an efficient memory barrier. There are other implementations to do a memory barrier on x86 processors, which I won't mention here for reasons of space. So I replaced the __sync_synchronize() primitive with qprof's AO_nop_full(), which invokes a memory barrier, using mfence if available. Also, I added this function to all the atomic primitives, and also to the syAudioBuffer class.

Now I can be 100% sure that the code won't crash. At least for multiprocessor-caused race conditions, anyway. ;-)
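To show where a barrier goes, here's a minimal sketch using the fences from standard C++11 (std::atomic_thread_fence) rather than __sync_synchronize() or AO_nop_full(). The Mailbox struct and the function names are made up for the example. One thread would call publish() and another would call try_consume().

```cpp
#include <atomic>
#include <cassert>

// Hypothetical one-slot mailbox shared by a writer and a reader thread.
struct Mailbox {
    int payload = 0;
    std::atomic<bool> ready{false};
};

// Writer side: the fence keeps the payload write from being reordered
// after the flag write.
void publish(Mailbox& box, int value) {
    box.payload = value;
    std::atomic_thread_fence(std::memory_order_release);
    box.ready.store(true, std::memory_order_relaxed);
}

// Reader side: the fence keeps the payload read from being reordered
// before the flag read.
bool try_consume(Mailbox& box, int& out) {
    if (!box.ready.load(std::memory_order_relaxed)) {
        return false;
    }
    std::atomic_thread_fence(std::memory_order_acquire);
    out = box.payload;
    return true;
}
```

Without the two fences, the reader could see ready == true and still read a stale payload on a multiprocessor machine.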

Tuesday, September 23, 2008

"mutable" keyword, and Major rewrite in the core...

After redesigning the syAudioBuffer and implementing some stuff in the AudioOutputDevice, I realized that my assumption to use only one thread for reading and writing led me to write some very defective code, so I almost have to rewrite the entire thing. Fortunately I have finished doing so with AudioOutputDevice. But VideoOutputDevice, VideoInputDevice and AudioInputDevice are yet to be rewritten. Sigh.

Additionally, I do need to redeclare the majority of functions as const. Fortunately, now I can do so freely because I have discovered the mutable keyword!

mutable lets selected member variables change even if the modification happens inside a const function, or if the object was passed as "const". This way I can declare the thread synchronization variables as "mutable" and still enjoy the compile-time protection of const.
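A tiny example of the idea, with a made-up class (SampleCounter) and std::mutex standing in for our own mutex class:

```cpp
#include <cassert>
#include <mutex>

// Hypothetical class: the getter is const, but it still needs to lock.
class SampleCounter {
public:
    SampleCounter() : m_Count(0) {}
    void Add(unsigned n) {
        std::lock_guard<std::mutex> lock(m_Mutex);
        m_Count += n;
    }
    unsigned Get() const {
        // Locking modifies the mutex; this compiles only because
        // m_Mutex is declared mutable.
        std::lock_guard<std::mutex> lock(m_Mutex);
        return m_Count;
    }
private:
    mutable std::mutex m_Mutex;
    unsigned m_Count;
};
```

m_Count stays protected by const: trying to change it inside Get() is still a compile error.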

Update (Sep 23, 2008):

Finally some good results from the rewrite! I discovered a bug in syBitmapCopier: instead of copying the source to the destination, it copied the destination to the source!! :-S Yikes.

It's nice how all these knowledge tidbits contribute to improving a programmer's level as a whole.

Wednesday, September 17, 2008

Happy 17th birthday, Linux!

Some hours ago, Linux became 17 years old! I have to mention that after studying the kernel code (because that's what Linux is, a kernel, NOT an Operating System!), I realized that calling the different distributions "Linux" is wrong.

Probably you'll say "aw man, not that Stallman crap again". Nope. It's actually the opposite. I've been having some annoying problems with the Linux DISTRIBUTION (which includes the kernel, the GNU operating system files, KDE and some other stuff) I'm using right now. But saying "aw man, my Linux isn't working" is WRONG. It's not the Kernel's fault! It's the stupid distro's fault.

Because as I have studied the multithread code, the posix mutexes implementation, and read about the Completely Fair scheduler, I realized:

Man, this kernel is one of the most beautiful things created by men. For starters, it implements posix threads, which have a beautiful and simple API compared to... eeew... Windows - trust me, I've been using this thing.

Another example of Linux' beauty is one of the recent patches for the locking code, which was merged in 2.6.25. It implements something called ticket spinlocks. To explain, a spinlock is an item that can be possessed by only one thread at a time. And all threads keep competing for it, and they never give up until they get it. Not a nanosecond of rest. Imagine a discount sale at a women's clothing shop. Yes, it gets that ugly :P.

But the latest patch makes each thread take a "ticket" and wait for its turn. When I read that I was like... whoa.... you just blew my mind. I didn't even SUSPECT things like this could be implemented! Even after 17 years, Linux (yes, the Kernel) is getting more and more beautiful.
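Just to illustrate the "take a ticket" idea, here's a user-space sketch with C++11 atomics. It is NOT the kernel's implementation (that one is written in C and assembler); the class and member names are made up.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical ticket lock: threads are served strictly in arrival order.
class TicketLock {
public:
    void lock() {
        // Take a ticket...
        const unsigned my_ticket = m_Next.fetch_add(1, std::memory_order_relaxed);
        // ...and spin until it's our turn.
        while (m_Serving.load(std::memory_order_acquire) != my_ticket) {
        }
    }
    void unlock() {
        // Only the lock holder writes m_Serving, so a plain load + store is enough.
        const unsigned current = m_Serving.load(std::memory_order_relaxed);
        m_Serving.store(current + 1, std::memory_order_release);
    }
    // Number of threads currently holding or waiting for the lock.
    unsigned outstanding() const {
        return m_Next.load() - m_Serving.load();
    }
private:
    std::atomic<unsigned> m_Next{0};
    std::atomic<unsigned> m_Serving{0};
};
```

Compared to a plain spinlock, nobody can jump the queue, so no thread starves while the others keep grabbing the lock.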

Now, repeat after me:

Linux. Is. Beautiful.

So now on my to-do list is to replace all the "Linux" references in Saya's website for "GNU/Linux", to give the Kernel (represented by that cute little penguin we call Tux :) ) the place it deserves.


As a gift, here's a cute paper tux to decorate your desk. By the way, you can edit the image to replace that logo with your favorite distro's logo. Enjoy the party :)

Monday, September 15, 2008

New AVController design: Four threads, audio frames.

After not visiting the Audio Video playback/sync module for a while (due to my work with the threads and the API cleanup in the timeline model), I finally decided to print the source code. Believe it or not, it allows you to concentrate better and see the code as a whole. So it was like 20 to 30 pages, stapled per file for better organization.

Anyway - I studied the code for avcontroller.cpp, and I saw a fatal flaw in the preliminary design: There was only ONE worker thread, and I need four!


The code does more or less this: It tells the VideoInputDevice to send data to the VideoOutputDevice, which tells VideoOutputDevice to load the data. The problem is that VideoOutputDevice also Renders the data after loading it, instead of invoking a separate thread or something.

So there's a sequence to load the video:

1) VideoInputDevice fetches the video from a file or whatever. This implies a HEAVY OVERHEAD as it can imply either decoding, or compositing.

2) When the data is available, VideoInputDevice sends the data to the VideoOutputDevice.

3) After receiving the data, VideoOutputDevice sends the data to the display or encoder.

Which means that while the decoder processes the input, the encoder is blocked. While this can be fine for Video, where you can just play the same frame over and over, it's NOT fine for audio, where the buffer can simply go blank.

And then I realized an even WORSE problem: There is no separation between Video and Audio processing time! It's all done with ONE thread!! So while the Video is trying to decode / process the video effects, the audio can run out of data.

To solve this problem, we will use FOUR threads: Video In, Audio In, Video Out, and Audio Out. If one operation is blocked, it won't affect the others. Of course, if the video or audio simply run out of data we'll have to stay silent. But at least the Audio Input won't affect Video Output or vice versa.

Another problem is that using Mutexes can slow things down, especially for audio. I've been researching lock-free data transfer for a while, and I remember one guy saying something about audio FRAMES.

What? O.O *blink blink* Frames in audio? O.O *blink blink*

That's right. By splitting the buffer into separate small frames of time (say, 1000 samples) we can use a fixed buffer of POINTERS to those frames.

This means: Instead of having this:


we have this:

A [.....]
B [.....]
C [.....]
D [.....]
E [.....]
F [.....]
G [.....]
H [.....]
I [.....]
J [.....]


So a thread can reserve one of the N frames of audio for buffering, and instead of having to avoid touching a HUGE GODZILLA chunk of data, we can just meddle around with one of the little godsukis. And the list which tells in which order they should be played can be a lock-free list (this means it will have thread-safe algorithms which do not require mutexes or any of that).

Finally, with the lessons learned during the thread module's development, I can finally delete that awful sleep-based synchronization and start using semaphores and/or condition variables. I think I'll go for condition variables, as I can wake up the thread either when I need to send data, or when I need to stop it.

Edit (Oct 2,2008):

The actual redesign didn't use audio frames, after all. It was much simpler to use a circular buffer with one sample per slot.
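For reference, a circular buffer with one sample per slot can be sketched like this. It's an illustration and not the actual syAudioBuffer code: the class name is made up, I'm assuming exactly one writer thread and one reader thread, and I'm using C++11 atomics.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical single-producer / single-consumer ring of audio samples.
// One slot is always left empty, so the usable capacity is N - 1.
template <std::size_t N>
class SampleRing {
public:
    // Called by the writer thread only. Returns false when the ring is full.
    bool push(short sample) {
        const std::size_t head = m_Head.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % N;
        if (next == m_Tail.load(std::memory_order_acquire)) {
            return false;
        }
        m_Slots[head] = sample;
        m_Head.store(next, std::memory_order_release);
        return true;
    }
    // Called by the reader thread only. Returns false when the ring is empty.
    bool pop(short& sample) {
        const std::size_t tail = m_Tail.load(std::memory_order_relaxed);
        if (tail == m_Head.load(std::memory_order_acquire)) {
            return false;
        }
        sample = m_Slots[tail];
        m_Tail.store((tail + 1) % N, std::memory_order_release);
        return true;
    }
private:
    short m_Slots[N];
    std::atomic<std::size_t> m_Head{0};  // next slot to write
    std::atomic<std::size_t> m_Tail{0};  // next slot to read
};
```

Each index is written by only one of the two threads, which is why no mutex is needed. With more than one writer or more than one reader this sketch would no longer be safe.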

Sunday, September 14, 2008

Forward declarations: Not good enough?

It's widely known among C++ programmers that replacing class instances with class pointers in your classes allows you to use forward declarations in your headers and save compilation time.




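// a.h
class A {
    // ...
};

// Header for B, holding an A by value: the full definition of A is needed.
#include "a.h"
class B {
    A myvar;
};


// a.h
class A {
    // ...
};

// Header for B, holding an A by pointer: a forward declaration is enough.
class A;
class B {
    A* myvar;
};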

Then, in the CPP you do the #include. But this is NOT enough! Since most of my classes are pretty brief, and some are template instantiations, I have to include in each .cpp the headers for all the classes it uses and ALL the classes contained in those classes, ad infinitum. OK, if I followed the first approach I would only "include one file", but that's not the point, since that "one file" would start an include chain. So it doesn't matter what I do: in each .cpp I always end up pulling in all the .h files and end up with a gigantic object file.

Let's see what this little code does:

#include "smap.h"
#include "avclip.h"
#include "avtransition.h"
#include "aveffect.h"
#include "aveffects.h"

m_Effects = new AVEffects;
m_EndingTransition = new AVTransition;
m_Markers = new SMapUintUint;

delete m_Markers;
delete m_EndingTransition;
delete m_Effects;

Why do I need to include all the headers JUST TO CREATE AN OBJECT? Because by doing "m_Effects = new AVEffects", I call AVEffects::AVEffects(), and for that I need the include file. Which also ends up invoking svector.h, which also includes serializable.h and . Eew!!

Enter forward declarations in the .cpp files!

To solve this problem, I created a class named AVClasses, which encapsulates object creation, deletion, serialization AND deserialization for ALL THE KNOWN CLASSES.



class AVSettings;
class FXParameterList;
class AVEffectParamDeclaration;
class AVEffectParamDeclarations;
// ...

class AVClasses {
public:
static void Create(AVSettings* &ptr);
static void Create(FXParameterList* &ptr);
static void Create(AVEffectParamDeclaration* &ptr);
static void Create(AVEffectParamDeclarations* &ptr);
// ...

static void Delete(AVSettings* &ptr);
static void Delete(FXParameterList* &ptr);
static void Delete(AVEffectParamDeclaration* &ptr);
static void Delete(AVEffectParamDeclarations* &ptr);
// ...

static std::string serialize(AVSettings* &ptr);
static std::string serialize(FXParameterList* &ptr);
static std::string serialize(AVEffectParamDeclaration* &ptr);
static std::string serialize(AVEffectParamDeclarations* &ptr);
// ...

static bool unserialize(AVSettings* &ptr, const std::string& s);
static bool unserialize(FXParameterList* &ptr, const std::string& s);
static bool unserialize(AVEffectParamDeclaration* &ptr, const std::string& s);
static bool unserialize(AVEffectParamDeclarations* &ptr, const std::string& s);
// ...
};

This little toy hides the object creation, destruction and (un)serialization. Of course, the avclasses.cpp file will include ALL THE HEADERS. It's fun!


#include "avclasses.h"

#include "avsettings.h"
#include "fxparameterlist.h"
#include "aveffectparamdeclaration.h"
#include "aveffectparamdeclarations.h"
// ... etc.

void AVClasses::Create(AVSettings* &ptr) { ptr = new AVSettings; }
void AVClasses::Create(FXParameterList* &ptr) { ptr = new FXParameterList; }
void AVClasses::Create(AVEffectParamDeclaration* &ptr) { ptr = new AVEffectParamDeclaration; }
void AVClasses::Create(AVEffectParamDeclarations* &ptr) { ptr = new AVEffectParamDeclarations; }
// ...

void AVClasses::Delete(AVSettings* &ptr) { delete ptr; ptr = NULL; }
void AVClasses::Delete(FXParameterList* &ptr) { delete ptr; ptr = NULL; }
void AVClasses::Delete(AVEffectParamDeclaration* &ptr) { delete ptr; ptr = NULL; }
void AVClasses::Delete(AVEffectParamDeclarations* &ptr) { delete ptr; ptr = NULL; }
// ...

std::string AVClasses::serialize(AVSettings* &ptr) { return ptr->serialize(); }
std::string AVClasses::serialize(FXParameterList* &ptr) { return ptr->serialize(); }
std::string AVClasses::serialize(AVEffectParamDeclaration* &ptr) { return ptr->serialize(); }
std::string AVClasses::serialize(AVEffectParamDeclarations* &ptr) { return ptr->serialize(); }
// ...

bool AVClasses::unserialize(AVSettings* &ptr, const std::string& s) { return ptr->unserialize(s); }
bool AVClasses::unserialize(FXParameterList* &ptr, const std::string& s) { return ptr->unserialize(s); }
bool AVClasses::unserialize(AVEffectParamDeclaration* &ptr, const std::string& s) { return ptr->unserialize(s); }
bool AVClasses::unserialize(AVEffectParamDeclarations* &ptr, const std::string& s) { return ptr->unserialize(s); }
// ...

I took a look at the filesize of the avclasses.o. It measures 579K! Half a meg!

But now let's see what happens if I use AVClasses inside the previous AVClip.h:

#include "avclip.h"
#include "avclasses.h"



And the results are the following:

Before:
avclip.o: 105356 bytes.

After:
avclip.o: 33136 bytes.

Needless to say, compilation is almost instant!

Yes, there is a drawback to all this. That extra call when creating the objects. But if we look at the AVClasses code, it's more like a jumptable, except for the serialization stuff, which would duplicate all the memory transfers. But that can be solved by simply modifying the code to pass a string by parameter instead of returning a string.

So we save a lot of duplicate code in the object files for a tiny runtime penalty. Is it worth it? WAY YES!!!

When I finish changing all the code in my files I'll tell you how much the compilation time diminished.

Update: After changing lots of files to use this approach, I noticed that the gain in compilation time is only marginal. I wonder if the change is really worth it.

Update (Sept 15, 2008): I finally realized that this is nonsense, it's not worth it, and just stupid. I also made another realization: The current STL-based model is optimal in complexity. I may not like the STL bloat, but the fact is that it keeps programming SIMPLE. I don't need to complicate things with more pointers and implement lots of "exceptions-to-the-rule" routines.

So I reverted my SVN copy to the one before these changes and kept going. However, it was a good experiment to make, and this way I won't have to worry with yet another "what if I had..." thought. Now move along.

Thursday, September 11, 2008

Threads module FINISHED! (for real this time)

After moving all the private data to the CPP, cleaning up the API (that means it's stable now, unless I add new functions) and fixing a few things, I'm glad to announce that the Threads module is Finished!

It cost me many hours of sleep and many e-mail exchanges with Linux kernel experts, but finally, it's done.

Now, back to the playback framework.

Wednesday, September 10, 2008

Con Kolivas answers: Yield() is bad. VERY bad.

Con Kolivas, the mind behind the new Linux scheduler, answered a simple question for me.

It's the same Sleep() vs. Yield() issue that has been debated for ages, but this time we've already discarded sleep, so the question was more Yield() vs. wait.

Question: What should I use to put a thread to sleep for a limited time if there's no work to be done? A semaphore, or a yield?


Yield just drops the cpu for other processes and threads to go first. It does not specify who should go next, how many processes should use cpu next, nor does it say when to return. Let's say you yield and one nanosecond later the work is available. On a loaded system it's theoretically possible to yield for up to 10 seconds. That's 9.99* seconds too long that you've yielded. Alternatively, if the thread yields for .5 seconds and the work is available at .6 seconds, the yielding thread comes back and there is no work available, that thread now wastes cpu uselessly, checking for work and just yielding again for however long.

With semaphores, you rely on the work actually being available before waking up the thread. There is no wasted cpu at all, and the thread is signalled very fast that it should now wake up.

It takes more effort to keep track of where you've put your semaphores, who's waiting on what and so on, but in the long run the results are potentially much better. How big the gain is depends on how parallelisable your workloads are. Nonetheless, wasting cpu doing nothing when there is a signal mechanism to only wake things when there's work available is much better.


Thanks a lot, Con! Now onto rewriting that userspace mutex again... *sigh*

Update: I forgot to thank Joseph Seigh of atomic-ptr-plus, he was the one who suggested using a semaphore in the first place.
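Here's a small sketch of what "wait on a semaphore until work is actually available" looks like, as opposed to a yield loop. It's only an illustration under my own assumptions: the Semaphore class and ConsumeItems() are hypothetical names, and the semaphore is built from a C++11 mutex and condition variable rather than from any OS primitive Saya uses.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical counting semaphore built on a mutex and a condition variable.
class Semaphore {
public:
    explicit Semaphore(unsigned initial = 0) : m_Count(initial) {}

    // Signals that one more unit of work is available.
    void Post() {
        {
            std::lock_guard<std::mutex> lock(m_Mutex);
            ++m_Count;
        }
        m_Cond.notify_one();
    }

    // Blocks until a unit of work is available, then consumes it.
    void Wait() {
        std::unique_lock<std::mutex> lock(m_Mutex);
        m_Cond.wait(lock, [this] { return m_Count > 0; });
        --m_Count;
    }

private:
    std::mutex m_Mutex;
    std::condition_variable m_Cond;
    unsigned m_Count;
};

// Runs a consumer that blocks on the semaphore once per work item,
// and returns how many items it handled.
inline int ConsumeItems(int items) {
    Semaphore workReady;
    int handled = 0;
    std::thread consumer([&] {
        for (int i = 0; i < items; ++i) {
            workReady.Wait(); // blocks; no polling, no yield loop
            ++handled;
        }
    });
    for (int i = 0; i < items; ++i) { workReady.Post(); }
    consumer.join();
    return handled;
}
```

The consumer never wakes up to find nothing to do, which is exactly the waste Con describes for Yield().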

Monday, September 8, 2008

Mutexes revisited; Scheduling questions.

I finally completed implementing the sySafeMutex class.

My locking routine was something like this:

bool SafeMutex::Lock() {
    bool success = false;
    while(!success) {
        success = TryLock();
        if(success) { break; }
        if(aborter && aborter->MustAbort()) return false;
        Sleep(1); // Sleep for one millisecond
    }
    return success;
}

The problem is that if the other thread uses the mutex during that sleep time, we run into a problem called resource starvation: the sleeping thread never gets the mutex, because other threads grab the lock while it sleeps. So instead of sleeping for a huge millisecond, I only sleep for one scheduling cycle:

bool SafeMutex::Lock() {
    bool success = false;
    while(!success) {
        success = TryLock();
        if(success) { break; }
        if(aborter && aborter->MustAbort()) return false;
        Yield(); // Sleep as little as possible
    }
    return success;
}

But the problem of resource starvation still threatens us. So what can we do? I added a tight loop that tries to lock the mutex for the first 30 iterations. A tight loop that keeps trying to lock a mutex is called a spinlock. Spinlocks are very CPU-inefficient (they burn CPU while they wait), but combined with sleeping they make things more balanced.

bool SafeMutex::Lock() {
    bool success = false;
    unsigned i = 0;
    while(!success) {
        success = TryLock();
        if(success) { break; }
        if(aborter && aborter->MustAbort()) return false;
        if(++i>=30) {
            i = 0; Yield();
        }
    }
    return success;
}

Now I thought: Let's make the last thread that used the mutex wait more the next time:

bool SafeMutex::Lock() {
    bool success = false;
    unsigned i = 0;
    bool waspenalized = false;
    while(!success) {
        if(!waspenalized && m_LastThread == GetCurrentThreadId()) {
            waspenalized = true;
            Yield(); // Penalty: the last owner yields once before trying
        }
        success = TryLock();
        if(success) { m_LastThread = GetCurrentThreadId(); break; }
        if(aborter && aborter->MustAbort()) return false;
        if(++i>=30) {
            i = 0; Yield();
        }
    }
    return success;
}

By penalizing the last thread that got the mutex with an additional Yield(), we avoid the situation of one thread getting the lock all the time. Unfortunately, this adds an unnecessary Yield() every single time we try to get the lock!

After I studied more, it turns out that mutexes, semaphores and all those synchronization primitives rely much more on thread scheduling than on simple tricks. For example, a Semaphore puts the first waiting thread next on the line for execution - this goes way beyond simple user implementations. But for our purposes, I think the proposed implementation (without the yield) will suffice.
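If you want to play with the spin-then-yield idea outside of Saya, here's a self-contained sketch. It is a stand-in, not the SafeMutex code: SpinYieldMutex and CountWithLock() are hypothetical names, the lock is a C++11 std::atomic_flag, and there is no aborter and no last-owner penalty.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for the SafeMutex above: spins on TryLock() and
// yields once every 30 failed tries.
class SpinYieldMutex {
public:
    SpinYieldMutex() { m_Flag.clear(); }

    // Returns true if the lock was free and is now ours.
    bool TryLock() { return !m_Flag.test_and_set(std::memory_order_acquire); }

    void Lock() {
        unsigned i = 0;
        while (!TryLock()) {
            if (++i >= 30) {
                i = 0;
                std::this_thread::yield(); // give the owner a chance to run
            }
        }
    }

    void Unlock() { m_Flag.clear(std::memory_order_release); }

private:
    std::atomic_flag m_Flag;
};

// Increments a plain counter from several threads, guarded by the lock.
inline long CountWithLock(int threads, int perThread) {
    SpinYieldMutex mutex;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int n = 0; n < perThread; ++n) {
                mutex.Lock();
                ++counter;
                mutex.Unlock();
            }
        });
    }
    for (std::thread& th : pool) { th.join(); }
    return counter;
}
```

This gives mutual exclusion but says nothing about fairness, which is precisely the part that depends on the scheduler.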

Update: I finally figured it out. Instead of penalizing the last winning thread BEFORE trying to lock the mutex, penalize it AFTER trying to lock it. This way, the other threads will try to spinlock the mutex, while the last winning thread will be sleeping. Let's extend that to the TWO last winning threads, and this is what we get:

bool sySafeMutex::Lock(syAborter* aborter) {
    bool result = false;
    unsigned int i = 0;
    unsigned long id = syThread::GetCurrentId();
    for(;;) {
        result = TryLock(aborter);
        if(result) { break; }
        if(aborter && aborter->MustAbort()) { break; }
        // Scheduling strategy:
        // After the first try, the last owner loses its chance of winning;
        // After the 5th try, the previous-to-last owner loses its chance of winning;
        // After the 50th try, all threads get the same chance of winning.
        if((i >= 50) || (id==m_LastOwner) || (id==m_LastOwner2 && i >= 5)) {
            // (Body missing from this listing. Per the text above, the penalized thread sleeps here.)
        } else {
            // (Body missing from this listing. Presumably the thread keeps spinning and counts the try.)
        }
    }
    return result;
}

void sySafeMutex::Unlock() {
    unsigned long id = syThread::GetCurrentId();
    if(m_Owner == id) {
        if(m_Recursive && m_LockCount > 1) {
            // (Body missing from this listing. Presumably decrements m_LockCount.)
        } else {
            // Set m_LockCount to 0.
            m_LockCount = 0;
            m_LastOwner2 = m_LastOwner;
            m_LastOwner = id;
            m_Owner = 0xFFFFFFFF;
        }
    }
}

The RAII pattern; Sentry classes

I just found an amazing article on a seemingly unknown pattern in programming: RAII (Resource Acquisition Is Initialization).

Basically, it is a class whose constructor initializes some resources, and whose destructor frees them. This prevents problems such as memory leaking or keeping a file locked when an exception occurs.

The concept can be generalized into "Sentry Classes". These classes perform an action on construction, and on destruction they clean up.

I had wondered what would happen in one of my classes if something had raised an exception (code has to be exception safe, did you know?). We already had prepared for locked mutexes on exceptions with the class wxMutexLocker (or in our case, syMutexLocker, sySafeMutexLocker, etc).

But what about flags? In VideoOutputDevice, there's a flag called m_Playing that tells us whenever we're sending a frame to the output buffer. What happens when (not if) an exception occurs during that?

void VideoOutputDevice::LoadVideoData(syBitmap* bitmap) {
    bool result = true;
    {
        syMutexLocker mylocker(*m_mutex);
        if(m_playing || MustAbort()) {
            result = false;
        } else {
            m_playing = true;
        }
    }
    if(result) {
        LoadDeviceVideoData(bitmap);
        // EXCEPTION OCCURS! The code is aborted and
        // m_playing is never set to false! The whole class
        // is rendered useless.
        m_playing = false;
    }
}

Eeew. Looks ugly, doesn't it? Well, pay attention to the new, improved version using Sentry classes:

void VideoOutputDevice::LoadVideoData(syBitmap* bitmap) {
    sySafeMutexLocker lock(*m_Busy, this);
    if(lock.IsLocked()) {
        syBoolSetter setter(m_Playing, true);
        LoadDeviceVideoData(bitmap);
    }
}

What? That's it? Yes!! The sySafeMutexLocker tries to lock the safe mutex m_Busy. If successful, we set the m_Playing flag to true, and then proceed to Load the Data from bitmap.

Whether or not an exception occurs in LoadDeviceVideoData, the setter is destroyed, setting m_Playing to its old value (which is false), and on the function exit, the safe mutex m_Busy is unlocked.

Here's the code for syBoolSetter:
/** @brief Class that sets a flag to a specific value during its lifetime. */
class syBoolSetter {
    public:
        /** Constructor */
        syBoolSetter(bool& flag,bool newvalue);

        /** Destructor. Sets the flag to its original value. */
        ~syBoolSetter();
    private:
        bool& m_Flag;
        bool m_Old;
};

syBoolSetter::syBoolSetter(bool& flag,bool newvalue) :
m_Flag(flag) {
    m_Old = m_Flag;
    m_Flag = newvalue;
}

syBoolSetter::~syBoolSetter() {
    m_Flag = m_Old;
}

Simple, effective and elegant.
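To see the sentry actually doing its job when an exception flies, here's a tiny stand-alone sketch. BoolSetter is the same idea as syBoolSetter, renamed so it's clear this is not the Saya source; LoadAndThrow() and FlagAfterFailure() are hypothetical helpers, and it assumes C++11 for "= delete".

```cpp
#include <cassert>
#include <stdexcept>

// Same idea as syBoolSetter, renamed to mark it as a stand-alone sketch.
class BoolSetter {
public:
    BoolSetter(bool& flag, bool newvalue) : m_Flag(flag), m_Old(flag) { m_Flag = newvalue; }
    ~BoolSetter() { m_Flag = m_Old; } // restore the old value, no matter how we leave the scope

    BoolSetter(const BoolSetter&) = delete;            // a sentry should not be copied
    BoolSetter& operator=(const BoolSetter&) = delete;

private:
    bool& m_Flag;
    bool m_Old;
};

// Hypothetical loader that fails halfway through.
inline void LoadAndThrow(bool& playing) {
    BoolSetter setter(playing, true);
    throw std::runtime_error("decoder failed");
}

// Returns the value of the flag after the exception has been handled.
inline bool FlagAfterFailure() {
    bool playing = false;
    try {
        LoadAndThrow(playing);
    } catch (const std::runtime_error&) {
        // Stack unwinding has already run ~BoolSetter() by the time we get here.
    }
    return playing;
}
```

The flag goes back to false even though the function never reached its last line, which is the whole point of the pattern.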

Sunday, September 7, 2008

New addition to threads module: Atomic Operations.

I realized that I could simplify a lot of code in the Saya Core by replacing some mutexes with atomic operations. I decided to encapsulate them in a class named syAtomic. This means Saya now requires not only GCC compatibility, but also a 486 or greater CPU (I added -march=i486 to the compiler flags). Given that most CPUs right now are Pentiums (and you'd certainly need a Pentium for video editing :P ), I'm sure there will be no problem with this.

Regarding the simplification, here's an example: the VideoInputDevice class has a flag called m_IsBusy, which it sets and unsets whenever it's doing an operation. I realized that for all intents and purposes, this was, in fact, a mutex (with an auxiliary mutex to lock access). I also realized that the only thing I did when waiting for m_IsBusy was to check if an abort signal was sent.

So I made my own class, sySafeMutex (now possible thanks to the atomic operations). When a thread tries to lock it, instead of just waiting for the mutex to be unlocked, it periodically checks whether an abort signal has been sent (an syAborter* parameter is passed to the Lock() function). Additionally, I added another flag to optionally run the check in a tight loop for the first 3 milliseconds. This will avoid wasted CPU cycles if the only thing we want is to change a variable.
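A rough sketch of the "lock, but give up if an abort is requested" behaviour follows. It is not the sySafeMutex implementation: AbortableLock is a hypothetical name, a plain std::atomic<bool> stands in for syAborter, it assumes C++11 atomics instead of syAtomic, and the 3-millisecond tight loop is left out.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical lock that checks an abort flag while it waits.
class AbortableLock {
public:
    AbortableLock() : m_Locked(false) {}

    // Returns true once the lock is acquired, or false if `abort` was set
    // while the lock was still held by someone else.
    bool Lock(const std::atomic<bool>& abort) {
        for (;;) {
            bool expected = false;
            if (m_Locked.compare_exchange_strong(expected, true,
                                                 std::memory_order_acquire)) {
                return true;
            }
            if (abort.load()) { return false; } // the periodic abort check
            std::this_thread::yield();
        }
    }

    void Unlock() { m_Locked.store(false, std::memory_order_release); }

private:
    std::atomic<bool> m_Locked;
};
```

A caller has to check the return value: false means the lock was never taken and Unlock() must not be called.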

I'll also make more simplifications to the code of the a/v device classes. Stay tuned.

Saturday, September 6, 2008

Threads module finished!!

Whew. I thought I would never finish this behemoth. It's still not tested and some things MIGHT not compile on Windows, but that can be fixed with a tiny bit of code revision.

Later I'll decide whether to add support for atomic functions or not.
If anyone's an expert in multithreading and OS design, please help me debug this thing.

Decided to rely on GCC features.

At first I wanted the project to be compilable by any compiler, but I have decided to stick with GCC (which is cross-platform, anyway), for two reasons:

1) GCC provides 64-bit integers for C++, which I need for the timeline.
2) It provides atomic operations, which are VERY useful for efficient multithreaded programming.

I won't say more, since it's kinda late and I need sleep. Good night.

Wednesday, September 3, 2008

Lock-free algorithms part 2: CAS

Hello again. I'll finish this lock-free programming with three links:

The first link is the Compare-and-swap atomic operation. It's atomic because most processors already have an instruction for it. Nothing can be more atomic than one cpu-instruction :)

In computer science, the compare-and-swap CPU instruction ("CAS") (or the Compare & Exchange - CMPXCHG instruction in the x86 and Itanium architectures) is a special instruction that atomically compares the contents of a memory location to a given value and, if they are the same, modifies the contents of that memory location to a given new value. The result of the operation must indicate whether it performed the substitution; this can be done either with a simple boolean response (this variant is often called compare-and-set), or by returning the value read from the memory location (not the value written to it).

CAS is used to implement synchronization primitives like semaphores and mutexes, as well as more sophisticated lock-free and wait-free algorithms. Maurice Herlihy (1991) proved that CAS can implement more of these algorithms than atomic read, write, and fetch-and-add, and that, assuming a fairly large amount of memory, it can implement all of them [1].
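To get a feeling for how CAS is used in practice, here's a minimal retry-loop sketch. The AtomicMax() and MaxOfThreads() helpers are made up for this example, and it assumes C++11's std::atomic, whose compare_exchange_weak is the "returns a boolean and hands back the value it saw" flavor described above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Atomically replaces `value` with max(value, candidate) using a CAS retry loop.
inline void AtomicMax(std::atomic<int>& value, int candidate) {
    int seen = value.load();
    // compare_exchange_weak stores `candidate` only if `value` still equals
    // `seen`; on failure it refreshes `seen` with the current value, so the
    // loop re-checks against whatever another thread just wrote.
    while (seen < candidate && !value.compare_exchange_weak(seen, candidate)) {
    }
}

// Several threads race to publish their largest number; no mutex involved.
inline int MaxOfThreads(int threads) {
    std::atomic<int> best(0);
    std::vector<std::thread> pool;
    for (int t = 1; t <= threads; ++t) {
        pool.emplace_back([&best, t] {
            for (int n = 0; n < 1000; ++n) { AtomicMax(best, t * 1000 + n); }
        });
    }
    for (std::thread& th : pool) { th.join(); }
    return best.load();
}
```

The pattern is always the same: read, compute, try to swap, and retry if somebody else got there first.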

The second link is an article from Dr. Dobb's Journal implementing the aforementioned Hazard Pointers.

And finally, an algorithm from AT&T research to implement STL-compatible (well, more or less) vectors using Hazard Pointers:


Lock-free algorithms and hazard pointers

And here I thought mutexes, critical sections, conditions and semaphores were all that would have to be learned about multithreaded programming. Turns out I was wrong.

It all began with a guy on (his nick was dshs) asking how to do inter-thread synchronization for a multimedia project (I invited him to join the team, but he politely declined). He said something about "lock free queues". So I wikipediaed it and here's what I read:

In contrast to algorithms that protect access to shared data with locks, lock-free and wait-free algorithms are specially designed to allow multiple threads to read and write shared data concurrently without corrupting it. "Lock-free" refers to the fact that a thread cannot lock up: every step it takes brings progress to the system. This means that no synchronization primitives such as mutexes or semaphores can be involved, as a lock-holding thread can prevent global progress if it is switched out. "Wait-free" refers to the fact that a thread can complete any operation in a finite number of steps, regardless of the actions of other threads. All wait-free algorithms are lock-free, but the reverse is not necessarily true. An intuitive way of understanding the difference between wait- and lock-free algorithms is that a thread executing an operation of a lock-free algorithm may not be impeded if another thread's execution is prematurely halted, whereas if the algorithm was wait-free, the first thread may not be impeded even if the second thread is aggressively interfering with the shared state.

Lock-free algorithms are one kind of non-blocking synchronization.

As I kept reading, I learned that algorithms for lock-free stacks, queues and even vectors are already out there. It means that you can read and write all the stuff you want in a multithreaded environment, without having to use a single semaphore, mutex or condition. Wow.

And that wasn't all. I just learned that a NEW algorithm called "Hazard pointers" was created for non-blocking synchronization, in 2004.

So here's the description of a Hazard Pointer, right from Wikipedia:

In a multithreaded computing environment, a hazard pointer is an element used by a methodology that allows the memory allocated to the nodes of lock-free dynamic shared objects to be reclaimed. Using the methodology, each thread keeps a list of hazard pointers indicating which nodes the thread may later access. This list can only be written to by the particular thread but can be read by any thread. When a thread wishes to remove a node, it places it on a private list and periodically scans the lists of all other threads for pointers referencing that node. If no such pointers are found the memory occupied by the node can be safely freed.

This was JUST four years ago! It's a revolution that could be comparable to fast genetic sequencing, and I'm only learning about it right now! Where have I been these years? Oh, right. Working.

If you want to know more about Hazard pointers, you can read the Hazard Pointers Full Article (PDF), courtesy of IBM Research.

I don't know if this will be of use later for Saya, where we'll have to quickly deal with low-latency stuff. I may well need it later.