
Details and comparison of miRack and VCV Rack multithreading implementation and performance

June 9, 2020

I implemented multithreading for the miRack audio engine back in 2018, when it was a project targeting single-board computers: with their slower CPUs, multithreading is essentially a requirement for running any decently sized patch. The implementation (available here, and used with minor modifications in the miRack app) is based on the idea of having arrays for module input and output values, plus a lock-free concurrent work queue implementation by Cameron Desrochers.

For each rendering cycle of, say, 512 samples (steps), all rack modules are pushed into a work queue together with the start and end steps to process (so initially 1 and 512), and then the worker threads are woken up. The workers dequeue modules from the work queue and check that values for the step being processed are present for all module inputs (for disconnected inputs this is always true). If all values are available, the module is processed, and its output values are saved straight into the input arrays of the modules connected to each output, at the next step number. This continues until the end step is reached or until one of the input values is not available, in which case the module is pushed back into the work queue (with its start step updated if needed) and another module is pulled from the queue. Once there are no more modules in the work queue, the workers pause and the rendering cycle completes. This implementation ensures that workers don't wait unless they have to.
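A minimal C++ sketch of this scheduling scheme, for illustration only and not the actual miRack code. Assumptions: a mutex-protected `std::deque` stands in for Desrochers' lock-free queue, the modules form a hypothetical chain where each module has one input and one output, and `Module`, `WorkItem` and `renderChain` are made-up names. Each input keeps an atomic "ready up to step" counter, so a worker can tell whether the value for the step it wants is already there.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical module with one input and one output (a chain, for simplicity).
struct Module {
    bool hasInput = false;            // module 0 is a source; its input is disconnected
    std::vector<float> in;            // in[s] is the input value for step s (1-based)
    std::atomic<int> inReadyUpTo{0};  // highest step whose input value is present
    Module* target = nullptr;         // module fed by our output, if any
    std::vector<float> out;           // outputs, kept only to check the result
};

struct WorkItem { Module* m; int start; };

// Stand-in for a lock-free concurrent queue: a deque guarded by a mutex.
class WorkQueue {
public:
    void push(WorkItem w) { std::lock_guard<std::mutex> g(mu_); q_.push_back(w); }
    bool tryPop(WorkItem& w) {
        std::lock_guard<std::mutex> g(mu_);
        if (q_.empty()) return false;
        w = q_.front();
        q_.pop_front();
        return true;
    }
private:
    std::mutex mu_;
    std::deque<WorkItem> q_;
};

// Renders steps 1..steps of a chain of `count` modules on `workers` threads and
// returns the last module's outputs (index 0 unused).
std::vector<float> renderChain(int count, int steps, int workers) {
    assert(count >= 1 && steps >= 1 && workers >= 1);
    std::vector<std::unique_ptr<Module>> mods;
    for (int i = 0; i < count; ++i) {
        auto m = std::make_unique<Module>();
        m->hasInput = i > 0;
        m->in.assign(steps + 2, 0.0f);  // in[1] = 0 plays the previous cycle's value
        m->inReadyUpTo = 1;
        m->out.assign(steps + 1, 0.0f);
        mods.push_back(std::move(m));
    }
    for (int i = 0; i + 1 < count; ++i) mods[i]->target = mods[i + 1].get();

    WorkQueue queue;
    // Enqueue downstream modules first so that workers actually hit missing inputs.
    for (int i = count - 1; i >= 0; --i) queue.push({mods[i].get(), 1});
    std::atomic<int> remaining{count};  // modules that have not reached the end step

    auto worker = [&] {
        while (remaining.load() > 0) {
            WorkItem item;
            if (!queue.tryPop(item)) { std::this_thread::yield(); continue; }
            Module* m = item.m;
            int s = item.start;
            for (; s <= steps; ++s) {
                // Stop as soon as the input value for step s has not been produced yet.
                if (m->hasInput && m->inReadyUpTo.load(std::memory_order_acquire) < s) break;
                float y = m->hasInput ? m->in[s] + 1.0f : static_cast<float>(s);
                m->out[s] = y;
                if (m->target) {
                    // Save the output straight into the connected input, for the next step.
                    m->target->in[s + 1] = y;
                    m->target->inReadyUpTo.store(s + 1, std::memory_order_release);
                }
            }
            if (s <= steps) queue.push({m, s});  // requeue with the updated start step
            else remaining.fetch_sub(1);
        }
    };
    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i) pool.emplace_back(worker);
    for (std::thread& t : pool) t.join();
    return mods.back()->out;
}
```

With these toy modules (the source outputs the step number, every other module adds 1 to a one-step-delayed input) every output equals its step number, so `renderChain(6, 512, 4)[s] == s` is an easy correctness check regardless of how the workers interleave.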

Until recently I had never looked at the multithreading implementation that later appeared in VCV Rack (available here), but I wanted to run some benchmarks at some point.

During normal operation, the VCV Rack implementation uses spinlocks only. For each step in a rendering cycle, workers process only that single step of each module. Once there are no more modules for a worker to pick up, it spin-waits until all workers have finished; then values are transferred from outputs to connected inputs, and the workers are woken up to process the next single step. As a result, the workers spend a lot of time waiting when they could be processing the next steps of some modules.
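For contrast, a minimal C++ sketch of such a lockstep scheme, using the same toy chain of modules. It is an illustration under assumptions, not VCV Rack's code: `SpinBarrier` and `renderChainLockstep` are hypothetical names, modules are handed out through an atomic counter, and the spin loop calls `std::this_thread::yield()` so the sketch stays usable on machines with fewer cores than workers.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical sense-reversing spin barrier: waiting threads busy-wait.
class SpinBarrier {
public:
    explicit SpinBarrier(int n) : n_(n), count_(n) {}
    void wait() {
        bool sense = sense_.load(std::memory_order_relaxed);
        if (count_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            count_.store(n_, std::memory_order_relaxed);      // reset for the next phase
            sense_.store(!sense, std::memory_order_release);  // release the spinners
        } else {
            while (sense_.load(std::memory_order_acquire) == sense) std::this_thread::yield();
        }
    }
private:
    const int n_;
    std::atomic<int> count_;
    std::atomic<bool> sense_{false};
};

// Every step, workers share out the modules for that one step, then all of them
// wait at the barrier while one thread copies outputs to connected inputs.
// Returns the last module's output for steps 1..steps (index 0 unused).
std::vector<float> renderChainLockstep(int count, int steps, int workers) {
    assert(count >= 1 && steps >= 1 && workers >= 1);
    std::vector<float> in(count, 0.0f), out(count, 0.0f), last(steps + 1, 0.0f);
    std::atomic<int> next{0};  // index of the next module to pick up in this step
    SpinBarrier barrier(workers);

    auto worker = [&](int id) {
        for (int s = 1; s <= steps; ++s) {
            for (int k = next.fetch_add(1); k < count; k = next.fetch_add(1))
                out[k] = (k == 0) ? static_cast<float>(s) : in[k] + 1.0f;  // one step only
            barrier.wait();  // idle workers wait here until every module finished step s
            if (id == 0) {
                for (int k = 1; k < count; ++k) in[k] = out[k - 1];  // outputs -> inputs
                last[s] = out[count - 1];
                next.store(0);
            }
            barrier.wait();  // nobody starts step s + 1 before the transfer is done
        }
    };
    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i) pool.emplace_back(worker, i);
    for (std::thread& t : pool) t.join();
    return last;
}
```

The result is the same as with the work-queue sketch, but every step costs two barrier crossings, and a worker that runs out of modules for step `s` cannot move on to step `s + 1` of a module whose inputs are already known.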

Now to the benchmark. I used the current miRack code and the latest VCV Rack code. All graphics rendering was disabled, as well as audio output. For VCV Rack, updating port lights was also disabled - it involves a lot of computations that substantially affect the results while not being related to audio processing.

The audio engines were told to process 1024 samples (steps) as fast as they could, repeated 1000 times, first with a single thread and then with 2, 3, and 4 worker threads. The tests were performed on a CPU with 4 physical cores. The following patches (by VCV Rack Ideas) were used:

1st Patch Results

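Threads | miRack time | miRack % | VCV Rack time | VCV Rack % | Ideal %
------- | ----------- | -------- | ------------- | ---------- | -------
1       | 4242 ms     | 100.00%  | 6313 ms       | 100.00%    | 100.00%
2       | 2236 ms     | 52.71%   | 5179 ms       | 82.04%     | 50.00%
3       | 1620 ms     | 38.19%   | 4604 ms       | 72.93%     | 33.33%
4       | 1341 ms     | 31.61%   | 4312 ms       | 68.30%     | 25.00%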

2nd Patch Results

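Threads | miRack time | miRack % | VCV Rack time | VCV Rack % | Ideal %
------- | ----------- | -------- | ------------- | ---------- | -------
1       | 4904 ms     | 100.00%  | 6203 ms       | 100.00%    | 100.00%
2       | 3054 ms     | 62.28%   | 4944 ms       | 79.70%     | 50.00%
3       | 2575 ms     | 52.51%   | 4578 ms       | 73.80%     | 33.33%
4       | 2357 ms     | 48.06%   | 4455 ms       | 71.82%     | 25.00%

The "%" columns show each time as a percentage of the single-threaded time, and "Ideal %" shows the best theoretically achievable result: an N-fold speedup for N threads, i.e. 100% / N.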

I should also note that the original goal was to compare the multithreaded speed-up, not absolute times (not least because miRack and VCV Rack use different versions of some of the patch modules), but the absolute times turned out to be quite interesting as well. As mentioned above, the port lights update code adds about another second to the VCV Rack results.
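For reference, a small C++ sketch of how such a table can be produced. `timeRunsMs` is a hypothetical helper that times repeated calls of some render function (for example, one 1024-step run of an engine, which is assumed here and not shown), and `relativePercent` turns the times for 1, 2, 3 and 4 threads into the "%" columns.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Times `repeats` calls of `renderBlock` and returns the total in milliseconds.
double timeRunsMs(const std::function<void()>& renderBlock, int repeats) {
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) renderBlock();
    std::chrono::duration<double, std::milli> elapsed = std::chrono::steady_clock::now() - t0;
    return elapsed.count();
}

// Converts times for 1, 2, ..., N threads into "% of the single-threaded time".
std::vector<double> relativePercent(const std::vector<double>& ms) {
    assert(!ms.empty() && ms[0] > 0.0);
    std::vector<double> pct;
    for (double t : ms) pct.push_back(100.0 * t / ms[0]);
    return pct;
}

// The "Ideal %" column: an N-fold speedup on N threads.
double idealPercent(int threads) { return 100.0 / threads; }
```

Applied to miRack's times for the first patch, `relativePercent({4242, 2236, 1620, 1341})` reproduces 100%, 52.71%, 38.19% and 31.61%. Expressed as speedups, 4 threads give 4242 / 1341 ≈ 3.16 for miRack versus 6313 / 4312 ≈ 1.46 for VCV Rack.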

·

Support for closed-source/commercial VCVRack plugins in miRack

June 26, 2018

Great news! I've managed to implement loading of closed-source VCVRack plugins into miRack. This means it now makes sense to build packages for desktop operating systems. And soon you will be able to enjoy all the benefits of miRack, including lower CPU usage, more responsive UI and multithreaded processing - and still use all the same plugins you have, including commercial ones you purchased.

Of course, this does not affect miRack running on ARM boards: only open-source plugins can be used there, because they need to be compiled for ARM in the first place.

·

Just to give you an idea what’s wrong with VCVRack

May 1, 2018

Here is the windowRun() function, which is the event handling/rendering loop. It runs at the VSync rate (if VSync is supported and enabled) or at 90 FPS otherwise; let's assume it's 60 FPS. Each frame, cursorPosCallback() is called. That function doesn't check whether the cursor position has actually changed. It does a number of things, but here we're interested in this line, where onDragMove() is called for gDraggedWidget if it's not NULL. gDraggedWidget is set here whenever you press the left mouse button over a widget. Suppose you pressed it over a module background (not a control): then requestModuleBoxNearest() will be called here to find a new position for the module so that it doesn't overlap other modules.

So far we see that a function is being called even when it's not really necessary. That's bad but not the end of the world, right? Now on to the interesting part. Here is the full code of this function:

bool RackWidget::requestModuleBoxNearest(ModuleWidget *m, Rect box) {
    // Create possible positions
    int x0 = roundf(box.pos.x / RACK_GRID_WIDTH);
    int y0 = roundf(box.pos.y / RACK_GRID_HEIGHT);
    std::vector<Vec> positions;
    for (int y = max(0, y0 - 8); y < y0 + 8; y++) {
        for (int x = max(0, x0 - 400); x < x0 + 400; x++) {
            positions.push_back(Vec(x * RACK_GRID_WIDTH, y * RACK_GRID_HEIGHT));
        }
    }

    // Sort possible positions by distance to the requested position
    std::sort(positions.begin(), positions.end(), [box](Vec a, Vec b) {
        return a.minus(box.pos).norm() < b.minus(box.pos).norm();
    });

    // Find a position that does not collide
    for (Vec position : positions) {
        Rect newBox = box;
        newBox.pos = position;
        if (requestModuleBox(m, newBox))
            return true;
    }
    return false;
}

Isn't this wonderful? Create an array of 12800 vectors (16 rows × 800 columns), then sort it, computing more than 100k square roots (std::sort needs on the order of n·log2(n) ≈ 175k comparisons for n = 12800, each computing two norms), not counting other operations, then iterate over these vectors again until we find a non-overlapping position (and requestModuleBox() iterates over all the modules each time too). And all this at 60 FPS when you have just pressed a mouse button and haven't even moved the cursor (and of course it can be done much faster and more simply when you do move)!
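To illustrate the cheaper direction, a hypothetical C++ sketch (not Rack's API): `Vec` is a stand-in struct, the `isFree` callback plays the role of requestModuleBox(), and `SlotFinder` skips the search entirely when the requested position hasn't changed since the last call. Because the square root is monotonic, ranking candidates by squared distance orders them the same way as the norm does, without computing any square roots. The cache assumes the other modules don't move while one is being dragged.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

struct Vec { float x, y; };  // stand-in for Rack's Vec

// Squared Euclidean distance: same ranking as the norm, no sqrt needed.
static float dist2(Vec a, Vec b) {
    float dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Hypothetical nearest-free-slot search with a "nothing changed" shortcut.
class SlotFinder {
public:
    std::optional<Vec> find(Vec req, const std::vector<Vec>& candidates,
                            const std::function<bool(Vec)>& isFree) {
        if (last_ && last_->x == req.x && last_->y == req.y) return result_;  // cursor did not move
        ++searches;
        last_ = req;
        std::vector<Vec> sorted = candidates;
        std::sort(sorted.begin(), sorted.end(),
                  [req](Vec a, Vec b) { return dist2(a, req) < dist2(b, req); });
        result_.reset();
        for (Vec p : sorted) {
            if (isFree(p)) { result_ = p; break; }
        }
        return result_;
    }
    int searches = 0;  // number of real searches performed (for illustration)

private:
    std::optional<Vec> last_, result_;
};
```

Calling `find` every frame with the same requested position now costs one comparison instead of a sort of the whole candidate list.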

Of course, this is not the only place with such... um... coding style. On the bright side, now I know why modules don't move quite as smoothly as desired (on Tinker Board this mess drops the frame rate below 20 FPS when dragging a module).

Even better news is that the generally low FPS I was experiencing on Tinker Board can be fixed just by using a full-screen window. There's a bug in the GPU driver, Xorg or somewhere else. Anyway, I'll just default to full screen on ARM (Raspberry Pi doesn't have this problem, but it won't hurt) and won't have to work on running without an X server for now.


miRack: an optimised fork of VCVRack primarily targeting Raspberry Pi, ASUS Tinker Board and similar hardware, which will keep your desktop cooler too.

·

miRack is now available

May 1, 2018

miRack is a virtual modular synthesizer: a fork of VCVRack with optimisations and tweaks aimed primarily at running on Raspberry Pi 3, ASUS Tinker Board and similar hardware, although it can also be used on macOS or Linux desktops. See the announcement post for background information about this project.

The miRack repository is now available on GitHub: github.com/mi-rack/Rack.

Rack itself is an engine/plugin host, and the actual modules are implemented in plugins written by other developers. There are a lot of open-source plugins that can be used with miRack (commercial and closed-source plugins will obviously not work on an ARM CPU). These plugins have to be checked and optimised as well, and some of them have already been forked into the miRack organisation on GitHub. The Rack repository includes a list of the plugins I know about, with their compatibility/optimisation status. Plugins from this list can be installed automatically, which ensures that an optimised version is used if one is available.

In general, I've made installation on Debian-based Linux (Raspbian and Tinker OS) and macOS as simple as possible, but please read the README carefully: there are important differences from VCVRack, as well as some platform-specific notes.

The plan now is to fix more things in Rack, check and optimise more plugins, and start thinking about hardware for a portable synth. I guess I need to buy a touchscreen/case and some knobs, tweak the UI for (multi-)touch, and get rid of the X server, which hurts UI performance too much on Tinker Board.

·

Introducing miRack – an optimised fork of VCVRack for Raspberry Pi and others

April 20, 2018

Lately I got distracted from Dwarf Fortress Remote development by discovering VCVRack, an open-source virtual Eurorack-style modular synthesizer written by Andrew Belt. I don't even remember how it happened; I'm not really a musician, but I have a soft spot for toying with music apps (and for developing them, as you know: SoundGrid, SoundGrid Live!). Then, also by accident, I noticed a discussion about running it on Raspberry Pi, which basically stated that it was unusable on such hardware. I could not resist changing that, especially since I finally had a reason to buy an RPi.

Important note: I actually started development on an RPi 3 Model B, but then ended up buying an ASUS Tinker Board instead (because it looks cool). In tests, Tinker Board is about twice as fast as Raspberry Pi. However, without active cooling my board throttles to at most about 1.2-1.4 GHz (from the original 1.8 GHz), so it's not that much faster than an RPi when I'm running Rack on it. Below, when I say RPi I'm referring to all similar single-board computers; just take into account that your results may differ from mine.

VCVRack is a wonderful project, and so is the community that formed around it quite quickly, with many developers making plugins that implement various modules (the Rack application itself is just a host and doesn't include any modules apart from MIDI and audio in/out). Unfortunately, performance-wise it sucks: the choice of frameworks, the base Rack code itself, and the fact that plugins are created by different people with different levels of expertise and care for performance all lead people to complain about performance even on desktop hardware.

Interestingly, the UI part (as opposed to the computational part) was the worst and took the most time to get to a decent FPS on RPi hardware. Rack uses SVG for all UI elements and NanoVG for rendering, and with improper use and without additional optimisations to both NanoVG and Rack, each frame required a thousand or more OpenGL calls (depending on the number of modules added). This is bad practice, unacceptable even on desktop hardware, and just could not work on RPi. I've reduced this number to several dozen when idle, plus several dozen more when adjusting knobs and sliders (not counting frequently updated widgets like scopes, which simply have to be fully re-rendered every time).
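One common way to cut per-frame OpenGL calls is to render a mostly static widget once into an offscreen image and, on later frames, just draw that image until something marks the widget dirty. Below is a GPU-free C++ model of that idea; the names `CachedWidget` and `drawFrame` are hypothetical, the functions only count simulated calls, and this is not presented as the actual change made to Rack or NanoVG.

```cpp
#include <cassert>
#include <vector>

// Hypothetical widget whose vector drawing is cached in an offscreen image.
struct CachedWidget {
    int pathCalls;      // GL calls needed to draw the widget's SVG from scratch
    bool dirty = true;  // set when the appearance changes (e.g. a knob is turned)
};

// Draws one frame and returns the number of simulated GL calls issued.
int drawFrame(std::vector<CachedWidget>& widgets) {
    int calls = 0;
    for (CachedWidget& w : widgets) {
        if (w.dirty) {
            calls += w.pathCalls;  // re-render the SVG into the offscreen cache
            w.dirty = false;
        }
        calls += 1;  // draw the cached image
    }
    return calls;
}
```

With 10 widgets of 50 path calls each, the first frame costs 510 calls, an idle frame costs 10, and turning one knob brings a frame back up to 60 rather than 510.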

I did remove some of the visual effects, like shadows and light halos, because on weak hardware performance matters more than visual effects after all. Some of them may be enabled again later (and I definitely need to bring back different colours for cables) once I assess their impact on FPS. I'm also planning to add support for running without an X server to GLFW (the library Rack uses for OpenGL window creation and mouse/keyboard events). In my tests I noticed a significant FPS increase without an X server on my hardware.

Then I implemented multithreading for the actual signal processing. This isn't strictly necessary on desktop, but it's vital on multi-core hardware with much slower individual cores. I also made the engine run only from the audio driver's callback. This is probably temporary, but otherwise I noticed some performance issues related to thread synchronisation. I need to see what side effects this change has, apart from not being able to route output to a DAW (via VCVRack's Bridge plugin), but that's not important when running on RPi anyway.
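A minimal C++ sketch of an engine driven only by the audio callback: whenever the driver asks for a block of frames, the callback runs exactly that many engine steps before returning, so there is no separate engine thread to synchronise with. `Engine`, `audioCallback` and its signature are hypothetical; real audio APIs define their own callback signatures, and the single sine oscillator stands in for stepping all modules.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical engine whose only clock is the audio driver.
struct Engine {
    float sampleRate = 48000.0f;
    float phase = 0.0f;
    long steps = 0;  // total engine steps run so far

    // Stand-in for stepping every module once; returns one output sample.
    float step() {
        ++steps;
        phase += 440.0f / sampleRate;
        if (phase >= 1.0f) phase -= 1.0f;
        return std::sin(2.0f * 3.14159265f * phase);
    }
};

// Called by the (hypothetical) audio driver whenever it needs `frames` more samples.
void audioCallback(Engine& engine, float* out, int frames) {
    assert(out != nullptr && frames >= 0);
    for (int i = 0; i < frames; ++i) out[i] = engine.step();  // engine runs in the audio thread
}
```

The trade-off matches the one described above: the engine only advances while an audio device is pulling samples, so anything that expects the engine to run on its own clock no longer works.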

And then there were various changes to individual modules: to avoid sample rate conversion, to tweak quality vs. performance, to fix rendering, to cache values instead of recalculating them every step, or at least to avoid computing values for module outputs that are not connected.
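Two of these module-level tweaks, sketched in C++ on a hypothetical oscillator (`Osc` and its members are illustrative names, not a particular plugin's code): the exp2-based pitch-to-frequency conversion is cached and recomputed only when the pitch input changes, and the square output is computed only when something is connected to it. The 1 V/oct mapping with 0 V = C4 (261.63 Hz) is assumed here as the usual Rack convention.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical oscillator: caches a derived value and skips unconnected outputs.
struct Osc {
    bool sawConnected = true;  // in a real module these come from the patch
    bool squareConnected = false;
    float saw = 0.0f, square = 0.0f;
    int freqUpdates = 0;  // how many times the exp2() conversion ran

    void step(float pitchVolts, float sampleRate) {
        assert(sampleRate > 0.0f);
        if (pitchVolts != cachedPitch_) {  // recompute only when the pitch input changes
            cachedPitch_ = pitchVolts;
            freq_ = 261.6256f * std::exp2(pitchVolts);  // 1 V/oct, 0 V = C4 (assumed)
            ++freqUpdates;
        }
        phase_ += freq_ / sampleRate;
        if (phase_ >= 1.0f) phase_ -= 1.0f;
        if (sawConnected) saw = 10.0f * phase_ - 5.0f;                // +/-5 V saw
        if (squareConnected) square = phase_ < 0.5f ? 5.0f : -5.0f;  // skipped when unpatched
    }

private:
    // NaN never compares equal, so the very first step always computes the frequency.
    float cachedPitch_ = std::numeric_limits<float>::quiet_NaN();
    float freq_ = 0.0f, phase_ = 0.0f;
};
```

With a steady pitch, `exp2()` runs once instead of once per sample, and an unpatched output costs a single branch.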

Of course, this project has limitations when running on RPi and similar hardware, and what it can handle obviously depends on the number of modules used in a patch and on the modules themselves.

The VCVRack license disallows the use of some of its graphics resources and of the "VCV" name in derivative works, so I need to take care of this (and of some occasional crashes) before a public release. My software will be named miRack. Once it's ready, I'll explain in more detail the differences between it and VCVRack, its limitations, and compatibility with existing plugins (in short, plugins are source-compatible). By the way, it can of course also be used on desktop hardware (at least on Mac and Linux, as I haven't tried building on Windows), and the UI performance improvements are noticeable there too.

·