MAJIQ 2 is Out!

MAJIQ 2 is out! – What’s all the excitement?

So…. We *finally* released MAJIQ 2.0, our tool for splicing detection, quantification, and visualization. What’s the excitement all about?

ImSoExcited

I mean, it’s not like there is a shortage of tools/pipelines for splicing quantification from RNA-Seq…. But there are many excellent reasons to get excited about MAJIQ in general, and about the 2.0 version in particular. I discuss some of these reasons below.

*Hint: Those of you who already know what MAJIQ can do can skip straight to Reason 3

Reason 1: Ability to detect, quantify, and visualize complex and de-novo splicing variations from RNA-Seq.

Most software for splicing quantification relies solely on an annotated transcriptome given as input. This means you are bound to miss everything not in the annotation. “That’s minor” you say? Well, we have shown that even if you just compare normal mouse tissues and use Ensembl, you gain ~30% more significant (>20% change in inclusion) differentially spliced events that are just as reproducible as the annotated ones. And if you are studying disease (cancer, anyone? Cryptic splicing, maybe?) or other perturbations you may not want to overlook those de-novo variants…

Also, most software tools only study “classic” splicing events like exon skipping or alternative 3’/5’ splice sites. We have shown that complex events, i.e. those involving 3 or more alternative junctions, are extremely common (> 30%!) in human, mouse, and other species. And yes, that complexity is actually used even in normal tissues (see our eLife paper for details). So, again, if you are studying splicing and are only using annotated and “classical” events you may be missing a lot of the things you are looking for….

Dealing with complex splicing variations requires the ability to visualize them, validate them, connect them back to the raw read rate etc. Fortunately, we have taken care of these things with a visualization package, VOILA, and a web-tool for automated RT-PCR primer design for LSVs (local splicing variations), called MAJIQ-SPEL (we kept with the magic theme…)

Recently, many tools have been released that advertise the ability to detect de-novo and complex splicing. Examples include Whippet and LeafCutter. A comprehensive comparison is beyond the scope of this blog post (don’t worry, we will have comparisons in an upcoming paper…) but at this point we would only like to point out that each of those other methods lacks some features/capabilities compared to MAJIQ. For example, Whippet only detects connections between known junctions, LeafCutter is unable to detect intron retention, etc. The bottom line is that we are not aware of another tool that offers such a package of capabilities for detection, quantification, visualization, and validation of de-novo, complex events, including intron retention (IR), de-novo junctions, and de-novo exons.

Here is an illustration of what MAJIQ can help you get: splice graphs of Col11a2, a key gene in ear development. The splice graphs, the visualization of the event, and the quantification of the change are all part of the output. Green arcs/rectangles are de-novo junctions/exons capturing mis-splicing caused by knockdown (KD) of the splice factor ESRP1. This beautiful study, led by Alex Rohacek, originated from a case of a deaf child with rare mutations in ESRP.

Col11a2.Rohacek2017

Reason 2: MAJIQ’s accuracy compares favorably to other algorithms.

OK, so maybe you are convinced MAJIQ might be useful for your data analysis, but is it actually accurate??

Good question. We did extensive testing with both real and “realistic” synthetic data using a variety of metrics and found MAJIQ compares favorably to other tools. Here is an illustration of those evaluations:

The graph on the left shows reproducibility ratio (RR) plots. This is the ratio of events called as differentially spliced between two conditions and reproduced (y-axis) as a function of their relative ranking/confidence (x-axis). The two conditions are cerebellum vs. liver, 5 samples each, from GTEx. An event counts as reproduced when it is called again after you repeat the analysis with a different set of samples (same set size, same tissues). The right plot, called IIR (intra to inter ratio), is a proxy for the ratio of putative false positive events (percent, y-axis). We previously performed extensive comparisons to other algorithms as well, such as DEXSeq, SUPPA2, and MISO, including evaluation on “realistic” synthetic data and comparison to triplicates of RT-PCRs, the gold standard in the RNA field (see here, here, and here). Again, a more extensive analysis will appear in a paper we are writing. I will only add that in the only case where claims were made that MAJIQ does not perform well we showed, clearly and extensively, that those claims were false and based on critical misuse of our software (see here).
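To make the two metrics a bit more concrete, here is a minimal Python sketch of how they could be computed from ranked event lists. This is not MAJIQ code: the function names, the input format, and the choice to count an event as reproduced when it lands in the top N of the repeat run (N being the number of events called in the first run) are all assumptions made for illustration.

```python
def reproducibility_ratio(called_run1, ranked_run2):
    """Return the RR curve for one tool (illustrative sketch, not MAJIQ's code).

    called_run1: events called differentially spliced in the first run,
                 ordered from most to least confident.
    ranked_run2: all events from the repeat run (different samples, same
                 tissues), ordered from most to least confident.
    Assumption: an event is "reproduced" if it appears among the top
    len(called_run1) events of the repeat run.
    """
    n_called = len(called_run1)
    top_run2 = set(ranked_run2[:n_called])
    curve = []
    reproduced = 0
    for n, event in enumerate(called_run1, start=1):
        if event in top_run2:
            reproduced += 1
        # Fraction of the top n events from run 1 that were reproduced.
        curve.append(reproduced / n)
    return curve


def intra_to_inter_ratio(n_intra, n_inter):
    """Return IIR: events called within one condition (putative false
    positives) divided by events called between the two conditions."""
    if n_inter == 0:
        raise ValueError("no events called between conditions")
    return n_intra / n_inter


# Hypothetical event identifiers, for illustration only.
run1 = ["e1", "e2", "e3", "e4"]
run2 = ["e2", "e1", "e5", "e3", "e4", "e6"]
rr_curve = reproducibility_ratio(run1, run2)
iir = intra_to_inter_ratio(n_intra=5, n_inter=100)
```

The last point of the RR curve is the overall fraction of called events that reproduced, and a lower IIR suggests fewer putative false positives.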

NOTE: while we are obviously promoting MAJIQ here and we think it generally performs well, it’s still possible that for *your* data/use case MAJIQ really sucks…. That’s why we advocate in our papers/talks the usage of several metrics such as RR and IIR described above and make our code/data available so you can try those evaluations on your data and see for yourself what works….

Reason 3: MAJIQ 2 is *way* faster and more memory and I/O efficient

MAJIQ’s main disadvantage compared to alternative methods was speed. Let me emphasize that we believe the added ability to detect complex, de-novo variations and intron retention events is in many cases worth the extra time, even more so given the improved accuracy. Or, to put the accuracy vs. speed into a visual summary:

FastVsAccurate

Nonetheless, we do admit that speed is important and we worked hard to make 2.0 much faster. How much faster? Well, they say a picture is worth a thousand words, so here is a comparative analysis we did with GTEx samples:

As you can see, compared to MAJIQ 1.1, MAJIQ 2.0 is ~10 times faster and is now comparable to rMATS and LeafCutter. It is also very efficient in terms of memory consumption (e.g. in all of the above tests peak memory consumption hovers stably around 2GB RAM, i.e. you can run it on a laptop).

So, with 2.0 we brought MAJIQ to be as fast as, and in some cases faster than, the current most efficient methods. To be clear, we are not aiming to build the fastest tool. There is substantial overhead (which results in additional compute time) in modeling de-novo events, especially de-novo intron retention. That overhead is not going away. But we are aiming to make 2.0 as useful as possible, and for that speed is important.

Btw, the speed improvements are due not only to code improvements and porting to C++. Amongst other things we significantly revised the algorithm for intron retention. To be honest, IR has been a *huge* headache for us. It was basically *the* thing that prevented us from releasing 2.0 last year. So much so, that Jordi, the main developer behind MAJIQ, announced on the group chat that if anyone else reports another bug with intron retention he will be doing this:

JordiIntronRetention

Good news: He didn’t. And we got it to work 😁

Reason 4: New visualization (VOILA 2.0)

VOILA 2.0 has changed significantly. Much of it is right now only under the hood (though see below about new features coming soon..). There are two main differences you will observe now. The first is that VOILA runs as a local service. This allows for additional features, like search, which were not available before. The second is that the improved implementation requires less disk space and is more responsive for large datasets.

Here is an example of how it looks (notice the search box on the top left)

Reason 5: Ability to analyze hundreds and thousands of samples

The new speed and new visualization also mean MAJIQ supports the analysis of hundreds and thousands of samples for large datasets such as GTEx and TCGA. There are more features and algorithmic improvements that will build on it and improve such analysis even more (see below).

Reason 6: Why so negative? (Support for a confident negative set)

One feature we found missing in practically all other tools is the ability to get a confident set of events that are *not* changing. Most tools focus on protecting from false positives, trying to find things that are confidently changing. However, as RNA researchers, we and our collaborators have found it very useful to be able to get a set to compare against – a very “clean” set of things we are confident are not changing.
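The distinction between “not called as changing” and “confidently not changing” can be sketched as follows. This is only an illustration and not MAJIQ’s implementation or API: it assumes you have posterior samples of the inclusion change (dPSI) for an event, and the function name, the 0.05 no-change threshold, and the 0.95 confidence level are hypothetical choices. The 0.2 change threshold mirrors the >20% change in inclusion mentioned earlier.

```python
def classify_event(dpsi_samples, change_threshold=0.2,
                   no_change_threshold=0.05, confidence=0.95):
    """Label one event from posterior samples of its inclusion change (dPSI).

    Hypothetical sketch: thresholds and confidence level are assumptions.
    Returns "changing", "non-changing", or "undetermined".
    """
    if not dpsi_samples:
        raise ValueError("need at least one dPSI sample")
    n = len(dpsi_samples)
    # Posterior mass on a large change vs. on a negligible change.
    p_changing = sum(abs(d) >= change_threshold for d in dpsi_samples) / n
    p_stable = sum(abs(d) <= no_change_threshold for d in dpsi_samples) / n
    if p_changing >= confidence:
        return "changing"
    if p_stable >= confidence:
        return "non-changing"
    # Neither hypothesis is well supported, e.g. low coverage or a small shift.
    return "undetermined"
```

The point of the third label is that events failing the “changing” test are not automatically placed in the negative set; only those with strong evidence for a negligible change are.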

Reason 7: Wait! There is more!

Finally, a major reason we are excited about MAJIQ 2.0 is that it lays the code base for many exciting new algorithmic and visualization improvements, with applications to new research questions, so stay tuned!

WereWaiting

I want to thank everyone in the BioCiphers Lab who worked hard on this as a great team effort led by the relentless Jordi Vaquero. We also thank all MAJIQ users for their input, which helps us make MAJIQ better. We hope you will all enjoy the new MAJIQ 2.0, and don’t forget to subscribe to the user forum!