Looking at the "General Statistics" tables in the SL4 and SL5 reports.
Because the Web page is difficult for me to read, I imported the tables into an Excel spreadsheet, Ravi_2021_SL4_v_SL5_GeneralStatistics_multiqc_report.xlsx, and added it to the repository.
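For the record, a sketch of how the table import could be scripted instead of copied by hand. MultiQC writes its tables as tab-separated files under `multiqc_data/` (e.g., `multiqc_general_stats.txt`); the inline TSV below is a toy stand-in, and the output filename matches the spreadsheet above:

```python
import io
import pandas as pd

# Toy stand-in for a MultiQC general-stats TSV; the real file lives at
# <report_dir>/multiqc_data/multiqc_general_stats.txt (tab-separated).
tsv = "Sample\tM Reads Mapping\ns1\t10.1\ns2\t12.0\n"
stats = pd.read_csv(io.StringIO(tsv), sep="\t")

# Writing to Excel requires openpyxl to be installed:
# stats.to_excel("Ravi_2021_SL4_v_SL5_GeneralStatistics_multiqc_report.xlsx",
#                sheet_name="SL4", index=False)
print(stats.shape)  # 2 samples x 2 columns
```

This keeps the spreadsheet reproducible if the reports are regenerated.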
I note right off the bat that "M Reads Mapping" (millions of reads mapped) is larger for SL5 in 7 out of 10 samples. To keep track of which samples had larger or smaller "M Reads Mapping" values, I color-coded the cells: pale blue where the value was larger, pale orange where it was smaller. The SL5 table therefore contains many more pale blue cells than the SL4 table.
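The blue/orange tally could also be computed directly. A minimal sketch, assuming per-assembly tables with a shared `Sample` column and an `M Reads Mapping` column (the data here are toy values, not the real 10 samples):

```python
import pandas as pd

# Toy stand-ins for the SL4 and SL5 "General Statistics" tables.
sl4 = pd.DataFrame({"Sample": ["s1", "s2", "s3"],
                    "M Reads Mapping": [10.1, 12.0, 9.5]})
sl5 = pd.DataFrame({"Sample": ["s1", "s2", "s3"],
                    "M Reads Mapping": [10.4, 11.8, 9.9]})

merged = sl4.merge(sl5, on="Sample", suffixes=("_SL4", "_SL5"))
# Flag each sample by which assembly mapped more reads
# (the pale-blue/pale-orange coding in the spreadsheet).
merged["SL5_higher"] = merged["M Reads Mapping_SL5"] > merged["M Reads Mapping_SL4"]
print(int(merged["SL5_higher"].sum()), "of", len(merged),
      "samples map more reads to SL5")
```

On the real tables this would reproduce the 7-of-10 count without manual color-coding.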
Next, I did some sanity-checking on the pipelines. The "% BP trimmed", "% Dups", "%GC", "Length", and "M Seqs" columns should be identical in the SL4 and SL5 pipeline results, as these steps process the reads themselves, independently of the genome alignment target. Thus, for these metrics, using a different reference genome assembly (SL4 versus SL5) should not matter. I scanned the values manually; no differences spotted. Sanity check passes.
As [~molly] noted during scrum meetings, two samples triggered a "WARNING: Fail Strand Check" alert in the SL4 mapping but elicited no such warning in the SL5 mapping. Strange. This could have something to do with the difference in the annotations used, but we cannot really tell at this level what is going on, so I recommended we proceed with deploying the data for genome browser visualization and then manually inspect these samples for a possible error. We might find something cool.
I will also make sure to wait for the pull request in the future! Thanks!