MiXCR v4.7.0

@github-actions github-actions released this 07 Aug 19:37
976ba14

❗ Breaking changes

  • Starting with MiXCR 4.7.0, users must specify the assembling feature for every preset where the protocol does not define it. Use either --assemble-clonotypes-by [feature], or --assemble-contigs-by [feature] for fragmented data (such as RNA-seq or 10x VDJ data). This keeps assembling features consistent when integrating different samples or sample types, such as 10x single-cell VDJ and AIRR sequencing data, for downstream analyses like inferring alleles or building SHM trees. The previous behavior for fragmented data, which aimed to assemble sequences as long as possible, is still available through --assemble-contigs-by-cell for single-cell data or --assemble-longest-contigs for RNA-seq/Exome-seq data.
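
As a rough illustration of the breaking change above, the sketch below builds an analyze command line that passes the assembling feature explicitly. The preset 10x-sc-xcr-vdj and the option --assemble-contigs-by come from these notes; the species value, the feature name VDJRegion, and the file names are assumptions chosen for the example and should be replaced with values that fit your data.

```shell
# Sketch only: feature choice, species and file names are assumptions.
preset="10x-sc-xcr-vdj"   # preset named in these release notes
feature="VDJRegion"       # assumed assembling feature, pick the one your analysis needs

# Build the command as a string so it can be reviewed before running.
cmd="mixcr analyze $preset --species hsa --assemble-contigs-by $feature"
cmd="$cmd sample_R1.fastq.gz sample_R2.fastq.gz results/sample"

# Print the command; uncomment the last line to run it (requires MiXCR 4.7.0 or later).
echo "$cmd"
# eval "$cmd"
```

Using the same feature across every sample in a study is the point of the change, so it may help to keep the feature in a single variable, as here, and reuse it for all runs.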

🚀 Major fixes and upgrades

  • Fixed assemble behavior for single-cell data. Before the fix, in rare cases consensuses were assembled from reads coming from different cells. Reads from different cells are now strictly isolated.
  • Significantly improved V gene assignment precision. To support this improvement, the assemble and assembleContigs steps now have individual relativeMeanScore and maxHits parameters.
  • Improved robustness against expression level differences between TCR/IG chains. Consensus assembly in assemble is now performed separately for each chain. This change is especially important for single-cell presets with cell-level assembly (most of the MiXCR presets for single-cell data).
  • The options --dont-correct-tag-with-name <tag_name> or --dont-correct-tag-type (Molecule|Cell|Sample) can now be specified to skip tag correction. This trades some analysis quality and error correction performance for significantly lower memory and analysis time requirements in deeply sequenced datasets with many Cell and Molecular barcodes.
  • Added the ability to trigger realignment of left or right read boundaries with a global alignment algorithm, using the parameters rightForceRealignmentTrigger or leftForceRealignmentTrigger, in cases where reads do not cover the CDR3 region (rescues alignments for fragmented data, such as single-cell).
  • A MiTool-based contig pre-assembly step is now integrated into the 10x-sc-xcr-vdj preset, significantly improving overall analysis performance.

🛠️ Other improvements & fixes

  • The default input quality filter (badQualityThreshold) in the assemble stage was decreased to 10, improving total analysis yield
  • Added validation in assembleCells that input files are assembled by a fixed feature
  • Export of trees and tree nodes now supports imputed features
  • Fixed parsing of optional arguments for exportShmTreesWithNodes: -nMutationsRelative, -aaMutations, -nMutations, -aaMutationsRelative, -allNMutations, -allAAMutations, -allNMutationsCount, -allAAMutationsCount.
  • Fixed parsing of optional arguments for exportClones and exportAlignments: -allNMutations, -allAAMutations, -allNMutationsCount, -allAAMutationsCount.
  • Fixed possible errors on exporting amino acid mutations in exportShmTreesWithNodes
  • Fixed list of required options in listPresets command
  • Fixed error on building trees when JBeginTrimmed starts before CDR3Begin
  • Fixed usage of --remove-step qc
  • Added --remove-qc-check option
  • Removed the -topChains field from the exportShmTreesWithNodes command. Use -chains instead
  • Removed default splitting of clones by V and J for presets where clones are assembled by full-length.
  • Fixed NullPointerException in some cases of building trees by SC+bulk data
  • Fixed java.lang.IllegalArgumentException: While adding VEndTrimmed in exportClones
  • Fixed the tree combination step in findShmTrees: in some cases trees weren't combined even when they could be
  • Fixed NoSuchElementException in some cases of SC combining of trees
  • Fixed export of -jBestIdentityPercent in exportShmTreesWithNodes
  • Added validation on export -aaFeature for features containing UTR
  • Fixed usage of command exportPlots shmTrees
  • Fixed topology of trees. Previously, common V and J mutations were included in the root node; now the root includes only the reconstructed NDN. The previous behavior led to an underestimated distance from the germline. The germline sequence is now exported with common mutations. To fully apply the fix to previously analyzed data, rerun the pipeline starting from findShmTrees
  • Fixed IllegalStateException on removing unnecessary genes on findAlleles
  • Added --dont-remove-unused-genes option to findAlleles command
  • Adjusted consensus assembly parameters (in assemble) for single-cell presets
  • Command groupClones was renamed to assembleCells. The old name still works, but it is hidden from help. The report and output file names in the analyze step were renamed accordingly.
  • Fixed calculation of germline for VCDR3Part and JCDR3Part in case of indels inside CDR3
  • Fixed export of trees if data assembled by a feature with reference point having offset
  • Export of the VJJunction germline in shmTrees exports now outputs mrca as the most plausible content
  • Fixed parsing and alignment of reads longer than 30 Kbase
  • downsample now supports the molecule variant in the --downsampling option
  • Fixed naming of output files of downsample command
  • --output-not-used-reads of the analyze command now works with BAM input files too, alongside --not-aligned-(R1|R2) and --not-parsed-(R1|R2) of the align command
  • Fixed replaceWildcards behaviour on parsing BAM. The previous behaviour resulted in quality scores being discarded on align
  • v_call, d_call, j_call and c_call columns in AIRR now output only the best hit, not the whole list
  • Stable behavior of replaceWildcards. Previously it depended on the position of a read in the file; now it depends on read content only
  • If a sample sheet supplied with the --sample-sheet[-strict] option has a * symbol after a tag name, the symbol is preserved

🧬 Reference gene library changes

  • IG reference for new species:
    • Rabbit (IGH, IGK, IGL)
    • Sheep (IGH, IGK, IGL)
  • Human reference corrections:
    • Duplicated entries removed: IGHV1-69*00, IGHV1-69*01, IGHV3-23*00, IGHV3-23*01
    • Fix for CDR3Begin position in IGHV4-30-4
    • Fix for FR1Begin position in TRBV21-1
    • Names of the following human TRAV genes were changed:
      • TRAV14DV4 -> TRAV14/DV4
      • TRAV23DV6 -> TRAV23/DV6
      • TRAV29DV5 -> TRAV29/DV5
      • TRAV36DV7 -> TRAV36/DV7
      • TRAV38-1DV8 -> TRAV38-1/DV8
  • Correct mapping of V-gene UTRs in Alpaca reference
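
If tables exported with an earlier MiXCR version need to be compared against 4.7.0 output, the human TRAV renames listed above could be applied to the old files with a small text filter. This is a minimal sketch, not part of MiXCR: the function name rename_trav is hypothetical, and it assumes the old names appear as plain text in the table.

```shell
# Hypothetical helper (not a MiXCR command): rewrites the five renamed
# human TRAV gene names from the old form to the 4.7.0 form.
rename_trav() {
  sed -e 's#TRAV14DV4#TRAV14/DV4#g' \
      -e 's#TRAV23DV6#TRAV23/DV6#g' \
      -e 's#TRAV29DV5#TRAV29/DV5#g' \
      -e 's#TRAV36DV7#TRAV36/DV7#g' \
      -e 's#TRAV38-1DV8#TRAV38-1/DV8#g'
}

# Example: two renamed genes and one unaffected gene.
printf 'TRAV14DV4*01\nTRAV38-1DV8*01\nTRAV12-1*01\n' | rename_trav
```

The filter reads standard input and writes standard output, so it can be placed in a pipeline in front of whatever tool consumes the table; names that already use the new form pass through unchanged.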

📚 New Presets

  • Added preset takara-mouse-rna-bcr-umi-smarseq for new Takara SMART-Seq Mouse BCR (with UMIs) kit
  • Added presets idt-human-rna-bcr-umi-archer and idt-human-rna-tcr-umi-archer for IDT Archer kits
  • Added presets for Cellecta kits that include TCR/BCR Spike-in mix QC metrics: cellecta-human-rna-xcr-umi-drivermap-air-bcr-spikein-1-1-1, cellecta-human-rna-xcr-umi-drivermap-air-bcr-spikein-16-4-1, cellecta-human-rna-xcr-umi-drivermap-air-tcr-spikein-1-1-1, cellecta-human-rna-xcr-umi-drivermap-air-tcr-spikein-16-4-1