Add an MN:i tag (number of SEQ bases at time of modification tag update). #714

Merged: 1 commit, Sep 9, 2024
16 changes: 16 additions & 0 deletions SAMtags.tex
@@ -96,6 +96,7 @@ \section{Standard tags}
{\tt MI} & Z & Molecular identifier; a string that uniquely identifies the molecule from which the record was derived \\
{\tt ML} & B,C & Base modification probabilities \\
{\tt MM} & Z & Base modifications / methylation \\
{\tt MN} & i & Length of sequence at the time {\tt MM} and {\tt ML} were produced \\
{\tt MQ} & i & Mapping quality of the mate/next segment \\
{\tt NH} & i & Number of reported alignments that contain the query in the current record \\
{\tt NM} & i & Edit distance to the reference \\
@@ -625,6 +626,17 @@ \subsection{Base modifications}
{\tt ML} values for ambiguity codes give the probability that the modification is one of the possible codes compatible with that ambiguity code.
For example {\tt MM:Z:C+C,10; ML:B:C,229} indicates a C call with a probability of 90\% of having some form of unspecified modification.

\item[MN:i:\tagvalue{length}]
\hfill\\
The length of the {\sf SEQ} field at the time the {\tt MM} value was last written.

Some processing of aligned data, such as the use of hard-clipping tools, may alter {\sf SEQ} sequence data.
If the sequence is shortened in this manner then the base offsets in {\tt MM} and {\tt ML} become invalid unless they are also updated accordingly.

Some hard-clipping tools update {\tt MM}/{\tt ML} but others do not, so the {\tt MN} tag offers a simple sanity check.
Software that wishes to validate {\tt MM} should compare the length of the {\sf SEQ} field with the contents of the {\tt MN} tag---if they differ, the {\tt MM}~and {\tt ML}~values should be considered out-of-date.
The tag is optional, but recommended, and if it is absent then there is an implicit assumption that the {\tt MM} data is valid unless evidence implies otherwise (e.g., by having coordinates beyond the end of the sequence).

\end{description}

\section{Draft tags}
@@ -671,6 +683,10 @@ \section{Tag History}
\setlength{\parindent}{0pt}
\newcommand*{\gap}{\vspace*{2ex}}

\subsubsection*{September 2024}

Added the MN tag for validating base modification tag consistency.

\subsubsection*{February 2022}

Base modification tags changed to use the predefined standard names MM and~ML, as their review period has finished.
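
The consistency check that the new {\tt MN} text describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the change or of any library: the function name `mm_status`, the four status strings, and the representation of a record as a `SEQ` string plus a plain dict of tags are all hypothetical. It compares the length of `SEQ` with `MN` and reports the `MM`/`ML` data as out-of-date when they differ; when `MN` is absent it reports that no check was possible, since the text says validity is then assumed.

```python
from typing import Mapping


def mm_status(seq: str, tags: Mapping[str, object]) -> str:
    """Classify the MM/ML data of one record (hypothetical helper).

    Returns one of:
      "absent"    - the record carries no MM tag, so there is nothing to check
      "unchecked" - MM is present but MN is not; validity is assumed, not verified
      "valid"     - MN equals the current length of SEQ
      "stale"     - MN differs from the current length of SEQ, so MM and ML
                    should be considered out-of-date

    Assumption: `seq` holds the actual bases of the record, not the "*"
    placeholder used when no sequence is stored.
    """
    if "MM" not in tags:
        return "absent"

    mn = tags.get("MN")
    if mn is None:
        # MN is optional: without it the length cannot be cross-checked.
        return "unchecked"

    return "valid" if mn == len(seq) else "stale"


# An 8-base read whose MN was written when SEQ had 8 bases.
record_tags = {"MM": "C+m,0;", "ML": [229], "MN": 8}
print(mm_status("ACGTACGT", record_tags))

# The same tags after a tool hard-clipped SEQ to 4 bases without updating them.
print(mm_status("ACGT", record_tags))
```

Returning a status rather than raising an error leaves the decision to the caller, for example dropping stale `MM`/`ML` tags or rejecting the record outright. A fuller validator could also handle the absent-`MN` case by checking whether the `MM` coordinates run past the end of the sequence, as the text suggests; that is not attempted here.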