Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability for indexed sequence file to be uploaded and used in an assembly #432

Merged
merged 29 commits into from
Sep 21, 2024

Conversation

garrettjstevens
Copy link
Contributor

@garrettjstevens garrettjstevens commented Aug 30, 2024

Fixes #405, fixes #406, fixes #407, fixes #424

Copy link
Contributor Author

@garrettjstevens garrettjstevens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I didn't think about when we were discussing this before is that any Change can't have nodejs-specific code in it (i.e. no imports from node:something). This is because this code has to run both in the browser and on the server. I think the nodejs-specific code that got introduced is what is causing the tests to fail.

What we'll need to do instead is that in packages/apollo-collaboration-server/src/files/files.service.ts we'll add a getFileHandle method that returns a file handle from a file document, very similar to how getFileStream in that same service file returns a file stream from a file document. Then the implementation of LocalFileGzip will be in collaboration server (e.g. in packages/apollo-collaboration-server/src/files/filesUtil.ts) instead of the shared code.

Then the code in the Change will look something like this:

const { filesService } = backend
const { getFileHandle } = filesService
new IndexedFasta({
  fasta: getFileHandle(fastaDoc),
  fai: getFileHandle(faiDoc),
})

(Don't forget to add the definition for getFileHandle in packages/apollo-common/src/Operation.ts)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be combined with the add-fasta command, they should not be separate commands.

Also, add-fasta and add-gff are unclear as to their function based on the command name. So what we need to do is:

  • Rename add-gff to add-from-gff
  • Rename add-fasta to add-from-fasta
  • Combine add-file into add-from-fasta

Another thing to consider, maybe the argument to add-from-fasta should be positional and not a flag, since it's required in all cases. What do you think of this for example commands:

apollo assembly add-from-fasta ./my-genome.fa
apollo assembly add-from-fasta ./my-genome.fa.gz
apollo assembly add-from-fasta 507f1f77bcf86cd799439011
apollo assembly add-from-fasta ./my-genome.fa.gz --not-editable
apollo assembly add-from-fasta ./my-genome.fa.gz --not-editable --fai ./my-genome.fa.gz.fai --gzi ./my-genome.fa.gz.gzi
apollo assembly add-from-fasta 507f1f77bcf86cd799439011 --not-editable --fai 507f1f77bcf86cd799486197 --gzi 507f1f77bcf86cd785963292

description:
'File type or "autodetect" for automatic detection.\n\
NB: There is no check for whether the file complies to this type',
options: ['text/x-fasta', 'text/x-gff3', 'autodetect'],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add new options here, since we want to be able to tell the type later. Types I can think of are:

  • text/x-fasta
  • text/x-gff3
  • application/x-bgzip-fasta
  • text/x-fai
  • application/x-gzi

Authorization: `Bearer ${accessToken}`,
'Content-Type': type,
'Content-Length': String(size),
'Content-Encoding': contentEncoding,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave out Content-Encoding altogether unless it's gzip, maybe something like:

const headers = new Headers({
  Authorization: `Bearer ${accessToken}`,
  'Content-Type': type,
  'Content-Length': String(size),
})
if (isGzip) {
  headers.append('Content-Encoding', 'gzip')
}

})

let contentEncoding = ''
if (file.endsWith('.gz')) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass a param to uploadFile instead of checking the file name.

const gz = createGzip()
await pipeline(stream, gz, fileWriteStream)

if (originalname.endsWith('.gz')) {
Copy link
Contributor Author

@garrettjstevens garrettjstevens Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check the Content-Encoding Content-Type instead of the filename.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be folded in to the existing AddAssemblyFromFileChange instead of being a new change.

}),
force: Flags.boolean({
char: 'f',
description: 'Delete existing assembly, if it exists',
}),
'no-db': Flags.boolean({
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no-db should be something like not-editable, since that what is relevant to the user (the database is an internal detail they don't need to worry about)

description: wrapLines(
"Do not load the fasta sequence into the Apollo database. \
This option assumes the fasta file is bgzip'd with `bgzip` and indexed with `samtools faidx`.\
Indexes should be named <my.fasta.gz>.gzi and <my.fasta.gz>.fai",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-detection is good, but there should also be a way to explicitly pass the fai and gzi files.

dariober added a commit that referenced this pull request Sep 17, 2024
dariober added a commit that referenced this pull request Sep 18, 2024
But most likely there will be changes to clunky code
@garrettjstevens
Copy link
Contributor Author

@dariober the issue with the GZI filehandle we were seeing has been fixed

dariober added a commit that referenced this pull request Sep 19, 2024
All tests except those for chekcs passing
@garrettjstevens garrettjstevens merged commit 89ccf49 into main Sep 21, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants