build(deps): Bump unstructured from 0.11.2 to 0.11.8 #37

Merged
rodneyosodo merged 1 commit into main from dependabot/pip/unstructured-0.11.8
Jan 18, 2024

Conversation

dependabot[bot]
Contributor

@dependabot dependabot bot commented on behalf of github Jan 17, 2024

Bumps unstructured from 0.11.2 to 0.11.8.

Release notes

Sourced from unstructured's releases.

0.11.8

Enhancements

  • Add SaaS API User Guide. This documentation serves as a guide for Unstructured SaaS API users to register, receive an API key and URL, and manage your account and billing information.

0.11.7

Enhancements

  • Add intra-chunk overlap capability. Implement overlap for split-chunks where text-splitting is used to divide an oversized chunk into two or more chunks that fit in the chunking window. Note this capability is not yet available from the API but will shortly be made accessible using a new overlap kwarg on partition functions.
  • Update encoders to leverage dataclasses. All encoders now follow a class-based approach and are annotated with the dataclass decorator. Similar to the connectors, each encoder uses a nested dataclass for the configuration required to set up a client, along with a field/property approach to cache the client. This ensures that any variable associated with the class exists as a dataclass field.
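
The intra-chunk overlap described above can be illustrated with a minimal sketch. It is not the library's implementation: `split_with_overlap`, `max_chars`, and `overlap` are hypothetical names, and the split is purely character-based, whereas the real text-splitter may prefer word or sentence boundaries.

```python
def split_with_overlap(text: str, max_chars: int, overlap: int) -> list[str]:
    """Split an oversized chunk into windows of at most `max_chars` characters.

    Each window after the first starts with the last `overlap` characters of
    the window before it (hypothetical helper, character-based for simplicity).
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < max_chars")
    if len(text) <= max_chars:
        # The chunk already fits in the window, so no splitting is needed.
        return [text]

    step = max_chars - overlap  # how far the window advances each time
    pieces = []
    start = 0
    while True:
        pieces.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this window reached the end of the text
        start += step
    return pieces


pieces = split_with_overlap("abcdefghij", max_chars=4, overlap=1)
print(pieces)  # ['abcd', 'defg', 'ghij']
```

Only chunks that exceed the window are split, which is why this overlap is independent of any overlap applied across ordinary chunk boundaries.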

Features

  • Add Qdrant destination connector. Adds support for writing documents and embeddings into a Qdrant collection.
  • Store base64 encoded image data in metadata fields. Rather than saving to file, stores base64 encoded data of the image bytes and the mimetype for the image in metadata fields: image_base64 and image_mime_type (if that is what the user specifies by some other param like pdf_extract_to_payload). This would allow the API to have parity with the library.
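
A consumer of these metadata fields might recover the image bytes roughly as follows. This is a sketch under assumptions: the field names `image_base64` and `image_mime_type` come from the note above, but the plain-dict shape of the metadata and the `decode_image_metadata` helper are hypothetical.

```python
import base64


def decode_image_metadata(metadata: dict) -> tuple[bytes, str]:
    """Return (image bytes, MIME type) from a metadata mapping.

    Assumes the mapping carries the `image_base64` and `image_mime_type`
    keys named in the release note; the dict shape itself is an assumption.
    """
    image_bytes = base64.b64decode(metadata["image_base64"])
    return image_bytes, metadata["image_mime_type"]


# Stand-in payload: the 8-byte PNG signature rather than a real image.
raw = b"\x89PNG\r\n\x1a\n"
metadata = {
    "image_base64": base64.b64encode(raw).decode("ascii"),
    "image_mime_type": "image/png",
}

image_bytes, mime_type = decode_image_metadata(metadata)
```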

Fixes

  • Fix table structure metric script. Update the call to table agent to now provide OCR tokens as required.
  • Fix element extraction not working when using "auto" strategy for pdf and image. If element extraction is specified, the "auto" strategy falls back to the "hi_res" strategy.
  • Fix a bug passing a custom url to partition_via_api. Users that self-host the API were not able to pass their custom url to partition_via_api.

0.11.6

Enhancements

  • Update the layout analysis script. The previous script only supported annotating final elements. The updated script also supports annotating inferred and extracted elements.
  • AWS Marketplace API documentation: Added the user guide, including setting up VPC and CloudFormation, to deploy Unstructured API on AWS platform.
  • Azure Marketplace API documentation: Improved the user guide to deploy Azure Marketplace API by adding references to Azure documentation.
  • Integration documentation: Updated URLs for the staging_for bricks.

Features

  • Partition emails with base64-encoded text. Automatically handles and decodes base64 encoded text in emails with content type text/plain and text/html.
  • Add Chroma destination connector. Chroma database connector added to ingest CLI. Users may now use unstructured-ingest to write partitioned/embedded data to a Chroma vector database.
  • Add Elasticsearch destination connector. Problem: After ingesting data from a source, users might want to move their data into a destination. Elasticsearch is a popular storage solution for various functionality such as search, or providing intermediary caches within data pipelines. Feature: Added Elasticsearch destination connector to be able to ingest documents from any supported source, embed them and write the embeddings / documents into Elasticsearch.
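
The base64 email handling noted in this list can be sketched with the Python standard library alone. This illustrates the decoding step, not how unstructured implements it; the sample message and the `decode_text_body` helper are made up for the example.

```python
import base64
from email import message_from_string, policy


def decode_text_body(raw_message: str) -> str:
    """Parse a single-part text email and return its decoded body.

    With `policy.default`, `get_content()` undoes the
    Content-Transfer-Encoding (base64 here) and applies the charset.
    """
    msg = message_from_string(raw_message, policy=policy.default)
    return msg.get_content()


# Build a small text/plain message whose body is base64-encoded.
encoded = base64.b64encode("Hello, base64 world!".encode("utf-8")).decode("ascii")
raw_email = (
    "From: sender@example.com\n"
    "To: recipient@example.com\n"
    "Subject: Base64 demo\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    f"{encoded}\n"
)

body = decode_text_body(raw_email)
```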

Fixes

  • Enable --fields argument omission for elasticsearch connector. Solves two bugs: removing the optional parameter --fields broke the connector due to an integer processing error, and using an elasticsearch config for a destination connector resulted in a serialization issue when the optional parameter --fields was not provided.

0.11.5

Enhancements

Features

Fixes

  • Fix partition_pdf() and partition_image() importation issue. Reorganize pdf.py and image.py modules to be consistent with other types of document import code.

... (truncated)

Changelog

Sourced from unstructured's changelog.

0.11.8

Enhancements

  • Add SaaS API User Guide. This documentation serves as a guide for Unstructured SaaS API users to register, receive an API key and URL, and manage your account and billing information.
  • Add inter-chunk overlap capability. Implement overlap between chunks. This applies to all chunks prior to any text-splitting of oversized chunks so is a distinct behavior; overlap at text-splits of oversized chunks is independent of inter-chunk overlap (distinct chunk boundaries) and can be requested separately. Note this capability is not yet available from the API but will shortly be made accessible using a new overlap_all kwarg on partition functions.
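
Inter-chunk overlap, as distinct from overlap at text-splits, can be pictured with the following sketch. It assumes that each chunk is prefixed with the tail of the chunk before it. The changelog does not state the direction of the overlap, and `add_inter_chunk_overlap` is a hypothetical helper, not a library function.

```python
def add_inter_chunk_overlap(chunks: list[str], overlap: int) -> list[str]:
    """Prefix every chunk after the first with the tail of its predecessor.

    The tail is always taken from the original (un-prefixed) previous chunk,
    so overlap does not accumulate from one chunk to the next.
    """
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    if overlap == 0:
        return list(chunks)

    result = []
    previous = ""
    for chunk in chunks:
        tail = previous[-overlap:] if previous else ""
        result.append(tail + chunk)
        previous = chunk
    return result


overlapped = add_inter_chunk_overlap(["alpha", "beta", "gamma"], overlap=2)
print(overlapped)  # ['alpha', 'habeta', 'tagamma']
```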

Features

Fixes

0.11.7

Enhancements

  • Add intra-chunk overlap capability. Implement overlap for split-chunks where text-splitting is used to divide an oversized chunk into two or more chunks that fit in the chunking window. Note this capability is not yet available from the API but will shortly be made accessible using a new overlap kwarg on partition functions.
  • Update encoders to leverage dataclasses. All encoders now follow a class-based approach and are annotated with the dataclass decorator. Similar to the connectors, each encoder uses a nested dataclass for the configuration required to set up a client, along with a field/property approach to cache the client. This ensures that any variable associated with the class exists as a dataclass field.

Features

  • Add Qdrant destination connector. Adds support for writing documents and embeddings into a Qdrant collection.
  • Store base64 encoded image data in metadata fields. Rather than saving to file, stores base64 encoded data of the image bytes and the mimetype for the image in metadata fields: image_base64 and image_mime_type (if that is what the user specifies by some other param like pdf_extract_to_payload). This would allow the API to have parity with the library.

Fixes

  • Fix table structure metric script. Update the call to table agent to now provide OCR tokens as required.
  • Fix element extraction not working when using "auto" strategy for pdf and image. If element extraction is specified, the "auto" strategy falls back to the "hi_res" strategy.
  • Fix a bug passing a custom url to partition_via_api. Users that self-host the API were not able to pass their custom url to partition_via_api.

0.11.6

Enhancements

  • Update the layout analysis script. The previous script only supported annotating final elements. The updated script also supports annotating inferred and extracted elements.
  • AWS Marketplace API documentation: Added the user guide, including setting up VPC and CloudFormation, to deploy Unstructured API on AWS platform.
  • Azure Marketplace API documentation: Improved the user guide to deploy Azure Marketplace API by adding references to Azure documentation.
  • Integration documentation: Updated URLs for the staging_for bricks.

Features

  • Partition emails with base64-encoded text. Automatically handles and decodes base64 encoded text in emails with content type text/plain and text/html.
  • Add Chroma destination connector. Chroma database connector added to ingest CLI. Users may now use unstructured-ingest to write partitioned/embedded data to a Chroma vector database.
  • Add Elasticsearch destination connector. Problem: After ingesting data from a source, users might want to move their data into a destination. Elasticsearch is a popular storage solution for various functionality such as search, or providing intermediary caches within data pipelines. Feature: Added Elasticsearch destination connector to be able to ingest documents from any supported source, embed them and write the embeddings / documents into Elasticsearch.

Fixes

  • Enable --fields argument omission for elasticsearch connector. Solves two bugs: removing the optional parameter --fields broke the connector due to an integer processing error, and using an elasticsearch config for a destination connector resulted in a serialization issue when the optional parameter --fields was not provided.
  • Add hi_res_model_name. Adds the kwarg to relevant functions and adds comments noting that model_name is to be deprecated.

0.11.5

... (truncated)

Commits
  • 8e2bfca Unstructured SaaS API subscription guide (#2341)
  • 91b892c fix: Fix api_url param to partition_via_api (#2342)
  • 1b70ea8 fix: update table structure eval to use new table inference interface (#2306)
  • dd1443a feat: add Qdrant ingest destination connector (#2338)
  • 9459af4 Fix: element extraction not working when using "auto" strategy for pdf (#2324)
  • dd14445 Feat: return base64 encoded images for PDF's (#2310)
  • 8ba9fad feat: improve dataclass use for encoders (#2318)
  • bfef183 feat: update encoders to be dataclasses (#2313)
  • eb1b022 feat(chunking): add overlap on chunk-splits (#2305)
  • 5c0043a chore: add hi_res_model_name kwarg (#2289)
  • Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot dependabot bot added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update Python code) labels Jan 17, 2024

coderabbitai bot commented Jan 17, 2024

Important

Auto Review Skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit-tests for this file.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit tests for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository from git and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit tests.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

CodeRabbit Discord Community

Join our Discord Community to get help, request features, and share feedback.

@dependabot dependabot bot force-pushed the dependabot/pip/unstructured-0.11.8 branch from 192c5ad to 2f14cbe Compare January 17, 2024 19:43
@rodneyosodo
Contributor

@dependabot rebase

Bumps [unstructured](https://github.com/Unstructured-IO/unstructured) from 0.11.2 to 0.11.8.
- [Release notes](https://github.com/Unstructured-IO/unstructured/releases)
- [Changelog](https://github.com/Unstructured-IO/unstructured/blob/main/CHANGELOG.md)
- [Commits](Unstructured-IO/unstructured@0.11.2...0.11.8)

---
updated-dependencies:
- dependency-name: unstructured
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
@dependabot dependabot bot force-pushed the dependabot/pip/unstructured-0.11.8 branch from 2f14cbe to 3885034 Compare January 18, 2024 10:22
@rodneyosodo rodneyosodo merged commit 0abdbdd into main Jan 18, 2024
9 checks passed
@rodneyosodo rodneyosodo deleted the dependabot/pip/unstructured-0.11.8 branch January 18, 2024 11:13
Labels
dependencies (Pull requests that update a dependency file), python (Pull requests that update Python code)
1 participant