Add a few hot fixes to the documentation
blythed committed Nov 28, 2023
1 parent b8d5e8a commit a7b32f3
Showing 16 changed files with 140 additions and 117 deletions.
4 changes: 4 additions & 0 deletions docs/hr/content/docs/data_integrations/sql.md
@@ -4,6 +4,10 @@ sidebar_position: 3

# SQL

`superduperdb` supports SQL databases via the [`ibis` project](https://ibis-project.org/).
With `superduperdb`, queries may be built which conform to the `ibis` API, with additional
support for complex data-types and vector-searches.

## Setup

The first step in working with an SQL table is to define a table and schema
33 changes: 26 additions & 7 deletions docs/hr/content/docs/walkthrough/ai_apis.md
@@ -10,15 +10,17 @@
these providers is similar to instantiating a `Model`:

## OpenAI

**Supported**

| Description | Class-name |
| --- | --- |
| Embeddings | `OpenAIEmbedding` |
| Chat models | `OpenAIChatCompletion` |
| Image generation models | `OpenAIImageCreation` |
| Image edit models | `OpenAIImageEdit` |
| Audio transcription models | `OpenAIAudioTranscription` |

**Usage**

```python
from superduperdb.ext.openai import OpenAI<ModelType> as ModelCls
db.add(ModelCls(identifier='my-model', **kwargs))
```

## Cohere

**Supported**

| Description | Class-name |
| --- | --- |
| Embeddings | `CohereEmbedding` |
| Chat models | `CohereChatCompletion` |

**Usage**

```python
from superduperdb.ext.cohere import Cohere<ModelType> as ModelCls

db.add(ModelCls(identifier='my-model', **kwargs))
```

## Anthropic

**Supported**

| Description | Class-name |
| --- | --- |
| Chat models | `AnthropicCompletions` |

**Usage**

```python
from superduperdb.ext.anthropic import Anthropic<ModelType> as ModelCls

```
6 changes: 3 additions & 3 deletions docs/hr/content/docs/walkthrough/ai_models.md
@@ -4,7 +4,7 @@ sidebar_position: 18

# AI Models via `Model` and Descendants

AI models may be wrapped and used in `superduperdb` with the `Model` class and descendants.

### Creating AI Models in a Range of Frameworks

@@ -43,7 +43,8 @@
```python
from superduperdb import superduper

db.add(Pipeline(task='sentiment-analysis'))
```

There is also support for building the pipeline in separate stages with a high degree of customization.
The following is a speech-to-text model published by [facebook research](https://arxiv.org/abs/2010.05171) and shared [on Hugging-Face](https://huggingface.co/facebook/s2t-small-librispeech-asr):

```python
from superduperdb.ext.transformers import Pipeline
```
@@ -91,4 +92,3 @@
```python
db.add(model)
```
| `postprocess` | `Callable` applied to individual rows/items or output |
| `encoder` | An `Encoder` instance applied to the model output to save that output in the database |
| `schema` | A `Schema` instance applied to a model's output, whose rows are dictionaries |
29 changes: 6 additions & 23 deletions docs/hr/content/docs/walkthrough/apply_models.md
@@ -8,18 +8,13 @@ sidebar_position: 21

## Procedural API

Applying a model to data is straightforward with `Model.predict`.

### Out-of-database prediction

As is standard in `sklearn` and other AI libraries and frameworks, such as `tensorflow.keras`,
all `superduperdb` models support `.predict`, predicting directly on datapoints.
To use this functionality, supply the datapoints directly to the `Model`:

```python
my_model = ... # code to instantiate model
my_model.predict(X=<input_datum>, one=True)
```

### In-database, one-time model prediction


It is possible to apply a model directly to the database with `Model.predict`.
In this context, the parameter `X` refers to the field/column of data which is passed to the model.
`X="_base"` passes all of the data (all columns/fields).
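As a rough illustration of this selection logic (hypothetical helper names, not `superduperdb` internals):

```python
# Hypothetical illustration of how the `X` parameter selects model
# inputs from stored records.
records = [{'txt': 'hello', 'n': 1}, {'txt': 'world', 'n': 2}]

def select_inputs(records, X):
    if X == '_base':
        return records              # pass whole records to the model
    return [r[X] for r in records]  # pass only the named field/column

assert select_inputs(records, 'txt') == ['hello', 'world']
assert select_inputs(records, '_base') == records
```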

#### MongoDB

```python
my_model = ... # code to instantiate model
my_model.predict(
)
```

#### SQL

```python
table = db.load('my-table', 'table_or_collection')
my_model.predict(
)
```


### In-database, daemonized model predictions with `listen=True`

It is also possible to apply a model to create predictions, and also
@@ -5,10 +5,10 @@ sidebar_position: 22
# Daemonizing `.predict` with listeners

In many AI applications, it's important that a catalogue of predictions is maintained for
all data in the database, updated as soon as possible after data updates and streaming inserts.

In order to allow developers to implement this functionality, `superduperdb` offers
the `Listener` abstraction.

## Creating listeners in-line with `.predict`

@@ -41,10 +41,13 @@
```python
db.add(
)
```

## Outcome

If a `Listener` has been created, whenever new data is added to `db`,
the `Predictor` instance is loaded and predictions are evaluated on the inserted data.
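The behaviour can be sketched with a toy stand-in (hypothetical, stdlib-only; in reality the `Predictor` is loaded and executed by `db`):

```python
# Toy sketch of listener behaviour: each insert triggers the
# predictor on the newly inserted data only.
class ToyListener:
    def __init__(self, predict):
        self.predict = predict
        self.outputs = {}

    def on_insert(self, rows):
        # evaluate predictions for the inserted rows immediately
        for key, x in rows:
            self.outputs[key] = self.predict(x)

listener = ToyListener(predict=lambda x: x * 2)
listener.on_insert([(1, 10), (2, 20)])
assert listener.outputs == {1: 20, 2: 40}
```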

:::info
In MongoDB, if [change-data-capture (CDC)](../production/change_data_capture.md) has been configured,
data may even be inserted from third-party clients such as `pymongo`, and is nonetheless still processed
by configured `Listeners` via the CDC service.
:::
20 changes: 14 additions & 6 deletions docs/hr/content/docs/walkthrough/data_encodings_and_schemas.md
@@ -47,24 +47,32 @@
```python
audio = Encoder('audio', encoder=encoder, decoder=decoder)
```

It's completely open to the user how exactly the `encoder` and `decoder` arguments are set.
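For instance, a minimal encoder/decoder pair for a list of 32-bit floats might look as follows (a sketch using the standard-library `struct` module; the function names are placeholders):

```python
import struct

# Hypothetical `encoder`/`decoder` pair, standing in for the
# arguments shown above: serialize a list of floats to bytes.
def encoder(x):
    # prefix with the element count, then pack each float (little-endian)
    return struct.pack(f'<I{len(x)}f', len(x), *x)

def decoder(b):
    n = struct.unpack_from('<I', b)[0]
    return list(struct.unpack_from(f'<{n}f', b, offset=4))

data = [0.5, 1.5, -2.0]
assert isinstance(encoder(data), bytes)
assert decoder(encoder(data)) == data
```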

You may include these `Encoder` instances in models, data-inserts and more. You can also directly
register the `Encoder` instances in the system, using:

```python
db.add(my_array)
db.add(audio)
```

To reload (for instance in another session) do:

```python
my_array_reloaded = db.load('encoder', 'my_array')
audio_reloaded = db.load('encoder', 'audio')
```

## Schemas for SQL

For SQL databases, one needs to define a schema to work with tables in `superduperdb`. The `superduperdb.Schema`
builds on top of `Encoder` and allows developers to combine standard data-types traditionally used in SQL databases
with bespoke data-types via `Encoder`, as defined by, for instance, `audio` above.

To register/create a `Table` with a `Schema` in `superduperdb`, one uses `superduperdb.backends.ibis.Table`:

```python
from superduperdb import Table
from superduperdb import Schema

db.add(
Table(
        ...
    )
)
```
21 changes: 12 additions & 9 deletions docs/hr/content/docs/walkthrough/encoding_special_data_types.md
@@ -4,20 +4,20 @@ sidebar_position: 14

# Inserting images, audio, video and other special data

An initial step in working with `superduperdb`
is to establish the data-types one wishes to work with, create `Encoder` instances for
those data-types, and potentially `Schema` objects for SQL tables. See [here](./data_encodings_and_schemas.md) for
this information.

If these have been created, data may be inserted which uses these data-types, including previously defined `Encoder` instances.

## MongoDB

```python
from superduperdb import Document

my_array = db.load('encoder', 'my_array')

files = ... # list of paths to audio files

db.execute(
    ...
)
```

## SQL


```python
import pandas

files = ...  # list of paths to audio files

table = db.load('table', 'my-table')

df = pandas.DataFrame([
{
        ...
    }
])

db.execute(table.insert(df))
```


30 changes: 29 additions & 1 deletion docs/hr/content/docs/walkthrough/inserting_data.md
@@ -6,7 +6,7 @@ sidebar_position: 2

After configuring and connecting, you're ready to insert some data.

In `superduperdb`, data may be inserted using the connection `db`,
or using a third-party client.

## SuperDuperDB data insertion
@@ -17,6 +17,7 @@
Here's a guide to using `db` to insert data.

```python
from superduperdb.backends.mongodb import Collection
from superduperdb import Document

db.execute(
Collection('<collection-name>')
        .insert_many([Document(record) for record in records])
)
```
The `records` may be any dictionaries supported by MongoDB, as well as dictionaries
containing items which may be converted to `bytes` strings.
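As a rough sketch of what such a conversion can involve (a hypothetical helper, not the actual `Document.encode` implementation):

```python
import pickle

# Hypothetical stand-in for the conversion step: non-primitive
# values are serialized to `bytes` before insertion.
def encode_record(record):
    primitive = (str, int, float, bool, bytes, type(None))
    return {
        k: v if isinstance(v, primitive) else pickle.dumps(v)
        for k, v in record.items()
    }

record = {'label': 'cat', 'vector': [0.1, 0.2, 0.3]}
encoded = encode_record(record)
assert encoded['label'] == 'cat'
assert isinstance(encoded['vector'], bytes)
assert pickle.loads(encoded['vector']) == [0.1, 0.2, 0.3]
```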

Other MongoDB clients may also be used for insertion. Here, one needs to explicitly
take care of conversion of data to `bytes` wherever `Encoder` instances have been used.
For instance, using `pymongo`, one may do:

```python
import pymongo

from superduperdb import Document

collection = pymongo.MongoClient('<your-database-uri>').my_database['<collection-name>']
collection.insert_many([
    Document(record).encode() for record in records
])
```

### SQL

Similarly, for SQL tables:

```python
import pandas

from superduperdb.backends.ibis import Table

db.execute(
Table('<table-name>')
.insert(pandas.DataFrame(records))
)
```

Native clients may also be used to insert data. Here, one needs to explicitly
take care of conversion of data to `bytes` wherever `Encoder` instances have been used.
For instance, in DuckDB, one may do:

```python
import duckdb
import pandas

from superduperdb import Document

my_df = pandas.DataFrame([Document(r).encode() for r in records])

duckdb.sql("INSERT INTO <table-name> SELECT * FROM my_df")
```
```python
l2 = Listener(
    ...
)
```

This implies that whenever data is inserted to `collection`, `model_1` will compute outputs on that data first,
which will subsequently be consumed by `model_2` as inputs; its outputs will then also be saved to `db`.
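This two-stage flow can be sketched in plain Python (hypothetical toy models, purely illustrative):

```python
# Sketch of the chained flow: `model_1` runs on newly inserted data,
# and `model_2` consumes `model_1`'s outputs as its inputs.
def model_1(text):
    return text.lower()   # e.g. a normalisation model

def model_2(text):
    return len(text)      # e.g. a downstream featurizer

inserted = ['Hello', 'World!']
outputs_1 = [model_1(x) for x in inserted]   # computed first
outputs_2 = [model_2(y) for y in outputs_1]  # consumed as inputs

assert outputs_1 == ['hello', 'world!']
assert outputs_2 == [5, 6]
```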
@@ -4,14 +4,17 @@ sidebar_position: 15

# Working with external data sources

:::warning
This functionality is currently supported for MongoDB only
:::

Using the MongoDB query API, `superduperdb` supports data added from external data-sources.
When doing this, `superduperdb` supports:

- web URLs
- URIs of objects in `s3` buckets

The trick is to pass the `uri` parameter to an encoder, instead of the raw-data.
Here is an example where we add a `.pdf` file directly from a location
on the public internet.

```python
import io

from pypdf import PdfReader

from superduperdb import Document, Encoder
from superduperdb.backends.mongodb import Collection

collection = Collection('pdf-files')


def load_pdf(bytes):
text = []
for page in PdfReader(io.BytesIO(bytes)).pages:
text.append(page.extract_text())
return '\n----NEW-PAGE----\n'.join(text)


# no `encoder=...` parameter required, since the text is never converted back to `.pdf` format
pdf_enc = Encoder('my-pdf-encoder', decoder=load_pdf)

PDF_URI = (
'https://papers.nips.cc/paper_files/paper/2012/file/'
'c399862d3b9d6b76c8436e924a68c45b-Paper.pdf'
)

# This command inserts a record which refers to this URI
# and also downloads the content from the URI and saves
# it in the record
db.execute(
collection.insert_one(Document({'txt': pdf_enc(uri=PDF_URI)}))
)
```