Increase Superset userbase by allowing CSV upload out of the box #23399
-
This does require having some sort of data source connected (i.e., a database) to which you would upload these CSV files. While Superset does ship with a metadata database (for storing internal objects, e.g. dashboards and charts), it does not ship with a default analytical database. Typically it'd be an anti-pattern to overload the metadata database as an analytical database, and anyone deploying Superset will typically have their own business reasons to choose from a whole range of possible analytical databases. So that's essentially why this isn't a default thing: it would add to the maintenance footprint, and it would be a bit of a sidecar that most people would likely override with their own DB anyway, which puts us back in the current state of affairs. That said, there are commercial offerings of Superset (e.g. Preset) that do include such a feature. Preset lets you upload a CSV directly upon signup, for exactly the reasons you propose here, namely that it's the first thing many users do to "kick the tires" on the product.
-
I totally agree, this is a big friction point in getting people to use Superset. At Preset we did two things to improve this, but neither is trivial to set up:
Additionally, I have a PR open (since April 2021) to create a "native Superset database": #14225. I wonder if this could become a standard database that is installed by default, and it could be used to handle file uploads, in addition to other functionality (like connecting Superset to APIs, like Google Calendar). These are all overengineered (and complex) solutions, though. I think a good solution would be to have the default installation contain 2 SQLite databases: examples and a database where users can upload CSVs. That would be easy to implement, though we'd have to change the default value of |
-
I'm just guessing and might be completely wrong. Why not just define another PostgreSQL database service in |
-
I agree this can be a friction point. I personally like @betodealmeida's reference to Gsheets file upload and a future smooth, easy OAuth2 database setup. @sebastianliebscher's proposal is an easy win; I guess it depends on how most of our users do their first Superset install, do they mainly use
One of the main problems with allowing SQLite databases is that they let users create a database that is a file on the web server. This can cause multiple problems, some already identified and some not. We do aim for a secure-by-default Superset.
-
I'm most concerned with expanding Superset usage.
Goal: Bring in another 10,000 new Superset users within 6 months.
User story: Lucas S. has received import data as a CSV. He wants to install
Superset on his Linux desktop and analyze the data. He wants to be ready
to analyze within 5 minutes. He understands that if the data is valuable, he
will refer to the production installation guide to set up a production-ready
version. He wants a system that works out of the box, is open source, and has
a lot of features, rather than a closed one.
To accomplish this: I think the current default of SQLite is sufficient; it
just needs write access enabled. If the analytics part also needs a database,
I think SQLite is sufficient there too.
Thanks
Lucas
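To make the user story above concrete, here is a minimal sketch of what "upload a CSV and analyze it" amounts to against a writable SQLite database, using only the Python standard library. This is not Superset's upload code; the function name `load_csv_into_sqlite`, the table name `imports`, and the sample data are all hypothetical.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(conn, table, csv_text):
    """Create `table` from the CSV header and insert every data row as text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Quote identifiers so arbitrary header names are valid column names.
    columns = ", ".join('"{}" TEXT'.format(name.replace('"', '""')) for name in header)
    placeholders = ", ".join("?" for _ in header)
    quoted_table = '"{}"'.format(table.replace('"', '""'))
    conn.execute(f"CREATE TABLE {quoted_table} ({columns})")
    rows = list(reader)
    conn.executemany(f"INSERT INTO {quoted_table} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


# Hypothetical sample data standing in for the user's CSV file.
sample = "region,sales\nnorth,10\nsouth,25\nnorth,5\n"
conn = sqlite3.connect(":memory:")  # a file path would make the database persistent
loaded = load_csv_into_sqlite(conn, "imports", sample)

# A first "analysis": total sales per region.
totals = conn.execute(
    'SELECT region, SUM(CAST(sales AS INTEGER)) FROM "imports" '
    "GROUP BY region ORDER BY region"
).fetchall()
```

Every column is stored as text here to keep the sketch short; a real upload path would also need type inference and handling for existing tables.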
…On Fri, Oct 6, 2023, 6:21 PM Sam Firke wrote:
This sounds great to me, and to the point made by @sebastianliebscher in
#25546, we could also put the examples in that separate Postgres container.
Like @dpgaspar asked, I'm not sure what that means for users doing
pip install.
I'd be fine with having this superior (IMO) experience for docker compose
even if we couldn't match it for pip, as long as it didn't introduce
excessive maintenance load. And if we could use a new container to solve
this problem, it would also enable #25446, which would be a 2-for-1.
-
Relevant to this thread: the Superset community voted to remove support for SQLite as a metadata database; this is scheduled to happen at the next major release (4.0). But I think, @lszyba1, your goal of .csv upload supported out of the box is just as achievable without SQLite. At that point everyone installing Superset will have to use a more robust metadata DB. That won't change with docker-compose, the fastest way to get up and running; it already uses a Postgres container. I guess for pip install deployment, folks will need to supply a Postgres or MySQL database, either running on the same machine or elsewhere? Either way, given that requirement, could we just put examples in a separate DB in the same service as the metadata DB? It could hold examples and accept ad-hoc .csv uploads.
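One way to picture the "separate DB in the same service" idea is two connection URIs that share a Postgres host and differ only in the database name. The sketch below is an assumption for illustration, not an existing Superset setup: the helper `postgres_uri`, the credentials, the host `db`, and the database names `superset` and `examples` are all hypothetical.

```python
from urllib.parse import quote_plus


def postgres_uri(user, password, host, port, database):
    """Build a Postgres connection URI, escaping special characters in credentials."""
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical connection details for a single Postgres service.
SERVICE = {"user": "superset", "password": "p@ss", "host": "db", "port": 5432}

# Metadata (dashboards, charts) lives in one database...
METADATA_URI = postgres_uri(database="superset", **SERVICE)
# ...while examples and ad-hoc CSV uploads go to a second database on the same server.
UPLOADS_URI = postgres_uri(database="examples", **SERVICE)
```

Keeping the two databases separate avoids overloading the metadata database with analytical data, while still requiring only one database service to run.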
-
Hello,
Thanks. Can you point me to the vote details?
Yes, I'm not necessarily advocating SQLite, but I do think CSV write
support and building an analysis on a file out of the box is a must.
I also don't think the Docker requirements are in line with that. I think
installing Superset via pip install or brew install should be level 1.
Installing Docker with Postgres: level 2.
Installing the app with a separate database server, in a HIPAA, Must,CO IT,
PII compliant installation: level 3.
I don't think we need to combine the levels of security and thereby
sacrifice the product's ease of use. If we can separate the levels/use
cases, or even better add a new parameter during installation or
configuration that turns on level 1 out-of-the-box analysis and CSV writing
capabilities, that could work as well.
Thanks
Lucas
…On Mon, Oct 9, 2023, 8:28 AM Sam Firke wrote:
Relevant to this thread: the Superset community voted to remove support
for SQLite as a metadata database (#8874), this is scheduled to
happen at the next major release
<https://github.com/orgs/apache/projects/292/views/1?pane=issue&itemId=40454778>
(4.0). But I think @lszyba1 your goal of
.csv upload supported out-of-the-box is just as possible without SQLite.
At that point everyone installing Superset will have to use a more robust
metadata DB. That won't change with docker-compose, the fastest way to get
up and running; it already uses a Postgres container. I guess for pip
install deployment, folks will need to supply a Postgres or MySQL
database, either running on the same machine or elsewhere?
Either way, given that requirement, could we just put examples in a
separate DB in the same service as the metadata DB? It could hold examples
and accept ad-hoc .csv uploads.
-
Hello,
I want to be upfront: I'm indifferent to the choice of database, but
"easy to use" for a user who might be an aspiring data scientist, an aspiring
programmer, or just a moderately advanced data user needs to be at
level 1 = easy.
If we/Superset can provide a writable database out of the box, that
solves the problem, because all features then work out of the box.
I went over the reasoning in #8874, which really comes down to this:
people who are serious about Superset will start with SQLite, grow, and
never move to a proper database, causing more issues during upgrades. I
agree; if you are serious, maybe raising the bar to a proper database from
the start is the right way to make sure you don't fail later. So how can we
allow users to use Superset in "easy" mode? pip install superset, done.
Maybe that page:
https://superset.apache.org/docs/installation/installing-superset-from-scratch/
should be modified right now to use Postgres, and one should be able to
configure Superset in under 4 minutes with helper commands?
Sometimes when we can't solve a problem, we can use the "Charlie Munger"
brain trick and ask the reverse question.
Mental exercise:
How can we make sure Superset doesn't grow its userbase?
- Make the installation process so complicated that more than 33% of
users fail to get it up and running.
- Make the process such that, even if they install it, not all features are
enabled and they need to spend hours figuring out how to enable them.
- Make sure the process requires sudo, or requires a company admin to approve
the installation.
- Don't allow the user to control their environment. (Offer a magic-in-a-box
solution similar to Docker, where even a serious user will not know how it
was configured or how it works.)
- Don't provide easy steps to go from demo to prod v1, to 55 users, to 144,
to 233 users.
How do we solve this so that we can gain 10,000 more users in 6 months?
It seems to me that, given SIP-33, the fastest incremental value to the end
user would be to add documentation:
- Under installing from scratch
<https://superset.apache.org/docs/installation/installing-superset-from-scratch>,
modify the section to talk about using Postgres
<https://superset.apache.org/docs/databases/db-connection-ui>.
I can't speak to SQLite support as I don't have enough info, but maybe if
everyone had gone through "how to get started" and followed a proper
database setup, there would be no need to remove it.
(So who else has a Postgres requirement, and what can we learn from them
about installation? WordPress?)
So I guess the process might look like:
1. pip install superset
2. Decide what database you want: follow the docs [here], or pick our
recommended Postgres for new users.
3. Give Superset your choice of database.
4. Enter the admin username/password in our installation process (or refer to
quick docs on how to install Postgres on Linux for first-time users).
Thanks
Lucas
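Step 3 of the process above ("give Superset your choice of database") could look roughly like the following `superset_config.py` sketch. `SQLALCHEMY_DATABASE_URI` and `SECRET_KEY` are Superset configuration keys; the environment variable names `SUPERSET_DB_URI` and `SUPERSET_SECRET_KEY`, the fallback credentials, and the local Postgres address are assumptions made for illustration.

```python
import os

# superset_config.py (sketch, values are placeholders)

# Secret used to sign sessions; read from the environment so it is not
# committed to source control. The variable name is hypothetical.
SECRET_KEY = os.environ.get("SUPERSET_SECRET_KEY", "change-me-before-real-use")

# Metadata database chosen in step 2. The fallback assumes a local Postgres
# with a database and user both named "superset" (an assumption).
SQLALCHEMY_DATABASE_URI = os.environ.get(
    "SUPERSET_DB_URI",
    "postgresql://superset:superset@localhost:5432/superset",
)
```

Reading both values from the environment keeps the same file usable for a quick local trial and for a later production setup with different credentials.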
…On Tue, Oct 10, 2023 at 9:39 AM Sam Firke wrote:
Sure, the votes are here:
***@***.***:2020-1:sip-33 There
was minimal discussion.
I appreciate your perspective on pip install being the easiest way to
install. Right now there's no documentation here about creating a database:
https://superset.apache.org/docs/installation/installing-superset-from-scratch/
Is that because this method creates a SQLite database and handles that for
the user? In which case we are going to want some documentation on that
page re: how to configure the DB, maybe providing a sample script the way,
e.g., OpenMetadata does:
<https://github.com/open-metadata/OpenMetadata/blob/main/docker/postgresql/postgres-script.sql>
-
Currently there is no way to upload a CSV and start analyzing it with a default Superset installation, even by changing one config file.
To make this feature easily available, we have one optimal option:
1. Modify the setting that currently prevents users from connecting to SQLite, and make the new value the default. (optimal)
The current default produces this error:
An error occurred while fetching databases: {"sqlalchemy_uri":["SQLiteDialect_pysqlite cannot be used as a data source for security reasons."]}
Set:
PREVENT_UNSAFE_DB_CONNECTIONS = False
Why this is the optimal solution:
Not optimal solutions:
2. Allow the main database "superset.db" to accept uploads. (not optimal)
3. Create a new database mydata.db and make it the default target for CSV uploads. (not optimal) (In production this would mean we need to remove this database on each install.)
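As a sketch of option 1, the setting quoted above would typically go in a `superset_config.py` override file. Only `PREVENT_UNSAFE_DB_CONNECTIONS` comes from this post; placing it in a local override file rather than changing the shipped default is an assumption about how one would try it today.

```python
# superset_config.py (sketch)

# Allow SQLite URIs to be added as data sources. Superset blocks them by
# default for security reasons (see the error message quoted above), so
# disabling the check is a trade-off suited to local, single-user trials.
PREVENT_UNSAFE_DB_CONNECTIONS = False
```

With the check disabled, a SQLite file is a database stored on the web server's disk, which is the security concern raised earlier in this thread.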
17 votes ·