Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add quoteReserved (SQL KEYWORDS) option when ingesting data from CSV #4344

Open
munkeholler opened this issue Dec 11, 2024 · 1 comment
Open

Comments

@munkeholler
Copy link

For consideration:

When importing data, incoming column names that match SQL reserved keywords (e.g., DESC, GROUP, SELECT) currently require manual quoting to avoid parsing errors. This typically throws newcomers or in scenarios where schemas are automatically generated.
Propose adding an optional parameter, quoteReserved=True, to loading methods (e.g., read_csv_auto) to automatically quote such column names during schema creation and to subsequent queries where ingested data tables are involved.

Proposed Solution
Introduce a parameter, quoteReserved (default: False), to functions like read_csv_auto and read_csv.

When == True, this would:
Detect column names that are SQL reserved keywords.
Automatically add quotes around those column names during ingestion.
Pros:

  • Improved Usability: Simplifies ingestion of datasets with column names that conflict with SQL reserved keywords.
  • Backward Compatibility: By default, the feature is disabled (quoteReserved=False), ensuring no gotchas for existing codebase.
  • Ingestion of new/foreign datasets have 1 less snafu to handle.

Considerations
The quoteReserved flag would apply only during schema creation and not affect runtime query behaviour.
Users would still need to be aware of quoted column names when writing queries.

example usage:

input.txt
KEY|DESC|VALUE
1|description|100
2|description|200

con.execute("""
    CREATE TABLE my_table AS  SELECT * FROM read_csv_auto('input.txt', delim='|', header=True, quoteReserved=True)
""")

here the DESC field is automatically quoted to prevent an exception being thrown

in the following scenario where quoteReserved is True

SELECT KEY, DESC FROM my_table;

becomes

SELECT KEY, "DESC" FROM my_table;

likewise

SELECT A.KEY,A.DESC from my_table a;

becomes

SELECT A.KEY,A."DESC" from my_table a;

@Mytherin
Copy link
Collaborator

Thanks for the thoughts! Perhaps the normalize_names option already achieves what you're looking for? This prefixes reserved keywords with an underscore:

D from read_csv('input.txt', normalize_names=True);
┌───────┬─────────────┬───────┐
│  key  │    _desc    │ value │
│ int64 │   varchar   │ int64 │
├───────┼─────────────┼───────┤
│     1 │ description │   100 │
│     2 │ description │   200 │
└───────┴─────────────┴───────┘
D select _desc from read_csv('input.txt', normalize_names=True);
┌─────────────┐
│    _desc    │
│   varchar   │
├─────────────┤
│ description │
│ description │
└─────────────┘

@szarnyasg szarnyasg changed the title Add quoteReserved (SQL KEYWORDS) option when ingesting data Add quoteReserved (SQL KEYWORDS) option when ingesting data from CSV Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants