Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-36941][docs] Update DATE_FORMAT Doc and Python Tests #25828

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

yiyutian1
Copy link

@yiyutian1 yiyutian1 commented Dec 20, 2024

What is the purpose of the change

To update the documentation for DATE_FORMAT, which is not accurate now. The current function behaviors are:

  • DATE_FORMAT(timestamp, string) eventually calls formatTimestamp(TimestampData ts, String format, ZoneId zoneId) in DateTimeUtils.java, which uses DateTimeFormatter.

  • DATE_FORMAT(string, string) eventually calls formatTimestampStringWithOffset(String dateStr, String fromFormat, String toFormat, TimeZone tz, long offsetMills) in DateTimeUtils.java, which uses SimpleDateFormat.

Brief change log

  • Update documentation for DATE_FORMAT.
  • Add additional python tests for specific DATE_FORMAT functions.

Verifying this change

Please make sure both new and modified tests in this PR follow the conventions for tests defined in our code quality guide.

Added python tests for for specific DATE_FORMAT functions.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@yiyutian1 yiyutian1 marked this pull request as ready for review December 20, 2024 09:42
@flinkbot
Copy link
Collaborator

flinkbot commented Dec 20, 2024

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@snuyanzin
Copy link
Contributor

fyi: it can not be both [hotfix] and marked with jira issue.
So please change commit message with something like [FLINK-36941][docs] ...

@gustavodemorais
Copy link
Contributor

Should it be [FLINK-36941][docs]?

Copy link
Contributor

@gustavodemorais gustavodemorais left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from changing the title change - lgtm!
You still need a ✅ from someone else tho ;)

@@ -642,7 +642,10 @@ temporal:
description: Returns TRUE if two time intervals defined by (timepoint1, temporal1) and (timepoint2, temporal2) overlap. The temporal values could be either a time point or a time interval. E.g., (TIME '2:55:00', INTERVAL '1' HOUR) OVERLAPS (TIME '3:30:00', INTERVAL '2' HOUR) returns TRUE; (TIME '9:00:00', TIME '10:00:00') OVERLAPS (TIME '10:15:00', INTERVAL '3' HOUR) returns FALSE.
- sql: DATE_FORMAT(timestamp, string)
table: dateFormat(TIMESTAMP, STRING)
description: Converts timestamp to a value of string in the format specified by the date format string. The format string is compatible with Java's SimpleDateFormat.
description: Converts timestamp to a value of string in the format specified by the date format string. The format string is compatible with Java's DateTimeFormatter.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: to a value of string -> to a string

@@ -350,14 +350,19 @@ def temporal_overlaps(left_time_point,
def date_format(timestamp, format) -> Expression:
"""
Formats a timestamp as a string using a specified format.
The format must be compatible with MySQL's date formatting syntax as used by the
date_parse function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this has been removed - do we not need to deprecate first before replacing?

description: Converts timestamp to a value of string in the format specified by the date format string. The format string is compatible with Java's DateTimeFormatter.
- sql: DATE_FORMAT(string, string)
table: dateFormat(STRING, STRING)
description: Re-format a timestamp string to another string, using a custom format. The format string is compatible with Java's SimpleDateFormat.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we do the same for UNIX_TIMESTAMP(string1[, string2]) (https://github.com/apache/flink/pull/25828/files#diff-539fb22ee6aeee4cf07230bb4155500c6680c4cc889260e2c58bfa9d63fb7de5R662) please?

public static long unixTimestamp(String dateStr, String format, TimeZone tz) {
calls into
private static long internalParseTimestampMillis(String dateStr, String format, TimeZone tz) {
too - so the format is also using SimpleDateFormat

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming I've read

to mean it invokes unixTimestamp(dateStr, format, null) as there is no tz third parameter in the SQL function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nictownsend sounds like a reasonable idea, @snuyanzin @yiyutian1 if you agree we could raise this separately.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, I hadn't appreciated the scope of the underlying Flink issue, happy to raise a new one if it's not appropriate to fix here

Copy link
Contributor

@snuyanzin snuyanzin Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a slightly different thought
Since it is Flink 2.0 and anyway it brings some breaking changes. May be we can completely switch to DateTimeFormatter in 2.0 while for 1.19 and 1.20 we just fix documentation.
However this option requires a discussion in ML
i can start one if no objections

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snuyanzin I like the idea.
Before that happens, should we fix the current doc?
If to only commit for 1.19 and 1.20, should I create PR against release-1.19 and release-1.20?

For example `date_format(col("time"), "%Y, %d %M")` results in strings formatted as
"2017, 05 May".
Supported functions:
1. date_format(TIMESTAMP, STRING) -> STRING
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should mention the difference between the supported format strings.

@@ -299,6 +299,10 @@ def test_expressions(self):
lit(2).hours)))
self.assertEqual("dateFormat(time, '%Y, %d %M')",
str(date_format(col("time"), "%Y, %d %M")))
self.assertEqual("dateFormat(toTimestamp('1970-01-01 08:01:40'), 'yyyy-MM-dd HH:mm:ss')",
str(date_format(to_timestamp('1970-01-01 08:01:40'), "yyyy-MM-dd HH:mm:ss")))
self.assertEqual("dateFormat('1970-01-01 08:01:40', 'yyyy-MM-dd HH:mm:ss')",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add a test with the format that the DateTimeFormatter could cope with - with precision and timezone

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @davidradl , sure I can add that.
Also want to point out that these test_expressions don't test any implementations, they just test the "signature" of defined functions.

@@ -642,7 +642,10 @@ temporal:
description: Returns TRUE if two time intervals defined by (timepoint1, temporal1) and (timepoint2, temporal2) overlap. The temporal values could be either a time point or a time interval. E.g., (TIME '2:55:00', INTERVAL '1' HOUR) OVERLAPS (TIME '3:30:00', INTERVAL '2' HOUR) returns TRUE; (TIME '9:00:00', TIME '10:00:00') OVERLAPS (TIME '10:15:00', INTERVAL '3' HOUR) returns FALSE.
- sql: DATE_FORMAT(timestamp, string)
table: dateFormat(TIMESTAMP, STRING)
description: Converts timestamp to a value of string in the format specified by the date format string. The format string is compatible with Java's SimpleDateFormat.
description: Converts timestamp to a value of string in the format specified by the date format string. The format string is compatible with Java's DateTimeFormatter.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious what happens if you supply a TIMESTAMP_TZ into this function, is this expected to work?

@yiyutian1 yiyutian1 changed the title [FLINK-36941][hotfix] Update DATE_FORMAT Doc and Python Tests [FLINK-36941][docs] Update DATE_FORMAT Doc and Python Tests Dec 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants