CASSANDRA-19546: Add to_human_size and to_human_duration function #3741

Open
wants to merge 4 commits into
base: trunk
574 changes: 372 additions & 202 deletions .circleci/config.yml

Large diffs are not rendered by default.

153 changes: 153 additions & 0 deletions doc/modules/cassandra/pages/developing/cql/functions.adoc
@@ -288,6 +288,159 @@ A number of functions allow to obtain the similarity score between vectors of fl

include::cassandra:partial$vector-search/vector_functions.adoc[]

[[human-helper-functions]]
==== Human helper functions

For the user's convenience, there are currently two functions which convert values to more human-friendly
representations.

==== format_bytes

This function treats the values in a column as if they were in bytes and converts them to the unit the user chooses. There are three ways to call this function.
Contributor

nit: final .


Let's have this table:

[source,cql]
----
cqlsh> select * from ks.tb ;

 id | val
----+----------------
  5 |          60000
  1 |        1234234
  2 | 12342341234234
  4 |          60001
  7 |           null
  6 |             43
  3 |         123423

----

with the schema:

[source,cql]
----
CREATE TABLE ks.tb (
id int PRIMARY KEY,
val bigint
)
----

Imagine that we want to read the `val` values in larger units such as mebibytes. We would like more human-friendly output so that we do not have to mentally divide the values by 1024 to get them in the respective bigger units. The function can take just the column itself as an argument, and it will automatically convert each value to a suitable unit.
Contributor

TIL what a mebibyte is :-)
https://simple.wikipedia.org/wiki/Mebibyte

[NOTE]
====
The default source unit for the `format_bytes` function is _bytes_ (`B`).
====

[source,cql]
----
cqlsh> select format_bytes(val) from ks.tb ;

system.format_bytes(val)
---------------------------
58 KiB
1 MiB
11494 GiB
58 KiB
null
43 B
120 KiB
----
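
The output above suggests that, when only the column is given, each value is truncated to the largest unit in which it is still at least 1, with `GiB` as the largest unit shown. The following is a minimal Java sketch of that behaviour, inferred from the sample output only and not taken from this patch. The class `AutoByteUnitSketch` and its `format` method are hypothetical names, and non-negative input is assumed.

```java
// Hypothetical sketch inferred from the sample output; not the code in this patch.
final class AutoByteUnitSketch
{
    // Unit symbols in ascending order; each step is a factor of 1024.
    private static final String[] SYMBOLS = { "B", "KiB", "MiB", "GiB" };

    // Returns null for null input, matching the null row in the sample output.
    static String format(Long bytes)
    {
        if (bytes == null)
            return null;

        long value = bytes; // assumed to be non-negative
        int unit = 0;
        // Step up to the next unit while the value still fills at least one of it.
        while (unit < SYMBOLS.length - 1 && value >= 1024)
        {
            value /= 1024; // integer division drops the remainder
            unit++;
        }
        return value + " " + SYMBOLS[unit];
    }

    public static void main(String[] args)
    {
        Long[] samples = { 60000L, 1234234L, 12342341234234L, 43L, null };
        for (Long sample : samples)
            System.out.println(sample + " -> " + format(sample));
    }
}
```

With this sketch, `format(60000L)` yields `58 KiB` and `format(12342341234234L)` yields `11494 GiB`, which agrees with the rows shown above.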

The second way to call the `format_bytes` function is to specify the size unit into which all values should be
converted. For example, to have all sizes represented in mebibytes, we do:

[source,cql]
----
cqlsh> select format_bytes(val, 'MiB') from ks.tb ;

system.format_bytes(val, 'MiB')
----------------------------------
0 MiB
1 MiB
11770573 MiB
0 MiB
null
0 MiB
0 MiB
----

Lastly, we can specify a source unit and a target unit. The source unit tells what unit the column's values are logically in; the target unit tells what unit we want these values converted to. For example,
if we know that our column is logically in kibibytes and we want the values converted into mebibytes, we would do:

[source,cql]
----
cqlsh> select format_bytes(val, 'KiB', 'MiB') from ks.tb ;

system.format_bytes(val, 'KiB', 'MiB')
-----------------------------------------
58 MiB
1205 MiB
12053067611 MiB
58 MiB
null
0 MiB
120 MiB
----
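
The arithmetic behind an explicit source and target unit can be sketched as follows. This is an illustrative Java example only, assuming binary units that differ by factors of 1024 and truncating division, as the sample output suggests. The class `ByteUnitConvertSketch`, its methods and its case-sensitive symbol table are hypothetical and are not the implementation in this patch.

```java
// Hypothetical sketch of source-to-target size conversion; not the code in this patch.
final class ByteUnitConvertSketch
{
    // Exponent k means that one unit equals 1024^k bytes.
    static int exponent(String symbol)
    {
        switch (symbol)
        {
            case "B":   return 0;
            case "KiB": return 1;
            case "MiB": return 2;
            case "GiB": return 3;
            default: throw new IllegalArgumentException("Unknown unit: " + symbol);
        }
    }

    static String convert(long value, String source, String target)
    {
        int shift = exponent(source) - exponent(target);
        long result = value;
        // Target is smaller than source: multiply, failing loudly on overflow.
        for (int i = 0; i < shift; i++)
            result = Math.multiplyExact(result, 1024L);
        // Target is larger than source: divide, dropping the remainder.
        for (int i = 0; i > shift; i--)
            result /= 1024;
        return result + " " + target;
    }

    public static void main(String[] args)
    {
        System.out.println(convert(1234234L, "KiB", "MiB"));
        System.out.println(convert(12342341234234L, "B", "MiB"));
    }
}
```

Here `convert(1234234L, "KiB", "MiB")` gives `1205 MiB`, and values smaller than one target unit, such as 43 KiB, come out as `0 MiB`.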

==== format_time

Similarly to `format_bytes`, we can do transformations on duration-like columns.

[NOTE]
====
The default source unit for the `format_time` function is _milliseconds_ (`ms`).
====

[source,cql]
----
cqlsh> select format_time(val) from ks.tb ;

system.format_time(val)
-------------------------------
1 m
20 m
142851 d
1 m
null
43 ms
2 m
----
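
Judging from the output above, the single-argument form appears to pick the largest duration unit in which the value is at least 1 and to truncate the rest. A minimal Java sketch of that selection, built on `java.util.concurrent.TimeUnit`, might look like the following. The class `AutoDurationUnitSketch` and the list of candidate units are assumptions made for illustration, not the code in this patch.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch inferred from the sample output; not the code in this patch.
final class AutoDurationUnitSketch
{
    // Candidate units from largest to smallest, with the symbols used in the output.
    private static final TimeUnit[] UNITS =
        { TimeUnit.DAYS, TimeUnit.HOURS, TimeUnit.MINUTES, TimeUnit.SECONDS, TimeUnit.MILLISECONDS };
    private static final String[] SYMBOLS = { "d", "h", "m", "s", "ms" };

    // Input is interpreted as milliseconds, the documented default source unit.
    static String format(Long millis)
    {
        if (millis == null)
            return null;

        for (int i = 0; i < UNITS.length; i++)
        {
            // TimeUnit.convert truncates when converting to a coarser unit.
            long converted = UNITS[i].convert(millis, TimeUnit.MILLISECONDS);
            if (converted > 0 || i == UNITS.length - 1)
                return converted + " " + SYMBOLS[i];
        }
        throw new AssertionError("unreachable");
    }

    public static void main(String[] args)
    {
        Long[] samples = { 60000L, 1234234L, 12342341234234L, 43L, null };
        for (Long sample : samples)
            System.out.println(sample + " -> " + format(sample));
    }
}
```

For the sample data this returns `1 m` for 60000, `20 m` for 1234234 and `142851 d` for 12342341234234.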

We may specify the unit we want the values converted to, given that the column's values are in milliseconds:

[source,cql]
----
cqlsh> select format_time(val, 'm') from ks.tb ;

system.format_time(val, 'm')
------------------------------------
1 m
20 m
205705687 m
1 m
null
0 m
2 m
----

Lastly, we can specify both source and target units:

[source,cql]
----
cqlsh> select format_time(val, 's', 'h') from ks.tb ;

system.format_time(val, 's', 'h')
-----------------------------------------
16 h
342 h
3428428120 h
16 h
null
0 h
34 h
----
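
The explicit source and target form maps naturally onto `java.util.concurrent.TimeUnit.convert`, which truncates in the same way as the output above. The sketch below is a hedged illustration rather than the patch's code: the class `DurationConvertSketch` and its case-sensitive `unitOf` lookup are hypothetical stand-ins for whatever symbol resolution the function actually uses.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of source-to-target duration conversion; not the code in this patch.
final class DurationConvertSketch
{
    // Illustrative symbol table; the real symbol lookup may accept more units.
    static TimeUnit unitOf(String symbol)
    {
        switch (symbol)
        {
            case "ms": return TimeUnit.MILLISECONDS;
            case "s":  return TimeUnit.SECONDS;
            case "m":  return TimeUnit.MINUTES;
            case "h":  return TimeUnit.HOURS;
            case "d":  return TimeUnit.DAYS;
            default: throw new IllegalArgumentException("Unknown unit: " + symbol);
        }
    }

    static String convert(long value, String source, String target)
    {
        // convert(duration, sourceUnit) expresses the duration in the receiver's unit.
        long converted = unitOf(target).convert(value, unitOf(source));
        return converted + " " + target;
    }

    public static void main(String[] args)
    {
        System.out.println(convert(1234234L, "s", "h"));
        System.out.println(convert(12342341234234L, "ms", "m"));
    }
}
```

With these definitions `convert(1234234L, "s", "h")` gives `342 h`, matching the corresponding row above.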

[[user-defined-scalar-functions]]
=== User-defined functions

5 changes: 5 additions & 0 deletions src/java/org/apache/cassandra/config/DataStorageSpec.java
@@ -627,6 +627,11 @@ public static DataStorageUnit fromSymbol(String symbol)
this.symbol = symbol;
}

public String getSymbol()
{
return symbol;
}

public long toBytes(long d)
{
throw new AbstractMethodError();
2 changes: 1 addition & 1 deletion src/java/org/apache/cassandra/config/DurationSpec.java
@@ -139,7 +139,7 @@ public TimeUnit unit()
* @param symbol the time unit symbol
* @return the time unit associated to the specified symbol
*/
- static TimeUnit fromSymbol(String symbol)
+ public static TimeUnit fromSymbol(String symbol)
{
switch (symbol.toLowerCase())
{