Expose stream parameter in public strings convert APIs (#14255)
Add stream parameter to public APIs:

- `cudf::strings::to_booleans()`
- `cudf::strings::from_booleans()`
- `cudf::strings::to_timestamps()`
- `cudf::strings::from_timestamps()`
- `cudf::strings::is_timestamp()`
- `cudf::strings::to_durations()`
- `cudf::strings::from_durations()`
- `cudf::strings::to_fixed_point()`
- `cudf::strings::from_fixed_point()`
- `cudf::strings::is_fixed_point()`
- `cudf::strings::to_floats()`
- `cudf::strings::from_floats()`
- `cudf::strings::is_float()`
- `cudf::strings::to_integers()`
- `cudf::strings::from_integers()`
- `cudf::strings::is_integer()`
- `cudf::strings::hex_to_integers()`
- `cudf::strings::integers_to_hex()`
- `cudf::strings::is_hex()`
- `cudf::strings::ipv4_to_integers()`
- `cudf::strings::integers_to_ipv4()`
- `cudf::strings::is_ipv4()`
- `cudf::strings::url_encode()`
- `cudf::strings::url_decode()`
- `cudf::strings::format_list_column()`
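Each API listed above gains the same parameter, `rmm::cuda_stream_view stream = cudf::get_default_stream()`, placed after the operation's own arguments and before the existing `mr`, as the header diffs below show. The following is a minimal host-only sketch of that calling convention. `stream_view`, `default_stream()`, `memory_resource`, `default_resource()`, `call_record` and `convert()` are hypothetical stand-ins that only record which stream and resource a call would use. They are not rmm or cudf types.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for rmm::cuda_stream_view: just an identifier.
struct stream_view {
  int id;
};

// Hypothetical stand-in for cudf::get_default_stream().
inline stream_view default_stream() { return stream_view{0}; }

// Hypothetical stand-in for rmm::mr::device_memory_resource.
struct memory_resource {
  int id;
};

// Hypothetical stand-in for rmm::mr::get_current_device_resource().
inline memory_resource* default_resource()
{
  static memory_resource current{0};
  return &current;
}

// Records what a call would have used, so the defaults can be observed.
struct call_record {
  int stream_id;
  int mr_id;
};

// Shaped like the updated signatures: operation arguments first, then
// `stream`, then `mr`, with both trailing parameters defaulted.
inline call_record convert(std::string const& /*input*/,
                           stream_view stream  = default_stream(),
                           memory_resource* mr = default_resource())
{
  return call_record{stream.id, mr->id};
}
```

Because `stream` precedes `mr`, calls that pass neither argument still compile unchanged. However, a caller that previously passed only a custom `mr` must now supply a stream first (for instance `default_stream()` in this sketch).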

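Also cleaned up some of the doxygen comments and removed some default parameters (for example, the `true_string` and `false_string` defaults of `to_booleans()` and `from_booleans()`).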

Reference #13744

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - MithunR (https://github.com/mythrocks)
  - Nghia Truong (https://github.com/ttnghia)

URL: #14255
davidwendt authored Oct 16, 2023
1 parent 6e00ad0 commit d590e0b
Showing 24 changed files with 487 additions and 230 deletions.
32 changes: 18 additions & 14 deletions cpp/include/cudf/strings/convert/convert_booleans.hpp
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2019-2022, NVIDIA CORPORATION.
+ * Copyright (c) 2019-2023, NVIDIA CORPORATION.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -35,14 +35,16 @@ namespace strings {
  *
  * Any null entries will result in corresponding null entries in the output column.
  *
- * @param strings Strings instance for this operation.
- * @param true_string String to expect for true. Non-matching strings are false.
- * @param mr Device memory resource used to allocate the returned column's device memory.
- * @return New BOOL8 column converted from strings.
+ * @param input Strings instance for this operation
+ * @param true_string String to expect for true. Non-matching strings are false
+ * @param stream CUDA stream used for device memory operations and kernel launches
+ * @param mr Device memory resource used to allocate the returned column's device memory
+ * @return New BOOL8 column converted from strings
  */
 std::unique_ptr<column> to_booleans(
-  strings_column_view const& strings,
-  string_scalar const& true_string = string_scalar("true"),
+  strings_column_view const& input,
+  string_scalar const& true_string,
+  rmm::cuda_stream_view stream = cudf::get_default_stream(),
   rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
 
 /**
@@ -53,16 +55,18 @@ std::unique_ptr<column> to_booleans(
  *
  * @throw cudf::logic_error if the input column is not BOOL8 type.
  *
- * @param booleans Boolean column to convert.
- * @param true_string String to use for true in the output column.
- * @param false_string String to use for false in the output column.
- * @param mr Device memory resource used to allocate the returned column's device memory.
- * @return New strings column.
+ * @param booleans Boolean column to convert
+ * @param true_string String to use for true in the output column
+ * @param false_string String to use for false in the output column
+ * @param stream CUDA stream used for device memory operations and kernel launches
+ * @param mr Device memory resource used to allocate the returned column's device memory
+ * @return New strings column
  */
 std::unique_ptr<column> from_booleans(
   column_view const& booleans,
-  string_scalar const& true_string = string_scalar("true"),
-  string_scalar const& false_string = string_scalar("false"),
+  string_scalar const& true_string,
+  string_scalar const& false_string,
+  rmm::cuda_stream_view stream = cudf::get_default_stream(),
   rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
 
 /** @} */ // end of doxygen group
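The doc comments in this file describe the conversions row by row: `to_booleans` maps a string equal to `true_string` to true and every other string to false, and null entries stay null. The host-side sketch below models those rules, with `std::optional` standing in for nulls. `to_booleans_model` and `from_booleans_model` are hypothetical helpers rather than cudf functions. Null propagation in `from_booleans_model` is an assumption, because the hunk shown does not document it. Mirroring the updated signatures, neither helper supplies a default for `true_string` or `false_string`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

using nullable_string = std::optional<std::string>;
using nullable_bool   = std::optional<bool>;

// Rows equal to true_string become true, other strings false, nulls stay null.
inline std::vector<nullable_bool> to_booleans_model(std::vector<nullable_string> const& input,
                                                    std::string const& true_string)
{
  std::vector<nullable_bool> out;
  out.reserve(input.size());
  for (auto const& row : input) {
    if (!row) {
      out.push_back(std::nullopt);
    } else {
      out.push_back(*row == true_string);
    }
  }
  return out;
}

// true -> true_string, false -> false_string; nulls stay null (assumed here).
inline std::vector<nullable_string> from_booleans_model(std::vector<nullable_bool> const& booleans,
                                                        std::string const& true_string,
                                                        std::string const& false_string)
{
  std::vector<nullable_string> out;
  out.reserve(booleans.size());
  for (auto const& row : booleans) {
    if (!row) {
      out.push_back(std::nullopt);
    } else {
      out.push_back(*row ? true_string : false_string);
    }
  }
  return out;
}
```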
34 changes: 20 additions & 14 deletions cpp/include/cudf/strings/convert/convert_datetime.hpp
@@ -77,16 +77,18 @@ namespace strings {
  *
  * @throw cudf::logic_error if timestamp_type is not a timestamp type.
  *
- * @param strings Strings instance for this operation.
- * @param timestamp_type The timestamp type used for creating the output column.
- * @param format String specifying the timestamp format in strings.
- * @param mr Device memory resource used to allocate the returned column's device memory.
- * @return New datetime column.
+ * @param input Strings instance for this operation
+ * @param timestamp_type The timestamp type used for creating the output column
+ * @param format String specifying the timestamp format in strings
+ * @param stream CUDA stream used for device memory operations and kernel launches
+ * @param mr Device memory resource used to allocate the returned column's device memory
+ * @return New datetime column
  */
 std::unique_ptr<column> to_timestamps(
-  strings_column_view const& strings,
+  strings_column_view const& input,
   data_type timestamp_type,
   std::string_view format,
+  rmm::cuda_stream_view stream = cudf::get_default_stream(),
   rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
 
 /**
@@ -124,14 +126,16 @@ std::unique_ptr<column> to_timestamps(
  * This will return a column of type BOOL8 where a `true` row indicates the corresponding
  * input string can be parsed correctly with the given format.
  *
- * @param strings Strings instance for this operation.
- * @param format String specifying the timestamp format in strings.
- * @param mr Device memory resource used to allocate the returned column's device memory.
- * @return New BOOL8 column.
+ * @param input Strings instance for this operation
+ * @param format String specifying the timestamp format in strings
+ * @param stream CUDA stream used for device memory operations and kernel launches
+ * @param mr Device memory resource used to allocate the returned column's device memory
+ * @return New BOOL8 column
  */
 std::unique_ptr<column> is_timestamp(
-  strings_column_view const& strings,
+  strings_column_view const& input,
   std::string_view format,
+  rmm::cuda_stream_view stream = cudf::get_default_stream(),
   rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
 
 /**
@@ -231,19 +235,21 @@ std::unique_ptr<column> is_timestamp(
  * @throw cudf::logic_error if the `format` string is empty
  * @throw cudf::logic_error if `names.size()` is an invalid size. Must be 0 or 40 strings.
  *
- * @param timestamps Timestamp values to convert.
+ * @param timestamps Timestamp values to convert
  * @param format The string specifying output format.
  *        Default format is "%Y-%m-%dT%H:%M:%SZ".
  * @param names The string names to use for weekdays ("%a", "%A") and months ("%b", "%B")
  *        Default is an empty `strings_column_view`.
- * @param mr Device memory resource used to allocate the returned column's device memory.
- * @return New strings column with formatted timestamps.
+ * @param stream CUDA stream used for device memory operations and kernel launches
+ * @param mr Device memory resource used to allocate the returned column's device memory
+ * @return New strings column with formatted timestamps
  */
 std::unique_ptr<column> from_timestamps(
   column_view const& timestamps,
   std::string_view format = "%Y-%m-%dT%H:%M:%SZ",
   strings_column_view const& names = strings_column_view(column_view{
     data_type{type_id::STRING}, 0, nullptr, nullptr, 0}),
+  rmm::cuda_stream_view stream = cudf::get_default_stream(),
   rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
 
 /** @} */ // end of doxygen group
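`from_timestamps` defaults its `format` to `"%Y-%m-%dT%H:%M:%SZ"`, and the new `stream` parameter sits between `names` and `mr`. To show what that default pattern produces, here is a host-side sketch that formats whole seconds since the Unix epoch with `std::strftime`. `format_timestamp_seconds` is a hypothetical helper. `std::strftime` is only a stand-in here, since cudf formats on the GPU and supports its own set of specifiers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ctime>
#include <string>

// Formats seconds since 1970-01-01T00:00:00Z in UTC using a strftime pattern.
// The default argument mirrors from_timestamps' default format string.
inline std::string format_timestamp_seconds(std::int64_t seconds,
                                            char const* format = "%Y-%m-%dT%H:%M:%SZ")
{
  std::time_t const t = static_cast<std::time_t>(seconds);
  std::tm const* utc  = std::gmtime(&t);  // not thread-safe; acceptable for a sketch
  if (utc == nullptr) { return std::string{}; }
  char buffer[64];
  std::size_t const n = std::strftime(buffer, sizeof(buffer), format, utc);
  return std::string(buffer, n);
}
```

For example, 90061 seconds (one day, one hour, one minute and one second) renders as `1970-01-02T01:01:01Z` with the default pattern.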
26 changes: 15 additions & 11 deletions cpp/include/cudf/strings/convert/convert_durations.hpp
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2020-2022, NVIDIA CORPORATION.
+ * Copyright (c) 2020-2023, NVIDIA CORPORATION.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -65,16 +65,18 @@ namespace strings {
  *
  * @throw cudf::logic_error if duration_type is not a duration type.
  *
- * @param strings Strings instance for this operation.
- * @param duration_type The duration type used for creating the output column.
- * @param format String specifying the duration format in strings.
- * @param mr Device memory resource used to allocate the returned column's device memory.
- * @return New duration column.
+ * @param input Strings instance for this operation
+ * @param duration_type The duration type used for creating the output column
+ * @param format String specifying the duration format in strings
+ * @param stream CUDA stream used for device memory operations and kernel launches
+ * @param mr Device memory resource used to allocate the returned column's device memory
+ * @return New duration column
  */
 std::unique_ptr<column> to_durations(
-  strings_column_view const& strings,
+  strings_column_view const& input,
   data_type duration_type,
   std::string_view format,
+  rmm::cuda_stream_view stream = cudf::get_default_stream(),
   rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
 
 /**
@@ -115,15 +117,17 @@ std::unique_ptr<column> to_durations(
  *
  * @throw cudf::logic_error if `durations` column parameter is not a duration type.
  *
- * @param durations Duration values to convert.
+ * @param durations Duration values to convert
  * @param format The string specifying output format.
- *        Default format is ""%d days %H:%M:%S".
- * @param mr Device memory resource used to allocate the returned column's device memory.
- * @return New strings column with formatted durations.
+ *        Default format is ""%D days %H:%M:%S".
+ * @param mr Device memory resource used to allocate the returned column's device memory
+ * @param stream CUDA stream used for device memory operations and kernel launches
+ * @return New strings column with formatted durations
  */
 std::unique_ptr<column> from_durations(
   column_view const& durations,
   std::string_view format = "%D days %H:%M:%S",
+  rmm::cuda_stream_view stream = cudf::get_default_stream(),
   rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
 
 /** @} */ // end of doxygen group
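The last hunk in this file corrects the `from_durations` doc comment so that the documented default format matches the signature's default, `"%D days %H:%M:%S"`, using `%D` for days rather than `%d`. The host-side sketch below shows what that pattern spells out for a non-negative number of seconds. `format_duration_default` is hypothetical. The unpadded day count and the zero-padded two-digit `%H`/`%M`/`%S` fields are assumptions of this sketch, not cudf's documented output.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Renders a non-negative duration in seconds as "<days> days HH:MM:SS".
inline std::string format_duration_default(std::int64_t seconds)
{
  assert(seconds >= 0 && "sketch handles non-negative durations only");
  std::int64_t const days = seconds / 86400;
  std::int64_t const rest = seconds % 86400;  // seconds within the last day
  int const hours         = static_cast<int>(rest / 3600);
  int const minutes       = static_cast<int>((rest % 3600) / 60);
  int const secs          = static_cast<int>(rest % 60);
  char buffer[64];
  std::snprintf(buffer, sizeof(buffer), "%lld days %02d:%02d:%02d",
                static_cast<long long>(days), hours, minutes, secs);
  return std::string(buffer);
}
```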
30 changes: 18 additions & 12 deletions cpp/include/cudf/strings/convert/convert_fixed_point.hpp
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2021-2022, NVIDIA CORPORATION.
+ * Copyright (c) 2021-2023, NVIDIA CORPORATION.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -53,14 +53,16 @@ namespace strings {
  *
  * @throw cudf::logic_error if `output_type` is not a fixed-point decimal type.
  *
- * @param input Strings instance for this operation.
- * @param output_type Type of fixed-point column to return including the scale value.
- * @param mr Device memory resource used to allocate the returned column's device memory.
- * @return New column of `output_type`.
+ * @param input Strings instance for this operation
+ * @param output_type Type of fixed-point column to return including the scale value
+ * @param stream CUDA stream used for device memory operations and kernel launches
+ * @param mr Device memory resource used to allocate the returned column's device memory
+ * @return New column of `output_type`
  */
 std::unique_ptr<column> to_fixed_point(
   strings_column_view const& input,
   data_type output_type,
+  rmm::cuda_stream_view stream = cudf::get_default_stream(),
   rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
 
 /**
@@ -83,12 +85,14 @@ std::unique_ptr<column> to_fixed_point(
  *
  * @throw cudf::logic_error if the `input` column is not a fixed-point decimal type.
  *
- * @param input Fixed-point column to convert.
- * @param mr Device memory resource used to allocate the returned column's device memory.
- * @return New strings column.
+ * @param input Fixed-point column to convert
+ * @param stream CUDA stream used for device memory operations and kernel launches
+ * @param mr Device memory resource used to allocate the returned column's device memory
+ * @return New strings column
  */
 std::unique_ptr<column> from_fixed_point(
   column_view const& input,
+  rmm::cuda_stream_view stream = cudf::get_default_stream(),
   rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
 
 /**
@@ -111,14 +115,16 @@ std::unique_ptr<column> from_fixed_point(
  *
  * @throw cudf::logic_error if the `decimal_type` is not a fixed-point decimal type.
  *
- * @param input Strings instance for this operation.
- * @param decimal_type Fixed-point type (with scale) used only for checking overflow.
- * @param mr Device memory resource used to allocate the returned column's device memory.
- * @return New column of boolean results for each string.
+ * @param input Strings instance for this operation
+ * @param decimal_type Fixed-point type (with scale) used only for checking overflow
+ * @param stream CUDA stream used for device memory operations and kernel launches
+ * @param mr Device memory resource used to allocate the returned column's device memory
+ * @return New column of boolean results for each string
  */
 std::unique_ptr<column> is_fixed_point(
   strings_column_view const& input,
   data_type decimal_type = data_type{type_id::DECIMAL64},
+  rmm::cuda_stream_view stream = cudf::get_default_stream(),
   rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
 
 /** @} */ // end of doxygen group
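`to_fixed_point` takes the scale from `output_type`, and `is_fixed_point` uses `decimal_type` (default `DECIMAL64`) only to check for overflow. The host-side sketch below illustrates both ideas. It assumes cudf's convention that a fixed-point value equals its integer representation times 10^scale, so a scale of -2 keeps two fractional digits. `parse_fixed_point` and `is_fixed_point_like` are hypothetical. They accept only `[+-]digits[.digits]`, truncate extra fractional digits, and use `int64_t` overflow as a stand-in for the `DECIMAL64` check. These parsing details are assumptions, not cudf's documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <string_view>

// Parses "[+-]digits[.digits]" into rep such that value = rep * 10^scale.
// Only scale <= 0 is handled; digits beyond -scale fractional places are
// truncated; std::nullopt means malformed input or int64_t overflow.
inline std::optional<std::int64_t> parse_fixed_point(std::string_view s, int scale)
{
  if (scale > 0) { return std::nullopt; }  // out of scope for this sketch
  constexpr std::int64_t max_rep = std::numeric_limits<std::int64_t>::max();
  std::size_t i = 0;
  bool negative = false;
  if (i < s.size() && (s[i] == '+' || s[i] == '-')) {
    negative = (s[i] == '-');
    ++i;
  }
  std::int64_t rep = 0;
  int frac_digits  = 0;
  bool seen_digit  = false;
  bool seen_point  = false;
  for (; i < s.size(); ++i) {
    char const c = s[i];
    if (c == '.') {
      if (seen_point) { return std::nullopt; }  // second decimal point
      seen_point = true;
      continue;
    }
    if (c < '0' || c > '9') { return std::nullopt; }
    seen_digit = true;
    if (seen_point && frac_digits == -scale) { continue; }  // truncate extra digits
    int const d = c - '0';
    if (rep > (max_rep - d) / 10) { return std::nullopt; }  // would overflow
    rep = rep * 10 + d;
    if (seen_point) { ++frac_digits; }
  }
  if (!seen_digit) { return std::nullopt; }
  // Scale up when the string has fewer fractional digits than -scale.
  for (; frac_digits < -scale; ++frac_digits) {
    if (rep > max_rep / 10) { return std::nullopt; }
    rep *= 10;
  }
  return negative ? -rep : rep;
}

// Valid and fits: this sketch's analogue of an is_fixed_point row result.
inline bool is_fixed_point_like(std::string_view s, int scale)
{
  return parse_fixed_point(s, scale).has_value();
}
```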
30 changes: 18 additions & 12 deletions cpp/include/cudf/strings/convert/convert_floats.hpp
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2021-2022, NVIDIA CORPORATION.
+ * Copyright (c) 2021-2023, NVIDIA CORPORATION.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -39,14 +39,16 @@ namespace strings {
  *
  * @throw cudf::logic_error if output_type is not float type.
  *
- * @param strings Strings instance for this operation.
- * @param output_type Type of float numeric column to return.
- * @param mr Device memory resource used to allocate the returned column's device memory.
- * @return New column with floats converted from strings.
+ * @param strings Strings instance for this operation
+ * @param output_type Type of float numeric column to return
+ * @param stream CUDA stream used for device memory operations and kernel launches
+ * @param mr Device memory resource used to allocate the returned column's device memory
+ * @return New column with floats converted from strings
  */
 std::unique_ptr<column> to_floats(
   strings_column_view const& strings,
   data_type output_type,
+  rmm::cuda_stream_view stream = cudf::get_default_stream(),
   rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
 
 /**
@@ -62,12 +64,14 @@ std::unique_ptr<column> to_floats(
  *
  * @throw cudf::logic_error if floats column is not float type.
  *
- * @param floats Numeric column to convert.
- * @param mr Device memory resource used to allocate the returned column's device memory.
- * @return New strings column with floats as strings.
+ * @param floats Numeric column to convert
+ * @param stream CUDA stream used for device memory operations and kernel launches
+ * @param mr Device memory resource used to allocate the returned column's device memory
+ * @return New strings column with floats as strings
  */
 std::unique_ptr<column> from_floats(
   column_view const& floats,
+  rmm::cuda_stream_view stream = cudf::get_default_stream(),
   rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
 
 /**
@@ -86,12 +90,14 @@ std::unique_ptr<column> from_floats(
  *
  * Any null row results in a null entry for that row in the output column.
  *
- * @param strings Strings instance for this operation.
- * @param mr Device memory resource used to allocate the returned column's device memory.
- * @return New column of boolean results for each string.
+ * @param input Strings instance for this operation
+ * @param stream CUDA stream used for device memory operations and kernel launches
+ * @param mr Device memory resource used to allocate the returned column's device memory
+ * @return New column of boolean results for each string
  */
 std::unique_ptr<column> is_float(
-  strings_column_view const& strings,
+  strings_column_view const& input,
+  rmm::cuda_stream_view stream = cudf::get_default_stream(),
   rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
 
 /** @} */ // end of doxygen group
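`is_float` returns a BOOL8 column in which null rows stay null, and `to_floats` and `from_floats` gain the same `stream` parameter. The code below is a host-side sketch of a per-row validity check. It accepts `[+-]digits[.digits][(e|E)[+-]digits]` with at least one mantissa digit. `looks_like_float`, `is_float_row` and `to_float_row` are hypothetical. The accepted grammar is an assumption; cudf's exact rules, for example around `NaN` or `Inf`, are not reproduced. Mapping invalid strings to null in `to_float_row` is a choice made only for this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>

// Accepts [+-]digits[.digits][(e|E)[+-]digits] with at least one mantissa digit.
inline bool looks_like_float(std::string const& s)
{
  std::size_t i = 0;
  std::size_t const n = s.size();
  auto is_digit = [&](std::size_t k) { return std::isdigit(static_cast<unsigned char>(s[k])) != 0; };
  if (i < n && (s[i] == '+' || s[i] == '-')) { ++i; }
  std::size_t mantissa_digits = 0;
  while (i < n && is_digit(i)) { ++i; ++mantissa_digits; }
  if (i < n && s[i] == '.') {
    ++i;
    while (i < n && is_digit(i)) { ++i; ++mantissa_digits; }
  }
  if (mantissa_digits == 0) { return false; }
  if (i < n && (s[i] == 'e' || s[i] == 'E')) {
    ++i;
    if (i < n && (s[i] == '+' || s[i] == '-')) { ++i; }
    std::size_t exponent_digits = 0;
    while (i < n && is_digit(i)) { ++i; ++exponent_digits; }
    if (exponent_digits == 0) { return false; }
  }
  return i == n;  // reject trailing characters
}

// Null rows stay null, as the is_float doc comment describes.
inline std::optional<bool> is_float_row(std::optional<std::string> const& row)
{
  if (!row) { return std::nullopt; }
  return looks_like_float(*row);
}

// Sketch-only choice: invalid strings become null instead of a number.
inline std::optional<double> to_float_row(std::optional<std::string> const& row)
{
  if (!row || !looks_like_float(*row)) { return std::nullopt; }
  return std::strtod(row->c_str(), nullptr);
}
```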