Implemented capability to separate diff logs via log4j2 #315

Merged 3 commits on Oct 4, 2024

15 changes: 13 additions & 2 deletions README.md
@@ -68,7 +68,7 @@ Note:
--class com.datastax.cdm.job.DiffData cassandra-data-migrator-4.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
```

- Validation job will report differences as “ERRORS” in the log file as shown below
- Validation job will report differences as “ERRORS” in the log file as shown below.

```
23/04/06 08:43:06 ERROR DiffJobSession: Mismatch row found for key: [key3] Mismatch: Target Index: 1 Origin: valueC Target: value999)
@@ -79,6 +79,17 @@ Note:

- Please grep for all `ERROR` from the output log files to get the list of missing and mismatched records.
- Note that it lists differences by primary-key values.
- If you would like to redirect such logs into a separate file, you could use the `log4j2.properties` file [provided here](./src/resources/log4j2.properties) as shown below

```
./spark-submit --properties-file cdm.properties \
--conf spark.cdm.schema.origin.keyspaceTable="<keyspacename>.<tablename>" \
--conf "spark.executor.extraJavaOptions='-Dlog4j.configurationFile=log4j2.properties'" \
--conf "spark.driver.extraJavaOptions='-Dlog4j.configurationFile=log4j2.properties'" \
Collaborator:

What will be the names of the two log files that get generated? Could we show that as an example? TY

Collaborator Author:

Yes, the details of the log files are in the log4j2.properties file. CDM users can change them as needed, but if they use the file as-is, the logs will be directed three ways:

  • All non-app logs (i.e. logs generated by libraries, not CDM code) will continue to be written to the console.
  • All logs generated by CDM code, except the diff-related logs (i.e. details of missing & mismatched rows), will be recorded in the ./cdm_logs/cdm.log file.
  • All diff-related logs (i.e. details of missing & mismatched rows) will be recorded in the ./cdm_logs/cdm_diff.log file.

Note: All log files will be rolled over when they exceed the set size limit (default is 10MB); there can be 100 rollovers by default.

Collaborator Author:

For context, most customers will not use this feature, but some customers who plan to run only validation and expect a lot of diff rows want such records in a separate file instead of having to grep for them.

--master "local[*]" --driver-memory 25G --executor-memory 25G \
--class com.datastax.cdm.job.DiffData cassandra-data-migrator-4.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
```
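
With the template used as-is, the diff records end up in `cdm_logs/cdm_diff.log`, so they can be counted or listed without filtering the combined output. A minimal sketch, assuming that default path and the ` ERROR ` level token seen in the sample log line earlier; `diff_errors` and `count_diff_errors` are hypothetical helper names, not part of CDM:

```shell
#!/bin/sh
# Hypothetical helpers for inspecting CDM validation logs; not part of CDM.

# Print every ERROR line from a log file
# (works on the combined output file or on cdm_logs/cdm_diff.log).
diff_errors() {
    grep ' ERROR ' "$1"
}

# Print the number of ERROR lines in a log file.
# grep -c exits non-zero when the count is 0, so "|| true" keeps
# the function usable under "set -e" while still printing 0.
count_diff_errors() {
    grep -c ' ERROR ' "$1" || true
}
```

For example, `count_diff_errors cdm_logs/cdm_diff.log` prints the number of ERROR lines recorded in the current diff log. Pointing the same helpers at the combined `logfile_name_*.txt` output reproduces the grep approach, since the match is on the level token rather than on the file name.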

- The Validation job can also be run in an AutoCorrect mode. This mode can
- Add any missing records from origin to target
- Update any mismatched records between origin and target (makes target same as origin).
@@ -102,7 +113,7 @@ Note:
```

# Perform large-field Guardrail violation checks
- The tool can be used to identify large fields from a table that may break you cluster guardrails (e.g. AstraDB has a 10MB limit for a single large field) `--class com.datastax.cdm.job.GuardrailCheck` as shown below
- The tool can be used to identify large fields from a table that may break you cluster guardrails (e.g. AstraDB has a 10MB limit for a single large field), use class option `--class com.datastax.cdm.job.GuardrailCheck` as shown below

```
./spark-submit --properties-file cdm.properties \
3 changes: 2 additions & 1 deletion RELEASE.md
@@ -1,5 +1,6 @@
# Release Notes
## [4.4.2] - 2024-10-TBD
## [4.4.2] - 2024-10-03
Collaborator:

Suggested change
## [4.4.2] - 2024-10-03
## [4.5.0] - 2024-10-03

- Upgraded to use log4j 2.x and included a template properties file that will help separate general logs from CDM class specific logs including a separate log for rows identified by `DiffData` (Validation) errors.
- Upgraded to use Spark `3.5.3`.

## [4.4.1] - 2024-09-20
12 changes: 12 additions & 0 deletions pom.xml
@@ -45,6 +45,18 @@
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-1.2-api</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j-impl</artifactId>
@@ -173,8 +173,6 @@ public List<Number> getNumberList(String propertyName) {

@Override
public List<Integer> getIntegerList(String propertyName) {
List<Integer> intList = new ArrayList<>();
Collaborator:

good catch 👍🏼

Integer i;
if (null == propertyName || PropertyType.NUMBER_LIST != getType(propertyName)
|| null == getNumberList(propertyName))
return null;
@@ -188,7 +186,6 @@ public Boolean getBoolean(String propertyName) {

@Override
public String getAsString(String propertyName) {
String rtn;
if (null == propertyName)
return null;
PropertyType t = getType(propertyName);
22 changes: 0 additions & 22 deletions src/resources/log4j.properties

This file was deleted.

16 changes: 0 additions & 16 deletions src/resources/log4j.xml

This file was deleted.

55 changes: 55 additions & 0 deletions src/resources/log4j2.properties
@@ -0,0 +1,55 @@
# Copyright DataStax, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

appender.0.type = Console
appender.0.name = CONSOLE
appender.0.layout.type = PatternLayout
appender.0.layout.pattern = %d %-5p [%t] %c{1}:%L - %m%n

appender.1.type = RollingFile
appender.1.name = MAIN
appender.1.fileName = cdm_logs/cdm.log
appender.1.filePattern = cdm_logs/cdm.%d{yyyy-MM-dd-HHmm}.%i.log
appender.1.layout.type = PatternLayout
appender.1.layout.pattern = %d %-5p [%t] %c{1}:%L - %m%n
appender.1.policy.type = Policies
appender.1.policy.0.type = OnStartupTriggeringPolicy
appender.1.policy.1.type = SizeBasedTriggeringPolicy
appender.1.policy.1.size = 10m

appender.2.type = RollingFile
appender.2.name = DIFF
appender.2.fileName = cdm_logs/cdm_diff.log
appender.2.filePattern = cdm_logs/cdm_diff.%d{yyyy-MM-dd-HHmm}.%i.log
appender.2.layout.type = PatternLayout
appender.2.layout.pattern = %d %-5p [%t] %c{1}:%L - %m%n
appender.2.policy.type = Policies
appender.2.policy.0.type = OnStartupTriggeringPolicy
appender.2.policy.1.type = SizeBasedTriggeringPolicy
appender.2.policy.1.size = 10m

rootLogger.level = INFO
rootLogger.appenderRef.0.ref = CONSOLE
rootLogger.appenderRef.0.level = INFO

logger.0.name = com.datastax.cdm
logger.0.level = INFO
logger.0.additivity = false
logger.0.appenderRef.0.ref = MAIN

logger.1.name = com.datastax.cdm.job.DiffJobSession
logger.1.level = ERROR
logger.1.additivity = false
logger.1.appenderRef.0.ref = DIFF
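
After editing a copy of this template, it can be handy to confirm where the file-backed appenders will write. A minimal sketch, assuming the `appender.<n>.fileName = <path>` layout used above; `list_log_files` is a hypothetical helper name, not part of CDM:

```shell
#!/bin/sh
# Hypothetical helper; not part of CDM.

# Print "<appender index> <fileName>" for every appender in a log4j2
# properties file that declares a fileName. Lines such as filePattern
# are skipped because the key must be exactly "fileName".
list_log_files() {
    sed -n 's/^appender\.\([0-9][0-9]*\)\.fileName *= *\(.*\)$/\1 \2/p' "$1"
}
```

Run against the unmodified template, `list_log_files log4j2.properties` should list appender 1 with `cdm_logs/cdm.log` and appender 2 with `cdm_logs/cdm_diff.log`; the console appender has no `fileName` and is not listed.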