feat: JA4 fingerprinting (#4669)
lrstewart authored Aug 19, 2024
1 parent e2fcfab commit 3baca01
Showing 21 changed files with 2,450 additions and 31 deletions.
26 changes: 22 additions & 4 deletions api/unstable/fingerprint.h
@@ -27,12 +27,17 @@
* and marked as stable after an initial customer integration and feedback.
*/

/* Available fingerprinting methods.
*
* The current recommendation is to use JA4. JA4 sorts some of the lists it includes
* in the fingerprint, making it more resistant to the list reordering done by
* Chrome and other clients.
*/
typedef enum {
/*
* The current standard open source fingerprinting method.
* See https://engineering.salesforce.com/tls-fingerprinting-with-ja3-and-ja3s-247362855967.
*/
/* See https://engineering.salesforce.com/tls-fingerprinting-with-ja3-and-ja3s-247362855967 */
S2N_FINGERPRINT_JA3,
/* See https://github.com/FoxIO-LLC/ja4/tree/main */
S2N_FINGERPRINT_JA4,
} s2n_fingerprint_type;

struct s2n_fingerprint;
@@ -99,6 +104,11 @@ S2N_API int s2n_fingerprint_get_hash_size(const struct s2n_fingerprint *fingerpr
* - See https://engineering.salesforce.com/tls-fingerprinting-with-ja3-and-ja3s-247362855967
* - Example: "c34a54599a1fbaf1786aa6d633545a60"
*
* JA4: A string consisting of three parts, separated by underscores: the prefix,
* and the hex-encoded truncated SHA256 hashes of the other two parts of the raw string.
* - See https://github.com/FoxIO-LLC/ja4/blob/v0.18.2/technical_details/JA4.md
* - Example: "t13i310900_e8f1e7e78f70_1f22a2ca17c4"
*
* @param fingerprint The s2n_fingerprint to be used for the hash
* @param max_output_size The maximum size of data that may be written to `output`.
* If `output` is too small, an S2N_ERR_T_USAGE error will occur.
@@ -134,6 +144,14 @@ S2N_API int s2n_fingerprint_get_raw_size(const struct s2n_fingerprint *fingerpri
* 49188-49192-107-49187-49191-103-49162-49172-57-49161-49171-51-157-
* 156-61-60-53-47-255,11-10-35-22-23-13-43-45-51,29-23-30-25-24,0-1-2"
*
* JA4: A string consisting of three parts: a prefix, and two lists of hex values.
* - See https://github.com/FoxIO-LLC/ja4/blob/v0.18.2/technical_details/JA4.md
* - Example: "t13i310900_002f,0033,0035,0039,003c,003d,0067,006b,009c,009d,009e,
* 009f,00ff,1301,1302,1303,c009,c00a,c013,c014,c023,c024,c027,c028,
* c02b,c02c,c02f,c030,cca8,cca9,ccaa_000a,000b,000d,0016,0017,0023,
* 002b,002d,0033_0403,0503,0603,0807,0808,0809,080a,080b,0804,0805,
* 0806,0401,0501,0601,0303,0301,0302,0402,0502,0602"
*
* @param fingerprint The s2n_fingerprint to be used for the raw string
* @param max_output_size The maximum size of data that may be written to `output`.
* If `output` is too small, an S2N_ERR_T_USAGE error will occur.
2 changes: 2 additions & 0 deletions bindings/rust/s2n-tls/src/fingerprint.rs
@@ -17,12 +17,14 @@ use core::ptr::NonNull;
#[derive(Copy, Clone)]
pub enum FingerprintType {
JA3,
JA4,
}

impl From<FingerprintType> for s2n_tls_sys::s2n_fingerprint_type::Type {
fn from(value: FingerprintType) -> Self {
match value {
FingerprintType::JA3 => s2n_tls_sys::s2n_fingerprint_type::FINGERPRINT_JA3,
FingerprintType::JA4 => s2n_tls_sys::s2n_fingerprint_type::FINGERPRINT_JA4,
}
}
}
@@ -0,0 +1,150 @@
# JA4: TLS Client Fingerprinting

![JA4](https://github.com/FoxIO-LLC/ja4/blob/main/technical_details/JA4.png)

JA4 looks at the TLS Client Hello packet and builds a fingerprint of the client based on attributes within the packet.

### JA4 Algorithm:
(QUIC=”q” or TCP=”t”)
(2 character TLS version)
(SNI=”d” or no SNI=”i”)
(2 character count of ciphers)
(2 character count of extensions)
(first and last characters of first ALPN extension value)
_
(sha256 hash of the list of cipher hex codes sorted in hex order, truncated to 12 characters)
_
(sha256 hash of (the list of extension hex codes sorted in hex order)_(the list of signature algorithms), truncated to 12 characters)

The end result is a fingerprint that looks like:
t13d1516h2_8daaf6152771_b186095e22b6

## Details:
The program needs to ignore GREASE values anywhere it sees them: (https://datatracker.ietf.org/doc/html/draft-davidben-tls-grease-01#page-5)
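
As an illustration of the rule above, a GREASE check can be sketched in C. This is a minimal sketch, not the code from this commit: the name `is_grease` is hypothetical, and it assumes the 16-bit GREASE pattern reserved for cipher suites, extensions, and signature algorithms (0x0a0a, 0x1a1a, ..., 0xfafa).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true for the 16-bit GREASE values
 * 0x0a0a, 0x1a1a, ..., 0xfafa. Both bytes are equal and the low
 * nibble of each byte is 0xa. */
bool is_grease(uint16_t value)
{
    uint8_t high = (uint8_t) (value >> 8);
    uint8_t low = (uint8_t) (value & 0xff);
    return high == low && (low & 0x0f) == 0x0a;
}
```

Every later step (counting, sorting, version selection) would apply this filter before using a value.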

### QUIC:
https://en.wikipedia.org/wiki/QUIC
“q” or “t”, which denotes whether the hello packet is for QUIC or TCP. QUIC is the protocol which the new HTTP/3 standard utilizes, encapsulating TLS 1.3 into UDP packets. As QUIC was developed by Google, if an organization heavily utilizes Google products, QUIC could make up half of their network traffic, so this is important to capture.

If the protocol is QUIC then the first character of the fingerprint is “q” if not, it’s “t”.

### TLS Version:
TLS version is shown in 3 different places. If extension 0x002b exists (supported_versions), then the version is the highest value in the extension. Remember to ignore GREASE values. If the extension doesn’t exist, then the TLS version is the value of the Protocol Version. Handshake version (located at the top of the packet) should be ignored.

0x0304 = TLS 1.3 = “13”
0x0303 = TLS 1.2 = “12”
0x0302 = TLS 1.1 = “11”
0x0301 = TLS 1.0 = “10”
0x0300 = SSL 3.0 = “s3”
0x0200 = SSL 2.0 = “s2”
0x0100 = SSL 1.0 = “s1”

Unknown = “00”
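
The version rule and the table above could be sketched in C as follows. The function names are hypothetical. One case is an assumption, because the text above does not cover it: if supported_versions is present but contains only GREASE values, this sketch reports the unknown code "00".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: matches 0x0a0a, 0x1a1a, ..., 0xfafa. */
bool is_grease(uint16_t value)
{
    return (value >> 8) == (value & 0xff) && (value & 0x0f) == 0x0a;
}

/* Picks the version JA4 should report. Pass NULL for supported_versions
 * when extension 0x002b is absent; the Protocol Version is used instead. */
uint16_t ja4_select_version(const uint16_t *supported_versions, size_t count,
        uint16_t protocol_version)
{
    if (supported_versions == NULL) {
        return protocol_version;
    }
    uint16_t highest = 0; /* assumption: stays 0 ("00") if only GREASE is listed */
    for (size_t i = 0; i < count; i++) {
        uint16_t version = supported_versions[i];
        if (!is_grease(version) && version > highest) {
            highest = version;
        }
    }
    return highest;
}

/* Maps a version number to its two-character JA4 code. */
const char *ja4_version_code(uint16_t version)
{
    switch (version) {
        case 0x0304: return "13";
        case 0x0303: return "12";
        case 0x0302: return "11";
        case 0x0301: return "10";
        case 0x0300: return "s3";
        case 0x0200: return "s2";
        case 0x0100: return "s1";
        default: return "00";
    }
}
```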

### SNI:
If the SNI extension (0x0000) exists, then the destination of the connection is a domain, or “d” in the fingerprint. If the SNI does not exist, then the destination is an IP address, or “i”.

### Number of Ciphers:
2 character number of cipher suites, so if there’s 6 cipher suites in the hello packet, then the value should be “06”. If there’s > 99, which there should never be, then output “99”. Remember, ignore GREASE values. They don’t count.

### Number of Extensions:
Same as counting ciphers. Ignore GREASE. Include SNI and ALPN.
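
The counting rule for both ciphers and extensions can be sketched in C. The names `ja4_count` and `ja4_format_count` are hypothetical. The sketch skips GREASE values, caps the result at 99, and zero-pads it to two characters.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: matches 0x0a0a, 0x1a1a, ..., 0xfafa. */
bool is_grease(uint16_t value)
{
    return (value >> 8) == (value & 0xff) && (value & 0x0f) == 0x0a;
}

/* Counts the non-GREASE values in a list, capped at 99. */
size_t ja4_count(const uint16_t *values, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (!is_grease(values[i])) {
            count++;
        }
    }
    return count > 99 ? 99 : count;
}

/* Writes the count as two digits plus a terminating NUL. */
void ja4_format_count(size_t count, char out[3])
{
    snprintf(out, 3, "%02zu", count > 99 ? (size_t) 99 : count);
}
```

For extensions, SNI (0x0000) and ALPN (0x0010) stay in the list passed to `ja4_count`, since only GREASE is excluded from the count.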

### ALPN Extension Value:
The first and last characters of the ALPN (Application-Layer Protocol Negotiation) first value.
List of possible ALPN Values (scroll down): https://www.iana.org/assignments/tls-extensiontype-values/tls-extensiontype-values.xhtml



In the above example, the first ALPN value is h2 so the first and last characters to use in the fingerprint are “h2”. IF the first ALPN listed was http/1.1 then the first and last characters to use in the fingerprint would be “h1”.

In Wireshark this field is located under tls.handshake.extensions_alpn_str

If there are no ALPN values or no ALPN extension then we print “00” as the value in the fingerprint.
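
A minimal C sketch of the ALPN rule, assuming the caller has already extracted the first ALPN value as a byte string with a known length. The name `ja4_alpn_code` is hypothetical.

```c
#include <stddef.h>

/* Writes the two-character ALPN code plus a terminating NUL.
 * Pass NULL or a zero length when there is no ALPN extension or
 * the extension carries no values. */
void ja4_alpn_code(const char *first_alpn, size_t len, char out[3])
{
    if (first_alpn == NULL || len == 0) {
        out[0] = '0';
        out[1] = '0';
    } else {
        out[0] = first_alpn[0];       /* first character */
        out[1] = first_alpn[len - 1]; /* last character */
    }
    out[2] = '\0';
}
```

With this sketch, `h2` yields "h2" and `http/1.1` yields "h1", matching the examples above.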

### Cipher hash:
A 12 character truncated sha256 hash of the list of ciphers sorted in hex order, first 12 characters. The list is created using the 4 character hex values of the ciphers, lower case, comma delimited, ignoring GREASE.
Example:
```
1301,1302,1303,c02b,c02f,c02c,c030,cca9,cca8,c013,c014,009c,009d,002f,0035
```
Is sorted to:
```
002f,0035,009c,009d,1301,1302,1303,c013,c014,c02b,c02c,c02f,c030,cca8,cca9 = 8daaf6152771
```
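
The list that feeds the cipher hash could be built as in the C sketch below. The function names and the `JA4_MAX_VALUES` limit are hypothetical. The SHA-256 step is left out because it depends on whichever crypto library is available: the caller would hash the resulting string and keep the first 12 hex characters.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define JA4_MAX_VALUES 256 /* assumed upper bound for this sketch */

/* Hypothetical helper: matches 0x0a0a, 0x1a1a, ..., 0xfafa. */
bool is_grease(uint16_t value)
{
    return (value >> 8) == (value & 0xff) && (value & 0x0f) == 0x0a;
}

static int compare_u16(const void *a, const void *b)
{
    uint16_t x = *(const uint16_t *) a;
    uint16_t y = *(const uint16_t *) b;
    return (x > y) - (x < y);
}

/* Writes the sorted, comma-delimited, lower-case hex list to out.
 * Returns 0 on success, -1 if the list or the buffer is too small. */
int ja4_sorted_hex_list(const uint16_t *values, size_t len, char *out, size_t out_size)
{
    uint16_t kept[JA4_MAX_VALUES];
    size_t kept_len = 0;
    for (size_t i = 0; i < len; i++) {
        if (is_grease(values[i])) {
            continue;
        }
        if (kept_len == JA4_MAX_VALUES) {
            return -1;
        }
        kept[kept_len++] = values[i];
    }
    qsort(kept, kept_len, sizeof(kept[0]), compare_u16);

    if (out_size == 0) {
        return -1;
    }
    out[0] = '\0';
    size_t used = 0;
    for (size_t i = 0; i < kept_len; i++) {
        int written = snprintf(out + used, out_size - used, "%s%04x",
                i == 0 ? "" : ",", (unsigned) kept[i]);
        if (written < 0 || (size_t) written >= out_size - used) {
            return -1;
        }
        used += (size_t) written;
    }
    return 0;
}
```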

### Extension hash:
A 12 character truncated sha256 hash of the list of extensions, sorted by hex value, followed by the list of signature algorithms, in the order that they appear (not sorted).

The extension list is created using the 4 character hex values of the extensions, lower case, comma delimited, sorted (not in the order they appear). Ignore the SNI extension (0000) and the ALPN extension (0010) as we’ve already captured them in the _a_ section of the fingerprint. These values are omitted so that the same application would have the same _b_ section of the fingerprint regardless of if it were going to a domain, IP, or changing ALPNs.

For example:
```
001b,0000,0033,0010,4469,0017,002d,000d,0005,0023,0012,002b,ff01,000b,000a,0015
```
Is sorted to:
```
0005,000a,000b,000d,0012,0015,0017,001b,0023,002b,002d,0033,4469,ff01
```
(notice 0000 and 0010 is removed)

The signature algorithm hex values are then added to the end of the list in the order that they appear (not sorted) with an underscore delimiting the two lists.
For example the signature algorithms:
```
0403,0804,0401,0503,0805,0501,0806,0601
```
Are added to the end of the previous string to create:
```
0005,000a,000b,000d,0012,0015,0017,001b,0023,002b,002d,0033,4469,ff01_0403,0804,0401,0503,0805,0501,0806,0601
```
Hashed to:
```
e5627efa2ab19723084c1033a96c694a45826ab5a460d2d3fd5ffcfe97161c95
```
Truncated to first 12 characters:
```
e5627efa2ab1
```

If there are no signature algorithms in the hello packet, then the string ends without an underscore and is hashed.
For example:
```
0005,000a,000b,000d,0012,0015,0017,001b,0023,002b,002d,0033,4469,ff01 = 6d807ffa2a79
```
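
The string that is hashed for the extension section could be assembled as in the C sketch below. All names are hypothetical, and hashing is again left to the caller. The sketch drops GREASE, SNI (0000), and ALPN (0010), sorts the remaining extensions, and appends the signature algorithms in their original order after an underscore. When there are no signature algorithms, no underscore is written.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define JA4_MAX_EXTENSIONS 256 /* assumed upper bound for this sketch */

/* Hypothetical helper: matches 0x0a0a, 0x1a1a, ..., 0xfafa. */
bool is_grease(uint16_t value)
{
    return (value >> 8) == (value & 0xff) && (value & 0x0f) == 0x0a;
}

static int compare_u16(const void *a, const void *b)
{
    uint16_t x = *(const uint16_t *) a;
    uint16_t y = *(const uint16_t *) b;
    return (x > y) - (x < y);
}

/* Appends "<sep>xxxx" at offset *used; returns false if it does not fit. */
static bool append_hex(char *out, size_t out_size, size_t *used, const char *sep, uint16_t value)
{
    int written = snprintf(out + *used, out_size - *used, "%s%04x", sep, (unsigned) value);
    if (written < 0 || (size_t) written >= out_size - *used) {
        return false;
    }
    *used += (size_t) written;
    return true;
}

/* Returns 0 on success, -1 if the list or the buffer is too small. */
int ja4_extension_hash_input(const uint16_t *extensions, size_t ext_len,
        const uint16_t *sig_algs, size_t sig_len, char *out, size_t out_size)
{
    uint16_t kept[JA4_MAX_EXTENSIONS];
    size_t kept_len = 0;
    for (size_t i = 0; i < ext_len; i++) {
        uint16_t ext = extensions[i];
        if (is_grease(ext) || ext == 0x0000 || ext == 0x0010) {
            continue; /* GREASE, SNI, and ALPN are not part of this list */
        }
        if (kept_len == JA4_MAX_EXTENSIONS) {
            return -1;
        }
        kept[kept_len++] = ext;
    }
    qsort(kept, kept_len, sizeof(kept[0]), compare_u16);

    if (out_size == 0) {
        return -1;
    }
    out[0] = '\0';
    size_t used = 0;
    for (size_t i = 0; i < kept_len; i++) {
        if (!append_hex(out, out_size, &used, i == 0 ? "" : ",", kept[i])) {
            return -1;
        }
    }
    /* Signature algorithms keep their original order. */
    bool first = true;
    for (size_t i = 0; i < sig_len; i++) {
        if (is_grease(sig_algs[i])) {
            continue;
        }
        if (!append_hex(out, out_size, &used, first ? "_" : ",", sig_algs[i])) {
            return -1;
        }
        first = false;
    }
    return 0;
}
```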

### Example

JA4 fingerprint:
t (TLS over TCP)
13 (TLS version 1.3)
d (SNI exists so it’s going to a domain)
15 (15 cipher suites ignoring grease)
16 (16 extensions ignoring grease)
h2 (first and last characters of the first ALPN extension value)
_
8daaf6152771 (truncated sha256 hash of the list of ciphers sorted)
_
e5627efa2ab1 (truncated sha256 hash of the list of extensions sorted, SNI and ALPN removed, followed by the list of signature algorithms)
```
JA4 = t13d1516h2_8daaf6152771_e5627efa2ab1
```
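
Putting the first section together, the prefix in the example above could be assembled with a C sketch like this one. The name `ja4_prefix` and its parameters are hypothetical. It assumes the version and ALPN codes were already computed as two-character strings and that the counts already exclude GREASE.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Writes the JA4 prefix, for example "t13d1516h2", to out.
 * Returns 0 on success, -1 if out is too small. */
int ja4_prefix(bool is_quic, const char *version_code, bool has_sni,
        size_t cipher_count, size_t extension_count, const char *alpn_code,
        char *out, size_t out_size)
{
    /* Counts above 99 are reported as 99. */
    if (cipher_count > 99) {
        cipher_count = 99;
    }
    if (extension_count > 99) {
        extension_count = 99;
    }
    int written = snprintf(out, out_size, "%c%.2s%c%02zu%02zu%.2s",
            is_quic ? 'q' : 't', version_code, has_sni ? 'd' : 'i',
            cipher_count, extension_count, alpn_code);
    if (written < 0 || (size_t) written >= out_size) {
        return -1;
    }
    return 0;
}
```

The full fingerprint would then be this prefix, an underscore, the truncated cipher hash, another underscore, and the truncated extension hash.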
### Raw Output
The program should allow for raw outputs either sorted or original.
-r (raw fingerprint) -o (original)

The raw fingerprint for JA4 would look like this:
```
JA4_r = t13d1516h2_002f,0035,009c,009d,1301,1302,1303,c013,c014,c02b,c02c,c02f,c030,cca8,cca9_0005,000a,000b,000d,0012,0015,0017,001b,0023,002b,002d,0033,4469,ff01_0403,0804,0401,0503,0805,0501,0806,0601
```

The "o" option includes the original values in the original order, less GREASE values. This means SNI (0000) and ALPN (0010) are included.

The raw fingerprint with the original ordering (-o) would look like this:
```
JA4_ro = t13d1516h2_1301,1302,1303,c02b,c02f,c02c,c030,cca9,cca8,c013,c014,009c,009d,002f,0035_001b,0000,0033,0010,4469,0017,002d,000d,0005,0023,0012,002b,ff01,000b,000a,0015_0403,0804,0401,0503,0805,0501,0806,0601
```
When ‘-o’ flag is specified, ‘ja4’ field must be renamed to ‘ja4_o’:
```
JA4_o = t13d1516h2_acb858a92679_18f69afefd3d
```

@@ -0,0 +1,26 @@
target = "https://raw.githubusercontent.com/FoxIO-LLC/ja4/v0.18.2/technical_details/JA4.md#alpn-extension-value"

# ### ALPN Extension Value:
#
# The first and last characters of the ALPN (Application-Layer Protocol Negotiation) first value.
# List of possible ALPN Values (scroll down): https://www.iana.org/assignments/tls-extensiontype-values/tls-extensiontype-values.xhtml
#
#
#
# In the above example, the first ALPN value is h2 so the first and last characters to use in the fingerprint are “h2”. IF the first ALPN listed was http/1.1 then the first and last characters to use in the fingerprint would be “h1”.
#
# In Wireshark this field is located under tls.handshake.extensions_alpn_str
#
# If there are no ALPN values or no ALPN extension then we print “00” as the value in the fingerprint.
#
[[spec]]
level = "MUST"
quote = '''
The first and last characters of the ALPN (Application-Layer Protocol Negotiation) first value.
'''

[[spec]]
level = "MUST"
quote = '''
If there are no ALPN values or no ALPN extension then we print “00” as the value in the fingerprint.
'''
@@ -0,0 +1,27 @@
target = "https://raw.githubusercontent.com/FoxIO-LLC/ja4/v0.18.2/technical_details/JA4.md#cipher-hash"

# ### Cipher hash:
#
# A 12 character truncated sha256 hash of the list of ciphers sorted in hex order, first 12 characters. The list is created using the 4 character hex values of the ciphers, lower case, comma delimited, ignoring GREASE.
# Example:
# ```
# 1301,1302,1303,c02b,c02f,c02c,c030,cca9,cca8,c013,c014,009c,009d,002f,0035
# ```
# Is sorted to:
# ```
# 002f,0035,009c,009d,1301,1302,1303,c013,c014,c02b,c02c,c02f,c030,cca8,cca9 = 8daaf6152771
# ```
#

[[spec]]
level = "MUST"
quote = '''
A 12 character truncated sha256 hash of the list of ciphers sorted in hex order, first 12 characters.
'''

[[spec]]
level = "MUST"
quote = '''
The list is created using the 4 character hex values of the ciphers, lower case, comma delimited, ignoring GREASE.
'''

@@ -0,0 +1,12 @@
target = "https://raw.githubusercontent.com/FoxIO-LLC/ja4/v0.18.2/technical_details/JA4.md#details"

# ## Details:
#
# The program needs to ignore GREASE values anywhere it sees them: (https://datatracker.ietf.org/doc/html/draft-davidben-tls-grease-01#page-5)
#

[[spec]]
level = "MUST"
quote = '''
The program needs to ignore GREASE values anywhere it sees them
'''
@@ -0,0 +1,19 @@
target = "https://raw.githubusercontent.com/FoxIO-LLC/ja4/v0.18.2/technical_details/JA4.md#example"

# ### Example
#
# JA4 fingerprint:
# t (TLS over TCP)
# 13 (TLS version 1.3)
# d (SNI exists so it’s going to a domain)
# 15 (15 cipher suites ignoring grease)
# 16 (16 extensions ignoring grease)
# h2 (first and last characters of the first ALPN extension value)
# _
# 8daaf6152771 (truncated sha256 hash of the list of ciphers sorted)
# _
# e5627efa2ab1 (truncated sha256 hash of the list of extensions sorted, SNI and ALPN removed, followed by the list of signature algorithms)
# ```
# JA4 = t13d1516h2_8daaf6152771_e5627efa2ab1
# ```

@@ -0,0 +1,72 @@
target = "https://raw.githubusercontent.com/FoxIO-LLC/ja4/v0.18.2/technical_details/JA4.md#extension-hash"

# ### Extension hash:
#
# A 12 character truncated sha256 hash of the list of extensions, sorted by hex value, followed by the list of signature algorithms, in the order that they appear (not sorted).
#
# The extension list is created using the 4 character hex values of the extensions, lower case, comma delimited, sorted (not in the order they appear). Ignore the SNI extension (0000) and the ALPN extension (0010) as we’ve already captured them in the _a_ section of the fingerprint. These values are omitted so that the same application would have the same _b_ section of the fingerprint regardless of if it were going to a domain, IP, or changing ALPNs.
#
# For example:
# ```
# 001b,0000,0033,0010,4469,0017,002d,000d,0005,0023,0012,002b,ff01,000b,000a,0015
# ```
# Is sorted to:
# ```
# 0005,000a,000b,000d,0012,0015,0017,001b,0023,002b,002d,0033,4469,ff01
# ```
# (notice 0000 and 0010 is removed)
#
# The signature algorithm hex values are then added to the end of the list in the order that they appear (not sorted) with an underscore delimiting the two lists.
# For example the signature algorithms:
# ```
# 0403,0804,0401,0503,0805,0501,0806,0601
# ```
# Are added to the end of the previous string to create:
# ```
# 0005,000a,000b,000d,0012,0015,0017,001b,0023,002b,002d,0033,4469,ff01_0403,0804,0401,0503,0805,0501,0806,0601
# ```
# Hashed to:
# ```
# e5627efa2ab19723084c1033a96c694a45826ab5a460d2d3fd5ffcfe97161c95
# ```
# Truncated to first 12 characters:
# ```
# e5627efa2ab1
# ```
#
# If there are no signature algorithms in the hello packet, then the string ends without an underscore and is hashed.
# For example:
# ```
# 0005,000a,000b,000d,0012,0015,0017,001b,0023,002b,002d,0033,4469,ff01 = 6d807ffa2a79
# ```
#

[[spec]]
level = "MUST"
quote = '''
A 12 character truncated sha256 hash of the list of extensions, sorted by hex value, followed by the list of signature algorithms, in the order that they appear (not sorted).
'''

[[spec]]
level = "MUST"
quote = '''
The extension list is created using the 4 character hex values of the extensions, lower case, comma delimited, sorted (not in the order they appear).
'''

[[spec]]
level = "MUST"
quote = '''
Ignore the SNI extension (0000) and the ALPN extension (0010) as we’ve already captured them in the _a_ section of the fingerprint.
'''

[[spec]]
level = "MUST"
quote = '''
The signature algorithm hex values are then added to the end of the list in the order that they appear (not sorted) with an underscore delimiting the two lists.
'''

[[spec]]
level = "MUST"
quote = '''
If there are no signature algorithms in the hello packet, then the string ends without an underscore and is hashed.
'''
@@ -0,0 +1,19 @@
target = "https://raw.githubusercontent.com/FoxIO-LLC/ja4/v0.18.2/technical_details/JA4.md#ja4-algorithm"

# ### JA4 Algorithm:
#
# (QUIC=”q” or TCP=”t”)
# (2 character TLS version)
# (SNI=”d” or no SNI=”i”)
# (2 character count of ciphers)
# (2 character count of extensions)
# (first and last characters of first ALPN extension value)
# _
# (sha256 hash of the list of cipher hex codes sorted in hex order, truncated to 12 characters)
# _
# (sha256 hash of (the list of extension hex codes sorted in hex order)_(the list of signature algorithms), truncated to 12 characters)
#
# The end result is a fingerprint that looks like:
# t13d1516h2_8daaf6152771_b186095e22b6
#
