KDE PIM/KItinerary/SBB Barcode: Difference between revisions

From KDE Community Wiki
Vkrause (talk | contribs)
Created page with "= General Observations = * QR code, content has variable length at around 330 bytes. * Contains readable strings as well as binary parts. * Strings seem to be UTF-8 encoded...."
 
Vkrause (talk | contribs)
No edit summary
(22 intermediate revisions by 2 users not shown)
= General Observations =
= Ticket Structure =
 
* The QR code content is encoded using protobuf.
* There is a trailing signature, unclear whether that is part of the protobuf structure or just concatenated.
 
Protobuf structure based on running existing samples through the generic protobuf decoder (names are therefore all guesses or placeholders):
 
<pre>
message Time {
    uint64 msecsSinceEpoch = 1;
}
 
message Block1_2_1 {
    varint _1 = 1; // "4" in all samples from SBB, "2" in "Geneva Transport Card", "1" for "Veloplatzreservierung"
    string tariffName = 2; // "Point-to-point Ticket", "Supersaver Ticket"
}
 
message Block1_2_13 {
    varint _2 = 2; // numeric zone id?
    varint _3 = 3;
}
 
message TripData {
    Block1_2_1 _1 = 1;
    optional string departureStation = 2; // missing for zoned tickets
    optional string arrivalStation = 3;
    varint classOfTransport = 4;
    varint _5 = 5;
    string via = 6; // or covered zone in case of zoned tickets
    varint _7 = 7;
    Time departureTime = 8; // or valid from for unbound tickets
    Time arrivalTime = 9; // or valid until for unbound tickets
    uint articleNumber = 12;
    optional Block1_2_13 _13 = 13; // present for zoned tickets
    varint _14 = 14;
    string tariff = 15; // parenthesis enclosed abbreviations, string is also printed on the ticket
    varint _17 = 17;
    varint _19 = 19;
    varint _20 = 20;
}
 
message TravelerData {
    string sbbCustomerId = 1; // only the first 8 digits (of 10) though
    string customerId = 2; // UUID, field tkid of https://www.swisspass.ch/private/api/benutzer/v1/benutzer
    string firstName = 3;
    string lastName = 4;
    Time birthDay = 5;
    string discountProgram = 7; // "HALBTAX"
}
 
message Block1_5 {
    Time issueTime = 1;
    varint _2 = 2;
    varint _3 = 3;
    varint _4 = 4;
}
 
message PaymentData {
    string paymentMethod = 1; // "PCD", "MC", "VIS"
    string currency = 2; // "CHF"
    string price = 3;
}
 
message TrainData {
    string trainName = 11;
    string coach = 12;
    string seat = 13; // or "space" for bike reservation
    varint _14 = 14;
    varint _15 = 15;
}
 
message Block1 {
    uint64 ticketId = 1;
    TripData tripData = 2;
    TravelerData traveler = 3;
    Block1_5 _5 = 5;
    PaymentData payment = 6;
    string _7 = 7;
    repeated TrainData trainData = 8; // not present in unbound tickets
    varint _9 = 9;
}
 
message Block2 {
    varint _1 = 1;
}
 
message Block4 { // maybe signing key selector??
    string _1 = 1; // 4 digit number (3342 in all samples)
    string _2 = 2; // 5 digit number (00001 in all samples)
}
 
message Ticket { // the top-level element
    Block1 _1 = 1;
    Block2 _2 = 2;
    Block4 _4 = 4;
    Block5 _5 = 5; // signature, might be mis-decoded/mis-detected
}
</pre>
 
= General Observations (obsolete) =


* QR code, content has variable length at around 330 bytes.
* Contains readable strings as well as binary parts.
* Seems to be a sequence of variable length records, rather than a fixed binary layout, could be some form of TLV encoding (there are some similarities to ASN.1 BER/DER for example).
* Records consist of a 1 byte type field, a length field and N content bytes.
* The type field does not seem to describe semantics, as values repeat for different values (rather than a data type of some form?). This would suggest that the record order defines their semantics. The presence of null records suggests that too, as well as the same record sequence in all samples.
* "European Union Agency For Railways - Technical Document - Digital Security Elements For Rail Passenger Ticketing - TAP TSI TD B.12 - §11 FCB - Flexible Content Barcode" describes something that seems similar, using ASN.1 UPER encoding.
= Record Structure (obsolete) =
* 1 byte type, although no idea what that indicates. Largely doesn't match the BER/DER type byte.
* Length:
** for values <= 127 bytes this is just one byte
** for values larger than 127 bytes this is encoded in two bytes. This is quite different from the BER/DER multi-byte length encoding, however. It looks like a little-endian layout, with the most significant bit of the first length byte removed and the second byte shifted by one bit to fill that space.
* N bytes of content, with varying data types.
= Data Types =
== Strings ==
* Strings seem to be UTF-8 encoded.
* Seems to be a sequence of variable length records, rather than a fixed binary layout, could be some form of TLV encoding (there are some similarities to ASN.1 BER/DER for example).
* The length is the amount of bytes needed to represent the UTF-8 string, not the amount of characters.
* Records consist of a 1 byte type field, a 1 byte length field and N content bytes. For strings, the length is the amount of bytes needed to represent the UTF-8 string, not the amount of characters.
 
* The type field however does not seem to describe semantics, as values repeat for different values (rather than a data type of some form?). This could suggest that the record order defines their semantics? The presence of null records suggests that too, as well as the same record sequence in all samples.
== Date/Time ==
* It is possible that there is a multi-byte length encoding comparable to BER/DER.
 
* The first 10 bytes don't seem to match the pattern yet, but that might be due to unknown multi-byte length encoding (could be a record containing the entire ticket).
This is largely speculation at this point!


= Record Sequence =
There are five 7-byte sequences included that could be date/time values. These could be the date of purchase/issue, begin of validity, end of validity, and traveler birth date. All of those are printed on the ticket.
* the first byte is 0x08 in all samples
* the first half of the second byte seems to be 0x8 for date fields and 0xC for date/time fields. This would seem consistent with optional ASN.1 UPER field encoding.
* there's a surprising amount of entropy in those values, esp. when looking at differences at close-by dates.
* differential view shows no obvious correlation beyond multiple tickets, but close-by values do "look" similar nevertheless.
* the suspected birthday field is the same in all samples for the same traveler
* this does not seem to use a UNIX timestamp
* this does not seem to use BCD encoding
* this does not seem to use sub-byte per-component encoding


starting at offset 10:
= Record Sequence (obsolete) =


{| class="wikitable"
{| class="wikitable"
! Nesting Depth !! Type Id !! Content Type !! Meaning !! Notes
! Nesting Depth !! Type Id !! Content Type !! Meaning !! Notes
|-
|-
| 1 || 0x12 || || ||
| 0 || 0x0A || date/time, nested record || ? || speculative, given the 7 byte field there matches suspected date/time values below
|-
| 1 || 0x12 || nested record || ||
|-
|-
| 2 || 0x0A || 2bytes followed by nested records || ? || 0x08 04
| 2 || 0x0A || 2bytes followed by nested records || ? || 0x08 04
| 3 || 0x12 || string || ticket type? || "Point-to-point Ticket", "Supersaver Ticket"
| 3 || 0x12 || string || ticket type? || "Point-to-point Ticket", "Supersaver Ticket"
|-
|-
| 3 || 0x12 || string || Departure station ||
| 2 || 0x12 || string || Departure station ||
|-
|-
| 3 || 0x1A || string || Arrival station ||
| 2 || 0x1A || string || Arrival station ||
|-
|-
| 3 || 0x20 || 2 byte || ? || 0x2801 in all samples
| 2 || 0x20 || 2 byte || ? || 0x2801 in all samples
|-
|-
| 3 || 0x32 || string || Via ||
| 2 || 0x32 || string || Via ||
|-
|-
| 3 || 0x38 || 1 byte || ? || 0x42 in all samples
| 2 || 0x38 || 1 byte || ? || 0x42 in all samples
|-
|-
| 3 || 0x42 || 7 byte || ? || date/time?
| 2 || 0x42 || 7 byte || ? || date/time?
|-
|-
| 3 || 0x4A || 7 byte || ? || date/time?
| 2 || 0x4A || 7 byte || ? || date/time?
|-
|-
| 3 || 0x60 || variable, null terminated || ? || 2-3 bytes in all samples
| 2 || 0x60 || variable, null terminated || ? || 2-3 bytes in all samples, breaks TLV structure
|-
|-
| 3 || 0x7A || string || ticket type or tariff  || parenthesis enclosed abbreviations, string is also printed on the ticket
| 2 || 0x7A || string || ticket type or tariff  || parenthesis enclosed abbreviations, string is also printed on the ticket
|-
|-
| 3 || 0x88 || 1 byte || ? || null
| 2 || 0x88 || 1 byte || ? || null, optional
|-
|-
| 3 || 0x98 || 1 byte || ? || 0x02
| 2 || 0x98 || 1 byte || ? || 0x02
|-
|-
| 3 || 0xA0 || 1 byte || ? || null
| 2 || 0xA0 || 1 byte || ? || null
|-
|-
| 2 || 0x1A || nested record || ? ||
| 1 || 0x1A || nested record || traveler information || could also be loyalty program info?
|-
|-
| 3 || 0x0A || string || ? || 8 digit number
| 2 || 0x0A || string || SBB customer id || only the first 8 digits (of 10) though
|-
|-
| 3 || 0x12 || string || ? || 36 char uuid
| 2 || 0x12 || string || Customer identifier || 128 bit as a 36 byte hex string with four separator dashes (uuid-like formatting). Field tkid of https://www.swisspass.ch/private/api/benutzer/v1/benutzer
|-
|-
| 3 || 0x1A || string || family name ||
| 2 || 0x1A || string || family name ||
|-
|-
| 3 || 0x22 || string || given name ||
| 2 || 0x22 || string || given name ||
|-
|-
| 3 || 0x2A || 7 byte || ? || date/time?
| 2 || 0x2A || 7 byte || ? || date/time? - possibly traveler birth date
|-
|-
| 3 || 0x3A || string || tariff information? || "HALBTAX"
| 2 || 0x3A || string || tariff information? || "HALBTAX"
|-  
|-  
| 2 || 0x2A || nested record || ? ||
| 1 || 0x2A || nested record || ? ||
|-
| 2 || 0x0A || 7 byte || ? || date/time?
|-
| 2 || 0x10 || 4 byte || ? || followed by 0x200B outside of TLV structures?
|-
|-
| 3 || 0x0A || 7 byte || ? || date/time?
| 1 || 0x32 || nested record || price information? ||  
|-
|-
| 3 || 0x10 || 4 byte || ? || followed by 0x200B outside of TLV structures?
| 2 || 0x0A || string || payment method? || "PCD", "MC", "VIS"
|-
|-
| 2 || 0x32 || nested record || price information? ||  
| 2 || 0x12 || string || currency || "CHF"
|-
|-
| 3 || 0x0A || string || ? || "PCD"
| 2 || 0x1A || string || ticket price ||
|-
|-
| 3 || 0x12 || string || currency || "CHF"
| 1 || 0x3A || null || ? ||
|-
|-
| 3 || 0x1A || string || ticket price ||
| 1 || 0x42 || nested record || train information || not present in unbound tickets
|-
|-
| 2 || 0x3A || null || ? ||
| 2 || 0x5A || string || train number ||  
|-
|-
| 2 || 0x42 || nested record || train information || not present in unbound tickets
| 2 || 0x70 || null || ||
|-
|-
| 3 || 0x5A || string || train number ||  
| 2 || 0x78 || null || ||
|-
|-
| 3 || 0x70 || null || ||
| 1 || 0x48 || 1 byte, no length byte!?! || ? ||  
|-
|-
| 3 || 0x78 || null || ||
| 0 || 0x12 || 2 byte || ? || fixed 0x0801
|-
|-
| 2 || 0x48 || 7byte, no length byte!?! || ? || breaks the overall TLV structure?? almost fixed value in all samples
| 0 || 0x22 || nested record || ? ||
|-
|-
| 2 || 0x0A || string || ? || 4 digit number (3342 in all samples)
| 1 || 0x0A || string || ? || 4 digit number (3342 in all samples)
|-
|-
| 2 || 0x12 || string || ? || 5 digit number (00001 in all samples)
| 1 || 0x12 || string || ? || 5 digit number (00001 in all samples)
|-
|-
| 2 || 0x2A || nested record || ||
| 1 || 0x2A || nested record || ||
|-
|-
| 3 || 0x30 || nested record || ||
| 2 || 0x30 || nested record || || DSA signature in ASN.1/BER encoding, as found in UIC 918.3 containers as well
|-
|-
| 4 || 0x02 || 20-21 byte || ? || signature? this is somewhat similar to Thalys
| 3 || 0x02 || 20-21 byte || r || DSA signature r value
|-
|-
| 4 || 0x02 || 20-21 byte || ? ||
| 3 || 0x02 || 20-21 byte || s || DSA signature s value
|}
|}

Revision as of 14:58, 1 October 2024

Ticket Structure

  • The QR code content is encoded using protobuf.
  • There is a trailing signature, unclear whether that is part of the protobuf structure or just concatenated.

Protobuf structure based on running existing samples through the generic protobuf decoder (names are therefore all guesses or placeholders):

message Time {
    uint64 msecsSinceEpoch = 1;
}

message Block1_2_1 {
    varint _1 = 1; // "4" in all samples from SBB, "2" in "Geneva Transport Card", "1" for "Veloplatzreservierung"
    string tariffName = 2; // "Point-to-point Ticket", "Supersaver Ticket" 
}

message Block1_2_13 {
    varint _2 = 2; // numeric zone id?
    varint _3 = 3;
}

message TripData {
    Block1_2_1 _1 = 1;
    optional string departureStation = 2; // missing for zoned tickets
    optional string arrivalStation = 3;
    varint classOfTransport = 4;
    varint _5 = 5;
    string via = 6; // or covered zone in case of zoned tickets
    varint _7 = 7;
    Time departureTime = 8; // or valid from for unbound tickets
    Time arrivalTime = 9; // or valid until for unbound tickets
    uint articleNumber = 12;
    optional Block1_2_13 _13 = 13; // present for zoned tickets
    varint _14 = 14;
    string tariff = 15; // parenthesis enclosed abbreviations, string is also printed on the ticket
    varint _17 = 17;
    varint _19 = 19;
    varint _20 = 20;
}

message TravelerData {
    string sbbCustomerId = 1; // only the first 8 digits (of 10) though 
    string customerId = 2; // UUID, field tkid of https://www.swisspass.ch/private/api/benutzer/v1/benutzer
    string firstName = 3;
    string lastName = 4;
    Time birthDay = 5;
    string discountProgram = 7; // "HALBTAX"
}

message Block1_5 {
    Time issueTime = 1;
    varint _2 = 2;
    varint _3 = 3;
    varint _4 = 4;
}

message PaymentData {
    string paymentMethod = 1; // "PCD", "MC", "VIS" 
    string currency = 2; // "CHF"
    string price = 3;
}

message TrainData {
    string trainName = 11;
    string coach = 12;
    string seat = 13; // or "space" for bike reservation
    varint _14 = 14;
    varint _15 = 15;
}

message Block1 {
    uint64 ticketId = 1;
    TripData tripData = 2;
    TravelerData traveler = 3;
    Block1_5 _5 = 5;
    PaymentData payment = 6;
    string _7 = 7;
    repeated TrainData trainData = 8; // not present in unbound tickets
    varint _9 = 9;
}

message Block2 {
    varint _1 = 1;
}

message Block4 { // maybe signing key selector??
    string _1 = 1; // 4 digit number (3342 in all samples) 
    string _2 = 2; // 5 digit number (00001 in all samples) 
}

message Ticket { // the top-level element
    Block1 _1 = 1;
    Block2 _2 = 2;
    Block4 _4 = 4;
    Block5 _5 = 5; // signature, might be mis-decoded/mis-detected
}
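
As a minimal sketch of what a generic protobuf decoder does with one message level: each field starts with a varint key holding the field number and wire type, followed by either a varint value or a length-prefixed payload. The function names and the sample bytes below are made up for illustration (the sample is a synthetic PaymentData-like message, not taken from a real ticket), and nested messages would need a recursive call on the length-delimited payload.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7


def decode_fields(buf):
    """Yield (field_number, wire_type, value) for a single message level."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:      # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:    # fixed 64-bit
            value, pos = bytes(buf[pos:pos + 8]), pos + 8
        elif wire_type == 2:    # length-delimited: string, bytes or nested message
            length, pos = read_varint(buf, pos)
            value, pos = bytes(buf[pos:pos + length]), pos + length
        elif wire_type == 5:    # fixed 32-bit
            value, pos = bytes(buf[pos:pos + 4]), pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, value


# Synthetic sample: paymentMethod = "MC", currency = "CHF", price = "12.40"
sample = b"\x0a\x02MC\x12\x03CHF\x1a\x0512.40"
for field_number, wire_type, value in decode_fields(sample):
    print(field_number, wire_type, value)
```

Since the wire format carries no field names, such a decoder only recovers numbers, wire types and raw values, which is why the names in the structure above are guesses.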

General Observations (obsolete)

  • QR code, content has variable length at around 330 bytes.
  • Contains readable strings as well as binary parts.
  • Seems to be a sequence of variable length records, rather than a fixed binary layout, could be some form of TLV encoding (there are some similarities to ASN.1 BER/DER for example).
  • Records consist of a 1 byte type field, a length field and N content bytes.
  • The type field does not seem to describe semantics, as values repeat for different values (rather than a data type of some form?). This would suggest that the record order defines their semantics. The presence of null records suggests that too, as well as the same record sequence in all samples.
  • "European Union Agency For Railways - Technical Document - Digital Security Elements For Rail Passenger Ticketing - TAP TSI TD B.12 - §11 FCB - Flexible Content Barcode" describes something that seems similar, using ASN.1 UPER encoding.

Record Structure (obsolete)

  • 1 byte type, although no idea what that indicates. Largely doesn't match the BER/DER type byte.
  • Length:
    • for values <= 127 bytes this is just one byte
    • for values larger than 127 bytes this is encoded in two bytes. This is quite different from the BER/DER multi-byte length encoding, however. It looks like a little-endian layout, with the most significant bit of the first length byte removed and the second byte shifted by one bit to fill that space.
  • N bytes of content, with varying data types.
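
The length encoding described above matches protobuf's base-128 varint: the low seven bits of each byte carry data in little-endian order, and the top bit marks that another byte follows. A minimal sketch, with a hypothetical helper name:

```python
def decode_length(buf, pos=0):
    """Decode a varint-style length; return (length, position after it)."""
    length = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        # Drop the most significant bit and place the remaining 7 bits
        # above the bits collected so far (little-endian groups).
        length |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return length, pos
        shift += 7


print(decode_length(b"\x7f"))      # single-byte length
print(decode_length(b"\xca\x02"))  # two-byte length
```

Here `0x7F` decodes to 127, and `0xCA 0x02` decodes to 74 + (2 << 7) = 330.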

Data Types

Strings

  • Strings seem to be UTF-8 encoded.
  • The length is the amount of bytes needed to represent the UTF-8 string, not the amount of characters.
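
To illustrate the difference between byte length and character count, here is a small sketch that builds a string record for a station name containing a non-ASCII character. The helper name is hypothetical, and the station name is just an example value; only single-byte lengths are handled.

```python
def string_record(type_id, text):
    """Build a type/length/value record holding a UTF-8 encoded string."""
    payload = text.encode("utf-8")
    if len(payload) > 127:
        raise ValueError("multi-byte lengths are not handled in this sketch")
    return bytes([type_id, len(payload)]) + payload


station = "Zürich HB"
record = string_record(0x12, station)
print(len(station), record[1])  # character count vs. encoded byte length
```

The name has 9 characters, but "ü" takes two bytes in UTF-8, so the length byte is 10.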

Date/Time

This is largely speculation at this point!

There are five 7-byte sequences included that could be date/time values. These could be the date of purchase/issue, begin of validity, end of validity, and traveler birth date. All of those are printed on the ticket.

  • the first byte is 0x08 in all samples
  • the first half of the second byte seems to be 0x8 for date fields and 0xC for date/time fields. This would seem consistent with optional ASN.1 UPER field encoding.
  • there's a surprising amount of entropy in those values, esp. when looking at differences at close-by dates.
  • differential view shows no obvious correlation beyond multiple tickets, but close-by values do "look" similar nevertheless.
  • the suspected birthday field is the same in all samples for the same traveler
  • this does not seem to use a UNIX timestamp
  • this does not seem to use BCD encoding
  • this does not seem to use sub-byte per-component encoding
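
Read against the protobuf structure, a 7-byte sequence starting with 0x08 is consistent with a Time message: 0x08 is the key for field 1 with varint wire type, and a present-day millisecond timestamp needs six varint bytes. The sketch below round-trips a synthetic value rather than real ticket data; the helper names are hypothetical, and treating the epoch as the UNIX epoch in UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # assumed reference point


def encode_time(dt):
    """Encode dt as a Time message: key 0x08 plus varint msecsSinceEpoch."""
    msecs = (dt - EPOCH) // timedelta(milliseconds=1)
    out = bytearray([0x08])
    while True:
        byte = msecs & 0x7F
        msecs >>= 7
        if msecs:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_time(record):
    """Decode a Time message back into a timezone-aware datetime."""
    if record[0] != 0x08:
        raise ValueError("expected field 1 with varint wire type")
    msecs = 0
    for i, byte in enumerate(record[1:]):
        msecs |= (byte & 0x7F) << (7 * i)
        if (byte & 0x80) == 0:
            break
    return EPOCH + timedelta(milliseconds=msecs)


day = datetime(2024, 10, 1, tzinfo=timezone.utc)
encoded = encode_time(day)
print(encoded.hex(" "), decode_time(encoded))
```

A whole-hour timestamp in milliseconds is a multiple of 128, so its first varint byte is 0x80, which may explain the 0x8 nibble seen for date-only fields.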

Record Sequence (obsolete)

{| class="wikitable"
! Nesting Depth !! Type Id !! Content Type !! Meaning !! Notes
|-
| 0 || 0x0A || date/time, nested record || ? || speculative, given the 7 byte field there matches suspected date/time values below
|-
| 1 || 0x12 || nested record || ||
|-
| 2 || 0x0A || 2 bytes followed by nested records || ? || 0x08 04
|-
| 3 || 0x12 || string || ticket type? || "Point-to-point Ticket", "Supersaver Ticket"
|-
| 2 || 0x12 || string || Departure station ||
|-
| 2 || 0x1A || string || Arrival station ||
|-
| 2 || 0x20 || 2 byte || ? || 0x2801 in all samples
|-
| 2 || 0x32 || string || Via ||
|-
| 2 || 0x38 || 1 byte || ? || 0x42 in all samples
|-
| 2 || 0x42 || 7 byte || ? || date/time?
|-
| 2 || 0x4A || 7 byte || ? || date/time?
|-
| 2 || 0x60 || variable, null terminated || ? || 2-3 bytes in all samples, breaks TLV structure
|-
| 2 || 0x7A || string || ticket type or tariff || parenthesis enclosed abbreviations, string is also printed on the ticket
|-
| 2 || 0x88 || 1 byte || ? || null, optional
|-
| 2 || 0x98 || 1 byte || ? || 0x02
|-
| 2 || 0xA0 || 1 byte || ? || null
|-
| 1 || 0x1A || nested record || traveler information || could also be loyalty program info?
|-
| 2 || 0x0A || string || SBB customer id || only the first 8 digits (of 10) though
|-
| 2 || 0x12 || string || Customer identifier || 128 bit as a 36 byte hex string with four separator dashes (uuid-like formatting). Field tkid of https://www.swisspass.ch/private/api/benutzer/v1/benutzer
|-
| 2 || 0x1A || string || family name ||
|-
| 2 || 0x22 || string || given name ||
|-
| 2 || 0x2A || 7 byte || ? || date/time? - possibly traveler birth date
|-
| 2 || 0x3A || string || tariff information? || "HALBTAX"
|-
| 1 || 0x2A || nested record || ? ||
|-
| 2 || 0x0A || 7 byte || ? || date/time?
|-
| 2 || 0x10 || 4 byte || ? || followed by 0x200B outside of TLV structures?
|-
| 1 || 0x32 || nested record || price information? ||
|-
| 2 || 0x0A || string || payment method? || "PCD", "MC", "VIS"
|-
| 2 || 0x12 || string || currency || "CHF"
|-
| 2 || 0x1A || string || ticket price ||
|-
| 1 || 0x3A || null || ? ||
|-
| 1 || 0x42 || nested record || train information || not present in unbound tickets
|-
| 2 || 0x5A || string || train number ||
|-
| 2 || 0x70 || null || ||
|-
| 2 || 0x78 || null || ||
|-
| 1 || 0x48 || 1 byte, no length byte!?! || ? ||
|-
| 0 || 0x12 || 2 byte || ? || fixed 0x0801
|-
| 0 || 0x22 || nested record || ? ||
|-
| 1 || 0x0A || string || ? || 4 digit number (3342 in all samples)
|-
| 1 || 0x12 || string || ? || 5 digit number (00001 in all samples)
|-
| 1 || 0x2A || nested record || ||
|-
| 2 || 0x30 || nested record || || DSA signature in ASN.1/BER encoding, as found in UIC 918.3 containers as well
|-
| 3 || 0x02 || 20-21 byte || r || DSA signature r value
|-
| 3 || 0x02 || 20-21 byte || s || DSA signature s value
|}
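
Seen through the protobuf structure, the type ids in this table are field keys: the low three bits give the wire type and the remaining bits the field number. That would explain why values repeat across nesting levels and why ids such as 0x88 appeared to break the TLV pattern, since keys for field numbers above 15 take two bytes. A minimal sketch with a hypothetical helper name (the 0x30 and 0x02 ids inside the signature are ASN.1 tags and are not covered here):

```python
WIRE_TYPES = {0: "varint", 1: "64-bit", 2: "length-delimited", 5: "32-bit"}


def split_key(key_bytes):
    """Split a protobuf field key into (field_number, wire_type_name)."""
    key = 0
    for i, byte in enumerate(key_bytes):
        key |= (byte & 0x7F) << (7 * i)  # keys are varints themselves
    return key >> 3, WIRE_TYPES.get(key & 0x07, "unknown")


for raw in (b"\x12", b"\x20", b"\x60", b"\x7a", b"\x88\x01", b"\x48"):
    print(raw.hex(), split_key(raw))
```

For example, 0x12 maps to field 2 (length-delimited), 0x20 to field 4 (varint), 0x7A to field 15 (length-delimited) and 0x88 0x01 to field 17 (varint), which lines up with the field numbers in TripData.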