How To Serialize Data Effectively with Protocol Buffers – Grape Up
In a world of microservices, we often have to pass information between applications. To do that, we serialize data into a format that both sides can read. One serialization solution is Protocol Buffers (Protobuf), Google's language-neutral mechanism. A receiver can interpret messages whether it is written in the same language as the producer or in a different one. Many languages are supported, including Java, Go, Python, and C++.
A data structure is defined in a language-neutral way through .proto files. The file is then compiled into code to be used in applications. Protobuf is designed for performance: it encodes data into a binary format, which reduces message size and improves transmission speed.
Defining Message Format
This .proto file represents geolocation information for a given vehicle.
syntax = "proto3";

package com.grapeup.geolocation;

import "google/type/latlng.proto";
import "google/protobuf/timestamp.proto";

message Geolocation {
  string vin = 1;
  google.protobuf.Timestamp occurredOn = 2;
  int32 speed = 3;
  google.type.LatLng coordinates = 4;
}
syntax = "proto3";
Syntax refers to the Protobuf version; it can be proto2 or proto3.
package com.grapeup.geolocation;
The package declaration prevents naming conflicts between different projects.
message Geolocation {
  string vin = 1;
  google.protobuf.Timestamp occurredOn = 2;
  int32 speed = 3;
  google.type.LatLng coordinates = 4;
}
A message definition contains a name and a set of typed fields. Simple data types are available, such as bool, int32, double, and string. You can also define your own types or import them.
google.protobuf.Timestamp occurredOn = 2;
The = 1, = 2 markers assign each field a unique tag (field number). The tag is the numeric representation of the field and identifies it in the binary message format. Tags have to be unique within a message and should not be changed once the message is in use. If a field is removed from a definition that is already in use, its tag must be marked as reserved so that it cannot be reused by accident.
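To make the role of the tag more concrete, here is a minimal sketch of how a single integer field ends up on the wire. It is hand-written for illustration and is not the code that protoc generates; the class and method names are hypothetical. It assumes a non-negative value and the varint wire type (0), and uses the speed field (tag 3) from the example above.

```java
import java.io.ByteArrayOutputStream;

// Illustrative only: a hand-rolled encoder for one varint field.
// Real applications should use the classes generated by protoc.
class FieldEncodingSketch {
    static final int WIRE_TYPE_VARINT = 0;

    // Base-128 varint: 7 bits per byte, least significant group first,
    // high bit set on every byte except the last one.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // A field is written as a key, (tag << 3) | wireType, followed by the value.
    static byte[] encodeVarintField(int tag, long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, ((long) tag << 3) | WIRE_TYPE_VARINT);
        writeVarint(out, value);
        return out.toByteArray();
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // speed = 150 stored under tag 3
        System.out.println(toHex(encodeVarintField(3, 150))); // prints: 18 96 01
    }
}
```

The field name never appears in the payload, only the tag does. That is why renaming a field is harmless, while changing or reusing a tag breaks readers that still rely on the old definition.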
Field types
Aside from scalar types, there are many other type options when defining messages. Here are a few; the full list is in the Language Guide (proto3) on Google Developers.
Well Known Types
import "google/type/latlng.proto";
import "google/protobuf/timestamp.proto";

google.protobuf.Timestamp occurredOn = 2;
google.type.LatLng coordinates = 4;
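Predefined types are available to use; see the Protocol Buffers Overview on Google Developers. The ones shipped with Protobuf itself, such as Timestamp, are known as Well-Known Types. Any predefined type has to be imported into the .proto file before use.
LatLng represents a latitude and longitude pair. It comes from Google's common API types (the google.type package) rather than from the Well-Known Types, so its definition has to be available to the compiler.
Timestamp is a specific point in time with nanosecond precision.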
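Timestamp stores that point in time as two fields: seconds since the Unix epoch and the nanoseconds within that second. The sketch below shows how a java.time.Instant maps onto those two values and back. The TimestampFields holder and the helper methods are hypothetical stand-ins so that the example runs without the Protobuf library; with the generated class you would pass the same values to Timestamp.newBuilder().setSeconds(...).setNanos(...).

```java
import java.time.Instant;

// Illustrative only: mirrors the two fields of google.protobuf.Timestamp
// (int64 seconds, int32 nanos) with a plain Java holder.
class TimestampFieldsSketch {
    static final class TimestampFields {
        final long seconds; // seconds since 1970-01-01T00:00:00Z
        final int nanos;    // nanoseconds within the second, 0..999,999,999

        TimestampFields(long seconds, int nanos) {
            this.seconds = seconds;
            this.nanos = nanos;
        }
    }

    static TimestampFields fromInstant(Instant instant) {
        return new TimestampFields(instant.getEpochSecond(), instant.getNano());
    }

    static Instant toInstant(TimestampFields ts) {
        return Instant.ofEpochSecond(ts.seconds, ts.nanos);
    }

    public static void main(String[] args) {
        Instant occurredOn = Instant.parse("2022-03-01T10:15:30.123456789Z");
        TimestampFields ts = fromInstant(occurredOn);
        System.out.println(ts.seconds + " s + " + ts.nanos + " ns");
        // The round trip keeps full nanosecond precision.
        System.out.println(toInstant(ts).equals(occurredOn)); // prints: true
    }
}
```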
Custom types
message SingularSearchResponse {
  Geolocation geolocation = 1;
}
You can use your custom-defined type as a field in another message definition.
Lists
message SearchResponse {
  repeated Geolocation geolocations = 1;
}
You can define lists by using the repeated keyword.
OneOf
Sometimes at most one field out of a group should be set in a message. In this case, TelemetryUpdate will contain either geolocation, mileage, or fuel level information.
This can be achieved by using oneof. Setting a value for one of the fields clears all the other fields defined in the same oneof.
message TelemetryUpdate {
  string vin = 1;
  oneof update {
    Geolocation geolocation = 2;
    Mileage mileage = 3;
    FuelLevel fuelLevel = 4;
  }
}

message Geolocation {
  ...
}

message Mileage {
  ...
}

message FuelLevel {
  ...
}
Keep backward compatibility in mind when removing fields. If you receive a message in which the oneof is set to a field that has since been removed from your .proto definition, none of the oneof values will be set. This looks the same as a message in which no value was set in the first place.
You can perform different actions based on which value is set by using the getUpdateCase() method.
public Optional<Object> getTelemetry(TelemetryUpdate telemetryUpdate) {
    Optional<Object> telemetry = Optional.empty();
    // getUpdateCase() reports which field of the oneof is currently set
    switch (telemetryUpdate.getUpdateCase()) {
        case MILEAGE -> telemetry = Optional.of(telemetryUpdate.getMileage());
        case FUELLEVEL -> telemetry = Optional.of(telemetryUpdate.getFuelLevel());
        case GEOLOCATION -> telemetry = Optional.of(telemetryUpdate.getGeolocation());
        case UPDATE_NOT_SET -> telemetry = Optional.empty();
    }
    return telemetry;
}
Default values
In the proto3 format, fields always have a value. Thanks to this, proto3 messages can be smaller, because fields holding their default value are omitted from the payload. However, this causes one issue: for scalar message fields, there is no way of telling whether a field was explicitly set to the default value or not set at all.
In our example, speed is an optional field: some modules in a car might send speed data, and some might not. If we do not set speed, the geolocation object will still report speed with the default value of 0. This is not the same as a message that carries no speed information.
To deal with default values, you can use the official wrapper types defined in wrappers.proto in the protocolbuffers/protobuf repository. They make it possible to distinguish between absence and the default value. Instead of a simple type, we use Int32Value, which is a wrapper for the int32 scalar type.
import "google/protobuf/wrappers.proto";

message Geolocation {
  google.protobuf.Int32Value speed = 3;
}
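If we do not provide speed, the field stays unset: in the generated Java class hasSpeed() returns false, while in languages such as Go the field is nil.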
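The difference is visible in the encoded bytes. The following sketch hand-encodes only the speed field (tag 3) in both variants; it is an illustration of the wire format, not generated code, and all class and method names are hypothetical. A Java null stands in for "speed not provided", and the sketch assumes non-negative speeds.

```java
import java.io.ByteArrayOutputStream;

// Illustrative only: hand-rolled encoding of the speed field (tag 3),
// once as a plain int32 and once wrapped in google.protobuf.Int32Value.
class PresenceSketch {
    static final int SPEED_TAG = 3;
    static final int WIRE_VARINT = 0;
    static final int WIRE_LENGTH_DELIMITED = 2;

    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // int32 speed = 3; a value equal to the default (0) is not written at all.
    static byte[] encodePlainSpeed(int speed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (speed != 0) {
            writeVarint(out, (SPEED_TAG << 3) | WIRE_VARINT);
            writeVarint(out, speed);
        }
        return out.toByteArray();
    }

    // google.protobuf.Int32Value speed = 3; null models "not provided".
    static byte[] encodeWrappedSpeed(Integer speed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (speed != null) {
            // Int32Value is itself a message: message Int32Value { int32 value = 1; }
            ByteArrayOutputStream inner = new ByteArrayOutputStream();
            if (speed != 0) {
                writeVarint(inner, (1 << 3) | WIRE_VARINT);
                writeVarint(inner, speed);
            }
            // A nested message is written as key, length, body, even when the body is empty.
            writeVarint(out, (SPEED_TAG << 3) | WIRE_LENGTH_DELIMITED);
            writeVarint(out, inner.size());
            out.write(inner.toByteArray(), 0, inner.size());
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encodePlainSpeed(0).length);      // prints: 0 (same as "not set")
        System.out.println(encodeWrappedSpeed(null).length); // prints: 0 (not provided)
        System.out.println(encodeWrappedSpeed(0).length);    // prints: 2 (explicit zero survives)
    }
}
```

With the plain int32, a stationary car and a car without a speed sensor produce identical payloads. With the wrapper, an explicit zero costs two extra bytes but stays distinguishable on the receiving side.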
Configure with Gradle
Once you've defined your messages, you can use protoc, the Protocol Buffers compiler, to generate classes in a chosen language. The generated classes can then be used to build and retrieve messages.
To compile into Java code, we need to add a dependency and a plugin in build.gradle:
plugins {
    id 'com.google.protobuf' version '0.8.18'
}

dependencies {
    implementation 'com.google.protobuf:protobuf-java-util:3.17.2'
}
Then set up the compiler. For Mac users, an OS X-specific artifact has to be used.
protobuf {
    protoc {
        // protobuf_version is assumed to be defined elsewhere, e.g. in gradle.properties
        if (osdetector.os == "osx") {
            artifact = "com.google.protobuf:protoc:$protobuf_version:osx-x86_64"
        } else {
            artifact = "com.google.protobuf:protoc:$protobuf_version"
        }
    }
}
The code is generated by the generateProto task.
It will be located in build/generated/source/proto/main/java, in the package specified in the .proto file.
We also need to tell Gradle where the generated code is located:
sourceSets {
    main {
        java {
            srcDirs 'build/generated/source/proto/main/grpc'
            srcDirs 'build/generated/source/proto/main/java'
        }
    }
}
The generated class contains all the necessary methods for building the message as well as retrieving field values.
// Build a message with the generated builder
Geolocation geolocation = Geolocation.newBuilder()
    .setCoordinates(LatLng.newBuilder().setLatitude(1.2).setLongitude(1.2).build())
    .setVin("1G2NF12FX2C129610")
    .setOccurredOn(Timestamp.newBuilder().setSeconds(12349023).build())
    .build();

// Read field values back through the generated getters
LatLng coordinates = geolocation.getCoordinates();
String vin = geolocation.getVin();
Protocol Buffers – Summary
As shown, Protocol Buffers are easy to configure. The mechanism is language-agnostic, and it's easy to share the same .proto definition across different microservices.
Protobuf pairs easily with gRPC, where methods can be defined in .proto files and the corresponding code generated with Gradle.
Official documentation (Protocol Buffers on Google Developers) and guides (Language Guide (proto3)) are available.