How To Serialize Data Effectively with Protocol Buffers – Grape Up
In a world of microservices, we often have to pass information between applications. We serialize data into a format that can be retrieved by both sides. One of the serialization solutions is Protocol Buffers (Protobuf) – Google’s language-neutral mechanism. Messages can be interpreted by a receiver using the same or different language than a producer. Many languages are supported, such as Java, Go, Python, and C++.
A data structure is defined in a language-neutral way in .proto files. The file is then compiled into code to be used in applications. Protobuf is designed for performance: it encodes data into a binary format, which reduces message size and improves transmission speed.
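To give a feel for why the binary format is compact, here is a minimal sketch of the base-128 varint encoding that the Protobuf wire format uses for integers. The class and method names are made up for illustration, and the sketch assumes non-negative values; real applications would rely on the generated classes instead.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch only; not the Protobuf library's implementation.
class VarintSketch {

    // Encodes a non-negative int as a base-128 varint: 7 payload bits per byte,
    // with the high bit set on every byte except the last one.
    static byte[] encodeVarint(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // more bytes follow
            value >>>= 7;
        }
        out.write(value); // last byte, high bit clear
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encodeVarint(1).length);   // 1
        System.out.println(encodeVarint(300).length); // 2
    }
}
```

Small numbers take a single byte on the wire instead of the four bytes a fixed-width int would need.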
Defining Message Format
The following .proto file represents geolocation information for a given vehicle.
```proto
syntax = "proto3";

package com.grapeup.geolocation;

import "google/type/latlng.proto";
import "google/protobuf/timestamp.proto";

message Geolocation {
  string vin = 1;
  google.protobuf.Timestamp occurredOn = 2;
  int32 speed = 3;
  google.type.LatLng coordinates = 4;
}
```

```proto
syntax = "proto3";
```
Syntax refers to the Protobuf version; it can be proto2 or proto3.
Package declaration prevents naming conflicts between different projects.
```proto
message Geolocation {
  string vin = 1;
  google.protobuf.Timestamp occurredOn = 2;
  int32 speed = 3;
  google.type.LatLng coordinates = 4;
}
```
Message definition contains a name and a set of typed fields. Simple data types are available, such as bool, int32, double, string, etc. You can also define your own types or import them.
```proto
google.protobuf.Timestamp occurredOn = 2;
```
The = 2 marker identifies the field's unique tag. Tags are a numeric representation of the field and are used to identify the field in the binary message format. They have to be unique within a message and should not be changed once the message is in use. If a field is removed from a definition that is already in use, its tag must be marked as reserved.
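The reason tags must stay stable is that field names never appear on the wire; only the tag does. As a rough sketch (the class and method names are invented for illustration), the key written before each field value combines the tag with a wire type:

```java
// Illustrative sketch of the wire format's field key; not the library's code.
class FieldKeySketch {

    static final int WIRE_TYPE_VARINT = 0;           // int32, bool, enum, ...
    static final int WIRE_TYPE_LENGTH_DELIMITED = 2; // string, bytes, embedded messages

    // The key packs the tag into the upper bits and the wire type into the lower three.
    static int fieldKey(int tag, int wireType) {
        return (tag << 3) | wireType;
    }

    public static void main(String[] args) {
        // string vin = 1;
        System.out.printf("vin   -> 0x%02X%n", fieldKey(1, WIRE_TYPE_LENGTH_DELIMITED)); // 0x0A
        // int32 speed = 3;
        System.out.printf("speed -> 0x%02X%n", fieldKey(3, WIRE_TYPE_VARINT));           // 0x18
    }
}
```

The key itself is written as a varint, so tags 1 through 15 fit in a single byte. Renumbering a field would make old and new binaries disagree about what a given key means.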
Aside from scalar types, there are many other type options when defining messages. Here are a few; you can find all of them in the Language Guide (proto3).
Well Known Types
```proto
import "google/type/latlng.proto";
import "google/protobuf/timestamp.proto";

google.protobuf.Timestamp occurredOn = 2;
google.type.LatLng coordinates = 4;
```
There are predefined types available to use (see the Protocol Buffers documentation on Google Developers). They are known as Well Known Types and have to be imported into the .proto file.
LatLng represents a latitude and longitude pair.
Timestamp is a specific point in time with nanosecond precision.
```proto
message SingularSearchResponse {
  Geolocation geolocation = 1;
}
```
You can use your custom-defined type as a field in another message definition.
```proto
message SearchResponse {
  repeated Geolocation geolocations = 1;
}
```
You can define lists by using the repeated keyword.
It can happen that only one of several fields in a message will ever be set. In this case, TelemetryUpdate will contain either geolocation, mileage, or fuel level information. This can be achieved by using oneof. Setting a value for one of the fields will clear all other fields defined in the same oneof.
```proto
message TelemetryUpdate {
  string vin = 1;
  oneof update {
    Geolocation geolocation = 2;
    Mileage mileage = 3;
    FuelLevel fuelLevel = 4;
  }
}

message Geolocation {
  ...
}

message Mileage {
  ...
}

message FuelLevel {
  ...
}
```
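The "last set wins" behaviour of oneof can be modelled with a single shared slot. The class below is a hand-written approximation, not what protoc generates, and it assumes for simplicity that mileage and fuel level are plain integers (the real messages are elided above).

```java
// Hand-written model of oneof behaviour; protoc-generated classes look different.
class OneofSketch {

    enum UpdateCase { GEOLOCATION, MILEAGE, FUELLEVEL, UPDATE_NOT_SET }

    private UpdateCase updateCase = UpdateCase.UPDATE_NOT_SET;
    private Object update; // single slot shared by all oneof members

    // Hypothetical simplification: mileage and fuel level are plain ints here.
    void setMileage(int mileage) {
        updateCase = UpdateCase.MILEAGE;
        update = mileage;
    }

    void setFuelLevel(int fuelLevel) {
        updateCase = UpdateCase.FUELLEVEL;
        update = fuelLevel; // overwrites any previously set member
    }

    UpdateCase getUpdateCase() { return updateCase; }

    boolean hasMileage() { return updateCase == UpdateCase.MILEAGE; }

    public static void main(String[] args) {
        OneofSketch telemetry = new OneofSketch();
        telemetry.setMileage(42_000);
        telemetry.setFuelLevel(55);
        System.out.println(telemetry.getUpdateCase()); // FUELLEVEL
        System.out.println(telemetry.hasMileage());    // false
    }
}
```

Because there is only one slot, setting the fuel level after the mileage leaves no trace of the mileage.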
Keep backward compatibility in mind when removing fields. If you receive a message with a oneof field that has been removed from the .proto definition, it will not set any of the values. This behavior is the same as not setting any value in the first place.
You can perform different actions based on which value is set by using the getUpdateCase() method.
```java
public Optional<Object> getTelemetry(TelemetryUpdate telemetryUpdate) {
    Optional<Object> telemetry = Optional.empty();
    switch (telemetryUpdate.getUpdateCase()) {
        case MILEAGE -> telemetry = Optional.of(telemetryUpdate.getMileage());
        case FUELLEVEL -> telemetry = Optional.of(telemetryUpdate.getFuelLevel());
        case GEOLOCATION -> telemetry = Optional.of(telemetryUpdate.getGeolocation());
        case UPDATE_NOT_SET -> telemetry = Optional.empty();
    }
    return telemetry;
}
```
In proto3 format, fields will always have a value. Thanks to this, proto3 messages can have a smaller size because fields with default values are omitted from the payload. However, this causes one issue: for scalar message fields, there is no way of telling if a field was explicitly set to the default value or not set at all.
In our example, speed is an optional field – some modules in a car might send speed data, and some might not. If we do not set speed, then the geolocation object will have speed with the default value set to 0. This is not the same as not having speed set on messages.
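The ambiguity can be seen in a simplified, hand-rolled encoder for the speed field alone. This is a sketch under assumptions (non-negative speeds, a payload containing only this one field, invented class and method names), not the library's code.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch of how a proto3 scalar with its default value leaves no trace.
class DefaultValueSketch {

    // Encodes only `int32 speed = 3;`. The default value 0 is not written at all.
    static byte[] encodeSpeed(int speed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (speed != 0) {
            out.write((3 << 3) | 0); // field key: tag 3, wire type 0 (varint)
            while ((speed & ~0x7F) != 0) {
                out.write((speed & 0x7F) | 0x80);
                speed >>>= 7;
            }
            out.write(speed);
        }
        return out.toByteArray();
    }

    // Decodes the single-field payload above; a missing field falls back to 0.
    static int decodeSpeed(byte[] payload) {
        int speed = 0; // proto3 default when the field is absent
        for (int i = 1, shift = 0; i < payload.length; i++, shift += 7) {
            speed |= (payload[i] & 0x7F) << shift; // index 0 is the field key
        }
        return speed;
    }

    public static void main(String[] args) {
        System.out.println(encodeSpeed(0).length);            // 0
        System.out.println(decodeSpeed(encodeSpeed(0)));      // 0
        System.out.println(decodeSpeed(new byte[0]));         // 0
        System.out.println(decodeSpeed(encodeSpeed(80)));     // 80
    }
}
```

A vehicle reporting a speed of 0 and a vehicle reporting no speed at all produce the same empty payload, so the receiver sees 0 in both cases.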
To deal with default values, you can use the official wrapper types (google/protobuf/wrappers.proto in the protocolbuffers/protobuf repository). They allow distinguishing between absence and default. Instead of a simple type, we use Int32Value, which is a wrapper for the int32 scalar type.
```proto
import "google/protobuf/wrappers.proto";

message Geolocation {
  google.protobuf.Int32Value speed = 3;
}
```
If we do not provide speed, the field is left unset rather than defaulting to 0.
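A wrapper can carry presence because it is an embedded message, and a set message field is written even when its content is empty. The following sketch mimics that for the speed field; it is a simplified hand-rolled encoder with invented names, assuming non-negative speeds small enough for the inner message length to fit in one byte, and it uses a null Integer to model "not set".

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch of how a wrapper message keeps "0" and "absent" apart.
class WrapperPresenceSketch {

    // Encodes `google.protobuf.Int32Value speed = 3;`. A null argument models "not set".
    static byte[] encodeSpeed(Integer speed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (speed == null) {
            return out.toByteArray(); // absent: nothing on the wire
        }
        // Inner Int32Value message: `int32 value = 1;`, omitted when it is 0.
        ByteArrayOutputStream inner = new ByteArrayOutputStream();
        int value = speed;
        if (value != 0) {
            inner.write((1 << 3) | 0); // tag 1, wire type 0 (varint)
            while ((value & ~0x7F) != 0) {
                inner.write((value & 0x7F) | 0x80);
                value >>>= 7;
            }
            inner.write(value);
        }
        out.write((3 << 3) | 2);  // tag 3, wire type 2 (length-delimited)
        out.write(inner.size());  // length of the embedded message
        out.write(inner.toByteArray(), 0, inner.size());
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encodeSpeed(null).length); // 0
        System.out.println(encodeSpeed(0).length);    // 2
        System.out.println(encodeSpeed(80).length);   // 4
    }
}
```

An explicit 0 still costs two bytes (the field key and a zero length), which is exactly what lets the receiver tell it apart from a missing speed.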
Configure with Gradle
Once you've defined your messages, you can use protoc, the protocol buffer compiler, to generate classes in a chosen language. The generated classes can then be used to build and retrieve messages.
To compile into Java code, we need to add the dependency and the plugin in build.gradle:
```groovy
plugins {
    id 'com.google.protobuf' version '0.8.18'
}

dependencies {
    implementation 'com.google.protobuf:protobuf-java-util:3.17.2'
}
```
We also need to set up the compiler. For Mac users, an osx-specific version has to be used.
```groovy
protobuf {
    protoc {
        if (osdetector.os == "osx") {
            artifact = "com.google.protobuf:protoc:$protobuf_version:osx-x86_64"
        } else {
            artifact = "com.google.protobuf:protoc:$protobuf_version"
        }
    }
}
```
Code will be generated using the generateProto task. It will be located in build/generated/source/proto/main/java, in the package specified in the .proto file.
We also need to tell Gradle where the generated code is located:
```groovy
sourceSets {
    main {
        java {
            srcDirs 'build/generated/source/proto/main/grpc'
            srcDirs 'build/generated/source/proto/main/java'
        }
    }
}
```
The generated class contains all the necessary methods for building the message as well as retrieving field values.
```java
Geolocation geolocation = Geolocation.newBuilder()
        .setCoordinates(LatLng.newBuilder().setLatitude(1.2).setLongitude(1.2).build())
        .setVin("1G2NF12FX2C129610")
        .setOccurredOn(Timestamp.newBuilder().setSeconds(12349023).build())
        .build();

LatLng coordinates = geolocation.getCoordinates();
String vin = geolocation.getVin();
```
Protocol Buffers – Summary
As shown, protocol buffers are easily configured. The mechanism is language-agnostic, and it's easy to share the same .proto definition across different microservices.
Protobuf is easily paired with gRPC, where methods can be defined in .proto files and generated with Gradle.
Official documentation (Protocol Buffers | Google Developers) and guides (Language Guide (proto3) | Protocol Buffers | Google Developers) are available.