In this post, I’ll talk about Chromecaster, the daemon I created to control my Chromecast and a key component of my over-engineered system designed to consume Italian TV abroad.

This is the second part of a series of articles where I detail all components of this system, which consists of:

  • A custom reverse proxy written in Go that tunnels requests to Italy to bypass geoblock and supports dynamic translation of playlist files, rewriting absolute URLs and enabling any kind of request/response transformation. This is detailed in Part 1.
  • Chromecaster, a daemon in Rust that interacts with my Chromecast and exposes HTTP APIs to control volume, playback and cast media using multiple players. This is detailed in this article.
  • What’s On, a daemon in Go that exposes a high-performance HTTP API to retrieve program listings and timetables using a reverse-engineered API as a data source.
  • A frontend written in Vue.js that combines all of the services above in a beautiful interface.
Picture of the frontend I have built for TV watching. (Channel and program names purposely redacted.)

Automating Chromecast playback

To recap, I now had a reverse proxy capable of tunneling arbitrary HTTPS traffic through Italy to bypass geoblock. However, the process to stream TV channels was still painfully manual, requiring an external app to actually play the streams on the Chromecast. It was time to automate the Chromecast playback and control as well.

Sneak Peek: A Demo

Before delving into this rabbit hole, here’s a demo of the finished project, Chromecaster. Pretty satisfying!

Buckle up, we’re diving in!

Programmatic Chromecast playback: a tale of failure

Originally, I was convinced this would be one of the easiest parts of the whole project. This belief was amplified by how well the app I was using for manual casting worked – if the app worked fine, why couldn’t I just pick a random library to interface with my Chromecast and tell it to play a stream URL?

I took the first Chromecast library I found – pychromecast – and immediately tried to cast some stream URLs to it. To my surprise, this was a failure on almost all fronts – most of my stream URLs that worked fine with the aforementioned app just flat out refused to play with pychromecast.

After many tests my sanity was quickly deteriorating, especially as there was absolutely no feedback or error returned by the Chromecast when attempting to play the streams that didn’t work: playing an affected stream resulted in a brief loading screen followed by an empty player interface. I could only deduce what was going on based on the proxy’s access logs, which gave an indication of whether the player had trouble fetching the playlists/segments or parsing the actual files.1

It was time to understand more about the Chromecast ecosystem and dig deep to figure out how it worked behind the scenes.

Understanding how Chromecast playback works

The Chromecast uses the aptly named Google Cast protocol, whose usage is split between senders (clients/mobile apps) and receivers (Chromecast-enabled devices). Senders discover the receivers using mDNS, connect to the advertised IP/port pair and exchange bi-directional control messages with receivers using Protocol Buffers.

The secret sauce of the Cast protocol is receiver apps, which are nothing more than HTML5 web pages rendered in a concealed2 Chrome browser on the receiver (get it? Chrome-cast!). Receiver apps are allowed to communicate with the receiver and the connected sender using the Cast Receiver Framework and can use all the HTML5 media APIs to play content. The Cast protocol does not allow sending or streaming raw binary media to be played – URLs are the only supported medium.

Usually, there is a one-to-one relationship between senders and receiver apps – i.e., a sender generally corresponds to a specific receiver app. To make things clearer, here’s what happens when you press the “Cast” button on the “Web Video Caster” app mentioned above and cast some media.

  1. The app makes sure the requested media is available over the network and without CORS restrictions. If this isn’t the case, the app starts a local HTTP server hosting or proxying the media and uses that as a target for the Chromecast.
  2. The mobile app discovers available receiver devices using mDNS.
  3. Once the user chooses a receiver, the app connects to the device, performs host validation and the handshake.3
  4. The mobile app tells the receiver to start the requested receiver app if it isn’t already started.
  5. The mobile app sends the play message with the media, as well as any other message supported by the receiver app (e.g. play, pause, or custom messages).
High-level design of the Google Cast protocol. Courtesy of Oakbits.com.

What actually defines an app, though? How do senders indicate what app to start?

Even if receiver apps are hosted HTML5 web pages with a specific URL, Google astutely guards Chromecasts by exposing just an opaque, 8-digit hexadecimal identifier called “app ID”. Behind the scenes, an app ID corresponds to an actual webpage URL plus some app metadata, but this mapping is never surfaced externally. Critically, only Google can allocate you an app ID should you want to develop on the Chromecast: this requires you to have a Chromecast developer account, which costs $5 at the time of writing.

But how the heck was I able to play some streams in my initial experiments with pychromecast? I definitely didn’t have to specify an app ID anywhere, so there is something else missing.

It turns out that Google provides a “public” app ID intended for basic media playback: CC1AD845, the default media receiver. pychromecast (and basically any other example sender you’ll find on the Internet) simply has this app ID hardcoded as the default for what app to start when you need media to play.
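To make the launch and play steps more concrete, here is a rough sketch of the control messages a sender exchanges when it uses the default media receiver. The namespaces and JSON field names reflect what open source senders such as pychromecast send, as far as I can tell; the helper names (`connect`, `launch`, `load`), the session ID and the content type are illustrative assumptions, and the Protocol Buffers envelope is left out entirely.

```rust
// Namespaces of the Cast channels used below.
const NS_CONNECTION: &str = "urn:x-cast:com.google.cast.tp.connection";
const NS_RECEIVER: &str = "urn:x-cast:com.google.cast.receiver";
const NS_MEDIA: &str = "urn:x-cast:com.google.cast.media";

// The "public" default media receiver mentioned above.
const DEFAULT_MEDIA_RECEIVER: &str = "CC1AD845";

/// A simplified control message: the namespace it is sent on plus its JSON payload.
/// (On the wire this travels inside a Protocol Buffers envelope, omitted here.)
#[derive(Debug, PartialEq)]
struct ControlMessage {
    namespace: &'static str,
    payload: String,
}

/// Minimal JSON string escaping, enough for URLs and identifiers.
fn escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Opens a virtual connection to the receiver.
fn connect() -> ControlMessage {
    ControlMessage {
        namespace: NS_CONNECTION,
        payload: r#"{"type":"CONNECT"}"#.to_string(),
    }
}

/// Asks the receiver to start the app identified by `app_id`.
fn launch(app_id: &str, request_id: u32) -> ControlMessage {
    ControlMessage {
        namespace: NS_RECEIVER,
        payload: format!(
            r#"{{"type":"LAUNCH","appId":"{}","requestId":{}}}"#,
            escape(app_id),
            request_id
        ),
    }
}

/// Asks the running app to load and play a live stream by URL.
fn load(session_id: &str, url: &str, content_type: &str, request_id: u32) -> ControlMessage {
    ControlMessage {
        namespace: NS_MEDIA,
        payload: format!(
            r#"{{"type":"LOAD","requestId":{},"sessionId":"{}","media":{{"contentId":"{}","contentType":"{}","streamType":"LIVE"}},"autoplay":true}}"#,
            request_id,
            escape(session_id),
            escape(url),
            escape(content_type)
        ),
    }
}
```

A sender would send `connect()`, then `launch(DEFAULT_MEDIA_RECEIVER, 1)`, wait for the receiver to report the new session, and finally send `load(...)` with the stream URL.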

The majority of my attempts failed because the default media receiver doesn’t support many advanced features of HLS playlists or just has outright bugs. For example:

  • In the presence of redirects, relative URL handling of the default player is broken.
    Suppose you attempted to play https://a/play which redirects to https://b/playlist.m3u8: if the playlist contains a relative segment URL, e.g. path/segment.ts, the default player will try to fetch the segment from https://a/path/segment.ts rather than https://b/path/segment.ts.
  • The default player does not support encrypted stream segments. Note that this is not DRM – just an extra security measure which embeds an AES-128 key in the main manifest with which segments are encrypted.
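The redirect bug is easier to see in code. Below is a minimal sketch of relative URL resolution, assuming plain relative paths only (no `..`, query strings or absolute paths); `resolve_segment` is a hypothetical helper, not something taken from any player. The key point is which base URL gets passed in: a correct player uses the URL the playlist was actually served from after following redirects.

```rust
/// Resolves a segment path against the URL of the playlist that references it.
/// Simplified: only handles absolute URLs and plain relative paths.
fn resolve_segment(playlist_url: &str, segment: &str) -> String {
    // Absolute segment URLs are used as they are.
    if segment.starts_with("http://") || segment.starts_with("https://") {
        return segment.to_string();
    }
    // Skip past "scheme://" so that the slashes in it are not mistaken for path separators.
    let after_scheme = playlist_url.find("://").map(|i| i + 3).unwrap_or(0);
    match playlist_url[after_scheme..].rfind('/') {
        // Drop the last path component (the playlist file name) and append the segment.
        Some(i) => format!("{}/{}", &playlist_url[..after_scheme + i], segment),
        // No path at all: the segment lives at the root of the host.
        None => format!("{}/{}", playlist_url, segment),
    }
}
```

With the example above, `resolve_segment("https://b/playlist.m3u8", "path/segment.ts")` yields `https://b/path/segment.ts`, which is the correct location. The default player behaves as if it called `resolve_segment("https://a/play", "path/segment.ts")`, which yields `https://a/path/segment.ts`.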

Phew! That was a lot to learn. I now had a clear understanding of the playback process, though it was evident that the default media receiver was not going to be enough for this project. I was at an interesting crossroads here: would it be better to attempt to create my own player, or perhaps would it be easier to try and find a working, pre-existing app ID to use?

Scouting for app IDs to use

I still wasn’t sure whether creating my own player would produce a working or satisfactory result, so I was determined to find another route. What if I could find a sender that is able to play my streams and “borrow” its app ID? Since receiver apps are not restricted to specific senders (the Cast protocol does not authenticate senders), obtaining an app ID should effectively allow me to leverage the playback logic of that app to play my own streams.

But first, how to extract an app ID from a sender? There are multiple ways to approach this problem:

  • Senders are mainly mobile apps, so it would be possible to find a working app, decompile it and extract the app ID from it.
  • Senders can also be (Chrome-only) web apps. Extracting an app ID would just involve some scouting in minified JavaScript or strategically placed breakpoints when the code is obfuscated.
  • The simplest approach is to just ask the Chromecast! Status responses include the currently running app ID, so it would be sufficient to cast media using the sender and then ask the Chromecast for the running app ID.
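As an illustration of the last approach, here is a small sketch that pulls the running app IDs out of a receiver status payload. The sample JSON shape in the usage note is an assumption based on what status responses typically look like, and the string scanning is deliberately naive (it expects no whitespace around the colon); a real implementation would use a JSON parser or the typed status exposed by a Cast library.

```rust
/// Extracts every `appId` value from a receiver status JSON payload.
/// Naive on purpose: it scans for the literal `"appId":"` key.
fn running_app_ids(status_json: &str) -> Vec<String> {
    let key = "\"appId\":\"";
    let mut ids = Vec::new();
    let mut rest = status_json;
    while let Some(start) = rest.find(key) {
        // Everything after the key, starting at the first character of the value.
        let tail = &rest[start + key.len()..];
        match tail.find('"') {
            Some(end) => {
                ids.push(tail[..end].to_string());
                // Continue scanning after this value.
                rest = &tail[end..];
            }
            // Unterminated string: stop rather than guess.
            None => break,
        }
    }
    ids
}
```

For a payload such as `{"type":"RECEIVER_STATUS","status":{"applications":[{"appId":"CC1AD845"}]}}`, the function returns a single entry, `CC1AD845`.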

I decided to look for online web players and attempt to extract app IDs from the ones that supported Chromecast playback. Many were red herrings: even if the web player had advanced playback capabilities and worked correctly with my streams, the authors did not bother to port the actual player code to the Chromecast and just used the default app ID. Others, however, had demo pages where a custom app ID was baked in!

In the end, I collected a grand total of 3 app IDs corresponding to both open source and proprietary players. By plugging them into pychromecast, I was finally able to play all of my streams! 🎉

Unfortunately, there wasn’t a single player that was able to open all streams:

  • An open source player exhibited the same broken relative URL handling as mentioned above (which likely means that the default media app uses this player behind the scenes). I was able to find an app ID corresponding to a more up-to-date version of this player which finally resolved the issue, though this player still lacked support for encrypted HLS segments.
  • A proprietary player had serious sync issues – it would occasionally start playing very far back into the HLS stream, or at the correct time but with de-synchronized audio and video.
  • A proprietary player had their own logo superimposed on the screen. In addition, streams from a certain broadcaster caused short, intermittent freezes that disrupted playback.

This wasn’t necessarily a problem, though – I could just design my playing logic to use different players for different streams.

The birth of Chromecaster

My end goal was to create a service that dynamically discovers the IP address of my Chromecast and exposes an HTTP API to control it directly from my frontend.

Whilst pychromecast proved to be an invaluable resource to ensure I could get programmatic playback to work, I’m not a huge Python fan and didn’t want to re-implement the Google Cast protocol from scratch in another language. I wasn’t satisfied with the Go libraries available for the Google Cast protocol and mDNS, so I decided to do the same search for Rust instead.

I quickly found rust_cast, a lightweight crate that exposes a neat and extensible API to interact with Chromecasts. With the mdns crate to dynamically discover the device, Chromecaster was officially born!

mDNS discovery

Chromecast devices advertise their presence using the mDNS protocol, which is basically a slightly modified, zero-configuration DNS that works on the local network using UDP multicast packets. Executing a query for the service name _googlecast._tcp.local will discover all Chromecast devices. The associated TXT records include a variety of metadata about the Chromecast, including the device name under the fn key.

The mdns crate exposes an API to do lookups that leverages async Rust. The usage looks like this:

async fn discover_chromecast(&self) -> Result<(String, u16), Box<dyn Error>> {
  // Utility to extract the Chromecast name from the mDNS response.
  fn extract_name_from_mdns_response(res: &mdns::Response) -> Option<&str> {
    res
      .txt_records()
      .filter_map(|s| s.split_once('='))
      .find(|x| x.0 == "fn")
      .map(|x| x.1)
  }

  // Launch an mDNS query with a 5 second query interval.
  let listener = mdns::discover::all(
      "_googlecast._tcp.local",
      Duration::from_secs(5)
  )?.listen();
  futures::pin_mut!(listener);

  // Loop indefinitely until we find the Chromecast.
  loop {
    match listener.next().await {
      // If we get an mDNS response with the device name we expect...
      Some(Ok(res)) if extract_name_from_mdns_response(&res) == Some(self.device_name.as_str()) =>
          // ... and we can extract an IP/port tuple
          return match (res.ip_addr(), res.port()) {
              (Some(addr), Some(port)) => Ok((addr.to_string(), port)), // we're done!
              _ => Err("Chromecast mDNS response does not contain IP or port!".into())
          },
      // ... other cases
    }
  }
}

It’s important to note that the code above would run indefinitely if the requested device didn’t exist. Fortunately, the async_std::future::timeout utility provides a convenient way to wrap a future and stop its execution if it goes over a specified timeout. Finally, to execute this from a synchronous context, the futures::executor::block_on executor provides a simple way to block the current thread until the async function finishes its execution:

futures::executor::block_on(
  async_std::future::timeout(
    DISCOVER_TIMEOUT, // 10 secs
    self.discover_chromecast()
  )
)

Internally, Chromecaster caches the resolved IP address of the Chromecast and uses it for all future connection attempts. If a connection attempt using a cached address fails, Chromecaster erases the cached entry and re-attempts discovery.
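The caching behaviour can be sketched as follows. This is a simplified model and not the actual Chromecaster code: `AddressCache` and `connect_with` are hypothetical names, and discovery and connection are passed in as closures so that the logic stays independent of mDNS and of the Cast connection itself.

```rust
use std::net::SocketAddr;

/// Remembers the last address that worked.
struct AddressCache {
    cached: Option<SocketAddr>,
}

impl AddressCache {
    /// Connects using the cached address when possible. If that fails, the
    /// cached entry is erased and discovery runs again.
    fn connect_with<D, C, T, E>(&mut self, mut discover: D, mut connect: C) -> Result<T, E>
    where
        D: FnMut() -> Result<SocketAddr, E>,
        C: FnMut(SocketAddr) -> Result<T, E>,
    {
        if let Some(addr) = self.cached {
            if let Ok(conn) = connect(addr) {
                return Ok(conn);
            }
            // The cached address is stale (e.g. the device got a new IP).
            self.cached = None;
        }
        let addr = discover()?;
        let conn = connect(addr)?;
        // Only remember addresses that we could actually connect to.
        self.cached = Some(addr);
        Ok(conn)
    }
}
```

The slow mDNS lookup is only paid on the first request and after the device changes address; every other request goes straight to the cached address.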

Designing the internal API

At its core, Chromecaster supports multiple players – each corresponding to a specific app ID extracted using the methods described above. To guarantee enough flexibility, players must be able to customize the payload sent via the Cast Protocol (as not all players use the standard payload), as well as be able to execute other actions after playing succeeds.

But first, not all players support all stream types. Chromecaster distinguishes between stream types with a simple enumeration:

pub enum PlayerContent {
    Hls { playlist_url: String },
    Dash { playlist_url: String },
    DashWidevine { playlist_url: String, license_url: String }
}

Individual players are shaped using a trait:

pub trait Player<'a> {
  fn app_id(&self) -> &'static str;
  fn device(&self) -> &'a CastDevice<'a>;

  // Converts a `PlayerContent` enumeration to a rust_cast Media object. If the player is
  // not capable of playing this media, it simply returns None.
  fn media_for_content(&self, content: &PlayerContent) -> Option<Media>;

  fn try_play(&self, content: &PlayerContent, receiver_status: &mut ReceiverStatus)
    -> Result<MediaStatus, Error> { /* cut */ }

  // Hook invoked when the app starts.
  fn on_start(&self, app: &Application, is_fresh_start: bool)
    -> Result<(), Error>;

  // Hook invoked when playback succeeds.
  fn on_play(&self, content: &PlayerContent, app: &Application, status: &MediaStatus)
    -> Result<(), Error>;
}

Since players hold a reference to the currently live Chromecast handle, they have to be instantiated on demand. A sequence of factories defines the available players:

let player_factories: Vec<(&str, PlayerFactory)> = vec![
  ("foo", |device| Box::new(FooPlayer::new(device))),
  ("bar", |device| Box::new(BarPlayer::new(device))),
  ("baz", |device| Box::new(BazPlayer::new(device))),
];

The definition of PlayerFactory is a fun one:

pub type PlayerFactory = for<'a> fn(&'a CastDevice) -> Box<dyn Player<'a> + 'a>;
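To see why the lifetime is spelled out, here is a self-contained sketch with stand-in types. `CastDevice` below is a dummy struct rather than the rust_cast type, the trait is trimmed down to two methods, and `make_foo` and `factories` are hypothetical names. The `for<'a>` part says that the factory must work for any borrow of the device, and that the player it returns cannot outlive that borrow.

```rust
// Stand-in for the real device handle, which holds the live connection.
struct CastDevice {
    name: String,
}

// Trimmed-down version of the trait, just enough to show the lifetimes.
trait Player<'a> {
    fn app_id(&self) -> &'static str;
    fn device(&self) -> &'a CastDevice;
}

struct FooPlayer<'a> {
    device: &'a CastDevice,
}

impl<'a> Player<'a> for FooPlayer<'a> {
    fn app_id(&self) -> &'static str {
        "CC1AD845"
    }
    fn device(&self) -> &'a CastDevice {
        self.device
    }
}

// A plain function pointer that is valid for every lifetime 'a.
type PlayerFactory = for<'a> fn(&'a CastDevice) -> Box<dyn Player<'a> + 'a>;

fn make_foo<'a>(device: &'a CastDevice) -> Box<dyn Player<'a> + 'a> {
    Box::new(FooPlayer { device })
}

fn factories() -> Vec<(&'static str, PlayerFactory)> {
    // The explicit cast turns the function item into the pointer type.
    vec![("foo", make_foo as PlayerFactory)]
}
```

Because the factories are ordinary function pointers, the list can be built once and reused, while each player is created on demand from whichever device handle is live at that moment.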

The playback logic

When playback is requested, Chromecaster encapsulates the requested media in the PlayerContent enumeration and attempts to play it with each available player factory.

Detecting whether playback was successful or not is an interesting problem, because not all players are made equal. Some players do not return any playback status until they know whether they can play the given media, whilst others return a success message and later update the playback status in case playback fails. To make sure the round-robin approach works with the latter category as well, Chromecaster needs to wait until the status is known.
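The overall strategy can be sketched without any Chromecast involved. The types below are simplified stand-ins (the trait is not the `Player` trait shown earlier, and `StubPlayer` and `play_with_first_working` are hypothetical names): each player is tried in order, players that cannot handle the content are skipped, and the first one that reports success wins.

```rust
#[derive(Debug, Clone, PartialEq)]
enum PlayerContent {
    Hls { playlist_url: String },
    Dash { playlist_url: String },
}

trait Player {
    fn name(&self) -> &str;
    fn supports(&self, content: &PlayerContent) -> bool;
    fn try_play(&self, content: &PlayerContent) -> Result<(), String>;
}

/// Test double standing in for a real player.
struct StubPlayer {
    name: &'static str,
    supports_hls: bool,
    works: bool,
}

impl Player for StubPlayer {
    fn name(&self) -> &str {
        self.name
    }
    fn supports(&self, content: &PlayerContent) -> bool {
        match content {
            PlayerContent::Hls { .. } => self.supports_hls,
            PlayerContent::Dash { .. } => true,
        }
    }
    fn try_play(&self, _content: &PlayerContent) -> Result<(), String> {
        if self.works {
            Ok(())
        } else {
            Err("load failed".to_string())
        }
    }
}

/// Tries every compatible player in order and returns the name of the first
/// one that succeeds, or the list of collected errors if none does.
fn play_with_first_working<'p>(
    players: &'p [Box<dyn Player>],
    content: &PlayerContent,
) -> Result<&'p str, Vec<String>> {
    let mut errors = Vec::new();
    for player in players {
        if !player.supports(content) {
            continue;
        }
        match player.try_play(content) {
            Ok(()) => return Ok(player.name()),
            Err(e) => errors.push(format!("{}: {}", player.name(), e)),
        }
    }
    Err(errors)
}
```

This only works if `try_play` returns a trustworthy result, which is exactly why waiting for the final playback status matters.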

Unfortunately, rust_cast does not expose a high-level API to listen for media playback messages and hides the lower-level functionality behind a private class. A simple patch resolves the issue and can be easily used thanks to the flexibility of Cargo:

[patch.crates-io]
rust_cast = { git = "https://github.com/Robertof/rust-cast" }

The playback logic can now detect whether the player has not reported a status yet and wait for an explicit status message to come later:

// Load the media on the Chromecast. The result may or may not be trustworthy.
let result = self.device().media.load(
  app.transport_id.to_string(),
  app.session_id.to_string(),
  &media
);

if let Ok(status) = result.as_ref() {
  // Detect whether the player has returned an 'Idle' state as it processes the media.
  if status.entries.iter().any(|x| matches!(x.player_state, PlayerState::Idle)) {
    let media = &self.device().media;

    // Wait for another media message to come through.
    let message = self.device().message_manager.receive_find_map(|m| {
      if !media.can_handle(m) {
          return Ok(None);
      }

      media.parse(&m).map(Some)
    })?;

    // Handle a failed load.
    if let MediaResponse::LoadFailed(_) = message {
      // ...
    }
  }
  // ...
}
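The same "keep reading until something conclusive arrives" idea can be modelled with a standard library channel. This is only a sketch under assumptions: `Incoming` and `Outcome` are made-up types, the channel stands in for the Cast connection, and the deadline is an addition of this sketch that the code above does not have.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Made-up message type standing in for what the Chromecast sends back.
#[derive(Debug, PartialEq)]
enum Incoming {
    Heartbeat,
    MediaStatus { playing: bool },
    LoadFailed,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Playing,
    Failed,
    /// Nothing conclusive arrived before the deadline (or the channel closed).
    Unknown,
}

/// Skips unrelated messages until the player reports success or failure.
fn wait_for_outcome(rx: &Receiver<Incoming>, timeout: Duration) -> Outcome {
    let deadline = Instant::now() + timeout;
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(Incoming::MediaStatus { playing: true }) => return Outcome::Playing,
            Ok(Incoming::LoadFailed) => return Outcome::Failed,
            // Heartbeats and "still idle" statuses are not conclusive: keep waiting.
            Ok(_) => continue,
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => {
                return Outcome::Unknown
            }
        }
    }
}
```

A caller would treat `Outcome::Failed` as the cue to move on to the next player, and could treat `Outcome::Unknown` the same way.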

The HTTP API

I designed the REST API of Chromecaster to be pretty straightforward, as it needs to do just a few operations:

+ POST /play?format=hls&url=<url>
+ POST /play?format=dash&url=<url>
+ POST /play?format=dash&url=<url>&widevine_license_url=<license_url>

Plays the requested livestream using the specified URL and format.
+ POST /stop

Stops whatever application is running on the Chromecast.
+  GET /volume
+ POST /volume/<level>
+ POST /mute
+ POST /unmute

Volume control.
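As a rough illustration of how the /play parameters map onto the PlayerContent enumeration, here is a sketch that parses a raw query string. It is not the rouille-based implementation: `content_from_query` is a hypothetical helper, and it assumes the values are already percent-decoded, which a web framework would normally take care of.

```rust
#[derive(Debug, PartialEq)]
pub enum PlayerContent {
    Hls { playlist_url: String },
    Dash { playlist_url: String },
    DashWidevine { playlist_url: String, license_url: String },
}

/// Builds a `PlayerContent` from a query string such as `format=hls&url=...`.
/// Assumes values are already percent-decoded.
fn content_from_query(query: &str) -> Result<PlayerContent, String> {
    let mut format = None;
    let mut url = None;
    let mut license = None;
    for pair in query.split('&') {
        match pair.split_once('=') {
            Some(("format", v)) => format = Some(v),
            Some(("url", v)) => url = Some(v),
            Some(("widevine_license_url", v)) => license = Some(v),
            // Unknown or malformed parameters are ignored.
            _ => {}
        }
    }
    let playlist_url = url.ok_or("missing url")?.to_string();
    match (format, license) {
        (Some("hls"), _) => Ok(PlayerContent::Hls { playlist_url }),
        (Some("dash"), None) => Ok(PlayerContent::Dash { playlist_url }),
        (Some("dash"), Some(l)) => Ok(PlayerContent::DashWidevine {
            playlist_url,
            license_url: l.to_string(),
        }),
        (Some(other), _) => Err(format!("unsupported format: {}", other)),
        (None, _) => Err("missing format".to_string()),
    }
}
```

Keeping this mapping in one place means the rest of the daemon only ever deals with the enumeration and never with raw request parameters.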

Implementing the HTTP REST API was a breeze thanks to rouille, a web micro-framework with a light dependency footprint and an eloquent, macro-based API. In the final implementation, all operations are executed on a separate thread as interacting with the Chromecast can be slow.

Plot twist: Developing my own player

This system as described worked almost flawlessly in production for a while. What bothered me was the “almost” – occasionally, the players would hiccup, fail to level switch before buffering or show weird behaviors4, and the round-robin approach made channel-hopping a bit slow. I got fed up enough with the occasional tinkering required that I decided to build my own player.

Obviously, I wasn’t going to actually write an HLS playlist parser and all the associated player logic. For testing, I used hls.js extensively during the development of the reverse proxy and was very impressed with its speed and reliability. Based on this good experience, I decided to build a custom Chromecast receiver that leverages this library to play the streams.

After paying the $5 license fee to get my Google account enabled for Chromecast development, I immediately created my own player on the management page, pointed it to a web page I hosted and associated it to my device. This was pretty sweet!

My enthusiasm quickly faded as I discovered that the Cast Receiver Framework – the library required to create a Chromecast receiver – does not really support the use case of third-party players. Fortunately, I do not give up easily and was able to circumvent the issues around it successfully. I detailed my journey in “Using a custom player with the Cast Receiver Framework”.

Once I successfully managed to get the player working, it was trivial to integrate it within Chromecaster and start using it by default for all media. The new player significantly reduced the time elapsed between a play request and actually seeing video on screen, as well as being more reliable than the other ones. Voilà!

Conclusion

This was a multi-month journey, but a successful one. Together with the frontend that I built along with Chromecaster, my TV watching experience became significantly more enjoyable and effective. In the next and last post, I’ll detail the Vue.js-based frontend and What’s On, which provides real-time information about program listings.

Thank you for reading!

Disclaimer: this post and all the described software were completed before my employment with Google commenced. All the work described in this and other posts is the result of publicly available information and my own research/reverse engineering efforts.

Footnotes

  1. One of the reasons why this was so frustrating is that you can’t really perform any debugging on a Chromecast unless you purchase a developer license and explicitly enable your device for debugging. 

  2. The proper term is “chromeless”, but can you blame me for not wanting to write “Chromeless Chrome browser”? 

  3. This is reported for completeness, but in reality Google really wants you to use their implementation that abstracts away all of the gory discovery, connection and protocol details. 

  4. Some of this is also due to a problem unrelated to the players – the wireless networking stack of my Raspberry Pi gets progressively more unreliable as the uptime goes up. I haven’t yet figured out whether the fault is on the AP or Pi side.