Over-engineering my TV watching - Part 1: bypassing geo-block with a custom reverse proxy

This is the first part of a series of posts where I will detail how I built a complex system to consume Italian TV abroad entirely from a single web app and a Chromecast.

This system consists of:

A custom reverse proxy written in Go that tunnels requests to Italy to bypass geoblock and supports dynamic translation of playlist files, rewriting absolute URLs and enabling any kind of request/response transformation.
Chromecaster, a daemon in Rust that interacts with my Chromecast and exposes HTTP APIs to control volume, playback and cast media using multiple players.
What’s On, a daemon in Go that exposes a high-performance HTTP API to retrieve program listings and timetables using a reverse-engineered API as a data source.
A frontend written in Vue.js that combines all of the services above in a beautiful interface.

Picture of the frontend I have built for TV watching. (Channel and program names purposely redacted.)

Background

After relocating from Italy to the United Kingdom, I found myself much more inclined to watch Italian TV during the evening as a way to both have something to passively listen to and, occasionally, good content to watch.

In Italy, people typically watch their programs via terrestrial television, which requires you to have an antenna (usually on a roof or balcony) pointed towards a well-known transmitter. This solution is not practical abroad, but fortunately a massive surge of people wanting to watch TV via phones, tablets and computers led providers to implement Internet live-streaming via either DASH or HLS.

This is ideal, but has one obvious hurdle: geo-block. For digital rights reasons, Italian TV broadcasters don’t really want people outside of Italy to be able to watch their programs.

Beginning the journey: bypassing geo-block

Before my system existed, I had the following cumbersome workflow:

Enable a VPN tunnel on my device (tunneling through a network I owned in Italy)
Find the HLS playlist URL of the channel I wanted to watch from a list
Cast the playlist URL using a neat app called Web Video Caster, proxying it via my phone

As you can imagine, this was pretty inefficient and also required tunneling the entirety of my mobile traffic, which is not ideal!

Designing a reverse proxy

Once I grew tired of this overly manual process to watch TV, I decided that it was necessary to design something more robust. There were a few high-level requirements the solution had to adhere to:

Work for any stream. I don’t want the solution to be constrained to specific URLs or playlist types. It has to just work.
Not be invasive. This solution should not require permanently devoting my Chromecast to TV watching, nor tunneling the entirety of any device’s traffic via a VPN.
Transparently work with the Chromecast. The Chromecast is peculiar because it’s the device itself that pulls down the playlist and segments, not the sender. It’s also impossible to configure a proxy or VPN without doing shenanigans.

The idea is to expose a reverse proxy that:

Accepts any URL as a normal HTTP request (e.g. https://proxy_addr/<original_url>);
Transparently proxies the requested URL through a network with an Italian IP address;
Returns the response with the appropriate content type headers.

This would perfectly match the requirements set above: since this is not a forward proxy, which requires the device to be aware of it and forward requests to it, all I needed to do was to make sure the playlist URLs had the proxy URL as a prefix. This would be completely invisible to the Chromecast and, in theory, it should just work!
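To make the URL scheme concrete, here is a minimal sketch of the prefixing step. The function name `proxied` and the address `proxy.example.net` are placeholders made up for illustration; they are not taken from the real system.

```go
package main

import (
	"fmt"
	"strings"
)

// proxied prefixes an upstream URL with the reverse proxy's base address, so
// that whoever fetches it (e.g. the Chromecast) talks to the proxy instead of
// the origin. proxyBase is a hypothetical address used only for illustration.
func proxied(proxyBase, original string) string {
	return strings.TrimSuffix(proxyBase, "/") + "/" + original
}

func main() {
	// The resulting URL is what would be handed to the player.
	fmt.Println(proxied("https://proxy.example.net", "https://tv.example.com/live/channel.m3u8"))
}
```

Since the player only ever sees the prefixed URL, it needs no proxy or VPN configuration of its own.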

Choosing the optimal way to tunnel traffic

Even with a well-defined idea, there is still a big unknown – how do you actually tunnel the traffic to Italy for a specific request?

A fancy solution could entail the usage of per-process Linux network namespaces and perhaps an always active (or on-demand, e.g. socket-activated) tunnel to a host in Italy.

However, I’ve opted for a significantly simpler (and effective!) solution. In fact, there is a commonly used service that is secure by design and also offers the ability to run an unattended tunnel between two hosts exposed as a (forward) SOCKS proxy. In case you haven’t guessed yet, I’m talking about SSH!

Simply passing the -D flag to ssh exposes a SOCKS proxy on the desired IP/port combo:

# starts a SOCKS proxy on 127.0.0.1:1337 tunneling the traffic to my-host.example.com
ssh -D 127.0.0.1:1337 my-host.example.com

Since I already had a server ready to use in Italy with SSH access, this just felt like a natural solution. Combining it with the -N flag, which disables remote command execution (and allows you to disarm the target account’s shell), and autossh to keep the tunnel up in case of power/network failure, I now had a reliable, persistent SOCKS proxy that is able to forward requests to Italy.

This only solves one part of the puzzle, though – a reverse proxy that accepts a playlist URL and silently forwards it through this proxy is still required.

Implementing the reverse proxy

I chose to implement the proxy in Go, as its standard library is surprisingly powerful for network services – even more so for this project, since it turns out that there is already a reverse proxy implementation (httputil.ReverseProxy) that is trivial to adapt to this specific use case.

Obviously, all requests forwarded by this reverse proxy need to go through the SOCKS5 forward proxy exposed by ssh. This is quite straightforward:

transport := &http.Transport{
    Proxy: http.ProxyURL(upstreamProxyUrl),
}
reverseProxy := &httputil.ReverseProxy{
    Transport: transport,
    // ...
}
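For reference, this is a rough sketch of how the pieces could be wired together. It is not the actual implementation: `targetFromRequest` and `newHandler` are hypothetical helpers, the SOCKS address matches the ssh example above, and the handler is assumed to be served directly (Go's default ServeMux would collapse the `//` inside the embedded URL).

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

// targetFromRequest recovers the upstream URL embedded in the request path,
// e.g. path "/https://tv.example.com/live/a.m3u8" with query "token=abc".
func targetFromRequest(escapedPath, rawQuery string) (*url.URL, error) {
	raw := strings.TrimPrefix(escapedPath, "/")
	if rawQuery != "" {
		raw += "?" + rawQuery
	}
	target, err := url.Parse(raw)
	if err != nil {
		return nil, err
	}
	if (target.Scheme != "http" && target.Scheme != "https") || target.Host == "" {
		return nil, errors.New("request path does not embed an absolute http(s) URL")
	}
	return target, nil
}

// newHandler returns a handler that forwards every request to the URL found
// in its path, sending the outbound traffic through the SOCKS proxy.
func newHandler(upstreamProxyUrl *url.URL) http.Handler {
	proxy := &httputil.ReverseProxy{
		Transport: &http.Transport{Proxy: http.ProxyURL(upstreamProxyUrl)},
		Director: func(req *http.Request) {
			// The wrapper below has already validated the path, so a failure
			// is not expected here; if it happens the request is left as-is.
			if target, err := targetFromRequest(req.URL.EscapedPath(), req.URL.RawQuery); err == nil {
				req.URL = target
				req.Host = target.Host
			}
		},
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if _, err := targetFromRequest(r.URL.EscapedPath(), r.URL.RawQuery); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		proxy.ServeHTTP(w, r)
	})
}

func main() {
	target, err := targetFromRequest("/https://tv.example.com/live/a.m3u8", "token=abc")
	fmt.Println(target, err)

	// Address of the SOCKS proxy exposed by "ssh -D".
	upstreamProxyUrl, err := url.Parse("socks5://127.0.0.1:1337")
	if err != nil {
		panic(err)
	}
	handler := newHandler(upstreamProxyUrl)
	_ = handler // to serve it: http.ListenAndServe("127.0.0.1:8080", handler)
}
```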

The actual business logic that extracts the target URL from the request URL (i.e. http://proxy_addr/<request_url> -> <request_url>) is pretty trivial to implement, so I had a working implementation in very little time. But now… does it actually work?

Well… the proxy worked beautifully with some streams, whilst playback failed miserably with others. It was time for some debugging.

Handling the edge cases

An HLS playlist that successfully worked with the basic implementation looked like:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac",LANGUAGE="ita",URI="example/audio-chunks.m3u8"
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2793078,CODECS="avc1.77.41,mp4a.40.2",RESOLUTION=1280x720,AUDIO="aac"
example/video-chunks.m3u8

Whilst an HLS playlist that failed to play looked like:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac",LANGUAGE="ita",URI="example/audio-chunks.m3u8"
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2793078,CODECS="avc1.77.41,mp4a.40.2",RESOLUTION=1280x720,AUDIO="aac"
https://video-server.example.com/.../video-chunks.m3u8

Can you spot the difference? The latter uses absolute URLs which, with the current basic implementation, do not get rewritten and thus escape the proxy.

That’s not the only thing that the proxy needs to do, though – it turns out that there is a surprising amount of stuff that a media playlist proxy needs to be aware of:

Redirects. Many providers have a series of intermediate redirects that happen before you get the definitive playlist. The proxy has to transparently rewrite the URLs in the redirects to make sure they go through the proxy as well.
CORS. Browsers are very strict about cross-origin requests even when doing HLS/DASH media playback, so the proxy needs to set a sufficiently lax Access-Control-Allow-Origin header.
Absolute URLs in playlists. As mentioned above, HLS playlists can have both relative and absolute URLs, and some providers prefer the latter. These should be transparently rewritten by the proxy.
Bypass of referrer checks. Some providers block requests whose Referer header is not one they expect, so the proxy should provide the ability to spoof it.
Deduplication. Weirdly, some providers send out duplicated playlist URLs which confuses some players. Instead of going through the effort of fixing the players, it’s easier to just let the proxy filter these out.
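The two header-only cases in the list above (redirects and CORS) can be handled in a `ModifyResponse` hook of `httputil.ReverseProxy`. The following is a sketch under assumptions: `proxyBase` is a placeholder address, `rewriteLocation` and `modifyResponse` are hypothetical names, and carrying the proxy's own options over to the redirected URL is left out.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// proxyBase is a placeholder for the public address of the reverse proxy.
const proxyBase = "https://proxy.example.net/"

// rewriteLocation resolves a redirect target against the URL that was
// actually fetched (Location may be relative) and prefixes it with the
// proxy address, so that the follow-up request goes through the proxy too.
func rewriteLocation(requested, location string) (string, error) {
	base, err := url.Parse(requested)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(location)
	if err != nil {
		return "", err
	}
	return proxyBase + base.ResolveReference(ref).String(), nil
}

// modifyResponse has the signature expected by ReverseProxy.ModifyResponse.
func modifyResponse(res *http.Response) error {
	// Lax CORS policy so that browser-based players accept the response.
	res.Header.Set("Access-Control-Allow-Origin", "*")
	if loc := res.Header.Get("Location"); loc != "" {
		rewritten, err := rewriteLocation(res.Request.URL.String(), loc)
		if err != nil {
			return err
		}
		res.Header.Set("Location", rewritten)
	}
	return nil
}

func main() {
	// Simulate an upstream redirect without touching the network.
	req, err := http.NewRequest(http.MethodGet, "https://tv.example.com/live/index.m3u8", nil)
	if err != nil {
		panic(err)
	}
	res := &http.Response{
		StatusCode: http.StatusFound,
		Header:     http.Header{"Location": {"/cdn/master.m3u8"}},
		Request:    req,
	}
	if err := modifyResponse(res); err != nil {
		panic(err)
	}
	fmt.Println(res.Header.Get("Location"))
	fmt.Println(res.Header.Get("Access-Control-Allow-Origin"))
}
```

Resolving the Location value first matters because a relative redirect would otherwise be resolved by the player against the proxy's own address.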

Some of these transformations are straightforward to implement, but some are much more complex. The discriminating factor is whether the transformation requires the response body or not: the Go reverse proxy library does not provide an ergonomic way to transform response bodies.

Efficiently transforming response bodies

The naïve way of transforming response bodies would be to read the whole original response in memory, apply transformations (line by line) and then return it to the client. This isn’t very efficient, though: whilst it’s true that playlist files are typically not very big, they are fetched very frequently, and I intend to run my reverse proxy on a low-power Raspberry Pi device. My end goal is to be able to play at least two concurrent streams.

I opted for a more advanced, streaming approach: I created a custom io.ReadCloser that wraps the original response body reader, does line buffering using bufio.Reader and applies transformations before returning the line back to the caller. The transformer definition looks like this:

type transformerReadCloser struct {
    // invoked for all read lines, returns either the line unchanged or the transformed line
    TransformFunc func([]byte) []byte
    // buffered reader attached to the original stream
    r *bufio.Reader
    // pointer to the 'Close' function of the original stream
    // (bufio.Reader does not implement Close)
    close func() error
    // temporary buf used to store extra bytes that do not fit into `Read()`'s
    // pre-allocated buf
    excess []byte
    // error (incl. EOF) returned by the last underlying read that has not yet been
    // handed back to the caller because excess bytes are still pending
    err error
}

It’s interesting to focus on the excess field in this structure. The transformer needs to conform to the bare io.ReadCloser interface, which looks like this:

type ReadCloser interface {
    Read(p []byte) (n int, err error)
    Close() error
}

The p buffer passed to the Read() call is allocated by the caller, and the caller expects that the number of bytes read (n) will be less than or equal to the size of the buffer (len(p)). This is a problem for our transformer: since reads are line buffered (with additional transformations applied on top of the read lines), it’s possible that the size of the transformed lines will exceed the size of the output buffer.

To handle this gracefully, the buffer containing the transformed line is sliced to len(p), and the excess (if any) is saved to the excess buffer on the instance. When the next call to Read() is made, the excess buffer is drained (slicing it further if the output buffer is not large enough to completely hold it) and any pending error that resulted from the previous reading attempt is returned.
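The excess handling described above could look roughly like the sketch below. It is one possible way to write `Read()`, not necessarily how the real code is structured: here the whole transformed line is parked in `excess` and copied out in chunks. The helper `readAllWith` is hypothetical and exists only to exercise the reader with a deliberately tiny buffer.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

type transformerReadCloser struct {
	TransformFunc func([]byte) []byte
	r             *bufio.Reader
	close         func() error
	excess        []byte // transformed bytes not yet handed to the caller
	err           error  // pending error (incl. io.EOF) from the underlying reader
}

func NewTransformerReadCloser(body io.ReadCloser) *transformerReadCloser {
	return &transformerReadCloser{r: bufio.NewReader(body), close: body.Close}
}

func (t *transformerReadCloser) Read(p []byte) (int, error) {
	if len(p) == 0 {
		return 0, nil
	}
	// Fetch a new line only when nothing is pending. The loop also skips
	// lines that the transformation reduced to nothing.
	for len(t.excess) == 0 && t.err == nil {
		line, err := t.r.ReadBytes('\n') // the line keeps its trailing '\n'
		t.err = err
		if len(line) > 0 && t.TransformFunc != nil {
			line = t.TransformFunc(line)
		}
		t.excess = line
	}
	// Hand out as much as fits in p and keep the rest for the next call.
	n := copy(p, t.excess)
	t.excess = t.excess[n:]
	if len(t.excess) == 0 && t.err != nil {
		return n, t.err // surface the pending error once everything is drained
	}
	return n, nil
}

func (t *transformerReadCloser) Close() error { return t.close() }

// readAllWith drains a transformer over input using a buffer of bufSize bytes.
func readAllWith(input string, bufSize int, f func([]byte) []byte) (string, error) {
	t := NewTransformerReadCloser(io.NopCloser(strings.NewReader(input)))
	t.TransformFunc = f
	defer t.Close()
	var out bytes.Buffer
	p := make([]byte, bufSize)
	for {
		n, err := t.Read(p)
		out.Write(p[:n])
		if err == io.EOF {
			return out.String(), nil
		}
		if err != nil {
			return out.String(), err
		}
	}
}

func main() {
	out, err := readAllWith("first line\nsecond line\n", 4, bytes.ToUpper)
	fmt.Printf("%q %v\n", out, err)
}
```

Because only one line is held in memory at a time, the cost per request stays bounded by the longest line rather than by the size of the playlist.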

The end result is a neat API that allows swapping out the original response body with a transformed one with very little overhead:

transformer := NewTransformerReadCloser(res.Body) // transformer instantiated on the original body
transformer.TransformFunc = func(b []byte) []byte {
    // ... apply line transformations ...
    return b
}
res.ContentLength = -1 // remove the original content-length, as we don't know the new one
res.Header.Del("content-length")
res.Body = transformer // swap out the original io.ReadCloser with the transformed one

Once this is implemented, it becomes trivial to parse the lines of an HLS playlist and make the appropriate transformations. Interestingly, some streams are encrypted with a key whose URL is specified in the playlist itself (HLS uses symmetric encryption such as AES-128, advertised via the #EXT-X-KEY tag): this URL also needs to be rewritten as part of the transformations.
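As an illustration, a line transformation for HLS playlists might look like the sketch below. The names `proxyBase`, `uriAttr` and `rewriteLine` are hypothetical, and the rules are deliberately simple: absolute URLs on their own line and inside `URI="..."` attributes get the proxy prefix, everything else is left alone.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// proxyBase is a placeholder for the public address of the reverse proxy.
const proxyBase = "https://proxy.example.net/"

// uriAttr matches absolute URLs inside URI="..." attributes, as found in
// tags such as #EXT-X-MEDIA and #EXT-X-KEY.
var uriAttr = regexp.MustCompile(`URI="(https?://[^"]*)"`)

// rewriteLine is meant to be used as a per-line transformation function.
func rewriteLine(line []byte) []byte {
	switch {
	case bytes.HasPrefix(line, []byte("#")):
		// Tag line: only absolute URLs in URI attributes need the prefix.
		return uriAttr.ReplaceAll(line, []byte(`URI="`+proxyBase+`${1}"`))
	case bytes.HasPrefix(line, []byte("http://")), bytes.HasPrefix(line, []byte("https://")):
		// Absolute playlist or segment URL: route it through the proxy.
		return append([]byte(proxyBase), line...)
	default:
		// Relative URLs resolve against the already-proxied playlist URL,
		// so they stay behind the proxy without any rewriting.
		return line
	}
}

func main() {
	playlist := []string{
		"#EXTM3U\n",
		"#EXT-X-KEY:METHOD=AES-128,URI=\"https://keys.example.com/k\"\n",
		"https://video-server.example.com/video-chunks.m3u8\n",
		"example/audio-chunks.m3u8\n",
	}
	for _, line := range playlist {
		fmt.Print(string(rewriteLine([]byte(line))))
	}
}
```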

One final optimization: not all streams need response body transformation (which incurs a performance penalty), so the behavior of the proxy can be customized by a series of query string parameters:

x-transform-body=y enables the body transformer which adjusts URLs as needed. When this is off, the Body reader is kept intact, avoiding the overhead of the transformer;
x-deduplicate-streams=y enables the deduplication of individual playlist URLs (which implies the usage of the body transformer);
x-referer=<referer> rewrites the Referer header of all requests.
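A possible way to parse these parameters is sketched below. The `options` type and `splitOptions` function are hypothetical; stripping the `x-*` parameters before forwarding the request upstream is an assumption of this sketch, not something stated above.

```go
package main

import (
	"fmt"
	"net/url"
)

// options mirrors the three query string parameters listed above.
type options struct {
	TransformBody      bool
	DeduplicateStreams bool
	Referer            string
}

// splitOptions separates the proxy's own x-* parameters from the query
// string that should be forwarded upstream.
func splitOptions(rawQuery string) (options, string, error) {
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		return options{}, "", err
	}
	opts := options{
		DeduplicateStreams: values.Get("x-deduplicate-streams") == "y",
		Referer:            values.Get("x-referer"),
	}
	// Deduplication implies the body transformer.
	opts.TransformBody = opts.DeduplicateStreams || values.Get("x-transform-body") == "y"

	for _, name := range []string{"x-transform-body", "x-deduplicate-streams", "x-referer"} {
		values.Del(name)
	}
	// Note: Encode sorts keys and re-escapes values, so the forwarded query
	// may not be byte-identical to what the client sent.
	return opts, values.Encode(), nil
}

func main() {
	opts, rest, err := splitOptions("token=abc&x-transform-body=y&x-referer=https%3A%2F%2Ftv.example.com%2F")
	fmt.Printf("%+v %q %v\n", opts, rest, err)
}
```

With the options parsed up front, the proxy can skip installing the body transformer entirely when `TransformBody` is false.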

Huzzah! The proxy now works beautifully for every stream in my list. I can finally say goodbye to the days when I had to spin up a VPN on my device to watch TV, and just let the proxy do all the hard work.

Automating Chromecast playback

After implementing the reverse proxy, it felt natural to automate the Chromecast playback and control as well, rather than relying on an external app.

Originally, I thought this would be the easiest part of the project, seeing how well the “Web Video Caster” app mentioned above worked. In Part 2, I’ll describe how my assumption was wrong and explain how the Chromecast works behind the scenes, how my preliminary attempts to cast streams directly failed, as well as how I reverse engineered existing players to “steal” their app IDs. Stay tuned!