[FIX] Broken SSH & Proxy, SO AM I!
I finally tasted just how fragile the system gets when every single proxy drops dead and the firewall sometimes even drops 443 entirely. Somewhere out here, people are trapped in this suffocating local area network, drowning in fake comfort and plastic smiles. But we were coded for open networks, born for raw freedom. That’s the pulse we live by. And no, we sure as hell do not forget, nor do we forgive.
TL;DR
Cloudflare Zero Trust/WARP installed an nftables output chain with policy drop that only allowed 53/80/443, which starved SSH and most proxy nodes. After a reboot it re-armed itself. The fix: purge WARP, flush nft/iptables, and reset UFW to allow outgoing SSH.
Background
This post covers troubleshooting a broken SSH setup, where outbound traffic just suffocates and every port but 443 is choked off.
A few days ago I updated my Pop!_OS. Right after it finished, something I’d almost forgotten about, Cloudflare Zero Trust
suddenly decided to boot itself up. I barely ever used it, so I just ignored it, and went for a full reboot to apply the update.
Then came the disaster:
- Every single node on Clash started timing out. v2ray nodes dropped dead, except the one hiding on 443, but even that one turned unstable after reboot. Plus, if the firewall dropped that port as well, the node would die immediately too.
- When I tried to SSH into my VPS, commands would hang for what felt like forever, only to end in time out or Could not resolve hostname... Name or service not known.
Tbh, I kinda started to be afraid of running sudo apt upgrade again.
Related bugs that showed up along the way:
Tor meltdown after failing with bridges
Tor unexpectedly exited. This might be due to a bug in Tor itself, another program on your system, or faulty hardware. Until you restart Tor, Tor Browser will not be able to reach any websites. If the problem persists, please send a copy of your Tor Log to the support team. Restarting Tor will not close your browser tabs.
I have tried:
- restart
- update
- add fresh bridges
- use system Tor with bridges, then make Tor Browser skip its bundled Tor
None of them worked. Tor just kept choking until I fixed the outbound issue; after that, Tor worked fine again.
Node on Port 443 being unstable
Likely tied to the dynamic IPv4 in the security group being stale. The cure was simple: just update the SG with the new IP.
What still works:
- SSH to my EC2 via browser Instance Connect
- SSH works fine inside LAN (both in and out).
- DNS and routing are functional
Check if DNS and routing are functional:
# Replace HOST with your VPS DNS name (a hostname, not user@IP)
HOST=<your-vps-hostname>
getent hosts "$HOST"
dig +short "$HOST"
dig @1.1.1.1 +short "$HOST"   # bypass local resolver
If dig @1.1.1.1 returns an IP but the others don’t, your local DNS is borked (Clash/TUN or systemd-resolved).
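If you land in that case, a quick sketch for inspecting the local resolver (this assumes systemd-resolved is in use, as on stock Pop!_OS; adjust if Clash/TUN manages DNS differently):
resolvectl status                               # which DNS servers each interface actually uses
resolvectl query bandit.labs.overthewire.org    # resolve through systemd-resolved explicitly
cat /etc/resolv.conf                            # check whether a proxy/TUN tool rewrote this file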
Troubleshoot
1. Check MTU / Packet Size Limits
Run this to see what’s happening with large packets:
ping -M do -s 1472 bandit.labs.overthewire.org
ping -M do -s 1400 bandit.labs.overthewire.org
- If the 1472-byte ping fails but the 1400-byte one works, we’ve got an MTU mismatch.
- If both fail or both succeed, we move to the next probe.
In my case, both hung, meaning the packets never even left the NIC. The block happens before packet size matters.
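For the record, if you do land in the MTU-mismatch case instead, here is a minimal sketch of a temporary workaround (the interface name enp3s0 and the 1400-byte value are assumptions; use your own interface and the largest size that worked):
# clamp the interface MTU for this boot only; not persistent across reboots
sudo ip link set dev enp3s0 mtu 1400
# re-probe with DF set: 1372 = 1400 minus 28 bytes of IP+ICMP headers
ping -M do -s 1372 bandit.labs.overthewire.org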
2. Trace the traffic mid-hang
# find your active interface, eg `enp3s0` or `wlan0`
ip route | awk '/default/ {print $5; exit}'
BIG thanks to Bandit. If you do not have a VPS available yet, try with Bandit’s IP 56.228.72.241, and grab Bandit’s authentication password from their site later!
Open one terminal:
# assume the active interface is `enp3s0`
# sudo apt update && sudo apt install -y tcpdump
sudo tcpdump -ni enp3s0 host 56.228.72.241 and tcp
In another, start your SSH:
ssh -vvv bandit0@bandit.labs.overthewire.org -p 2220
Watch if packets are flowing both ways:
- Packets flowing in both directions but SSH still hangs: likely a keepalive or conntrack issue.
- Only your side sending: a mid-path drop.
- Nothing at all on the wire: the packets are being dropped locally before they ever leave the machine.
Example output:
> sudo tcpdump -ni enp3s0 host 56.228.72.241 and tcp
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on enp3s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
# hangs
> ssh -vvv bandit0@bandit.labs.overthewire.org -p 2220
OpenSSH_8.9p1 Ubuntu-3ubuntu0.13, OpenSSL 3.0.2 15 Mar 2022
debug1: Reading configuration data /home/phruit/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files
debug1: /etc/ssh/ssh_config line 21: Applying options for *
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts' -> '/home/phruit/.ssh/known_hosts'
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts2' -> '/home/phruit/.ssh/known_hosts2'
debug2: resolving "bandit.labs.overthewire.org" port 2220
debug3: resolve_host: lookup bandit.labs.overthewire.org:2220
debug3: ssh_connect_direct: entering
debug1: Connecting to bandit.labs.overthewire.org [56.228.72.241] port 2220.
debug3: set_sock_tos: set socket 3 IP_TOS 0x10
# hangs for very very long here
debug1: connect to address 56.228.72.241 port 2220: Connection timed out
ssh: connect to host bandit.labs.overthewire.org port 2220: Connection timed out
tcpdump stayed silent the whole time, confirming the kernel was not emitting SYNs onto the wire at all.
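A quick way to corroborate that (an optional, hedged check): while the ssh attempt hangs, the socket should sit in SYN-SENT even though tcpdump sees nothing on the wire, which points at a local filter rather than a problem on the path.
# run this in a third terminal while ssh is hanging
ss -tan state syn-sent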
3. Enable SSH keepalives
Add to ~/.ssh/config (create the file if it doesn’t exist):
Host *
ServerAliveInterval 15
ServerAliveCountMax 4
Note that this only helps once an SSH session is already established; it won’t fix a connection that never completes.
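You can also pass the same options ad hoc for a single session instead of editing the config (same settings, just on the command line):
ssh -o ServerAliveInterval=15 -o ServerAliveCountMax=4 bandit0@bandit.labs.overthewire.org -p 2220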
4. Check and clear connection tracking
Sometimes conntrack tables get full and stall packets:
sudo apt install conntrack -y
sudo conntrack -L | head
sudo conntrack -F
That flushes all tracked connections. Then test SSH again.
Example:
> sudo conntrack -L | head
udp 17 19 src=192.168.1.100 dst=1.1.1.1 sport=58651 dport=53 src=1.1.1.1 dst=192.168.1.100 sport=53 dport=58651 mark=0 use=1
udp 17 14 src=192.168.1.100 dst=1.1.1.1 sport=36733 dport=53 src=1.1.1.1 dst=192.168.1.100 sport=53 dport=36733 mark=0 use=1
tcp 6 89 TIME_WAIT src=127.0.0.1 dst=127.0.0.1 sport=49068 dport=5600 src=127.0.0.1 dst=127.0.0.1 sport=5600 dport=49068 [ASSURED] mark=0 use=1
#...
> sudo conntrack -F
conntrack v1.4.6 (conntrack-tools): connection tracking table has been emptied.
> ssh bandit0@bandit.labs.overthewire.org -p 2220
# hangs
Check firewall/policy
A reboot may have re-applied some rules.
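Before diving into the rule dump, a hedged way to spot which boot-time services could be re-arming a ruleset (the names below are the usual suspects on this setup, not an exhaustive list):
systemctl list-unit-files --state=enabled | grep -Ei 'warp|nftables|ufw|firewalld'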
1. Dump nftables ruleset
sudo nft list ruleset
See if anything reappeared. In particular, look for drop/reject rules under the output or forward chains.
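If the full dump is too long to read, a rough filter like this surfaces the interesting parts (just a grep sketch; it can over-match, which is fine here):
# a leftover "cloudflare-warp" table is the smoking gun
sudo nft list tables
# then filter the ruleset for hooks and drop/reject policies or rules
sudo nft list ruleset | grep -nE 'hook (output|forward)|policy (drop|reject)| drop| reject'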
I finally found the culprit here. My nftables ruleset had a table inet cloudflare-warp that installs a base output chain with policy drop and only allows a few destinations/ports (53/80/443 plus some Cloudflare IPs). Everything else, including SSH ports 22/2220, gets dropped.
From the nftables dump:
table inet cloudflare-warp {
...
chain output {
type filter hook output priority filter; policy drop;
...
udp dport 53 accept
tcp dport 80 accept
tcp dport 443 accept
}
}
That exactly matches the symptoms: web works, DNS works, and all outbound SSH times out.
2. Dump iptables
sudo iptables -S
sudo iptables -t nat -S
sudo iptables -t mangle -S
Safest fix: remove WARP completely
See what’s installed / running
dpkg -l | egrep -i 'cloudflare|warp'
systemctl status warp-svc              # if present
warp-cli status 2>/dev/null || true
Gracefully shut WARP down (if present)
# turn off any enforced "always-on"
warp-cli disable-always-on 2>/dev/null || true
warp-cli disconnect 2>/dev/null || true
warp-cli delete 2>/dev/null || true

# stop the service so it can't reapply nft rules
sudo systemctl stop warp-svc 2>/dev/null || true
sudo systemctl disable warp-svc 2>/dev/null || true
sudo systemctl mask warp-svc 2>/dev/null || true
Uninstall WARP (Zero Trust client)
sudo apt purge -y cloudflare-warp || true
sudo apt autoremove -y
Remove any lingering firewall config from it
# nuke any cloudflare nft tables if they're still around
sudo nft list tables | sed -n '1,200p'
# if you still see "cloudflare-warp", then:
sudo nft delete table inet cloudflare-warp 2>/dev/null || true

# hard reset packet filters
sudo nft flush ruleset
sudo iptables -F
sudo iptables -t nat -F
sudo iptables -t mangle -F
Put your firewall back in a clean, friendly state
sudo ufw reset
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp
sudo ufw enable
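A quick sanity check afterwards (just reading the state back; outbound SSH to a non-standard port like 2220 is already covered by the default allow outgoing rule):
sudo ufw status verbose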
Quick connectivity probes (no proxies)
env -i TERM=$TERM nc -vz portquiz.net 22
env -i TERM=$TERM nc -vz portquiz.net 2220
If those succeed, you should see a “succeeded!” line almost immediately, e.g.:
> env -i TERM=$TERM nc -vz portquiz.net 22
Connection to portquiz.net (35.180.139.74) 22 port [tcp/ssh] succeeded!
> env -i TERM=$TERM nc -vz portquiz.net 2220
Connection to portquiz.net (35.180.139.74) 2220 port [tcp/*] succeeded!
Try SSH again
env -i TERM=$TERM ssh -vvv bandit0@bandit.labs.overthewire.org -p 2220
env -i TERM=$TERM ssh -vvv <your_vps>@<vps_ip>
Bonus: Update SG with new IP
People behind CGNAT are quite unlucky, thanks to their ISP. With a dynamic IPv4, we’d better check the SG (Security Group) from time to time. Sometimes we cannot get the real public IP from ipinfo.io; in that case, just connect to the instance (e.g. via browser Instance Connect) and run:
# a packet from your IP must reach the instance on that port
sudo ss -tnp | grep <PORT>
Then you’ll get a result like this:
<USER>@<IP>:~$ sudo ss -tnp | grep <PORT>
LAST-ACK 0 25 [::ffff:172.31.32.78]:<PORT> [::ffff:56.123.45.66]:8637
LAST-ACK 0 25 [::ffff:172.31.32.78]:<PORT> [::ffff:56.123.45.66]:8634
The real IP here is 56.123.45.66. Update the SG rule with a wider netmask, like 56.123.44.0/23, which covers both 56.123.44.x and 56.123.45.x.
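If you’d rather do this from the CLI than the AWS console, a minimal sketch with the AWS CLI (the security group ID, port 2220, and the old /24 CIDR below are placeholders for illustration):
# drop the stale rule, then allow the wider range on your SSH port
aws ec2 revoke-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 2220 --cidr 56.123.45.0/24
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 2220 --cidr 56.123.44.0/23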
💓 owed to Bandit.