Notes

Changing SSH port and trapping bots with endlessh

I used to rely on DuckDNS to make my workstations accessible over the internet. However, I found that my auth logs were getting spammed with failed login attempts. A sample of the auth logs:

    $ journalctl -f
    Feb 27 21:31:54 prometheus sshd[28079]: pam_unix(sshd:auth): check pass; user unknown
    Feb 27 21:31:54 prometheus sshd[28079]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=185.196.220.81
    Feb 27 21:31:55 prometheus sshd[28079]: Failed password for invalid user user from 185.196.220.81 port 51248 ssh2
    Feb 27 21:31:55 prometheus sshd[28077]: Failed password for root from 218.92.0.243 port 54595 ssh2
    Feb 27 21:31:56 prometheus sshd[28079]: Received disconnect from 185.196.220.81 port 51248:11: end [preauth]
    Feb 27 21:31:56 prometheus sshd[28079]: Disconnected from invalid user user 185.196.220.81 port 51248 [preauth]
    Feb 27 21:31:58 prometheus sshd[28083]: Invalid user user from 185.196.220.81 port 37162
    Feb 27 21:31:58 prometheus sshd[28083]: pam_unix(sshd:auth): check pass; user unknown
    Feb 27 21:31:58 prometheus sshd[28083]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=185.196.220.81
    Feb 27 21:31:59 prometheus sshd[28077]: Failed password for root from 218.92.0.243 port 54595 ssh2
    Feb 27 21:32:00 prometheus sshd[28083]: Failed password for invalid user user from 185.196.220.81 port 37162 ssh2
    Feb 27 21:32:02 prometheus sshd[28077]: Failed password for root from 218.92.0.243 port 54595 ssh2
    Feb 27 21:32:03 prometheus sshd[28083]: Received disconnect from 185.196.220.81 port 37162:11: end [preauth]
    Feb 27 21:32:03 prometheus sshd[28083]: Disconnected from invalid user user 185.196.220.81 port 37162 [preauth]
    Feb 27 21:32:03 prometheus sshd[28077]: Received disconnect from 218.92.0.243 port 54595:11: [preauth]
    Feb 27 21:32:03 prometheus sshd[28077]: Disconnected from authenticating user root 218.92.0.243 port 54595 [preauth]
    Feb 27 21:32:03 prometheus sshd[28077]: PAM 2 more authentication failures; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.92.0.243 user=root
    Feb 27 21:32:04 prometheus sshd[28105]: Invalid user Admin from 185.196.220.81 port 37176
    Feb 27 21:32:04 prometheus sshd[28105]: pam_unix(sshd:auth): check pass; user unknown
    Feb 27 21:32:04 prometheus sshd[28105]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=185.196.220.81
    Feb 27 21:32:06 prometheus sshd[28105]: Failed password for invalid user Admin from 185.196.220.81 port 37176 ssh2
    Feb 27 21:32:06 prometheus sshd[28105]: Received disconnect from 185.196.220.81 port 37176:11: end [preauth]
    Feb 27 21:32:06 prometheus sshd[28105]: Disconnected from invalid user Admin 185.196.220.81 port 37176 [preauth]
    Feb 27 21:32:07 prometheus sshd[28110]: Invalid user admin from 185.196.220.81 port 52236
    Feb 27 21:32:07 prometheus sshd[28110]: pam_unix(sshd:auth): check pass; user unknown
    Feb 27 21:32:07 prometheus sshd[28110]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=185.196.220.81
    Feb 27 21:32:09 prometheus sshd[28110]: Failed password for invalid user admin from 185.196.220.81 port 52236 ssh2
    Feb 27 21:32:10 prometheus sshd[28110]: Received disconnect from 185.196.220.81 port 52236:11: end [preauth]
    Feb 27 21:32:10 prometheus sshd[28110]: Disconnected from invalid user admin 185.196.220.81 port 52236 [preauth]
    Feb 27 21:32:11 prometheus sshd[28122]: Invalid user admin from 185.196.220.81 port 52246
    Feb 27 21:32:11 prometheus sshd[28122]: pam_unix(sshd:auth): check pass; user unknown
    Feb 27 21:32:11 prometheus sshd[28122]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=185.196.220.81
    Feb 27 21:32:13 prometheus sshd[28122]: Failed password for invalid user admin from 185.196.220.81 port 52246 ssh2
    Feb 27 21:32:14 prometheus sshd[28122]: Received disconnect from 185.196.220.81 port 52246:11: end [preauth]
    Feb 27 21:32:14 prometheus sshd[28122]: Disconnected from invalid user admin 185.196.220.81 port 52246 [preauth]
    Feb 27 21:32:15 prometheus sshd[28146]: Invalid user user from 185.196.220.81 port 52252
    Feb 27 21:32:15 prometheus sshd[28146]: pam_unix(sshd:auth): check pass; user unknown
    Feb 27 21:32:15 prometheus sshd[28146]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=185.196.220.81
    Feb 27 21:32:17 prometheus sshd[28146]: Failed password for invalid user user from 185.196.220.81 port 52252 ssh2
    Feb 27 21:32:19 prometheus sshd[28146]: Received disconnect from 185.196.220.81 port 52252:11: end [preauth]
    Feb 27 21:32:19 prometheus sshd[28146]: Disconnected from invalid user user 185.196.220.81 port 52252 [preauth]
    Feb 27 21:32:35 prometheus sshd[28310]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.92.0.230 user=root
    Feb 27 21:32:36 prometheus sshd[28310]: Failed password for root from 218.92.0.230 port 30964 ssh2
    Feb 27 21:32:38 prometheus sshd[28310]: Failed password for root from 218.92.0.230 port 30964 ssh2
    Feb 27 21:32:43 prometheus sshd[28310]: Failed password for root from 218.92.0.230 port 30964 ssh2
    Feb 27 21:32:43 prometheus sshd[28310]: Received disconnect from 218.92.0.230 port 30964:11: [preauth]
    Feb 27 21:32:43 prometheus sshd[28310]: Disconnected from authenticating user root 218.92.0.230 port 30964 [preauth]
    Feb 27 21:32:43 prometheus sshd[28310]: PAM 2 more authentication failures; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.92.0.230 user=root

Unsurprisingly, this is a common issue, so much so that there are databases of such IP addresses. For example, the last IP address from the logs above has been reported 100k+ times. When I brought this up to my colleague Ben, he suggested using fail2ban. Looking into more options, I found endlessh to be a good solution. So here is what I did: ...
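As an aside on the auth logs above: to see how few hosts are behind the spam, one could tally the "Failed password" lines per source address. The following is only a rough sketch. The function name `count_failed_logins` and the regular expression are my own assumptions, written against the log format in the sample rather than taken from any sshd documentation.

```python
import re
from collections import Counter

# Matches sshd "Failed password" lines in the format seen in the sample logs.
# The pattern is an assumption based on that sample, not an official format spec.
FAILED_RE = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) port \d+"
)


def count_failed_logins(lines):
    """Return a Counter mapping source IP -> number of failed password attempts."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


# A few lines copied from the sample logs.
sample = [
    "Feb 27 21:31:55 prometheus sshd[28079]: Failed password for invalid user user from 185.196.220.81 port 51248 ssh2",
    "Feb 27 21:31:55 prometheus sshd[28077]: Failed password for root from 218.92.0.243 port 54595 ssh2",
    "Feb 27 21:31:56 prometheus sshd[28079]: Received disconnect from 185.196.220.81 port 51248:11: end [preauth]",
    "Feb 27 21:31:59 prometheus sshd[28077]: Failed password for root from 218.92.0.243 port 54595 ssh2",
]

for ip, attempts in count_failed_logins(sample).most_common():
    print(ip, attempts)
```

The same function accepts any iterable of lines, so it could be fed an open log file or the captured output of journalctl.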

Adding a new HDD to a Linux system

As deep learning {datasets, models, scale of experiments} grow, so do the storage requirements, and we increasingly find ourselves running out of space on our SSDs. I recently added a new 8TB HDD to my workstation to act as a new scratch volume. While adding a disk is standard “sysadmin” work, I found the process of handling permissions for a shared research group on a domain-connected Linux machine to be a bit more involved than I expected. So, for the sake of my own documentation and on the off chance that someone else might find it useful, I’ll document the process here. Since I am using Ubuntu 22.04 on this workstation, I used the Ubuntu instructions as a starting point and modified them as needed. ...

Using HuggingFace Accelerate for mixed-precision training

Note: This post was originally written in 2021, but I have since updated it to reflect the latest changes in HuggingFace Accelerate (last update November 2025 using accelerate==1.11.0). For a grad course that recently concluded, the course project required me to train and evaluate a large number of models. Our school’s local SLURM cluster has new GPUs that support fp16, which meant I could take advantage of PyTorch’s Automatic Mixed Precision (AMP) training. And honestly, there is no reason not to use it: we get reduced memory usage and faster training, all with virtually no loss in performance. ...
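To give a feel for where the memory savings come from, here is a small standard-library sketch of the fp16 number format itself. It is only an illustration of half-precision storage and rounding, not how Accelerate or PyTorch AMP is implemented. The helper `to_fp16` and the example values are made up for this purpose.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Storage cost per value, in bytes, for single and half precision.
FP32_BYTES = struct.calcsize("<f")
FP16_BYTES = struct.calcsize("<e")

# Made-up values for illustration, all within fp16's normal range.
weights = [0.1, -1.7, 3.14159, 42.0]

# Relative error introduced by storing each value in half precision.
rel_errors = [abs(to_fp16(w) - w) / abs(w) for w in weights]

for w, err in zip(weights, rel_errors):
    print(f"{w:>9}: fp16={to_fp16(w):.6f}  relative error={err:.2e}")
print(f"bytes per value: fp32={FP32_BYTES}, fp16={FP16_BYTES}")
```

Each value takes half the space, and the rounding error stays within the bound implied by fp16's 11-bit significand, which is the intuition behind getting the savings at little cost in accuracy.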

The "No-Space" Backup Solution (Streaming tar over SSH)

We recently got an email from our IT department that our workstation OSes will be getting upgraded from Ubuntu 18.04 MATE to Ubuntu 20.04 GNOME. As much as I love MATE and how lightweight it is (LinuxScoop makes wonderful OS overview videos), I also like the “visuals” of GNOME. My personal laptop already runs Ubuntu 20.04 GNOME, so I am excited to have it on my lab workstation as well. However, this OS upgrade also means that we have to back up our workstations since the drives will be wiped. Our research group has a generous storage space allocation on Compute Canada’s Cedar, so storage is not a big issue. The problem is: Cedar’s long-term storage space is a “tape-based backup system”, so there is a strict limit on the number of files we can store there. Therefore, the best strategy is to create tar archives of our data and store those on Cedar. ...
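The idea of producing a tar archive as a stream, so that it never has to exist as a local file, can be sketched in Python as follows. This is a minimal illustration under my own assumptions and not necessarily the commands the post goes on to use. The function `stream_tar` and the throwaway `data` directory are hypothetical, and the demo writes to an in-memory buffer instead of a real SSH connection.

```python
import io
import os
import tarfile
import tempfile


def stream_tar(src_dir, sink):
    """Write a gzip-compressed tar of src_dir to the writable binary stream sink.

    Mode "w|gz" produces a forward-only stream, so the sink never needs to
    seek and can be a pipe or socket rather than a regular file.
    """
    arcname = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(fileobj=sink, mode="w|gz") as tar:
        tar.add(src_dir, arcname=arcname)


# Self-contained demo: archive a throwaway directory into an in-memory buffer.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = os.path.join(tmp, "data")
    os.makedirs(data_dir)
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(data_dir, name), "w") as fh:
            fh.write(name)
    buffer = io.BytesIO()
    stream_tar(data_dir, buffer)

# Read the stream back to confirm what was archived.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r|gz") as tar:
    archived = sorted(member.name for member in tar)
print(archived)
```

For a real backup, the sink could be the stdin pipe of an ssh subprocess that writes the incoming bytes to a file on the remote side, so the many small files become a single archive without needing local scratch space.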