A university near me must be going through a hardware refresh, because they’ve recently been auctioning off a bunch of ~5 year old desktops at extremely low prices. The only problem is that you can’t buy just one or two. All the auction lots are batches of 10-30 units.

It got me wondering if I could buy a bunch of machines and set them up as a distributed computing cluster, sort of a poor man’s version of the way modern supercomputers are built. A little research revealed that this is far from a new idea. The first really successful distributed computing cluster (called Beowulf) was built by a team at NASA in 1994 using off-the-shelf PCs instead of the expensive custom hardware used by other supercomputing projects at the time. It was also a watershed moment for Linux, then only a few years old, which was used to run Beowulf.

Unfortunately, a cluster like this seems less practical for a homelab than I had hoped. I initially imagined that there would be some kind of abstraction layer allowing any application to run across all computers on the cluster in the same way that it might scale to consume as many threads and cores as are available on a CPU. After some more research I’ve concluded that this is not the case. The only programs that can really take advantage of distributed computing seem to be ones specifically designed for it. Most of these fall broadly into two categories: expensive enterprise software licensed to large companies, and bespoke programs written by academics for their own research.

So I’m curious what everyone else thinks about this. Have any of you built or administered a Beowulf cluster? Are there any useful applications that would make it worth building for the average user?

  • The Bard in Green
    15
    10 months ago

    Beowulf cluster. Beowulf. Now that’s a name I’ve not heard in a long time. A long time.

  • Snot Flickerman
    English
    9
    10 months ago

    The main issue I would see impeding this is power draw and heat. Unless you are rigged up to run this many machines (including appropriate UPSes), you may end up blowing fuses and needing almost as much power for air conditioning as you need to run the cluster.

    Maybe if you had something like 50 SoCs, such as Raspberry Pi or Orange Pi boards, because the power draw and heat might be manageable.

    • @plenipotentprotogod@lemmy.world (OP)
      4
      10 months ago

      I was looking at HP mini PCs. The ones that were for sale used 7th gen i5s with a 35W TDP. They’re sold with a 65W power brick so presumably the whole system would never draw more than that. I could run a 16 node cluster flat out on a little over a kW, which is within the rating of a single residential circuit breaker. I certainly wouldn’t want to keep it running all the time, but it’s not like I’d have to get my electric system upgraded if I wanted to set one up and run it for a couple of hours as an experiment.
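
      As a rough sanity check on those numbers, here is a small sketch of the power budget. The 16 nodes and 65 W bricks come from the figures above; the 15 A / 120 V circuit and the 80% continuous-load factor are assumptions (a common North American setup and rule of thumb), so swap in your own values.

```python
# Worst-case power budget for a small cluster on one household circuit.
NODES = 16                 # from the comment above
BRICK_WATTS = 65           # from the comment above: rating of each power brick
BREAKER_AMPS = 15          # assumption: typical residential breaker
MAINS_VOLTS = 120          # assumption: North American mains voltage
CONTINUOUS_FACTOR = 0.8    # assumption: rule of thumb for sustained loads


def cluster_watts(nodes: int, watts_per_node: float) -> float:
    """Upper bound on draw if every node pulls its brick's full rating."""
    return nodes * watts_per_node


def circuit_watts(amps: float, volts: float, factor: float = 1.0) -> float:
    """Power available on a circuit, optionally derated for continuous use."""
    return amps * volts * factor


total = cluster_watts(NODES, BRICK_WATTS)
peak_limit = circuit_watts(BREAKER_AMPS, MAINS_VOLTS)
continuous_limit = circuit_watts(BREAKER_AMPS, MAINS_VOLTS, CONTINUOUS_FACTOR)

print(f"cluster worst case: {total} W")
print(f"circuit peak limit: {peak_limit} W")
print(f"circuit continuous limit: {continuous_limit:.0f} W")
print("fits continuously:", total <= continuous_limit)
```

      Under those assumptions the cluster stays below both limits, but only if nothing else significant shares the circuit.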

  • @rufus@discuss.tchncs.de
    4
    edit-2
    10 months ago

    Kubernetes / K8s / K3s.

    If you’re trying to do compute / simulations on it, it depends on your workload… Open MPI… ClusterKnoppix / LinuxPMI …

  • @carl_dungeon@lemmy.world
    English
    7
    10 months ago

    Docker or Kubernetes work well on a cluster. Before containers this was a lot more work to set up, but these days you just need to image them all, put them on the network, and then use some kind of container orchestration to send them containers/pods.

  • mesamune
    5
    10 months ago

    I did this a while ago just to see what it was all about, then got rid of the setup a week later. It’s a cool project but I needed the other boxes. If you need a huge amount of parallel operations (and want to self host) it’s a decent option.

  • @notabot@lemm.ee
    7
    10 months ago

    It really depends on what sort of workload you want to run. Most programs have no concept of horizontal scaling like that, and those that do usually deal with it by just running an instance on each machine.

    That said, if you want to run lots of different workloads at the same time, you might want to have a look at something like Kubernetes. I’m not sure what you’d want to run in a homelab that would use even 10 machines, but it could be fun to find out.

    • @plenipotentprotogod@lemmy.world (OP)
      3
      10 months ago

      > I’m not sure what you’d want to run in a homelab that would use even 10 machines, but it could be fun to find out.

      Oh yeah, this is absolutely a solution in search of a problem. It all started with the discovery that these old (but not ancient, most of them are Intel 7th gen) computers were being auctioned off for like $20 apiece. From there I started trying to work backwards towards something I could do with them.

      • Lettuce eat lettuce
        4
        edit-2
        10 months ago

        There are several more practical uses for old PCs like that imo.

        You could grab a few of them, throw some 2nd hand GPUs in, clean them out and install Bazzite/ChimeraOS/Holo. Turn them into affordable Steam Consoles. Sell or give them away to friends, family, online etc.

        You could also refurbish them and donate them to a school or community center that is underfunded, which would be pretty cool.

        Use them as home media PCs, or build a homelab and use them as servers for different tasks. Use one as a NAS, another as a hypervisor for VMs, another as a pfSense/OPNsense router/firewall, etc.

        Or just goof around and build a janky but badass cluster lol. When they are that cheap, almost anything you use them for is better value than they are as e-waste.

      • @notabot@lemm.ee
        3
        10 months ago

        They sound usable enough. If you’re interested in it, have you considered running an LLM or similar? I think they cluster. If they’ve got GPUs you could try Stable Diffusion too.

        Mind you, at that price point I think we’re past the point of just thinking of them as compute resources. Use them as blocks, build a fort and refuse to come out unless someone comes up with a better idea.

        • @plenipotentprotogod@lemmy.world (OP)
          2
          10 months ago

          I’ll have to look a little more into the AI stuff. It was actually my first thought, but I wasn’t sure how far I’d get without GPUs. I think they’re pretty much required for Stable Diffusion. I’m pretty sure even LLMs are trained on GPUs, but maybe response generation can be done without one.

  • @knfrmity@lemmygrad.ml
    1
    10 months ago

    I tried migrating my personal services to Docker Swarm a while back. I have a Raspberry Pi as a 24/7 machine but some services could use a bit more power so I thought I’d try Swarm. The idea being that additional machines which are on sometimes could pick up some of the load.

    Two weeks later I gave up and rolled everything back to running specific services or instances on specific machines. Making sure the right data is available on all machines all the time, plus the networking between dependencies and, in some cases, specifying which service should prefer which machine, was far too complex and messy.

    That said, if you want to learn Docker Swarm or Kubernetes and distributed filesystems, I can’t think of a better way.

  • 𝕽𝖚𝖆𝖎𝖉𝖍𝖗𝖎𝖌𝖍
    6
    edit-2
    10 months ago

    There are several options, although some may be defunct.

    Last time I looked into this, openMosix was the most interesting, affordable, general-purpose option. It turned several computers into one big virtual computer. I ran a very small, 3-node cluster for a time. The upside was that you could run almost anything on it: unlike most HPC solutions, it didn’t require bespoke languages, libraries, or targeted solutions. The downside was performance; it turns out that to really take advantage of HPC, you need to program for it. openMosix looks defunct now.

    OpenPMIx looks to have taken up the torch from OpenMosix. It looks active; I have no specific knowledge about it.

    tldp.org has some good required reading before you invest in this, in particular discussing the elephant in the room, networking latency. The short version is that, no matter how slow your computers, the bottleneck will still be the network. Unless you’re willing to invest a lot into fiber and expensive, fast switches, it’s probably not worth it.
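
    To make the network point concrete, here is a toy model (not a measurement, and not taken from the tldp.org material). It assumes the work divides evenly across nodes and that every additional node adds a fixed communication cost; the function name and all numbers are made up for illustration.

```python
def speedup(serial_seconds: float, nodes: int, comm_seconds_per_node: float) -> float:
    """Toy speedup model: compute time shrinks with node count,
    while communication overhead grows with it."""
    compute = serial_seconds / nodes
    communication = comm_seconds_per_node * (nodes - 1)
    return serial_seconds / (compute + communication)


# Hypothetical job: 100 s on one machine, 1 s of network overhead per extra node.
for n in (1, 2, 4, 10, 16, 32):
    print(f"{n:>2} nodes -> speedup {speedup(100, n, 1.0):.2f}x")
```

    With these made-up numbers the speedup peaks around 10 nodes and then falls as more machines are added, because the communication term grows while the compute term shrinks. Faster networking lowers the per-node cost and pushes that peak further out.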

    Slurm crosses the line into modern cluster job management, like you might find in a cloud provider like AWS, which is the direction the non-supercomputer industry took when commodity MPI turned out to be not feasible. Warewulf is another version, sort of one foot in distributed container management and lightweight MPI. Both are pretty involved, more Beowulf than openMosix.

    tldr, it’s probably not worth it if you’re looking for a cheap Beowulf cluster, because such a thing doesn’t exist in any practical sense. Cost, and physics, get in the way. If you want to set up a data center, or some job farm like AWS or GCP, that’s another matter. But it’s a far cry from MPI.

  • @lemmyingly@lemm.ee
    3
    10 months ago

    A friend and I built one out of 6 machines years ago when we were at university. We were running MATLAB simulations that would take over a day to complete on i3/i5 CPUs. Fortunately MATLAB and the simulation add-on package had been programmed to parallelize jobs, which cut the simulation time down to just a few hours. This was done in a Windows environment with dual-core HP machines with every RAM slot filled.

    I can’t imagine homelab workloads benefitting from such a setup unless something like video/3D rendering can utilise it.

  • @makeasnek@lemmy.ml
    English
    5
    edit-2
    10 months ago

    Look into BOINC. It’s free, open-source software for distributed computing (“map-reduce”-type problems). It runs on all platforms and handles computation at the petaflop scale. The Large Hadron Collider (CERN) uses it to distribute computational work to volunteers. It’s also a way you can contribute your computer’s spare capacity to cancer research. !boinc@sopuli.xyz
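
    For a feel of what a “map-reduce”-type problem looks like, here is a minimal sketch in plain Python. It is not BOINC’s API; the function names and the prime-counting job are made up for illustration. The point is that each work unit is independent, so in a real deployment each one could be handed to a different machine.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; slow on purpose, it stands in for real work."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def make_work_units(start: int, stop: int, size: int) -> list[tuple[int, int]]:
    """Split [start, stop) into independent half-open chunks."""
    return [(lo, min(lo + size, stop)) for lo in range(start, stop, size)]


def map_unit(unit: tuple[int, int]) -> int:
    """The 'map' step: count primes in one chunk. Needs no data from other chunks."""
    lo, hi = unit
    return sum(1 for n in range(lo, hi) if is_prime(n))


def reduce_results(partials: list[int]) -> int:
    """The 'reduce' step: combine the per-chunk counts into one answer."""
    return sum(partials)


units = make_work_units(0, 100, 25)
partials = [map_unit(u) for u in units]  # here run in sequence; each could run on a separate node
print(units)
print(reduce_results(partials))
```

    Because the chunks never talk to each other, this kind of job tolerates slow networks and machines that come and go, which is why it suits volunteer computing.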

  • Ramin Honary
    English
    4
    10 months ago

    Someone with more expertise can correct me if I am wrong, but the last I heard, cluster computing had been made obsolete by modern IaaS and cloud computing technology.

    For example, the Xen project provides Unikernels as part of their Xen Cloud product. The unikernel is (as I understand it) basically a tiny guest operating system that statically links to a programming language runtime or virtual machine. So the Xen guest boots up a single executable program composed of the programming language runtime environment (like the Java virtual machine) statically linked to the unikernel, and then runs whatever high-level programming language that the virtual machine supports, like Java, C#, Python, Erlang, what have you.

    The reason for this is if you skip running Linux altogether, even a tiny Linux build like Alpine, and just boot directly into the virtual machine process, this tends to be a lot more memory efficient, and so you can fit more processes into the memory of a single physical compute node. Microsoft Azure does something similar (I think).

    To use it, basically you write a program or service in a programming language that runs on a VM and build it to run on a Xen unikernel. When you run the server, Xen allocates the computing resources for it and launches the executable program directly on the VM without an operating system, so the VM is, in effect, the operating system.