13 comments

  • blopker 1 hour ago
    Nice! I really like how many variations on this idea are coming out. MacWhisper used to be great, but is kind of a buggy mess now.

    I'm making my own, for personal use. I did a survey of many and they all (that I could find) skip the fundamentals.

    The major issues that I've run into:

    - Crash recovery. Most of these apps are incredibly buggy and crash all the time, taking the recorded audio with them. MacWhisper is incredibly bad at this.

    - Disk space. Many of these apps save wav files to disk. After a few hours of meetings, you may end up with gigabytes eaten.

    - Microphone bleed. People don't always use headphones, so the system mic picks up the speaker output, causing (approximately) duplicate transcriptions.
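
    A minimal sketch of the crash-recovery point, not taken from any of the apps mentioned: audio is appended to disk in length-prefixed chunks and synced after each write, so a crash loses at most the chunk in flight. The names `ChunkedRecorder` and `recover_chunks` and the record layout are assumptions made for illustration. Storing compressed chunks instead of raw PCM would help with the disk-space point, but that is left out here.

```python
import os
import struct


class ChunkedRecorder:
    """Append audio chunks to a file as they arrive (hypothetical helper)."""

    def __init__(self, path):
        self.path = path
        self._fh = open(path, "ab")

    def write_chunk(self, pcm: bytes) -> None:
        # Record layout (an assumption): 4-byte little-endian length, then payload.
        self._fh.write(struct.pack("<I", len(pcm)))
        self._fh.write(pcm)
        # Flush Python's buffer and ask the OS to persist the data to disk.
        self._fh.flush()
        os.fsync(self._fh.fileno())

    def close(self) -> None:
        self._fh.close()


def recover_chunks(path):
    """Return every complete chunk, ignoring a truncated final record."""
    chunks = []
    with open(path, "rb") as fh:
        while True:
            header = fh.read(4)
            if len(header) < 4:
                break  # clean end of file, or a header cut off by a crash
            (size,) = struct.unpack("<I", header)
            payload = fh.read(size)
            if len(payload) < size:
                break  # payload cut off by a crash; drop the partial chunk
            chunks.append(payload)
    return chunks
```

    After a crash, calling `recover_chunks` on the file at startup returns whatever was fully written, which can then be transcribed or re-encoded.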

    I've yet to find a solution that handles all of these correctly, let alone one with high-quality transcriptions.

    Anyway, most of these apps are built around https://github.com/FluidInference/FluidAudio, if anyone is curious. Their readme has a big list of similar apps as well.

    • highmastdon 25 minutes ago
      I’m using MacParakeet these days. If your language is supported, definitely give it a try. It’s much faster and has a lower footprint.
    • jv22222 1 hour ago
      Nice tip on FluidAudio. That's the kind of thing I've been looking for. Thanks!
  • denbyc 1 hour ago
    I'd love to have a purchase option not tied to the App Store if possible. I don't use an Apple account with my Mac, but I would love to try Trace.
  • JohnBizBiz 17 hours ago
    The key moment flagging is what makes this distinct. Most transcription tools assume you'll review after the call as a cleanup pass, but what you've built is more of an annotation layer you're constructing in real time. Different mental model.

    Curious how the live recap handles latency. If it's updating every few seconds you can actually glance at it during a call, which starts to feel like in-meeting assistance rather than post-meeting review.

    I've been working on something on that end of the spectrum at livesuggest.ai, real-time suggestions during the call rather than transcript after. Same no-bot, no-cloud constraint, different moment in the workflow.

    • ZoneZealot 6 minutes ago
      HN is not the place for LLM generated advertisements
  • mushufasa 2 hours ago
    This looks like a good approach, though I would expect this to be a native macOS feature within 12 months -- it seems to fit squarely into Apple's product roadmap.
  • nkmnz 1 hour ago
    Which Speech-to-Text is used? Is it possible to configure it? This might be crucial for supporting languages other than English - the model that comes built-in with macOS fails completely for German.
  • frabia 2 hours ago
    Super interesting! How accurate is the local model at transcribing audio compared to cloud services? E.g. Google Meet, Otter, Granola, etc.
    • watchlight 1 hour ago
      A lot of the available models are Whisper or Faster-Whisper derived and shared across multiple apps. The tier names are often funny... "tiny", "base", "small", "medium", "large", "large-v2", "large-v3", "large-v3-turbo", English-only variants, etc.

      In my experience, medium is often the sweet spot for English accuracy vs speed, especially if following up with a post-processing pass. The large options are all fine, but can severely slow things down. There are some speed checks on my website if you're curious (link not posted because I don't want to hijack another post's app).

  • watchlight 5 hours ago
    Agreed with JohnBiz, the moment flagging is interesting and unusual, and a nice contrast to passive transcription. I only recently learned about MacWhisper (I'm Windows primarily) and was floored to learn how expensive the Pro option is. Nowadays it's not so hard to have some level of DIY transcription, so it's crazy that it's priced at a premium.

    What's your diarization pipeline? Pyannote?

    I'd taken a different approach that used an LLM clean-up pass to summarize and progressively compress the transcript for ultra-long content, but I like the idea of targeted "pay attention here" flags.
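
    A rough sketch of what progressive compression could look like, not the commenter's actual pipeline. The `summarize` callable is a hypothetical stand-in for the LLM call, and `budget_chars` is an assumed size limit. The running text is re-summarized whenever adding the next segment would push it over budget.

```python
from typing import Callable, List


def progressive_compress(
    segments: List[str],
    summarize: Callable[[str], str],  # hypothetical stand-in for an LLM call
    budget_chars: int = 2000,         # assumed budget; tune for the model's context
) -> str:
    """Fold transcript segments into a running text, compressing when it grows too long."""
    running = ""
    for segment in segments:
        candidate = (running + "\n" + segment).strip()
        if len(candidate) > budget_chars:
            # Over budget: compress what has accumulated, then keep the new segment verbatim.
            running = (summarize(running) + "\n" + segment).strip()
        else:
            running = candidate
    return running
```

    Older content gets summarized repeatedly and so becomes progressively more compressed, while the most recent segment stays verbatim.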

  • nazca 2 hours ago
    I've been looking for this exact thing!
  • overflowy 2 hours ago
    Does it support multiple languages?
  • ipotapov 16 hours ago
    [dead]
  • satvikpendem 1 hour ago
    I don't see how this is different from the literally dozens of other offline transcription apps, many of them open source, unlike this one.
    • hmokiguess 1 hour ago
      can you share them? I'm looking for a decent open source one
      • infl8ed 41 minutes ago
        I don't mind https://matthartman.github.io/ghost-pepper/. However, I do really want speaker recognition, which it does have, but I haven't been able to get it working.
      • jv22222 1 hour ago
        • vermilingua 1 hour ago
          I don’t see any there that are as focused as this one, except perhaps Talat, which is considerably more expensive.
          • jv22222 1 hour ago
            Ah. My bad. I didn't review them; I was just paying more attention to the op asking for a list of open source ones.
        • hmokiguess 57 minutes ago
          I went through the list but most feel subpar to me, and some aren't even open source (just claim they use FluidAudio I guess?)
    • jv22222 1 hour ago
      Classic HN. Thanks for keeping it real.