> The goal of the current maintenance is to fix a lot of long-standing issues with the site. The underlying infrastructure was getting very fragile as technical debt accumulated over time. A team is working very hard right now to make sure that once the site is back up, it's on much better footing and will be solid and reliable for the long term. Despite the unfortunate amount of time this is taking, it will be a major benefit to the site in the long run.
If I were a developer there, I would be feeling pretty awful. Even minutes of downtime on systems I've worked on gets my heart rate going.
It also feels like there's a lot being left unsaid in this statement. Normally you would work on these things in parallel with production… so something is seriously wrong.
The scenarios where I have taken extended downtime: when an OLTP database needed a serious overhaul and it was cheaper to plan operational downtime than to risk losing data or inconsistent transactions; generational platform migrations up to complete system rewrites (something I am generally against, but that is its own soapbox); migrating from on-prem to cloud infra, which required design changes; and migrating from one database technology to another (MySQL -> PostgreSQL). In all cases data integrity/consistency is the critical aspect.
In all those cases there is serious planning done before the migration: checklists, trial runs/validations, and day-of validation procedures. If something isn't working, the leadership group evaluates the issue and decides rollback vs. go-forward. Rollback also needs to be planned for and factored into your planned downtime window.
I agree with you; this wording implies they are making changes on top of the one they planned. This could've been bad planning, a bad call made day-of, etc.
In one scenario, we _had_ to go forward while resolving several blockers on the fly. We had planned developer rotation shifts ahead of time, pulling people off the line after 8-12 hrs. At some point, you aren't thinking clearly under stress. I don't know how big the team over there is, but I hope they are pacing themselves during what I am sure is a horrible moment of crisis for them.
My advice to them: consider a rollback if needed/possible. Split responsibility between whoever is managing the process and the people dealing with specific problems. Focus on the MVP. Don't try to _fix_ and replace at the same time; if something was broken business-wise before, log it in your bug tracker and deal with it later. Pull people away if needed to get rest. Keep upper management away from the people doing the work; have them talk only to the group handling process management.
Edit: I am also making a good-faith assumption that this is planned maintenance and not an emergency response. Either way, it doesn't change my general advice.
The maker people I know have been migrating away from Tindie because it has felt like a sinking ship for a long time.
I really like the idea of Tindie, so I hope they can succeed. I don't understand what sequence of events led to a problem so large that they can't even keep their site online. The post says something vague about the engineering team hoping the migration work is close to finished, but it's been years since I've seen an engineering team knock an entire site out for days without being able to restore it during a failed migration. Are they outsourcing dev work to the type of agency that bills by the hour and perpetually churns out low-cost hourly work, making their money in volume fixing their own code?
Shopify, Etsy, CrowdSupply, a custom website. All have their problems; I'm not endorsing any of them. I sell on Tindie. Well, I don't sell much there, but I list on Tindie. Most of my sales come through my own store site.
That just brings you back to the original problem that Tindie solved: discoverability.
It's like saying people are fleeing ebay for Shopify. Yeah, I guess -- but that only really solves the merchant sales problem.
I buy from indie electronics shops directly when I can, but the problem is that I commonly discover those shops through Tindie. Word of mouth/Discord/etc. isn't nearly as great a tool as a searchable, regularly refreshed index.
It can be as simple as a terraform apply wiping out huge swaths of the backend infra. Getting that back can take on the order of days/weeks, depending on how disciplined you are.
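For what it's worth, Terraform has a lifecycle guard for exactly this failure mode. A minimal sketch (the resource and names here are made up for illustration) of `prevent_destroy`, which makes a plan/apply error out instead of destroying the guarded resource:

```hcl
# Hypothetical example: protect a production database from an
# accidental destroy/replace during an apply.
resource "aws_db_instance" "prod" {
  identifier     = "prod-db" # made-up name
  engine         = "postgres"
  instance_class = "db.t3.medium"

  lifecycle {
    # terraform plan/apply fails instead of destroying this resource
    prevent_destroy = true
  }
}
```

It won't save you from a deleted state file or out-of-band console changes, but it turns "oops, the plan destroyed prod" into an error before anything happens.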
This would indicate that wherever they were hosting their site no longer exists. 503s even on pages that should be mostly static suggest the backend is gone, or that whatever ingress they're using in front of it disappeared. As far as I can tell, every single page on their site is 503'ing.
They are putting out a lot of stuff where it's obvious to me, reading between the lines, what led to this, because I've been brought in to clean up messes like this before:
>The goal of the current maintenance is to fix a lot of long-standing issues with the site. The underlying infrastructure was getting very fragile as technical debt accumulated over time. A team is working very hard right now to make sure that once the site is back up, it's on much better footing and will be solid and reliable for the long term. Despite the unfortunate amount of time this is taking, it will be a major benefit to the site in the long run.
They are saying it was "spring cleaning", or a migration that took out the site for days. "Infrastructure getting very fragile" reeks of bad or nonexistent ops practices, probably very little or unreliable IaC, if any. I've seen shops get by for 10+ years by just clicking things in the console, until unfortunately it gets to this point.
This though, rubs me the wrong way:
> We want to offer a much better quality of service going foward. We understand that the lack of communication has been frustrating, and I have been closely watching social media and reporting the community's feelings up the chain, so your voices are being heard. The plan was not to have a long outage like this, but due to factors beyond the dev team's control, things have taken much longer than anticipated. Please be patient with us - I will keep updating here and on our other social media.
"Factors beyond the dev team's control." Sorry, no. If you have an ops team, you don't get to toss blame over the wall like that, and if you don't, you have no one to blame but yourselves. I feel bad for whoever the unofficial-official ops guy is right now. These kinds of infrastructure "tech debt" whoopsies come from years of people just not caring enough to do things properly; it's never seen as important until it suddenly is. I hope they learn the lesson and properly hire an infrastructure guy. There's long been a persistent delusion in the pure-dev world that they should be able to stay completely agnostic to the hardware lying underneath their beautiful code. Ideally yes; in practice almost never, unless you come from a place with the significant resources to build something that nice, or are willing to pay through the nose for managed cloud services or licenses.
It is entirely possible, especially at small companies in my experience, that "factors beyond the dev team's control" means a technical founder with severe myopia and decision fatigue who prevents "complexity" as they see it, which for them means everything you describe here as necessary.
Unfortunate. Tindie is (was?) a pretty unique marketplace. Amusingly, a lot of what they were selling was probably illegal due to FCC rules: for the most part, you can't sell electronics without EMI certification and "I'm just a hobbyist" is not an excuse. Kits get a bit of leeway, but finished products don't.
Before the tariffs, I noticed that Chinese companies were trying to undercut them. I've gotten multiple mails asking me to start selling my designs with China-based outlets: they would make the PCBs, assemble them, and pay me some money for every item sold.
Can you share more information about the undercutting? I've heard of places like Elecrow trying to incentivize people to sell via their platform/OEM service but it sounds like you've had people asking you to license your designs?
I never followed up, but I didn't read it as some serious IP licensing thing. It sounded like they've come to the conclusion that they're making the stuff that's sold on Tindie anyway, so might as well set up a website and ship directly to your customers.
It's not likely, but if you're an expert I'm sure you could think of a few ways it would be possible. The reason we give people with pacemakers a list of machines to avoid is definitely not to waste their time because there is no possible way any of those things could be dangerous to them.
Around Sunday/Monday last week, right before it went down, I noticed the site was super buggy and failing to add things to the cart. I emailed support and got a "we are checking the issue". Since it went down, all I've heard from support is "Please be patient. Tindie will be back up soon as we are currently performing maintenance. At this time, we do not have an estimated timeframe to provide."
The fact that it wasn't communicated at all beforehand, and that there's no timeframe, makes me think this was probably an ops screw-up.
I see this a lot with small independent sites with big userbases. Instead of being honest, they hide mistakes behind maintenance or blame it on hackers.
There are a number of things on Tindie that I have been unable to find anywhere else at any price. (Mostly small batch bespoke electronics.) I hope they figure this out.
The site has been on life support for a decade: ownership has changed hands a few times, basic features promised 10 years ago never shipped, the API is half implemented (e.g. you can download an order but you cannot mark it shipped), and they still have no mechanism to collect state sales tax, nor will they submit a 1099 as required by US tax law. I jumped ship 5 years ago when this became too much of a problem, and not a single thing has changed in those 5 years.
Tindie was a great place for a hacker to sell a few widgets back in the day, but legal requirements have changed since then, and Tindie has not changed a line of code in at least 10 years.
Concerning. A professional development team should have been able to manage this switch with minimal to no downtime. It makes me wonder what other mistakes they're making.
I'm reluctant to trust my payment information with them in the future.
So many fairly popular apps, SaaS products, etc. are running at skeleton-crew staffing levels. It'll probably get worse with vibe coding. Though they'll probably launch Claude Ops, etc., now that I think about it.
It's like Etsy for small-scale electronics - if you build a cool, niche electronic device as an individual, Tindie is a marketplace to sell in low volume (possibly as a kit).
Yeah this sucks, I have a bunch of hobbyist orders stuck in limbo since last week -- customers have paid, but I can't pull the orders down even through the API.
I really like Tindie as a platform and have been using it since nearly the beginning...but I'd have lost the contract if I pulled this level of nonsense on a customer's production application.
Glad I used a privacy.com burner when I bought from them. Quite a while later I found a declined purchase for pizza on the now long-deactivated burner card I had used to purchase through them.
Edit: https://hackaday.social/@tindie/116427447318102919
https://hackaday.social/@tindie/116436988752373293
To what? The only alternative I know of is Lectronz.
Example of a response I see:
< x-cache: Error from cloudfront
< via: 1.1 bdf85d6d4811ab08c57841855a848f8a.cloudfront.net (CloudFront)
< x-amz-cf-pop: LAX54-P11
< x-amz-cf-id: nTQ-y1Ut3F-04jUCDM09ordCtj0CMkVmmtZTe__BtzEr1sMJu7rKaw==
< age: 76773
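Side note on that `age` header: if I'm reading it right, it counts the seconds the response has been sitting in CloudFront's cache, so that error page had been cached for the better part of a day. Quick arithmetic:

```python
# "age" is the number of seconds the cached response has sat in the CDN.
age_seconds = 76773
hours = age_seconds / 3600
print(f"{hours:.1f} hours")  # roughly 21.3 hours
```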
Most ops guys can do dev; the inverse is absolutely not true, IME.
It's AM radio that gets interfered with.
Either that, or catastrophic data issues?
Otherwise, so much downtime at once is pretty crazy.
https://colonelpanic.tech/#products
However, any downtime over an hour or two screams "migration gone wrong" to me.
Otherwise, wouldn't you just roll back to get the site up, then come back at it and try again later?
That means they've got zero people who know what they're doing.