Kubernetes Auto-Scaling: To resolve auto-scaling issues in Kubernetes, I rely on the Horizontal Pod Autoscaler (HPA) to scale on CPU or memory usage, with the Metrics Server supplying those resource metrics and Prometheus (through an adapter) exposing custom ones. Most issues, however, arise from suboptimal metric thresholds, so I fine-tune these alongside relevant application metrics, such as request rate, to keep scaling responsive. For node-level scaling, the Cluster Autoscaler adds or removes nodes to fit pending workloads, keeping resource usage efficient without over- or under-scaling.
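To make the HPA piece concrete, here is a minimal sketch using the official kubernetes Python client. The Deployment name "web-app", the namespace, the replica bounds, and the 70% CPU target are illustrative assumptions, not values from a real cluster; in practice the target would come from the threshold tuning described above.

```python
# Minimal HPA sketch with the official kubernetes Python client.
# "web-app", "default", and the 70% CPU target are hypothetical placeholders.
from kubernetes import client, config

config.load_kube_config()  # assumes a local kubeconfig with cluster access
autoscaling = client.AutoscalingV2Api()

hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="web-app-hpa", namespace="default"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="web-app"
        ),
        min_replicas=2,
        max_replicas=10,
        # Scale out when average CPU utilization across pods exceeds 70%.
        metrics=[
            client.V2MetricSpec(
                type="Resource",
                resource=client.V2ResourceMetricSource(
                    name="cpu",
                    target=client.V2MetricTarget(
                        type="Utilization", average_utilization=70
                    ),
                ),
            )
        ],
    ),
)
autoscaling.create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```

The same V2MetricSpec list accepts Pods or External metric types, which is where a Prometheus-backed custom metric such as request rate would plug in once the adapter exposes it.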
AWS Auto-Scaling: On AWS, I configure Auto Scaling groups with CloudWatch alarms on custom metrics such as request latency or queue depth. Combined with Elastic Load Balancing, the load balancer redistributes traffic across the healthy instances as the group scales. I usually tune the cooldown period so the ASG's actions track the current traffic level without triggering unnecessary scaling cycles. In both AWS and Kubernetes, I analyze historical load patterns to set proper thresholds, fine-tuning the scaling policies so capacity matches demand.
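As a sketch of that CloudWatch-driven setup, the boto3 snippet below wires a step-scaling policy to an alarm on a custom queue-depth metric. The group name "web-asg", the "MyApp/QueueDepth" metric, and every threshold are hypothetical placeholders; real values would come from the historical load analysis mentioned above.

```python
# Sketch with boto3; "web-asg", "MyApp/QueueDepth", and all thresholds
# are hypothetical placeholders.
import boto3

autoscaling = boto3.client("autoscaling")
cloudwatch = boto3.client("cloudwatch")

# Step-scaling policy: add 2 instances whenever the attached alarm fires.
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="scale-out-on-queue-depth",
    PolicyType="StepScaling",
    AdjustmentType="ChangeInCapacity",
    StepAdjustments=[{"MetricIntervalLowerBound": 0.0, "ScalingAdjustment": 2}],
    EstimatedInstanceWarmup=180,  # seconds before new instances count in metrics
)

# Alarm on the custom metric: three consecutive 60-second periods with an
# average queue depth above 100 trigger the scale-out policy.
cloudwatch.put_metric_alarm(
    AlarmName="queue-depth-high",
    Namespace="MyApp",
    MetricName="QueueDepth",
    Statistic="Average",
    Period=60,
    EvaluationPeriods=3,
    Threshold=100.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],
)
```

For the cooldown tuning, the same client can adjust the group default, e.g. autoscaling.update_auto_scaling_group(AutoScalingGroupName="web-asg", DefaultCooldown=300), lengthening or shortening the pause between scaling actions to damp flapping.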