Making sense of generic errors in VMware vSphere
We probably all know this situation. You run a task in VMware vSphere, or any VMware product, and the task fails. The only feedback you get is a generic error that is not much help at all. This can be very frustrating and sometimes infuriating, at least to me. What I didn’t expect was that the API would help me make sense of generic errors in VMware vSphere.
Context
A little context on what was at hand. Our vCenters use an LDAPS connection to AD as Identity Source. Unfortunately the certificates on the AD controllers expired unnoticed and we lost SSO access to the environment. It took little time to fix the issue on the AD controllers, but now we needed to update the certificates on other components in the environment as well. For VMware vSphere this really shouldn’t be too hard. First retrieve the updated certificates and save them somewhere. When done, go to vCenter, open Administration – SSO Configuration and select Identity Source. Choose edit for the broken connection and upload the certificates.
However, this did not work for me. In all honesty, at the time of writing I am still not sure whether I made a mistake or whether it actually is a system bug. To find out, we need to recreate the issue in the test environment. The only feedback I got was this error: ‘Check the network settings and make sure you have network access to the identity source’. So eventually I decided to create a new connection. I checked all variables and checked the certificates. Still, whenever I tried to click save: ‘Check the network settings and make sure you have network access to the identity source’.
This made me feel like aaaargghhh
API Response in browser
Of course we started searching on Google and we found a couple of possible solutions. However, most of them did not apply to our situation. So I was ready to dive a little deeper into the VMware logs when I found this brilliant remark from some guy on Reddit. He suggested checking the API response to find out what was going on.
How to do it
For starters, I use Firefox. This probably works in other browsers too, but that is not in scope of this post. Also keep in mind this works best on the HTML5 client and newer versions of vSphere. It probably works on other applications too, as well as on other and older versions of vSphere. It all depends on how well the API is written, really. Since I am a VMware guy, today I am focusing on vSphere 6.7 and the HTML5 Client.
In Firefox go to the three bars in the upper-right and click on Web Developer, then choose Network. This will open the Network pane and it should be waiting for input. In other words, waiting for you to do something. In our case we had all the variables filled out and ready for creating a new Identity Source. Remember, every attempt to save the config resulted in ‘Check the network settings …’. So we try again, and this time a neat API 500 message appears in the Network pane. We click on it and a new pane appears with some more information about the API call. We click on the Response tab and now we can clearly see that the 500 error message states that the domain is already present in the environment. Shoot! Would it really be that simple?
So we deleted the broken connection, which was still there and yes! Working config. To me it is unclear why confusing generic errors are needed for simple issues like this. In the end though, I learned a neat little new trick and that counts as well.
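If you prefer to go through the calls outside the browser, the Network pane can save everything it captured as a HAR file (in Firefox: ‘Save All As HAR’). HAR is plain JSON, so a few lines of Python can pull out the failed calls and their response bodies. This is only a minimal sketch: the file name, URLs and message in the sample are made up for illustration and are not real vCenter output.

```python
import base64
import json


def load_har(path):
    """Load a HAR file saved from the browser's Network pane."""
    # utf-8-sig also accepts files that start with a byte order mark.
    with open(path, encoding="utf-8-sig") as handle:
        return json.load(handle)


def failed_responses(har, min_status=400):
    """Return (status, method, url, body) for every call at or above min_status."""
    results = []
    for entry in har["log"]["entries"]:
        response = entry["response"]
        if response["status"] < min_status:
            continue
        content = response.get("content", {})
        body = content.get("text", "")  # may be missing if the body was not recorded
        if body and content.get("encoding") == "base64":
            body = base64.b64decode(body).decode("utf-8", errors="replace")
        request = entry["request"]
        results.append((response["status"], request["method"], request["url"], body))
    return results


# Hand-written sample in HAR layout. Hostname, paths and message are placeholders.
sample_har = {
    "log": {
        "entries": [
            {
                "request": {"method": "GET", "url": "https://vcenter.example.test/ui/placeholder/list"},
                "response": {"status": 200, "content": {"text": "[]"}},
            },
            {
                "request": {"method": "POST", "url": "https://vcenter.example.test/ui/placeholder/save"},
                "response": {
                    "status": 500,
                    "content": {"text": '{"message": "placeholder: domain already exists"}'},
                },
            },
        ]
    }
}

for status, method, url, body in failed_responses(sample_har):
    print(status, method, url)
    print(body)
```

With a real capture you would call `failed_responses(load_har("capture.har"))`, where `capture.har` is whatever name you saved the file under. Keep in mind that a HAR file contains the full requests, including cookies and anything you typed into the form, so treat it like a password file.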
Possibilities
Since my job involves working with APIs quite a lot, I have been exploring a little and I found this can also be really useful when programming, not just for decoding generic errors. When you are constructing an API call it is not always straightforward how to construct it or how to find the right method. This little trick can actually help you. Just perform the action in the UI. With the Network pane open it’s possible to figure out exactly which calls were made and how they were constructed. To me personally the API References will always be leading, but this really helps in solving parts of the puzzle quicker.
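To make that a bit more concrete: when the captured calls are saved as a HAR file, you can list the method, path and request body of each call and use that as a starting point for your own code. The sketch below assumes the standard HAR layout; the hostname, paths and payload in the sample are placeholders I made up, not actual vSphere endpoints.

```python
from urllib.parse import urlsplit


def summarize_calls(har, path_filter=""):
    """List method, path and request body for each captured call.

    path_filter is a plain substring match on the URL path, handy for
    skipping static files such as scripts and images.
    """
    calls = []
    for entry in har["log"]["entries"]:
        request = entry["request"]
        path = urlsplit(request["url"]).path
        if path_filter not in path:
            continue
        # postData is only present on requests that carried a body.
        post_data = request.get("postData") or {}
        calls.append(
            {
                "method": request["method"],
                "path": path,
                "body": post_data.get("text", ""),
            }
        )
    return calls


# Hand-written sample in HAR layout. Every URL and payload is a placeholder.
sample_capture = {
    "log": {
        "entries": [
            {"request": {"method": "GET", "url": "https://vcenter.example.test/ui/static/app.js"}},
            {
                "request": {
                    "method": "POST",
                    "url": "https://vcenter.example.test/ui/placeholder/save?x=1",
                    "postData": {"mimeType": "application/json", "text": '{"name": "placeholder"}'},
                }
            },
        ]
    }
}

for call in summarize_calls(sample_capture, path_filter="/placeholder/"):
    print(call["method"], call["path"])
    print(call["body"])
```

What you get out of this is what the UI sent, which is not necessarily a documented or supported API, so check it against the API Reference before building on it.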
Conclusion
It’s always nice to learn a new trick which can save you considerable amounts of time looking for generic errors and their meanings. No guarantee here that this trick always delivers on your awkward generic error messages, but hey, it’s at least worth a try!
What’s up next
Lately I have been working on creating some new Ansible roles. One involves a PowerShell module for Ansible to delete Horizon Desktop Pools. The other one is an ‘NSX-v Rule Delete’ role, much like the one I wrote about in this post. Except this role will run a lot faster and smoother because it is written against the REST API with the Ansible uri module. This means no need for a PowerShell jumphost. Stuff still needs to be finalized and made ready for publishing, but I am getting there!
Thanks for reading!