How to diagnose issues if something fails
1. Health Check Endpoint
Before diagnosing issues, it is important to verify the health status of the containers. The Agriconnect platform provides an endpoint to check the health of its services:
Health Check URL: https://agriconnect.akvotest.org/api/health-check
Purpose
The health check endpoint returns the status of the containers, indicating whether they are up and running. A successful health check implies that the services are operational.
Expected Response
When all containers are healthy, the endpoint returns a JSON response similar to the following:
{
  "status": "healthy",
  "services": {
    "rabbitmq": true,
    "chromadb": true,
    "database": true,
    "assistant": true,
    "eppo-librarian": true,
    "backend": true
  }
}
Response Details
- status: Indicates the overall health of the containers. If all services are operational, the status will be "healthy".
- services: A map of service names to their respective health status. A value of true means the service is running, while false indicates a problem with that service.
2. Diagnosing Issues
If the health check indicates that one or more services are not operational (i.e., the status of a service is false), follow these steps to diagnose and resolve the issue:
a. Identify the Affected Service(s)
Review the JSON response from the health check to pinpoint which services have a false status.
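This step can also be scripted. The following is a minimal Python sketch, not part of the platform itself. It assumes the endpoint answers with HTTP 200 and the JSON shape shown in the Expected Response section even when a service is down, which this page does not confirm. The function names `fetch_health` and `unhealthy_services` are illustrative.

```python
import json
import urllib.request

# URL taken from the Health Check Endpoint section of this page.
HEALTH_URL = "https://agriconnect.akvotest.org/api/health-check"


def fetch_health(url=HEALTH_URL, timeout=10.0):
    """Fetch the health check and return the decoded JSON object.

    Assumption: the endpoint returns HTTP 200 with a JSON body even when
    a service is unhealthy. A non-2xx status raises urllib.error.HTTPError.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def unhealthy_services(payload):
    """Return the sorted names of services whose value is not true."""
    services = payload.get("services", {})
    return sorted(name for name, ok in services.items() if ok is not True)
```

Calling `unhealthy_services(fetch_health())` returns an empty list when every service reports true, and otherwise the names of the services to investigate in the next steps.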
b. Check Logs
Access the logs for the affected service(s) to identify any error messages or anomalies that may indicate the cause of the issue. Since we use Google Cloud, you will need to check the logs using the Google Cloud Console.
Access Google Cloud Console:
- Go to Google Cloud Console.
- Navigate to Kubernetes Engine > Workloads.
- Filter Logs:
  - Use the Cluster and Namespace filters to locate the relevant logs.
  - Filter by the test cluster and agriconnect-namespace namespace to view the specific workloads.
Within the workloads, identify the container with an unhealthy status. Select this container and then navigate to the "Logs" tab to view its logs. Analyzing these logs can provide insights into the issue, such as error messages or patterns that suggest the cause. A common issue might be related to an unstable Socket.IO connection, but the logs will help pinpoint the specific problem. Use this information to begin debugging and addressing the issue.
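The same logs can be narrowed down in the Logs Explorer with a query. The sketch below builds such a query in Python using the standard `k8s_container` resource labels of Cloud Logging. It rests on two assumptions: that the cluster is literally named `test`, and that the container name matches the service key from the health check (for example `assistant`). Verify both against the Workloads view before relying on them. The function name `container_log_filter` is illustrative.

```python
from typing import Optional


def container_log_filter(
    container: str,
    cluster: str = "test",  # assumption: the "test cluster" is named "test"
    namespace: str = "agriconnect-namespace",
    min_severity: Optional[str] = "ERROR",
) -> str:
    """Build a Cloud Logging query for one container's log entries."""
    clauses = [
        'resource.type="k8s_container"',
        f'resource.labels.cluster_name="{cluster}"',
        f'resource.labels.namespace_name="{namespace}"',
        # Assumption: the container is named after the health-check key.
        f'resource.labels.container_name="{container}"',
    ]
    if min_severity:
        # Keep only entries at or above the given severity level.
        clauses.append(f"severity>={min_severity}")
    return " AND ".join(clauses)
```

Pasting the returned string into the Logs Explorer query box limits the view to errors from that one container. Passing `min_severity=None` shows all entries, which helps when the failure is only visible as a pattern in informational messages.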
c. Restart the Service
If the issue is not apparent from the logs, restarting the affected service(s) may resolve the problem. This can be done by deleting the pods associated with the unhealthy container.
In the Google Cloud Console, navigate to the "Manage Pods" section for the specific container that is reported as unhealthy. Within this section, you can access the "Pods Detail" view. By deleting the pod from this view, the container will automatically restart, which may help restore it to a healthy state. This process is often effective in clearing transient issues that cause service disruptions.
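For operators who prefer the command line, deleting the pod can be done with `kubectl` instead of the Console. This is a hedged sketch only: it assumes `kubectl` is installed and already authenticated against the cluster, and that the pod is managed by a controller (such as a Deployment) that recreates it. The pod name used in the usage note is hypothetical, and the helper names are illustrative.

```python
import subprocess


def delete_pod_command(pod_name, namespace="agriconnect-namespace"):
    """Return the kubectl argument list that deletes a single pod."""
    return ["kubectl", "delete", "pod", pod_name, "--namespace", namespace]


def restart_pod(pod_name, namespace="agriconnect-namespace", dry_run=True):
    """Delete a pod so that its controller starts a replacement.

    With dry_run=True (the default) nothing is executed and the command
    is only returned, so it can be reviewed first.
    """
    command = delete_pod_command(pod_name, namespace)
    if not dry_run:
        # Raises subprocess.CalledProcessError if kubectl exits non-zero.
        subprocess.run(command, check=True)
    return command
```

For example, `restart_pod("assistant-abc123")` only returns the command for review, while `restart_pod("assistant-abc123", dry_run=False)` runs it. After the replacement pod starts, re-run the health check to confirm the service reports true again.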
3. Most Common Issues
Below are examples of some common issues that may occur on the Agriconnect platform, along with their related containers and explanations.
Issue | Related Containers | Explanation |
Assistant Messages/Whispers Not Delivered to Extension Officer Chat Window | | |
Extension Officer Messages Not Delivered to Farmer Device | | |
Farmer Messages Not Delivered to Extension Officer Chat Window | | |