10 things to consider when preparing your agricultural training dataset


The first step toward building a computer vision solution is gathering the necessary images and preparing the training dataset. This step is especially important in agriculture, which is among the most challenging domains because environmental conditions change constantly. A plant's shape, size, color, and other characteristics change throughout the growing season and depend heavily on environmental conditions, not to mention possible diseases. Illumination changes as the sun moves, clouds pass by, or another object casts a shadow on the object of interest. Occlusions, often caused by plant growth, can leave the object of interest only partially visible, while machinery movement and wind introduce blur.

It is therefore important to build a robust and representative dataset before starting model fine-tuning.

  1. Clearly define your use case.
    Repeatedly changing your use case will significantly slow down model development and reduce model performance.
  2. Collect as much relevant data as possible from all available sources.
    Our platform allows you to choose from a variety of plants, enemies, and image acquisition scenarios, export annotations in the format of your choice, and find the ideal dataset for your project.
  3. Make sure you have included all relevant plant growth stages.
  4. Try to gather images whose background is consistent with the one you expect to encounter in deployment.
    Training a model on lab-acquired images adds little value if the model is going to be deployed in a real agricultural environment.
  5. Gather images with different object sizes and distances to increase the variability of your dataset.
  6. Clean the collected data.
    Make sure that the images are representative and relevant to your task.
  7. Try to capture and include in your dataset images that match your use case exactly.
    For example, include images captured at a specific angle or under artificial lighting if that is how images will be acquired in deployment.
  8. Include more classes in your dataset than the ones you wish to identify.
  9. Carefully annotate your images, taking into account the particularities of your use case.
    For example, decide whether you want to detect occluded objects, and annotate them or leave them unannotated accordingly.
  10. Check the number of images that you have collected and assess whether it is sufficient.
    A common rule of thumb is 100 images per class. However, this is only a starting assumption, and the number needed can change depending on the class.
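
The check in item 10 can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed workflow: it assumes annotations are available as hypothetical (image_id, class_label) pairs, and it uses the 100-images-per-class rule of thumb as a default threshold that you would adjust per class. The function names and sample file names are made up for the example.

```python
from collections import Counter

# Assumed default taken from the rule of thumb above; adjust per class as needed.
MIN_IMAGES_PER_CLASS = 100


def images_per_class(annotations):
    """Count how many distinct images contain each class.

    `annotations` is a hypothetical iterable of (image_id, class_label) pairs.
    Several objects of the same class in one image count as one image.
    """
    unique_pairs = set(annotations)
    return Counter(label for _, label in unique_pairs)


def underrepresented(annotations, expected_classes, minimum=MIN_IMAGES_PER_CLASS):
    """Return {class: image_count} for every expected class below `minimum`."""
    counts = images_per_class(annotations)
    return {c: counts[c] for c in expected_classes if counts[c] < minimum}


# Tiny illustrative sample; real datasets would be loaded from annotation files.
annotations = [
    ("img_001.jpg", "weed"),
    ("img_001.jpg", "weed"),  # second weed in the same image
    ("img_002.jpg", "weed"),
    ("img_002.jpg", "crop"),
]

# A low threshold is used here only so the small sample gives a readable result.
print(underrepresented(annotations, ["weed", "crop", "aphid"], minimum=2))
```

Passing the list of expected classes explicitly means that a class with no images at all is reported with a count of zero instead of being silently skipped.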

To sum up, although agriculture is an adverse environment for deploying computer vision solutions, most of the hurdles can be overcome through careful dataset preparation. It is no coincidence that data-centric AI is one of the hottest topics in machine learning right now.