ImageNet Dataset: Key Features, Limitations, and How to Get Started

Oct 11, 2024

What Is the ImageNet Dataset? 

The ImageNet dataset is a large-scale, annotated database of images designed for use in computer vision research. Developed by a team of researchers from Stanford University, Princeton University, and other institutions, it has been used to advance the field of computer vision in general, and more specifically, object recognition and image classification. 

The dataset contains millions of images covering thousands of object categories, labeled to provide accurate training data for algorithms.

ImageNet has become a benchmark for evaluating the performance of algorithms in accurately identifying and categorizing objects within images. Its structured format and comprehensive nature have made it a popular dataset for developing and testing image recognition algorithms. 

However, as computer vision algorithms have become more powerful, ImageNet's utility has diminished in recent years. While it is still useful, state-of-the-art models may require larger and more complex datasets for effective training.


This is part of a series of articles about machine learning datasets.

Key Features of the ImageNet Dataset 

ImageNet offers the following important features:

Diverse Categories

With over 20,000 categories, ImageNet covers a range of objects, scenes, and abstract concepts. This diversity ensures that algorithms trained on the ImageNet dataset can recognize a broad spectrum of objects in real-world settings. 

WordNet Hierarchy

The ImageNet dataset utilizes the WordNet hierarchy to organize its categories. WordNet is a lexical database of English that groups words into sets of cognitive synonyms, providing short descriptions or “glosses.” Each category is linked to a WordNet synset, enabling nuanced and accurate classification tasks, helping algorithms differentiate between closely related concepts.
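
To see this hierarchy in action, here is a minimal sketch using NLTK's WordNet interface (an assumption: NLTK is installed and the WordNet corpus has been downloaded); it also shows how ImageNet-style synset IDs are derived from WordNet noun offsets:

import nltk
from nltk.corpus import wordnet as wn

# One-time download of the WordNet corpus
nltk.download("wordnet")

# Look up the synset for "dog" and inspect its gloss and hypernyms
dog = wn.synset("dog.n.01")
print(dog.definition())  # the short description, or "gloss"
print(dog.hypernyms())   # parent concepts in the hierarchy

# ImageNet synset IDs are the letter "n" followed by the zero-padded
# 8-digit WordNet noun offset, e.g. n02084071 for "dog"
print(f"n{dog.offset():08d}")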

Extensive Collection

ImageNet contains over 14 million hand-annotated images distributed among its categories. This massive volume of data is useful for training deep learning models. The dataset’s size ensures models are exposed to a variety of lighting conditions, perspectives, and backgrounds, enhancing their ability to generalize from the training data to new, unseen images.

ImageNet Applications and Use Cases 

The ImageNet dataset is useful for training models for the following types of tasks.

Image Classification

Algorithms are trained to accurately assign a label to an image from a predefined set of categories. The dataset enables machines to recognize and understand content within images. For example, an algorithm trained on the ImageNet dataset could distinguish between images of cats and dogs, or more broadly, categorize photos as portraying natural scenes or urban environments.
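
As an illustration, here is a minimal single-image classification sketch using a torchvision model pretrained on ImageNet (cat.jpg is a hypothetical local image path):

import torch
from PIL import Image
from torchvision import models, transforms

# Standard ImageNet preprocessing: resize, crop, and normalize
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

model = models.resnet50(pretrained=True)  # weights pretrained on ImageNet
model.eval()

img = preprocess(Image.open("cat.jpg").convert("RGB")).unsqueeze(0)  # add batch dim
with torch.no_grad():
    class_idx = model(img).argmax(dim=1).item()  # index into the 1,000 ImageNet classes
print(class_idx)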

Object Detection and Instance Segmentation

Object detection involves locating and identifying multiple objects within a single image. Instance segmentation goes a step further, delineating the exact boundaries of each object. Training algorithms on ImageNet prepares them to tackle complex scenes, identifying and segmenting various objects regardless of their size or overlap.
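
The sketch below illustrates the task using torchvision's Mask R-CNN; note that these detection weights are trained on COCO, though the model's ResNet-50 backbone is pretrained on ImageNet (street.jpg is a hypothetical image path):

import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

img = transforms.ToTensor()(Image.open("street.jpg").convert("RGB"))
with torch.no_grad():
    pred = model([img])[0]  # one dict of predictions per input image

# Keep confident detections: boxes, class labels, and per-instance masks
keep = pred["scores"] > 0.8
print(pred["boxes"][keep])        # [N, 4] bounding boxes
print(pred["labels"][keep])       # class indices
print(pred["masks"][keep].shape)  # [N, 1, H, W] soft instance masks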

Self-Supervised Image Classification

Algorithms can be trained to generate their own labels from unlabeled data, learning to understand and categorize images without explicit human annotation. Training with ImageNet helps refine these self-learning processes, reducing the need for hand-labeled datasets while maintaining high levels of accuracy. This makes AI development more scalable and cost-effective.
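
To make the idea concrete, here is a minimal sketch of one classic pretext task, rotation prediction (in the spirit of RotNet), where labels are generated from the images themselves rather than from human annotation; this illustrates the general approach, not a method specific to ImageNet:

import torch

def make_rotation_batch(images):
    """Given a batch [B, C, H, W], return all four rotations and the
    synthetic rotation labels (0: 0deg, 1: 90deg, 2: 180deg, 3: 270deg)."""
    rotated = [torch.rot90(images, k, dims=(2, 3)) for k in range(4)]
    x = torch.cat(rotated, dim=0)
    y = torch.arange(4).repeat_interleave(images.size(0))
    return x, y

# Usage: any backbone with a 4-way classification head can be trained
# with ordinary cross-entropy on (x, y), then fine-tuned on real labels.
x, y = make_rotation_batch(torch.randn(8, 3, 224, 224))
print(x.shape, y.shape)  # torch.Size([32, 3, 224, 224]) torch.Size([32])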

What Is the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)? 

The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is an annual competition where research teams test their algorithms against a subset of the ImageNet dataset. Launched in 2010, with the last public challenge in 2017, it aimed to advance computer vision technologies by setting various tasks, such as image classification, object detection, and object localization. 

By providing a standardized benchmark, ILSVRC enables a direct comparison of algorithms, fostering innovation. Several of the algorithms that won the ILSVRC became significant milestones in computer vision research—these include AlexNet, Google Inception, VGG, and ResNet.

Limitations of the ImageNet Dataset

Over the years there have been several important criticisms of the ImageNet dataset in the research community, including:

  • MIT researchers identified significant “systematic annotation issues” in the dataset, highlighting a misalignment between the dataset and actual real-world observations when used for benchmarking purposes. ImageNet was found to have accuracy issues mainly related to images containing multiple objects (Tsipras et al., 2020).
  • A paper evaluating the relevance of ImageNet for modern computer vision algorithms concluded that its usefulness may be nearing an end, and proposed a new set of labels that could increase the dataset’s utility (Beyer et al., 2020).
  • In 2022, the ImageNet project team blurred faces in their dataset to protect individuals’ privacy, which reduces the usefulness of the dataset for some purposes (Yang et al., 2022).

Quick Tutorial: Downloading and Using the ImageNet Dataset with PyTorch 

In this tutorial, we’ll cover how to download the ImageNet dataset via Kaggle and set it up for use in machine learning projects.

Setting Up the Kaggle CLI

First, make sure you have a Kaggle account. Install the Kaggle Command Line Interface (CLI) using the Python package manager pip:

pip install kaggle

Configuring Your API Credentials

Before using the Kaggle API, you need to generate an API token through your Kaggle account. This is done by navigating to the Account section of your profile and selecting Create API Token. The resulting download of kaggle.json contains your API credentials. 

This file must be placed in a specific directory (~/.kaggle/kaggle.json on Unix-like systems or C:\Users\<Windows-username>\.kaggle\kaggle.json on Windows). On Unix-like systems, also make sure the file is readable only by you (chmod 600 ~/.kaggle/kaggle.json). Optionally, you can change this default location by setting the KAGGLE_CONFIG_DIR environment variable to the directory containing kaggle.json.
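
If you want to confirm that the credentials are picked up correctly, the kaggle package can also authenticate from Python; this raises an error if kaggle.json is missing or misplaced:

from kaggle.api.kaggle_api_extended import KaggleApi

# Authenticate using the credentials in kaggle.json
api = KaggleApi()
api.authenticate()
print("Kaggle API credentials found")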

Downloading the Dataset

Downloading the dataset is straightforward but may require patience due to its substantial size of 168 GB:

Note: Ensure your hard disk has at least 330 GB free before downloading and unzipping the file.

kaggle competitions download -c imagenet-object-localization-challenge

After the download completes, extract the dataset to your desired directory using your operating system’s standard extraction tools.
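
Alternatively, here is a cross-platform extraction sketch using Python’s standard library (adjust the archive name and destination to match your download location):

import zipfile

# Extract the downloaded archive, including the ILSVRC directory tree
with zipfile.ZipFile("imagenet-object-localization-challenge.zip") as zf:
    zf.extractall("imagenet")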

Preparing Additional Resources

Before using the dataset, download two helper files: one maps class indices to synset IDs, and the other maps validation image filenames to their synsets. They can be downloaded into the directory containing the ILSVRC folder using wget on Unix systems:

cd <YOUR_FOLDER>
wget https://raw.githubusercontent.com/raghakot/keras-vis/master/resources/imagenet_class_index.json
wget https://gist.githubusercontent.com/paulgavrikov/3af1efe6f3dff63f47d48b91bb1bca6b/raw/00bad6903b5e4f84c7796b982b72e2e617e5fde1/ILSVRC2012_val_labels.json

Implementing a Dataset Class for PyTorch

Create a Dataset class compatible with PyTorch. This class facilitates the loading of images and their respective labels into the model for training or validation purposes. 

The process involves parsing the downloaded class index and label files to map the synset IDs to their corresponding class IDs, as follows: 

# Import required libraries and modules
import os
from torch.utils.data import Dataset
from PIL import Image
import json

# Define the custom dataset class for the ImageNet dataset
class ImageNetKaggle(Dataset):
    def __init__(self, root, split, transform=None):
        self.samples = []  # List to store paths to images
        self.targets = []  # List to store corresponding class labels for images
        self.transform = transform  # Transformation function to apply to images (e.g., normalization)
        self.syn_to_class = {}  # Dictionary to map synset IDs to class IDs

        # Load class index file and create a mapping from synset IDs to class IDs
        with open(os.path.join(root, "imagenet_class_index.json"), "rb") as f:
            json_file = json.load(f)
            for class_id, v in json_file.items():
                self.syn_to_class[v[0]] = int(class_id)

        # Load validation labels file and create a mapping from image file names to synset IDs
        with open(os.path.join(root, "ILSVRC2012_val_labels.json"), "rb") as f:
            self.val_to_syn = json.load(f)

        # Construct the path to the directory containing the images for the specified split (train/val)
        samples_dir = os.path.join(root, "ILSVRC/Data/CLS-LOC", split)
        
        # Iterate over the directory to populate the samples and targets lists with image paths and class labels
        for entry in os.listdir(samples_dir):
            if split == "train":
                # For training images, use the directory name as the synset ID
                syn_id = entry
                target = self.syn_to_class[syn_id]  # Map synset ID to class ID
                syn_folder = os.path.join(samples_dir, syn_id)
                for sample in os.listdir(syn_folder):
                    sample_path = os.path.join(syn_folder, sample)
                    self.samples.append(sample_path)
                    self.targets.append(target)
            elif split == "val":
                # For validation images, use the mapping from image file name to synset ID
                syn_id = self.val_to_syn[entry]
                target = self.syn_to_class[syn_id]  # Map synset ID to class ID
                sample_path = os.path.join(samples_dir, entry)
                self.samples.append(sample_path)
                self.targets.append(target)

    # Return the total number of samples in the dataset
    def __len__(self):
        return len(self.samples)

    # Retrieve an image and its corresponding label by index, applying any transformations if specified
    def __getitem__(self, idx):
        x = Image.open(self.samples[idx]).convert("RGB")  # Open image and convert to RGB
        if self.transform:
            x = self.transform(x)  # Apply transformations to the image
        return x, self.targets[idx]  # Return the transformed image and its label
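
As a quick sanity check (the root path is a placeholder), the validation split should report 50,000 samples and return image/label pairs:

# Instantiate the dataset for the validation split (placeholder path)
dataset = ImageNetKaggle(root="path/to/your/dataset", split="val")
print(len(dataset))        # expected: 50000
image, label = dataset[0]  # a PIL image and its integer class ID
print(image.size, label)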

Validating the Setup

Note: It is highly recommended to carry out this step on a machine with a GPU.

To confirm the successful setup of the dataset and its compatibility with PyTorch, run a validation epoch using a pre-trained model such as ResNet-50. Create a DataLoader to iterate over the dataset with the appropriate transformations and calculate the model’s accuracy on the validation set.

import torch
from torchvision import models, transforms
from torch.utils.data import DataLoader
import torch.nn.functional as F

# Assuming the ImageNetKaggle class is defined as above

# Define transformations for the validation data
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Initialize the ImageNetKaggle dataset for the validation split
imagenet_val_dataset = ImageNetKaggle(root='path/to/your/dataset', split='val', transform=transform)

# Create a DataLoader for the validation dataset
val_loader = DataLoader(imagenet_val_dataset, batch_size=64, shuffle=False, num_workers=4)

# Load a pre-trained ResNet-50 model
model = models.resnet50(pretrained=True)
model.eval()  # Set the model to evaluation mode

# Move model to the appropriate device (GPU if available)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Function to calculate accuracy
def calculate_accuracy(output, target):
    """Calculate accuracy given output and target"""
    with torch.no_grad():
        pred = torch.argmax(output, dim=1)
        assert pred.shape[0] == len(target)
        correct = 0
        correct += torch.sum(pred == target).item()
    return correct / len(target)

# Validate the model on the validation dataset
correct = 0
total = 0
with torch.no_grad():
    for images, labels in val_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        correct += calculate_accuracy(outputs, labels) * images.size(0)
        total += images.size(0)

# Print out the accuracy
accuracy = correct / total
print(f'Accuracy of the network on the validation images: {accuracy * 100:.2f}%')


The output should report the model’s top-1 accuracy on the validation set; for a pretrained ResNet-50, expect roughly 76%.

Testing and Evaluating Computer Vision Models with Kolena

We built Kolena to make robust and systematic ML testing easy and accessible for all organizations. With Kolena, machine learning engineers and data scientists can uncover hidden machine learning model behaviors, easily identify gaps in test data coverage, and truly learn where and why a model is underperforming, all in minutes, not weeks. Kolena’s AI/ML model testing and validation solution helps developers build safe, reliable, and fair systems by allowing companies to instantly stitch together razor-sharp test cases from their datasets, enabling them to scrutinize AI/ML models in the precise scenarios those models will be unleashed upon in the real world. The Kolena platform transforms AI development from an experimental practice into an engineering discipline that can be trusted and automated.

Among its many capabilities, Kolena also helps with feature importance evaluation, supports auto-tagging of features, and can display the distribution of various features in your datasets.

Reach out to us to learn how the Kolena platform can help build a culture of AI quality for your team.