PyTorch : #3 Dataset and DataLoader

Deep learning models require large amounts of data for training, and organizing this data can be a complex task. PyTorch provides the Dataset and DataLoader classes, both in torch.utils.data, to help with this task. In this blog post, we will go over how to use these classes to build custom datasets and dataloaders for deep learning models.

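Dataset

Dataset is an abstract PyTorch class that represents a collection of data samples. A map-style dataset exposes its samples through two methods: __len__, which reports how many samples there are, and __getitem__, which returns the sample at a given index. You create a custom dataset by subclassing Dataset and implementing these methods for your own data.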

Here is an example implementation of a custom Dataset subclass:

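import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    """Map-style dataset over an indexable collection of (x, y) pairs."""

    def __init__(self, data):
        # Keep a reference to the raw samples; nothing is converted yet.
        self.data = data

    def __len__(self):
        # Number of samples in the dataset.
        return len(self.data)

    def __getitem__(self, idx):
        # Fetch one raw sample and convert both parts to tensors.
        sample = self.data[idx]
        x = torch.tensor(sample[0])
        y = torch.tensor(sample[1])
        return x, y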

In this example, MyDataset is a custom Dataset subclass that takes a list of samples as input, where each sample is an (input, target) pair. The __init__ method stores the provided data, and the __len__ method returns the number of samples. The __getitem__ method returns the single sample at the given index as two tensors, x and y. Because torch.tensor infers the dtype from its argument, integer inputs produce int64 tensors and float inputs produce float32 tensors.
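To see how the three methods behave, the class can be exercised directly, without a DataLoader. The sketch below repeats the class definition so that it runs on its own; the `pairs` list is made-up illustrative data, not something from a real dataset.

```python
import torch
from torch.utils.data import Dataset


class MyDataset(Dataset):
    """Same class as above, repeated so this snippet is self-contained."""

    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sample = self.data[idx]
        return torch.tensor(sample[0]), torch.tensor(sample[1])


# Hypothetical toy data: three (input, target) pairs.
pairs = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
dataset = MyDataset(pairs)

n_samples = len(dataset)  # calls __len__
x, y = dataset[1]         # calls __getitem__(1)

print(n_samples)  # 3
print(x, y)       # tensor(2.) tensor(4.)
print(x.dtype)    # torch.float32, inferred from the Python floats
```

Indexing with `dataset[1]` is exactly what a DataLoader does internally for each sample it places in a batch.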

DataLoader

The DataLoader class wraps a Dataset object and provides an iterable over it. Instead of returning one sample at a time, it groups samples into batches and can optionally shuffle them. By default the data is loaded in the main process; setting the num_workers parameter to a value greater than 0 loads batches in parallel worker processes while the model trains.

Here is an example of creating a DataLoader:

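from torch.utils.data import DataLoader

# `data` is any indexable collection of (x, y) pairs,
# for example the list built in the Usage section.
batch_size = 32

dataset = MyDataset(data)
# Yield shuffled mini-batches of up to 32 samples each.
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)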

In this example, MyDataset is the custom dataset that we created earlier. The DataLoader is created by passing the dataset along with the batch_size and shuffle parameters. The batch_size parameter sets the number of samples in each batch; if the dataset size is not divisible by it, the last batch is smaller. The shuffle parameter controls the order of the samples: with shuffle=True the data is reshuffled at the start of every epoch, and with the default shuffle=False the samples are read in index order.
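The effect of batch_size on the number and size of batches can be checked with a small experiment. This is a minimal sketch: it swaps in PyTorch's built-in TensorDataset as a stand-in for MyDataset to keep the snippet short, and the 100-sample data is made up for illustration. It also shows the drop_last parameter, which is not used elsewhere in this post.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset (assumption): 100 (x, y) pairs with y = 2 * x.
xs = torch.arange(100)
dataset = TensorDataset(xs, 2 * xs)

batch_size = 32
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

# len(loader) is the number of batches per epoch: ceil(100 / 32) = 4.
print(len(loader), math.ceil(len(dataset) / batch_size))  # 4 4

# The final batch holds the 4 leftover samples.
batch_sizes = [len(x_batch) for x_batch, _ in loader]
print(batch_sizes)  # [32, 32, 32, 4]

# drop_last=True discards the final incomplete batch.
loader_full = DataLoader(
    dataset, batch_size=batch_size, shuffle=True, drop_last=True
)
full_sizes = [len(x_batch) for x_batch, _ in loader_full]
print(full_sizes)  # [32, 32, 32]
```

Dropping the last batch can be useful when a model or loss expects every batch to have the same size, at the cost of skipping a few samples each epoch.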

Usage

Let's create a sample dataset and use it with the DataLoader.

from torch.utils.data import DataLoader

data = [(i, 2*i) for i in range(100)]

dataset = MyDataset(data)
dataloader = DataLoader(dataset, batch_size=10, shuffle=True)

for batch_idx, (x_batch, y_batch) in enumerate(dataloader):
    print("Batch", batch_idx)
    print("X:", x_batch)
    print("Y:", y_batch)

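In this example, we create a dataset of 100 samples, where each sample is a pair of integers (i, 2*i). We then create a DataLoader with a batch size of 10 and shuffling enabled, which gives 10 batches per epoch. Finally, we iterate over the DataLoader and print each batch. Each x_batch and y_batch is a 1-D tensor of 10 values, and because of the shuffling the order of the values changes from run to run.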
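Since the printed output is shuffled, it can be hard to tell by eye that nothing was lost or mismatched. The following sketch, which repeats MyDataset so it is self-contained, checks three things about one pass over the loader: each batch has shape (10,), every y still equals 2 times its x, and each sample appears exactly once. The `seen` list is a helper introduced only for this check.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class MyDataset(Dataset):
    """Same class as earlier in the post, repeated for a runnable snippet."""

    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sample = self.data[idx]
        return torch.tensor(sample[0]), torch.tensor(sample[1])


data = [(i, 2 * i) for i in range(100)]
dataloader = DataLoader(MyDataset(data), batch_size=10, shuffle=True)

seen = []  # every x value encountered during one epoch
for x_batch, y_batch in dataloader:
    # Default collation stacks 10 scalar tensors into one 1-D tensor.
    assert x_batch.shape == (10,)
    # Shuffling reorders samples but keeps each (x, y) pair together.
    assert torch.equal(y_batch, 2 * x_batch)
    seen.extend(x_batch.tolist())

print(len(seen))                         # 100
print(sorted(seen) == list(range(100)))  # True
```

Iterating over the same dataloader a second time starts a new epoch with a fresh shuffle, so the batches differ but the same 100 samples are covered again.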

Conclusion

In this blog post, we went over how to use the Dataset and DataLoader classes in PyTorch to build custom datasets and dataloaders for deep learning models. A Dataset defines how to fetch a single sample, and a DataLoader handles batching and shuffling on top of it, so together they let us organize and load large datasets for training our models.
