
Deep learning models require large amounts of data for training, and organizing that data can be complex. PyTorch provides the Dataset and DataLoader classes to help. In this blog post, we will go over how to use these classes to build custom datasets and dataloaders for deep learning models.

Dataset

The Dataset class (torch.utils.data.Dataset) represents a dataset and provides an interface for accessing its individual samples. You can subclass Dataset to represent your own data.

Here is an example implementation of a custom Dataset class:

import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, data):
        # data: a list of (input, target) pairs
        self.data = data

    def __len__(self):
        # number of samples in the dataset
        return len(self.data)

    def __getitem__(self, idx):
        # return the sample at the given index as a pair of tensors
        sample = self.data[idx]
        x = torch.tensor(sample[0])
        y = torch.tensor(sample[1])
        return x, y

In this example, MyDataset is a custom Dataset subclass that takes a list of data samples as input. The __init__ method stores the provided data, __len__ returns the number of samples, and __getitem__ returns the sample at the given index, in this case as two tensors, x and y.
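Because __len__ and __getitem__ are defined, the dataset supports the standard len() function and square-bracket indexing. Here is a minimal sketch of direct access, assuming a small list of (input, target) pairs:

data = [(1, 2), (3, 6), (5, 10)]
dataset = MyDataset(data)

print(len(dataset))  # 3
x, y = dataset[1]    # calls __getitem__ with idx=1
print(x, y)          # tensor(3) tensor(6)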

DataLoader

The DataLoader class provides an iterable over a dataset. It handles batching and shuffling, and it can load samples in parallel worker processes while a deep learning model trains. A DataLoader takes a Dataset object as input and yields batches of samples.

Here is an example of creating a DataLoader:

from torch.utils.data import DataLoader

batch_size = 32

# `data` is a list of (input, target) pairs, as in the Usage section below
dataset = MyDataset(data)
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

In this example, MyDataset is the custom dataset that we created earlier. The DataLoader is created by passing in the dataset along with the batch_size and shuffle parameters. batch_size sets the number of samples loaded in each batch, while shuffle controls whether the samples are reshuffled at the start of each epoch.
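DataLoader accepts several other commonly used parameters as well. The sketch below adds num_workers, which loads batches in parallel worker processes, and drop_last, which discards a final batch smaller than batch_size; the values shown are illustrative, not requirements:

dataloader = DataLoader(
    dataset,
    batch_size=batch_size,
    shuffle=True,    # reshuffle the samples at the start of each epoch
    num_workers=4,   # load batches in 4 parallel worker processes
    drop_last=True,  # skip the final batch if it has fewer than batch_size samples
)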


Usage

Let's create a sample dataset and use it with the DataLoader.

from torch.utils.data import DataLoader

# 100 samples, each a pair of integers (i, 2*i)
data = [(i, 2*i) for i in range(100)]

dataset = MyDataset(data)
dataloader = DataLoader(dataset, batch_size=10, shuffle=True)

# iterate over the batches; each holds 10 x values and 10 y values
for batch_idx, (x_batch, y_batch) in enumerate(dataloader):
    print("Batch", batch_idx)
    print("X:", x_batch)
    print("Y:", y_batch)

In this example, we create a dataset of 100 samples, where each sample is a pair of integers (i, 2*i). We then create a DataLoader with a batch size of 10 and shuffling enabled. Finally, we iterate over the DataLoader and print each batch of samples.
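In an actual training script, this same loop feeds batches to a model. The following sketch shows where the DataLoader fits, using a toy nn.Linear model and MSE loss as placeholders (they are assumptions for illustration, not part of the example above):

import torch
import torch.nn as nn

model = nn.Linear(1, 1)  # toy model: learn y = 2*x (placeholder for a real network)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for x_batch, y_batch in dataloader:
    # convert integer batches to float column vectors for the linear layer
    x = x_batch.float().unsqueeze(1)
    y = y_batch.float().unsqueeze(1)

    optimizer.zero_grad()         # clear gradients from the previous step
    loss = loss_fn(model(x), y)   # forward pass and loss
    loss.backward()               # backpropagate
    optimizer.step()              # update parameters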

Conclusion

In this blog post, we went over how to use the Dataset and DataLoader classes in PyTorch to build custom datasets and dataloaders for deep learning models. With these classes, we can easily organize and load large datasets for training our models.


Related posts:

2023.02.22 - PyTorch : #1 Tensors

2023.02.23 - PyTorch : #2 Autograd

2023.02.25 - PyTorch : #4 Building Neural Network


 
