To load a dataset in one of the above formats, you just need to pass the name of the format to the load_dataset() function, along with the data_files argument, which points to one or more file paths or URLs.
Here, the dataset is loaded automatically as a DatasetDict object, with each column in the CSV file represented as a feature.
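As a minimal sketch of loading a local CSV file, the helpers below write a tiny file and load it. The file name, column names, and helper functions are made up for illustration, and the sketch assumes the datasets library is installed.

```python
import csv
from pathlib import Path


def write_sample_csv(directory: str) -> str:
    """Write a tiny two-column CSV file (hypothetical data) and return its path."""
    path = Path(directory) / "sample.csv"
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["review", "label"])
        writer.writerow(["great wine", 1])
        writer.writerow(["too sour", 0])
    return str(path)


def load_local_csv(path: str):
    """Load a local CSV file with the "csv" builder.

    With no split specified, the result is a DatasetDict with a "train" split.
    """
    # Imported here so the helper above can run without the library installed.
    from datasets import load_dataset

    return load_dataset("csv", data_files=path)
```

Calling load_local_csv(write_sample_csv(some_dir)) should give a DatasetDict whose "train" split has review and label as its features. data_files also accepts a list of paths, or a dictionary such as {"train": ..., "test": ...} that maps split names to files.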
Loading from remote
Remote datasets can be loaded by passing URLs to the data_files argument.
Loading csv files
# Load the dataset directly from the URL
dataset_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv"
remote_csv_dataset = load_dataset("csv", data_files=dataset_url, sep=";")  # sep works as in pandas.read_csv
remote_csv_dataset
Here, the data_files argument points to a URL instead of a local file path.
Loading raw text files
Raw text files are read line by line to build the dataset, so each line becomes one example.
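A small sketch of this, assuming the datasets library is installed. The sample file and both helper names are hypothetical.

```python
from pathlib import Path


def write_sample_text(directory: str) -> str:
    """Write a three-line text file (hypothetical content) and return its path."""
    path = Path(directory) / "sample.txt"
    path.write_text("first line\nsecond line\nthird line\n", encoding="utf-8")
    return str(path)


def load_text_lines(path: str):
    """Load a raw text file with the "text" builder.

    Each line of the file becomes one row in a single column named "text".
    """
    # Imported here so the helper above can run without the library installed.
    from datasets import load_dataset

    return load_dataset("text", data_files=path)
```

For the sample file above, load_text_lines(path)["train"] would be expected to hold three rows, one per line.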
- The other JSON layout is nested: the records sit under a field (key) of a top-level object.
- These files look like one large dictionary, so load_dataset() lets you specify which key to load through the field argument.
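The nested case can be sketched as follows, assuming the datasets library is installed. The file layout, the "data" key, and the helper names are illustrative assumptions, not taken from a particular dataset.

```python
import json
from pathlib import Path


def write_nested_json(directory: str) -> str:
    """Write a nested JSON file whose records live under the "data" key."""
    path = Path(directory) / "nested.json"
    payload = {
        "version": "0.1",  # top-level metadata that should not become rows
        "data": [
            {"text": "first example", "label": 0},
            {"text": "second example", "label": 1},
        ],
    }
    path.write_text(json.dumps(payload), encoding="utf-8")
    return str(path)


def load_nested_json(path: str, field: str = "data"):
    """Load only the records stored under `field` of the top-level object."""
    # Imported here so the helper above can run without the library installed.
    from datasets import load_dataset

    return load_dataset("json", data_files=path, field=field)
```

With field="data", only the list under that key is turned into rows, so the "version" entry is ignored.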