The datastore function creates a datastore, which is a repository for collections of data that are too large to fit in memory. A datastore allows you to read and process data stored in multiple files on a disk, a remote location, or a database as a single entity. If the data is too large to fit in memory, you can manage the incremental import of data, create a tall array to work with the data, or use the datastore as an input to mapreduce for further processing. For more information, see Getting Started with Datastore.
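The following is a minimal sketch of the basic workflow. It assumes the airlinesmall.csv sample file that ships with MATLAB. It creates a datastore, previews a few rows without importing the whole file, and then wraps the datastore in a tall array. Nothing is read in full until gather is called.

```matlab
% airlinesmall.csv ships with MATLAB as sample data (assumed to be on the path).
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');

% Import only the variable of interest to reduce memory use.
ds.SelectedVariableNames = {'ArrDelay'};

% preview returns a small table from the start of the data set.
firstRows = preview(ds);

% A tall array backed by the datastore: operations are deferred
% until gather evaluates them, one block of data at a time.
tt = tall(ds);
meanDelay = gather(mean(tt.ArrDelay, 'omitnan'));
```

The same datastore could instead be read block by block with hasdata and read, or passed to mapreduce.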
Getting Started with Datastore
A datastore is an object for reading a single file or a collection of files or data.
Select Datastore for File Format or Application
Choose the right datastore based on the file format of your data or application.
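As a rough illustration, the sketch below creates several format-specific datastores. It assumes the airlinesmall.csv, airlinesmall_subset.xlsx, and peppers.png sample files that ship with MATLAB. The MAT-file written to tempdir, and its name example_data.mat, are hypothetical stand-ins for a custom format read through fileDatastore.

```matlab
% Delimited text and spreadsheet files (sample files assumed on the path).
ttds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ssds = spreadsheetDatastore('airlinesmall_subset.xlsx');

% Image files.
imds = imageDatastore('peppers.png');

% Any other format: supply your own read function. Here a hypothetical
% MAT-file is created first so the example has something to read.
S.x = magic(3);
matFile = fullfile(tempdir, 'example_data.mat');
save(matFile, '-struct', 'S');
fds = fileDatastore(matFile, 'ReadFcn', @load);

% The generic datastore function chooses a type from the file extension.
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
disp(class(ds))
```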
Read and Analyze Large Tabular Text File
This example shows how to create a datastore for a large text file containing tabular data, and then read and process the data one block at a time or one file at a time.
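A minimal sketch of both reading modes, again assuming the airlinesmall.csv sample file. The block size of 10000 rows is an arbitrary choice. Setting ReadSize to 'file' makes each call to read return the full contents of one file.

```matlab
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = {'Year', 'ArrDelay'};

% Block at a time: each read returns at most ReadSize rows.
ds.ReadSize = 10000;
delaySum = 0;
delayCount = 0;
while hasdata(ds)
    T = read(ds);
    valid = ~isnan(T.ArrDelay);          % skip missing values ('NA')
    delaySum = delaySum + sum(T.ArrDelay(valid));
    delayCount = delayCount + nnz(valid);
end
meanDelay = delaySum / delayCount;       % mean computed incrementally

% File at a time: rewind, then read one whole file per call.
reset(ds);
ds.ReadSize = 'file';
while hasdata(ds)
    T = read(ds);
    fprintf('Read %d rows\n', height(T));
end
```

Accumulating a running sum and count keeps only one block in memory at any time.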
Read and Analyze Image Files
This example shows how to create a datastore for a collection of images, read the image files, and find the images with the maximum average hue, saturation, and brightness (HSV).
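The sketch below outlines the same idea. It assumes the street1.jpg, street2.jpg, and peppers.png sample images that ship with MATLAB, so substitute your own files or a folder path. Each image is converted with rgb2hsv and averaged per channel.

```matlab
% Sample images assumed to be on the MATLAB path.
imds = imageDatastore({'street1.jpg', 'street2.jpg', 'peppers.png'});

numImages = numel(imds.Files);
avgHSV = zeros(numImages, 3);            % one row per image: mean H, S, V

for k = 1:numImages
    img = readimage(imds, k);
    if size(img, 3) == 1
        img = repmat(img, 1, 1, 3);      % treat grayscale as RGB
    end
    hsv = rgb2hsv(img);                  % double values in [0, 1]
    avgHSV(k, :) = squeeze(mean(mean(hsv, 1), 2))';
end

% Column-wise max: index of the image with the largest mean H, S, and V.
[~, idxMax] = max(avgHSV, [], 1);
maxHueImage    = imds.Files{idxMax(1)};
maxSatImage    = imds.Files{idxMax(2)};
maxBrightImage = imds.Files{idxMax(3)};
```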
Read and Analyze MAT-File with Key-Value Data
This example shows how to create a datastore for key-value pair data in a MAT-file that is the output of mapreduce.
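The following sketch produces such key-value data and reads it back. It assumes the airlinesmall.csv sample file, and the map and reduce functions (countMapper, countReducer) are hypothetical names. It counts flights per carrier with mapreduce, which writes its results to MAT-files and returns a KeyValueDatastore. Local functions must appear at the end of a script.

```matlab
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', {'UniqueCarrier'});

% Write the results to a fresh folder.
outFolder = tempname;
mkdir(outFolder);
outds = mapreduce(ds, @countMapper, @countReducer, ...
    'OutputFolder', outFolder, 'Display', 'off');

% outds reads the key-value pairs stored in the output MAT-files.
results = readall(outds);                % table with Key and Value variables

function countMapper(data, ~, intermKVStore)
    % Count flights per carrier in this block; emit one pair per carrier.
    [carriers, ~, idx] = unique(data.UniqueCarrier);
    counts = accumarray(idx, 1);
    addmulti(intermKVStore, carriers, num2cell(counts));
end

function countReducer(key, intermValIter, outKVStore)
    % Sum the per-block counts for one carrier.
    total = 0;
    while hasnext(intermValIter)
        total = total + getnext(intermValIter);
    end
    add(outKVStore, key, total);
end
```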
Read and Analyze Hadoop Sequence File
This example shows how to create a datastore for a Sequence file containing key-value data.
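A minimal sketch, assuming a Sequence file named mapredout.seq is on the path. The file name is an assumption, so point it at your own Hadoop Sequence file. datastore recognizes the .seq extension and returns a KeyValueDatastore.

```matlab
% Hypothetical file name; replace with your own Sequence file.
ds = datastore('mapredout.seq');         % KeyValueDatastore

ds.ReadSize = 4;                         % key-value pairs per read
while hasdata(ds)
    T = read(ds);                        % table with Key and Value variables
    disp(T)
end
```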
Work with Remote Data
Work with remote data in Amazon S3™, Microsoft® Azure® Storage Blob, or HDFS™.
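As a rough sketch, a remote file is passed to datastore as a URL in place of a local path. The bucket name, object path, region, and credential values below are all placeholders. The credentials are supplied through environment variables before the datastore is created.

```matlab
% Placeholder credentials and region (assumed values).
setenv('AWS_ACCESS_KEY_ID', 'YOUR_AWS_ACCESS_KEY_ID');
setenv('AWS_SECRET_ACCESS_KEY', 'YOUR_AWS_SECRET_ACCESS_KEY');
setenv('AWS_DEFAULT_REGION', 'us-east-1');

% Hypothetical bucket and object, addressed with an s3:// URL.
ds = datastore('s3://my-bucket/airline/airlinesmall.csv', 'TreatAsMissing', 'NA');
T = read(ds);

% Other back ends use their own URL schemes (placeholder paths):
%   Azure Blob: 'wasbs://container@account.blob.core.windows.net/path/file.csv'
%   HDFS:       'hdfs://hadoopserver:9000/path/file.csv'
```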
Set Up Datastore for Processing on Different Machines or Clusters
Set up a datastore on your machine that can be loaded and processed on another machine or cluster.
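One way to do this is the AlternateFileSystemRoots property, sketched below. The mount points Z:\DataSet (a Windows desktop) and /nfs-bldg001/DataSet (a Linux cluster) are hypothetical, and the data set is assumed to exist at both. The saved datastore object can be loaded on the other machine, where its file paths are mapped to the root that exists there.

```matlab
% Each row of AlternateFileSystemRoots lists equivalent root paths.
ds = tabularTextDatastore('Z:\DataSet\*.csv', ...
    'AlternateFileSystemRoots', {'Z:\DataSet', '/nfs-bldg001/DataSet'});

% Save the datastore object so another machine can load it.
save('dsOnDesktop.mat', 'ds');

% On the cluster (a separate MATLAB session): load and read as usual.
S = load('dsOnDesktop.mat');
dsCluster = S.ds;
T = read(dsCluster);
```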
Develop Custom Datastore
Create a fully customized datastore for your custom or proprietary data.
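A minimal sketch of a custom datastore, saved as its own file MyFileDatastore.m. The class name is hypothetical, and readtable stands in for your proprietary reader. It subclasses matlab.io.Datastore and implements hasdata, read, reset, and progress, returning one file per call to read.

```matlab
classdef MyFileDatastore < matlab.io.Datastore
    % Hypothetical custom datastore: each read returns one whole file.

    properties (Access = private)
        Files          % cell array of file paths
        CurrentIndex   % index of the next file to read
    end

    methods
        function ds = MyFileDatastore(files)
            ds.Files = cellstr(files);
            reset(ds);
        end

        function tf = hasdata(ds)
            tf = ds.CurrentIndex <= numel(ds.Files);
        end

        function [data, info] = read(ds)
            if ~hasdata(ds)
                error('MyFileDatastore:noData', 'No more data to read.');
            end
            fileName = ds.Files{ds.CurrentIndex};
            data = readtable(fileName);          % replace with your own reader
            info = struct('FileName', fileName);
            ds.CurrentIndex = ds.CurrentIndex + 1;
        end

        function reset(ds)
            ds.CurrentIndex = 1;
        end

        function frac = progress(ds)
            % Fraction of files already read (1 when there are no files).
            if isempty(ds.Files)
                frac = 1;
            else
                frac = (ds.CurrentIndex - 1) / numel(ds.Files);
            end
        end
    end
end
```

Because matlab.io.Datastore is a handle class, updating CurrentIndex inside read persists between calls without returning the object.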
Develop Custom Datastore for DICOM Data
This example shows how to develop a custom datastore that supports writing operations.
Testing Guidelines for Custom Datastores
After implementing your custom datastore, follow this test procedure to qualify it.
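The helper below is a hypothetical function file, checkDatastoreBasics.m. It sketches a few sanity checks that any datastore, built-in or custom, should pass. It is not the full qualification procedure. It checks that reset rewinds the data, that reading to the end terminates, and that reading past the end raises an error.

```matlab
function checkDatastoreBasics(ds)
% checkDatastoreBasics  Hypothetical helper with basic datastore checks.

    reset(ds);
    assert(hasdata(ds), 'A freshly reset datastore should have data.');

    % reset must rewind: the first read after reset repeats the first block.
    first = read(ds);
    reset(ds);
    again = read(ds);
    assert(isequaln(first, again), 'read after reset returned different data.');

    % Reading to the end must terminate and leave hasdata false.
    reset(ds);
    while hasdata(ds)
        read(ds);
    end
    assert(~hasdata(ds), 'hasdata should be false after reading all data.');

    % Reading past the end should error rather than return data.
    threw = false;
    try
        read(ds);
    catch
        threw = true;
    end
    assert(threw, 'read past the end should error.');

    reset(ds);   % leave the datastore in its initial state
end
```

For example, checkDatastoreBasics(tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA')) runs the checks on a built-in datastore.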