Unsplash has just released “the most complete high-quality open image dataset ever”: an open-source collection of over 2 million images captured by over 200,000 photographers, now available to download in bulk for free.
On the off-chance that you’re not familiar with Unsplash, the website allows photographers to share their images with the world for free under a CC0, no rights reserved, public domain “license.” Designers, publications, bloggers, and many others use Unsplash to access and use these images as they see fit, with or without attribution.
Today’s announcement takes this one major step further by releasing over 2 million of the images in the Unsplash library as a single, bulk-downloadable, open-source dataset that scientists (or really anyone) can use as they see fit. This should be a particularly useful resource for researchers who need to train computer vision models, because all 2 million+ images come complete with:
- keyword-image conversions in search results
- community- and AI-generated keywords
- EXIF, location, and landmarks
- image categories and subcategories
- user-generated collections and groupings of images
- image view and download stats
“While there are other open source image datasets that exist, they’re usually limited in size, expose low quality images, lack variability in the image data, or rely on mass labeling by 3rd party services,” writes Unsplash on its blog. “With over 200,000+ contributing global photographers and data sourced from hundreds of millions searches across a nearly unlimited number of uses and contexts, the breadth of intent and semantics contained within the Unsplash dataset opens up entirely new use cases.”
The dataset is available in two versions: a “lite” version that contains 25,000 photos and can be used for both commercial and non-commercial purposes, and the full version, which contains all 2 million+ images but can only be used for non-commercial purposes. The lite dataset can be downloaded immediately, but you have to request access to the full dataset.