Google has announced a new, very useful search engine named Dataset Search for scientists, data journalists and essentially anyone who relies on datasets for their work. The search engine is meant to provide easy access to the thousands of data repositories on the internet, covering ‘millions of datasets’, including data from local and national governments around the world. It works in the same manner as Google Scholar, the company’s dedicated search engine for academic studies and reports. Google states in its blog post, “Dataset Search lets you find datasets wherever they’re hosted, whether it’s a publisher's site, a digital library, or an author's personal web page.”
To create Dataset Search, Google developed guidelines for dataset providers to describe their datasets with metadata that Google and other search engines can better understand and surface online. Giving an overview of the guidelines, Natasha Noy, a Research Scientist at Google AI, says, “These guidelines include salient information about datasets: who created the dataset, when it was published, how the data was collected, what the terms are for using the data, etc. We then collect and link this information, analyze where different versions of the same dataset might be, and find publications that may be describing or discussing the dataset.”
Google's Dataset Search works in multiple languages, and the company says it will soon add support for more. Google has also encouraged dataset providers large and small to adopt the open standard set by Schema.org so that as many datasets as possible can be crawled by the search engine. Users will be able to find references to most datasets in the environmental and social sciences, along with data from other sources such as governments and news organizations like ProPublica.
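
For providers wondering what such a description might look like in practice, here is a minimal sketch in Python that builds a Schema.org `Dataset` record and wraps it in the JSON-LD script tag commonly embedded in a web page. The property names (`name`, `description`, `creator`, `datePublished`, `license`, `url`, `measurementTechnique`) come from the Schema.org vocabulary and roughly correspond to the details Noy lists; the helper functions and every example value, including the organization name and URLs, are hypothetical and used only for illustration. It is not taken from Google's guidelines and should not be read as the complete set of fields a provider needs.

```python
import json


def build_dataset_jsonld(name, description, creator, date_published,
                         license_url, url, measurement_technique=None):
    """Return a Schema.org Dataset description as a plain dict (hypothetical helper)."""
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,                      # title of the dataset
        "description": description,        # short summary of the contents
        "creator": {"@type": "Organization", "name": creator},  # who created it
        "datePublished": date_published,   # when it was published (ISO 8601 date)
        "license": license_url,            # terms for using the data
        "url": url,                        # page where the dataset lives
    }
    if measurement_technique:
        # how the data was collected
        record["measurementTechnique"] = measurement_technique
    return record


def to_script_tag(record):
    """Serialize the record as a JSON-LD script tag for embedding in HTML."""
    body = json.dumps(record, indent=2)
    return '<script type="application/ld+json">\n' + body + '\n</script>'


# All values below are made up for illustration.
example = build_dataset_jsonld(
    name="Example River Temperature Readings",
    description="Daily water temperature readings from a hypothetical monitoring station.",
    creator="Example Environmental Agency",
    date_published="2018-09-05",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    url="https://example.org/datasets/river-temperature",
    measurement_technique="Automated sensor sampling",
)

print(to_script_tag(example))
```

The printed tag would be placed in the HTML of the page that hosts the dataset, where a crawler that understands Schema.org markup can read it.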