This page provides access to, and statistics about, class-specific subsets of the Schema.org data contained in the October 2016 version of the Web Data Commons Microdata corpus. The datasets are part of the Web Data Commons Schema.org Data Set Series.
As many users are only interested in specific types of Schema.org data (such as product, event, or address data), we have extracted class-specific subsets from the complete Microdata corpus for a selection of schema.org classes. Each subset contains all instances of a specific class as well as all other data found on the web pages that contain these instances. For example, a page containing data about a product might also contain reviews and offers for this product; a page containing data about an event might also contain data about the location of the event and the persons involved in it. The data is represented in N-Quads format, meaning that the fourth element of each quad contains the URL of the web page from which the data was extracted.
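The N-Quads layout and the subset idea described above can be illustrated with a short Python sketch. It is not the Web Data Commons extraction code: `parse_quad` and `class_subset` are hypothetical helper names, the regular expression covers the common N-Quads term forms (IRIs, blank nodes, plain, language-tagged and typed literals) but is not a full validator, and the whole input is held in memory, which will not scale to multi-gigabyte files. A page is treated as belonging to a class subset if its graph (the fourth element, i.e. the page URL) contains at least one `rdf:type` quad for that class; all quads of such pages are then kept.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Simplified term patterns (not a complete N-Quads grammar).
IRI = r"<[^>]*>"
BNODE = r"_:\S+"
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^<[^>]*>)?'

# subject predicate object [graph] .
QUAD_RE = re.compile(
    rf"^\s*({IRI}|{BNODE})\s+({IRI})\s+({IRI}|{BNODE}|{LITERAL})"
    rf"(?:\s+({IRI}|{BNODE}))?\s*\.\s*$"
)

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: Optional[str]  # in these files: the URL of the source web page


def parse_quad(line: str) -> Optional[Quad]:
    """Split one N-Quads line into raw terms; None for blank, comment or unparsable lines."""
    m = QUAD_RE.match(line)
    if m is None:
        return None
    return Quad(*m.groups())


def class_subset(quads: Iterable[Optional[Quad]], class_iri: str) -> List[Quad]:
    """Keep every quad from pages (graphs) that contain an instance of class_iri."""
    parsed = [q for q in quads if q is not None]
    target = f"<{class_iri}>"
    # First pass: collect pages that type at least one entity as the target class.
    pages = {
        q.graph
        for q in parsed
        if q.predicate == RDF_TYPE and q.obj == target and q.graph is not None
    }
    # Second pass: keep all data from those pages (offers, reviews, places, ...).
    return [q for q in parsed if q.graph in pages]
```

For example, `class_subset(map(parse_quad, lines), "http://schema.org/Product")` returns the product quads of each matching page together with the reviews or offers found on the same page, while quads from pages without a Product are dropped.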
Please note that
Class Name | Total Number of Quads, URLs, and Hosts | Top Classes (Entity Count) | Total File Size | Quad File |
---|---|---|---|---|
http://schema.org/AdministrativeArea | Quads: 12,867,181; URLs: 255,525; Hosts: 227 | http://schema.org/City (520,388); http://schema.org/AdministrativeArea (411,321); http://schema.org/CityHall (405,500); http://schema.org/Service (225,186); http://schema.org/ListItem (214,727) | 226 MB | schema_AdministrativeArea.gz (sample) |
http://schema.org/Airport | Quads: 117,446,415; URLs: 1,495,145; Hosts: 115 | http://schema.org/Airport (28,832,728); http://schema.org/Place (174,243); http://schema.org/Thing (119,340); http://schema.org/Flight (86,205); http://schema.org/Offer (86,101) | 1.9 GB | schema_Airport.gz (sample) |
http://schema.org/Book | Quads: 464,195,676; URLs: 16,736,122; Hosts: 3,872 | http://schema.org/Book (26,677,797); http://schema.org/Offer (20,073,618); http://schema.org/Person (7,463,979); http://schema.org/AggregateRating (3,549,711); http://schema.org/Organization (2,918,703) | 11.2 GB | schema_Book.gz (sample) |
http://schema.org/City | Quads: 345,260,211; URLs: 1,385,732; Hosts: 521 | http://schema.org/City (13,202,648); http://schema.org/PostalAddress (12,990,504); http://schema.org/GeoCoordinates (10,904,723); http://schema.org/LocalBusiness (10,238,380); http://schema.org/Person (8,839,107) | 4.5 GB | schema_City.gz (sample) |
http://schema.org/CollegeOrUniversity | Quads: 229,112,562; URLs: 889,160; Hosts: 498 | http://schema.org/PostalAddress (11,109,133); http://schema.org/GeoCoordinates (10,895,802); http://schema.org/LocalBusiness (10,842,093); http://schema.org/Person (10,136,115); http://schema.org/CollegeOrUniversity (7,438,404) | 3.2 GB | schema_CollegeOrUniversity.gz (sample) |
http://schema.org/Continent | Quads: 3,573,911; URLs: 79,354; Hosts: 14 | http://schema.org/City (429,631); http://schema.org/AdministrativeArea (137,014); http://schema.org/GeoCoordinates (84,724); http://schema.org/Continent (79,994); http://schema.org/Country (79,473) | 46.4 MB | schema_Continent.gz (sample) |
http://schema.org/Country | Quads: 153,811,837; URLs: 1,642,362; Hosts: 540 | http://schema.org/PostalAddress (5,614,281); http://schema.org/Person (5,283,042); http://schema.org/Rating (4,033,791); http://schema.org/Review (3,772,341); http://schema.org/Thing (3,638,809) | 2.9 GB | schema_Country.gz (sample) |
http://schema.org/CreativeWork | Quads: 984,391,890; URLs: 15,454,028; Hosts: 133,144 | http://schema.org/Person (38,425,951); http://schema.org/CreativeWork (32,003,991); http://schema.org/Comment (22,677,362); http://schema.org/SiteNavigationElement (6,893,097); http://schema.org/WPSideBar (6,089,019) | 29.3 GB | schema_CreativeWork.gz (sample) |
http://schema.org/EducationalOrganization | Quads: 23,563,769; URLs: 522,190; Hosts: 1,696 | http://schema.org/EducationalOrganization (1,360,732); http://schema.org/PostalAddress (658,804); http://schema.org/Person (472,332); http://schema.org/Place (447,112); http://schema.org/EducationEvent (407,613) | 428.7 MB | schema_EducationalOrganization.gz (sample) |
http://schema.org/Event | Quads: 321,907,077; URLs: 4,574,989; Hosts: 28,892 | http://schema.org/Event (24,483,731); http://schema.org/Place (16,398,269); http://schema.org/PostalAddress (11,438,308); http://schema.org/Offer (2,158,681); http://schema.org/ListItem (2,000,867) | 6.1 GB | schema_Event.gz (sample) |
http://schema.org/GeoCoordinates | Quads: 1,851,305,512; URLs: 19,122,471; Hosts: 40,316 | http://schema.org/GeoCoordinates (71,899,160); http://schema.org/PostalAddress (60,579,078); http://schema.org/LocalBusiness (29,218,141); http://schema.org/Person (20,567,763); http://schema.org/Offer (19,233,560) | 30.6 GB | schema_GeoCoordinates.gz (sample) |
http://schema.org/GovernmentOrganization | Quads: 2,240,980; URLs: 163,349; Hosts: 240 | http://schema.org/GovernmentOrganization (179,280); http://schema.org/PostalAddress (31,230); http://schema.org/ListItem (26,065); http://schema.org/Event (25,504); http://schema.org/CreativeWork (20,672) | 52.3 MB | schema_GovernmentOrganization.gz (sample) |
http://schema.org/Hospital | Quads: 4,008,412; URLs: 191,322; Hosts: 335 | http://schema.org/PostalAddress (230,851); http://schema.org/Hospital (214,799); http://schema.org/Physician (66,946); http://schema.org/MedicalSpecialty (49,872); http://schema.org/Place (29,715) | 76.7 MB | schema_Hospital.gz (sample) |
http://schema.org/Hotel | Quads: 766,147,285; URLs: 18,541,645; Hosts: 7,864 | http://schema.org/Hotel (61,893,699); http://schema.org/LandmarksOrHistoricalBuildings (31,170,163); http://schema.org/ListItem (19,257,013); http://schema.org/PostalAddress (14,638,100); http://schema.org/AggregateRating (12,202,799) | 15.4 GB | schema_Hotel.gz (sample) |
http://schema.org/JobPosting | Quads: 685,258,597; URLs: 7,022,639; Hosts: 6,352 | http://schema.org/JobPosting (62,403,002); http://schema.org/Place (39,055,711); http://schema.org/PostalAddress (28,237,579); http://schema.org/Organization (19,385,367); http://schema.org/Postaladdress (4,341,982) | 16.9 GB | schema_JobPosting.gz (sample) |
http://schema.org/LakeBodyOfWater | Quads: 171,345; URLs: 1,467; Hosts: 16 | http://schema.org/PostalAddress (8,476); http://schema.org/GeoCoordinates (8,436); http://schema.org/LakeBodyOfWater (2,986); http://schema.org/City (1,060); http://schema.org/Park (743) | 2.7 MB | schema_LakeBodyOfWater.gz (sample) |
http://schema.org/LandmarksOrHistoricalBuildings | Quads: 294,849,926; URLs: 2,455,599; Hosts: 130 | http://schema.org/LandmarksOrHistoricalBuildings (31,440,648); http://schema.org/Hotel (30,562,461); http://schema.org/ListItem (8,769,248); http://schema.org/BreadcrumbList (2,224,914); http://schema.org/PostalAddress (292,271) | 4.8 GB | schema_LandmarksOrHistoricalBuildings.gz (sample) |
http://schema.org/Language | Quads: 4,326,692; URLs: 96,159; Hosts: 363 | http://schema.org/Language (141,622); http://schema.org/SiteNavigationElement (141,170); http://schema.org/Thing (99,858); http://schema.org/PostalAddress (47,830); http://schema.org/Organization (45,694) | 116.4 MB | schema_Language.gz (sample) |
http://schema.org/Library | Quads: 8,251,280; URLs: 130,328; Hosts: 81 | http://schema.org/Photograph (495,244); http://schema.org/CreativeWork (363,460); http://schema.org/Organization (176,994); http://schema.org/PostalAddress (163,276); http://schema.org/Library (135,073) | 152.8 MB | schema_Library.gz (sample) |
http://schema.org/LocalBusiness | Quads: 1,838,022,274; URLs: 28,125,441; Hosts: 192,558 | http://schema.org/LocalBusiness (99,624,343); http://schema.org/PostalAddress (76,694,627); http://schema.org/Person (72,164,102); http://schema.org/GeoCoordinates (28,489,404); http://schema.org/ListItem (19,844,281) | 30 GB | schema_LocalBusiness.gz (sample) |
http://schema.org/Mountain | Quads: 260,198; URLs: 2,243; Hosts: 36 | http://schema.org/Mountain (11,717); http://schema.org/GeoCoordinates (10,429); http://schema.org/PostalAddress (10,015); http://schema.org/Review (2,975); http://schema.org/City (1,022) | 4 MB | schema_Mountain.gz (sample) |
http://schema.org/Movie | Quads: 285,580,395; URLs: 7,609,774; Hosts: 3,946 | http://schema.org/Person (19,440,026); http://schema.org/Movie (15,957,304); http://schema.org/AggregateRating (2,677,268); http://schema.org/Organization (1,276,542); http://schema.org/ImageGallery (1,166,362) | 6.5 GB | schema_Movie.gz (sample) |
http://schema.org/Museum | Quads: 4,271,553; URLs: 96,085; Hosts: 123 | http://schema.org/Painting (394,098); http://schema.org/Review (115,535); http://schema.org/Museum (100,589); http://schema.org/GeoCoordinates (96,933); http://schema.org/PostalAddress (79,377) | 92.2 MB | schema_Museum.gz (sample) |
http://schema.org/MusicAlbum | Quads: 230,666,205; URLs: 2,109,717; Hosts: 870 | http://schema.org/MusicRecording (23,707,647); http://schema.org/MusicAlbum (11,245,618); http://schema.org/Offer (7,357,181); http://schema.org/AudioObject (7,293,035); http://schema.org/CreativeWork (2,616,054) | 3.7 GB | schema_MusicAlbum.gz (sample) |
http://schema.org/MusicRecording | Quads: 272,171,578; URLs: 3,800,484; Hosts: 2,910 | http://schema.org/MusicRecording (31,315,671); http://schema.org/MusicAlbum (9,066,064); http://schema.org/Offer (7,596,965); http://schema.org/AudioObject (7,472,527); http://schema.org/MusicGroup (2,825,148) | 4.5 GB | schema_MusicRecording.gz (sample) |
http://schema.org/Organization | Quads: 1,496,400,771; URLs: 117,692,921; Hosts: 234,353 | http://schema.org/Organization (213,999,787); http://schema.org/Product (150,032,832); http://schema.org/Offer (118,232,568); http://schema.org/PostalAddress (73,540,221); http://schema.org/ListItem (49,103,678) | 148.4 GB | schema_Organization.gz (sample) |
http://schema.org/Painting | Quads: 11,613,061; URLs: 251,249; Hosts: 104 | http://schema.org/Painting (1,370,332); http://schema.org/Offer (694,764); http://schema.org/UserComments (207,747); http://schema.org/Person (92,908); http://schema.org/AggregateRating (75,420) | 247.6 MB | schema_Painting.gz (sample) |
http://schema.org/Park | Quads: 513,019; URLs: 5,177; Hosts: 65 | http://schema.org/GeoCoordinates (25,998); http://schema.org/PostalAddress (24,747); http://schema.org/Park (11,917); http://schema.org/City (2,964); http://schema.org/TouristAttraction (2,434) | 8.9 MB | schema_Park.gz (sample) |
http://schema.org/Person | Quads: 4,652,061,588; URLs: 72,005,129; Hosts: 155,740 | http://schema.org/Person (371,576,977); http://schema.org/PostalAddress (54,141,759); http://schema.org/ImageObject (51,991,923); http://schema.org/Comment (49,274,622); http://schema.org/ListItem (28,490,236) | 134 GB | schema_Person.gz (sample) |
http://schema.org/Product | Quads: 9,573,867,608; URLs: 213,426,706; Hosts: 249,947 | http://schema.org/Product (682,974,060); http://schema.org/Offer (497,856,160); http://schema.org/ListItem (98,461,674); http://schema.org/AggregateRating (91,093,293); http://schema.org/Organization (75,855,786) | 203.1 GB | schema_Product.gz (sample) |
http://schema.org/Place | Quads: 1,543,812,514; URLs: 19,112,166; Hosts: 52,323 | http://schema.org/Place (92,742,830); http://schema.org/PostalAddress (74,013,479); http://schema.org/JobPosting (38,191,134); http://schema.org/GeoCoordinates (18,231,275); http://schema.org/Offer (17,722,545) | 37 GB | schema_Place.gz (sample) |
http://schema.org/RadioStation | Quads: 3,554,366; URLs: 255,536; Hosts: 91 | http://schema.org/RadioStation (270,352); http://schema.org/PostalAddress (68,297); http://schema.org/MusicVideoObject (55,608); http://schema.org/ImageObject (55,583); http://schema.org/VideoObject (55,582) | 76.8 MB | schema_RadioStation.gz (sample) |
http://schema.org/Recipe | Quads: 182,004,171; URLs: 4,410,628; Hosts: 11,753 | http://schema.org/Recipe (8,007,856); http://schema.org/Person (2,732,926); http://schema.org/AggregateRating (2,641,258); http://schema.org/Comment (2,080,323); http://schema.org/NutritionInformation (780,104) | 5 GB | schema_Recipe.gz (sample) |
http://schema.org/Restaurant | Quads: 78,085,247; URLs: 1,445,902; Hosts: 14,198 | http://schema.org/Restaurant (3,272,390); http://schema.org/ImageObject (1,769,750); http://schema.org/PostalAddress (1,754,253); http://schema.org/Review (1,722,032); http://schema.org/Rating (1,440,898) | 1.6 GB | schema_Restaurant.gz (sample) |
http://schema.org/RiverBodyOfWater | Quads: 154,418; URLs: 1,328; Hosts: 11 | http://schema.org/PostalAddress (7,739); http://schema.org/GeoCoordinates (7,653); http://schema.org/RiverBodyOfWater (3,066); http://schema.org/City (908); http://schema.org/LakeBodyOfWater (565) | 2.4 MB | schema_RiverBodyOfWater.gz (sample) |
http://schema.org/School | Quads: 3,829,478; URLs: 92,617; Hosts: 297 | http://schema.org/School (216,514); http://schema.org/Review (134,069); http://schema.org/Person (126,790); http://schema.org/Rating (125,541); http://schema.org/ListItem (82,540) | 97.1 MB | schema_School.gz (sample) |
http://schema.org/ShoppingCenter | Quads: 1,469,275; URLs: 21,806; Hosts: 133 | http://schema.org/ShoppingCenter (50,571); http://schema.org/PostalAddress (44,224); http://schema.org/ListItem (26,238); http://schema.org/ImageObject (24,613); http://schema.org/GeoCoordinates (20,144) | 23.6 MB | schema_ShoppingCenter.gz (sample) |
http://schema.org/SkiResort | Quads: 199,460; URLs: 18,365; Hosts: 33 | http://schema.org/SkiResort (18,879); http://schema.org/AggregateRating (6,076); http://schema.org/Review (4,867); http://schema.org/Person (4,859); http://schema.org/PostalAddress (2,279) | 6.9 MB | schema_SkiResort.gz (sample) |
http://schema.org/SportsEvent | Quads: 120,489,089; URLs: 963,965; Hosts: 540 | http://schema.org/SportsEvent (13,221,448); http://schema.org/SportsTeam (2,629,759); http://schema.org/Article (999,321); http://schema.org/Place (753,869); http://schema.org/PostalAddress (448,505) | 1.9 GB | schema_SportsEvent.gz (sample) |
http://schema.org/SportsTeam | Quads: 42,213,715; URLs: 544,020; Hosts: 449 | http://schema.org/SportsTeam (3,357,128); http://schema.org/SportsEvent (1,351,172); http://schema.org/Person (1,217,202); http://schema.org/Article (1,086,876); http://schema.org/SiteNavigationElement (254,801) | 792.8 MB | schema_SportsTeam.gz (sample) |
http://schema.org/StadiumOrArena | Quads: 3,017,261; URLs: 39,171; Hosts: 70 | http://schema.org/Person (380,954); http://schema.org/PostalAddress (85,646); http://schema.org/SportsTeam (77,401); http://schema.org/StadiumOrArena (75,385); http://schema.org/SportsEvent (52,791) | 46 MB | schema_StadiumOrArena.gz (sample) |
http://schema.org/TVEpisode | Quads: 101,723,242; URLs: 1,478,776; Hosts: 273 | http://schema.org/TVEpisode (9,709,117); http://schema.org/Person (2,511,534); http://schema.org/TVSeries (1,269,228); http://schema.org/Review (1,067,343); http://schema.org/TVSeason (865,047) | 2.1 GB | schema_TVEpisode.gz (sample) |
http://schema.org/TelevisionStation | Quads: 371,960; URLs: 10,239; Hosts: 27 | http://schema.org/TelevisionStation (29,541); http://schema.org/Article (7,186); http://schema.org/AggregateRating (6,769); http://schema.org/PostalAddress (2,389); http://schema.org/Offer (1,873) | 6.9 MB | schema_TelevisionStation.gz (sample) |
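Statistics of the kind shown in the table (quads, URLs, hosts, and the most frequent classes) can be approximated locally for a downloaded subset or sample file by streaming the gzipped N-Quads. The sketch below is not the script that produced the published numbers, which may have been computed differently. It rests on three stated assumptions: every line ends with a graph label holding the page URL, each `rdf:type` quad counts as one entity, and "Hosts" means distinct hostnames of those page URLs. The function names `subset_stats` and `stats_for_file` are hypothetical.

```python
import gzip
import re
from collections import Counter
from typing import Dict, Iterable
from urllib.parse import urlsplit

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
# Assumption: the last IRI before the terminating " ." is the graph (page URL).
GRAPH_RE = re.compile(r"<([^>]*)>\s*\.\s*$")


def subset_stats(lines: Iterable[str], top_n: int = 5) -> Dict[str, object]:
    """Count quads, distinct page URLs, distinct hosts and rdf:type quads per class."""
    quads = 0
    urls = set()
    class_counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = GRAPH_RE.search(line)
        if m is None:
            continue  # skip lines without a recognizable graph label
        quads += 1
        urls.add(m.group(1))
        # subject, predicate, object, rest; class objects of rdf:type are IRIs without spaces
        parts = line.split(None, 3)
        if len(parts) >= 3 and parts[1] == RDF_TYPE:
            class_counts[parts[2].strip("<>")] += 1
    hosts = {urlsplit(u).hostname for u in urls} - {None}
    return {
        "quads": quads,
        "urls": len(urls),
        "hosts": len(hosts),
        "top_classes": class_counts.most_common(top_n),
    }


def stats_for_file(path: str, top_n: int = 5) -> Dict[str, object]:
    """Stream a gzipped N-Quads file line by line and summarize it."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        return subset_stats(f, top_n)
```

Because the file is read line by line and only the sets of URLs and the per-class counters are kept in memory, a call such as `stats_for_file("schema_Museum.gz")` should also work on the larger subsets, although the set of URLs can grow large for files like the Product subset.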
If you are interested in a particular class or set of classes that is not listed above, please contact the Web Data Commons team via our mailing list or our Google Group.
The source code can be checked out from our GitHub repository. For more information about the framework and a detailed description of how to run your own extraction, visit the framework page.
Please send questions and feedback to the Web Data Commons mailing list or post them in our Web Data Commons Google Group.