People run over 1.2 trillion Google searches per year; that’s roughly 40,000 searches per second. It’s so easy to type in a few keywords or a question when searching the web. Despite the magnificence of Google and other search engines, they cover only 4% (about 8 billion pages) of the world wide web; the additional 96% goes way beyond Google…
The most significant portion of the world wide web, the invisible web, is inaccessible by search engines like Google. It’s also called the deep web. Before we show you how to research beyond Google, let’s first answer questions you’re probably asking at this point. Is the deep web also the dark web? Is it legal to access the Deep Web?
The Deep Web and the Dark Web
Search engines like Google and Bing use web crawlers and AI bots that follow hyperlinks from page to page. The deep web is content that these search engines can’t access. Examples of the invisible internet are password-protected content (like email, social media profiles, and subscription sites); results of database queries; unpublished posts; and files stored on cloud storage platforms like Dropbox.
In other words, the deep web is the aggregation of unindexed sites.
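To make the idea concrete, here is a minimal sketch of a crawler that follows hyperlinks, run against a tiny made-up "web" held in memory. The page paths, their contents, and the `requires_login` flag are all assumptions invented for illustration; real crawlers are far more elaborate. The sketch only shows why a password-protected page and an unlinked page never reach the index.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": path -> (HTML body, requires_login).
# All paths and page contents are invented for illustration.
TOY_WEB = {
    "/home": ('<a href="/news">News</a> <a href="/inbox">Inbox</a>', False),
    "/news": ('<a href="/home">Home</a> <a href="/archive">Archive</a>', False),
    "/archive": ("Old stories", False),
    "/inbox": ("Private mail", True),       # behind a password
    "/draft": ("Unpublished post", False),  # no page links to it
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(web, seed):
    """Breadth-first crawl from the seed; returns the set of indexed paths."""
    indexed = set()
    seen = {seed}
    queue = deque([seed])
    while queue:
        path = queue.popleft()
        page = web.get(path)
        if page is None:
            continue  # broken link, nothing to index
        html, requires_login = page
        if requires_login:
            continue  # the crawler has no credentials, so it stops here
        indexed.add(path)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed


surface = crawl(TOY_WEB, "/home")
deep = set(TOY_WEB) - surface
print("indexed:", sorted(surface))
print("not indexed:", sorted(deep))
```

In this toy example the inbox page is skipped because it needs a login, and the draft page is never found because no hyperlink points to it; both end up in the "not indexed" set.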
The Dark Web is the part of the Deep Web that is deliberately hidden from search engines. Dark websites conceal the DNS and IP addresses that identify them. They are visible to the public, but it’s almost impossible to find out who runs them or where they are hosted. According to Andy Greenberg, the Dark Web is less than 0.1% of the Deep Web. While many criminals hide their activities on the Dark Web, there are legitimate uses as well.
The Onion Router (Tor), Freenet, and I2P are software that allow people to stay anonymous while browsing the Deep Web and the Dark Web. Police, journalists, whistleblowers, and the military use these tools to remain anonymous; so do black markets and illegal sites. Using the software to access the Deep Web is legal but risky, because Dark Web users also use Tor and could hack other users to find out more about them.
Why Research Beyond Google?
If the visible web represents only a fraction of what is on the web, it’s evident that a comprehensive research exercise should include research beyond Google. The fact that something is on the internet doesn’t make it a reliable source. Anybody can publish on the web, which sometimes makes it difficult to tell truths from biased opinions and untruths.
Many authoritative resources, however, require a subscription and are therefore not visible to Google. They are part of the invisible web because their content is hidden from search engines. Ironic as it may seem, the Deep Web is where good content is waiting to be discovered. Sometimes the home page is available, but the actual data sits hidden behind it.
To help students and teachers, we’ve created a list of resources that may help with surfing the invisible web.
Search Engines for The Deep Web
There isn’t a single search engine that covers the Deep Web. There are, however, search engines that specialize in mining the invisible web for quality results.
DirectSearch provides internet listings of the invisible web. Topic listings include health, medical, libraries, weather, and search engine listings. DirectSearch is still very helpful although it’s no longer updated.
Yippy is a metasearch engine that promises quick data searches using its clustering engines to mine relevant data safely and securely.
Internet Archive is a non-profit digital library that has been archiving the internet since 1996. Access is free for students, teachers, and the public. The library provides access to 330 billion webpages, 20 million books and audios, images, videos, and software programs. As a library, Internet Archive scans over 1,000 books daily.
Tim Berners-Lee, the creator of HTML and the web, started The WWW Virtual Library. Run by a loose confederation of volunteers who are experts in their fields, it is recognized as a high-quality index of sections of the web.
Art and Museums
Most museums have their own database. Here are some of the more popular and well-known online museum databases.
Founded by Congress in 1962, the National Portrait Gallery uses visual arts, performing arts, and new media to present the men and women who shaped America. The “America’s Presidents” exhibition is the only complete collection of presidential portraits outside the White House. The gallery’s mission is to tell the story of America through portraits, including the famous and infamous, villains and heroes, actors and scientists, and visionaries.
Home to Leonardo da Vinci’s famous Mona Lisa painting, the Louvre Museum presents works from ancient civilizations, the Middle Ages, and Western art. One of the oldest museums in the world, the Louvre divides its collections into eight departments, each with its own database.
The online Guggenheim Museum features over 1,700 works by 625 artists from its permanent collection, spanning the late 19th century to the present. Collection Online includes selected works from the New York, Venice, and Bilbao museums that are searchable by artist, date, medium, movement, and venue.
The National Gallery of Art offers a variety of online programs and resources for the classroom. Teachers will find lessons, images, ideas, and activities to integrate into their own lessons. World-leading art historians present in-depth research information on the collections.
Who would have thought years ago that classic authors like Jane Austen and Fyodor Dostoevsky would be available online? Here are sites where readers can download classics and other books for free.
The free online literature at Bibliomania includes over 2,000 classic texts. Students and teachers may find the Study Section helpful. Written by Oxford and Cambridge graduates, the study guides offer summaries, commentaries, and discussions of writings from around the world. Teachers may access and share resources through the Teacher Resources forum.
The Online Books Page was founded in 1993 when digital library planner John Mark Ockerbloom was still a student. The index includes over 3 million listings that meet specific criteria. Search for books by author, title, subject, new listings, or serial.
Michael Hart, the inventor of eBooks in 1971, founded Project Gutenberg, the first provider of free eBooks. The listings contain over 59,000 free books to read on eBook readers or Web browsers. Browse for publications by author, title, language, or recently posted.
The National Academies Press publishes reports from the National Academies of Sciences, Engineering, and Medicine, a non-profit organization that provides expert advice on challenges the world encounters. Most of the 8,500 digital titles in PDF format are free to download.
HathiTrust is a collaboration of universities, research institutions, and libraries that offers access to millions of books and journals from libraries around the world. Its goal is to preserve cultural records for future generations. The HathiTrust Research Center helps researchers with the technical challenges they face when analyzing a large amount of text.
Researching science, technology, engineering, and math (STEM) subjects may require access to specialty search engines. Here is a list of STEM-related databases and search engines.
Science Research is a deep-web search engine that uses state-of-the-art federated search technology. In real time, it searches over 300 authoritative science and technology collections and compiles the results on one page. It removes duplicate results and ranks each result by how closely it matches the phrase and keywords you typed.
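The federated approach described above can be sketched in a few lines: send the same query to several collections, drop duplicate hits, and rank what remains by how well it matches the query. This is only an illustration under stated assumptions, not Science Research’s actual implementation. The collection names, titles, URLs, and the scoring rule (share of keywords matched, plus a bonus for the exact phrase) are all hypothetical, and the collections are plain in-memory lists rather than live databases.

```python
# Hypothetical collections: source name -> list of (title, url) records.
# Sources, titles, and URLs are invented for illustration.
COLLECTIONS = {
    "physics_archive": [
        ("Solar cell efficiency limits", "https://example.org/a1"),
        ("Efficiency of wind turbines", "https://example.org/a2"),
    ],
    "energy_reports": [
        ("Solar cell efficiency limits", "https://example.org/a1"),  # duplicate
        ("Solar power in cities", "https://example.org/b7"),
    ],
}


def relevance(title, query):
    """Share of query keywords found in the title, plus 1.0 for the exact phrase."""
    words = title.lower().split()
    terms = query.lower().split()
    if not terms:
        return 0.0
    matched = sum(1 for term in terms if term in words)
    score = matched / len(terms)
    if query.lower() in title.lower():
        score += 1.0
    return score


def federated_search(query, collections):
    """Query every collection, drop duplicate URLs, and rank by relevance."""
    results = {}  # url -> (score, title, source); the first source seen wins
    for source, records in collections.items():
        for title, url in records:
            score = relevance(title, query)
            if score > 0 and url not in results:
                results[url] = (score, title, source)
    return sorted(results.values(), key=lambda r: r[0], reverse=True)


ranked = federated_search("solar cell efficiency", COLLECTIONS)
for score, title, source in ranked:
    print(f"{score:.2f}  {title}  [{source}]")
```

The record that appears in both collections is listed once, and the title containing the full phrase ranks first. A real service would also have to handle slow or unavailable collections, which this sketch leaves out.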
Science.gov is a gateway to U.S. government research and development results. It gives access to over 200 million pages of science and technology information from 13 federal agencies. Federally sponsored STEM opportunities for graduates and undergraduates range from scholarships and research internships to graduate fellowships.
USGS produces digital geological and topographic maps of the U.S. Map topics include biology and ecosystems, coasts, the impact of energy resources, environmental health, land resources, natural hazards, planetary science, water, and geology. Over 100,000 scientific publications, a multimedia gallery, and software tools are also available to provide quality science information to the public. The USGS Library is one of the largest earth and natural science libraries in the world.
IEEE is deemed the trusted voice for engineering, technology, and computing information globally. IEEE Digital Library contains over 4.5 million documents with science and technology content.
IRIS, a consortium of over 120 universities, provides a wide range of educational resources to create awareness and understanding of seismology. Videos, animations, lessons, and seismographs help teachers teach about earthquakes. Students can explore earthquakes in real time with the seismic monitor, earthquake browser, wave visualizations, and the rapid earthquake monitor.
Britannica Digital Learning is a division of Encyclopedia Britannica, which has been providing factual information to students, teachers, and the public for over 250 years. About 25 million students and 1.6 million teachers use the resources provided by Britannica Digital Learning. Britannica School Insights helps students with their research and with fact-checking information found on the web. Other products are Britannica Academic and Britannica School, which provide teachers with classroom material.
arXiv offers access to over 700,000 research papers on STEM-related topics and more. Maintained by Cornell University, the electronic archive and distribution server covers research on a variety of subjects including physics, mathematics, statistics, computer science, nonlinear sciences, systems science, economics, quantitative biology, quantitative finance, and electrical engineering.
Specialized Search Engines and Databases
Here is a list of some of the invisible web search engines and databases that focus on a specific field, industry, subject or category.
World Wide Science is a multilingual global science gateway that provides information on scientific discoveries and progress around the world. A real-time search of national and international scientific databases provides the latest scientific data.
Plants Database is a comprehensive database that provides information about the mosses, vascular plants, hornworts, liverworts, and lichens of the United States. In addition to plant names, symbols, characteristics, crop information, and images, the database also supplies web links and references for further reading.
Find Sounds is a search engine dedicated to finding sound effects on the web. Obscenities are filtered out, making the site safe for children to use. Search for sounds or sound effects by typing the name of the sound or search by category.
Making of America is a digital library of 10,000 books and 50,000 articles that focuses on the development of American infrastructure. The American social history resources range from the antebellum period through Reconstruction.
Yummly is a food platform with over 2 million recipes. Its goal is to be the smartest and most helpful food platform by helping people decide what to eat based on their personal preferences. The search filter enables you to narrow the search by cook time, cuisine, occasion, allergies, and nutrition.
Voice of the Shuttle has over 70 pages of links to humanities and related resources. Started by Alan Liu in 1994 at the University of California, Santa Barbara, the static website became a database of humanities, literature, and culture studies for public use.
Fold3 is a database of original military records that includes documents, photos, and stories of the men and women who served in the military. Search military records by war (from the Revolutionary War to the present) or by name, place, date, or document.
MedNar is a free deep-web search engine that provides medical information in real time. Filter the search by choosing a collection or category. The advanced search features allow searching by title, author, or year. It’s possible to save the research results in folders and to receive emails with new information automatically.
Pipl is the world’s largest search engine for finding the real people behind online identities. Search using a variety of parameters like name, address, or phone number. Its identity resolution technology uses online and offline public sources to gather personal and professional information about a person.