7. Trapping Topic-Based Information
Monitoring General Search Engines
General search engines are large, unstructured, and generally chaotic, which makes trapping information from them difficult. However, because they are the foundation of most Internet searching, they surface a great deal of information that is unavailable through RSS or special search categories. Expect both building the right query and setting up the trap itself to take some effort.
Google
- Google has stopped publishing the number of pages it indexes; the last count it provided was over 8 billion pages.
- Google limits a query to 32 terms; use as much of that allowance as you can when building your query.
- Start building your search queries with as many terms as possible, and then adjust later if you must. Use a variety of special syntaxes, but avoid inurl.
- To actually begin trapping, use either Google Alerts or Google Alert, both of which we discussed earlier.
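The 32-term limit mentioned above is easy to lose track of in a long query. The following is a minimal sketch of a term counter, under the simplifying assumption that every whitespace-separated word counts as one term (including words inside quoted phrases, operators such as OR, and words carrying a syntax prefix). How Google itself counts terms is not described in these notes, so treat the result as a rough guide only. The function names are made up for illustration.

```python
GOOGLE_TERM_LIMIT = 32  # limit stated in the notes above


def count_terms(query: str) -> int:
    """Count words in a query; quotes are dropped so phrase words count individually."""
    return len(query.replace('"', " ").split())


def terms_remaining(query: str, limit: int = GOOGLE_TERM_LIMIT) -> int:
    """Return how many more terms fit under the limit (negative if over)."""
    return limit - count_terms(query)


if __name__ == "__main__":
    query = '"information trapping" rss OR feeds intitle:monitoring'
    print("terms used:", count_terms(query))
    print("terms remaining:", terms_remaining(query))
```

A negative value from `terms_remaining` would suggest the query needs trimming before you set up an alert on it.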
Yahoo
- Yahoo is now much closer to a full-text search engine than to the searchable directory it began as.
- As with Google, use as many special syntaxes and search terms in your queries as possible.
- Yahoo lets you narrow your results, for example by restricting them to Creative Commons content (a “some rights reserved” alternative to “all rights reserved” copyright).
- You can also search through Yahoo’s subscription based content.
- Yahoo does not offer alerts, so to actually begin trapping, monitor a page of results instead:
- Set Yahoo’s preferences for 100 results
- Run your search
- Monitor the page of results with a page monitor such as www.WatchThatPage.com
- The page monitor will then watch for changes across the 100 results on that page.
- Yahoo also allows you to create an RSS feed of results, although doing so is difficult.
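The page-monitoring technique above boils down to comparing two snapshots of the same results page. The following is a rough sketch of that idea, not a description of how WatchThatPage works internally: it collects the outbound links from an older and a newer snapshot and reports the links that appear only in the newer one. The class and function names are hypothetical, and fetching and saving the snapshots (for example with `urllib.request.urlopen`) is left out so the example runs offline.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links found in <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.startswith(("http://", "https://")):
                self.links.add(href)


def extract_links(html):
    """Return the set of absolute links in one snapshot of a page."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def new_results(previous_html, current_html):
    """Return links present in the current snapshot but not in the previous one."""
    return extract_links(current_html) - extract_links(previous_html)


if __name__ == "__main__":
    # Two tiny stand-ins for saved copies of a results page.
    yesterday = '<a href="http://a.example/1">First result</a>'
    today = yesterday + '<a href="http://b.example/2">Second result</a>'
    for link in sorted(new_results(yesterday, today)):
        print("new result:", link)
```

Comparing link sets rather than raw page text ignores cosmetic changes such as rotating advertisements, at the cost of missing results whose link stays the same while the description changes.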
Yahoo Directory
- Yahoo Directory is a searchable subject index.
- For most directories, don’t bother building intricate queries; instead, use the directory method to narrow in on the subject you are interested in.
- To trap this information, check whether Yahoo has an RSS feed for that category. If not, you can always use a page monitor.
Ask
- To start building your search queries, go to ask.com/webadvanced.
- This page contains ways to narrow down your searches to various geographic locations, the last time pages were updated, which words must/must not appear, and so forth.
- You can use “should search” to list words that should appear; it does not eliminate results that lack those words.
- Ask doesn’t offer RSS or e-mail alert options, so your only option is to follow the web page monitoring technique outlined above.
Microsoft Live Search
- Microsoft is working hard to build a good search engine for trapping.
- To build your queries, consider these syntaxes:
- Contains – Looks for pages containing certain file types
- Intitle – Looks for words in titles
- Inbody – Looks only for words in the page body
- Link – Finds pages that link to that URL
- Linkfromdomain – Finds those links that are coming from a specified domain
- Prefer – Exactly the same as Ask’s “Should Search” feature
- Trapping is simple: add &format=rss to the end of the URL of any page of search results to get those results in RSS format.
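The two steps above (compose a query from the listed syntaxes, then add `format=rss` to the results URL) can be sketched as follows. Several things here are assumptions rather than facts from these notes: the host `search.example.com` and its path are placeholders, the `name:value` form for the syntaxes is a common convention that should be checked against the engine's own help pages, and the helper names are invented for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_query(terms, **operators):
    """Join plain terms with operator terms such as intitle:rss (assumed syntax)."""
    parts = list(terms)
    for name, value in operators.items():
        parts.append(f"{name}:{value}")
    return " ".join(parts)


def to_rss_url(results_url):
    """Return the results URL with format=rss as its final query parameter."""
    scheme, netloc, path, query, fragment = urlsplit(results_url)
    # Drop any existing format parameter so the function can be applied twice safely.
    params = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
              if k != "format"]
    params.append(("format", "rss"))
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))


if __name__ == "__main__":
    query = build_query(["information", "trapping"], intitle="rss", prefer="feeds")
    # Placeholder host and path: substitute the real results URL from your browser.
    results_url = "http://search.example.com/results.aspx?" + urlencode({"q": query})
    print(to_rss_url(results_url))
```

In practice it is usually simplest to run the search in a browser, copy the results URL, and pass that string to `to_rss_url`.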
Numerous other search engines can be monitored, including The Open Directory Project (through the directory method of narrowing and through web page monitors only).
Monitoring News Search Engines
News Search Engines are easier to monitor than General Search Engines because they are smaller and less complex. The following News Search Engines can be monitored:
- Yahoo News
- Google News
- MSNBC
- FindArticles
- HighBeam Library
- Hoovers
- Northern Light
Hoovers offers saved searches and email alerts, but it is expensive. The best overall option appears to be Google News, which is free and supports both the email alert and RSS trapping options.
Searching Blogs
We have already covered these blog search options:
- Feedster
- IceRocket Blogs
- Blogdigger
- Google Blog Search
- Sphere (new!)
All offer RSS feeds.
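Because every blog search engine listed here offers RSS, trapping comes down to polling a feed and keeping only the items you have not seen before. The following is a minimal sketch of that bookkeeping. It assumes a plain RSS 2.0 feed without namespaces (Atom feeds would need different element names), uses an inline sample feed invented for illustration instead of a real download, and keeps the seen-item record in memory only.

```python
import xml.etree.ElementTree as ET

# Invented sample feed standing in for a downloaded search-results feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog search: information trapping</title>
    <item>
      <title>First post</title>
      <link>http://blog.example.com/first</link>
      <guid>http://blog.example.com/first</guid>
    </item>
    <item>
      <title>Second post</title>
      <link>http://blog.example.com/second</link>
    </item>
  </channel>
</rss>"""


def parse_items(rss_xml):
    """Return a list of {id, title, link} dicts for each <item> in the feed."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title", default="") or "").strip()
        link = (item.findtext("link", default="") or "").strip()
        # Fall back to the link when the item carries no <guid>.
        guid = (item.findtext("guid", default="") or "").strip() or link
        items.append({"id": guid, "title": title, "link": link})
    return items


def trap_new_items(rss_xml, seen_ids):
    """Return unseen items and record their ids in seen_ids (a set)."""
    fresh = [it for it in parse_items(rss_xml) if it["id"] not in seen_ids]
    seen_ids.update(it["id"] for it in fresh)
    return fresh


if __name__ == "__main__":
    seen = set()
    for entry in trap_new_items(SAMPLE_FEED, seen):
        print(entry["title"], "->", entry["link"])
    # A second poll of the unchanged feed yields nothing new.
    print("new on second poll:", len(trap_new_items(SAMPLE_FEED, seen)))
```

A longer-running trap would save `seen` to disk between polls so that restarting the script does not report old items again.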
Keyword-Searchable RSS Feeds
These are feeds built from searches on the query words that you specify. They are much more specific than ordinary RSS feeds, which deliver everything a source publishes.
Kebberfegg
Sets up keyword-based RSS feeds across many resources (over three dozen). You can generate the feed list as HTML, or as OPML for an RSS feed reader.
- Enter the words you want to search for in the query box
- Generate a keyword feed list for one category, or select multiple categories
- Choose HTML or OPML
- For each result you can add it to My Yahoo, forward it to your email via RSSFwd, or just look at the plain RSS feed
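To make the OPML option more concrete, here is a small sketch of what generating a keyword feed list might look like. It is not Kebberfegg's code: the two feed URL templates are hypothetical placeholders (each real service has its own URL pattern), and the function names are invented. The output uses the usual OPML layout of one `outline` element per feed with a `type="rss"` attribute and the feed address in `xmlUrl`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus

# Hypothetical templates; replace with the real feed URL pattern of each source.
FEED_TEMPLATES = {
    "Example News": "http://news.example.com/rss?q={query}",
    "Example Blogs": "http://blogs.example.com/feed?q={query}",
}


def keyword_feeds(query, templates=FEED_TEMPLATES):
    """Fill the URL-encoded query into each template and return {name: feed URL}."""
    encoded = quote_plus(query)
    return {name: url.format(query=encoded) for name, url in templates.items()}


def feeds_to_opml(title, feeds):
    """Serialize {name: feed URL} as an OPML document string."""
    opml = ET.Element("opml", version="1.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, url in feeds.items():
        ET.SubElement(body, "outline", text=name, type="rss", xmlUrl=url)
    return ET.tostring(opml, encoding="unicode")


if __name__ == "__main__":
    feeds = keyword_feeds("information trapping")
    print(feeds_to_opml("Keyword feeds: information trapping", feeds))
```

Most feed readers can import a file like this in one step, which is the main reason to prefer OPML over HTML when you plan to subscribe to many of the generated feeds at once.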
More Sources
Commercial
You can also trap information from commercial websites:
- Amazon.com
- eBay
- To trap, scroll to the bottom of your search results page, find the Tools line, and then look for the orange RSS icon.
Government
You can also trap information from government websites:
- City sites
- Search with a query such as (“city of Springfield” Missouri)
- If no RSS feed or email alerts can be set up, try to find a “What’s New” page and set up a web page monitor
- State sites
- Use the following method to find sites
- Enter state.xx.us into your browser, where xx is the two-letter postal abbreviation of the state you are interested in
- If no RSS feed or email alerts can be set up, try to find a “What’s New” page or a “Press Release” page to set up a web page monitor
- FirstGov
- firstgov.gov is an effort by the U.S. government to create a portal of easily accessible government info.
- The page includes an RSS Feed Page, an A-Z agency list, and a search engine.
- Informational websites can also be monitored in the same manner as city and state websites.
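The decision described above for city and state sites (use an RSS feed if one exists, otherwise fall back to monitoring a “What’s New” or “Press Release” page) can be sketched in code. This is an illustration under stated assumptions: it looks for feeds through the common `<link rel="alternate">` autodiscovery convention, which not every site follows, it matches fallback pages by link text only, and all names are invented. The `state_site_url` helper simply fills in the state.xx.us pattern from the notes.

```python
from html.parser import HTMLParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}
FALLBACK_PHRASES = ("what's new", "press release")


class TrapTargetFinder(HTMLParser):
    """Find advertised feeds and links whose text suggests a news-style page."""

    def __init__(self):
        super().__init__()
        self.feeds = []
        self.fallback_pages = []
        self._current_href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link":
            rel = (attrs.get("rel") or "").lower().split()
            kind = (attrs.get("type") or "").lower()
            if "alternate" in rel and kind in FEED_TYPES and attrs.get("href"):
                self.feeds.append(attrs["href"])
        elif tag == "a":
            self._current_href = attrs.get("href")

    def handle_data(self, data):
        # Normalize curly apostrophes so both spellings of "What's New" match.
        text = data.lower().replace("\u2019", "'")
        if self._current_href and any(p in text for p in FALLBACK_PHRASES):
            self.fallback_pages.append(self._current_href)

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None


def choose_trap(html):
    """Return ("rss", url), ("page_monitor", url), or ("none", None)."""
    finder = TrapTargetFinder()
    finder.feed(html)
    if finder.feeds:
        return ("rss", finder.feeds[0])
    if finder.fallback_pages:
        return ("page_monitor", finder.fallback_pages[0])
    return ("none", None)


def state_site_url(postal_abbreviation):
    """Fill a two-letter state abbreviation into the state.xx.us pattern."""
    code = postal_abbreviation.strip().lower()
    if len(code) != 2 or not code.isalpha():
        raise ValueError("expected a two-letter state abbreviation")
    return f"state.{code}.us"


if __name__ == "__main__":
    page = '<body><a href="/news/new.html">What\'s New</a></body>'
    print(state_site_url("MO"), choose_trap(page))
```

Email alerts are not detected by this sketch; checking for them still means reading the site by hand.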
Questions:
- Which general web search engine, Google or Yahoo, offers email alerts?
- Explain the general idea behind Ask’s “Should Search” and Microsoft Live’s “Prefer”.
- Which engines are generally easier to monitor: General Search Engines or News Search Engines?
- Which News Search Engine could generally be considered to offer the widest variety of trapping options?
- Name three different blog search options.
- Which website would one use to trap information from Amazon search queries?
- How would one go about trapping information from an eBay search query?
- Explain the syntax used to access a state’s webpage from your web browser.