Web crawlers, Regex for Markdown URLs, and Removing your site from Google search results

April 18, 2016
Category: TIL
Tags: Resources and Regex

Today I learned:

Web Crawlers

Need a web crawler but don’t want to write one?

Getting pages removed from Google cache

Have an old site that you need to keep live, but don’t want it showing up in Google search results? Here are a few things you need to do:

  1. Change robots.txt to disallow crawling, or password-protect the site, so search engines stop indexing it.
  2. Log in to Google Search Console (formerly Google Webmaster Tools) and submit the site to the URL Removal tool.
  3. Finish what you need the site up for ASAP and take it offline.
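For step 1, here is a minimal sketch of a robots.txt that asks all crawlers to stay out of the whole site, checked with Python's standard `urllib.robotparser`. The helper name `is_crawlable` and the `old-site.example` URL are placeholders for illustration, not anything from a real setup. Keep in mind that robots.txt is only a request that well-behaved crawlers honor.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents: every user agent is disallowed from every path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL; substitute a page from your own site.
    print(is_crawlable(ROBOTS_TXT, "Googlebot", "https://old-site.example/index.html"))
    # prints False
```

Running the same check before and after you edit the file is a quick way to confirm the rule does what you expect.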

Regex for Markdown URLs

This search-and-replace pattern matches the links above and wraps each one in Markdown link syntax:

  • Search: ([\w\S]*[mo7b\/])$
  • Replace: [\1](\1)
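
If you would rather script it than use an editor's find and replace, the same pattern can be tried in Python. This is a rough sketch: the function name `linkify` and the example URLs are made up, and it assumes `re.MULTILINE` so that `$` matches at the end of every line, the way a line-based editor search would.

```python
import re

# The search pattern from above, unchanged. The final character class means it
# only matches text that ends a line with one of: m, o, 7, b, or /.
BARE_URL = re.compile(r"([\w\S]*[mo7b\/])$", re.MULTILINE)


def linkify(text: str) -> str:
    """Wrap each matching end-of-line token in Markdown link syntax."""
    return BARE_URL.sub(r"[\1](\1)", text)


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    sample = "https://example.com\nhttps://example.org/docs/\nplain words here"
    print(linkify(sample))
```

Because the last character class is tailored to one particular list of links, a URL ending in any other character (say `.net` or `.html`) is left untouched, so adjust that class for your own list.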