Is it legal to do web scraping?
That depends on what you scrape and how you use the scraped material. We are not lawyers, so for authoritative legal advice please consult an attorney.
Many large web companies and university institutes scrape content and data on the advice of legal counsel.
How does copyright law affect web scraping? Are facts such as names, addresses, prices, etc. copyrightable?
In most cases, no. Copyright law has a strict constitutional requirement for originality: the author of a copyrightable work must have created it. A fact is not created by an author, even if they are the first to record it. In that case they are not the creator of the fact, merely its discoverer.
I want to extract content from a website for internal analysis purposes. None of the content will be republished. Is there anything the website owner can do to stop me?
Yes, but not with copyright law. If a website owner does not want you to access their site, they can ask you to stop. If you do not comply, they can bring a claim for trespass to chattels, as in the eBay v. Bidder's Edge case of 2000.
Are compilations of facts such as phone directories copyrightable?
Yes, but very thinly. Facts themselves are not copyrightable, and a compilation can be copyrighted only for the originality of its selection and arrangement. That said, if the compilation contains elements (such as news articles) that are not facts and are copyrightable in their own right, those elements must be left out. In interpreting the 1976 Copyright Act, the court specifically rejected "sweat of the brow" arguments based on the compiler's effort. For a complete discussion of this topic see Feist Publications v. Rural Telephone Service.
Some scraping activities do raise legal issues.
In fact, many amateur scrapers operate illegally, for example by scraping copyrighted articles or private data that is restricted from distribution. Copying even a few paragraphs or lines can be enough to expose you to a court summons or a DMCA takedown through Google.
However, if you scrape and keep this data strictly for private, personal use, you may avoid copyright liability.
So I'll just stay away from copyright infringement. Big deal!
Even if you avoid copyright infringement by scraping only non-copyrighted or public domain works, you may still be held liable for trespass to chattels.
What is safe to scrape then? I don't want any trouble.
Scraping plain facts, which cannot be copyrighted, is legal in the vast majority of cases as long as the data is broken down and not kept in its original arrangement. For instance, a business information directory may be copyrighted as a database, but its individual entries are not, because they are facts.
Beyond that, public domain works and non-copyrighted material (such as, in many jurisdictions, works by authors who have been dead for more than 70 years and certain works released by the government) can generally be scraped and used freely.