Scrapy

Developer(s)	Zyte (formerly Scrapinghub)
Initial release	26 June 2008
Written in	Python
Operating system	Windows, macOS, Linux
Type	Web crawler
License	BSD License

Short description: Python web-crawling framework

Scrapy (/ˈskreɪpaɪ/^[1] SKRAY-peye) is a free and open-source web-crawling framework written in Python. Originally designed for web scraping, it can also be used to extract data using APIs or as a general-purpose web crawler.^[2] It is currently maintained by Zyte (formerly Scrapinghub), a web-scraping development and services company.

Scrapy's project architecture is built around "spiders", self-contained crawlers that are each given a set of instructions. In the spirit of other don't-repeat-yourself frameworks, such as Django,^[3] it makes it easier to build and scale large crawling projects by letting developers reuse their code.
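
The idea of a spider as a self-contained set of crawling instructions can be illustrated with a toy model in plain Python. This is a conceptual sketch only and is not Scrapy's actual API: the `Spider` base class, the `crawl` function, the `TitleSpider` example, and the in-memory `FAKE_WEB` pages are all hypothetical names invented for illustration. It assumes a spider declares its starting URLs and a `parse` callback that yields either scraped items or further URLs to visit, while the crawl loop is written once and reused by every spider.

```python
import re
from collections import deque

# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://example.test/": '<a href="http://example.test/a">A</a> '
                            '<a href="http://example.test/b">B</a>',
    "http://example.test/a": "<title>Page A</title>",
    "http://example.test/b": '<title>Page B</title> '
                             '<a href="http://example.test/a">A</a>',
}


class Spider:
    """Toy base class (not Scrapy's): the interface every spider shares."""
    name = "base"
    start_urls = []

    def parse(self, url, body):
        """Yield dicts (scraped items) or strings (URLs to visit next)."""
        raise NotImplementedError


class TitleSpider(Spider):
    """A self-contained spider: where to start and what to extract."""
    name = "titles"
    start_urls = ["http://example.test/"]

    def parse(self, url, body):
        match = re.search(r"<title>(.*?)</title>", body)
        if match:
            yield {"url": url, "title": match.group(1)}
        # Hand every discovered link back to the crawl loop.
        for link in re.findall(r'href="([^"]+)"', body):
            yield link


def crawl(spider, fetch):
    """Run one spider to completion; `fetch` maps a URL to page text.

    The loop is shared by all spiders, so each new spider only has to
    supply its own start URLs and parse logic.
    """
    queue = deque(spider.start_urls)
    seen = set(spider.start_urls)
    items = []
    while queue:
        url = queue.popleft()
        for result in spider.parse(url, fetch(url)):
            if isinstance(result, dict):
                items.append(result)
            elif result not in seen:
                seen.add(result)
                queue.append(result)
    return items


items = crawl(TitleSpider(), lambda url: FAKE_WEB.get(url, ""))
print(items)
```

In this sketch, adding a second spider would mean writing only another small subclass; the queueing and de-duplication logic in `crawl` would be reused unchanged, which is the kind of code reuse the paragraph above describes.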

Well-known companies and products using Scrapy include Lyst,^[4]^[5] Parse.ly,^[6] Sayone Technologies,^[7] Sciences Po Medialab,^[8] and Data.gov.uk’s World Government Data site.^[9]

History

Scrapy was born at London-based web-aggregation and e-commerce company Mydeco, where it was developed and maintained by employees of Mydeco and Insophia (a web-consulting company based in Montevideo, Uruguay). The first public release was in August 2008 under the BSD license, with a milestone 1.0 release happening in June 2015.^[10] In 2011, Zyte (formerly Scrapinghub) became the new official maintainer.^[11]^[12]

References

[1] Commit 975f150.
[2] Scrapy at a glance.
[3] "Frequently Asked Questions". http://doc.scrapy.org/en/latest/faq.html#did-scrapy-steal-x-from-django.
[4] Bell, Eddie; Heusser, Jonathan. "Scalable Scraping Using Machine Learning". http://talks.lystit.com/dsl-scraping-presentation/#/4.
[5] Scrapy | Companies using Scrapy.
[6] Montalenti, Andrew (October 27, 2012). "Web Crawling & Metadata Extraction in Python". https://speakerdeck.com/amontalenti/web-crawling-and-metadata-extraction-in-python.
[7] "Scrapy Companies". https://scrapy.org/companies/.
[8] Hyphe v0.0.0: the first release of our new webcrawler is out!
[9] Ben Firshman [@bfirsh] (21 January 2010). "World Govt Data site uses Django, Solr, Haystack, Scrapy and other exciting buzzwords bit.ly/5jU3La #opendata #datastore". https://twitter.com/bfirsh/status/8025368963.
[10] Medina, Julia (19 June 2015). "Scrapy 1.0 official release out!". scrapy-users (Mailing list).
[11] Hoffman, Pablo (2013). List of the primary authors & contributors. https://github.com/scrapy/scrapy/blob/master/AUTHORS. Retrieved 18 November 2013.
[12] Interview Scraping Hub.

Original source: https://en.wikipedia.org/wiki/Scrapy.
