Scraping data from news/social media sites?

Sanrith Descartes

Veteran of a thousand threadban wars
<Aristocrat╭ರ_•́>
41,469
107,524
Anyone have a good guide to set up scraping apps to hit news/social media sites?
 

ExPatriot

This is the end...
<Gold Donor>
442
1,268
Is this for lead generation / market analysis, or are you just aggregating information to fuck with liberals?
 

Mist

Eeyore Enthusiast
<Gold Donor>
30,410
22,189

A friend of mine wanted to scrape every biomedical device patent off the FDA website. Took me like 2 hours to teach myself how to do it.
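
For anyone wondering what those two hours might involve: not necessarily how it was done here, but a minimal sketch of one common first step, pulling the rows out of an HTML table using only the Python standard library. It assumes the listing is served as a plain HTML table. The sample markup, the column names, and the `TableParser` / `table_to_records` names are all made up for illustration; a real page needs its own inspection.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_records(html):
    """Treat the first row as headers and return the remaining rows as dicts."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample markup standing in for a downloaded page.
SAMPLE = """
<table>
  <tr><th>Number</th><th>Title</th></tr>
  <tr><td>0001</td><td>Example device A</td></tr>
  <tr><td>0002</td><td>Example <b>device</b> B</td></tr>
</table>
"""

records = table_to_records(SAMPLE)
```

From there the records can be written out with the `csv` or `json` modules. Pages that build their tables with JavaScript will not work with this approach, since the rows are not in the downloaded HTML.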
 

ExPatriot

This is the end...
<Gold Donor>
442
1,268
I am a news junkie and am just too damn busy these days to keep up.
Gotchu...

I have a friend in the Web ad business and he talks about this stuff all the time so I have picked up a bit of understanding...

Will be interested to hear what you end up with for a solution!
 

Bandwagon

Kolohe
<Silver Donator>
22,725
59,582
Is there someone who can help me scrape all the tutorials/blog posts/whatever off of this website? Supposedly they're taking all of the Terrasolid documents offline in about 2 weeks, and I was hoping there was an easy way to store all of them.

 

ShakyJake

<Donor>
7,627
19,255
There used to be a way to make a website available offline. Could you do that instead? Is this all static content?
 

Bandwagon

Kolohe
<Silver Donator>
22,725
59,582

Haus

<Silver Donator>
11,043
41,724
Depending on how you plan to analyze it and what you're going to use it for, have you considered a tool that simply crawls the site and stores a local copy for "offline viewing", then working from that copy? I'm looking at a similar idea because I want to keep an offline archive of certain sites to use with an LLM I'm working on as its "knowledge store".

(I chime in almost a year later...)
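
To make the "crawl it and keep a local copy" idea concrete, here is a rough sketch using only the Python standard library. It is not a replacement for a dedicated mirroring tool: it assumes static HTML, stays on the starting host, ignores query strings when naming files, and does not download images or CSS. The function names (`crawl`, `local_path`, `extract_links`) and the on-disk layout are my own assumptions, and any URL you pass in is yours to supply.

```python
from collections import deque
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return absolute, fragment-free links found in html."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def local_path(url, out_dir):
    """Map a URL to a file under out_dir (hypothetical layout).

    Paths without a file extension are stored as <path>/index.html.
    """
    path = urlparse(url).path.strip("/")
    if not Path(path).suffix:
        path = f"{path}/index.html" if path else "index.html"
    return Path(out_dir) / path


def fetch(url):
    """Download a page as text; pass a different callable to crawl() for testing."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, out_dir, fetch=fetch, max_pages=100):
    """Breadth-first crawl limited to start_url's host; saves each page to disk."""
    host = urlparse(start_url).netloc
    queue, seen, saved = deque([start_url]), {start_url}, []
    while queue and len(saved) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that fail to download
        target = local_path(url, out_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(html, encoding="utf-8")
        saved.append(url)
        for link in extract_links(html, url):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return saved
```

Calling `crawl(start_url, "archive")` with a real starting URL would write the pages under `archive/`. For anything beyond a small static site, it is worth adding a delay between requests and checking the site's robots.txt first.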
 