Scraping data from news/social media sites?

Sanrith Descartes

Veteran of a thousand threadban wars
<Aristocrat╭ರ_•́>
41,471
107,524
Anyone have a good guide to set up scraping apps to hit news/social media sites?
 

ExPatriot

This is the end...
<Gold Donor>
442
1,268
Is this for lead generation / market analysis or are you just aggregating information to fuck with liberals?
 

Mist

Eeyore Enthusiast
<Gold Donor>
30,410
22,190

A friend of mine wanted to scrape every biomedical device patent off the FDA website. Took me like 2 hours to teach myself how to do it.
 

ExPatriot

This is the end...
<Gold Donor>
442
1,268
I am a news junkie and am just too damn busy these days to keep up.
Gotchu...

I have a friend in the Web ad business and he talks about this stuff all the time so I have picked up a bit of understanding...

Will be interested to hear what you end up with for a solution!
 

Bandwagon

Kolohe
<Silver Donator>
22,726
59,583
Is there someone who can help me scrape all the tutorials/blog posts/whatever off of this website? Supposedly they're taking all of the Terrasolid documents offline in about 2 weeks, and I was hoping there was an easy way to save all of them.

 

ShakyJake

<Donor>
7,627
19,256
There used to be a way to make a website available offline. Could you do that instead? Is this all static content?
 

Bandwagon

Kolohe
<Silver Donator>
22,726
59,583

Haus

<Silver Donator>
11,043
41,724
Depending on how you plan to analyze it and what you're going to use it for, have you considered a tool that simply crawls the site and stores a local copy for "offline viewing", then working from that copy? I'm looking at a similar idea because I want to keep an offline archive of certain sites to use as the "knowledge store" for an LLM I'm working on.

(I chime in almost a year later...)
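
For anyone who wants to try the "crawl it and keep a local copy" idea, here is a rough sketch using only the Python standard library. It is not what anyone in this thread actually used. The names `mirror`, `local_path`, `LinkParser`, and the `max_pages` limit are made up for illustration, and it assumes the site is mostly static HTML on a single host. It ignores query strings, saves everything as text (so PDFs and images would need separate handling), and does not pause between requests.

```python
from collections import deque
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download one page and return its body as text."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def local_path(url, out_dir):
    """Map a URL to a file under out_dir (hypothetical naming scheme)."""
    path = urlparse(url).path
    if not path or path.endswith("/"):
        path += "index.html"   # /docs/ -> docs/index.html
    elif not Path(path).suffix:
        path += ".html"        # /about -> about.html
    return Path(out_dir) / path.lstrip("/")


def mirror(start_url, out_dir, fetch_page=fetch, max_pages=200):
    """Breadth-first crawl of one host, saving each page under out_dir."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    saved = []
    while queue and len(saved) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:
            continue  # skip pages that fail to download
        target = local_path(url, out_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(html, encoding="utf-8")
        saved.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before comparing.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return saved
```

Calling `mirror("https://example.com/", "archive")` would walk same-host links starting from that page and write the HTML under `archive/`. The `fetch_page` argument is there so the downloader can be swapped out, for example to add a delay or to test against canned pages.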
 