Scraping data from news/social media sites?

Sanrith Descartes

<Aristocrat╭ರ_•́>
43,971
117,670
Anyone have a good guide to set up scraping apps to hit news/social media sites?
 

Sanrith Descartes

<Aristocrat╭ರ_•́>
43,971
117,670
Are you looking to scrape the HTML?
I have zero experience with this. I'm looking to automatically aggregate news from specific news sites, plus posts from specific personalities on social media.
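
For the news half of this, many sites publish an RSS feed, which is usually easier to aggregate than raw HTML. Here is a minimal sketch using only the Python standard library, assuming the site offers a plain RSS 2.0 feed. The function names `parse_rss` and `fetch_feed` and the User-Agent string are made up for illustration, and social media sites generally need their own APIs instead.

```python
import urllib.request
import xml.etree.ElementTree as ET


def parse_rss(xml_text):
    """Return a list of {title, link, published} dicts from RSS 2.0 text."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns None when a tag is missing, so fall back to ""
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
        })
    return items


def fetch_feed(url, timeout=10):
    """Download a feed and parse it. The URL is whatever feed the site publishes."""
    # Hypothetical User-Agent; some sites reject requests without one.
    req = urllib.request.Request(
        url, headers={"User-Agent": "news-aggregator-sketch/0.1"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_rss(resp.read())
```

Calling `fetch_feed` on each feed URL you care about and merging the lists gives a basic aggregator. Atom feeds use different tag names, so they would need a separate parser.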
 

ExPatriot

This is the end...
<Gold Donor>
678
2,051
Is this for lead generation / market analysis or are you just aggregating information to fuck with liberals?
 

Mist

REEEEeyore
<Gold Donor>
31,084
23,420

A friend of mine wanted to scrape every biomedical device patent off the FDA website. Took me like 2 hours to teach myself how to do it.
 

ExPatriot

This is the end...
<Gold Donor>
678
2,051
I am a news junkie and am just too damn busy these days to keep up.
Gotchu...

I have a friend in the Web ad business and he talks about this stuff all the time so I have picked up a bit of understanding...

Will be interested to hear what you end up with for a solution!
 

Bandwagon

Kolohe
<Silver Donator>
24,134
64,566
Is there someone who can help me scrape all the tutorials/blog posts/whatever off of this website? Supposedly they're taking all of the Terrasolid documents offline in about 2 weeks, and I was hoping there was an easy way to have all of them stored.

 

ShakyJake

<Donor>
7,884
19,826
Is there someone who can help me scrape all the tutorials/blog posts/whatever off of this website? Supposedly they're taking all of the Terrasolid documents offline in about 2 weeks, and I was hoping there was an easy way to have all of them stored.

There used to be a way to make a website available offline. Could you do that instead? Is this all static content?
 

Bandwagon

Kolohe
<Silver Donator>
24,134
64,566

Haus

<Silver Donator>
12,578
48,907
Depending on how you plan to analyze it and what you're going to use it for, have you considered a tool that simply crawls the site and stores a local copy for "offline viewing", then working from that copy? I'm looking at a similar idea because I want to keep an offline archive of certain sites to use with an LLM I'm working on as its "knowledge store".
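
To make the crawl-and-store idea concrete, here is a rough sketch in standard-library Python, assuming the site is static HTML. The names `mirror`, `local_path`, `http_fetch`, and `LinkParser` are invented for this example. It only saves HTML pages (no images, PDFs, or CSS) and ignores query strings, so a dedicated tool such as wget or HTTrack is likely the better choice for a real archive.

```python
import os
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_fetch(url):
    """Download one page as text (hypothetical User-Agent string)."""
    req = urllib.request.Request(
        url, headers={"User-Agent": "offline-mirror-sketch/0.1"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def local_path(out_dir, url):
    """Map a URL to a file path under out_dir/<host>/..."""
    parsed = urlparse(url)
    path = parsed.path
    if path == "" or path.endswith("/"):
        path += "index.html"
    elif not os.path.splitext(path)[1]:
        path += ".html"  # avoid a file/directory name clash for /docs vs /docs/x
    return os.path.join(out_dir, parsed.netloc, *path.strip("/").split("/"))


def mirror(start_url, out_dir, fetch=http_fetch, max_pages=200):
    """Breadth-first crawl of one host, saving each page; returns saved URLs."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    saved = []
    while queue and len(saved) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        target = local_path(out_dir, url)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w", encoding="utf-8") as fh:
            fh.write(html)
        saved.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url  # drop #fragments
            parts = urlparse(link)
            if (parts.scheme in ("http", "https")
                    and parts.netloc == host and link not in seen):
                seen.add(link)
                queue.append(link)
    return saved
```

The `fetch` parameter is there so the crawler can be tried against canned pages before pointing it at a live site, and `max_pages` keeps a mistake from hammering the server. Links inside the saved pages are not rewritten, so the copy suits feeding text to another program better than clicking around in a browser.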

(I chime in almost a year later...)
 