Need advice for creating Usenet indexer site


ShakyJake

I want to create a website that's essentially a Usenet indexer for movies.

As an example, most movie and TV-show sites (like Nzbs.org and others) have Title, Actor, Director, Genre, etc. filters, plus nice little cover art and scene clips.

My question is, which technologies do I need to learn to create something like Nzbs.org? For instance, how does one effectively access external resources to query for movie and actor info to build up one's database? Where do the cover art and screen clips come from? (They do not appear to be part of the newsgroup archive posting.)

As for my own capabilities, I'm pretty well practiced in C# and confident I can create a website using ASP.NET with an MSSQL database. But I'm real fuzzy on how to collect information from outside resources for actor, director, and movie details.

Now, someone is going to say "you'll never create something as nice as existing sites" -- and that's probably true. But this is more of a personal project I want to do to help in learning HTML5/CSS3/JS, plus various other web technologies. It'll never be something "commercial" (not to mention it's illegal anyway, so it won't be public).
 

Psypher_sl

These are just some examples, but for movies you'd be querying imdb.com and for TV shows thetvdb.com. You can get all the metadata for movies and TV shows from there. There are other sites as well, but these are the top two. You'll be dealing with Usenet headers, so you'll need to become familiar with the protocol. You'll also be working a lot with regex. That's about all I'll tell you, but it should be enough to get you pointed in the right direction.
Good luck!
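
To give a feel for the regex side, here is a minimal sketch in Python (chosen only for brevity; the same pattern carries over to C#'s Regex class). It assumes a subject line that ends with a quoted file name followed by `yEnc (part/total)`, which is a common but by no means universal posting convention. The sample subject and the `parse_subject` helper are made up for illustration.

```python
import re

# Assumed layout: <anything> "file name" yEnc (part/total)
# Real posters vary a lot, so expect to maintain several patterns.
SUBJECT_RE = re.compile(
    r'"(?P<filename>[^"]+)"\s+yEnc\s+\((?P<part>\d+)/(?P<total>\d+)\)'
)


def parse_subject(subject):
    """Return filename, part and total from a subject line, or None if it does not match."""
    match = SUBJECT_RE.search(subject)
    if match is None:
        return None
    return {
        "filename": match.group("filename"),
        "part": int(match.group("part")),
        "total": int(match.group("total")),
    }


# Made-up subject line, for illustration only.
sample = 'Example.Movie.2010 [01/50] - "example.movie.2010.part01.rar" yEnc (3/100)'
print(parse_subject(sample))
```

Subjects that do not fit the pattern return None, so they can be logged and used to work out the next pattern to add.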
 

ShakyJake

After thinking some more, I believe what I truly want to do is simply query sites for the data. Nzbs.org, for example, has an API for getting NZB data, so I'd actually do that rather than indexing Usenet newsgroups myself (not to mention the storage requirements for such a thing are massive, from what I understand).

So, having said that, where do I get information on querying sites like IMDb, etc.? Are there standard APIs for this? If so, what are they called? What are the common libraries used for this?
 

LennyLenard_sl

It depends on the site. In the particular case of IMDb, Stack Overflow had this question already answered. You can also look at the IMDb licensing page for some limitations on how they allow the site to be used. They provide the data itself, available in plain text.
 

ShakyJake

Yeah, I'm just going to pull NZB info off of Nzbs.org and process the .nfo files. I managed to scrape data off of a website's actor page using HtmlAgilityPack for C#. My original idea was to use a search service (e.g. Google) for actor data and try to parse it, but that would be too difficult. Just going to target a specific site or sites for now.

So far looking good. Got the database built and can gather actor data.
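
For the .nfo step, a rough sketch of one way to tie a release back to a movie record: look for an IMDb title link inside the file. This is shown in Python for brevity and assumes the .nfo contains a link of the form `imdb.com/title/tt…` and is encoded in code page 437, which is typical for scene .nfo files but not guaranteed. The `find_imdb_id` helper and the sample bytes are hypothetical.

```python
import re

# IMDb title IDs are "tt" followed by at least seven digits.
IMDB_LINK_RE = re.compile(r"imdb\.com/title/(tt\d{7,})", re.IGNORECASE)


def find_imdb_id(nfo_bytes):
    """Return the first IMDb title ID found in raw .nfo bytes, or None."""
    # Assumption: the file uses code page 437; every byte value decodes under it.
    text = nfo_bytes.decode("cp437")
    match = IMDB_LINK_RE.search(text)
    return match.group(1).lower() if match else None


# Placeholder .nfo content with a placeholder ID, for illustration only.
sample_nfo = b"Example.Movie.2010\r\nhttp://www.imdb.com/title/tt0000000/\r\n"
print(find_imdb_id(sample_nfo))
```

Files without a link return None, so those releases would need a fallback such as matching on the cleaned-up release name.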
 

a_skeleton_03

Just curious why you are starting from scratch when you could implement a lot of what Newznab does, and all their stuff is open source, IIRC. Why reinvent the wheel?
 

ShakyJake

To practice the various Microsoft technologies (C#, ASP.NET, Entity Framework, MSSQL, etc.). Newznab is written in PHP, I believe.
 

a_skeleton_03

I think they are PHP and Python. You could pull a lot of their API calls out of it, though.

I worked with it a lot in the past from a non-coding aspect, and it has all the functionality it sounds like you want.
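
As a rough idea of what those API calls look like, here is a sketch that only builds the request URL for a Newznab-style movie search. It is in Python for brevity, and it assumes the indexer follows the Newznab convention of an `/api` endpoint with `t=movie`, an `imdbid` given without the "tt" prefix, an `apikey`, and `o=json` for JSON output. Check the target site's own API docs before relying on any of that. The host name, key, and `build_movie_query` helper are placeholders.

```python
from urllib.parse import urlencode


def build_movie_query(base_url, api_key, imdb_id):
    """Build a Newznab-style movie search URL (parameter names are assumptions)."""
    # Assumed convention: imdbid is sent as digits only, without the "tt" prefix.
    digits = imdb_id[2:] if imdb_id.lower().startswith("tt") else imdb_id
    params = {"t": "movie", "imdbid": digits, "apikey": api_key, "o": "json"}
    return base_url.rstrip("/") + "/api?" + urlencode(params)


# Placeholder host, key and ID, for illustration only.
url = build_movie_query("https://indexer.example/", "KEY", "tt0000000")
print(url)
```

Fetching the URL and mapping the response into your own tables is then ordinary HTTP client work in whichever stack you're practicing.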