Robots?

Can someone explain what a "robot" is, as in the list of people logged onto the site: members, guests, visitors and robots?

Are we being watched? :rolleyes:
 
You will no doubt get a more technical response, but it's something like automatic scanning by search engines - I think.
 
I assume you refer to sites where you are asked to complete a "captcha"?

If indeed it is that, then it is purely a mechanism to stop automated login attempts by programs more commonly referred to as "bots".

Yes, you are being watched all the time; never assume that you are not. I am often surprised at how many people are shocked by what others either know, or can find out, about them within seconds and with little effort, just by sitting down in front of a keyboard - and this despite numerous warnings. Your right to privacy is non-existent once you step onto the internet, or your company's computer system(s).

This is usually most evident when people apply for a job. The very first task many companies now undertake on receipt of your CV is to send it out to a paid electronic screening agency. Many applicants are dismayed to find that their prospective employer is aware of their party antics and mahoosive alcohol intake, and never figure out why they just can't seem to get an invite to an actual interview ;^)

Person (a) sends off a CV with their accurate details.
Person (b) creates a CV profile that has no electronic footprint.

Person (b) gets an invite...
 
Robots, or 'bots', are automated programmes that surf the web, capturing web page data for caching.
 
If you use the Google Chrome web browser, sign in to your Google Account (accounts.google.com) and you can control your privacy settings and see your web history.


Robots ('bots') are programs that crawl all the pages on the web and index the pages that they find, so you can find things when you do a search.


 
Google, Bing, Yandex and all the other search engines, plus government agencies (looking for terrorist activity, for example) and some private individuals :cool:, deploy software programmes called robots (running on computers attached to the Internet) whose job it is to index the web. Indexing is the process of visiting every publicly accessible website and storing a list of everything that's been found, alongside where each item is located, in the search engine's huge database(s).
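To make "indexing" concrete, here's a toy sketch of the idea in Python - a real engine's index is vastly more sophisticated, and the pages and words below are made up for the example:

# Toy 'inverted index': for each word, remember which pages contain it,
# so a search becomes a fast lookup rather than a re-read of every page.
pages = {
    "https://example.com/": "mercedes benz club forum",
    "https://example.com/cars": "immaculate 300sl for sale",
}

index = {}  # word -> set of URLs containing that word
for url, text in pages.items():
    for word in text.split():
        index.setdefault(word, set()).add(url)

print(index["mercedes"])  # found on both pages
print(index["300sl"])     # found only on the /cars page

A query then just looks words up in the index, which is why results come back in a fraction of a second even though the web itself is enormous.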

Important and/or busy sites can be crawled hundreds of times a day by any given robot; smaller sites, blogs etc. might only be visited every few days. The better the index - and there can be hundreds describing any given data set - the more quickly an item can be found by folks typing queries into a search engine.

To achieve the above, robots are fed a list of websites, navigate to each site in turn, and try to find a pair of files called robots.txt and sitemap.xml. These files, if present, tell the robot what the site owner does or does not want indexed. Even if they don't exist, the search engine's robot will visit the home page, read any links it finds there (in the text or navigation menus), then read those pages in turn, plus any pages linked from those... and so on. Before you know it, every page on your website has been indexed simply because the robot followed all the links. Not all files or pages on a website are pointed to by links, which is why the sitemap file is used to tell the robot where to find non-obvious content.
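For the curious, a bare-bones version of that link-following behaviour fits in a few lines of Python using only the standard library. This is a sketch, not a real crawler - START_URL is a made-up address, and a real robot would also handle politeness delays, retries, deduplication and much more:

# Minimal link-following crawler sketch. It fetches robots.txt first
# and skips any URL the site owner has asked robots not to fetch.
import urllib.request
import urllib.robotparser
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

START_URL = "https://example.com/"  # hypothetical site

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

robots = urllib.robotparser.RobotFileParser()
robots.set_url(urljoin(START_URL, "/robots.txt"))
robots.read()

seen, queue = set(), [START_URL]
while queue and len(seen) < 10:  # small limit for the sketch
    url = queue.pop(0)
    if url in seen or not robots.can_fetch("*", url):
        continue
    seen.add(url)
    try:
        html = urllib.request.urlopen(url).read().decode("utf-8", "ignore")
    except OSError:
        continue
    extractor = LinkExtractor()
    extractor.feed(html)
    for link in extractor.links:
        full = urljoin(url, link)
        if urlparse(full).netloc == urlparse(START_URL).netloc:
            queue.append(full)  # follow links on the same site only

print(sorted(seen))  # every page reached just by following links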

But lots of sites get caught out by unscrupulous robots, by telling them where a page is and then adding a noindex request (don't add this to your index, PLEASE!). A robot can decide whether to obey the noindex request or do the complete opposite, sometimes unscrupulously listing those files first in search results on the assumption that they might be hiding something 'interesting'. Google and the other big engines don't do this, but sites are visited by hundreds of robots, not all of them with good intentions.
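To show what these requests look like in practice: a site's robots.txt can ask robots not to fetch certain paths at all (the paths below are hypothetical), and both mechanisms are requests, not enforcement - which is exactly the loophole described above:

# robots.txt - polite robots fetch this before crawling a site
User-agent: *
Disallow: /private/
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml

A noindex request, by contrast, is usually a single tag in a page's HTML:

<!-- asks robots not to add this page to their index -->
<meta name="robots" content="noindex">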

Additionally, some sites use snippets: short pieces of structured code and content which the robot can interpret as instructions on how to display a site's content within the search results, without having to visit the site itself. These are most widely used for shopping - so, for example, when you search for a camera, Google can show who's selling it, whether it's in stock, the price, and ratings for both the product and the shop, as long as the website has gone to the trouble of 'marking up' its content using a set of predefined rules which the robot/search engine understands (defined at schema.org). Alongside product listings, things like flight times, trains, cinemas and videos all push their information into the engines this way.
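As an illustration, that mark-up is typically a small block of JSON-LD embedded in the page, using the schema.org vocabulary - the product name and figures below are invented for the example:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Camera X100",
  "offers": {
    "@type": "Offer",
    "price": "499.00",
    "priceCurrency": "GBP",
    "availability": "https://schema.org/InStock"
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.6",
    "reviewCount": "87"
  }
}
</script>

A robot that understands schema.org can read the price, stock and rating straight from that block, without rendering the page at all.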

It's not just pages that are indexed - anything that can be retrieved via a URL (web address) can be found and indexed for use in search results. Incidentally, there's a big industry centred on something called SEO (search engine optimisation). Companies set themselves up as 'experts' and sell services to website owners, promising to get them onto page 1 of the search results, or better still towards the top of page 1. They do this by adding keywords and meta information (data describing data) to web pages, and by tuning robots.txt and sitemap.xml based on so-called 'knowledge' of the search engines' indexing algorithms.
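That "meta information" lives in a page's <head> section. A hypothetical example (worth noting that the big engines now pay little or no attention to the keywords tag, which says something about how reliable SEO 'knowledge' can be):

<head>
  <title>Immaculate 300SL for sale | Example Motors</title>
  <meta name="description" content="One-owner 300SL, full history, viewing welcome.">
  <meta name="keywords" content="300SL, Mercedes, classic car">
</head>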

There's much more to it than I've set out above, but that's the basics.

To see what Google 'knows' about MBC, run the first search below for pages and documents and the second for images.

Pages: site:forums.mbclub.co.uk

Images: site:forums.mbclub.co.uk

To try this yourself, go to Google, type site: followed by the address of a website, hit return/enter, and you'll see what Google has indexed for that site. Click on the Images tab to see the images.
 

Not enough detail.
 
Ah, will search for immaculate 300SL for £100 and see what I get.

Thanks for a VERY detailed answer.
 
As someone who appears to know sumfink about computas, one question: is it worth having a VPN? Not really for security - it's just that I travel overseas a lot, and whenever I want to watch ITV (rugby), BBC iPlayer (for unbiased political news reports... not) or Sky Go, I am in 'the wrong' region. Would having a VPN sort this? Is there a monthly cost? Cheers.
 

It used to be, but the BBC now blocks multiple-use IP addresses as a precaution against accessing iPlayer through a VPN (accessing iPlayer outside the UK is a "rights issue", apparently, irrespective of whether or not you have a TV licence). The ITV Hub and Channel 4 (All 4) are still OK and accessible. Most good VPNs do have a cost; I use PureVPN on a four-year contract. It's reasonably cheap, but without iPlayer access it's not what it was.

Oops - I realise your question was directed elsewhere, but you'll probably get the same answer :)
 
Reported
 
