Friday 17 March 2017

How Do Crawl and Indexing Work in SharePoint?

Remember: for Search to work, SharePoint must first index the content sources.

The main thing is that it can crawl and index anything stored on a server, in any format: PDF, ZIP, Word, Excel, TXT, HTML, RTF, MS Office, etc. Indexing content other than Office or Microsoft formats is a little more complicated, but interesting. The initial process is the same for all of them, so let's have a look at it.
1) When the scheduled crawl (index) job runs, it visits every location you have defined, which is called a content source.
2) When it finds a file there, it looks at the file's extension and checks in the SharePoint SSP whether that file type is set to be indexed.
3) Once SharePoint confirms the file type, it looks for an IFilter to read the file. An IFilter is a software component that reads a file's content; every file type needs its own IFilter.
4) If SharePoint finds an IFilter for the type, it opens the file and starts scanning it. It removes words that are not needed in search and need not be indexed, e.g. numerals such as 1 and 2.
5) After scanning the whole file, it writes the content to the index file with a pointer to the file's name and location.
6) Once a file has gone through the full process, it moves on to the next file and repeats the same steps.
Now, to search or index other file types we use, such as PDF, we need to install an IFilter for each such type, since they do not come by default. We can also put the icon images for such file types in the SharePoint images folder (12 hive) so that documents appear in search results with their icons. 🙂
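
To make the crawl steps above a bit more concrete, here is a rough Python sketch of the same loop. It is not SharePoint code: the names FILTERS, INDEXED_TYPES, NOISE_WORDS and crawl are made up for illustration, plain functions stand in for IFilters, and a simple dictionary stands in for the index file.

```python
import os

def read_plain_text(path):
    """Stand-in for an IFilter: returns the text content of a file."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return f.read()

# Assumed configuration, not real SharePoint settings.
INDEXED_TYPES = {".txt", ".html", ".pdf"}   # file types enabled for indexing
FILTERS = {".txt": read_plain_text, ".html": read_plain_text}  # no ".pdf" filter installed
NOISE_WORDS = {"a", "an", "the", "is", "of"}

def tokenize(text):
    """Split text into words, dropping noise words and plain numerals."""
    words = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?()\"'")
        if word and not word.isdigit() and word not in NOISE_WORDS:
            words.append(word)
    return words

def crawl(content_sources):
    """Walk every content source and build a word -> file paths index."""
    index = {}      # word -> set of file paths (pointer to name and location)
    skipped = []    # files whose type is enabled but has no filter installed
    for source in content_sources:
        for folder, _dirs, files in os.walk(source):
            for name in files:
                path = os.path.join(folder, name)
                ext = os.path.splitext(name)[1].lower()
                if ext not in INDEXED_TYPES:
                    continue                  # step 2: type not set to be indexed
                reader = FILTERS.get(ext)
                if reader is None:
                    skipped.append(path)      # step 3: no filter for this type
                    continue
                for word in tokenize(reader(path)):   # steps 4 and 5
                    index.setdefault(word, set()).add(path)
    return index, skipped
```

Calling crawl(["/some/folder"]) returns the index plus the list of files that were skipped because no filter exists for their type, which is the situation described above for PDF until its IFilter is installed.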
- Blog Credit
Ashish Banga

How Search Works in the Back End

Hey Friends, 
A very basic, yet very important blog.
How Search Works?
a) When a user enters a query in the search box of a page, the web service on a web server receives the request.
b) The web server sends the request to the search server/service. If the web server and the search server are on the same machine in the farm, the web service sends the request to the search service on that same server.
c) The search server looks for the query in its index files.
Note: After the index server finishes indexing/crawling, the index files are propagated to the search server at this default location: C:\Program Files\Microsoft Office Servers\12.0\Data\Office Server\Applications. It can be changed according to requirements.
The indexing/crawling concept is covered in my blog: How crawl works in SharePoint-how indexing work-basic concept
d) If the search server finds results for the query in the index file,
e) it picks up the matching documents, images, etc. from SQL, NAS, etc.
f) The results are streamed in XML format to the web service (web server).
g) The web service converts the XML to HTML and returns the results to the client.
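
The flow above can be sketched in a few lines of Python. This is only an illustration of the request path, not how SharePoint is implemented: INDEX, STORE, search_service, web_service and the example URLs are all made-up names and data, with a dictionary standing in for the index files and another for the SQL/NAS storage.

```python
from html import escape
from xml.etree import ElementTree as ET

# Hypothetical data: the index maps a word to document ids,
# and STORE stands in for the SQL/NAS back end.
INDEX = {"sharepoint": ["doc1", "doc2"], "crawl": ["doc2"]}
STORE = {
    "doc1": {"title": "Intro to SharePoint", "url": "http://server/docs/intro.docx"},
    "doc2": {"title": "Crawl basics", "url": "http://server/docs/crawl.docx"},
}

def search_service(query):
    """Steps c) to f): look up the query and return the results as XML."""
    root = ET.Element("results")
    for doc_id in INDEX.get(query.lower(), []):
        doc = STORE[doc_id]                      # fetch details from storage
        item = ET.SubElement(root, "result")
        ET.SubElement(item, "title").text = doc["title"]
        ET.SubElement(item, "url").text = doc["url"]
    return ET.tostring(root, encoding="unicode")

def web_service(query):
    """Steps a), b) and g): forward the query, then convert XML to HTML."""
    xml = search_service(query)
    rows = []
    for item in ET.fromstring(xml).findall("result"):
        title = escape(item.findtext("title"))
        url = escape(item.findtext("url"), quote=True)
        rows.append(f'<li><a href="{url}">{title}</a></li>')
    return "<ul>" + "".join(rows) + "</ul>"
```

Here web_service("crawl") hands the query to search_service, receives XML back, and returns an HTML list for the client. When the web server and search server are on the same machine, this is just a local call, as in step b).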