Search
Search technology is a huge subject, encompassing:
- networking (spidering the web),
- string and markup-language manipulation (parsing HTML),
- language and text parsing (finding words and sentences in documents, stemming and other linguistic analysis),
- algorithms (finding matches, AND/OR queries, combining multiple word results), and
- performance (both increasing spidering speed and making large catalogs fast to search).
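As a rough illustration of the "algorithms" item, the sketch below builds a tiny word-to-document catalog and combines per-word results for AND/OR queries. It is not taken from the Searcharoo code (which targets ASP.NET); it is written in Python for brevity, and the names `tokenize`, `build_catalog`, `search` and the sample documents are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_catalog(documents):
    """Map each word to the set of document names that contain it."""
    catalog = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            catalog[word].add(name)
    return catalog


def search(catalog, query, mode="AND"):
    """Combine per-word result sets: intersection for AND, union for OR."""
    result_sets = [catalog.get(word, set()) for word in tokenize(query)]
    if not result_sets:
        return set()
    if mode == "AND":
        return set.intersection(*result_sets)
    return set.union(*result_sets)


# Hypothetical sample pages, standing in for spidered HTML text.
documents = {
    "home.html": "Welcome to the search engine home page",
    "spider.html": "The spider follows links to crawl each page",
    "stemming.html": "Stemming reduces words to a common root",
}

catalog = build_catalog(documents)
and_hits = search(catalog, "spider page", mode="AND")
or_hits = search(catalog, "spider page", mode="OR")
print(sorted(and_hits))
print(sorted(or_hits))
```

An AND query keeps only documents that contain every word, while an OR query keeps documents that contain any of them. A real catalog would usually also apply stemming and stop-word removal before indexing, which this sketch leaves out.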
In addition to the articles and code below, these search-related links
might be interesting or useful.
Searcharoo.NET - Version 7 (latest)
Highlight search terms in a proper 'document summary' on the results page.

Searcharoo.NET - Version 6
Index, catalog and search images and GPS coordinates.

Searcharoo.NET - Version 5
Remove binary serialization to solve the Medium Trust problem; index OpenXML document formats.

Searcharoo.NET - Version 4
Refactored codebase, plus the ability to index and search Microsoft Word, Excel, PowerPoint and Acrobat PDF documents. Small improvements such as robots.txt support and excluding regions of HTML were also added.

Searcharoo.NET - Version 3
Add disk-based catalog persistence, frameset/iframe spidering, paged results, stemming, stop-words and more.

Searcharoo.NET - Version 2
Extend Searcharoo to populate its search catalog by spidering HTML pages: follow links and image maps to process both static and dynamically generated pages. You can also search for multiple words.

Searcharoo.NET - Version 1
How to build a simple, extensible search engine using ASP.NET that can crawl files and create a searchable catalog by processing the text from HTML source.

Useful links
searcharoo.net
On Search, the Series
Lucene.net [Open Source]
Nata1 [Open Source]
SiteSearchEngine [article]
What is Stemming?
Robots.txt