Ishamael
Throne of Decay LVL: 777

posted 10 October 2004 01:50 AM
Questions about the 'Relevancy' ratings on search.
Currently, it pulls every result containing the given term out of the database. The relevancy algorithm sorts them by occurrence count of the search term. Then the outliers are discarded (anything more than 2 s.d.'s out from the mean, i.e. roughly p < 0.05). Then the mean is recalculated. The value 2 s.d.'s mathematically north of the new mean is taken as 100, without any scaling. So now we've come out with a theoretical maximum occurrence count. After this, every post's occurrence count is divided by that maximum. (The outliers are reinserted with a value of 100; they were only discarded to compute our theoretical max. In reality they clock in over 100.)
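A rough Python sketch of the scheme as described, just to make the steps concrete. A few things here are assumptions, not stated above: the s.d. is the population s.d., it gets recalculated along with the mean once the outliers are dropped, and a post's score is 100 × count / max. `relevancy_scores` is a hypothetical name, not the board's actual code.

```python
import statistics

def relevancy_scores(counts):
    """Hypothetical sketch of the relevancy scheme described in the post.

    counts: per-post occurrence counts of the search term.
    Returns 0-100 scores, ordered from highest count to lowest.
    """
    if not counts:
        return []
    ordered = sorted(counts, reverse=True)  # sort by occurrence
    mean = statistics.mean(ordered)
    sd = statistics.pstdev(ordered)  # assumption: population s.d.
    # Outliers: anything more than 2 s.d.'s from the original mean
    kept = [c for c in ordered if abs(c - mean) <= 2 * sd]
    # Recalculate mean (and, as an assumption, the s.d.) without the outliers
    new_mean = statistics.mean(kept)
    new_sd = statistics.pstdev(kept)
    theo_max = new_mean + 2 * new_sd  # this value is "100"
    if theo_max <= 0:
        return [0.0] * len(ordered)
    # Divide each count by the theoretical max; outliers come back capped at 100
    return [min(100.0, 100.0 * c / theo_max) for c in ordered]
```

With one post that mentions the term far more often than everything else (say counts of 1 to 5 plus a single 50), the 50 gets dropped as an outlier, the max is computed from the remaining posts, and the 50 comes back pinned at 100.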
As it stands, I'm not sure this is the best way to approach it, mainly 'cause I'm wrecked. I don't know if defining outliers as anything beyond the original 2 s.d.'s is the best way to go; perhaps we should just find the median and the min/max and determine an interval from those?
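One way to read the median/min-max idea, offered purely as an assumption: Tukey's fences, which build the interval from the five-number summary (min, Q1, median, Q3, max) instead of the mean and s.d. The `k = 1.5` multiplier is the conventional choice, not something from this thread, and `median_interval` is a hypothetical name.

```python
import statistics

def median_interval(counts, k=1.5):
    """Hypothetical median-based outlier interval (Tukey's fences).

    Assumes at least two counts.
    Returns the five-number summary, the (low, high) fences, and the outliers.
    """
    q1, median, q3 = statistics.quantiles(counts, n=4)  # quartile cut points
    iqr = q3 - q1                                      # interquartile range
    low, high = q1 - k * iqr, q3 + k * iqr             # the interval
    summary = (min(counts), q1, median, q3, max(counts))
    outliers = [c for c in counts if c < low or c > high]
    return summary, (low, high), outliers
```

The quartiles and median barely move when a single post has a huge count, so the interval isn't dragged outward by the very outliers it's meant to catch; with mean ± 2 s.d., a big outlier inflates the s.d. itself. The upper fence could also stand in for the theoretical max, though that's a judgment call.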
Not sure about the best way to proceed; any input would be nice. Or maybe I'll just sober up. The first is more likely.
Posts: 793  ID: 7  Aligned: iniquity  House: decay 