Loading pre-trained spanBERT from ./pretrained_spanbert
____
Parameters:
Client key = XXXXXX
Engine key = XXXXXX
Gemini key = XXXXXX
Method = spanbert
Relation = Schools_Attended
Threshold = 0.7
Query = sergey brin stanford
# of Tuples = 10
Loading necessary libraries; This should take a minute or so ...)
=========== Iteration: 0 - Query: sergey brin stanford ===========

URL ( 1 / 10): https://engineering.stanford.edu/about/heroes/2014-heroes/sergey-brin
    Fetching text from url ...
    Webpage length (num characters): 6736
    Annotating the webpage using spacy...
    Extracted 38 sentences.
    Processing each sentence one by one to check for presence of right pair of named entity types; if so, will run the second pipeline ...
    Processed 5 / 38 sentences
    Processed 10 / 38 sentences
    Processed 15 / 38 sentences
    Processed 20 / 38 sentences
        === Extracted Relation ===
        Input tokens: ['Future', 'Main', 'content', 'start', 'Sergey', 'Brin', '—', 'Google', 'co', '-', 'founder', 'Sergey', 'Brin', 'co', '-', 'founded', 'web', '-', 'search', 'giant', 'Google', 'Inc.', 'in', '1998', 'with', 'fellow', 'Stanford', 'student', 'Larry', 'Page', '.']
        Output Confidence: 0.687599 ; Subject: Sergey Brin ; Object: Stanford ;
        Confidence is lower than threshold confidence. Ignoring this.
        ==========
        === Extracted Relation ===
        Input tokens: ['Google', 'co', '-', 'founder', 'Sergey', 'Brin', 'co', '-', 'founded', 'web', '-', 'search', 'giant', 'Google', 'Inc.', 'in', '1998', 'with', 'fellow', 'Stanford', 'student', 'Larry', 'Page', '.']
        Output Confidence: 0.82693744 ; Subject: Larry Page ; Object: Google ;
        Adding to set of extracted relations
        ==========
        === Extracted Relation ===
        Input tokens: ['founder', 'Sergey', 'Brin', 'co', '-', 'founded', 'web', '-', 'search', 'giant', 'Google', 'Inc.', 'in', '1998', 'with', 'fellow', 'Stanford', 'student', 'Larry', 'Page', '.']
        Output Confidence: 0.8182789 ; Subject: Sergey Brin ; Object: Stanford ;
        Adding to set of extracted relations
        ==========
        === Extracted Relation ===
        Input tokens: ['search', 'giant', 'Google', 'Inc.', 'in', '1998', 'with', 'fellow', 'Stanford', 'student', 'Larry', 'Page', '.']
        Output Confidence: 0.95134175 ; Subject: Larry Page ; Object: Stanford ;
        Adding to set of extracted relations
        ==========
        === Extracted Relation ===
        Input tokens: ['Brin', 'earned', 'his', 'master', '’s', 'degree', 'in', 'computer', 'science', 'at', 'Stanford', ',']
        Output Confidence: 0.77572674 ; Subject: Brin ; Object: Stanford ;
        Adding to set of extracted relations
        ==========
        === Extracted Relation ===
        Input tokens: ['Brin', 'earned', 'his', 'master', '’s', 'degree', 'in', 'computer', 'science', 'at', 'Stanford', ',', 'where', 'he', 'and', 'Page', 'developed', 'the', '“']
        Output Confidence: 0.9301693 ; Subject: Page ; Object: Stanford ;
        Adding to set of extracted relations
        ==========
    Processed 25 / 38 sentences
    Processed 30 / 38 sentences
    Processed 35 / 38 sentences
    Extracted annotations for 2 out of total 38 sentences
    Relations extracted from this website: 5 (Overall: 6)

URL ( 2 / 10): http://infolab.stanford.edu/~sergey/
    Fetching text from url ...
    Webpage length (num characters): 4579
    Annotating the webpage using spacy...
    Extracted 58 sentences.
    Processing each sentence one by one to check for presence of right pair of named entity types; if so, will run the second pipeline ...
        === Extracted Relation ===
        Input tokens: [' ', 'Sergey', 'Brin', 'Sergey', 'Brin', "'s", 'Home', 'Page', 'Ph.D.', 'student', 'in', 'Computer', 'Science', 'at', 'Stanford', '-', 'sergey@cs.stanford.edu', 'Research', 'Currently', 'I', 'am', 'at', 'Google', '.']
        Output Confidence: 0.58440787 ; Subject: Sergey Brin Sergey Brin's ; Object: Stanford - sergey@cs.stanford.edu Research ;
        Confidence is lower than threshold confidence. Ignoring this.
        ==========
        === Extracted Relation ===
        Input tokens: [' ', 'Sergey', 'Brin', 'Sergey', 'Brin', "'s", 'Home', 'Page', 'Ph.D.', 'student', 'in', 'Computer', 'Science', 'at', 'Stanford', '-', 'sergey@cs.stanford.edu', 'Research', 'Currently', 'I', 'am', 'at', 'Google', '.']
        Output Confidence: 0.48315975 ; Subject: Sergey Brin Sergey Brin's ; Object: Google ;
        Confidence is lower than threshold confidence. Ignoring this.
        ==========
    Processed 5 / 58 sentences
    Processed 10 / 58 sentences
    Processed 15 / 58 sentences
    Processed 20 / 58 sentences
    Processed 25 / 58 sentences
    Processed 30 / 58 sentences
    Processed 35 / 58 sentences
    Processed 40 / 58 sentences
        === Extracted Relation ===
        Input tokens: ['Together', 'with', 'James', 'Davis', '(', 'another', 'Ph.D.', 'student', 'here', ')', ',', 'we', 'developed', 'COPS', ',', 'the', 'COpyright', 'Protection', 'System', '.']
        Output Confidence: 0.5346523 ; Subject: James Davis ; Object: the COpyright Protection System ;
        Confidence is lower than threshold confidence. Ignoring this.
        ==========
    Processed 45 / 58 sentences
    Processed 50 / 58 sentences
    Processed 55 / 58 sentences
    Extracted annotations for 2 out of total 58 sentences
    Relations extracted from this website: 0 (Overall: 3)

URL ( 3 / 10): https://en.wikipedia.org/wiki/Sergey_Brin
    Fetching text from url ...
    Trimming webpage content from 39153 to 10000 characters
    Webpage length (num characters): 10000
    Annotating the webpage using spacy...
    Extracted 73 sentences.
    Processing each sentence one by one to check for presence of right pair of named entity types; if so, will run the second pipeline ...
    Processed 5 / 73 sentences
    Processed 10 / 73 sentences
    Processed 15 / 73 sentences
    Processed 20 / 73 sentences
    Processed 25 / 73 sentences
        === Extracted Relation ===
        Input tokens: ['Mikhail', 'and', 'Eugenia', 'Brin', ',', 'both', 'graduates', 'of', 'Moscow', 'State', 'University', '(']
        Output Confidence: 0.99278957 ; Subject: Mikhail ; Object: Moscow State University ;
        Adding to set of extracted relations
        ==========
        === Extracted Relation ===
        Input tokens: ['Mikhail', 'and', 'Eugenia', 'Brin', ',', 'both', 'graduates', 'of', 'Moscow', 'State', 'University', '(']
        Output Confidence: 0.9908487 ; Subject: Eugenia Brin ; Object: Moscow State University ;
        Adding to set of extracted relations
        ==========
    Processed 30 / 73 sentences
    Processed 35 / 73 sentences
    Processed 40 / 73 sentences
        === Extracted Relation ===
        Input tokens: ['Eric', 'Schmidt', ',', 'Sergey', 'Brin', 'and', 'Larry', 'Page', ',', '2008', 'During', 'an', 'orientation', 'for', 'new', 'students', 'at', 'Stanford', ',']
        Output Confidence: 0.48997855 ; Subject: Eric Schmidt ; Object: Stanford ;
        Confidence is lower than threshold confidence. Ignoring this.
        ==========
    Processed 45 / 73 sentences
    Processed 50 / 73 sentences
    Processed 55 / 73 sentences
    Processed 60 / 73 sentences
    Processed 65 / 73 sentences
    Processed 70 / 73 sentences
    Extracted annotations for 2 out of total 73 sentences
    Relations extracted from this website: 2 (Overall: 3)

URL ( 4 / 10): https://www.quora.com/What-was-it-like-to-be-at-Stanford-with-Sergey-Brin-and-Larry-Page
    Fetching text from url ...
    Webpage length (num characters): 194
    Annotating the webpage using spacy...
    Extracted 4 sentences.
    Processing each sentence one by one to check for presence of right pair of named entity types; if so, will run the second pipeline ...
    Extracted annotations for 0 out of total 4 sentences
    Relations extracted from this website: 0 (Overall: 0)

URL ( 5 / 10): https://snap.stanford.edu/class/cs224w-readings/Brin98Anatomy.pdf
    Fetching text from url ...
    Unable to fetch URL. Continuing.

URL ( 6 / 10): https://news.stanford.edu/2022/08/05/wastewaterscan-will-monitor-wastewater-covid-19-monkeypox-diseases/
    Fetching text from url ...
    Webpage length (num characters): 7438
    Annotating the webpage using spacy...
    Extracted 39 sentences.
    Processing each sentence one by one to check for presence of right pair of named entity types; if so, will run the second pipeline ...
    Processed 5 / 39 sentences
    Processed 10 / 39 sentences
        === Extracted Relation ===
        Input tokens: ['said', 'Alexandria', 'Boehm', ',', 'professor', 'of', 'civil', 'and', 'environmental', 'engineering', 'at', 'Stanford', '.']
        Output Confidence: 0.7337611 ; Subject: Alexandria Boehm ; Object: Stanford ;
        Adding to set of extracted relations
        ==========
    Processed 15 / 39 sentences
    Processed 20 / 39 sentences
    Processed 25 / 39 sentences
    Processed 30 / 39 sentences
    Processed 35 / 39 sentences
    Extracted annotations for 1 out of total 39 sentences
    Relations extracted from this website: 1 (Overall: 1)

URL ( 7 / 10): https://cee.stanford.edu/news/stanford-school-engineering-names-new-engineering-heroes
    Fetching text from url ...
    Trimming webpage content from 10595 to 10000 characters
    Webpage length (num characters): 10000
    Annotating the webpage using spacy...
    Extracted 62 sentences.
    Processing each sentence one by one to check for presence of right pair of named entity types; if so, will run the second pipeline ...
    Processed 5 / 62 sentences
    Processed 10 / 62 sentences
        === Extracted Relation ===
        Input tokens: ['Irmgard', 'Flügge', '-', 'Lotz', ',', 'Stanford', '’s', 'first', 'female', 'professor', 'of', 'engineering', ',']
        Output Confidence: 0.51495796 ; Subject: Irmgard Flügge-Lotz ; Object: Stanford ;
        Confidence is lower than threshold confidence. Ignoring this.
        ==========
    Processed 15 / 62 sentences
    Processed 20 / 62 sentences
    Processed 25 / 62 sentences
    Processed 30 / 62 sentences
    Processed 35 / 62 sentences
    Processed 40 / 62 sentences
        === Extracted Relation ===
        Input tokens: ['While', 'pursuing', 'a', 'PhD', 'at', 'Stanford', ',', 'Page', 'and', 'fellow', 'student', 'Sergey', 'Brin', 'developed', 'a', '"']
        Output Confidence: 0.9735057 ; Subject: Sergey Brin ; Object: Stanford ;
        Adding to set of extracted relations
        ==========
    Processed 45 / 62 sentences
    Processed 50 / 62 sentences
    Processed 55 / 62 sentences
    Processed 60 / 62 sentences
    Extracted annotations for 2 out of total 62 sentences
    Relations extracted from this website: 1 (Overall: 2)

URL ( 8 / 10): https://news.stanford.edu/report/2022/10/26/campus-curiosities/
    Fetching text from url ...
    Trimming webpage content from 16144 to 10000 characters
    Webpage length (num characters): 10000
    Annotating the webpage using spacy...
    Extracted 78 sentences.
    Processing each sentence one by one to check for presence of right pair of named entity types; if so, will run the second pipeline ...
    Processed 5 / 78 sentences
    Processed 10 / 78 sentences
    Processed 15 / 78 sentences
    Processed 20 / 78 sentences
    Processed 25 / 78 sentences
        === Extracted Relation ===
        Input tokens: ['From', 'the', 'Huang', 'Engineering', 'Center', 'Innovations', 'Tour', ':', 'As', 'electrical', 'engineering', 'undergraduates', 'and', 'graduate', 'students', 'at', 'Stanford', 'in', 'the', '1930s', ',', 'William', 'Hewlett', 'and', 'David', 'Packard', 'encountered', 'three', 'influences', 'that', 'would', 'guide', 'the', 'rest', 'of', 'their', 'lives', ':']
        Output Confidence: 0.73303163 ; Subject: David Packard ; Object: the Huang Engineering Center Innovations Tour ;
        Adding to set of extracted relations
        ==========
        === Extracted Relation ===
        Input tokens: ['As', 'electrical', 'engineering', 'undergraduates', 'and', 'graduate', 'students', 'at', 'Stanford', 'in', 'the', '1930s', ',', 'William', 'Hewlett', 'and', 'David', 'Packard', 'encountered', 'three', 'influences', 'that', 'would', 'guide', 'the', 'rest', 'of', 'their', 'lives', ':']
        Output Confidence: 0.66579115 ; Subject: William Hewlett ; Object: Stanford ;
        Confidence is lower than threshold confidence. Ignoring this.
        ==========
        === Extracted Relation ===
        Input tokens: ['As', 'electrical', 'engineering', 'undergraduates', 'and', 'graduate', 'students', 'at', 'Stanford', 'in', 'the', '1930s', ',', 'William', 'Hewlett', 'and', 'David', 'Packard', 'encountered', 'three', 'influences', 'that', 'would', 'guide', 'the', 'rest', 'of', 'their', 'lives', ':']
        Output Confidence: 0.9672189 ; Subject: David Packard ; Object: Stanford ;
        Adding to set of extracted relations
        ==========
    Processed 30 / 78 sentences
    Processed 35 / 78 sentences
    Processed 40 / 78 sentences
    Processed 45 / 78 sentences
    Processed 50 / 78 sentences
    Processed 55 / 78 sentences
    Processed 60 / 78 sentences
    Processed 65 / 78 sentences
    Processed 70 / 78 sentences
    Processed 75 / 78 sentences
    Extracted annotations for 1 out of total 78 sentences
    Relations extracted from this website: 2 (Overall: 3)

URL ( 9 / 10): https://graphics.stanford.edu/~dk/google_name_origin.html
    Fetching text from url ...
    Webpage length (num characters): 1815
    Annotating the webpage using spacy...
    Extracted 12 sentences.
    Processing each sentence one by one to check for presence of right pair of named entity types; if so, will run the second pipeline ...
    Processed 5 / 12 sentences
    Processed 10 / 12 sentences
    Extracted annotations for 0 out of total 12 sentences
    Relations extracted from this website: 0 (Overall: 0)

URL ( 10 / 10): https://about.google/intl/ALL_us/our-story/
    Fetching text from url ...
    Webpage length (num characters): 3996
    Annotating the webpage using spacy...
    Extracted 27 sentences.
    Processing each sentence one by one to check for presence of right pair of named entity types; if so, will run the second pipeline ...
    Processed 5 / 27 sentences
        === Extracted Relation ===
        Input tokens: ['Larry', 'Page', 'was', 'considering', 'Stanford', 'for', 'grad', 'school', 'and', 'Sergey', 'Brin', ',']
        Output Confidence: 0.53660536 ; Subject: Larry Page ; Object: Stanford ;
        Confidence is lower than threshold confidence. Ignoring this.
        ==========
    Processed 10 / 27 sentences
    Processed 15 / 27 sentences
    Processed 20 / 27 sentences
    Processed 25 / 27 sentences
    Extracted annotations for 1 out of total 27 sentences
    Relations extracted from this website: 0 (Overall: 1)

================== ALL RELATIONS for per:schools_attended ( 10 ) =================
Confidence: 0.99278957 | Subject: Mikhail | Object: Moscow State University
Confidence: 0.9908487 | Subject: Eugenia Brin | Object: Moscow State University
Confidence: 0.9735057 | Subject: Sergey Brin | Object: Stanford
Confidence: 0.9672189 | Subject: David Packard | Object: Stanford
Confidence: 0.95134175 | Subject: Larry Page | Object: Stanford
Confidence: 0.9301693 | Subject: Page | Object: Stanford
Confidence: 0.82693744 | Subject: Larry Page | Object: Google
Confidence: 0.77572674 | Subject: Brin | Object: Stanford
Confidence: 0.7337611 | Subject: Alexandria Boehm | Object: Stanford
Confidence: 0.73303163 | Subject: David Packard | Object: the Huang Engineering Center Innovations Tour
Total # of iterations = 1