Data-Assessing-and-Cleaning

	patient_id	assigned_sex	given_name	surname	address	city	state	zip_code	country	contact	birthdate	weight	height	bmi
0	1	female	Zoe	Wellish	576 Brown Bear Drive	Rancho California	California	92390.0	United States	951-719-9170ZoeWellish@superrito.com	7/10/1976	121.7	66	19.6
1	2	female	Pamela	Hill	2370 University Hill Road	Armstrong	Illinois	61812.0	United States	PamelaSHill@cuvox.de+1 (217) 569-3204	4/3/1967	118.8	66	19.2
2	3	male	Jae	Debord	1493 Poling Farm Road	York	Nebraska	68467.0	United States	402-363-6804JaeMDebord@gustr.com	2/19/1980	177.8	71	24.8
3	4	male	Liêm	Phan	2335 Webster Street	Woodbridge	NJ	7095.0	United States	PhanBaLiem@jourrapide.com+1 (732) 636-8246	7/26/1951	220.9	70	31.7
4	5	male	Tim	Neudorf	1428 Turkey Pen Lane	Dothan	AL	36303.0	United States	334-515-7487TimNeudorf@cuvox.de	2/18/1928	192.3	27	26.1
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
498	499	male	Mustafa	Lindström	2530 Victoria Court	Milton Mills	ME	3852.0	United States	207-477-0579MustafaLindstrom@jourrapide.com	4/10/1959	181.1	72	24.6
499	500	male	Ruman	Bisliev	494 Clarksburg Park Road	Sedona	AZ	86341.0	United States	928-284-4492RumanBisliev@gustr.com	3/26/1948	239.6	70	34.4
500	501	female	Jinke	de Keizer	649 Nutter Street	Overland Park	MO	64110.0	United States	816-223-6007JinkedeKeizer@teleworm.us	1/13/1971	171.2	67	26.8
501	502	female	Chidalu	Onyekaozulu	3652 Boone Crockett Lane	Seattle	WA	98109.0	United States	ChidaluOnyekaozulu@jourrapide.com1 360 443 2060	2/13/1952	176.9	67	27.7
502	503	male	Pat	Gersten	2778 North Avenue	Burr	Nebraska	68324.0	United States	PatrickGersten@rhyta.com402-848-4923	5/3/1954	138.2	71	19.3

	given_name	surname	auralin	novodra	hba1c_start	hba1c_end	hba1c_change
0	veronika	jindrová	41u - 48u	-	7.63	7.20	NaN
1	elliot	richardson	-	40u - 45u	7.56	7.09	0.97
2	yukitaka	takenaka	-	39u - 36u	7.68	7.25	NaN
3	skye	gormanston	33u - 36u	-	7.97	7.62	0.35
4	alissa	montez	-	33u - 29u	7.78	7.46	0.32
...	...	...	...	...	...	...	...
275	albina	zetticci	45u - 51u	-	7.93	7.73	0.20
276	john	teichelmann	-	49u - 49u	7.90	7.58	NaN
277	mathea	lillebø	23u - 36u	-	9.04	8.67	0.37
278	vallie	prince	31u - 38u	-	7.64	7.28	0.36
279	samúel	guðbrandsson	53u - 56u	-	8.00	7.64	0.36

	given_name	surname	adverse_reaction
0	berta	napolitani	injection site discomfort
1	lena	baer	hypoglycemia
2	joseph	day	hypoglycemia
3	flavia	fiorentino	cough
4	manouck	wubbels	throat irritation
5	jasmine	sykes	hypoglycemia
6	louise	johnson	hypoglycemia
7	albinca	komavec	hypoglycemia
8	noe	aranda	hypoglycemia
9	sofia	hermansen	injection site discomfort
10	tegan	johnson	headache
11	abel	yonatan	cough
12	abdul-nur	isa	hypoglycemia
13	leon	scholz	injection site discomfort
14	gabriele	saenger	hypoglycemia
15	jia li	teng	nausea
16	jakob	jakobsen	hypoglycemia
17	christopher	woodward	nausea
18	ole	petersen	hypoglycemia
19	finley	chandler	headache
20	anenechi	chidi	hypoglycemia
21	miłosław	wiśniewski	injection site discomfort
22	lixue	hsueh	injection site discomfort
23	merci	leroux	hypoglycemia
24	kang	mai	injection site discomfort
25	elliot	richardson	hypoglycemia
26	clinton	miller	throat irritation
27	idalia	moore	hypoglycemia
28	xiuxiu	chang	hypoglycemia
29	alex	crawford	hypoglycemia
30	monika	lončar	hypoglycemia
31	steven	roy	headache
32	cecilie	nilsen	hypoglycemia
33	krisztina	magyar	hypoglycemia

	patient_id	assigned_sex	given_name	surname	address	city	state	zip_code	country	contact	birthdate	weight	height	bmi
0	1	female	Zoe	Wellish	576 Brown Bear Drive	Rancho California	California	92390.0	United States	951-719-9170ZoeWellish@superrito.com	7/10/1976	121.7	66	19.6
1	2	female	Pamela	Hill	2370 University Hill Road	Armstrong	Illinois	61812.0	United States	PamelaSHill@cuvox.de+1 (217) 569-3204	4/3/1967	118.8	66	19.2
2	3	male	Jae	Debord	1493 Poling Farm Road	York	Nebraska	68467.0	United States	402-363-6804JaeMDebord@gustr.com	2/19/1980	177.8	71	24.8
3	4	male	Liêm	Phan	2335 Webster Street	Woodbridge	NJ	7095.0	United States	PhanBaLiem@jourrapide.com+1 (732) 636-8246	7/26/1951	220.9	70	31.7
4	5	male	Tim	Neudorf	1428 Turkey Pen Lane	Dothan	AL	36303.0	United States	334-515-7487TimNeudorf@cuvox.de	2/18/1928	192.3	27	26.1

	patient_id	assigned_sex	given_name	surname	address	city	state	zip_code	country	contact	birthdate	weight	height	bmi
498	499	male	Mustafa	Lindström	2530 Victoria Court	Milton Mills	ME	3852.0	United States	207-477-0579MustafaLindstrom@jourrapide.com	4/10/1959	181.1	72	24.6
499	500	male	Ruman	Bisliev	494 Clarksburg Park Road	Sedona	AZ	86341.0	United States	928-284-4492RumanBisliev@gustr.com	3/26/1948	239.6	70	34.4
500	501	female	Jinke	de Keizer	649 Nutter Street	Overland Park	MO	64110.0	United States	816-223-6007JinkedeKeizer@teleworm.us	1/13/1971	171.2	67	26.8
501	502	female	Chidalu	Onyekaozulu	3652 Boone Crockett Lane	Seattle	WA	98109.0	United States	ChidaluOnyekaozulu@jourrapide.com1 360 443 2060	2/13/1952	176.9	67	27.7
502	503	male	Pat	Gersten	2778 North Avenue	Burr	Nebraska	68324.0	United States	PatrickGersten@rhyta.com402-848-4923	5/3/1954	138.2	71	19.3

	given_name	surname	auralin	novodra	hba1c_start	hba1c_end	hba1c_change
60	onyekachukwu	obinna	37u - 46u	-	7.58	7.12	NaN
54	oles	zhdanov	54u - 67u	-	7.52	7.11	NaN
255	jia li	teng	48u - 54u	-	7.66	7.32	0.34
143	nora	nyborg	55u - 59u	-	7.83	7.48	0.35
271	leo	vieira	-	30u - 33u	7.74	7.36	NaN

	patient_id	zip_code	weight	height	bmi
count	503.000000	491.000000	503.000000	503.000000	503.000000
mean	252.000000	49084.118126	173.434990	66.634195	27.483897
std	145.347859	30265.807442	33.916741	4.411297	5.276438
min	1.000000	1002.000000	48.800000	27.000000	17.100000
25%	126.500000	21920.500000	149.300000	63.000000	23.300000
50%	252.000000	48057.000000	175.300000	67.000000	27.200000
75%	377.500000	75679.000000	199.500000	70.000000	31.750000
max	503.000000	99701.000000	255.900000	79.000000	37.700000

	hba1c_start	hba1c_end	hba1c_change
count	280.000000	280.000000	171.000000
mean	7.985929	7.589286	0.546023
std	0.568638	0.569672	0.279555
min	7.500000	7.010000	0.200000
25%	7.660000	7.270000	0.340000
50%	7.800000	7.420000	0.380000
75%	7.970000	7.570000	0.920000
max	9.950000	9.580000	0.990000

	patient_id	assigned_sex	given_name	surname	address	city	state	zip_code	country	contact	birthdate	weight	height	bmi
9	10	female	Sophie	Cabrera	3303 Anmoore Road	New York	New York	10011.0	United States	SophieCabreraIbarra@teleworm.us1 718 795 9124	12/3/1930	194.7	64	33.4
35	36	female	Kamila	Pecinová	3558 Longview Avenue	New York	New York	10004.0	United States	718-501-0503KamilaPecinova@dayrep.com	12/23/1985	198.9	62	36.4
84	85	female	Nương	Vũ	465 Southern Street	New York	NY	10001.0	United States	VuCamNuong@fleckens.hu516-720-5094	2/1/1981	138.2	63	24.5
129	130	female	Rebecca	Jephcott	989 Wayback Lane	New York	NY	10004.0	United States	631-370-7406RebeccaJephcott@armyspy.com	8/1/1966	203.3	65	33.8
142	143	male	Finley	Chandler	2754 Westwood Avenue	New York	New York	10001.0	United States	516-740-5280FinleyChandler@dayrep.com	10/25/1936	150.9	70	21.6
152	153	male	Christopher	Woodward	3450 Southern Street	New York	NY	10004.0	United States	ChristopherWoodward@jourrapide.com+1 (516) 630...	9/4/1984	212.2	66	34.2
188	189	male	Søren	Sørensen	2397 Bell Street	New York	NY	10011.0	United States	SrenSrensen@superrito.com1 212 201 3108	12/31/1942	157.1	67	24.6
213	214	female	Onyemaechi	Onwughara	685 Duncan Avenue	New York	NY	10013.0	United States	917-622-9142OnyemaechiOnwughara@einrot.com	3/8/1989	131.1	69	19.4
215	216	male	John	Doe	123 Main Street	New York	NY	12345.0	United States	johndoe@email.com1234567890	1/1/1975	180.0	72	24.4
229	230	male	John	Doe	123 Main Street	New York	NY	12345.0	United States	johndoe@email.com1234567890	1/1/1975	180.0	72	24.4
237	238	male	John	Doe	123 Main Street	New York	NY	12345.0	United States	johndoe@email.com1234567890	1/1/1975	180.0	72	24.4
244	245	male	John	Doe	123 Main Street	New York	NY	12345.0	United States	johndoe@email.com1234567890	1/1/1975	180.0	72	24.4
247	248	male	Tuukka	Leppäluoto	1886 Bicetown Road	New York	NY	10011.0	United States	917-408-8855TuukkaLeppaluoto@teleworm.us	3/7/1978	211.0	73	27.8
251	252	male	John	Doe	123 Main Street	New York	NY	12345.0	United States	johndoe@email.com1234567890	1/1/1975	180.0	72	24.4
263	264	female	Julia	Carvalho	3662 Shinn Street	New York	NY	10036.0	United States	JuliaAzevedoCarvalho@superrito.com+1 (212) 782...	4/11/1931	171.8	61	32.5
277	278	male	John	Doe	123 Main Street	New York	NY	12345.0	United States	johndoe@email.com1234567890	1/1/1975	180.0	72	24.4
301	302	female	Onyekachukwu	Obinna	2970 Forest Avenue	New York	NY	10004.0	United States	OnyekachukwuObinna@teleworm.us646-982-6609	1/24/1997	154.7	65	25.7
461	462	male	Cannan	Cabrera	2102 Geraldine Lane	New York	NY	10014.0	United States	646-289-4177CannanCabreraOrdonez@superrito.com	10/12/1980	209.7	71	29.2

	patient_id	assigned_sex	given_name	surname	address	city	state	zip_code	country	contact	birthdate	weight	height	bmi
209	210	female	Lalita	Eldarkhanov	NaN	NaN	NaN	NaN	NaN	NaN	8/14/1950	143.4	62	26.2
219	220	male	Mỹ	Quynh	NaN	NaN	NaN	NaN	NaN	NaN	4/9/1978	237.8	69	35.1
230	231	female	Elisabeth	Knudsen	NaN	NaN	NaN	NaN	NaN	NaN	9/23/1976	165.9	63	29.4
234	235	female	Martina	Tománková	NaN	NaN	NaN	NaN	NaN	NaN	4/7/1936	199.5	65	33.2
242	243	male	John	O'Brian	NaN	NaN	NaN	NaN	NaN	NaN	2/25/1957	205.3	74	26.4
249	250	male	Benjamin	Mehler	NaN	NaN	NaN	NaN	NaN	NaN	10/30/1951	146.5	69	21.6
257	258	male	Jin	Kung	NaN	NaN	NaN	NaN	NaN	NaN	5/17/1995	231.7	69	34.2
264	265	female	Wafiyyah	Asfour	NaN	NaN	NaN	NaN	NaN	NaN	11/3/1989	158.6	63	28.1
269	270	female	Flavia	Fiorentino	NaN	NaN	NaN	NaN	NaN	NaN	10/9/1937	175.2	61	33.1
278	279	female	Generosa	Cabán	NaN	NaN	NaN	NaN	NaN	NaN	12/16/1962	124.3	69	18.4
286	287	male	Lewis	Webb	NaN	NaN	NaN	NaN	NaN	NaN	4/1/1979	155.3	68	23.6
296	297	female	Chỉ	Lâm	NaN	NaN	NaN	NaN	NaN	NaN	5/14/1990	181.1	63	32.1

	patient_id	assigned_sex	given_name	surname	address	city	state	zip_code	country	contact	birthdate	weight	height	bmi
29	30	male	Jake	Jakobsen	648 Old Dear Lane	Port Jervis	New York	12771.0	United States	JakobCJakobsen@einrot.com+1 (845) 858-7707	8/1/1985	155.8	67	24.4
219	220	male	Mỹ	Quynh	NaN	NaN	NaN	NaN	NaN	NaN	4/9/1978	237.8	69	35.1
229	230	male	John	Doe	123 Main Street	New York	NY	12345.0	United States	johndoe@email.com1234567890	1/1/1975	180.0	72	24.4
230	231	female	Elisabeth	Knudsen	NaN	NaN	NaN	NaN	NaN	NaN	9/23/1976	165.9	63	29.4
234	235	female	Martina	Tománková	NaN	NaN	NaN	NaN	NaN	NaN	4/7/1936	199.5	65	33.2
237	238	male	John	Doe	123 Main Street	New York	NY	12345.0	United States	johndoe@email.com1234567890	1/1/1975	180.0	72	24.4
242	243	male	John	O'Brian	NaN	NaN	NaN	NaN	NaN	NaN	2/25/1957	205.3	74	26.4
244	245	male	John	Doe	123 Main Street	New York	NY	12345.0	United States	johndoe@email.com1234567890	1/1/1975	180.0	72	24.4
249	250	male	Benjamin	Mehler	NaN	NaN	NaN	NaN	NaN	NaN	10/30/1951	146.5	69	21.6
251	252	male	John	Doe	123 Main Street	New York	NY	12345.0	United States	johndoe@email.com1234567890	1/1/1975	180.0	72	24.4
257	258	male	Jin	Kung	NaN	NaN	NaN	NaN	NaN	NaN	5/17/1995	231.7	69	34.2
264	265	female	Wafiyyah	Asfour	NaN	NaN	NaN	NaN	NaN	NaN	11/3/1989	158.6	63	28.1
269	270	female	Flavia	Fiorentino	NaN	NaN	NaN	NaN	NaN	NaN	10/9/1937	175.2	61	33.1
277	278	male	John	Doe	123 Main Street	New York	NY	12345.0	United States	johndoe@email.com1234567890	1/1/1975	180.0	72	24.4
278	279	female	Generosa	Cabán	NaN	NaN	NaN	NaN	NaN	NaN	12/16/1962	124.3	69	18.4
282	283	female	Sandy	Taylor	2476 Fulton Street	Rainelle	WV	25962.0	United States	304-438-2648SandraCTaylor@dayrep.com	10/23/1960	206.1	64	35.4
286	287	male	Lewis	Webb	NaN	NaN	NaN	NaN	NaN	NaN	4/1/1979	155.3	68	23.6
296	297	female	Chỉ	Lâm	NaN	NaN	NaN	NaN	NaN	NaN	5/14/1990	181.1	63	32.1
502	503	male	Pat	Gersten	2778 North Avenue	Burr	Nebraska	68324.0	United States	PatrickGersten@rhyta.com402-848-4923	5/3/1954	138.2	71	19.3

	given_name	surname	auralin	novodra	hba1c_start	hba1c_end	hba1c_change
0	veronika	jindrová	41u - 48u	-	7.63	7.20	NaN
1	elliot	richardson	-	40u - 45u	7.56	7.09	0.97
2	yukitaka	takenaka	-	39u - 36u	7.68	7.25	NaN
3	skye	gormanston	33u - 36u	-	7.97	7.62	0.35
4	alissa	montez	-	33u - 29u	7.78	7.46	0.32

	given_name	surname	auralin	novodra	hba1c_start	hba1c_end	hba1c_change
345	rovzan	kishiev	32u - 37u	-	7.75	7.41	0.34
346	jakob	jakobsen	-	28u - 26u	7.96	7.51	0.95
347	bernd	schneider	48u - 56u	-	7.74	7.44	0.30
348	berta	napolitani	-	42u - 44u	7.68	7.21	NaN
349	armina	sauvé	36u - 46u	-	7.86	7.40	NaN

	given_name	surname	hba1c_start	hba1c_end	hba1c_change	treatment	dose
0	veronika	jindrová	7.63	7.20	0.43	auralin	41u - 48u
1	elliot	richardson	7.56	7.09	0.47	auralin	-
2	yukitaka	takenaka	7.68	7.25	0.43	auralin	-
3	skye	gormanston	7.97	7.62	0.35	auralin	33u - 36u
4	alissa	montez	7.78	7.46	0.32	auralin	-

	given_name	surname	hba1c_start	hba1c_end	hba1c_change	treatment	dose
0	veronika	jindrová	7.63	7.20	0.43	auralin	41u - 48u
3	skye	gormanston	7.97	7.62	0.35	auralin	33u - 36u
6	sophia	haugen	7.65	7.27	0.38	auralin	37u - 42u
7	eddie	archer	7.89	7.55	0.34	auralin	31u - 38u
9	asia	woźniak	7.76	7.37	0.39	auralin	30u - 36u

	given_name	surname	hba1c_start	hba1c_end	hba1c_change	treatment	dose	dose_start	dose_end
0	veronika	jindrová	7.63	7.20	0.43	auralin	41u - 48u	41u	48u
3	skye	gormanston	7.97	7.62	0.35	auralin	33u - 36u	33u	36u
6	sophia	haugen	7.65	7.27	0.38	auralin	37u - 42u	37u	42u
7	eddie	archer	7.89	7.55	0.34	auralin	31u - 38u	31u	38u
9	asia	woźniak	7.76	7.37	0.39	auralin	30u - 36u	30u	36u
...	...	...	...	...	...	...	...	...	...
688	christopher	woodward	7.51	7.06	0.45	novodra	55u - 51u	55u	51u
690	maret	sultygov	7.67	7.30	0.37	novodra	26u - 23u	26u	23u
694	lixue	hsueh	9.21	8.80	0.41	novodra	22u - 23u	22u	23u
696	jakob	jakobsen	7.96	7.51	0.45	novodra	28u - 26u	28u	26u
698	berta	napolitani	7.68	7.21	0.47	novodra	42u - 44u	42u	44u

	given_name	surname	hba1c_start	hba1c_end	hba1c_change	treatment	dose_start	dose_end
0	veronika	jindrová	7.63	7.20	0.43	auralin	41u	48u
3	skye	gormanston	7.97	7.62	0.35	auralin	33u	36u
6	sophia	haugen	7.65	7.27	0.38	auralin	37u	42u
7	eddie	archer	7.89	7.55	0.34	auralin	31u	38u
9	asia	woźniak	7.76	7.37	0.39	auralin	30u	36u

	given_name	surname	hba1c_start	hba1c_end	hba1c_change	treatment	dose_start	dose_end	adverse_reaction
0	veronika	jindrová	7.63	7.20	0.43	auralin	41u	48u	NaN
1	skye	gormanston	7.97	7.62	0.35	auralin	33u	36u	NaN
2	sophia	haugen	7.65	7.27	0.38	auralin	37u	42u	NaN
3	eddie	archer	7.89	7.55	0.34	auralin	31u	38u	NaN
4	asia	woźniak	7.76	7.37	0.39	auralin	30u	36u	NaN
...	...	...	...	...	...	...	...	...	...
345	christopher	woodward	7.51	7.06	0.45	novodra	55u	51u	nausea
346	maret	sultygov	7.67	7.30	0.37	novodra	26u	23u	NaN
347	lixue	hsueh	9.21	8.80	0.41	novodra	22u	23u	injection site discomfort
348	jakob	jakobsen	7.96	7.51	0.45	novodra	28u	26u	hypoglycemia
349	berta	napolitani	7.68	7.21	0.47	novodra	42u	44u	injection site discomfort

	patient_id	given_name	surname
0	1	Zoe	Wellish
1	2	Pamela	Hill
2	3	Jae	Debord
3	4	Liêm	Phan
4	5	Tim	Neudorf
...	...	...	...
498	499	Mustafa	Lindström
499	500	Ruman	Bisliev
500	501	Jinke	de Keizer
501	502	Chidalu	Onyekaozulu
502	503	Pat	Gersten

	patient_id	given_name	surname
0	1	zoe	Wellish
1	2	pamela	Hill
2	3	jae	Debord
3	4	liêm	Phan
4	5	tim	Neudorf
...	...	...	...
498	499	mustafa	Lindström
499	500	ruman	Bisliev
500	501	jinke	de Keizer
501	502	chidalu	Onyekaozulu
502	503	pat	Gersten

	patient_id	given_name	surname
0	1	zoe	wellish
1	2	pamela	hill
2	3	jae	debord
3	4	liêm	phan
4	5	tim	neudorf
...	...	...	...
498	499	mustafa	lindström
499	500	ruman	bisliev
500	501	jinke	de keizer
501	502	chidalu	onyekaozulu
502	503	pat	gersten

	patient_id	given_name	surname	hba1c_start	hba1c_end	hba1c_change	treatment	dose_start	dose_end	adverse_reaction
0	1	zoe	wellish	7.71	7.30	0.41	novodra	33u	33u	NaN
1	2	pamela	hill	9.53	9.10	0.43	novodra	27u	29u	NaN
2	4	liêm	phan	7.58	7.10	0.48	novodra	43u	48u	NaN
3	6	rafael	costa	7.73	7.34	0.39	auralin	50u	60u	NaN
4	7	mary	adams	7.65	7.26	0.39	novodra	32u	33u	NaN

	patient_id	hba1c_start	hba1c_end	hba1c_change	treatment	dose_start	dose_end	adverse_reaction
0	1	7.71	7.30	0.41	novodra	33u	33u	NaN
1	2	9.53	9.10	0.43	novodra	27u	29u	NaN
2	4	7.58	7.10	0.48	novodra	43u	48u	NaN
3	6	7.73	7.34	0.39	auralin	50u	60u	NaN
4	7	7.65	7.26	0.39	novodra	32u	33u	NaN
...	...	...	...	...	...	...	...	...
344	495	8.90	8.59	0.31	novodra	26u	24u	NaN
345	497	7.71	7.35	0.36	auralin	35u	38u	NaN
346	499	7.92	7.60	0.32	novodra	35u	33u	NaN
347	500	7.72	7.39	0.33	auralin	46u	53u	NaN
348	502	7.54	7.27	0.27	novodra	42u	41u	NaN

	patient_id	assigned_sex	given_name	surname	address	city	state	zip_code	country	birthdate	weight	height	bmi	phone_number	email
4	5	male	Tim	Neudorf	1428 Turkey Pen Lane	Dothan	AL	36303	United States	2/18/1928	192.3	72	26.1	334-515-7487	TimNeudorf@cuvox.de

	patient_id	assigned_sex	given_name	surname	address	city	state	zip_code	country	birthdate	weight	height	bmi	phone_number	email
8	9	male	David	Gustafsson	1790 Nutter Street	Kansas City	MO	64105	United States	3/6/1937	163.9	66	26.5	816-265-9578	DavidGustafsson@armyspy.com

	patient_id	assigned_sex	given_name	surname	address	city	state	zip_code	country	birthdate	weight	height	bmi	phone_number	email
24	25	male	Jakob	Jakobsen	648 Old Dear Lane	Port Jervis	NY	12771	United States	1985-08-01	155.8	67	24.4	18458587707	JakobCJakobsen@einrot.com
432	433	female	Karen	Jakobsen	1690 Fannie Street	Houston	TX	77020	United States	1962-11-25	185.2	67	29.0	19792030438	KarenJakobsen@jourrapide.com

	patient_id	assigned_sex	given_name	surname	address	city	state	zip_code	country	birthdate	weight	height	bmi	phone_number	email
97	98	male	Patrick	Gersten	2778 North Avenue	Burr	NE	68324	United States	1954-05-03	138.2	71	19.3	14028484923	PatrickGersten@rhyta.com

	patient_id	assigned_sex	given_name	surname	address	city	state	zip_code	country	birthdate	weight	height	bmi	phone_number	email
131	132	female	Sandra	Taylor	2476 Fulton Street	Rainelle	WV	25962	United States	1960-10-23	206.1	64	35.4	13044382648	SandraCTaylor@dayrep.com
426	427	male	Rogelio	Taylor	4064 Marigold Lane	Miami	FL	33179	United States	1992-09-02	186.6	69	27.6	13054346299	RogelioJTaylor@teleworm.us

¶ Oral Insulin Phase II Clinical Trial

¶ Table of contents

¶ Our dataset: Auralin and Novodra Trials

¶ Gathering Data

¶ Assessing Data

¶ A) Visual Assessment: Acquaint Yourself

¶ Quality

¶ patients table

¶ treatments table

¶ adverse_reactions table

¶ Data Quality Dimensions

¶ B) Programmatic Assessment

¶ Try .head and .tail on the patients table.

¶ Try .sample on the treatments table.

¶ Try .info on all the tables.

¶ Try .describe on the patients and treatments tables.

¶ Try .value_counts on the adverse_reaction column of the adverse_reactions table.
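A minimal sketch of this step, assuming the table is loaded as a pandas DataFrame named `adverse_reactions`. The five rows are the first five of the table shown earlier and stand in for the full 34.

```python
import pandas as pd

# Stand-in for the adverse_reactions table (first five rows only).
adverse_reactions = pd.DataFrame({
    "given_name": ["berta", "lena", "joseph", "flavia", "manouck"],
    "surname": ["napolitani", "baer", "day", "fiorentino", "wubbels"],
    "adverse_reaction": [
        "injection site discomfort", "hypoglycemia", "hypoglycemia",
        "cough", "throat irritation",
    ],
})

# Count how often each distinct reaction occurs, most frequent first.
reaction_counts = adverse_reactions["adverse_reaction"].value_counts()
print(reaction_counts)
```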

¶ Try selecting the records in the patients table for patients that are from the city New York.

¶ How many patients in the patients table are from the city New York? Hint: len() might come in handy.
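One way to select and count these records, sketched on a four-row stand-in for the `patients` table (the real table has 503 rows). The variable names `new_yorkers` and `n_new_yorkers` are illustrative.

```python
import pandas as pd

# Stand-in for the patients table; only the columns needed here.
patients = pd.DataFrame({
    "given_name": ["Zoe", "Sophie", "Kamila", "Jae"],
    "city": ["Rancho California", "New York", "New York", "York"],
})

# Boolean mask: True only where the city is exactly "New York".
new_yorkers = patients[patients["city"] == "New York"]

# len() on the filtered frame gives the number of matching patients.
n_new_yorkers = len(new_yorkers)
print(n_new_yorkers)
```

Note that the comparison is exact, so "York" does not match.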

¶ Check for missing data in the address column with .isnull(), which returns a boolean mask flagging the rows where the value is missing
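A short sketch of this check on three stand-in rows taken from the outputs above. `.isnull()` yields one True/False value per row; indexing with that mask returns the rows themselves.

```python
import numpy as np
import pandas as pd

# Stand-in for the patients table; two of the rows lack an address.
patients = pd.DataFrame({
    "surname": ["Wellish", "Eldarkhanov", "Quynh"],
    "address": ["576 Brown Bear Drive", np.nan, np.nan],
})

# Boolean Series: True where address is missing.
missing_mask = patients["address"].isnull()

# Use the mask to pull out the affected rows and to count them.
missing_rows = patients[missing_mask]
n_missing = int(missing_mask.sum())
print(missing_rows)
```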

¶ Quality

¶ patients table

¶ treatments table

¶ adverse_reactions table

¶ Further Discussion: Category vs. Object

¶ Try .value_counts on the surname and address columns of the patients table.

¶ Try .duplicated on the address column of the patients table.
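A sketch of the duplicate check on five stand-in rows. By default `.duplicated()` flags every repeat after the first occurrence, and it treats missing values as equal to each other, which is why rows with a NaN address also show up in the output above.

```python
import numpy as np
import pandas as pd

# Stand-in rows: two John Doe records and two records with no address.
patients = pd.DataFrame({
    "surname": ["Doe", "Doe", "Eldarkhanov", "Quynh", "Wellish"],
    "address": ["123 Main Street", "123 Main Street", np.nan, np.nan,
                "576 Brown Bear Drive"],
})

# True for each row whose address already appeared in an earlier row.
repeated = patients[patients["address"].duplicated()]
print(repeated)
```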

¶ Try .sort_values on the weight column of the patients table.

¶ Try .isnull on the auralin and novodra columns of the treatments table.
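A sketch showing why this check comes back empty: in the table above, a missing dose is recorded as the string "-", which pandas does not treat as null. The rows are the first three of the `treatments` table.

```python
import pandas as pd

# First three rows of the two dose columns.
treatments = pd.DataFrame({
    "auralin": ["41u - 48u", "-", "-"],
    "novodra": ["-", "40u - 45u", "39u - 36u"],
})

dose_columns = treatments[["auralin", "novodra"]]

# No true nulls are found...
null_counts = dose_columns.isnull().sum()

# ...because the placeholder is a dash, which has to be counted explicitly.
dash_counts = (dose_columns == "-").sum()
print(null_counts, dash_counts, sep="\n")
```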

¶ Quality

¶ patients table

¶ treatments table

¶ adverse_reactions table

¶ Tidiness

¶ Where is the Missing Data?

¶ How Many Tables?

¶ Find duplicate column names in the three tables using pandas, use the following code:
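The original code cell is not preserved here, so the following is one possible approach rather than the notebook's own: gather all column names into a single Series and keep those that repeat. Empty DataFrames with the column names shown in the outputs above stand in for the three tables.

```python
import pandas as pd

# Empty frames carrying only the column names of the three tables.
patients = pd.DataFrame(columns=[
    "patient_id", "assigned_sex", "given_name", "surname", "address",
    "city", "state", "zip_code", "country", "contact", "birthdate",
    "weight", "height", "bmi",
])
treatments = pd.DataFrame(columns=[
    "given_name", "surname", "auralin", "novodra",
    "hba1c_start", "hba1c_end", "hba1c_change",
])
adverse_reactions = pd.DataFrame(columns=[
    "given_name", "surname", "adverse_reaction",
])

# list(df) returns the column names; concatenate them across tables.
all_columns = pd.Series(
    list(patients) + list(treatments) + list(adverse_reactions)
)

# Keep each name that has already appeared in an earlier position.
duplicated_names = all_columns[all_columns.duplicated()]
print(duplicated_names)
```

Only `given_name` and `surname` repeat, which matches the tidiness issue addressed later.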

¶ Cleaning Data

¶ A) Creating Dataframe Copies

¶ B) Cleaning Missing Data

¶ treatments: Missing records (280 instead of 350)

¶ Define

¶ Code
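A hedged sketch of this step. It assumes the missing records are available in a separate DataFrame, here given the hypothetical name `treatments_cut`, with the same columns as `treatments`; the rows below are small stand-ins, not the real data split.

```python
import pandas as pd

# Stand-in for the incomplete treatments table.
treatments = pd.DataFrame({
    "given_name": ["veronika", "elliot"],
    "surname": ["jindrová", "richardson"],
    "hba1c_start": [7.63, 7.56],
})

# Hypothetical frame holding the records that were left out.
treatments_cut = pd.DataFrame({
    "given_name": ["rovzan"],
    "surname": ["kishiev"],
    "hba1c_start": [7.75],
})

# Stack the two frames and renumber the index from 0.
treatments_clean = pd.concat([treatments, treatments_cut], ignore_index=True)
print(len(treatments_clean))
```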

¶ Test

¶ treatments: Missing HbA1c changes and Inaccurate HbA1c changes (leading 4s mistaken as 9s)

¶ Define

¶ Code
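A sketch of one way to fix both problems at once: recompute the change as start minus end, which fills the missing values and overwrites the mistyped ones. Rounding to two decimals is an assumption made here to avoid floating-point noise.

```python
import numpy as np
import pandas as pd

# Rows from the treatments table: one missing change, one mistyped (0.97).
treatments_clean = pd.DataFrame({
    "surname": ["jindrová", "richardson", "gormanston"],
    "hba1c_start": [7.63, 7.56, 7.97],
    "hba1c_end": [7.20, 7.09, 7.62],
    "hba1c_change": [np.nan, 0.97, 0.35],
})

# Derive the change from the two measurements instead of trusting the column.
treatments_clean["hba1c_change"] = (
    treatments_clean["hba1c_start"] - treatments_clean["hba1c_end"]
).round(2)
print(treatments_clean)
```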

¶ Test

¶ C) Cleaning Tidiness Issues

¶ Contact column in patients table contains two variables: phone number and email

¶ Define

¶ Code
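A sketch of splitting `contact` with regular expressions. Both patterns are assumptions written for the formats visible in the outputs above (phone before or after the email, optional "+1 " prefix) and would need checking against the full column.

```python
import pandas as pd

# Two contact strings from the patients table, one in each ordering.
patients_clean = pd.DataFrame({
    "contact": [
        "951-719-9170ZoeWellish@superrito.com",
        "PamelaSHill@cuvox.de+1 (217) 569-3204",
    ],
})

# Assumed patterns: a US-style phone number and a simple email shape.
phone_pattern = r"((?:\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4})"
email_pattern = r"([a-zA-Z][a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+[a-zA-Z])"

# str.extract returns the first capture group of the first match per row.
patients_clean["phone_number"] = patients_clean["contact"].str.extract(
    phone_pattern, expand=False
)
patients_clean["email"] = patients_clean["contact"].str.extract(
    email_pattern, expand=False
)

# The combined column is no longer needed once both parts are extracted.
patients_clean = patients_clean.drop(columns="contact")
print(patients_clean)
```

The email pattern must end on a letter so that digits of a trailing phone number are not absorbed into the domain.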

¶ Test

¶ Three variables in two columns in treatments table (treatment, start dose and end dose)

¶ Define

¶ Code
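A sketch of the reshaping, consistent with the intermediate outputs shown above: melt the two drug columns into `treatment` and `dose`, drop the "-" placeholder rows, then split the dose range into start and end.

```python
import pandas as pd

# Three rows from the treatments table, dose columns only.
treatments_clean = pd.DataFrame({
    "given_name": ["veronika", "elliot", "skye"],
    "surname": ["jindrová", "richardson", "gormanston"],
    "auralin": ["41u - 48u", "-", "33u - 36u"],
    "novodra": ["-", "40u - 45u", "-"],
})

# Wide to long: one row per patient per drug column.
treatments_clean = pd.melt(
    treatments_clean,
    id_vars=["given_name", "surname"],
    var_name="treatment",
    value_name="dose",
)

# Keep only the drug each patient actually received.
treatments_clean = treatments_clean[treatments_clean["dose"] != "-"].copy()

# Split "41u - 48u" into its two ends, then drop the combined column.
dose_parts = treatments_clean["dose"].str.split(" - ", expand=True)
treatments_clean["dose_start"] = dose_parts[0]
treatments_clean["dose_end"] = dose_parts[1]
treatments_clean = treatments_clean.drop(columns="dose")
print(treatments_clean)
```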

¶ Test

¶ Adverse reaction should be part of the treatments table

¶ Define

¶ Code
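A sketch of folding the reactions into `treatments` with a left join on the two name columns, so patients without a recorded reaction are kept and receive NaN. The frame names with the `_clean` suffix are assumed working copies.

```python
import pandas as pd

# Two treated patients; only one appears in the reactions table.
treatments_clean = pd.DataFrame({
    "given_name": ["veronika", "lixue"],
    "surname": ["jindrová", "hsueh"],
    "treatment": ["auralin", "novodra"],
})
adverse_reactions_clean = pd.DataFrame({
    "given_name": ["lixue", "berta"],
    "surname": ["hsueh", "napolitani"],
    "adverse_reaction": ["injection site discomfort",
                         "injection site discomfort"],
})

# Left join: every treatment row survives, matched or not.
treatments_clean = pd.merge(
    treatments_clean,
    adverse_reactions_clean,
    on=["given_name", "surname"],
    how="left",
)
print(treatments_clean)
```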

¶ Test

¶ Given name and surname columns in patients table duplicated in treatments and adverse_reactions tables and Lowercase given names and surnames

¶ Define

¶ Code
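A sketch of replacing the repeated name columns with `patient_id`. It builds a lookup of id and lowercased names from `patients`, merges it into `treatments`, and drops the names. The default inner join is an assumption; patients whose names do not match exactly are dropped by it.

```python
import pandas as pd

# Stand-in rows; names are capitalised in patients, lowercase in treatments.
patients_clean = pd.DataFrame({
    "patient_id": [1, 2],
    "given_name": ["Zoe", "Pamela"],
    "surname": ["Wellish", "Hill"],
})
treatments_clean = pd.DataFrame({
    "given_name": ["zoe", "pamela"],
    "surname": ["wellish", "hill"],
    "hba1c_start": [7.71, 9.53],
})

# Lookup table with names lowercased to match the treatments table.
id_names = patients_clean[["patient_id", "given_name", "surname"]].copy()
id_names["given_name"] = id_names["given_name"].str.lower()
id_names["surname"] = id_names["surname"].str.lower()

# Attach patient_id, then remove the now redundant name columns.
treatments_clean = pd.merge(
    treatments_clean, id_names, on=["given_name", "surname"]
).drop(columns=["given_name", "surname"])
print(treatments_clean)
```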

¶ Test

¶ D) Cleaning Quality Issues

¶ Zip code is a float not a string and Zip code has four digits sometimes

¶ Define

¶ Code
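A sketch of one way to fix both zip code issues: convert each float to an integer, then format it as a five-character string with leading zeros, leaving missing values as NaN. The helper `format_zip` is a hypothetical name introduced for illustration.

```python
import numpy as np
import pandas as pd

# Zip codes as they appear in the raw table, plus one missing value.
patients_clean = pd.DataFrame({
    "zip_code": [92390.0, 7095.0, 3852.0, np.nan],
})

def format_zip(value):
    """Return a zero-padded 5-digit string, or NaN if the value is missing."""
    if pd.isna(value):
        return np.nan
    return f"{int(value):05d}"

# Apply the formatter element by element; the column becomes object dtype.
patients_clean["zip_code"] = patients_clean["zip_code"].map(format_zip)
print(patients_clean)
```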

¶ Test

¶ Tim Neudorf height is 27 in instead of 72 in

¶ Define

¶ Code
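A minimal sketch of the correction, assuming 27 is a digit transposition of 72 and that no other patient has a recorded height of 27 (it is the column minimum in the summary statistics above).

```python
import pandas as pd

# Stand-in rows including the erroneous height.
patients_clean = pd.DataFrame({
    "surname": ["Phan", "Neudorf"],
    "height": [70, 27],
})

# Swap the mistyped value for the intended one.
patients_clean["height"] = patients_clean["height"].replace(27, 72)
print(patients_clean)
```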

¶ Test

¶ Full state names sometimes, abbreviations other times

¶ Define

¶ Code
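A sketch of standardising on two-letter abbreviations with a lookup dictionary. The mapping covers only the full state names visible in the outputs above; the full table may contain others.

```python
import pandas as pd

# Mixed full names and abbreviations, as in the raw table.
patients_clean = pd.DataFrame({
    "state": ["California", "Illinois", "Nebraska", "NJ", "New York", "NY"],
})

# Full name to postal abbreviation; extend as further names are found.
state_abbrev = {
    "California": "CA",
    "Illinois": "IL",
    "Nebraska": "NE",
    "New York": "NY",
}

# Values not present in the dictionary are left unchanged.
patients_clean["state"] = patients_clean["state"].replace(state_abbrev)
print(patients_clean)
```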

¶ Test

¶ Dsvid Gustafsson (given name typo; should be David)

¶ Define
